Problem of Intervening Variables: Spurious Relationships
  1. x= ice cream sales; y=violent crime; z= heat waves
  2. x= ice cream sales; y=drownings; z= heat waves
  3. x= number of electrical appliances; y=decreased birth rates; z= industrialization
  4. x= smoking; y=lung cancer; z= tissue damage
  5. x= age; y=reading ability; z= education
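
A small simulation can make the idea concrete. The sketch below is only an illustration with made-up, standardized data: z stands in for heat, and x and y (think ice cream sales and drownings) each depend on z but not on each other. The helper names `pearson_r` and `partial_r` are hypothetical, and the partial-correlation formula is one common way to "control for" z, not something prescribed by these notes.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, holding z constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


random.seed(0)  # fixed seed so the illustration is repeatable
n = 5000
z = [random.gauss(0, 1) for _ in range(n)]   # assumed: heat (standardized)
x = [zi + random.gauss(0, 1) for zi in z]    # assumed: driven by z only
y = [zi + random.gauss(0, 1) for zi in z]    # assumed: driven by z only

r_xy = pearson_r(x, y)
r_xz = pearson_r(x, z)
r_yz = pearson_r(y, z)
r_xy_given_z = partial_r(r_xy, r_xz, r_yz)

print(f"r(x,y)          = {r_xy:.2f}")          # sizable, although x does not cause y
print(f"r(x,y | z held) = {r_xy_given_z:.2f}")  # close to zero once z is controlled
```

When the x-y correlation shrinks toward zero after z is held constant, that pattern is consistent with z accounting for the relationship.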

Correlation versus Regression (Variance)

  1. r_xy = correlation between x and y
    • ex: x = SAT scores, y = first-year, first-semester (FYFS) GPA: r = .30 to .40
  2. r²_xy = proportion of variance in y explained by x
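
As a worked example of the two quantities above: squaring r = .30 to .40 gives r² = .09 to .16, so SAT scores would explain roughly 9% to 16% of the variance in FYFS GPA. The sketch below checks, on made-up numbers (the `sat` and `gpa` lists are hypothetical, not real data), that r² from the correlation matches the proportion of variance explained by a one-predictor regression.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r2_from_regression(x, y):
    """1 - SS_residual / SS_total for the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Hypothetical scores, invented only to exercise the functions.
sat = [1000, 1100, 1150, 1200, 1250, 1300, 1350, 1400]
gpa = [2.8, 2.5, 3.2, 2.9, 3.4, 3.0, 3.6, 3.3]

r = pearson_r(sat, gpa)
r2_corr = r ** 2
r2_fit = r2_from_regression(sat, gpa)
print(f"r = {r:.3f}, r^2 = {r2_corr:.3f}, variance explained by fit = {r2_fit:.3f}")

# The range quoted in the notes, squared.
explained_range = [round(v ** 2, 2) for v in (0.30, 0.40)]
print(explained_range)
```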

Purposes/Goals: (from Hoyt, Imel, Chan, 2008)

  1. description, to provide a statistical summary of the relationship of the Xs to the Y;
  2. prediction, to provide an equation that generates predicted scores on some future outcome (Y, e.g., job performance) based on the observed Xs;
  3. explanation or theory testing, to test the direction and magnitude of predicted relationships of Xs to Y using the actual observed data.


Assumptions:

  1. The DV is continuous and free from outliers (EXPLORE vs. DESCRIPTIVES)
  2. The IVs (predictors) are continuous and free from intercorrelation/multicollinearity (CORRELATE; Tolerance/VIF)
  3. The subject-to-predictor ratio is not below 10:1; 15:1 is ideal
  4. The IVs are free from outliers
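
The checks in items 2 and 3 can be sketched in a few lines. This is a simplified illustration that assumes exactly two predictors, in which case the R² from regressing one predictor on the other equals their squared correlation, so tolerance = 1 - r² and VIF = 1 / tolerance. With more predictors, each one would have to be regressed on all the others. The data and the helper `subjects_per_predictor` are hypothetical, and the .70 flag is the screening value used in these notes.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def subjects_per_predictor(n_subjects, n_predictors):
    """Subject-to-predictor ratio, e.g. 150 subjects / 10 predictors = 15.0."""
    return n_subjects / n_predictors


# Two made-up predictors that overlap heavily.
x1 = [2, 4, 5, 7, 8, 10, 11, 13]
x2 = [1, 3, 4, 6, 9, 9, 12, 12]

r12 = pearson_r(x1, x2)
tolerance = 1 - r12 ** 2   # two-predictor special case only
vif = 1 / tolerance
flag_collinear = abs(r12) > 0.70

ratio = subjects_per_predictor(len(x1), 2)
ratio_ok = ratio >= 10     # 10:1 minimum from the list above

print(f"r12 = {r12:.3f}, tolerance = {tolerance:.3f}, VIF = {vif:.1f}")
print(f"collinearity flag: {flag_collinear}")
print(f"subjects per predictor = {ratio:.1f}, meets 10:1: {ratio_ok}")
```

Low tolerance (equivalently, high VIF) indicates that a predictor carries little information not already contained in the other predictor.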

In order to control for "shrinkage" (reduction in the predictive power of the regression equations):

  1. Correlations between predictor variables should be inspected. When pairs of variables have correlations higher than .70, you should consider:
    1. removing one of the correlated predictors (usually the one with the lowest r with the DV) or
    2. combining the correlated predictors (average, sum, etc)
  2. The ratios of subjects-to-predictors in the main regression analyses should be at least 15:1.
  3. Adjusted R² coefficients should be used as a conservative estimate of explained variance (in all regression analyses).
  4. Default values for the probabilities of "F-to-enter" (.05) and "F-to-remove" (.10) should remain constant for all of the regression analyses.
  5. Dependent measures should be analyzed for "outliers" by inspecting Z-scores of residuals. No significant effects of outliers, as measured by Z-scores greater than three standard deviations from the mean, should be noted on the dependent variable. Outliers on predictor variables should be identified using Mahalanobis distance, and analyzed for their influence on the regression equations using Cook's distance.
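
The adjusted R² and outlier checks above can be sketched as follows. This is a minimal illustration, not the procedure of any particular statistics package: it assumes a single predictor so that everything can be computed by hand, uses the textbook formulas for adjusted R², leverage, and Cook's distance, and runs on made-up data in which the last case sits far from the other X values. All function and variable names are hypothetical.

```python
import math


def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n subjects and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def simple_regression_diagnostics(x, y):
    """Residual z-scores, leverage, Cook's distance, and squared
    Mahalanobis distance for a one-predictor regression."""
    n = len(x)
    p = 2  # estimated parameters: intercept and one slope
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [b - (intercept + slope * a) for a, b in zip(x, y)]
    mse = sum(e * e for e in resid) / (n - p)
    z_resid = [e / math.sqrt(mse) for e in resid]        # residual / SD of residuals
    leverage = [1 / n + (a - mx) ** 2 / sxx for a in x]  # hat values
    cooks = [e * e * h / (p * mse * (1 - h) ** 2) for e, h in zip(resid, leverage)]
    var_x = sxx / (n - 1)
    mahal = [(a - mx) ** 2 / var_x for a in x]           # squared distance from mean of x
    return {"z_resid": z_resid, "leverage": leverage, "cooks": cooks, "mahal": mahal}


# Made-up data: the last case is extreme on x and off the trend of the rest.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 25]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.3, 13.9, 16.2, 17.8, 20.1, 30.0]

diag = simple_regression_diagnostics(x, y)
flagged_residuals = [i for i, z in enumerate(diag["z_resid"]) if abs(z) > 3]
most_influential = diag["cooks"].index(max(diag["cooks"]))
farthest_on_x = diag["mahal"].index(max(diag["mahal"]))

print("cases with |z residual| > 3:", flagged_residuals)
print("largest Cook's distance at case:", most_influential)
print("largest Mahalanobis distance at case:", farthest_on_x)
print("adjusted R^2 for R^2 = .50, n = 31, k = 2:", round(adjusted_r2(0.50, 31, 2), 3))
```

Adjusted R² is always at or below the unadjusted value, and the gap widens as the subject-to-predictor ratio falls, which is why it serves as the conservative estimate.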