(Intercept) gestation
-10.0641842 0.4642626
IMS, Ch. 7
Smith College
Feb 9, 2026
An article reported that there was a 0.42 correlation between alcohol consumption and income among adults with a four-year college degree.
Is it reasonable to conclude that increasing one’s alcohol consumption will increase one’s income? Explain why or why not.
A college newspaper interviews a psychologist about student ratings of the teaching of faculty members.
The psychologist says, “The evidence indicates that the correlation between the research productivity and teaching rating of faculty members is close to zero.”
The paper reports this as “Prof. McDaniel said that good researchers tend to be poor teachers, and vice versa.”
Explain why the paper’s report is wrong.
Write a statement in plain language (don’t use the word correlation) to explain the psychologist’s meaning.
Goal: understand changes in a numerical response variable in terms of a numerical explanatory variable.
A simple linear regression model for \(y\) in terms of \(x\) takes the form: \[ y_i = \beta_0 + \beta_1 \cdot x_i + \epsilon_i \,, \text{ for } i=1,\ldots,n \]
lm() to fit the model
coef() on the regression object to get the coefficients
geom_smooth(method = "lm") to show on a plot
Recall: Parameter vs. Statistic
\[ y_i = \beta_0 + \beta_1 \cdot x_i + \epsilon_i \,, \text{ for } i=1,\ldots,n \]
\[ \hat{y}_i = b_0 + b_1 \cdot x_i \,, \text{ for } i=1,\ldots,n \]
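The distinction between the parameters (\(\beta_0, \beta_1\)) and the statistics (\(b_0, b_1\)) can be seen by simulation. The sketch below is illustrative only: the "true" values, the sample size, and the error standard deviation are all made-up assumptions, not values from the course data. It generates data from a known line, fits the model with lm(), and sets the estimates beside the values that generated the data.

```r
# Simulate data from a known linear model, then fit it with lm().
# All numeric settings below are made up for illustration.
set.seed(1)
n     <- 200
beta0 <- 2      # population intercept (a parameter)
beta1 <- 0.5    # population slope (a parameter)

x   <- runif(n, min = 0, max = 10)
eps <- rnorm(n, mean = 0, sd = 1)   # random error term epsilon_i
y   <- beta0 + beta1 * x + eps

fit <- lm(y ~ x)
b   <- coef(fit)   # statistics b_0 and b_1, estimated from this sample

# Put the parameters and their estimates side by side
rbind(parameter = c(beta0, beta1), statistic = b)
```

Changing the seed gives a different sample and therefore different values of \(b_0\) and \(b_1\), while \(\beta_0\) and \(\beta_1\) stay fixed.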
The model (almost) never fits perfectly
What is left over are the residuals (\(e_i = y_i - \hat{y}_i\))
Many of the assumptions we’ll make later involve the residuals
Residual analysis is important! (…more later)
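As a minimal sketch of the residual definition \(e_i = y_i - \hat{y}_i\), the code below computes the residuals by hand and checks them against R's residuals() function. The data are simulated (an assumption made only to keep the example self-contained), not taken from any course data set.

```r
# Simulated data, for illustration only
set.seed(2)
x <- 1:20
y <- 3 + 2 * x + rnorm(20, sd = 4)

fit   <- lm(y ~ x)
y_hat <- fitted(fit)   # fitted values, y-hat_i
e     <- y - y_hat     # residuals, straight from the definition

# residuals() returns the same values
all.equal(unname(e), unname(residuals(fit)))

# When the model includes an intercept, the least-squares residuals
# sum to zero (up to floating-point rounding)
sum(e)
```

A common first step in residual analysis is to plot the residuals against the fitted values, for example with plot(fitted(fit), residuals(fit)), and look for patterns.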
RailTrail
The Pioneer Valley Planning Commission (PVPC) collected data north of Chestnut Street in Florence, MA for ninety days from April 5, 2005 to November 15, 2005. Data collectors set up a laser sensor, with breaks in the laser beam recording when a rail-trail user passed the data collection station. The data is captured in the RailTrail data set.
  hightemp lowtemp avgtemp spring summer fall cloudcover precip volume weekday dayType
1       83      50    66.5      0      1    0        7.6   0.00    501    TRUE weekday
2       73      49    61.0      0      1    0        6.3   0.29    419    TRUE weekday
3       74      52    63.0      1      0    0        7.5   0.32    397    TRUE weekday
4       95      61    78.0      0      1    0        2.6   0.00    385   FALSE weekend
5       44      52    48.0      1      0    0       10.0   0.14    200    TRUE weekday
6       69      54    61.5      1      0    0        6.6   0.02    375    TRUE weekday
cor() to compute the correlation coefficient
geom_smooth() to view the model in the data space
lm()
Intercept and avgtemp terms
Compare OLS regression line (right) with null model (left).
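The steps above can be sketched in base R. To stay self-contained, this example uses only the six RailTrail rows printed earlier, so its numbers will not match an analysis of all ninety days. The choice of volume as the response, and the object names rail6, fit_ols, and fit_null, are assumptions made for illustration.

```r
# First six rows of RailTrail, copied from the printout above
# (only the two columns used here)
rail6 <- data.frame(
  avgtemp = c(66.5, 61.0, 63.0, 78.0, 48.0, 61.5),
  volume  = c(501, 419, 397, 385, 200, 375)
)

# Correlation between average temperature and trail volume
cor(rail6$avgtemp, rail6$volume)

# OLS regression line: volume in terms of avgtemp
fit_ols <- lm(volume ~ avgtemp, data = rail6)
coef(fit_ols)    # the (Intercept) and avgtemp terms

# Null model: intercept only, so every prediction is mean(volume)
fit_null <- lm(volume ~ 1, data = rail6)
coef(fit_null)

# Sum of squared residuals under each model
sse_ols  <- sum(residuals(fit_ols)^2)
sse_null <- sum(residuals(fit_null)^2)
c(null = sse_null, ols = sse_ols)
```

The OLS line can never have a larger sum of squared residuals than the null model, because the null model is the special case with slope zero. To work with the full data set instead, data(RailTrail, package = "mosaicData") should load it, assuming the mosaicData package is installed.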
