Datasets for this assignment:
A random sample of 1,000 federal personnel records for March 1994:
library(dplyr)
library(MASS); data(Cars93) # MASS::Cars93
Cars93 <- Cars93 %>% filter(Cylinders != "rotary") # remove all cars with rotary engines from the dataset
Let’s regress MPG.city
on Origin
to answer this question:
lm(MPG.city ~ Origin, data = Cars93) %>% summary()
##
## Call:
## lm(formula = MPG.city ~ Origin, data = Cars93)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.0227 -3.9583 -0.9905 2.0417 21.9773
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 20.9583 0.7848 26.7 < 2e-16 ***
## Originnon-USA 3.0644 1.1349 2.7 0.00828 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.437 on 90 degrees of freedom
## Multiple R-squared: 0.07494, Adjusted R-squared: 0.06467
## F-statistic: 7.291 on 1 and 90 DF, p-value: 0.008277
The regression output shows that in this sample, the average MPG.city
of American cars is 20.96, and the average MPG.city
of foreign cars is higher by 3.06.
Conclusion: The regression coefficient on Originnon-USA
is statistically significant at the five-percent level (p = 0.0083 < 0.05), so in the population of all cars represented by this random sample, foreign cars appear to be more fuel efficient (higher city MPG) than American cars.
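With a single two-level factor as the only predictor, the intercept is the mean of the reference group (USA) and the coefficient is the difference between the two group means. The following is a minimal sketch of that check, assuming the same rotary-engine filter as above; the object names cars93, fit, and group_means are illustrative and not part of the assignment.

```r
library(MASS)  # provides the Cars93 data set

# Same filter as above: drop cars with rotary engines
cars93 <- subset(Cars93, Cylinders != "rotary")

fit <- lm(MPG.city ~ Origin, data = cars93)

# Mean city MPG within each level of Origin (USA, non-USA)
group_means <- tapply(cars93$MPG.city, cars93$Origin, mean)

# The intercept should match the USA mean, and the coefficient on
# Originnon-USA should match the gap between the two group means
coef(fit)
group_means
as.numeric(group_means["non-USA"] - group_means["USA"])
```

Comparing the last line with the coefficient on Originnon-USA is a quick way to confirm how the dummy variable is coded.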
Let’s print out confidence intervals for the regression coefficients from the same regression model:
lm(MPG.city ~ Origin, data = Cars93) %>% confint()
## 2.5 % 97.5 %
## (Intercept) 19.399145 22.517522
## Originnon-USA 0.809811 5.318977
Conclusion: The confidence interval on Originnon-USA
(0.809811, 5.318977) tells us that we can be 95 percent confident that in the population of all cars that this sample represents, foreign cars get on average 0.81 to 5.32 miles per gallon more than American cars.
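The interval reported by confint() can be rebuilt by hand as estimate ± t × standard error, where t is the 97.5th percentile of the t distribution with the residual degrees of freedom (90 here, so t is about 1.99). The sketch below assumes the same filtered data; the names est, se, t_crit, and manual_ci are illustrative.

```r
library(MASS)  # provides the Cars93 data set

cars93 <- subset(Cars93, Cylinders != "rotary")
fit <- lm(MPG.city ~ Origin, data = cars93)

# Estimate and standard error for the Origin dummy, taken from the summary table
est <- coef(summary(fit))["Originnon-USA", "Estimate"]
se  <- coef(summary(fit))["Originnon-USA", "Std. Error"]

# Critical value of the t distribution with the residual degrees of freedom
t_crit <- qt(0.975, df = df.residual(fit))

# Manual 95% interval, followed by the confint() row for comparison
manual_ci <- c(lower = est - t_crit * se, upper = est + t_crit * se)
manual_ci
confint(fit)["Originnon-USA", ]
```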
To answer this question, let’s regress MPG.city
on Origin
and other car characteristics that might influence car MPG, so that we could control for them:
lm(MPG.city ~ Origin + Weight + EngineSize + Cylinders + Horsepower, data = Cars93) %>% summary()
##
## Call:
## lm(formula = MPG.city ~ Origin + Weight + EngineSize + Cylinders +
## Horsepower, data = Cars93)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.5673 -1.2119 0.1145 1.1237 14.3819
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 50.939306 2.293678 22.209 < 2e-16 ***
## Originnon-USA 0.727679 0.666239 1.092 0.277897
## Weight -0.005982 0.001115 -5.364 7.22e-07 ***
## EngineSize -0.359592 0.803737 -0.447 0.655752
## Cylinders4 -8.619592 1.748906 -4.929 4.18e-06 ***
## Cylinders5 -9.605933 2.809328 -3.419 0.000976 ***
## Cylinders6 -8.830470 2.276246 -3.879 0.000209 ***
## Cylinders8 -7.440803 2.983543 -2.494 0.014618 *
## Horsepower -0.008169 0.010172 -0.803 0.424227
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.624 on 83 degrees of freedom
## Multiple R-squared: 0.8013, Adjusted R-squared: 0.7822
## F-statistic: 41.85 on 8 and 83 DF, p-value: < 2.2e-16
Conclusion: As the regression output shows, in this sample, when comparing cars of the same weight, engine size, number of cylinders, and horsepower, the difference in the expected MPG.city
between American and foreign cars has shrunk to about 0.73 MPG. In other words, in this sample, foreign cars are expected to get about 0.73 miles per gallon more than comparable American cars.
However, the difference (the coefficient on Originnon-USA) is no longer statistically significant (p = 0.278 > 0.05), so this sample does not provide sufficient evidence that fuel efficiency is related to origin in the population of cars once these characteristics are held constant.
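To make "comparable cars" concrete, the sketch below predicts city MPG for two hypothetical cars that are identical except for Origin. The covariate values (3000 lb, 2.5 L, 4 cylinders, 140 hp) are arbitrary choices for illustration, and the names fit_adj, comparable, and pred are not from the assignment. Because the model is additive, the gap between the two predictions should equal the coefficient on Originnon-USA whatever values are chosen for the other variables.

```r
library(MASS)  # provides the Cars93 data set

cars93 <- subset(Cars93, Cylinders != "rotary")
fit_adj <- lm(MPG.city ~ Origin + Weight + EngineSize + Cylinders + Horsepower,
              data = cars93)

# Two hypothetical cars that differ only in Origin (values are illustrative)
comparable <- data.frame(
  Origin     = factor(c("USA", "non-USA"), levels = levels(cars93$Origin)),
  Weight     = 3000,
  EngineSize = 2.5,
  Cylinders  = factor("4", levels = levels(cars93$Cylinders)),
  Horsepower = 140
)

pred <- predict(fit_adj, newdata = comparable)
pred

# Gap between the two predictions vs. the coefficient on Originnon-USA
unname(pred[2] - pred[1])
unname(coef(fit_adj)["Originnon-USA"])
```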
Using a random sample of 1,000 federal personnel records for March 1994, let’s explore whether the grade levels assigned to minority employees differ from the grade levels assigned to nonminority employees.
load("Datasets/OPM94.RData"); names(opm94)
## [1] "x" "sal" "grade" "patco" "major" "age"
## [7] "male" "vet" "handvet" "hand" "yos" "edyrs"
## [13] "promo" "exit" "supmgr" "race" "minority" "grade4"
## [19] "promo01" "supmgr01" "male01" "exit01" "vet01"
First, let’s regress grade
on minority
to explore if there is a relationship between an employee’s minority status and the grade level assigned to their position.
lm(grade ~ minority, data = opm94) %>% summary()
##
## Call:
## lm(formula = grade ~ minority, data = opm94)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.0824 -3.0824 0.9176 2.9176 7.6838
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.0824 0.1215 82.97 < 2e-16 ***
## minority -1.7662 0.2330 -7.58 7.88e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.279 on 998 degrees of freedom
## Multiple R-squared: 0.05444, Adjusted R-squared: 0.05349
## F-statistic: 57.46 on 1 and 998 DF, p-value: 7.881e-14
lm(grade ~ minority, data = opm94) %>% confint()
## 2.5 % 97.5 %
## (Intercept) 9.843943 10.320892
## minority -2.223494 -1.308988
As the regression output shows, in this sample, the mean grade level of minority employees is lower than the mean grade level of those who do not identify as minorities by 1.77. The coefficient on minority
is statistically significant (p = 7.88e-14), so we conclude that the mean grade of minority employees is lower than the mean grade of nonminority employees in the entire population of federal employees that this sample represents. According to the confidence interval, we are 95% confident that the mean grade of minority employees is 1.31 to 2.22 grades lower than the mean grade of nonminority employees in the population.
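A regression on a single 0/1 variable is equivalent to a two-sample t-test with pooled variance, so both should give the same p-value. The sketch below illustrates this on simulated stand-in data, because the OPM file is not bundled with R. The data frame sim, its effect sizes, and the seed are all assumptions made for illustration and say nothing about the real records.

```r
set.seed(1994)  # arbitrary seed, for reproducibility only

# Simulated stand-in data; NOT the OPM records
n <- 500
sim <- data.frame(minority = rbinom(n, size = 1, prob = 0.3))
sim$grade <- 10 - 1.5 * sim$minority + rnorm(n, sd = 3)

fit_sim <- lm(grade ~ minority, data = sim)
tt <- t.test(grade ~ minority, data = sim, var.equal = TRUE)

# p-value for the minority coefficient vs. the pooled-variance t-test
p_lm <- coef(summary(fit_sim))["minority", "Pr(>|t|)"]
c(regression = p_lm, t_test = tt$p.value)
```

Note that t.test() reports its interval for the group 0 mean minus the group 1 mean, so its sign is the opposite of the regression coefficient's.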
Now, let’s determine if minority status makes any difference for grade among employees that are comparable on other characteristics, including qualifications:
lm(grade ~ minority + yos + edyrs + male + patco + vet + age, data = opm94) %>% summary()
##
## Call:
## lm(formula = grade ~ minority + yos + edyrs + male + patco +
## vet + age, data = opm94)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.9791 -0.9732 0.0513 0.9574 6.1572
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.403227 0.512530 12.493 < 2e-16 ***
## minority -0.464511 0.119519 -3.887 0.000108 ***
## yos 0.062016 0.007718 8.035 2.65e-15 ***
## edyrs 0.275472 0.031629 8.709 < 2e-16 ***
## malemale 0.675422 0.124522 5.424 7.32e-08 ***
## patcoClerical -5.666986 0.183039 -30.960 < 2e-16 ***
## patcoOther -4.449826 0.317622 -14.010 < 2e-16 ***
## patcoProfessional -0.072975 0.152330 -0.479 0.632002
## patcoTechnical -3.635071 0.150396 -24.170 < 2e-16 ***
## vetyes -0.158621 0.141580 -1.120 0.262832
## age 0.000228 0.006723 0.034 0.972955
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.63 on 989 degrees of freedom
## Multiple R-squared: 0.7684, Adjusted R-squared: 0.766
## F-statistic: 328.1 on 10 and 989 DF, p-value: < 2.2e-16
lm(grade ~ minority + yos + edyrs + male + patco + vet + age, data = opm94) %>% confint()
## 2.5 % 97.5 %
## (Intercept) 5.39745648 7.40899712
## minority -0.69905027 -0.22997165
## yos 0.04687012 0.07716127
## edyrs 0.21340425 0.33754049
## malemale 0.43106377 0.91978068
## patcoClerical -6.02617626 -5.30779628
## patcoOther -5.07311646 -3.82653471
## patcoProfessional -0.37190170 0.22595142
## patcoTechnical -3.93020205 -3.33993919
## vetyes -0.43645347 0.11921128
## age -0.01296493 0.01342089
We can see that the difference in expected grades between the two groups has shrunk (from about 1.77 to about 0.46), but it still remains, and it is statistically significant (p = 0.000108). In other words, in the population, the expected grade of minority employees is still lower than that of nonminority employees with the same experience, education, occupational category, age, veteran status, and gender. More specifically, we are 95% confident that the expected grade of minority employees is lower by 0.23 to 0.70 than the expected grade of comparable nonminority employees.
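Why a gap can shrink once controls are added can be sketched with simulated data in which group membership is correlated with education. Everything here is an assumption made for illustration (the seed, the sample size, a one-year education gap, a direct gap of -0.5). It is not the OPM data and not an estimate of anything in it.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

# Simulated stand-in data; NOT the OPM records
n <- 1000
minority <- rbinom(n, size = 1, prob = 0.3)
# Assumption for illustration: one group has one year less schooling on average
edyrs <- 14 - 1 * minority + rnorm(n, sd = 2)
# Assumed direct gap of -0.5; each year of education adds one grade
grade <- 2 - 0.5 * minority + 1 * edyrs + rnorm(n, sd = 1)
sim <- data.frame(grade, minority, edyrs)

raw <- coef(lm(grade ~ minority, data = sim))["minority"]
adj <- coef(lm(grade ~ minority + edyrs, data = sim))["minority"]

# The raw gap mixes the direct gap with the education gap;
# the adjusted gap holds education constant
c(raw = unname(raw), adjusted = unname(adj))
```

In this setup the raw coefficient absorbs the education difference between the groups, while the adjusted coefficient should land close to the assumed direct gap.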
Yuriy Davydenko 2020