Example 1: Car Fuel Efficiency

Load Libraries and Data

library(dplyr)
library(MASS); data(Cars93)   # MASS::Cars93
Cars93 <- Cars93 %>% filter(Cylinders != "rotary")   # remove all cars with rotary engines from the dataset

Are American or foreign cars more fuel efficient?

Let’s regress MPG.city on Origin to answer this question:

lm(MPG.city ~ Origin, data = Cars93) %>% summary()

## 
## Call:
## lm(formula = MPG.city ~ Origin, data = Cars93)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.0227 -3.9583 -0.9905  2.0417 21.9773 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    20.9583     0.7848    26.7  < 2e-16 ***
## Originnon-USA   3.0644     1.1349     2.7  0.00828 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.437 on 90 degrees of freedom
## Multiple R-squared:  0.07494,    Adjusted R-squared:  0.06467 
## F-statistic: 7.291 on 1 and 90 DF,  p-value: 0.008277

The regression output shows that in this sample, the average MPG.city of American cars is 20.96, and the average MPG.city of foreign cars is higher by 3.06.
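Because Origin is a two-level factor, the intercept and the Originnon-USA coefficient are simply the two group means in another form. The following is a minimal base-R sketch of that check, assuming the MASS package is installed; the object names `cars`, `group_means`, and `fit` are illustrative and not part of the original analysis.

```r
library(MASS)  # provides the Cars93 data set

# Same filter as above, written in base R (drops the rotary-engine car)
cars <- subset(Cars93, Cylinders != "rotary")

# Mean city MPG within each origin group
group_means <- tapply(cars$MPG.city, cars$Origin, mean)
print(round(group_means, 4))

# Intercept = USA mean; slope = non-USA mean minus USA mean
fit <- lm(MPG.city ~ Origin, data = cars)
print(coef(fit))
print(unname(group_means["non-USA"] - group_means["USA"]))
```

The last two printed values should agree: the slope reported by lm() is the difference between the two group means.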

Conclusion: The regression coefficient on Originnon-USA is statistically significant at the five-percent level (p = 0.00828 < 0.05), so in the population of all cars represented by this random sample, foreign cars appear to be more fuel efficient (higher MPG) than American cars.
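With a single two-level predictor, the t-test on the regression coefficient is equivalent to a pooled-variance two-sample t-test. The sketch below illustrates this, assuming the MASS package is installed; the names `cars`, `tt`, and `p_reg` are illustrative.

```r
library(MASS)  # provides the Cars93 data set

cars <- subset(Cars93, Cylinders != "rotary")

# Pooled-variance (equal variances assumed) two-sample t-test by origin
tt <- t.test(MPG.city ~ Origin, data = cars, var.equal = TRUE)

# p-value for the Origin coefficient from the regression
fit <- lm(MPG.city ~ Origin, data = cars)
p_reg <- summary(fit)$coefficients["Originnon-USA", "Pr(>|t|)"]

# The two p-values should be identical
print(c(t_test = tt$p.value, regression = p_reg))
```

Note that `var.equal = TRUE` matters here: the default Welch test does not pool the variances and would generally give a slightly different p-value.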

What is the estimated difference in fuel efficiency between American and foreign cars?

Let’s print out confidence intervals for the regression coefficients from the same regression model:

lm(MPG.city ~ Origin, data = Cars93) %>% confint()

##                   2.5 %    97.5 %
## (Intercept)   19.399145 22.517522
## Originnon-USA  0.809811  5.318977

Conclusion: The confidence interval on Originnon-USA (0.809811, 5.318977) tells us that we can be 95 percent confident that, in the population of all cars that this sample represents, foreign cars get on average 0.81 to 5.32 miles per gallon more than American cars.
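The interval reported by confint() is the estimate plus or minus a critical t value times the standard error. As a rough sketch of that calculation (assuming the MASS package is installed; `manual_ci` and the other object names are illustrative):

```r
library(MASS)  # provides the Cars93 data set

cars <- subset(Cars93, Cylinders != "rotary")
fit <- lm(MPG.city ~ Origin, data = cars)

# Estimate and standard error of the Origin coefficient
est <- coef(summary(fit))["Originnon-USA", "Estimate"]
se  <- coef(summary(fit))["Originnon-USA", "Std. Error"]

# Critical value of the t distribution for a 95% two-sided interval
t_crit <- qt(0.975, df = df.residual(fit))

manual_ci <- c(lower = est - t_crit * se, upper = est + t_crit * se)
print(manual_ci)

# Should match the interval computed by confint()
print(confint(fit)["Originnon-USA", ])
```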

Are foreign cars more fuel efficient than comparable American cars?

To answer this question, let’s regress MPG.city on Origin and other car characteristics that might influence car MPG, so that we could control for them:

lm(MPG.city ~ Origin + Weight + EngineSize + Cylinders + Horsepower, data = Cars93) %>% summary()

## 
## Call:
## lm(formula = MPG.city ~ Origin + Weight + EngineSize + Cylinders + 
##     Horsepower, data = Cars93)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.5673 -1.2119  0.1145  1.1237 14.3819 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   50.939306   2.293678  22.209  < 2e-16 ***
## Originnon-USA  0.727679   0.666239   1.092 0.277897    
## Weight        -0.005982   0.001115  -5.364 7.22e-07 ***
## EngineSize    -0.359592   0.803737  -0.447 0.655752    
## Cylinders4    -8.619592   1.748906  -4.929 4.18e-06 ***
## Cylinders5    -9.605933   2.809328  -3.419 0.000976 ***
## Cylinders6    -8.830470   2.276246  -3.879 0.000209 ***
## Cylinders8    -7.440803   2.983543  -2.494 0.014618 *  
## Horsepower    -0.008169   0.010172  -0.803 0.424227    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.624 on 83 degrees of freedom
## Multiple R-squared:  0.8013, Adjusted R-squared:  0.7822 
## F-statistic: 41.85 on 8 and 83 DF,  p-value: < 2.2e-16

Conclusion: As the regression output shows, in this sample, when comparing cars of the same weight, engine size, number of cylinders, and horsepower, the difference in the expected MPG.city between American and foreign cars has shrunk to about 0.73 MPG. In other words, in this sample, foreign cars are expected to get about 0.73 miles per gallon more than comparable American cars.

However, the difference (coefficient on Originnon-USA) is no longer statistically significant (p = 0.277897 > 0.05), so once these characteristics are held constant, there is no evidence that fuel efficiency is related to origin in the population of cars.
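To see the two estimates side by side, one could fit both models and pull out only the Origin coefficient, together with its confidence interval from the model with controls. This is a sketch assuming the MASS package is installed; `fit_raw`, `fit_ctrl`, and `ci_ctrl` are illustrative names.

```r
library(MASS)  # provides the Cars93 data set

cars <- subset(Cars93, Cylinders != "rotary")

# Model without controls and model with controls
fit_raw  <- lm(MPG.city ~ Origin, data = cars)
fit_ctrl <- lm(MPG.city ~ Origin + Weight + EngineSize + Cylinders + Horsepower,
               data = cars)

origin <- "Originnon-USA"

# Origin coefficient before and after adding the controls
print(c(unadjusted = unname(coef(fit_raw)[origin]),
        adjusted   = unname(coef(fit_ctrl)[origin])))

# 95% confidence interval for the adjusted coefficient only
ci_ctrl <- confint(fit_ctrl, parm = origin)
print(ci_ctrl)
```

Since the adjusted coefficient is not significant at the five-percent level, its 95% interval necessarily includes zero.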

Example 2: Workplace Equity in Federal Service

Using a random sample of 1,000 federal personnel records for March 1994, let’s explore whether the grade levels assigned to minority employees differ from the grade levels assigned to nonminority employees.

Load Dataset

load("Datasets/OPM94.RData"); names(opm94)

##  [1] "x"        "sal"      "grade"    "patco"    "major"    "age"     
##  [7] "male"     "vet"      "handvet"  "hand"     "yos"      "edyrs"   
## [13] "promo"    "exit"     "supmgr"   "race"     "minority" "grade4"  
## [19] "promo01"  "supmgr01" "male01"   "exit01"   "vet01"

First, let’s regress grade on minority to explore if there is a relationship between an employee’s minority status and the grade level assigned to their position.

lm(grade ~ minority, data = opm94) %>% summary()

## 
## Call:
## lm(formula = grade ~ minority, data = opm94)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.0824 -3.0824  0.9176  2.9176  7.6838 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  10.0824     0.1215   82.97  < 2e-16 ***
## minority     -1.7662     0.2330   -7.58 7.88e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.279 on 998 degrees of freedom
## Multiple R-squared:  0.05444,    Adjusted R-squared:  0.05349 
## F-statistic: 57.46 on 1 and 998 DF,  p-value: 7.881e-14

lm(grade ~ minority, data = opm94) %>% confint()

##                 2.5 %    97.5 %
## (Intercept)  9.843943 10.320892
## minority    -2.223494 -1.308988

As the regression output shows, in this sample, the mean grade level of minority employees is lower than the mean grade level of those who do not identify as minorities by 1.77. The coefficient on minority is statistically significant (p = 7.88 × 10^-14), so we conclude that the mean grade of minority employees is lower than the mean grade of nonminority employees in the entire population of federal employees that this sample represents. According to the confidence interval, we are 95% confident that the mean grade of minority employees is 1.31 to 2.22 grade levels lower than the mean grade of nonminority employees in the population.
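The group means and the confidence interval can be recovered by hand from the printed coefficients. The sketch below uses only numbers copied from the summary() output above (rounded as printed) and assumes that minority is coded 0/1, as the coefficient name suggests; the variable names are illustrative.

```r
# Values copied from the summary() output above (rounded as printed)
b0 <- 10.0824   # intercept: mean grade when minority = 0
b1 <- -1.7662   # coefficient on minority: difference in mean grade
se <- 0.2330    # standard error of the minority coefficient
df <- 998       # residual degrees of freedom

# Implied mean grade in each group
mean_nonminority <- b0
mean_minority    <- b0 + b1
print(c(nonminority = mean_nonminority, minority = mean_minority))

# 95% confidence interval: estimate +/- critical t value * standard error
ci <- b1 + c(-1, 1) * qt(0.975, df) * se
print(round(ci, 3))
```

Up to rounding of the printed inputs, the interval agrees with the confint() output for minority shown above.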

Now, let’s determine if minority status makes any difference for grade among employees that are comparable on other characteristics, including qualifications:

lm(grade ~ minority + yos + edyrs + male + patco + vet + age, data = opm94) %>% summary()

## 
## Call:
## lm(formula = grade ~ minority + yos + edyrs + male + patco + 
##     vet + age, data = opm94)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9791 -0.9732  0.0513  0.9574  6.1572 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        6.403227   0.512530  12.493  < 2e-16 ***
## minority          -0.464511   0.119519  -3.887 0.000108 ***
## yos                0.062016   0.007718   8.035 2.65e-15 ***
## edyrs              0.275472   0.031629   8.709  < 2e-16 ***
## malemale           0.675422   0.124522   5.424 7.32e-08 ***
## patcoClerical     -5.666986   0.183039 -30.960  < 2e-16 ***
## patcoOther        -4.449826   0.317622 -14.010  < 2e-16 ***
## patcoProfessional -0.072975   0.152330  -0.479 0.632002    
## patcoTechnical    -3.635071   0.150396 -24.170  < 2e-16 ***
## vetyes            -0.158621   0.141580  -1.120 0.262832    
## age                0.000228   0.006723   0.034 0.972955    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.63 on 989 degrees of freedom
## Multiple R-squared:  0.7684, Adjusted R-squared:  0.766 
## F-statistic: 328.1 on 10 and 989 DF,  p-value: < 2.2e-16

lm(grade ~ minority + yos + edyrs + male + patco + vet + age, data = opm94) %>% confint()

##                         2.5 %      97.5 %
## (Intercept)        5.39745648  7.40899712
## minority          -0.69905027 -0.22997165
## yos                0.04687012  0.07716127
## edyrs              0.21340425  0.33754049
## malemale           0.43106377  0.91978068
## patcoClerical     -6.02617626 -5.30779628
## patcoOther        -5.07311646 -3.82653471
## patcoProfessional -0.37190170  0.22595142
## patcoTechnical    -3.93020205 -3.33993919
## vetyes            -0.43645347  0.11921128
## age               -0.01296493  0.01342089

We can see that the difference in expected grades between the two groups has shrunk from about 1.77 to about 0.46 grade levels, but it still remains, and it is statistically significant (p = 0.000108). In other words, in the population, the expected grade of minority employees is still lower than that of nonminority employees with the same experience, education, occupational category, age, veteran status, and gender. More specifically, we are 95% confident that the expected grade of minority employees is lower by 0.23 to 0.70 grade levels than the expected grade of comparable nonminority employees.
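As a rough way to quantify how much the gap shrank, one can compare the minority coefficient from the two models. The sketch below uses only the coefficients printed in the two summary() outputs above; the variable names are illustrative, and the resulting share is a descriptive summary rather than a causal decomposition.

```r
# Minority coefficients copied from the two summary() outputs above
gap_raw  <- -1.7662     # model with minority only
gap_ctrl <- -0.464511   # model with controls added

# Share of the unadjusted gap that remains after adding the controls
remaining <- gap_ctrl / gap_raw

# Share that is no longer attributed to minority status once controls are included
accounted_for <- 1 - remaining

print(round(c(remaining = remaining, accounted_for = accounted_for), 3))
```

By this calculation, roughly a quarter of the unadjusted difference in grade remains once the listed characteristics are held constant.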

Yuriy Davydenko 2020

Inference for Linear Regression (Examples)
