EXECUTIVE SUMMARY

According to the mtcars data and the regression model developed in this analysis, fuel efficiency (mpg) is statistically significantly influenced by a vehicle’s weight (wt) and its horsepower (hp). A difference in mpg by transmission type, by contrast, is observed in the sample only: among cars of the same weight and horsepower, those with a manual transmission are expected to deliver about two mpg more than those with an automatic transmission. This effect is not statistically significant and cannot be generalized to a larger population of cars: these data provide no evidence that either manual or automatic transmissions are better for MPG.

ANALYSIS

The provided dataset consists of 32 observations of the following 11 variables: mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb.

Variables

The outcome variable in this analysis is fuel efficiency (mpg), measured in miles per gallon. Its class is numeric, which makes it suitable for modeling with ordinary least squares (OLS) linear regression. It ranges from 10.4 to 33.9 mpg with a mean of 20.09 (see the distribution in the Appendix).

The independent variable of interest is transmission type (am), coded as 0 for automatic and 1 for manual transmissions. In the dataset the variable is numeric, so it needs to be converted to a factor. Cars with a manual transmission make up 40.62 percent of the sample.
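
The recoding described above can be sketched in R as follows. This is a minimal sketch that assumes the built-in mtcars data; the copy name cars_df is an illustrative choice, and the original analysis may have recoded the variable differently.

```r
# Work on a copy so the built-in dataset is left untouched
# (the name `cars_df` is an assumption made for this sketch).
cars_df <- mtcars

# Recode the 0/1 indicator as a factor with Automatic as the reference level.
cars_df$am <- factor(cars_df$am, levels = c(0, 1),
                     labels = c("Automatic", "Manual"))

# Share of manual-transmission cars in the sample, in percent.
pct_manual <- 100 * mean(cars_df$am == "Manual")
print(pct_manual)
```

Keeping Automatic as the first level means that any model coefficient on am is read as the manual-minus-automatic difference.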

Other variables that can potentially affect fuel efficiency are the following:

  • cyl: Number of cylinders (numeric with values 4, 6 and 8). The distribution of the variable is presented in Fig.2 in the Appendix. It can be modelled either as a numeric variable or as a factor.
  • disp: Displacement (cu.in). The distribution of the variable is presented in Fig.3 in the Appendix.
  • hp: Gross horsepower, a numeric variable with the distribution provided in Fig.4.
  • wt: Weight (1000 lbs), a numeric variable, Fig.5.
  • gear: Number of forward gears, a numeric variable, Fig.6.

Exploratory Analysis

Calculating the mean mpg for each transmission type shows a substantial difference in fuel efficiency between the two groups (see also the boxplots in Fig.0 in the Appendix):

am         Mean_MPG        SD
Automatic  17.14737  3.833966
Manual     24.39231  6.166504
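
A summary of this kind could be produced along the following lines. This is a sketch on the built-in mtcars data; the names cars_df and group_stats are illustrative, and the original report may have used a different summarising function.

```r
# Illustrative copy of the data with am recoded as a labelled factor.
cars_df <- mtcars
cars_df$am <- factor(cars_df$am, levels = c(0, 1),
                     labels = c("Automatic", "Manual"))

# Mean and standard deviation of mpg within each transmission group.
# tapply() returns the groups in factor-level order (Automatic, Manual).
group_stats <- data.frame(
  am       = levels(cars_df$am),
  Mean_MPG = as.numeric(tapply(cars_df$mpg, cars_df$am, mean)),
  SD       = as.numeric(tapply(cars_df$mpg, cars_df$am, sd))
)
print(group_stats)
```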

Modeling

The results of fitting a bivariate model with mpg as the outcome and am as the single predictor are as follows:

              Estimate  Std. Error    t value  Pr(>|t|)
(Intercept)  17.147368    1.124602  15.247492  0.000000
amManual      7.244939    1.764422   4.106127  0.000285

They show that the expected mpg of cars with an automatic transmission is 17.15. Cars with a manual transmission are expected to deliver 7.24 mpg more than cars with an automatic transmission. This difference is statistically significant at the 5% level; the 95% confidence intervals below give the plausible range of each coefficient in the larger population of cars:

                2.5 %    97.5 %
(Intercept)  14.85062  19.44411
amManual      3.64151  10.84837
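
The bivariate fit and its confidence intervals can be reproduced roughly as below. The object name model1 follows the anova() labels that appear later in the report; cars_df is an illustrative name for a recoded copy of mtcars.

```r
# Illustrative copy of the data with am recoded as a labelled factor.
cars_df <- mtcars
cars_df$am <- factor(cars_df$am, levels = c(0, 1),
                     labels = c("Automatic", "Manual"))

# Bivariate OLS model: mpg explained by transmission type only.
model1 <- lm(mpg ~ am, data = cars_df)

# Coefficient table (estimate, standard error, t value, p-value).
print(coef(summary(model1)))

# 95% confidence intervals for the intercept and the am coefficient.
ci1 <- confint(model1, level = 0.95)
print(ci1)
```

With a single two-level factor as the predictor, the intercept is the mean mpg of the reference group (Automatic) and the amManual coefficient is the difference between the two group means.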

However, other car characteristics might influence mpg and might be correlated with am. As a result, the coefficient on am might be biased. The pairwise correlations between mpg and the candidate predictors are:

             mpg         cyl        disp          hp          wt        gear
mpg    1.0000000  -0.8521620  -0.8475514  -0.7761684  -0.8676594   0.4802848
cyl   -0.8521620   1.0000000   0.9020329   0.8324475   0.7824958  -0.4926866
disp  -0.8475514   0.9020329   1.0000000   0.7909486   0.8879799  -0.5555692
hp    -0.7761684   0.8324475   0.7909486   1.0000000   0.6587479  -0.1257043
wt    -0.8676594   0.7824958   0.8879799   0.6587479   1.0000000  -0.5832870
gear   0.4802848  -0.4926866  -0.5555692  -0.1257043  -0.5832870   1.0000000

As the table above shows, many potential predictors of mpg are highly correlated with each other and with mpg, so using all of them in the model might not be a good idea because of variance inflation. Therefore, a parsimonious model will be selected using the Adjusted R-squared and ANOVA on nested models. Variables that add little explanatory power to the model will be dropped.
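
As an illustration of the concern, the sketch below recomputes the correlation matrix and then derives a variance inflation factor for each candidate predictor by hand, as 1 / (1 - R²) from regressing that predictor on the others. The VIF calculation is an addition for illustration only and is not part of the original analysis; the variable names are hypothetical.

```r
# Variables shown in the correlation table.
vars <- c("mpg", "cyl", "disp", "hp", "wt", "gear")
cor_matrix <- cor(mtcars[, vars])
print(round(cor_matrix, 4))

# Variance inflation factor for each candidate predictor:
# regress it on the remaining predictors and take 1 / (1 - R^2).
predictors <- setdiff(vars, "mpg")
vif <- sapply(predictors, function(v) {
  others <- setdiff(predictors, v)
  fit <- lm(reformulate(others, response = v), data = mtcars)
  1 / (1 - summary(fit)$r.squared)
})
print(round(vif, 2))
```

A predictor that is well explained by the others gets a large VIF, which means its coefficient would be estimated with an inflated standard error if all of them entered the model together.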

One of the most essential differences among cars that influences mpg is their weight. Weight (wt) is highly correlated with the number of cylinders and displacement. The first adjustment to the model is adding the weight variable:

               Estimate  Std. Error     t value   Pr(>|t|)
(Intercept)  37.3215513   3.0546385  12.2179928  0.0000000
amManual     -0.0236152   1.5456453  -0.0152786  0.9879146
wt           -5.3528114   0.7882438  -6.7908072  0.0000002
## Adjusted R-squared: 0.7357889
## anova(model1, model2): Pr(>F) 1.867415e-07

The adjusted model is a significant improvement over the bivariate model. By Adjusted R-squared it explains 73.58 percent of the variation in the outcome, as opposed to 33.85 percent in the first model, and the ANOVA test comparing the two models is highly significant.
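
The comparison of the two nested models can be sketched as follows. The names model1 and model2 follow the anova() labels in the output above; cars_df, adj_r2 and p_value are illustrative.

```r
# Illustrative copy of the data with am recoded as a labelled factor.
cars_df <- mtcars
cars_df$am <- factor(cars_df$am, levels = c(0, 1),
                     labels = c("Automatic", "Manual"))

model1 <- lm(mpg ~ am, data = cars_df)        # transmission only
model2 <- lm(mpg ~ am + wt, data = cars_df)   # transmission + weight

# Adjusted R-squared of both models, side by side.
adj_r2 <- c(model1 = summary(model1)$adj.r.squared,
            model2 = summary(model2)$adj.r.squared)
print(round(adj_r2, 4))

# F-test on the nested models: does adding wt reduce the residual
# sum of squares by more than chance alone would explain?
comparison <- anova(model1, model2)
print(comparison)
p_value <- comparison[["Pr(>F)"]][2]
```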

However, in this model, which now controls for a car’s weight, the type of transmission makes no difference in terms of mpg: the coefficient on am is small and insignificant.

The next adjustment adds displacement (disp) to the model:

               Estimate  Std. Error     t value   Pr(>|t|)
(Intercept)  34.6759109   3.2406089  10.7004306  0.0000000
amManual      0.1777241   1.4843159   0.1197347  0.9055483
wt           -3.2790439   1.3275093  -2.4700723  0.0198666
disp         -0.0178049   0.0093747  -1.8992613  0.0678774
## Adjusted R-squared: 0.757583
## anova(model2, model3): Pr(>F) 0.0678774

The coefficient on disp is small and statistically insignificant. The improvement in the Adjusted R-squared is minor (75.76 - 73.58 = 2.18 percentage points), and the ANOVA test suggests that adding the variable doesn’t significantly improve the model. Therefore, disp will be dropped from the model.

Adding hp to the weight-adjusted model (without disp) yields the following coefficients:

               Estimate  Std. Error    t value   Pr(>|t|)
(Intercept)  34.0028751   2.6426593  12.866916  0.0000000
amManual      2.0837101   1.3764202   1.513862  0.1412682
wt           -2.8785754   0.9049705  -3.180850  0.0035740
hp           -0.0374787   0.0096054  -3.901830  0.0005464
## Adjusted R-squared: 0.8227357
## anova(model2, model4): Pr(>F) 0.0005464023

The coefficient on hp is statistically significant, and adding the variable improves the Adjusted R-squared from 0.7358 to 0.8227.
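
The two candidate additions to the weight-adjusted model (disp and hp) can be compared with a small helper, sketched below. The model names follow the anova() labels in the output; the helper compare_to_base and the object cars_df are hypothetical names introduced for this illustration.

```r
# Illustrative copy of the data with am recoded as a labelled factor.
cars_df <- mtcars
cars_df$am <- factor(cars_df$am, levels = c(0, 1),
                     labels = c("Automatic", "Manual"))

model2 <- lm(mpg ~ am + wt, data = cars_df)          # baseline
model3 <- lm(mpg ~ am + wt + disp, data = cars_df)   # baseline + disp
model4 <- lm(mpg ~ am + wt + hp, data = cars_df)     # baseline + hp

# Hypothetical helper: adjusted R-squared of the extended model and the
# p-value of the nested-model F-test against the baseline.
compare_to_base <- function(base, extended) {
  c(adj_r2  = summary(extended)$adj.r.squared,
    p_value = anova(base, extended)[["Pr(>F)"]][2])
}

results <- rbind(disp = compare_to_base(model2, model3),
                 hp   = compare_to_base(model2, model4))
print(results)
```

Because each extension adds a single numeric term, the F-test p-value coincides with the p-value of the t-test on the added coefficient, which is why the two match in the output above.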

Finally, adjusting the model by adding cyl and gear does not significantly improve it:

## anova(model4, model5): Pr(>F) 0.2119166
## anova(model4, model6): Pr(>F) 0.7081449
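
These last two checks might look like the sketch below. It assumes that cyl and gear are each added separately, as numeric variables, to the model with am, wt and hp; the report does not state whether they were treated as numeric or as factors, so the p-values printed here may differ from those reported above. The names cars_df, candidates and p_values are illustrative.

```r
# Illustrative copy of the data with am recoded as a labelled factor.
cars_df <- mtcars
cars_df$am <- factor(cars_df$am, levels = c(0, 1),
                     labels = c("Automatic", "Manual"))

model4 <- lm(mpg ~ am + wt + hp, data = cars_df)

# Assumption: each candidate is added on its own, as a numeric term.
candidates <- c("cyl", "gear")
p_values <- sapply(candidates, function(v) {
  extended <- lm(reformulate(c("am", "wt", "hp", v), response = "mpg"),
                 data = cars_df)
  anova(model4, extended)[["Pr(>F)"]][2]
})
print(p_values)
```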

The diagnostic plot (see Fig.12) shows that the residuals are evenly spread around zero and across the fitted values, which suggests that the model assumptions are met.
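
A residuals-versus-fitted plot of this kind can be drawn as sketched below. This is one possible rendering and not necessarily the code behind Fig.12; the rough numeric check at the end is an addition for illustration.

```r
# Illustrative copy of the data with am recoded as a labelled factor.
cars_df <- mtcars
cars_df$am <- factor(cars_df$am, levels = c(0, 1),
                     labels = c("Automatic", "Manual"))

model4 <- lm(mpg ~ am + wt + hp, data = cars_df)
res <- resid(model4)
fit <- fitted(model4)

# Residuals against fitted values, with a reference line at zero.
plot(fit, res, xlab = "Fitted mpg", ylab = "Residuals",
     main = "Residuals vs fitted values")
abline(h = 0, lty = 2)

# Rough numeric companion to the plot: correlation between the absolute
# residuals and the fitted values. A value near zero is consistent with
# a roughly constant spread; it is not a formal test.
spread_check <- cor(abs(res), fit)
print(spread_check)
```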

Summary

The resulting model has three predictors: transmission type (am), weight (wt), and horsepower (hp). According to this model, in this sample, cars with a manual transmission are expected to deliver 2.08 more mpg than cars of the same weight and horsepower with an automatic transmission. This difference, however, is not statistically significant, so the conclusion cannot be extended to the larger population of cars: we can be 95% confident that, in the population of cars that the sample represents, cars with a manual transmission deliver between 0.74 mpg less and 4.90 mpg more than cars of similar weight and horsepower with an automatic transmission. In other words, these data provide no convincing evidence that either manual or automatic transmissions are better for MPG.
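
The interval quoted in this summary can be obtained from the final model as sketched below; model4 follows the naming in the anova() output, while cars_df, ci and contains_zero are illustrative names.

```r
# Illustrative copy of the data with am recoded as a labelled factor.
cars_df <- mtcars
cars_df$am <- factor(cars_df$am, levels = c(0, 1),
                     labels = c("Automatic", "Manual"))

# Final model: transmission type, weight and horsepower.
model4 <- lm(mpg ~ am + wt + hp, data = cars_df)

# 95% confidence interval for the manual-vs-automatic difference,
# holding weight and horsepower fixed.
ci <- confint(model4, level = 0.95)
print(ci["amManual", ])

# If the interval spans zero, the difference is not significant
# at the 5% level.
contains_zero <- ci["amManual", 1] < 0 && ci["amManual", 2] > 0
print(contains_zero)
```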

APPENDIX

```