According to the `mtcars` data and the regression model developed in this analysis, fuel efficiency (`mpg`) is statistically significantly influenced by a vehicle’s weight (`wt`) and its horsepower (`hp`). A difference in `mpg` by transmission type, by contrast, can be observed in the sample only: among cars of the same weight and horsepower, those with manual transmissions are expected to deliver about two mpg more than those with automatic transmissions. This effect is not statistically significant and cannot be generalized to a larger population of cars; these data provide no evidence that either manual or automatic transmissions are better for MPG.
The provided dataset consists of 32 observations of the following 11 variables: mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb.
The outcome variable in this analysis is fuel efficiency (`mpg`), measured in miles per gallon. Its class is numeric, which makes it suitable for modeling with an OLS linear model. It ranges from 10.4 to 33.9 mpg with a mean of 20.09 (see the distribution in the Appendix).
The independent variable of interest is transmission type (`am`), coded as 0 for automatic and 1 for manual transmissions. In the dataset the variable is numeric, so it needs to be converted to a factor. Cars with a manual transmission make up 40.62 percent of the sample.
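The data preparation described above can be sketched in a few lines of base R. The working copy `dat` and the factor labels are naming choices made for this illustration, not necessarily the ones used in the original analysis.

```r
# Load the built-in dataset and keep a working copy (the name `dat` is an assumption).
data(mtcars)
dat <- mtcars

# Recode transmission type from 0/1 to a labelled factor.
dat$am <- factor(dat$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

dim(dat)                      # number of observations and variables
summary(dat$mpg)              # range and mean of the outcome
pct_manual <- 100 * mean(dat$am == "Manual")
pct_manual                    # share of manual cars, in percent
```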
Other variables that can potentially affect fuel efficiency are the following:
- `cyl`: Number of cylinders (numeric, with values 4, 6 and 8). The distribution of the variable is presented in Fig.2 in the Appendix. It can be modelled either as a numeric variable or as a factor.
- `disp`: Displacement (cu.in.). The distribution of the variable is presented in Fig.3 in the Appendix.
- `hp`: Gross horsepower, a numeric variable with the distribution provided in Fig.4.
- `wt`: Weight (1000 lbs), a numeric variable, Fig.5.
- `gear`: Number of forward gears, a numeric variable, Fig.6.

Calculating the mean `mpg` for the two transmission types shows a substantial difference in fuel efficiency between them (see also the boxplots in Fig.0 in the Appendix):
am | Mean_MPG | SD |
---|---|---|
Automatic | 17.14737 | 3.833966 |
Manual | 24.39231 | 6.166504 |
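A summary like the one in the table can be produced with `tapply`. This is a minimal sketch; the data frame name `by_am` is hypothetical, and the column names simply mirror the table.

```r
data(mtcars)
dat <- mtcars
dat$am <- factor(dat$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Mean and standard deviation of mpg within each transmission type.
by_am <- data.frame(
  am       = levels(dat$am),
  Mean_MPG = as.numeric(tapply(dat$mpg, dat$am, mean)),
  SD       = as.numeric(tapply(dat$mpg, dat$am, sd))
)
by_am
```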
The results of fitting a bivariate model with `mpg` as the outcome and `am` as the single predictor are as follows:
Term | Estimate | Std. Error | t value | Pr(>\|t\|) |
---|---|---|---|---|
(Intercept) | 17.147368 | 1.124602 | 15.247492 | 0.000000 |
amManual | 7.244939 | 1.764422 | 4.106127 | 0.000285 |
They show that the expected `mpg` of cars with an automatic transmission is 17.15. Cars with a manual transmission are expected to deliver 7.24 mpg more than cars with an automatic transmission. This difference is statistically significant at the 5% level; the following 95% confidence intervals outline the corresponding values in the larger population of cars:
Term | 2.5 % | 97.5 % |
---|---|---|
(Intercept) | 14.85062 | 19.44411 |
amManual | 3.64151 | 10.84837 |
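The bivariate fit and its confidence intervals can be reproduced roughly as follows. The object name `model1` matches the name that appears in the ANOVA output later in the text; the working copy `dat` is an assumption of this sketch.

```r
data(mtcars)
dat <- mtcars
dat$am <- factor(dat$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Bivariate OLS model: mpg explained by transmission type only.
model1 <- lm(mpg ~ am, data = dat)

summary(model1)$coefficients        # estimates, standard errors, t and p values
ci1 <- confint(model1, level = 0.95)
ci1                                 # 95% confidence intervals
```

With a single binary predictor, the intercept is the mean `mpg` of the reference group (automatic) and the `amManual` coefficient is the difference between the two group means.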
However, other car characteristics might influence `mpg` and be correlated with `am`. As a result, the coefficient on `am` might be biased. The pairwise correlations among `mpg` and the candidate predictors are:
Variable | mpg | cyl | disp | hp | wt | gear |
---|---|---|---|---|---|---|
mpg | 1.0000000 | -0.8521620 | -0.8475514 | -0.7761684 | -0.8676594 | 0.4802848 |
cyl | -0.8521620 | 1.0000000 | 0.9020329 | 0.8324475 | 0.7824958 | -0.4926866 |
disp | -0.8475514 | 0.9020329 | 1.0000000 | 0.7909486 | 0.8879799 | -0.5555692 |
hp | -0.7761684 | 0.8324475 | 0.7909486 | 1.0000000 | 0.6587479 | -0.1257043 |
wt | -0.8676594 | 0.7824958 | 0.8879799 | 0.6587479 | 1.0000000 | -0.5832870 |
gear | 0.4802848 | -0.4926866 | -0.5555692 | -0.1257043 | -0.5832870 | 1.0000000 |
As the table above shows, many potential predictors of `mpg` are highly correlated with each other as well as with `mpg`, so using all of them in the model might not be a good idea due to variance inflation. Therefore, the best parsimonious model will be selected using the Adjusted R-squared and ANOVA on nested models. Variables that add little explanatory power to the model will be dropped.
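The correlation matrix can be computed with `cor`. The variance inflation factor at the end is an addition for illustration only (it is not part of the original analysis); it uses the textbook definition 1 / (1 - R²), where R² comes from regressing one predictor on the others.

```r
data(mtcars)

# Pairwise Pearson correlations among the outcome and candidate predictors.
vars <- c("mpg", "cyl", "disp", "hp", "wt", "gear")
cor_mat <- cor(mtcars[, vars])
round(cor_mat, 3)

# Illustrative variance inflation factor for wt if all five predictors were used:
# regress wt on the other predictors and apply 1 / (1 - R^2).
r2_wt  <- summary(lm(wt ~ cyl + disp + hp + gear, data = mtcars))$r.squared
vif_wt <- 1 / (1 - r2_wt)
vif_wt
```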
One of the most essential differences among cars that influences mpg is their weight. Weight (`wt`) is highly correlated with the number of cylinders and displacement. The first adjustment to the model is adding the weight variable:
Term | Estimate | Std. Error | t value | Pr(>\|t\|) |
---|---|---|---|---|
(Intercept) | 37.3215513 | 3.0546385 | 12.2179928 | 0.0000000 |
amManual | -0.0236152 | 1.5456453 | -0.0152786 | 0.9879146 |
wt | -5.3528114 | 0.7882438 | -6.7908072 | 0.0000002 |
## Adjusted R-squared: 0.7357889
## anova(model1, model2): Pr(>F) 1.867415e-07
The adjusted model appears to be a significant improvement over the bivariate model. It explains 73.58 percent of the variation in the outcome, as opposed to 33.85 percent in the first model. This is a substantial improvement, and the ANOVA test comparing the two models is highly significant.
However, in this model, which now controls for a car’s weight, the type of transmission makes no difference in terms of `mpg`. The coefficient on `am` is small and insignificant.
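The comparison between the bivariate model and the weight-adjusted model can be sketched as below. The names `model1` and `model2` follow the ANOVA output shown above; `dat` and `p12` are names assumed for this example.

```r
data(mtcars)
dat <- mtcars
dat$am <- factor(dat$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

model1 <- lm(mpg ~ am, data = dat)        # bivariate model
model2 <- lm(mpg ~ am + wt, data = dat)   # adds weight as a control

summary(model2)$coefficients
adj1 <- summary(model1)$adj.r.squared
adj2 <- summary(model2)$adj.r.squared
c(model1 = adj1, model2 = adj2)

# F-test on the nested models: does adding wt reduce the residual sum of squares?
p12 <- anova(model1, model2)$`Pr(>F)`[2]
p12
```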
The next adjustment adds displacement (`disp`) to the model:
Term | Estimate | Std. Error | t value | Pr(>\|t\|) |
---|---|---|---|---|
(Intercept) | 34.6759109 | 3.2406089 | 10.7004306 | 0.0000000 |
amManual | 0.1777241 | 1.4843159 | 0.1197347 | 0.9055483 |
wt | -3.2790439 | 1.3275093 | -2.4700723 | 0.0198666 |
disp | -0.0178049 | 0.0093747 | -1.8992613 | 0.0678774 |
## Adjusted R-squared: 0.757583
## anova(model2, model3): Pr(>F) 0.0678774
The coefficient on `disp` is small and statistically insignificant. The improvement in the Adjusted R-squared is minor (75.76 - 73.58 = 2.18 percentage points), and the ANOVA test suggests that adding the variable does not significantly improve the model. Therefore, `disp` will be dropped from the model.
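The two criteria used for this decision, the gain in Adjusted R-squared and the nested-model ANOVA p-value, can be wrapped in a small helper. The function `compare_nested` is hypothetical and written only for this illustration.

```r
data(mtcars)
dat <- mtcars
dat$am <- factor(dat$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Hypothetical helper: summarise how much an extended model improves on a base model.
compare_nested <- function(base, extended) {
  adj_base <- summary(base)$adj.r.squared
  adj_ext  <- summary(extended)$adj.r.squared
  c(
    adj_r2_base     = adj_base,
    adj_r2_extended = adj_ext,
    gain_pct_points = 100 * (adj_ext - adj_base),
    anova_p         = anova(base, extended)$`Pr(>F)`[2]
  )
}

model2 <- lm(mpg ~ am + wt, data = dat)
model3 <- lm(mpg ~ am + wt + disp, data = dat)
res3 <- compare_nested(model2, model3)
round(res3, 4)
```

Because `model3` adds a single term, the ANOVA p-value equals the p-value of the t-test on the `disp` coefficient, which is why the two figures in the output above coincide.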
Adding `hp` to the model yields the following coefficients:
Term | Estimate | Std. Error | t value | Pr(>\|t\|) |
---|---|---|---|---|
(Intercept) | 34.0028751 | 2.6426593 | 12.866916 | 0.0000000 |
amManual | 2.0837101 | 1.3764202 | 1.513862 | 0.1412682 |
wt | -2.8785754 | 0.9049705 | -3.180850 | 0.0035740 |
hp | -0.0374787 | 0.0096054 | -3.901830 | 0.0005464 |
## Adjusted R-squared: 0.8227357
## anova(model2, model4): Pr(>F) 0.0005464023
The coefficient on `hp` is statistically significant, and adding the variable improves the Adjusted R-squared from 73.58 to 82.27 percent.
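One way to build this model from the weight-adjusted one is `update`, which adds a term to an existing formula. This is a sketch; the original analysis may have called `lm` directly.

```r
data(mtcars)
dat <- mtcars
dat$am <- factor(dat$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

model2 <- lm(mpg ~ am + wt, data = dat)
# Equivalent to lm(mpg ~ am + wt + hp, data = dat).
model4 <- update(model2, . ~ . + hp)

summary(model4)$coefficients
adj4 <- summary(model4)$adj.r.squared
adj4
p24 <- anova(model2, model4)$`Pr(>F)`[2]
p24
```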
Finally, adjusting the model by adding `cyl` and `gear` does not significantly improve it:
## anova(model4, model5): Pr(>F) 0.2119166
## anova(model4, model6): Pr(>F) 0.7081449
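The text does not show how `model5` and `model6` were specified. The sketch below assumes that each adds one numeric variable (`cyl` and `gear`, respectively) to the three-predictor model; under a different specification, for example with `cyl` as a factor, the p-values would differ.

```r
data(mtcars)
dat <- mtcars
dat$am <- factor(dat$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

model4 <- lm(mpg ~ am + wt + hp, data = dat)
model5 <- update(model4, . ~ . + cyl)    # assumption: cyl entered as numeric
model6 <- update(model4, . ~ . + gear)   # assumption: gear entered as numeric

# Nested-model F-tests against the three-predictor model.
p_cyl  <- anova(model4, model5)$`Pr(>F)`[2]
p_gear <- anova(model4, model6)$`Pr(>F)`[2]
c(cyl = p_cyl, gear = p_gear)
```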
The diagnostic plot (see Fig.12) shows that the residuals are evenly spread around zero and across the fitted values, which suggests that the model assumptions are met.
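A residuals-versus-fitted plot of this kind can be drawn with base graphics. This is a minimal sketch and not necessarily the code behind Fig.12.

```r
data(mtcars)
dat <- mtcars
dat$am <- factor(dat$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

model4 <- lm(mpg ~ am + wt + hp, data = dat)
fit4 <- fitted(model4)
res4 <- resid(model4)

# Residuals against fitted values; a patternless cloud around the dashed
# zero line is what the linearity and constant-variance assumptions imply.
plot(fit4, res4, xlab = "Fitted mpg", ylab = "Residual",
     main = "Residuals vs fitted values")
abline(h = 0, lty = 2)
```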
The resulting model has three predictors: transmission type (`am`), weight (`wt`), and horsepower (`hp`). According to this model, in this sample, cars with a manual transmission are expected to deliver 2.08 more mpg than cars of the same weight and horsepower with an automatic transmission. This difference, however, is not statistically significant, so the conclusion cannot be extended to the larger population of cars: we can be 95% confident that, in the population of cars that the sample represents, cars with a manual transmission deliver between 0.74 mpg less and 4.90 mpg more than automatic cars of similar weight and horsepower. In other words, these data provide no convincing evidence that either manual or automatic transmissions are better for MPG.
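The interval quoted in the conclusion is the 95% confidence interval for the `amManual` coefficient of the final model, which can be extracted as in this sketch (object names are assumptions of the example).

```r
data(mtcars)
dat <- mtcars
dat$am <- factor(dat$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

model4 <- lm(mpg ~ am + wt + hp, data = dat)

# 95% confidence interval for the manual-vs-automatic difference,
# holding weight and horsepower fixed.
ci_am <- confint(model4, "amManual", level = 0.95)
ci_am

# The interval contains zero, so the difference is not significant at the 5% level.
contains_zero <- ci_am[1, 1] < 0 && ci_am[1, 2] > 0
contains_zero
```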