For those who prefer to work with RCloud, a project with the same materials can be accessed using the following link:

INFERENCE FOR CATEGORICAL DATA

Load `gss98` dataset and library `descr`

library(dplyr)  # library with handy functions to manipulate data
library(descr)  # library with functions that construct user-friendly contingency tables
load("Datasets/gss98.RData"); names(gss98)

##  [1] "X.1"      "X"        "SEX"      "RACE"     "RELIG"    "FUND"    
##  [7] "MARITAL"  "ATTEND"   "PREMARSX" "XMARSEX"  "HOMOSEX"  "TEENSEX" 
## [13] "ABANY"    "CAPPUN"   "GUNLAW"   "GRASS"    "PRAYER"   "NATCITY" 
## [19] "NATHEAL"  "NATCRIME" "NATDRUG"  "NATEDUC"  "NATRACE"  "NATFARE" 
## [25] "NATROAD"  "NATMASS"  "CONCLERG" "CONEDUC"  "CONFED"   "CONPRESS"
## [31] "CONJUDGE" "CONLEGIS" "FECHLD"   "FEHELP"   "FEPRESCH" "FEFAM"   
## [37] "RACDIF1"  "LIVEBLKS" "MARBLK"   "DISCAFF"  "PARTY"    "IDEOLOGY"
## [43] "AGESUM"   "INCOME"   "EDUC2"    "REGION2"  "CITY"     "RURAL"   
## [49] "PROT"     "NEWFUND"

The variable NATHEAL measures Americans’ attitudes towards public spending on healthcare using the following survey question: Are we spending too much, too little, or about the right amount on improving and protecting the nation’s health?

Americans’ attitudes towards public spending on healthcare can be related to various individual characteristics. Let’s explore whether these attitudes are associated with gender and political ideology.

The following function constructs a contingency table for NATHEAL as the response variable and SEX as the explanatory variable:

CrossTable(gss98$NATHEAL, gss98$SEX, prop.r = FALSE, prop.c = TRUE, prop.t = FALSE, prop.chisq = FALSE, chisq = TRUE, format = "SPSS")

##    Cell Contents 
## |-------------------------|
## |                   Count | 
## |          Column Percent | 
## |-------------------------|
## 
## =======================================
##                  gss98$SEX
## gss98$NATHEAL    female    male   Total
## ---------------------------------------
## Too little         208     133     341 
##                   73.2%   67.9%        
## ---------------------------------------
## About right         61      54     115 
##                   21.5%   27.6%        
## ---------------------------------------
## Too much            15       9      24 
##                    5.3%    4.6%        
## ---------------------------------------
## Total              284     196     480 
##                   59.2%   40.8%        
## =======================================
## 
## Statistics for All Table Factors
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 = 2.367944      d.f. = 2      p = 0.306 
## 
##         Minimum expected frequency: 9.8

An alternative way to generate the same crosstab, as column proportions, and run the same test (without using package descr):

prop.table(table(gss98$NATHEAL, gss98$SEX), 2) %>% round(digits = 2)

##              
##               female male
##   Too little    0.73 0.68
##   About right   0.21 0.28
##   Too much      0.05 0.05

chisq.test(table(gss98$NATHEAL, gss98$SEX))

## 
##  Pearson's Chi-squared test
## 
## data:  table(gss98$NATHEAL, gss98$SEX)
## X-squared = 2.3679, df = 2, p-value = 0.3061
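To see where the chi-squared statistic reported above comes from, it can be recomputed by hand from the cell counts. The sketch below re-enters the counts from the crosstab rather than reading `gss98`, and the object names (`counts`, `expected`, `chi_sq`, `p_value`) are illustrative assumptions, not part of the original analysis.

```r
# Counts copied from the NATHEAL by SEX crosstab above (filled column by column)
counts <- matrix(
  c(208, 61, 15,   # female
    133, 54, 9),   # male
  nrow = 3,
  dimnames = list(
    NATHEAL = c("Too little", "About right", "Too much"),
    SEX     = c("female", "male")
  )
)

# Expected count in each cell under independence:
# (row total * column total) / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Pearson chi-squared statistic: sum of (observed - expected)^2 / expected
chi_sq <- sum((counts - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(counts) - 1) * (ncol(counts) - 1)

# p-value: upper-tail probability of the chi-squared distribution
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

round(c(chi_sq = chi_sq, df = df, p_value = p_value), 4)
```

The smallest entry of `expected` corresponds to the "Minimum expected frequency" line in the `CrossTable()` output.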

INTERPRETATION:

In the sample, 73.2 percent of women and 67.9 percent of men said that we are spending too little on improving and protecting the nation’s health, so women were 5.3 percentage points more likely than men to give this answer. However, the high p-value of the chi-squared statistic (p = 0.306 > 0.05) means that a difference of this size could easily arise by chance in a sample drawn from a population in which attitudes towards spending on public health and gender are unrelated. In other words, we cannot reject the null hypothesis that there is no relationship between the two variables. Failing to reject the null hypothesis does not prove that it is true; it only means that our test does not provide evidence that, in the U.S. population, women have different views from men.

Let’s see whether people with different political ideologies also have different views on public health spending:

CrossTable(gss98$NATHEAL, gss98$IDEOLOGY, prop.r = FALSE, prop.c = TRUE, prop.t = FALSE, prop.chisq = FALSE, chisq = TRUE, format = "SPSS")

##    Cell Contents 
## |-------------------------|
## |                   Count | 
## |          Column Percent | 
## |-------------------------|
## 
## ==========================================================
##                  gss98$IDEOLOGY
## gss98$NATHEAL    liberal   moderate   conservative   Total
## ----------------------------------------------------------
## Too little          104        121            101     326 
##                    73.2%      76.6%          64.3%        
## ----------------------------------------------------------
## About right          32         34             43     109 
##                    22.5%      21.5%          27.4%        
## ----------------------------------------------------------
## Too much              6          3             13      22 
##                     4.2%       1.9%           8.3%        
## ----------------------------------------------------------
## Total               142        158            157     457 
##                    31.1%      34.6%          34.4%        
## ==========================================================
## 
## Statistics for All Table Factors
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 = 9.858755      d.f. = 4      p = 0.0429 
## 
##         Minimum expected frequency: 6.835886

Alternative code (without using package descr):

prop.table(table(gss98$NATHEAL, gss98$IDEOLOGY), 2) %>% round(digits = 2)

##              
##               liberal moderate conservative
##   Too little     0.73     0.77         0.64
##   About right    0.23     0.22         0.27
##   Too much       0.04     0.02         0.08

chisq.test(table(gss98$NATHEAL, gss98$IDEOLOGY))

## 
##  Pearson's Chi-squared test
## 
## data:  table(gss98$NATHEAL, gss98$IDEOLOGY)
## X-squared = 9.8588, df = 4, p-value = 0.04288
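The chi-squared approximation is usually considered reliable when the expected cell counts are not too small (a common rule of thumb is at least 5), which is why the `CrossTable()` output reports the minimum expected frequency. As a rough sketch, the expected counts and Pearson residuals can be pulled out of the object returned by `chisq.test()`. The counts are re-entered by hand from the crosstab above, and the names `counts` and `res` are illustrative.

```r
# Counts copied from the NATHEAL by IDEOLOGY crosstab above (filled column by column)
counts <- matrix(
  c(104, 32, 6,    # liberal
    121, 34, 3,    # moderate
    101, 43, 13),  # conservative
  nrow = 3,
  dimnames = list(
    NATHEAL  = c("Too little", "About right", "Too much"),
    IDEOLOGY = c("liberal", "moderate", "conservative")
  )
)

res <- chisq.test(counts)

# Expected counts under independence, and the smallest of them
round(res$expected, 1)
min(res$expected)

# Pearson residuals, (observed - expected) / sqrt(expected):
# cells with large absolute values contribute most to the statistic
round(res$residuals, 2)
```

In this table the largest residuals in absolute value are in the "Too much" row for moderates and conservatives, so those cells contribute most to the test statistic.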

INTERPRETATION:

The crosstab shows that in the sample, 73.2 percent of liberals, 76.6 percent of moderates, and only 64.3 percent of conservatives answered that we are spending too little on improving and protecting the nation’s health. In the sample, liberals and moderates are, respectively, 8.9 and 12.3 percentage points more likely than conservatives to believe that we are spending too little on healthcare.
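The percentage-point gaps quoted above can be recomputed from the counts in the crosstab. This is a minimal sketch that re-enters those counts by hand; the names `counts`, `col_pct` and `gap` are assumptions made for illustration.

```r
# Counts copied from the NATHEAL by IDEOLOGY crosstab above (filled column by column)
counts <- matrix(
  c(104, 32, 6,    # liberal
    121, 34, 3,    # moderate
    101, 43, 13),  # conservative
  nrow = 3,
  dimnames = list(
    NATHEAL  = c("Too little", "About right", "Too much"),
    IDEOLOGY = c("liberal", "moderate", "conservative")
  )
)

# Column percentages: share of each ideology group giving each answer
col_pct <- 100 * prop.table(counts, margin = 2)

# Gap in percentage points, relative to conservatives, for "Too little"
gap <- col_pct["Too little", c("liberal", "moderate")] -
  col_pct["Too little", "conservative"]

round(gap, 1)
```

Subtracting two percentages gives a difference in percentage points, which is not the same as a relative (percent) change.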

Is there evidence that public attitudes towards spending on healthcare differ across groups with different political ideologies in the population of U.S. residents?

The chi-square statistic has a p-value of 0.04, which is less than the conventional significance level of 0.05. This means that if there were no relationship between the two variables in the population, the probability of obtaining a chi-square statistic of 9.86 or larger would be about 0.04, an unlikely outcome. Thus, the test provides evidence that allows us to reject the null hypothesis of no relationship at the 0.05 level, and we can conclude that attitudes towards spending on public healthcare and political ideology are related in the population of U.S. residents.
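The p-value is the upper-tail area of the chi-squared distribution beyond the observed statistic. As a small sketch, it can be recovered from the reported statistic and degrees of freedom alone, and compared with the critical value for a 0.05 test. The variable names below are illustrative.

```r
chi_sq <- 9.858755   # statistic reported for NATHEAL by IDEOLOGY
df     <- 4          # (3 rows - 1) * (3 columns - 1)

# Probability of a statistic at least this large if the null hypothesis is true
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

# Critical value: the statistic must exceed this to reject at the 0.05 level
critical <- qchisq(0.95, df = df)

round(c(p_value = p_value, critical = critical), 4)
chi_sq > critical   # TRUE means the null hypothesis is rejected at 0.05
```

Comparing the statistic with the critical value and comparing the p-value with 0.05 are two views of the same decision rule.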

Inference for Categorical Data

Yuriy Davydenko

May 11 2020
