## BEFORE STARTING, TYPE YOUR NAME INTO THE FIELD "author" ABOVE AND REMOVE THIS COMMENT ##


This week on RCloud: https://rstudio.cloud/project/976166


Datasets for this assignment:

A random sample of 1,000 federal personnel records for March 1994 (opm94.sav).


1. DATA

Load opm94.sav.
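
One possible way to read an SPSS `.sav` file is `haven::read_sav()`. The sketch below assumes that the `haven` package is installed and that `opm94.sav` sits in the project's working directory; neither is stated in the assignment, so adjust the path as needed.

```r
# Sketch only. Assumptions: the 'haven' package is installed and
# opm94.sav is in the working directory (adjust sav_path if not).
sav_path <- "opm94.sav"

if (requireNamespace("haven", quietly = TRUE) && file.exists(sav_path)) {
  opm94 <- haven::read_sav(sav_path)
  str(opm94)   # variable names, types, and labels
} else {
  message("Install 'haven' or fix the path before loading: ", sav_path)
}
```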

# your work


2. CREATING A NEW VARIABLE

To see how changing the units of measurement affects the regression coefficient and the correlation coefficient, create a new variable (edyrs_months) that measures edyrs in months instead of years.
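
The mechanics of rescaling a variable can be sketched on a small made-up data frame. The object `toy` and its values are hypothetical and are not taken from opm94; the same step could also be written with `dplyr::mutate()`.

```r
# Toy data: made-up values, NOT taken from opm94.
toy <- data.frame(
  edyrs = c(12, 12, 14, 16, 16, 18, 20),
  grade = c(5, 6, 7, 9, 11, 12, 13)
)

# Rescale years to months (12 months per year).
toy$edyrs_months <- toy$edyrs * 12

head(toy)
```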

# your work


3. CORRELATION MATRIX

Create a correlation matrix with sal, grade, edyrs, edyrs_months, yos, age, male01, minority:
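
As a rough illustration, base R's `cor()` returns a correlation matrix when given a data frame of numeric columns. The data frame `toy` below is hypothetical (made-up values, fewer variables than the assignment asks for); with the real data you would first select the listed columns.

```r
# Toy data: made-up values, NOT taken from opm94.
toy <- data.frame(
  sal   = c(21000, 25500, 30000, 38000, 45500, 52000, 61000),
  grade = c(4, 5, 7, 9, 11, 12, 13),
  edyrs = c(12, 12, 14, 16, 16, 18, 20),
  yos   = c(2, 10, 5, 12, 20, 15, 25)
)

# Pearson correlations between every pair of columns.
# use = "complete.obs" drops rows with any missing value first.
cor_mat <- cor(toy, use = "complete.obs")
round(cor_mat, 2)
```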

# your work

QUESTIONS

3a. Which variable is `grade` most strongly related to? Rank order the variables in terms of the strength of their relationship with `grade`.

3b. Which variable is years of federal service (`yos`) most strongly related to? Most weakly related to?

3c. Look at the correlations between `edyrs` and `edyrs_months`, and between these two variables and all other variables. What's going on?


4. REGRESSION WITH NUMERIC EXPLANATORY VARIABLES

Run four regressions. Regress:

  1. sal on grade
  2. grade on yos
  3. grade on edyrs
  4. yos on age
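
"Regress y on x" means y is the outcome and x is the explanatory variable, which in R's formula notation is `y ~ x`. A minimal sketch with `lm()` on hypothetical data (the values in `toy` are made up, not from opm94):

```r
# Toy data: made-up values, NOT taken from opm94.
toy <- data.frame(
  grade = c(4, 5, 7, 9, 11, 12, 13),
  sal   = c(21000, 25500, 30000, 38000, 45500, 52000, 61000)
)

# "Regress sal on grade": sal is the outcome, grade the explanatory variable.
fit <- lm(sal ~ grade, data = toy)

coef(fit)      # named vector: (Intercept) and the slope on grade
summary(fit)   # standard errors, t-tests, R-squared
```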
# your work

QUESTIONS

4a. For each regression, briefly explain the meaning of the y-intercept and the regression coefficient.

4b. Find the expected salary for someone in grade 16

4c. Find the expected grade for someone with 5 years of service

4d. Find the expected grade for someone with 12 years of education
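
An expected value can be read off a fitted line either with `predict()` or by plugging into intercept + slope × x. The sketch below uses hypothetical data and an arbitrary x value (8) chosen only for illustration; it is not one of the values asked about above.

```r
# Toy data: made-up values, NOT taken from opm94.
toy <- data.frame(
  yos   = c(2, 5, 10, 12, 15, 20, 25),
  grade = c(5, 5, 7, 9, 10, 11, 13)
)
fit <- lm(grade ~ yos, data = toy)

# Expected grade at a chosen value of the explanatory variable.
new_obs <- data.frame(yos = 8)   # 8 is an arbitrary illustrative value
pred <- predict(fit, newdata = new_obs)

# Same number by hand: intercept + slope * x.
by_hand <- coef(fit)[["(Intercept)"]] + coef(fit)[["yos"]] * 8

pred
by_hand
```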

Run one more regression, grade on edyrs_months:

# your work

QUESTIONS

4e. Why is the regression coefficient different from the coefficient on `edyrs`? How are they the same?


5. REGRESSION WITH DUMMY EXPLANATORY VARIABLES

Create a dummy variable nonvet01, which should be the mirror image of the variable vet01 (when vet01 = 0, nonvet01 = 1; when vet01 = 1, nonvet01 = 0).

# opm94 <- opm94 %>% mutate(nonvet01 = if_else(vet01 == 0, 1, 0))

Regress sal on vet01:

# your work

Regress sal on nonvet01:

# your work

Compute mean salaries for vets and nonvets:
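
Group means can be computed in base R with `tapply()`. The sketch below uses hypothetical data (made-up salaries, not from opm94) and also shows a base-R way to build the mirror-image dummy.

```r
# Toy data: made-up values, NOT taken from opm94.
toy <- data.frame(
  vet01 = c(0, 0, 0, 0, 1, 1, 1),
  sal   = c(30000, 34000, 41000, 35000, 38000, 44000, 50000)
)

# Mirror-image dummy: 1 where vet01 is 0, and 0 where vet01 is 1.
toy$nonvet01 <- 1 - toy$vet01

# Mean salary within each level of vet01.
# The result is named "0" (nonveterans) and "1" (veterans).
group_means <- tapply(toy$sal, toy$vet01, mean)
group_means
```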

# your work

QUESTIONS

5a. Find the mean salaries of veterans and nonveterans from the two regression outputs.


5b. Interpret the Y-intercepts. Why do they differ?


5c. Interpret the regression coefficients. Why do they differ?

Yuriy Davydenko 2020