POLS/CSSS 503, University of Washington, Spring 2015

Setup

This lab uses the following libraries:

library("dplyr")
library("ggplot2")

Saving to Files

Data

Use a dataset as an example and:

  • save to csv
  • save to RData
  • save to dta
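
The three formats listed above can be sketched as follows. This is only an illustration: it assumes the built-in mtcars data, writes to a temporary directory, and uses write.dta() from the foreign package for the Stata file (other packages can also write dta files). The file names are made up for the example.

```r
# Illustrative sketch: save the built-in mtcars data in three formats.
# The file names and the use of tempdir() are assumptions for this example.
library("foreign")  # provides write.dta()

out_dir <- tempdir()
csv_file <- file.path(out_dir, "mtcars.csv")
rdata_file <- file.path(out_dir, "mtcars.RData")
dta_file <- file.path(out_dir, "mtcars.dta")

# CSV: plain text, readable by almost any program.
# row.names = FALSE drops the car names stored as row names.
write.csv(mtcars, file = csv_file, row.names = FALSE)

# RData: R's own binary format; keeps the object name and column types
save(mtcars, file = rdata_file)

# dta: Stata format
write.dta(mtcars, file = dta_file)

# Read the CSV back in to confirm the round trip
mtcars_csv <- read.csv(csv_file)
dim(mtcars_csv)
```

An RData file is read back with load(), which restores the object under its original name rather than returning it.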

Plots

You can save plots from within RStudio in the Plots pane with the Export menu item.

You generally want to export to a vector format such as PDF or SVG if possible. Otherwise, use PNG. Avoid JPEG, since its lossy compression produces visible artifacts around the lines and text in a plot.

You can also use R commands to save a plot to a file. The default way to do this in R is to use R’s low-level graphics device functions, such as pdf and png.

pdf("carplot.pdf")
ggplot(mtcars, aes(wt, mpg)) + geom_point()
dev.off()

Note that the file does not save until you close the device using dev.off(). This is to allow devices to work with base R graphics which often require several commands to create the plot.
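
As a sketch of why the device stays open, here is a base R plot built from several commands, all of which land in the same file. The file name, dimensions, and the fitted line are illustrative choices, not part of the lab.

```r
# Sketch: a base R plot built up over several commands and written to one PDF.
# The file name and dimensions are assumptions for this example.
pdf_file <- file.path(tempdir(), "carplot_base.pdf")

pdf(pdf_file, width = 6, height = 4)  # open the device (size in inches)
plot(mtcars$wt, mtcars$mpg, xlab = "Weight", ylab = "Miles per gallon")
abline(lm(mpg ~ wt, data = mtcars), col = "red")  # add a fitted line
title("Fuel economy and weight")                   # add a title
dev.off()  # the file is only complete once the device is closed
```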

The device functions work for all types of plots. For ggplot2 objects, you can also use the function ggsave:

mtcars_plot <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
ggsave(filename = "mtcars_plot.pdf", plot = mtcars_plot)

ggsave() will determine the file format of the file to save from the extension of the filename argument. There are options for adjusting the height, width, dpi, etc. See the documentation for more information.
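
A minimal sketch of those options, assuming a 6 by 4 inch figure and a 300 dpi PNG. The sizes and file paths are arbitrary choices for illustration.

```r
library("ggplot2")

mtcars_plot <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# Illustrative file paths; the extension picks the graphics device
pdf_file <- file.path(tempdir(), "mtcars_plot.pdf")
png_file <- file.path(tempdir(), "mtcars_plot.png")

# Vector output: only the physical size matters
ggsave(filename = pdf_file, plot = mtcars_plot,
       width = 6, height = 4, units = "in")

# Raster output: dpi sets the resolution, so this image is 1800 x 1200 pixels
ggsave(filename = png_file, plot = mtcars_plot,
       width = 6, height = 4, units = "in", dpi = 300)
```

Setting width and height explicitly makes the saved figure independent of the current size of the Plots pane.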

Important: when you run an R Markdown file, plots are saved to {filename}_files, so you can use them without manually saving them.

Merging

rossoil <- read.csv("http://UW-POLS503.github.io/pols_503_sp15/data/rossoildata.csv", 
    na.strings = "")
democracy <- read.csv("http://UW-POLS503.github.io/pols_503_sp15/data/democracy.csv", 
    header = TRUE, stringsAsFactors = FALSE, na.strings = ".")

Merge the data frames, keeping all countries in each even if there is no match in the other. Because our data is organised by country-year, include both the country and the year variables as merge keys.

new_data <- merge(rossoil, democracy, by.x = c("cty_name", "year"), by.y = c("CTYNAME", 
    "YEAR"), all.x = TRUE, all.y = TRUE)

How do the original and merged datasets compare?

dim(rossoil)
## [1] 4530   59
dim(democracy)
## [1] 4126   16
dim(new_data)
## [1] 6312   73
ncol(democracy) + ncol(rossoil) - 2
## [1] 73
summary(new_data)

We can also keep all rows of the first data frame (rossoil) and drop the rows of the second (democracy) that have no match:

new_data_allx <- merge(rossoil, democracy, by.x = c("cty_name", "year"), by.y = c("CTYNAME", 
    "YEAR"), all.x = TRUE, all.y = FALSE)

Let’s check what it did

filter(new_data_allx, cty_name == "Algeria") %>% tbl_df() %>% head()
## Source: local data frame [6 x 73]
## 
##   cty_name  year     id   id1 year1     wdr6   wdr123 wdr135   wdr269
##     (fctr) (int) (fctr) (int) (int)    (dbl)    (dbl)  (int)    (dbl)
## 1  Algeria  1966    DZA    54  1966 2.364711 59.29279     NA 6.21e+08
## 2  Algeria  1967    DZA    54  1967 1.702917 77.03384     NA 7.24e+08
## 3  Algeria  1968    DZA    54  1968 1.291746 70.97992     NA 8.30e+08
## 4  Algeria  1969    DZA    54  1969 0.804877 67.61797     NA 9.34e+08
## 5  Algeria  1970    DZA    54  1970 0.508937 70.23643     NA 1.01e+09
## 6  Algeria  1971    DZA    54  1971 0.891620 74.85199     NA 8.57e+08
## Variables not shown: wdr271 (dbl), wdr272 (dbl), wdr273 (dbl), wdr313
##   (dbl), wdr344 (dbl), wdr400 (dbl), wdr477 (dbl), ssafrica (int), mideast
##   (int), me_nafr (int), oecd (int), v6 (dbl), agr (dbl), v123 (dbl), oil
##   (dbl), v313 (dbl), metal (dbl), regime (dbl), regime1 (dbl), wdr97
##   (dbl), wdr151 (int), wdr152 (int), log135 (dbl), milpers (dbl), islam
##   (dbl), ELF (int), Food (dbl), AgrFood (dbl), WDR85 (dbl), WDR87 (dbl),
##   WDR88 (dbl), illit (dbl), life (dbl), WDR409 (dbl), WDR411 (dbl), tv
##   (dbl), WDR86 (dbl), phones (dbl), wdr129 (dbl), cgdp (int), GDPcap
##   (dbl), logGDPcp (dbl), wdr93 (dbl), wdr440 (dbl), eth (dbl), govtconsump
##   (dbl), regime1_5 (dbl), log135_5 (dbl), oil_5 (dbl), metal_5 (dbl),
##   COUNTRY (int), REGION (chr), BRITCOL (int), CATH (dbl), CIVLIB (int),
##   EDT (dbl), ELF60 (dbl), GDPW (int), MOSLEM (dbl), NEWC (int), OIL (int),
##   POLLIB (int), REG (int), STRA (int)
filter(rossoil, cty_name == "Algeria") %>% tbl_df() %>% head()
## Source: local data frame [6 x 59]
## 
##   cty_name     id   id1  year year1     wdr6   wdr123 wdr135   wdr269
##     (fctr) (fctr) (int) (int) (int)    (dbl)    (dbl)  (int)    (dbl)
## 1  Algeria    DZA    54  1966  1966 2.364711 59.29279     NA 6.21e+08
## 2  Algeria    DZA    54  1967  1967 1.702917 77.03384     NA 7.24e+08
## 3  Algeria    DZA    54  1968  1968 1.291746 70.97992     NA 8.30e+08
## 4  Algeria    DZA    54  1969  1969 0.804877 67.61797     NA 9.34e+08
## 5  Algeria    DZA    54  1970  1970 0.508937 70.23643     NA 1.01e+09
## 6  Algeria    DZA    54  1971  1971 0.891620 74.85199     NA 8.57e+08
## Variables not shown: wdr271 (dbl), wdr272 (dbl), wdr273 (dbl), wdr313
##   (dbl), wdr344 (dbl), wdr400 (dbl), wdr477 (dbl), ssafrica (int), mideast
##   (int), me_nafr (int), oecd (int), v6 (dbl), agr (dbl), v123 (dbl), oil
##   (dbl), v313 (dbl), metal (dbl), regime (dbl), regime1 (dbl), wdr97
##   (dbl), wdr151 (int), wdr152 (int), log135 (dbl), milpers (dbl), islam
##   (dbl), ELF (int), Food (dbl), AgrFood (dbl), WDR85 (dbl), WDR87 (dbl),
##   WDR88 (dbl), illit (dbl), life (dbl), WDR409 (dbl), WDR411 (dbl), tv
##   (dbl), WDR86 (dbl), phones (dbl), wdr129 (dbl), cgdp (int), GDPcap
##   (dbl), logGDPcp (dbl), wdr93 (dbl), wdr440 (dbl), eth (dbl), govtconsump
##   (dbl), regime1_5 (dbl), log135_5 (dbl), oil_5 (dbl), metal_5 (dbl)
filter(democracy, CTYNAME == "Algeria") %>% tbl_df() %>% head()
## Source: local data frame [6 x 16]
## 
##   COUNTRY CTYNAME REGION  YEAR BRITCOL  CATH CIVLIB   EDT ELF60  GDPW
##     (int)   (chr)  (chr) (int)   (int) (dbl)  (int) (dbl) (dbl) (int)
## 1       1 Algeria Africa  1962       0   0.5     NA 1.160  0.43  5012
## 2       1 Algeria Africa  1963       0   0.5     NA 1.250  0.43  6083
## 3       1 Algeria Africa  1964       0   0.5     NA 1.345  0.43  6502
## 4       1 Algeria Africa  1965       0   0.5     NA 1.450  0.43  6620
## 5       1 Algeria Africa  1966       0   0.5     NA 1.560  0.43  6612
## 6       1 Algeria Africa  1967       0   0.5     NA 1.675  0.43  6982
## Variables not shown: MOSLEM (dbl), NEWC (int), OIL (int), POLLIB (int),
##   REG (int), STRA (int)

Or keep all rows of the second data frame (democracy) and drop the unmatched rows of the first (rossoil):

new_data_ally <- merge(rossoil, democracy, by.x = c("cty_name", "year"), by.y = c("CTYNAME", 
    "YEAR"), all.x = FALSE, all.y = TRUE)

Let’s check what it did

filter(new_data_ally, cty_name == "Algeria") %>% tbl_df() %>% head()
## Source: local data frame [6 x 73]
## 
##   cty_name  year     id   id1 year1     wdr6   wdr123 wdr135   wdr269
##     (fctr) (int) (fctr) (int) (int)    (dbl)    (dbl)  (int)    (dbl)
## 1  Algeria  1962     NA    NA    NA       NA       NA     NA       NA
## 2  Algeria  1963     NA    NA    NA       NA       NA     NA       NA
## 3  Algeria  1964     NA    NA    NA       NA       NA     NA       NA
## 4  Algeria  1965     NA    NA    NA       NA       NA     NA       NA
## 5  Algeria  1966    DZA    54  1966 2.364711 59.29279     NA 6.21e+08
## 6  Algeria  1967    DZA    54  1967 1.702917 77.03384     NA 7.24e+08
## Variables not shown: wdr271 (dbl), wdr272 (dbl), wdr273 (dbl), wdr313
##   (dbl), wdr344 (dbl), wdr400 (dbl), wdr477 (dbl), ssafrica (int), mideast
##   (int), me_nafr (int), oecd (int), v6 (dbl), agr (dbl), v123 (dbl), oil
##   (dbl), v313 (dbl), metal (dbl), regime (dbl), regime1 (dbl), wdr97
##   (dbl), wdr151 (int), wdr152 (int), log135 (dbl), milpers (dbl), islam
##   (dbl), ELF (int), Food (dbl), AgrFood (dbl), WDR85 (dbl), WDR87 (dbl),
##   WDR88 (dbl), illit (dbl), life (dbl), WDR409 (dbl), WDR411 (dbl), tv
##   (dbl), WDR86 (dbl), phones (dbl), wdr129 (dbl), cgdp (int), GDPcap
##   (dbl), logGDPcp (dbl), wdr93 (dbl), wdr440 (dbl), eth (dbl), govtconsump
##   (dbl), regime1_5 (dbl), log135_5 (dbl), oil_5 (dbl), metal_5 (dbl),
##   COUNTRY (int), REGION (chr), BRITCOL (int), CATH (dbl), CIVLIB (int),
##   EDT (dbl), ELF60 (dbl), GDPW (int), MOSLEM (dbl), NEWC (int), OIL (int),
##   POLLIB (int), REG (int), STRA (int)
filter(rossoil, cty_name == "Algeria") %>% tbl_df() %>% head()
## Source: local data frame [6 x 59]
## 
##   cty_name     id   id1  year year1     wdr6   wdr123 wdr135   wdr269
##     (fctr) (fctr) (int) (int) (int)    (dbl)    (dbl)  (int)    (dbl)
## 1  Algeria    DZA    54  1966  1966 2.364711 59.29279     NA 6.21e+08
## 2  Algeria    DZA    54  1967  1967 1.702917 77.03384     NA 7.24e+08
## 3  Algeria    DZA    54  1968  1968 1.291746 70.97992     NA 8.30e+08
## 4  Algeria    DZA    54  1969  1969 0.804877 67.61797     NA 9.34e+08
## 5  Algeria    DZA    54  1970  1970 0.508937 70.23643     NA 1.01e+09
## 6  Algeria    DZA    54  1971  1971 0.891620 74.85199     NA 8.57e+08
## Variables not shown: wdr271 (dbl), wdr272 (dbl), wdr273 (dbl), wdr313
##   (dbl), wdr344 (dbl), wdr400 (dbl), wdr477 (dbl), ssafrica (int), mideast
##   (int), me_nafr (int), oecd (int), v6 (dbl), agr (dbl), v123 (dbl), oil
##   (dbl), v313 (dbl), metal (dbl), regime (dbl), regime1 (dbl), wdr97
##   (dbl), wdr151 (int), wdr152 (int), log135 (dbl), milpers (dbl), islam
##   (dbl), ELF (int), Food (dbl), AgrFood (dbl), WDR85 (dbl), WDR87 (dbl),
##   WDR88 (dbl), illit (dbl), life (dbl), WDR409 (dbl), WDR411 (dbl), tv
##   (dbl), WDR86 (dbl), phones (dbl), wdr129 (dbl), cgdp (int), GDPcap
##   (dbl), logGDPcp (dbl), wdr93 (dbl), wdr440 (dbl), eth (dbl), govtconsump
##   (dbl), regime1_5 (dbl), log135_5 (dbl), oil_5 (dbl), metal_5 (dbl)
filter(democracy, CTYNAME == "Algeria") %>% tbl_df() %>% head()
## Source: local data frame [6 x 16]
## 
##   COUNTRY CTYNAME REGION  YEAR BRITCOL  CATH CIVLIB   EDT ELF60  GDPW
##     (int)   (chr)  (chr) (int)   (int) (dbl)  (int) (dbl) (dbl) (int)
## 1       1 Algeria Africa  1962       0   0.5     NA 1.160  0.43  5012
## 2       1 Algeria Africa  1963       0   0.5     NA 1.250  0.43  6083
## 3       1 Algeria Africa  1964       0   0.5     NA 1.345  0.43  6502
## 4       1 Algeria Africa  1965       0   0.5     NA 1.450  0.43  6620
## 5       1 Algeria Africa  1966       0   0.5     NA 1.560  0.43  6612
## 6       1 Algeria Africa  1967       0   0.5     NA 1.675  0.43  6982
## Variables not shown: MOSLEM (dbl), NEWC (int), OIL (int), POLLIB (int),
##   REG (int), STRA (int)

dplyr has its own merge functions, the join family (inner_join, left_join, right_join, and full_join), described in the dplyr documentation.
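
A minimal sketch of how the dplyr joins line up with the all.x and all.y arguments of merge. The two small data frames are made up for illustration. Only their key column names mirror the real data.

```r
library("dplyr")

# Toy country-year data, made up for illustration
left <- data.frame(cty_name = c("A", "A", "B"), year = c(2000, 2001, 2000),
                   oil = c(1.5, 1.7, 0.2), stringsAsFactors = FALSE)
right <- data.frame(CTYNAME = c("A", "B", "C"), YEAR = c(2001, 2000, 2000),
                    REG = c(0, 1, 1), stringsAsFactors = FALSE)

# Named vector: names are the key columns in x, values the matching columns in y
keys <- c("cty_name" = "CTYNAME", "year" = "YEAR")

joined_full <- full_join(left, right, by = keys)    # all.x = TRUE,  all.y = TRUE
joined_left <- left_join(left, right, by = keys)    # all.x = TRUE,  all.y = FALSE
joined_right <- right_join(left, right, by = keys)  # all.x = FALSE, all.y = TRUE
joined_inner <- inner_join(left, right, by = keys)  # all.x = FALSE, all.y = FALSE

# Number of rows kept by each join
sapply(list(full = joined_full, left = joined_left,
            right = joined_right, inner = joined_inner), nrow)
```

As with merge, the key columns in the result take their names from the first data frame.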

Outliers

Challenge: Replicate the analysis in Fox, Chapter 11.2. Conduct outlier diagnostics for the regression of the prestige of occupations in Canada in 1971 on income, education, percent women, and type (white collar, blue collar, professional). Are there any outliers? Consider hat values, Studentized residuals, and Cook’s distance. Which observation has the largest influence on the regression? How does the regression line change if you drop that observation?

library("car")
data("Prestige")
mod_prestige <- lm(prestige ~ income + education + women + type, data = Prestige)
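
As a starting point, here is a sketch of the three diagnostics using base R functions. To stay self-contained it uses the built-in mtcars data rather than Prestige, so the model and variable names are stand-ins. The cutoffs are common rules of thumb, not formal tests.

```r
# Sketch on the built-in mtcars data (not the Prestige data from the challenge)
mod <- lm(mpg ~ wt + hp, data = mtcars)

diagnostics <- data.frame(
  hat = hatvalues(mod),           # leverage of each observation
  rstudent = rstudent(mod),       # Studentized (deleted) residuals
  cooks_d = cooks.distance(mod)   # influence on the fitted coefficients
)

# Rules of thumb: leverage above twice its average, |residual| above 2
n <- nrow(mtcars)
p <- length(coef(mod))            # coefficients, including the intercept
high_leverage <- diagnostics$hat > 2 * p / n
large_resid <- abs(diagnostics$rstudent) > 2
diagnostics[high_leverage | large_resid, ]

# Refit without the most influential observation and compare coefficients
most_influential <- which.max(diagnostics$cooks_d)
mod_dropped <- update(mod, data = mtcars[-most_influential, ])
cbind(full = coef(mod), dropped = coef(mod_dropped))
```

The same calls apply to mod_prestige. Note that lm drops rows with missing values, so the diagnostics there cover only the observations actually used in the fit.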

Multiple Imputation

For this part we will use the Amelia package which implements a multiple imputation method.

library("Amelia")

We will use the Ross oil data that we’ve used throughout this course.

rossoil <- read.csv("http://UW-POLS503.github.io/pols_503_sp15/data/rossoildata.csv") %>% 
    arrange(id1, year) %>% group_by(id1) %>% mutate(oilL5 = lag(wdr123, 5)/100, 
    metalL5 = lag(wdr313, 5)/100, GDPpcL5 = lag(wdr135, 5)/100, islam = islam/100)
rossoil1980 <- rossoil %>% filter(year == 1980)

Challenge: Estimate the following regression of regime type in 1980 with (1) listwise deletion, and (2) multiple imputation. How do the coefficients and standard errors of the regression coefficients differ?

# Fit on the 1980 cross-section; lm() drops incomplete rows (listwise deletion)
model2 <- lm(regime1 ~ log(GDPcap) + metalL5 + oilL5 + oecd + islam, data = rossoil1980)

Note, it would be better to both estimate this model as a panel using all available data and to impute the data as a TSCS. See the Amelia vignette for examples of how to do that.
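
For the multiple imputation half of the challenge, the regression is run once per imputed dataset and the results are then combined with Rubin's rules. Amelia provides mi.meld() for that step. The sketch below spells out the arithmetic for a single coefficient in base R. The helper combine_rubin is hypothetical, and the numbers are made up to stand in for five imputed-data regressions.

```r
# Rubin's rules for one coefficient, written out in base R.
# combine_rubin is a hypothetical helper for illustration only.
# q:  the coefficient estimate from each imputed dataset
# se: the corresponding standard errors
combine_rubin <- function(q, se) {
  m <- length(q)
  q_bar <- mean(q)        # combined point estimate
  within <- mean(se^2)    # average sampling variance within imputations
  between <- var(q)       # variance of the estimates across imputations
  total <- within + (1 + 1 / m) * between
  c(estimate = q_bar, se = sqrt(total))
}

# Made-up estimates and standard errors from five imputed datasets
q <- c(0.50, 0.55, 0.45, 0.52, 0.48)
se <- c(0.10, 0.11, 0.09, 0.10, 0.10)

combined <- combine_rubin(q, se)
combined
```

The combined standard error is larger than the average of the individual ones because the between-imputation term carries the extra uncertainty due to the missing data.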


Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. R code is licensed under a BSD 2-clause license.