POLS/CSSS 503, University of Washington, Spring 2015
This lab uses the following libraries:
library("dplyr")
library("ggplot2")
Use a dataset as an example and
You can save plots from within RStudio in the Plots pane with the Export menu item.
You generally want to export to a vector format such as PDF or SVG if possible. Otherwise, use PNG. You do not want to use JPEG since that is a lossy compression format.
You can also use R commands to save a plot to a file. The default way to do this in R is to use R’s low-level graphics device functions, such as pdf() and png().
pdf("carplot.pdf")
ggplot(mtcars, aes(wt, mpg)) + geom_point()
dev.off()
Note that the file is not written until you close the device using dev.off(). This allows devices to work with base R graphics, which often require several commands to create a plot.
The device functions work for all types of plots, including ggplot2 plots.
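To illustrate why the device stays open until dev.off(), here is a minimal sketch of a base R plot built up over several commands. The output path (a file in tempdir()) and the axis labels are assumptions chosen for illustration.

```r
# Hypothetical output location: a temporary directory, so nothing is
# written into the working directory.
out_file <- file.path(tempdir(), "base_plot.pdf")

# Open the PDF device; width and height are in inches.
pdf(out_file, width = 6, height = 4)

# Several commands draw onto the same open device.
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(lm(mpg ~ wt, data = mtcars), lty = 2)  # add a fitted line
title("Fuel economy by weight")

# Closing the device writes the finished file to disk.
dev.off()

file.exists(out_file)
```

One caveat: inside a function or a loop, a ggplot2 object is not drawn automatically, so it needs an explicit print() call between opening and closing the device.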
For ggplot2 objects, you can use the function ggsave():
mtcars_plot <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
ggsave(filename = "mtcars_plot.pdf", plot = mtcars_plot)
ggsave() determines the file format from the extension of the filename argument. There are options for adjusting the height, width, dpi, etc. See the documentation for more information.
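As a sketch of those sizing options, the call below saves the same plot at an explicit size. The file name and the 6 by 4 inch dimensions are assumptions for illustration, not values from the lab.

```r
library("ggplot2")

# Build the plot object first so it can be passed to ggsave() explicitly.
mtcars_plot <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# Hypothetical output location: a temporary directory.
out_file <- file.path(tempdir(), "mtcars_plot_6x4.pdf")

# width and height are interpreted in the given units; the .pdf extension
# selects the PDF device. The dpi argument only matters for raster formats
# such as PNG, so it is omitted here.
ggsave(filename = out_file, plot = mtcars_plot,
       width = 6, height = 4, units = "in")

file.exists(out_file)
```

Passing plot explicitly is a little safer than relying on the default, which saves the last plot displayed.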
Important: when you run an R Markdown file, plots are saved to {filename}_files. So you can use them without manually saving them.
rossoil <- read.csv("http://UW-POLS503.github.io/pols_503_sp15/data/rossoildata.csv",
na.strings = "")
democracy <- read.csv("http://UW-POLS503.github.io/pols_503_sp15/data/democracy.csv",
header = TRUE, stringsAsFactors = FALSE, na.strings = ".")
Merge the data frames, keeping all countries in each even if there is no match in the other. Because our data is organised by country-year, include both the country and the year variables as merge keys.
new_data <- merge(rossoil, democracy, by.x = c("cty_name", "year"), by.y = c("CTYNAME",
"YEAR"), all.x = TRUE, all.y = TRUE)
How do the original and merged datasets compare?
dim(rossoil)
## [1] 4530 59
dim(democracy)
## [1] 4126 16
dim(new_data)
## [1] 6312 73
ncol(democracy) + ncol(rossoil) - 2
## [1] 73
summary(new_data)
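The effect of all.x and all.y on the number of rows is easier to see on a toy example. The two small data frames below are made up for illustration; only the merge() arguments mirror the call above.

```r
# Toy country-year data (hypothetical values).
oil_toy <- data.frame(cty_name = c("A", "A", "B"),
                      year = c(2000, 2001, 2000),
                      oil = c(1, 2, 3))
dem_toy <- data.frame(CTYNAME = c("A", "B", "C"),
                      YEAR = c(2000, 2000, 2000),
                      REG = c(0, 1, 1))

keys_x <- c("cty_name", "year")
keys_y <- c("CTYNAME", "YEAR")

# Keep unmatched rows from both sides (full outer merge).
m_all <- merge(oil_toy, dem_toy, by.x = keys_x, by.y = keys_y,
               all.x = TRUE, all.y = TRUE)
# Keep every row of the first data frame only.
m_x <- merge(oil_toy, dem_toy, by.x = keys_x, by.y = keys_y,
             all.x = TRUE, all.y = FALSE)
# Keep every row of the second data frame only.
m_y <- merge(oil_toy, dem_toy, by.x = keys_x, by.y = keys_y,
             all.x = FALSE, all.y = TRUE)
# Keep matched rows only (the default).
m_inner <- merge(oil_toy, dem_toy, by.x = keys_x, by.y = keys_y)

# Rows in each result.
c(all = nrow(m_all), x = nrow(m_x), y = nrow(m_y), inner = nrow(m_inner))

# The key columns appear once, so columns = 3 + 3 - 2.
ncol(m_all)
```

The column count follows the same arithmetic as ncol(democracy) + ncol(rossoil) - 2 above: the two key columns are shared, so they are counted only once.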
We can also keep all rows of the first data frame (rossoil) and only the matching rows of the second:
new_data_allx <- merge(rossoil, democracy, by.x = c("cty_name", "year"), by.y = c("CTYNAME",
"YEAR"), all.x = TRUE, all.y = FALSE)
Let’s check what it did
filter(new_data_allx, cty_name == "Algeria") %>% tbl_df() %>% head()
## Source: local data frame [6 x 73]
##
## cty_name year id id1 year1 wdr6 wdr123 wdr135 wdr269
## (fctr) (int) (fctr) (int) (int) (dbl) (dbl) (int) (dbl)
## 1 Algeria 1966 DZA 54 1966 2.364711 59.29279 NA 6.21e+08
## 2 Algeria 1967 DZA 54 1967 1.702917 77.03384 NA 7.24e+08
## 3 Algeria 1968 DZA 54 1968 1.291746 70.97992 NA 8.30e+08
## 4 Algeria 1969 DZA 54 1969 0.804877 67.61797 NA 9.34e+08
## 5 Algeria 1970 DZA 54 1970 0.508937 70.23643 NA 1.01e+09
## 6 Algeria 1971 DZA 54 1971 0.891620 74.85199 NA 8.57e+08
## Variables not shown: wdr271 (dbl), wdr272 (dbl), wdr273 (dbl), wdr313
## (dbl), wdr344 (dbl), wdr400 (dbl), wdr477 (dbl), ssafrica (int), mideast
## (int), me_nafr (int), oecd (int), v6 (dbl), agr (dbl), v123 (dbl), oil
## (dbl), v313 (dbl), metal (dbl), regime (dbl), regime1 (dbl), wdr97
## (dbl), wdr151 (int), wdr152 (int), log135 (dbl), milpers (dbl), islam
## (dbl), ELF (int), Food (dbl), AgrFood (dbl), WDR85 (dbl), WDR87 (dbl),
## WDR88 (dbl), illit (dbl), life (dbl), WDR409 (dbl), WDR411 (dbl), tv
## (dbl), WDR86 (dbl), phones (dbl), wdr129 (dbl), cgdp (int), GDPcap
## (dbl), logGDPcp (dbl), wdr93 (dbl), wdr440 (dbl), eth (dbl), govtconsump
## (dbl), regime1_5 (dbl), log135_5 (dbl), oil_5 (dbl), metal_5 (dbl),
## COUNTRY (int), REGION (chr), BRITCOL (int), CATH (dbl), CIVLIB (int),
## EDT (dbl), ELF60 (dbl), GDPW (int), MOSLEM (dbl), NEWC (int), OIL (int),
## POLLIB (int), REG (int), STRA (int)
filter(rossoil, cty_name == "Algeria") %>% tbl_df() %>% head()
## Source: local data frame [6 x 59]
##
## cty_name id id1 year year1 wdr6 wdr123 wdr135 wdr269
## (fctr) (fctr) (int) (int) (int) (dbl) (dbl) (int) (dbl)
## 1 Algeria DZA 54 1966 1966 2.364711 59.29279 NA 6.21e+08
## 2 Algeria DZA 54 1967 1967 1.702917 77.03384 NA 7.24e+08
## 3 Algeria DZA 54 1968 1968 1.291746 70.97992 NA 8.30e+08
## 4 Algeria DZA 54 1969 1969 0.804877 67.61797 NA 9.34e+08
## 5 Algeria DZA 54 1970 1970 0.508937 70.23643 NA 1.01e+09
## 6 Algeria DZA 54 1971 1971 0.891620 74.85199 NA 8.57e+08
## Variables not shown: wdr271 (dbl), wdr272 (dbl), wdr273 (dbl), wdr313
## (dbl), wdr344 (dbl), wdr400 (dbl), wdr477 (dbl), ssafrica (int), mideast
## (int), me_nafr (int), oecd (int), v6 (dbl), agr (dbl), v123 (dbl), oil
## (dbl), v313 (dbl), metal (dbl), regime (dbl), regime1 (dbl), wdr97
## (dbl), wdr151 (int), wdr152 (int), log135 (dbl), milpers (dbl), islam
## (dbl), ELF (int), Food (dbl), AgrFood (dbl), WDR85 (dbl), WDR87 (dbl),
## WDR88 (dbl), illit (dbl), life (dbl), WDR409 (dbl), WDR411 (dbl), tv
## (dbl), WDR86 (dbl), phones (dbl), wdr129 (dbl), cgdp (int), GDPcap
## (dbl), logGDPcp (dbl), wdr93 (dbl), wdr440 (dbl), eth (dbl), govtconsump
## (dbl), regime1_5 (dbl), log135_5 (dbl), oil_5 (dbl), metal_5 (dbl)
filter(democracy, CTYNAME == "Algeria") %>% tbl_df() %>% head()
## Source: local data frame [6 x 16]
##
## COUNTRY CTYNAME REGION YEAR BRITCOL CATH CIVLIB EDT ELF60 GDPW
## (int) (chr) (chr) (int) (int) (dbl) (int) (dbl) (dbl) (int)
## 1 1 Algeria Africa 1962 0 0.5 NA 1.160 0.43 5012
## 2 1 Algeria Africa 1963 0 0.5 NA 1.250 0.43 6083
## 3 1 Algeria Africa 1964 0 0.5 NA 1.345 0.43 6502
## 4 1 Algeria Africa 1965 0 0.5 NA 1.450 0.43 6620
## 5 1 Algeria Africa 1966 0 0.5 NA 1.560 0.43 6612
## 6 1 Algeria Africa 1967 0 0.5 NA 1.675 0.43 6982
## Variables not shown: MOSLEM (dbl), NEWC (int), OIL (int), POLLIB (int),
## REG (int), STRA (int)
Alternatively, we can keep all rows of the second data frame (democracy) and only the matching rows of the first:
new_data_ally <- merge(rossoil, democracy, by.x = c("cty_name", "year"), by.y = c("CTYNAME",
"YEAR"), all.x = FALSE, all.y = TRUE)
Let’s check what it did
filter(new_data_ally, cty_name == "Algeria") %>% tbl_df() %>% head()
## Source: local data frame [6 x 73]
##
## cty_name year id id1 year1 wdr6 wdr123 wdr135 wdr269
## (fctr) (int) (fctr) (int) (int) (dbl) (dbl) (int) (dbl)
## 1 Algeria 1962 NA NA NA NA NA NA NA
## 2 Algeria 1963 NA NA NA NA NA NA NA
## 3 Algeria 1964 NA NA NA NA NA NA NA
## 4 Algeria 1965 NA NA NA NA NA NA NA
## 5 Algeria 1966 DZA 54 1966 2.364711 59.29279 NA 6.21e+08
## 6 Algeria 1967 DZA 54 1967 1.702917 77.03384 NA 7.24e+08
## Variables not shown: wdr271 (dbl), wdr272 (dbl), wdr273 (dbl), wdr313
## (dbl), wdr344 (dbl), wdr400 (dbl), wdr477 (dbl), ssafrica (int), mideast
## (int), me_nafr (int), oecd (int), v6 (dbl), agr (dbl), v123 (dbl), oil
## (dbl), v313 (dbl), metal (dbl), regime (dbl), regime1 (dbl), wdr97
## (dbl), wdr151 (int), wdr152 (int), log135 (dbl), milpers (dbl), islam
## (dbl), ELF (int), Food (dbl), AgrFood (dbl), WDR85 (dbl), WDR87 (dbl),
## WDR88 (dbl), illit (dbl), life (dbl), WDR409 (dbl), WDR411 (dbl), tv
## (dbl), WDR86 (dbl), phones (dbl), wdr129 (dbl), cgdp (int), GDPcap
## (dbl), logGDPcp (dbl), wdr93 (dbl), wdr440 (dbl), eth (dbl), govtconsump
## (dbl), regime1_5 (dbl), log135_5 (dbl), oil_5 (dbl), metal_5 (dbl),
## COUNTRY (int), REGION (chr), BRITCOL (int), CATH (dbl), CIVLIB (int),
## EDT (dbl), ELF60 (dbl), GDPW (int), MOSLEM (dbl), NEWC (int), OIL (int),
## POLLIB (int), REG (int), STRA (int)
filter(rossoil, cty_name == "Algeria") %>% tbl_df() %>% head()
## Source: local data frame [6 x 59]
##
## cty_name id id1 year year1 wdr6 wdr123 wdr135 wdr269
## (fctr) (fctr) (int) (int) (int) (dbl) (dbl) (int) (dbl)
## 1 Algeria DZA 54 1966 1966 2.364711 59.29279 NA 6.21e+08
## 2 Algeria DZA 54 1967 1967 1.702917 77.03384 NA 7.24e+08
## 3 Algeria DZA 54 1968 1968 1.291746 70.97992 NA 8.30e+08
## 4 Algeria DZA 54 1969 1969 0.804877 67.61797 NA 9.34e+08
## 5 Algeria DZA 54 1970 1970 0.508937 70.23643 NA 1.01e+09
## 6 Algeria DZA 54 1971 1971 0.891620 74.85199 NA 8.57e+08
## Variables not shown: wdr271 (dbl), wdr272 (dbl), wdr273 (dbl), wdr313
## (dbl), wdr344 (dbl), wdr400 (dbl), wdr477 (dbl), ssafrica (int), mideast
## (int), me_nafr (int), oecd (int), v6 (dbl), agr (dbl), v123 (dbl), oil
## (dbl), v313 (dbl), metal (dbl), regime (dbl), regime1 (dbl), wdr97
## (dbl), wdr151 (int), wdr152 (int), log135 (dbl), milpers (dbl), islam
## (dbl), ELF (int), Food (dbl), AgrFood (dbl), WDR85 (dbl), WDR87 (dbl),
## WDR88 (dbl), illit (dbl), life (dbl), WDR409 (dbl), WDR411 (dbl), tv
## (dbl), WDR86 (dbl), phones (dbl), wdr129 (dbl), cgdp (int), GDPcap
## (dbl), logGDPcp (dbl), wdr93 (dbl), wdr440 (dbl), eth (dbl), govtconsump
## (dbl), regime1_5 (dbl), log135_5 (dbl), oil_5 (dbl), metal_5 (dbl)
filter(democracy, CTYNAME == "Algeria") %>% tbl_df() %>% head()
## Source: local data frame [6 x 16]
##
## COUNTRY CTYNAME REGION YEAR BRITCOL CATH CIVLIB EDT ELF60 GDPW
## (int) (chr) (chr) (int) (int) (dbl) (int) (dbl) (dbl) (int)
## 1 1 Algeria Africa 1962 0 0.5 NA 1.160 0.43 5012
## 2 1 Algeria Africa 1963 0 0.5 NA 1.250 0.43 6083
## 3 1 Algeria Africa 1964 0 0.5 NA 1.345 0.43 6502
## 4 1 Algeria Africa 1965 0 0.5 NA 1.450 0.43 6620
## 5 1 Algeria Africa 1966 0 0.5 NA 1.560 0.43 6612
## 6 1 Algeria Africa 1967 0 0.5 NA 1.675 0.43 6982
## Variables not shown: MOSLEM (dbl), NEWC (int), OIL (int), POLLIB (int),
## REG (int), STRA (int)
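dplyr has its own merge functions, such as left_join(), right_join(), inner_join(), and full_join(); they are described in the dplyr documentation on two-table verbs.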
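As a rough sketch, the dplyr joins below correspond to the merge() variants used in this lab. The toy data frames are made up for illustration; the named vector passed to by maps key columns in the first table to differently named columns in the second.

```r
library("dplyr")

# Toy country-year data (hypothetical values).
oil_toy <- data.frame(cty_name = c("A", "A", "B"),
                      year = c(2000L, 2001L, 2000L),
                      oil = c(1, 2, 3),
                      stringsAsFactors = FALSE)
dem_toy <- data.frame(CTYNAME = c("A", "B", "C"),
                      YEAR = c(2000L, 2000L, 2000L),
                      REG = c(0, 1, 1),
                      stringsAsFactors = FALSE)

# Names are columns in the first table, values are columns in the second.
keys <- c("cty_name" = "CTYNAME", "year" = "YEAR")

j_full  <- full_join(oil_toy, dem_toy, by = keys)   # like all.x = TRUE, all.y = TRUE
j_left  <- left_join(oil_toy, dem_toy, by = keys)   # like all.x = TRUE, all.y = FALSE
j_right <- right_join(oil_toy, dem_toy, by = keys)  # like all.x = FALSE, all.y = TRUE
j_inner <- inner_join(oil_toy, dem_toy, by = keys)  # like the merge() default

# Rows in each result.
c(full = nrow(j_full), left = nrow(j_left),
  right = nrow(j_right), inner = nrow(j_inner))
```

The key columns are read as character rather than factor here because joining a factor column to a character column, as the two real datasets would require, triggers a coercion warning in dplyr.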
Challenge: Replicate the analysis in Fox, Chapter 11.2. Conduct outlier diagnostics for the regression of the prestige of occupations in Canada in 1971 on income, education, percent women, and type (white collar, blue collar, professional). Are there any outliers? Consider hat values, Studentized residuals, and Cook’s distance. Which observation has the largest influence on the regression? How does the regression line change if you drop that observation?
library("car")
data("Prestige")
mod_prestige <- lm(prestige ~ income + education + women + type, data = Prestige)
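The three diagnostics named in the challenge are all available in base R. The sketch below applies them to a different regression (mpg on wt and hp in mtcars) so that it does not give away the answer; the model and the rule-of-thumb cutoffs are assumptions chosen for illustration, and the same calls should work on mod_prestige.

```r
# Illustrative model on a built-in dataset, not the Prestige regression.
fit <- lm(mpg ~ wt + hp, data = mtcars)

n <- nobs(fit)             # number of observations used
k <- length(coef(fit)) - 1 # number of predictors (excluding intercept)

diagnostics <- data.frame(
  hat      = hatvalues(fit),       # leverage
  rstudent = rstudent(fit),        # Studentized residuals
  cooks_d  = cooks.distance(fit)   # influence on the fitted coefficients
)

# Common rules of thumb (assumed cutoffs, not hard tests):
# leverage above twice the average hat value, |rstudent| above 2,
# and Cook's distance above 4 / (n - k - 1).
flagged <- subset(diagnostics,
                  hat > 2 * (k + 1) / n |
                    abs(rstudent) > 2 |
                    cooks_d > 4 / (n - k - 1))
flagged

# Observation with the largest Cook's distance.
most_influential <- rownames(diagnostics)[which.max(diagnostics$cooks_d)]
most_influential

# Refit without that observation and compare the coefficients.
fit_dropped <- lm(mpg ~ wt + hp,
                  data = mtcars[rownames(mtcars) != most_influential, ])
cbind(full = coef(fit), dropped = coef(fit_dropped))
```

Comparing the two columns of coefficients side by side is one simple way to see how much a single observation moves the regression line.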
For this part we will use the Amelia package, which implements a multiple imputation method.
library("Amelia")
We will use the Ross oil data that we’ve used throughout this course.
rossoil <- read.csv("http://UW-POLS503.github.io/pols_503_sp15/data/rossoildata.csv") %>%
arrange(id1, year) %>% group_by(id1) %>% mutate(oilL5 = lag(wdr123, 5)/100,
metalL5 = lag(wdr313, 5)/100, GDPpcL5 = lag(wdr135, 5)/100, islam = islam/100)
rossoil1980 <- rossoil %>% filter(year == 1980)
Challenge: Estimate the following regression of regime type in 1980 with (1) listwise deletion, and (2) multiple imputation. How do the coefficients and standard errors of the regression coefficients differ?
model2 <- lm(regime1 ~ log(GDPcap) + metalL5 + oilL5 + oecd + islam, data = rossoil1980)
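Listwise deletion is what lm() does by default: any row with a missing value in a model variable is dropped before estimation. The sketch below shows this on the built-in airquality data, which is an assumption chosen because it ships with R and contains missing values; it is not the Ross data.

```r
# Variables used in the illustrative model.
model_vars <- c("Ozone", "Solar.R", "Wind", "Temp")

# Rows with no missing values in any model variable.
complete_rows <- complete.cases(airquality[, model_vars])

# lm() silently drops incomplete rows (na.action = na.omit by default).
fit_default <- lm(Ozone ~ Solar.R + Wind + Temp, data = airquality)

# Doing the deletion by hand gives the same fit.
fit_manual <- lm(Ozone ~ Solar.R + Wind + Temp,
                 data = airquality[complete_rows, ])

# Rows available versus rows actually used in estimation.
c(rows = nrow(airquality), used = nobs(fit_default))

all.equal(coef(fit_default), coef(fit_manual))
```

Checking nobs() against nrow() is a quick way to see how much data listwise deletion throws away before comparing it with multiple imputation.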
Note: it would be better both to estimate this model as a panel using all available data and to impute the data as time-series cross-section (TSCS) data. See the Amelia vignette for examples of how to do that.
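For the multiple imputation half of the challenge, the model is fit once per imputed dataset and the results are combined with Rubin's rules. The sketch below shows that workflow in base R. Two things are assumptions: the imputation step is a crude random draw from the observed values, used only as a stand-in for Amelia's algorithm, and combine_mi() is a hypothetical helper written for this example. To the best of my knowledge Amelia's mi.meld() performs the same kind of combination.

```r
set.seed(503)
m <- 5  # number of imputed datasets

# Hypothetical helper: combine m sets of estimates with Rubin's rules.
# q and se are matrices with one row per imputation.
combine_mi <- function(q, se) {
  m <- nrow(q)
  within  <- colMeans(se^2)     # average sampling variance
  between <- apply(q, 2, var)   # variance of estimates across imputations
  list(estimate = colMeans(q),
       se = sqrt(within + (1 + 1 / m) * between))
}

q <- NULL
se <- NULL
for (i in seq_len(m)) {
  imp <- airquality
  # Stand-in imputation: fill each gap with a random observed value.
  for (v in c("Ozone", "Solar.R")) {
    miss <- is.na(imp[[v]])
    imp[[v]][miss] <- sample(imp[[v]][!miss], sum(miss), replace = TRUE)
  }
  fit <- lm(Ozone ~ Solar.R + Wind + Temp, data = imp)
  q  <- rbind(q, coef(fit))
  se <- rbind(se, coef(summary(fit))[, "Std. Error"])
}

combined <- combine_mi(q, se)
cbind(estimate = combined$estimate, se = combined$se)
```

The combined standard errors include the between-imputation variance, which is how multiple imputation reflects the extra uncertainty due to the missing data.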