University of Washington, Spring 2016

| Role | Name | Email |
|---|---|---|
| Instructor | Jeffrey Arnold | jrnold@uw.edu |
| TA | Andreu Casas | acasas2@uw.edu |

| Meeting | Days | Time | Location |
|---|---|---|---|
| Class | Tues, Thurs | 4:30–5:50 pm | Mary Gates Hall 284 |
| Lab | Fri | 1:30–3:20 pm | Savery 121 |

| Office hours | Time | Location |
|---|---|---|
| Jeffrey Arnold | Mon 4–5 pm, Wed 2–4 pm | Smith 221B |
| Andreu Casas | Tues and Thurs 3:20–4:20 pm | Smith 221E |

This course continues the graduate sequence in quantitative political methodology from POLS 501. In this course, students will learn the statistical and computational principles necessary to perform modern, flexible, and creative analysis of quantitative social data. The course focuses in particular on fitting, interpreting, and refining the linear regression model. Emphasis is placed on interpreting linear regression as a tool for causal inference, and the course also introduces several modern computational tools (bootstrapping, cross-validation, regularization).

By the end of the quarter, you will be able to:

- Conduct, interpret, and communicate results from analysis using multiple regression (including dummy variables and interactions).
- Explain the limitations of observational data for making causal claims, and begin to use existing strategies for attempting to make causal claims from observational data.
- Write clean, reusable, and reliable R code.
- Build a solid, reproducible research pipeline to go from raw data to final paper.
- Feel empowered working with data.

Further, because we cannot possibly cover everything that you will need to know during your career as a researcher, there are two final long-term goals. After this course is over, you will be able to:

- Learn new statistics.
- Learn new programming.

This course is designed to be a continuation of POLS/CS&SS 501. Although that is not a formal prerequisite for this course, I will assume that students have a basic understanding of the material covered in that course. In particular, students should have had a course in hypothesis testing, univariate statistical tests, and linear regression. I also assume that students have proficiency in R *prior* to starting the course.

There are two required texts for this course:

- Angrist, Joshua D., and Jörn-Steffen Pischke. 2009. *Mostly Harmless Econometrics: An Empiricist's Companion*.
- Wooldridge, Jeffrey M. *Introductory Econometrics*. 5th edition or earlier.

There is also one optional text:

- Angrist, Joshua D., and Jörn-Steffen Pischke. 2014. *Mastering 'Metrics: The Path from Cause to Effect*. This covers most of the same material as *Mostly Harmless*, but at a less technical level.

Other readings will come from articles or book chapters. Those that are not openly available will be accessible through the UW Library or posted on Canvas.

Finally, much of the material and reading for this course will be available in the course notes.

This course takes an applied and computational approach to learning statistics, so a programming language is essential. The course uses R as its statistical programming language and the RStudio IDE as an interface to R. We will use several R packages, making extensive use of the Hadleyverse packages (ggplot2, dplyr, tidyr, …). Additionally, the course will use R Markdown for writing reproducible research reports, and Git and GitHub for version control, collaboration, and distribution of code and research.

Assignments for this course comprise:

Research project: Every student in this class will conduct their own statistical analysis of a research question. The results of this analysis will be presented in a paper due at the end of the course. See the schedule for the due date.

The purpose of this paper is for students to apply the quantitative methods taught in this course to the kinds of real-world research problems they will encounter in their careers. Because of the limited time in this course, the paper does not need to address an important research problem or make a novel contribution to the literature. Although these are not evaluation criteria, students are encouraged to pursue them, since they are what lead to publications. The paper will be evaluated on the appropriateness of the statistical methods for the data and the question, not on the novelty or contribution of the question itself.

If you developed a research design for POLS 501, you can continue to use it for POLS 503. If you did not take POLS 501, talk to the instructor to confirm that your project is feasible.

While the final paper is the ultimate objective of the project, students will work with their data throughout the course and complete the following related assignments:

- Proposal (week 2)
- Several analyses throughout the quarter
- Draft (week 9)
- Poster presentation (week 10)

- Participation: Students will submit pull requests or issues that contribute to or raise questions about the current week's readings.
- Weekly or biweekly assignments: These assignments will largely focus on applying the course concepts to real or simulated data.
- Peer review of assignments and projects: Students will review each other's code and analyses and provide feedback.

The exact nature and timing of the assignments may be adjusted as the course requires, in consultation with the students.

Students will be evaluated on the whole of their work in this course, with an emphasis on the final paper. For this course, grades on the 4.0 scale have the following interpretation:

| Grade | Interpretation |
|---|---|
| 4.0 | Exceptional |
| 3.9 | Very good |
| 3.8 | Meeting expectations |
| 3.7 | Somewhat below average |
| 3.6 | Not up to expectations |
| ≤ 3.5 | Way below expectations |

Below is a list of some of the topics that this course may cover. What is actually covered will depend on how the course evolves in practice. See the Schedule for readings and dates, though it, too, will change over the course of the quarter.

- Types of Research Questions: Prediction vs. Causal Inference
- Potential outcomes framework for causal inference
- Linear Regression
- Matching estimators
- Instrumental variables
- Fixed effects and difference-in-differences designs
- Regression discontinuity

- Reproducible Research
- Version control with GitHub
- Reproducible documents with R Markdown
- Programming with R

For questions about the course that would be of general interest to all students in the course, email the course mailing list, rather than the individual instructors. Please reserve emails to individual instructors for individual concerns, such as your data analysis project or personal matters.

Beyond what the teaching team can provide, there are several resources on campus where you can get help with data, computing, and statistical problems:

- Center for Social Science Computing and Research (CSSCR) has a drop-in statistical consulting center in Savery 119. They provide consulting on statistical software, e.g., R. Go there for software- or data-related questions.
- CSSS Statistical Consulting provides general statistical consulting. Go there for questions about statistical methods.
- eScience Data Science Office Hours

This course was inspired by and makes use of some material from:

- Christopher Adolph, POLS 503. He was the previous instructor for this course.
- Jenny Bryan, Stat 545: Data wrangling, exploration, and analysis with R
- Software Carpentry
- Brenton Kenkel, PSCI 8357: Statistics for Political Research II
- Matthew Blackwell, Gov 2002: Causal Inference and GOV 1000/2000/2000e/Stat E-190: Quantitative Research Methodology.
- Matthew Salganik, Sociology 504: Advanced data analysis for the social sciences
- MOOCs: Mine Çetinkaya-Rundel, Sta 101, and the Johns Hopkins Data Science Sequence.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Parts of the course materials are derived from

- Matthew Salganik, Sociology 504: Advanced data analysis for the social sciences, under a CC-BY license. I use several of its learning objectives.

The source for the materials of this course is on GitHub at https://github.com/UW-POLS503/pols_503_sp16.