Class Meetings
Class | Monday | 04:30–05:50 PM | Savery 139 |
Class | Wednesday | 04:30–05:50 PM | Savery 139 |
Lab | Friday | 01:30–03:20 PM | Savery 121 |
Overview
This course is the second in the two-quarter quantitative methods sequence for PhD students in the social sciences.
Some topics covered in this course are:
Linear Regression
- Assumptions
- Mechanics (linear algebra)
- Inference
- Diagnostics
- Robust and clustered standard errors
Model vs. design-based inference
Prediction vs. causal inference
Causal inference methods
- Selection on observables (regression, matching)
- Instrumental variables
- Regression discontinuity
- Fixed effects and difference-in-differences
Reproducible research methods
Course Goals
By the end of the course students will be able to
Prerequisites
This course is intended to be taken after POLS/CSSS 501 and will be taught with the assumption that students understand the material in that course. Contact the instructor if you would like to take this course but did not take 501. See the POLS 501 site for what was covered in that course.
- Introductory statistics: inference, hypothesis testing, linear regression.
R: We will use the R programming language in this course. POLS/CSSS 501 introduced R for data wrangling and visualization. Students are expected to be familiar with R prior to taking this course or be willing to spend extra time learning it. I suggest R for Data Science or DataCamp. In particular, this course will heavily use the tidyverse packages (ggplot2, dplyr, …).
Git and GitHub: We will use git and GitHub for assignments and projects. These were introduced in POLS 501, so if you are not familiar with these tools it may take some extra time to familiarize yourself with them.
Assignments
There are three main types of assignments for students:
- Weekly homework: Learning data analysis requires practice. There will be weekly homework assignments. See the assignments page.
- Research project: Students will complete a research project. The expectation is that students will work on the project throughout the quarter and apply concepts and skills to that project soon after covering them in class. See the project page for more details.
- Reading Assignments: Students are expected to come to class prepared. I have chosen textbooks that are accessible, so we will not spend valuable class time summarizing assigned readings. Instead we will use class for more value-added learning activities. As part of that, before each class students will provide feedback and questions on the readings that will be used to guide in-class discussion.
Materials
Computational Tools
Students should have a laptop that they can bring to both class and lab as we will integrate computing with learning data analysis and statistics throughout the course.
This course will use R, which is a free and open-source programming language primarily used for statistics and data analysis. We will also use RStudio, which is an easy-to-use interface to R. Instructions to install or upgrade R are here.
This course will also use git (through RStudio) for version control, which is like “track changes” for a directory of files and is a core tool for reproducible research. Homework assignments will be distributed and submitted via GitHub, which is a website that hosts git repositories. If that did not make sense, don’t worry; we’ll cover it in the course.
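To give a rough sense of what “track changes for a directory of files” means, here is a minimal command-line sketch. It is only an illustration: the course uses git through RStudio, and the directory name `homework-demo`, the file `hw1.R`, and the name and email are hypothetical placeholders, not actual course materials. The sketch creates a local repository in a temporary directory rather than cloning one from GitHub.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory so nothing else on your machine is touched.
workdir="$(mktemp -d)"
cd "$workdir"

# Create a new, empty repository (hypothetical name).
git init -q homework-demo
cd homework-demo

# Identity for this demo repository only (placeholder values).
git config user.name "Example Student"
git config user.email "student@example.com"

# A one-line R script stands in for a homework answer.
printf 'summary(lm(mpg ~ wt, data = mtcars))\n' > hw1.R

# Stage the file and record a first snapshot with a descriptive message.
git add hw1.R
git commit -q -m "Add regression for homework 1"

# Revise the file and commit again; git keeps both versions.
printf 'summary(lm(mpg ~ wt + hp, data = mtcars))\n' > hw1.R
git commit -q -am "Add horsepower as a control"

# Show the history, one line per commit, newest first.
git log --oneline
```

With a repository hosted on GitHub, the same edit, stage, and commit cycle applies; the additional steps are cloning the repository at the start and pushing the commits back when you are done.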
Students will have access to a DataCamp classroom, which you can use for additional practice.
Books
The primary texts for this course are:
- Angrist, Joshua D. and Pischke, Jörn-Steffen. 2015. “Mastering ’Metrics: The Path from Cause to Effect”
- Bailey, Michael A. 2015. “Real Stats: Using Econometrics for Political Science and Public Policy”
Additional readings are on the schedule.
These texts were chosen to be accessible. Students are expected to complete readings and master the material prior to class so we can spend our time together on answering questions and solving problems.
Evaluation
Students will be evaluated on the whole of their work in this course. We will use the scale common to U.W. political science graduate courses:
4.0 | Exceptional |
3.9 | Very good |
3.8 | Meeting expectations |
3.7 | Somewhat below average |
3.6 | Not up to expectations |
≤ 3.5 | Way below expectations |
Communication
For questions regarding the content of the course, ask and answer them on our Slack channel. If you have a question about the topic, it is likely that someone else had the same question. Posting questions and answers publicly allows us all to learn from each of these questions and answers.
Reserve emails to the instructors for personal matters.
Errata
Changes
A summary of changes to the syllabus and schedule is posted in the CHANGELOG.
Resources
Beyond what the teaching team can provide, there are several resources on campus that you can go to for assistance with data, computing, and statistical problems:
- Center for Social Science Computing and Research (CSSCR) has a drop-in statistical consulting center in Savery 119. They provide consulting on statistical software, e.g. R. Go there for software or data related questions.
- CSSS Statistical Consulting provides general statistical consulting. Go there for questions about statistical methods.
- eScience Data Science Office Hours
License
Science should be open, and this course builds on other openly licensed material, so unless otherwise noted, all materials for this class are licensed under a Creative Commons Attribution 4.0 International License.
Bugs
If you find any typos or other issues in this page, or any other page in the site, go to issues, click on the “New Issue” button to create a new issue, and describe the problem.