See the instructions here. Additionally, here are a few clarifications.

Your paper should be written in the form of an academic paper, not in the format of a homework assignment.
You can write your paper in whatever software you want (LaTeX, Word, …), but please turn it in as a PDF. Occasionally, .doc files do not render correctly when opened in programs other than Microsoft Word, so it is safest to export them to PDF.
Turn in the code and data you used to run the analyses in your paper. These can be R markdown files or R scripts. Your analysis should run without errors. It is very important that you relate your code to the analyses in your paper. E.g. “this is the code used to create the models in Table 1 …”.
If you are using Stata for some data cleaning, include the .do files.
Explain what is in your analyses in a file named README. This file should explain what is in each of the other files and give brief instructions on how to run the analyses you include. It can be short, but it should be written so that if (when) someone else wants to reproduce your analyses, they have sufficient instructions to do so with the code and data you have provided.

Next week and logit code

2015-05-29 10:36

I updated the syllabus with some relevant readings on panel data. Additionally, I’m going to try to write up a worked example for limited dependent variables, but in the meantime you can look at a lab from last year here.

Homework 4 solutions posted

Jeffrey Arnold

2015-05-28 10:10

I posted homework 4 solutions on Canvas. Your answers may (and likely will) diverge from mine on the model selection question using the Ross data. Model selection is not an easy problem, and in research there is no simple algorithm to find the “best” model. In part, this is because the “best” model is contingent on your purposes. Generally, what you are looking for while exploring the space of models is how sensitive your results are to model specification. If they are robust to a variety of specifications, that is a good sign. If they are sensitive to the particular specification, the result may be spurious, or you may need to bring other data or theory to the problem to understand why one particular specification should be preferred to the others.

Final Papers

Jeffrey Arnold

2015-05-27 17:01

A reminder, the final paper is due on June 9, 2015 15:00 PDT. Submit a copy of the paper and all associated code and data necessary to produce the results of the paper (see rule 7) via Canvas.

As in any graduate course, strive to write, draft, or work on research that could turn into a publishable article. There are several rules and recommendations for this assignment.¹

Rules

Students may work in pairs or triples on the assignment, or combine the paper with an assignment for another class (provided the instructor of that class approves), but not both.
Students may write a replication and extension of a published article or pursue an original analysis. Replication and extension may help get the assignment done on time, especially since the quarter system only grants ten weeks.
Except in cases of family or medical emergency, I will not grant extensions or incompletes. Think of this as a favor: I expect the paper to be the product of ten weeks' work, not a year's, so turn in what you can accomplish in ten weeks to get interim feedback, even if your ultimate research aims are broader than a term assignment.
The paper should be 12–20 pages long, but longer papers are acceptable. However, the quality of the analysis and clarity of presentation far outweigh the quantity of prose.
The main focus of the paper should be on data, methods, and results. Justify your modeling choices with reference to the data and present your findings in terms any intelligent person could understand, regardless of their statistical knowledge. This should not limit the sophistication of your methods: just think hard about how to explain results from complicated models in approachable terms.
Don’t spend too much time on literature reviews or theory but don’t neglect hypothesis building, either. (Note, however, that hypotheses often can be clearly explicated without recourse to numbered lists.) By the time I reach the results of your paper, I should have some idea what you expect, what would be surprising, and why. I should also know why this matters.
Students must provide the code and data necessary to produce the results in the paper. Replication policies similar to this are already present in most top journals, e.g. AJPS, and it is likely to be the standard everywhere soon. Provide instructions and documentation so that I can run the code and understand which code relates to various parts of your analyses.

Recommendations

Papers that ask interesting, novel, or controversial questions are better – potentially much better – than papers that do not, all else equal.
Papers that explain their empirical findings in ways non-specialists can understand are better than papers that do not, all else equal.
Model specifications informed by test statistics, substantive knowledge and theory are better than model specifications based solely on test statistics, all else equal.
Number pages, tables, figures, and sections of your paper for easy reference. Embed all figures and tables in the text, facing the same direction as the text, just as you would see in a book or journal.
Tables of regression results should be nicely formatted and selective. Do not just cut and paste from your statistical package. Do not star your estimates or provide redundant measures of uncertainty (e.g., both standard errors and t-statistics); instead, provide a single substantively informative measure of uncertainty, such as confidence intervals or standard errors.
Variable names should be readable, memorable, and clearly denote what the variable is: use Female rather than Gender, and Conservatism instead of Ideology.
Provide the reader with descriptive statistics of the data. Often, a correlation matrix helps too. The reader may be unfamiliar with your data, or at the very least knows less about it than you. You are providing the reader context with which to understand your results. In so doing, you are also arming the reader to pick apart your findings. That is a good thing.
Except when precision of presentation is paramount, use graphics rather than tables to present results. Graphics are easier to read, can convey more information, and are far more memorable than tables.
Scholars carefully craft prose, but often paste in graphics without a thought to making them elegant, clear, or effective. Graphics are as much a part of the paper as the words, and deserve as much attention—if not more—in design.
Start the paper immediately. Ideally, you should start the class with a research topic in mind.
If you have not done original quantitative research before, you will be surprised how long it takes to get original data. If you are doing a replication, you will be surprised how hard it can be to get a replication dataset.
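As an illustration of reporting an estimate with a confidence interval instead of stars, here is a minimal Python sketch. It assumes a normal approximation (estimate ± z × SE). With small samples, a t critical value would be more appropriate. The function names and the example numbers are hypothetical and are not taken from any course material.

```python
from statistics import NormalDist


def confidence_interval(estimate, std_error, level=0.95):
    """Normal-approximation confidence interval: estimate +/- z * SE."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return estimate - z * std_error, estimate + z * std_error


def format_row(name, estimate, std_error, level=0.95, digits=2):
    """One table row: readable variable name, point estimate, and its interval."""
    low, high = confidence_interval(estimate, std_error, level)
    return f"{name:<14}{estimate:>8.{digits}f}   [{low:.{digits}f}, {high:.{digits}f}]"


if __name__ == "__main__":
    # Hypothetical coefficient and standard error, for illustration only.
    print(format_row("Conservatism", 0.50, 0.10))
```

Each row shows the point estimate alongside one interval. This avoids the redundancy of printing standard errors, t-statistics, and stars together.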

¹ These rules and recommendations are taken nearly verbatim from Christopher Adolph, “Writing Empirical Papers: 6 Rules & 12 Recommendations”, v 2.0, September 24, 2013. http://faculty.washington.edu/cadolph/503/papers.pdf. I added the requirement to include data and code. I removed the recommendation for LaTeX: although I strongly prefer it, we did not introduce it in this course.

Updated Problem Set 4

Jeffrey Arnold

2015-05-20 14:50

I updated problem set 4 with the following:

added code for calculating a logit transformation
added instructions for installing simcf
added example code using simcf functions to do simulations relevant to the problem set
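For reference, the logit transformation maps a probability p in (0, 1) to the log-odds scale, logit(p) = log(p / (1 − p)). Its inverse, the logistic function, maps log-odds back to a probability. The sketch below is a minimal Python version under that definition. It is not the R code added to the problem set, and it does not use simcf. The function names are illustrative only.

```python
import math


def logit(p: float) -> float:
    """Map a probability strictly between 0 and 1 to the log-odds scale."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Map log-odds back to a probability (the logistic function).

    The two branches avoid overflow in math.exp for large |x|.
    """
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


if __name__ == "__main__":
    print(logit(0.5))             # 0.0
    print(inv_logit(0.0))         # 0.5
    print(inv_logit(logit(0.8)))  # recovers 0.8 up to floating-point error
```

Because the two functions are inverses, you can model on the unbounded log-odds scale and then convert predictions back to probabilities.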

Also, I’ve updated, corrected, and expanded some of the outliers and missing data imputation examples that I went through in class.

Next week, I’ll be talking about limited dependent variables, and starting on causal inference. For this week, I had assigned readings by Schrodt and Achen. It is unlikely that we will have time to talk about them in class, but I suggest that you read them, especially Schrodt’s 7 Deadly Sins piece. It is a modern classic.

Updated Lecture Slides and Example from This Week

Jeffrey Arnold

2015-05-14 15:02

Based on class discussion and some of your questions, I revised and updated the lecture slides and the in-class regression example from this week.

Solutions to HW 3 Posted

Jeffrey Arnold

2015-05-14 14:57

Solutions to homework 3 are posted to Canvas: .html, .Rmd.

Some Other Advice on Model Choice and Specification

Jeffrey Arnold

2015-05-13 15:22

I uploaded the chapter “Specification” from Peter Kennedy, A Guide to Econometrics, to Canvas. It has some good and honest advice about choosing model specifications; the text itself is a little dated, but the general advice and intuition are ageless. I suggest you read through it. This is a text that I used when I first learned linear models, and I’ve only grown to appreciate it more as time has gone on.

Reminder for Problem Set 3

Jeffrey Arnold

2015-05-11 18:34

Before you turn it in, some reminders regarding Problem Set 3:

You need to answer the problems with natural language — i.e. what humans use — in addition to code. Explain what your answer is and how the code you ran supports your conclusion.
Format your problem set so it is clear and easy to read. Clearly separate the problems, and parts of problems. If you follow the template, this should not be much of an issue.
Show code, but do not show any more than is necessary. Remove extraneous messages, and print no more output than is necessary to support your claims. Anything else only creates confusion.

Update 2 to Problem Set 3

Jeffrey Arnold

2015-05-10 19:31

A few notes on Problem Set 3:

On Problem 3 (Type I and II errors), the problem first asks you to use n = 1024, but later says “using a smaller sample size, n = 32, and a larger sample size n = 1024.” Use sample sizes of n = 32, 128, and 1024. The point is to look at how p-values and errors change with sample size, and I wanted to give you a few values at which to analyze it. The precise values of n are not important, as long as you use enough values of n to see the pattern.
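To make the sample-size pattern concrete, here is a minimal Python simulation sketch. It is a stand-in, not the problem set's design. It draws data from Normal(μ, 1) and runs a two-sided z-test of H0: μ = 0 with the standard deviation treated as known. The Type I error rate is the rejection rate when μ = 0. The Type II error rate is the non-rejection rate under a hypothetical true effect of μ = 0.2. The effect size, number of simulations, and function name are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist, fmean


def rejection_rate(n, true_mean, n_sims=1000, alpha=0.05, seed=0):
    """Share of simulated samples in which a two-sided z-test rejects H0: mean = 0.

    Data are drawn from Normal(true_mean, 1). Sigma is treated as known (= 1),
    so the test statistic is z = sample mean * sqrt(n).
    """
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    rejections = 0
    for _ in range(n_sims):
        xbar = fmean(rng.gauss(true_mean, 1.0) for _ in range(n))
        if abs(xbar * math.sqrt(n)) > crit:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    effect = 0.2  # hypothetical true effect used to measure Type II error
    for n in (32, 128, 1024):
        type1 = rejection_rate(n, 0.0)
        type2 = 1 - rejection_rate(n, effect)
        print(f"n = {n:4d}: Type I rate = {type1:.3f}, Type II rate = {type2:.3f}")
```

In a setup like this, the Type I rate should stay near alpha at every n, while the Type II rate should shrink as n grows. That contrast is the pattern the different sample sizes are meant to reveal.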

This does not change the problem, but I also added some plots to visualize how the values of γ₁ affect the heteroskedasticity in the regression.