POLS/CSSS 503, University of Washington, Spring 2015
Install R, RStudio and devtools. Follow the instructions here.
R is the name of the programming language, and RStudio is a convenient and widely used interface to that language.
Since you will be using it for the remainder of the course, you should familiarize yourself with the RStudio GUI.
It consists of four windows. In the console window, you type commands after the > prompt and R executes them.
RStudio documentation can be found at http://www.rstudio.com/ide/docs/. Of those, the most likely to be useful to you are:
Challenge:
Alt+Shift+K?
Although it is so much more, you can use R as a calculator. For example, to add, subtract, multiply or divide:
2 + 3
2 - 3
2 * 3
2 / 3
The power of a number is calculated with ^, e.g. \(4^2\) is,
4 ^ 2
R includes many built-in functions for standard mathematical operations. For example, the square root function is sqrt, e.g. \(\sqrt{2}\),
sqrt(2)
And you can combine many of them together
(2 * 4 + 3 ) / 10
sqrt(2 * 2)
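As a small sketch of why the parentheses matter when combining operations (the variable names here are made up for illustration), compare the same numbers with and without them:

```r
# Without parentheses, * and / are evaluated before + and -
without_parens <- 2 * 4 + 3 / 10   # 8 + 0.3
# Parentheses force the sum to be computed before the division
with_parens <- (2 * 4 + 3) / 10    # 11 / 10
# Function calls can be nested inside larger expressions
nested <- sqrt(16) + abs(-3)       # 4 + 3

without_parens
with_parens
nested
```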
In R, you can save the results of calculations into objects that you can use later. This is done using the special symbol, <-. For example, this saves the results of 2 + 2 to an object named foo:
foo <- 2 + 2
You can see that foo is equal to 4
foo
And you can reuse foo in other calculations,
foo + 3
foo / 2 * 8 + foo
You can use = instead of <- for assignment. You may see this in some other code. There are some technical reasons to use <- instead of =, but the primary reason we will use <- instead of = is that this is the convention used in modern R programs.
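One of the technical differences can be sketched as follows. At the top level both forms create an object, but inside a function call = names an argument rather than assigning. The object names a, b and x are only illustrative, and the example assumes no object named x exists beforehand.

```r
# At the top level, both forms create an object
a <- 2 + 2
b = 2 + 2
identical(a, b)

# Inside a function call, = only names the argument; no object x is created
# (assumes no object called x existed before this point)
mean(x = c(1, 2, 3))
before <- exists("x", inherits = FALSE)

# With <-, the vector is first assigned to x and then passed to mean()
mean(x <- c(1, 2, 3))
after <- exists("x", inherits = FALSE)

before
after
```

Sticking to <- for assignment and = for function arguments keeps the two uses visually distinct.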
Challenge
1. Create a variable named whatever strikes your fancy and set it equal to the square root of 2. Then multiply it by 4.
2. Create a variable with a really long name and assign it a value. Start typing its name
3. Enter the following in the console: sdgagasdgjasda.
Keeping all the files associated with a project organized together – input data, R scripts, analytical results, figures – is such a wise and common practice that RStudio has built-in support for this via its projects. Read this for more information about RStudio projects.
You will use RStudio projects for your labs, homeworks, and final paper. Create an RStudio project that you will use for all your labs.
For this course, you will be using R Markdown documents for homeworks. Create your firs
Ctrl-S.
Cheatsheets and additional resources about R Markdown are available at http://rmarkdown.rstudio.com/.
For the remainder of this lab you will be using a dataset of GDP per capita and fertility from Gapminder.2
Download the csv (“comma-separated values”) file from https://github.com/POLS503/pols_503_sp15/blob/master/labs/gapminder.csv.
Then load the file
gapminder <- read.csv("gapminder.csv", stringsAsFactors = FALSE)
This creates a data frame. A data frame is a type of R object that corresponds to what you usually think of as a dataset or a spreadsheet — rows are observations and columns are variables.
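To make the rows-and-columns idea concrete, here is a minimal sketch that builds a small data frame by hand. The object toy and its values are made up for illustration and are not part of the Gapminder data.

```r
# Hypothetical toy data: three observations (rows) of three variables (columns)
toy <- data.frame(
  country = c("A", "B", "C"),
  year = c(2000, 2000, 2005),
  pop = c(1.5, 20.0, 3.2),
  stringsAsFactors = FALSE
)

nrow(toy)  # number of observations
ncol(toy)  # number of variables
str(toy)   # the type of each column
```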
Challenge: What happens when you do the following?
gapminder
This is a lot of information. How can we get a more useful picture of the dataset as a whole?
dim(gapminder)
names(gapminder)
head(gapminder)
summary(gapminder)
dim shows the dimensions of the data frame as the number of rows, columns
names shows the column names of the data frame.
head shows the first few observations
summary calculates summary statistics for all variables in the data frame.
Challenge: Given the information previously:
You can extract single variables (or columns) and perform different operations on them. To extract a variable, we use the dollar sign ($) extraction operator.
gapminder$lifeExp
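A minimal sketch of what extraction returns, using a made-up data frame named toy: the result is a plain vector, and the double-bracket form with the column name in quotes gives the same result.

```r
# Hypothetical toy data frame with one numeric and one character column
toy <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"))

x_dollar <- toy$x        # extract the column by name with $
x_bracket <- toy[["x"]]  # equivalent extraction with [[ ]]

identical(x_dollar, x_bracket)
length(x_dollar)
```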
Again, perhaps a summary may be more interesting. We can do more specific operations on this variable alone:
mean(gapminder$lifeExp)
median(gapminder$lifeExp)
sd(gapminder$lifeExp)
min(gapminder$lifeExp)
max(gapminder$lifeExp)
quantile(gapminder$lifeExp)
Challenge
1. What are the mean and median of GDP per capita?
2. Find the 30th percentile of GDP per capita.
3. The function length() calculates the length of a vector. The function unique() returns the unique values in a vector. How many countries in the data are there? How many years?
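As a hedged illustration on made-up numbers (not the Gapminder data): unique() returns the distinct values themselves, so wrapping it in length() counts them, and the probs argument of quantile() selects a specific percentile.

```r
# Hypothetical vector with repeated values
values <- c(3, 1, 3, 2, 1, 3)

distinct_values <- unique(values)      # distinct values, in order of first appearance
n_distinct <- length(distinct_values)  # how many distinct values there are

# The 90th percentile of the numbers 1 to 11
p90 <- quantile(1:11, probs = 0.9)

distinct_values
n_distinct
p90
```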
Summary statistics reduce the information of a distribution to single values. A distribution provides a richer understanding of the data. Look at the distribution of the variable lifeExp.
You will use the ggplot2 package for graphics for most of this course. In order to use it, you will need to load it using library()
library("ggplot2")
Create a histogram:
ggplot(gapminder, aes(x = lifeExp)) +
geom_histogram()
You could also save the plot to a variable
lifexp_plot <- ggplot(gapminder, aes(x = lifeExp)) +
geom_histogram()
If you just enter the variable name in the console it will “print” the object, which in this case simply creates the plot:
lifexp_plot
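When geom_histogram() is called without arguments, ggplot2 prints a message that it is using 30 bins and suggests picking a binwidth. A minimal sketch of setting the bin width explicitly, using a made-up data frame named toy and an arbitrary width of 1:

```r
library("ggplot2")

# Hypothetical toy data standing in for a numeric variable
toy <- data.frame(value = c(1, 2, 2, 3, 3, 3, 4, 4, 5, 7))

# binwidth sets the width of each bar in units of the x variable
toy_plot <- ggplot(toy, aes(x = value)) +
  geom_histogram(binwidth = 1)
toy_plot
```

A suitable width depends on the scale of the variable, so it is worth trying a few values.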
Challenge: Explore another variable of your choosing.
Use ggplot to create histograms for each year
lifexp_plot + facet_wrap(~ year)
Challenge: Describe how the distribution varies across years (Write it in your Markdown!)
You can also use ggplot2 to create a scatterplot
ggplot(gapminder, aes(y = lifeExp, x = log(gdpPercap))) +
geom_point() +
geom_smooth()
Challenge:
You can save R commands in a file called an R script. To create a new R Script use File -> New File -> R Script. This will create a new tab in the upper left panel which will have a name like “Untitled1”. Save this to a file with the extension “.R” (RStudio will warn you if you do not)
To see how this works, write a few commands in the editor. For example,
2 + 2
3 + 8
mean(c(1, 2, 3))
You can run the current line or highlighted section with Ctrl-Enter or the Run button.
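A whole script can also be run at once with source(). The sketch below writes two commands to a temporary file so that it is self-contained; in practice you would pass the path of your own saved .R file. The names script_path, total and avg are made up for illustration.

```r
# Write a tiny script to a temporary .R file (stand-in for a script you saved)
script_path <- tempfile(fileext = ".R")
writeLines(c("total <- 2 + 2", "avg <- mean(c(1, 2, 3))"), script_path)

# Run every command in the file, in order
source(script_path)

# Objects created by the script are now available
total
avg
```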
Refer to Getting Help with R
gdpPercap. Is it right skewed, left skewed, or symmetric?
Go to Stack Overflow and search for questions with tag [r].
Find and download the cowsay package. You cannot use install.packages. What does the cowsay function do? Run it with something fun (it’ll make sense once you know what it does).
Some text and the data set used in this lab are taken from Jenny Bryan, R basics, workspace and working directory, RStudio projects, licensed under CC BY-NC 3.0.
Comments
Any R code following a hash (#) is not executed. These are called comments, and can and should be used to annotate and explain your code. For example, this doesn’t do anything.
And in this, nothing after the # is executed,
Challenge: What is this equal to?
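A minimal sketch of how comments behave, with values made up for illustration: a line that starts with # is skipped entirely, and on a line that contains code, everything from the # onward is ignored.

```r
# This whole line is a comment, so R does nothing with it
# 1 + 1

# Only the code before the # is evaluated
result <- 5 * 2  # + 100 is part of the comment and is never added
result
```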