If you don’t have the uwpols501 package yet, install it first from GitHub.
library(devtools)
install_github("UW-POLS501/r-uwpols501")
Make sure you can load all of these packages and datasets before starting the module:
library(dplyr)
library(uwpols501)
data(turnout)
data(iver)
Why should we use functions? Well, don’t take it from me, take it from Grolemund & Wickham (2016):
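Writing a function has three big advantages over using copy-and-paste:
1. You can give a function an evocative name that makes your code easier to understand.
2. As requirements change, you only need to update code in one place, instead of many.
3. You eliminate the chance of making incidental mistakes when you copy and paste (i.e. updating a variable name in one place, but not in another).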
Writing a function step by step: use the function() command, specify inside the function() command the arguments that the function takes, and put the code that the function runs inside { }.
(Function names should be verbs, and arguments should be nouns.)
A function has the following skeleton:
function_name <- function(argument1, argument2, ...) {
  # do_something with the arguments
  output <- argument1 + argument2
  return(output)
}
A short/silly example:
say_hi <- function(name) {
  full_statement <- paste0("Hi! My name is ", name)
  return(full_statement)
}
full_statement <- say_hi("Andreu")
full_statement
## [1] "Hi! My name is Andreu"
For the final project last quarter, one of your classmates needed to retrieve the first digit of every value in several numeric variables, and so had to write the same code multiple times. Let’s write a function that simplifies that code.
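get_first_digit <- function(variable) {
  # Number of characters in each value; assumes non-negative whole numbers
  num_digits <- nchar(variable)
  # Integer-divide by the matching power of 10 to keep only the leading digit
  var_first_num <- variable %/% (10 ^ (num_digits - 1))
  return(var_first_num)
}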
Now apply the function get_first_digit() that we just created to the variables age and educate of the turnout dataset.
turnout$age_new <- get_first_digit(turnout$age)
turnout$educate_new <- get_first_digit(turnout$educate)
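Before trusting a new function on real data, it can help to try it on a few values whose answer you already know. The sketch below repeats the definition of get_first_digit() so that it runs on its own; the test vector is made up for illustration and is not part of the turnout data. It assumes non-negative whole numbers, since nchar() counts every character of a value, including a decimal point.

```r
# Same logic as get_first_digit() above, repeated so this block runs on its own.
get_first_digit <- function(variable) {
  num_digits <- nchar(variable)
  variable %/% (10 ^ (num_digits - 1))
}

# Made-up test values (not from the turnout data).
test_values <- c(7, 42, 731, 1999)
first_digits <- get_first_digit(test_values)
first_digits
## [1] 7 4 7 1
```

If the result matches what you expect by hand, the function is probably safe to apply to whole columns.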
Last quarter, some other classmates needed to create a dummy variable indicating whether the country or political party in a given row was part of a particular list of countries or political parties.
Here we first create a list of countries we are interested in, and then we write a function that takes a country variable as an argument and returns a dummy variable indicating which observations are in the list of countries of interest.
peripherial_countries <- c("Portugal", "Italy", "Ireland",
                           "Cyprus", "Greece", "Spain")
create_per_dummy <- function(country_variable) {
  # Work with the country names as character strings
  country_variable <- as.character(country_variable)
  # TRUE if the country is in the list of interest, FALSE otherwise
  dummy_boolean <- country_variable %in% peripherial_countries
  # Convert TRUE/FALSE to 1/0
  dummy_numeric <- as.numeric(dummy_boolean)
  return(dummy_numeric)
}
Now we apply the function to the cty variable of the iver dataset from the uwpols501 package:
head(iver, n = 10)
## Source: local data frame [10 x 4]
##
##            cty elec_sys povred   enp
##         (fctr)   (fctr)  (dbl) (dbl)
## 1    Australia      maj  42.16  2.38
## 2      Belgium       pr  78.79  7.01
## 3       Canada      maj  29.90  1.69
## 4      Denmark       pr  71.54  5.04
## 5      Finland       pr  69.08  5.14
## 6       France      maj  57.91  2.68
## 7      Germany      maj  46.90  3.16
## 8        Italy       pr  42.81  4.11
## 9  Netherlands       pr  66.93  3.49
## 10      Norway       pr  67.17  3.09
iver$peripherial_cty <- create_per_dummy(iver$cty)
head(iver, n = 10)
## Source: local data frame [10 x 5]
##
##            cty elec_sys povred   enp peripherial_cty
##         (fctr)   (fctr)  (dbl) (dbl)           (dbl)
## 1    Australia      maj  42.16  2.38               0
## 2      Belgium       pr  78.79  7.01               0
## 3       Canada      maj  29.90  1.69               0
## 4      Denmark       pr  71.54  5.04               0
## 5      Finland       pr  69.08  5.14               0
## 6       France      maj  57.91  2.68               0
## 7      Germany      maj  46.90  3.16               0
## 8        Italy       pr  42.81  4.11               1
## 9  Netherlands       pr  66.93  3.49               0
## 10      Norway       pr  67.17  3.09               0
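The function above depends on the peripherial_countries object existing in the workspace, so it only works for that one list. One possible variation, sketched below, passes the list as a second argument so that the same function can handle countries or political parties. The name create_dummy(), the argument values_of_interest, and the example data are all hypothetical and made up for illustration; they are not part of the uwpols501 package.

```r
# Hypothetical generalisation: the values of interest are an argument
# rather than an object looked up in the workspace.
create_dummy <- function(variable, values_of_interest) {
  # Work with the labels as character strings (handles factors too)
  variable <- as.character(variable)
  # TRUE/FALSE membership converted to 1/0
  as.numeric(variable %in% values_of_interest)
}

# Made-up example data (not from the iver dataset)
countries <- factor(c("Italy", "Norway", "Spain", "Canada"))
parties <- c("Green", "Labour", "Liberal")

country_dummy <- create_dummy(countries,
                              c("Portugal", "Italy", "Ireland",
                                "Cyprus", "Greece", "Spain"))
party_dummy <- create_dummy(parties, c("Green", "Liberal"))

country_dummy
## [1] 1 0 1 0
party_dummy
## [1] 1 0 1
```

Making the list an argument means a second list of interest needs only a new call, not a second copy of the function.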
Look through some of your previous code from other classes and projects, and write a function to reduce redundancy (10 min.)