Writing Functions

Erin Grand

7/7/2017

Writing Functions

What is a function?

What would your definition be?

Why and when should you write your own functions?

Do not repeat yourself: Never copy paste more than twice if you can, instead write out your process in a function.
Another advantage of functions is that if our requirements change, we only need to make the change in one place.

There are three key steps to creating a new function:

You need to pick a name for the function. It is recommended to use a verd as a name. I.e read_ny_data, even though it’s a bit long - it’s readable.
You list the inputs, or arguments, to the function inside function. A function with one argument may look like function(x) whereas one with multiple looks like function(x, y, z).
You place the code you have developed in body of the function, a { block that immediately follows function(…). We’ll see an example in a second.

Simple Function

add <- function(x) {
  sum(x)
}

best_pets <- function(animal){
  if(animal == "Cats") {
      print("Cats are better than dogs.")
    }
  else if(animal == "Dogs") {
      print("Dogs are better than cats")
    }
  else{
    print("You must choose: Cats or Dogs")
  }
}

Structure

function(args): let R know that are you writing a function and not to directly implement this section of code till you call it
{} the code of your function should be between {’s
foo <- function(args){}: name your function with the assignment operator
Reminder: name your function in the VERB sense of what you’re function is accomplishing
i.e: clean_data(), read_data(), input_cohort(), …etc

Real world example

read_file <- function(file, sheet_name=1){
  read_excel(file, sheet = sheet_name, col_names = TRUE, col_types = "text") %>%
    remove_empty_cols() %>%
    remove_empty_rows() %>%
    clean_names()
}

read_file("~/data.xlsx")

Why is this so useful?

Discuss!

Assignments within Functions

The body of the function is normal R code… however it is not executed outside the function
i.e x <- 1 in a a function does not change the value of x outside the function
The last thing created in a function is automatically returned, so you do not have to use return(x)

read_file <- function(file, sheet_name=1){
  read_excel(file, sheet = sheet_name, col_names = TRUE, col_types = "text") %>%
    remove_empty_cols() %>%
    remove_empty_rows() %>%
    clean_names()
}

read_file("~/data.xlsx")

Arugments

Name the arguments something that makes sense to use throughout the function
Include as many arguments as you need
If you are passing the data frame into the function, it should be the first argument
Example: Here our arguments are the file location and a sheet number or name.

read_file <- function(file, sheet_name=1){
  read_excel(file, sheet = sheet_name, col_names = TRUE, col_types = "text") %>%
    remove_empty_cols() %>%
    remove_empty_rows() %>%
    clean_names()
}

read_file("~/data.xlsx")

Discussion Questions

Why did I set sheet_name to be sheet_name=1? What does this allow for in my function?
What can’t I do with functions? Or at least, what’s super hard and annoying? (Hint. Think dplyr.)
Where should you house common functions you may need all the time?

Using Custom Functions in DPLYR

Example student roster

Example with `mutate`

include_cohort <- function(grade, school_year) {
  years_to_grad <- 12 - as.numeric(grade) # add the as.numeric() due to possibility of leading 0s
  year <- stringr::str_sub(school_year, -2, -1) %>% paste0("20", .) %>% as.numeric()
  cohort <- year + years_to_grad
}

students %>%
  mutate(cohort = include_cohort(grade, school_year)) %>% # run through all the students
  head()
## Warning: package 'bindrcpp' was built under R version 3.3.3
## # A tibble: 6 x 6
##   student_id grade school_year es_grad backfilled cohort
##        <dbl> <dbl>       <chr>   <dbl>      <dbl>  <dbl>
## 1 4641219404     9     2016-17      NA          1   2020
## 2 5677497066     7     2015-16       1          0   2021
## 3 4410073751     4     2016-17      NA          0   2025
## 4 5165383282     6     2016-17       1          0   2023
## 5 7414861057     3     2016-17      NA          1   2026
## 6 8899701876    10     2016-17      NA          0   2019

You try

Create a function that will change all the NAs in es_grad to 0.

What are you going to name your function?
What arguments do you need to pass in?
Are there other columns that may need to be worked on the same way?
Extra credit: Use a function to change all the 1’s to YES and the 0’s to NO in both es_grad and backfilled

Example with `mutate_at`

ignore_nas <- function(x){
  ifelse(is.na(x), 0, x)
}

class_yes_no <- function(x){
  x <- ignore_nas(x)
  ifelse(x == 1, "YES", "NO")
}

students %>% mutate_at(vars(es_grad, backfilled), class_yes_no)

## # A tibble: 12 x 5
##    student_id grade school_year es_grad backfilled
##         <dbl> <dbl>       <chr>   <chr>      <chr>
##  1 4641219404     9     2016-17      NO        YES
##  2 5677497066     7     2015-16     YES         NO
##  3 4410073751     4     2016-17      NO         NO
##  4 5165383282     6     2016-17     YES         NO
##  5 7414861057     3     2016-17      NO        YES
##  6 8899701876    10     2016-17      NO         NO
##  7 5814766659     5     2015-16      NO        YES
##  8 4523939000     4     2015-16      NO        YES
##  9 2852910440     9     2015-16     YES         NO
## 10 4237200058     6     2016-17     YES         NO
## 11 6288262677     4     2015-16      NO         NO
## 12 1511873670     3     2015-16      NO         NO

Break for Any Questions

Using Custom Functions with PURRR

`purrr::map_x`

Map_x functions:

map_df: outputs a data frame, usually by sticking multiple inputs together THIS IS THE ONE YOU WILL USE
map_chr: outputs a character
map_int: outputs an integer
map: outputs a list Do not use this without Erin’s help

Three ways to call map:

One Line:

Input 1: list or column vector of inputs Input 2: function name Input 3+: other arguments for the the function

map_df(filepaths, read_excel, col_names=T)

Custom function with one input

Input 1: list or column vector of inputs Input 2: function name

combine_new_data <- function(file){
  read_excel(file)
}

map_df(filepaths, combine_ny_data)

Write the function in map

Input 1: list or column vector of inputs Input 2: code, beginning with ~ and include .x for the inputs

Note: strspilt is pretty annoying, but can be really useful for strings

# example: you have text such as UCHS_ELA09 and you want just UCHS
schools <- c("UCHS_ELA09", "UCC_ALG02")
map_chr(schools, ~strsplit(.x, "_")[[1]][1])

## [1] "UCHS" "UCC"

Comments

-< ALWAYS try your function on ONE thing first before using map. -<

You Try

Task: Read in many files of the same format to one data frame.

You’re going to be using the NY prior year scores: Non-Uncommon Prior Year Results\~Data\Source\NYC
Get the list of files into R (hint: ?list.files)
Write a function that will read in a file and clean it.
- only include grade 3-8 students
- only include season == “SP16” where ss (scaled score) is given
Use map_df to read in ALL the files into one data frame
Extra credit: There are lots of duplicates! Try and get rid of them.

Writing Functions

Erin Grand

7/7/2017

Writing Functions

What is a function?

Why and when should you write your own functions?

Simple Function

Structure

Real world example

Why is this so useful?

Assignments within Functions

Arugments

Discussion Questions

Using Custom Functions in DPLYR

Example student roster

Example with mutate

You try

Example with mutate_at

Break for Any Questions

Using Custom Functions with PURRR

purrr::map_x

One Line:

Custom function with one input

Write the function in map

Comments

You Try

Task: Read in many files of the same format to one data frame.

Example with `mutate`

Example with `mutate_at`

`purrr::map_x`