Writing Functions

Erin Grand

7/7/2017

What is a function?

What would your definition be?

Why and when should you write your own functions?

  • Do not repeat yourself: if you find yourself copying and pasting the same code more than twice, write the process out as a function instead.
  • Another advantage of functions is that if our requirements change, we only need to make the change in one place.
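As a small illustration of both points, here is a sketch that compares the copy-paste approach with a function. The data and the rescale01 name are made up for this example.

```r
# Hypothetical scores, invented for illustration
scores <- data.frame(math = c(10, 20, 30), ela = c(5, 10, 25))

# Copy-paste version: the same formula is repeated for every column,
# so any change has to be made in every copy
math_scaled <- (scores$math - min(scores$math)) / (max(scores$math) - min(scores$math))
ela_scaled  <- (scores$ela  - min(scores$ela))  / (max(scores$ela)  - min(scores$ela))

# Function version: the formula lives in one place
rescale01 <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

math_scaled_fn <- rescale01(scores$math)
ela_scaled_fn  <- rescale01(scores$ela)
```

If the rescaling rule changes later, only the body of rescale01 needs to be edited.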

There are three key steps to creating a new function:

  1. You need to pick a name for the function. It is recommended to use a verb as the name. For example, read_ny_data is a bit long, but it is readable.

  2. You list the inputs, or arguments, to the function inside function(). A function with one argument may look like function(x), whereas one with multiple arguments looks like function(x, y, z).

  3. You place the code you have developed in the body of the function, a { block that immediately follows function(…). We’ll see an example in a second.

Simple Function

add <- function(x) {
  sum(x)
}

best_pets <- function(animal) {
  if (animal == "Cats") {
    print("Cats are better than dogs.")
  } else if (animal == "Dogs") {
    print("Dogs are better than cats")
  } else {
    print("You must choose: Cats or Dogs")
  }
}
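Calling these functions might look like the sketch below. The definitions are repeated so the block runs on its own, and the example inputs are made up.

```r
add <- function(x) {
  sum(x)
}

best_pets <- function(animal) {
  if (animal == "Cats") {
    print("Cats are better than dogs.")
  } else if (animal == "Dogs") {
    print("Dogs are better than cats")
  } else {
    print("You must choose: Cats or Dogs")
  }
}

# add() takes a numeric vector and returns its sum
total <- add(c(1, 2, 3))

# print() also returns the message, so it can be stored if needed
choice <- best_pets("Cats")

# Anything other than "Cats" or "Dogs" falls through to the else branch
fallback <- best_pets("Hamsters")
```

Note that the comparison is case-sensitive, so best_pets("cats") also ends up in the else branch.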

Structure

  • function(args): tells R that you are writing a function, and that this code should not run until you call it
  • {}: the code of your function goes between the braces
  • foo <- function(args){}: name your function with the assignment operator
  • Reminder: name your function with a VERB that describes what your function accomplishes
  • e.g. clean_data(), read_data(), input_cohort(), …etc

Real world example

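library(readxl)   # read_excel()
library(janitor)  # remove_empty_cols(), remove_empty_rows(), clean_names()
library(dplyr)    # %>%

# Read one sheet of an Excel file as text, drop empty columns and rows, and tidy the column names
# (newer janitor versions offer remove_empty() for dropping empty rows and columns)
read_file <- function(file, sheet_name=1){
  read_excel(file, sheet = sheet_name, col_names = TRUE, col_types = "text") %>%
    remove_empty_cols() %>%
    remove_empty_rows() %>%
    clean_names()
}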

read_file("~/data.xlsx")

Why is this so useful?

Discuss!

Assignments within Functions

  • The body of the function is normal R code, but assignments made inside it stay inside the function
  • e.g. x <- 1 in a function does not change the value of x outside the function
  • The value of the last expression evaluated in a function is returned automatically, so you do not have to use return(x)

read_file <- function(file, sheet_name=1){
  read_excel(file, sheet = sheet_name, col_names = TRUE, col_types = "text") %>%
    remove_empty_cols() %>%
    remove_empty_rows() %>%
    clean_names()
}

read_file("~/data.xlsx")
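The points about assignments and return values can be checked with a tiny sketch. The function names here are hypothetical and only serve the illustration.

```r
x <- 1

set_x <- function() {
  x <- 100   # creates a local x; the x outside the function is untouched
  x * 2      # last expression evaluated, so this is the return value
}

# Same function with an explicit return(), which behaves identically
set_x_explicit <- function() {
  x <- 100
  return(x * 2)
}

result <- set_x()
result_explicit <- set_x_explicit()

# x outside the function still holds its original value
x_after <- x
```

Both versions return 200, and x_after is still 1.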

Arguments

  • Name the arguments something that makes sense to use throughout the function
  • Include as many arguments as you need
  • If you are passing a data frame into the function, it should be the first argument, so the function can be used with the pipe (%>%)
  • Example: Here our arguments are the file location and a sheet number or name.
read_file <- function(file, sheet_name=1){
  read_excel(file, sheet = sheet_name, col_names = TRUE, col_types = "text") %>%
    remove_empty_cols() %>%
    remove_empty_rows() %>%
    clean_names()
}

read_file("~/data.xlsx")

Discussion Questions

  • Why did I set sheet_name to be sheet_name=1? What does this allow for in my function?
  • What can’t I do with functions? Or at least, what’s super hard and annoying? (Hint: think dplyr.)
  • Where should you house common functions you may need all the time?

Using Custom Functions in DPLYR

Example student roster

Example with mutate

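include_cohort <- function(grade, school_year) {
  # as.numeric() handles grades stored as text with leading 0s, such as "09"
  years_to_grad <- 12 - as.numeric(grade)
  # take the last two characters of the school year ("2016-17" gives "17") and build the end year (2017)
  year <- stringr::str_sub(school_year, -2, -1) %>% paste0("20", .) %>% as.numeric()
  # end with the value itself so it is returned visibly
  year + years_to_grad
}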

students %>%
  mutate(cohort = include_cohort(grade, school_year)) %>% # run through all the students
  head()
## # A tibble: 6 x 6
##   student_id grade school_year es_grad backfilled cohort
##        <dbl> <dbl>       <chr>   <dbl>      <dbl>  <dbl>
## 1 4641219404     9     2016-17      NA          1   2020
## 2 5677497066     7     2015-16       1          0   2021
## 3 4410073751     4     2016-17      NA          0   2025
## 4 5165383282     6     2016-17       1          0   2023
## 5 7414861057     3     2016-17      NA          1   2026
## 6 8899701876    10     2016-17      NA          0   2019
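Before using a function like this inside mutate, it can help to call it directly on one or two values. The sketch below assumes the stringr and magrittr packages are installed. The function is repeated so the block runs on its own, with its last line written as a bare expression so the result prints at the console.

```r
library(magrittr)  # provides %>%; dplyr re-exports the same pipe

include_cohort <- function(grade, school_year) {
  years_to_grad <- 12 - as.numeric(grade)
  year <- stringr::str_sub(school_year, -2, -1) %>% paste0("20", .) %>% as.numeric()
  year + years_to_grad
}

# One student: grade stored as text with a leading zero
one_student <- include_cohort("09", "2016-17")

# Two students at once: the function is vectorised, which is why it works in mutate
two_students <- include_cohort(c(9, 7), c("2016-17", "2015-16"))
```

These calls give 2020 for the single student and 2020, 2021 for the pair, matching the first two rows of the output above.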

You try

Create a function that will change all the NAs in es_grad to 0.

  • What are you going to name your function?
  • What arguments do you need to pass in?
  • Are there other columns that may need to be worked on the same way?
  • Extra credit: Use a function to change all the 1’s to YES and the 0’s to NO in both es_grad and backfilled

Example with mutate_at

ignore_nas <- function(x){
  ifelse(is.na(x), 0, x)
}

class_yes_no <- function(x){
  x <- ignore_nas(x)
  ifelse(x == 1, "YES", "NO")
}

students %>% mutate_at(vars(es_grad, backfilled), class_yes_no)
## # A tibble: 12 x 5
##    student_id grade school_year es_grad backfilled
##         <dbl> <dbl>       <chr>   <chr>      <chr>
##  1 4641219404     9     2016-17      NO        YES
##  2 5677497066     7     2015-16     YES         NO
##  3 4410073751     4     2016-17      NO         NO
##  4 5165383282     6     2016-17     YES         NO
##  5 7414861057     3     2016-17      NO        YES
##  6 8899701876    10     2016-17      NO         NO
##  7 5814766659     5     2015-16      NO        YES
##  8 4523939000     4     2015-16      NO        YES
##  9 2852910440     9     2015-16     YES         NO
## 10 4237200058     6     2016-17     YES         NO
## 11 6288262677     4     2015-16      NO         NO
## 12 1511873670     3     2015-16      NO         NO
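The same two helpers can also be tried on a plain vector before they go into mutate_at. This sketch uses the first four es_grad values from the roster above and repeats the definitions so it runs on its own.

```r
ignore_nas <- function(x) {
  ifelse(is.na(x), 0, x)
}

class_yes_no <- function(x) {
  x <- ignore_nas(x)
  ifelse(x == 1, "YES", "NO")
}

# First four es_grad values from the roster
es_grad <- c(NA, 1, NA, 1)

filled <- ignore_nas(es_grad)    # NAs become 0
labels <- class_yes_no(es_grad)  # then 1 becomes "YES" and 0 becomes "NO"
```

Because class_yes_no calls ignore_nas first, an NA ends up as "NO" rather than staying NA.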

Break for Any Questions

Any questions so far?

Using Custom Functions with PURRR

purrr::map_x

Map_x functions:

  • map_df: outputs a data frame, usually by sticking multiple inputs together. THIS IS THE ONE YOU WILL USE
  • map_chr: outputs a character vector
  • map_int: outputs an integer vector
  • map: outputs a list. Do not use this without Erin’s help

Three ways to call map:

One Line:

  • Input 1: list or column vector of inputs
  • Input 2: function name
  • Input 3+: other arguments for the function

map_df(filepaths, read_excel, col_names = TRUE)

Custom function with one input

  • Input 1: list or column vector of inputs
  • Input 2: function name

combine_ny_data <- function(file){
  read_excel(file)
}

map_df(filepaths, combine_ny_data)

Write the function in map

  • Input 1: list or column vector of inputs
  • Input 2: code, beginning with ~ and using .x for the inputs

Note: strsplit is pretty annoying, but can be really useful for strings

# example: you have text such as UCHS_ELA09 and you want just UCHS
schools <- c("UCHS_ELA09", "UCC_ALG02")
map_chr(schools, ~strsplit(.x, "_")[[1]][1])
## [1] "UCHS" "UCC"
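To compare the three calling styles side by side, here is a sketch that applies each of them to the same school codes. It assumes the purrr package is installed, and get_prefix is a hypothetical helper written for this example.

```r
library(purrr)

schools <- c("UCHS_ELA09", "UCC_ALG02")

# Hypothetical helper: keep the part of a code before the separator
get_prefix <- function(code, sep = "_") {
  strsplit(code, sep, fixed = TRUE)[[1]][1]
}

# 1. One line: inputs, function name, then extra arguments for the function
style_one <- map_chr(schools, get_prefix, sep = "_")

# 2. Custom function with one input: inputs, then the function name
style_two <- map_chr(schools, get_prefix)

# 3. Function written inside map: code starts with ~ and uses .x for each input
style_three <- map_chr(schools, ~ strsplit(.x, "_")[[1]][1])
```

All three calls return the same character vector, "UCHS" "UCC", so the choice is mostly about readability.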

Comments

ALWAYS try your function on ONE thing first before using map.
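One way to follow this advice is sketched below: call the function on a single value, check the result, and only then hand it to map. It assumes the purrr package is installed, and start_year is a hypothetical helper invented for the example.

```r
library(purrr)

# Hypothetical helper: pull the starting year out of a school year such as "2016-17"
start_year <- function(school_year) {
  as.integer(substr(school_year, 1, 4))
}

# Step 1: try the function on ONE value and look at the result
one_result <- start_year("2016-17")

# Step 2: once that looks right, run it over all the values
school_years <- c("2016-17", "2015-16", "2016-17")
all_results <- map_int(school_years, start_year)
```

If step 1 gives something unexpected, it is much easier to debug there than inside a map call over many inputs.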

You Try

Task: Read in many files of the same format to one data frame.

  • You’re going to be using the NY prior year scores: Non-Uncommon Prior Year Results\~Data\Source\NYC
  • Get the list of files into R (hint: ?list.files)
  • Write a function that will read in a file and clean it.
    • only include grade 3-8 students
    • only include season == “SP16” where ss (scaled score) is given
  • Use map_df to read in ALL the files into one data frame
  • Extra credit: There are lots of duplicates! Try and get rid of them.