What would your definition be?
There are three key steps to creating a new function:
You need to pick a name for the function. It is recommended to use a verb as a name, e.g. read_ny_data: even though it’s a bit long, it’s readable.
You list the inputs, or arguments, to the function inside function(). A function with one argument may look like function(x), whereas one with multiple arguments looks like function(x, y, z).
You place the code you have developed in the body of the function, a { block that immediately follows function(…). We’ll see an example in a second.
add <- function(x) {
  sum(x)
}
best_pets <- function(animal) {
  if (animal == "Cats") {
    print("Cats are better than dogs.")
  } else if (animal == "Dogs") {
    print("Dogs are better than cats")
  } else {
    print("You must choose: Cats or Dogs")
  }
}
function(args): lets R know that you are writing a function, and not to run this section of code until you call it
{}: the code of your function should be between the {}s
foo <- function(args){}: name your function with the assignment operator
clean_data(), read_data(), input_cohort(), …etc
read_file <- function(file, sheet_name=1){
  read_excel(file, sheet = sheet_name, col_names = TRUE, col_types = "text") %>%
    remove_empty_cols() %>%
    remove_empty_rows() %>%
    clean_names()
}
read_file("~/data.xlsx")
Discuss!
x <- 1 in a function does not change the value of x outside the function
return(x)
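As a minimal sketch of the scoping point above (change_x is a hypothetical name used only for illustration), assigning to x inside a function creates a local copy, and only the returned value leaves the function:

```r
x <- 1

# Hypothetical helper: assigns to x inside the function body only
change_x <- function() {
  x <- 100    # creates a local x; the x outside is untouched
  return(x)   # hands the local value back to the caller
}

result <- change_x()
result  # 100
x       # still 1
```

To change x outside the function, you would assign the result back yourself, e.g. x <- change_x().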
read_file <- function(file, sheet_name=1){
  read_excel(file, sheet = sheet_name, col_names = TRUE, col_types = "text") %>%
    remove_empty_cols() %>%
    remove_empty_rows() %>%
    clean_names()
}
read_file("~/data.xlsx")
mutate
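include_cohort <- function(grade, school_year) {
  # as.numeric() guards against grades stored as text with leading 0s
  years_to_grad <- 12 - as.numeric(grade)
  # take the last two characters of the school year, e.g. "2016-17" becomes 2017
  year <- stringr::str_sub(school_year, -2, -1) %>% paste0("20", .) %>% as.numeric()
  # the last value is what the function returns: the graduation year
  year + years_to_grad
}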
students %>%
  mutate(cohort = include_cohort(grade, school_year)) %>% # run through all the students
  head()
## # A tibble: 6 x 6
## student_id grade school_year es_grad backfilled cohort
## <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
## 1 4641219404 9 2016-17 NA 1 2020
## 2 5677497066 7 2015-16 1 0 2021
## 3 4410073751 4 2016-17 NA 0 2025
## 4 5165383282 6 2016-17 1 0 2023
## 5 7414861057 3 2016-17 NA 1 2026
## 6 8899701876 10 2016-17 NA 0 2019
Create a function that will change all the NAs in es_grad to 0.
mutate_at
ignore_nas <- function(x){
  ifelse(is.na(x), 0, x)
}
class_yes_no <- function(x){
  x <- ignore_nas(x)
  ifelse(x == 1, "YES", "NO")
}
students %>% mutate_at(vars(es_grad, backfilled), class_yes_no)
## # A tibble: 12 x 5
## student_id grade school_year es_grad backfilled
## <dbl> <dbl> <chr> <chr> <chr>
## 1 4641219404 9 2016-17 NO YES
## 2 5677497066 7 2015-16 YES NO
## 3 4410073751 4 2016-17 NO NO
## 4 5165383282 6 2016-17 YES NO
## 5 7414861057 3 2016-17 NO YES
## 6 8899701876 10 2016-17 NO NO
## 7 5814766659 5 2015-16 NO YES
## 8 4523939000 4 2015-16 NO YES
## 9 2852910440 9 2015-16 YES NO
## 10 4237200058 6 2016-17 YES NO
## 11 6288262677 4 2015-16 NO NO
## 12 1511873670 3 2015-16 NO NO
purrr::map_x
Map_x functions:
map_df: outputs a data frame, usually by sticking multiple inputs together. THIS IS THE ONE YOU WILL USE
map_chr: outputs a character vector
map_int: outputs an integer vector
map: outputs a list. Do not use this without Erin’s help
Three ways to call map:
Input 1: list or column vector of inputs
Input 2: function name
Input 3+: other arguments for the function
map_df(filepaths, read_excel, col_names = TRUE)
Input 1: list or column vector of inputs
Input 2: function name
combine_ny_data <- function(file){
  read_excel(file)
}
map_df(filepaths, combine_ny_data)
Input 1: list or column vector of inputs
Input 2: code, beginning with ~ and including .x for the inputs
Note: strsplit is pretty annoying, but can be really useful for strings
# example: you have text such as UCHS_ELA09 and you want just UCHS
schools <- c("UCHS_ELA09", "UCC_ALG02")
map_chr(schools, ~strsplit(.x, "_")[[1]][1])
## [1] "UCHS" "UCC"
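To compare the three calling styles side by side, here is a small sketch that runs the same task each way. The input vector and the helper add_suffix are made up for illustration; the only package assumed is purrr.

```r
library(purrr)

codes <- c("UCHS", "UCC")  # made-up inputs for illustration

# Way 1: function name plus extra arguments passed along to it
way_1 <- map_chr(codes, paste0, "_ELA")

# Way 2: your own function (add_suffix is a hypothetical helper)
add_suffix <- function(code) {
  paste0(code, "_ELA")
}
way_2 <- map_chr(codes, add_suffix)

# Way 3: code beginning with ~, using .x for each input
way_3 <- map_chr(codes, ~ paste0(.x, "_ELA"))

# All three give the same character vector
identical(way_1, way_2)
identical(way_2, way_3)
```

Which one to use is mostly a readability choice: a named function is easier to test on its own, while the ~ form is handy for one-liners.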
Non-Uncommon Prior Year Results\~Data\Source\NYC (?list.files)
ss (scaled score) is given
Use map_df to read in ALL the files into one data frame
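One possible shape for this, as a rough sketch only: it builds a throwaway folder of two tiny CSV files so that it runs anywhere, whereas the real files are presumably read with read_excel as in the earlier slides. The folder name, file names, and scores are all invented, and map_df assumes both purrr and dplyr are installed.

```r
library(purrr)

# Assumption: a temporary folder with made-up CSV files stands in for the real data folder
data_dir <- file.path(tempdir(), "map_df_demo")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(student_id = 1:2, ss = c(300, 310)),
          file.path(data_dir, "school_a.csv"), row.names = FALSE)
write.csv(data.frame(student_id = 3:4, ss = c(295, 320)),
          file.path(data_dir, "school_b.csv"), row.names = FALSE)

# list.files() returns the file names; full.names = TRUE keeps the folder in each path
filepaths <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file and stack the results into one data frame
all_scores <- map_df(filepaths, read.csv)
nrow(all_scores)  # 4 rows: 2 from each file
```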
Comments
ALWAYS try your function on ONE thing first before using map.
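A short sketch of that habit, reusing the school-code example from earlier (get_school is a hypothetical helper name): check the function on a single input, then hand the whole vector to map.

```r
library(purrr)

# Hypothetical helper: keep the part of the code before the underscore
get_school <- function(code) {
  strsplit(code, "_")[[1]][1]
}

# Step 1: try it on ONE thing and check the result by eye
one_result <- get_school("UCHS_ELA09")
one_result  # "UCHS"

# Step 2: only then run it over everything
schools <- c("UCHS_ELA09", "UCC_ALG02")
all_results <- map_chr(schools, get_school)
all_results  # "UCHS" "UCC"
```

If step 1 fails, the error message points at one input instead of being buried somewhere inside the map call.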