Janitor

June 6, 2017

From the description file:

Janitor was built with beginning-to-intermediate R users in mind and is optimized for user-friendliness. Advanced users can already do everything covered here, but they can do it faster with janitor and save their thinking for more fun tasks.

The janitor functions expedite the initial data exploration and cleaning that comes with any new data set. This catalog describes the usage for each function.

You could do everything janitor does on your own, but there is rarely time to clean every new data set by hand.

Benefits of using janitor over writing your own code:

  • Functions are well tested
  • Cleaned column names follow Hadley Wickham’s tidyverse style guide (lowercase snake_case)
  • Generally turns many lines of code into one or two (huzzah!)
  • Pipe-able
  • Written for the education data space

Two main functions I use all the time:

  • clean_names()
  • get_dupes()
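
A quick sketch of what these two do, using a made-up data frame (the `roster` object and its values are hypothetical, not from the real roster file). `clean_names()` rewrites the headers and `get_dupes()` returns the rows that share a value in the columns you name:

```r
library(janitor)

# Hypothetical roster with messy headers and one repeated student ID
roster <- data.frame(
  `Student ID` = c(101, 102, 102, 103),
  `Last Name`  = c("Lee", "Diaz", "Diaz", "Okafor"),
  check.names  = FALSE,
  stringsAsFactors = FALSE
)

# Headers become lowercase snake_case
cleaned <- clean_names(roster)
names(cleaned)   # "student_id" "last_name"

# Rows whose student_id appears more than once, plus a dupe_count column
dupes <- get_dupes(cleaned, student_id)
dupes
```

Running `get_dupes()` right after `clean_names()` is a cheap way to catch duplicate IDs before any joins.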

Other really useful functions:

  • remove_empty_rows()
  • remove_empty_cols()
  • excel_numeric_to_date()
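
A small sketch of these helpers on made-up data (the `raw` data frame is hypothetical). One assumption to flag: it uses `remove_empty()`, which I believe replaced `remove_empty_rows()` and `remove_empty_cols()` in janitor 1.0 and later. On an older install, use the two functions listed above instead.

```r
library(janitor)

# Hypothetical extract: row 2 is entirely NA, and so is the notes column
raw <- data.frame(
  entrydate = c(40000, NA, 41103),
  notes     = c(NA, NA, NA),
  school    = c("A", NA, "B"),
  stringsAsFactors = FALSE
)

# Assumes janitor >= 1.0; drops all-NA rows and all-NA columns in one call
trimmed <- remove_empty(raw, which = c("rows", "cols"))

# Excel stores dates as day counts; convert them to Date values
trimmed$entrydate <- excel_numeric_to_date(trimmed$entrydate)
trimmed$entrydate   # "2009-07-06" "2012-07-13"
```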

Example

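library(readxl)   # read_excel()
library(dplyr)    # %>%, mutate_at(), vars()
library(janitor)  # clean_names(), remove_empty_*(), excel_numeric_to_date()

filepath <- "S:/Data Analytics/State Test Analysis/2016-2017/Uncommon Roster Prep/~Data/Source/Uncommon Roster 2016-17.xlsx"
read_excel(filepath, sheet = "Sheet1", col_types = "text") %>%
  clean_names() %>%         # lowercase snake_case column names
  remove_empty_cols() %>%   # drop columns that are entirely NA
  remove_empty_rows() %>%   # drop rows that are entirely NA
  # every column was read as text, so convert the numeric fields first
  mutate_at(vars(entrydate, exitdate, student_id, yearsinuncommon), as.numeric) %>%
  # then turn the Excel serial numbers into Date values
  mutate_at(vars(entrydate, exitdate), excel_numeric_to_date) %>%
  head()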
## # A tibble: 6 x 16
##      network school student_id    last_name first_name     grade gender
##        <chr>  <chr>      <dbl>        <chr>      <chr>     <chr>  <chr>
## 1 Collegiate    BEC  220405468       Abassy     Ernest 7th Grade      M
## 2 Collegiate    BEC  208846345 Abdus-Salaam     Saleem 8th Grade      M
## 3 Collegiate    BEC  219633948        Actie     Samach 7th Grade      M
## 4 Collegiate    BEC  242674893         Aguy    Kedrick 5th Grade      M
## 5 Collegiate    BEC  226778173       Alcide       Chaz 8th Grade      F
## 6 Collegiate    BEC  220835102 Alcindor Jr.      Erwin 7th Grade      M
## # ... with 9 more variables: ethnicity <chr>, lunch_status <chr>,
## #   iep_status <chr>, ell_flag <chr>, entrydate <date>, exitdate <date>,
## #   exit_explanation <chr>, yearsinuncommon <dbl>, student_count <chr>

Even more functions

  • tabyl()
  • adorn_totals("row")
  • crosstab()
  • adorn_crosstab()
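
A minimal sketch of the tabulation helpers on a made-up roster (the data frame and its values are hypothetical). As far as I know, `crosstab()` and `adorn_crosstab()` were superseded in later janitor releases by calling `tabyl()` with two variables plus the `adorn_*()` helpers, so the two-way line below assumes janitor 1.0 or later.

```r
library(janitor)

# Hypothetical roster; the values are invented for illustration
roster <- data.frame(
  grade  = c("5th", "7th", "7th", "8th", "8th", "8th"),
  gender = c("M", "F", "M", "M", "F", "F"),
  stringsAsFactors = FALSE
)

# One-way frequency table with columns grade, n, percent
by_grade <- tabyl(roster, grade)

# Append a "Total" row that sums the numeric columns
with_total <- adorn_totals(by_grade, "row")
with_total

# Two-way table of grade by gender (assumes janitor >= 1.0)
two_way <- tabyl(roster, grade, gender)
two_way
```

Because `tabyl()` returns a data frame, the result can be piped into further dplyr steps or written out like any other table.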

Activity: Find the user guide for Janitor.