Data IO

Understanding your Data

Before you try reading data into R, make sure you understand it first.

Do you have headers?
What is the separator?
What type of data do you have? Numeric? Integers? Dates? Ranges?
- readr usually does a good job of auto-detecting column formats but you still have to check!
Are there lines you need to skip?

Depending on the answers to some of these questions you may need to change the default behavior of the readr functions to help (i.e. different column types, separtors that are not a “,”, NA value that is not blank…etc)

In

Format	Functions
CSV	read.csv , `readr::read_csv`
Excel	`readxl::read_excel`
Database	`RODBC::sqlQuery` , `DBI::dbGetQuery`
SPSS / SAS / Stata	`haven::read_[prog]`

Out

Format	Functions
CSV	`write_csv`
Excel	`openxlsx::write.xlsx`
Database	`RODBC::sqlSave , DBI::dbWriteTable`
SPSS / SAS / Stata	`foreign::write.foreign`

Examples

All of the following examples use packages from the Tidyverse. I prefer readr and readxl to their basic R counterparts.

Read Data In:

read_csv() #readr

Write Data out:

write_csv() #readr

Excel Files:

This is a helper function I’ve written to make sure that the column names of my data frame are usable in the future code (i.e. so that column names don’t have spaces or punctuation.)

#readxl
read_excel(file, sheet = sheetname, na = "", col_names = TRUE, col_types="text") %>%
  clean_names()

Other file types

As well standard formats, there’s a lot of connector packages out there, including a suite for Hadoop.