Understanding your Data

Before you try reading data into R, make sure you understand it first.

  • Do you have headers?
  • What is the separator?
  • What type of data do you have? Numeric? Integers? Dates? Ranges?
    • readr usually does a good job of auto-detecting column formats but you still have to check!
  • Are there lines you need to skip?

Depending on the answers to some of these questions you may need to change the default behavior of the readr functions to help (i.e. different column types, separtors that are not a “,”, NA value that is not blank…etc)

In

Format Functions
CSV read.csv , readr::read_csv
Excel readxl::read_excel
Database RODBC::sqlQuery , DBI::dbGetQuery
SPSS / SAS / Stata haven::read_[prog]

Out

Format Functions
CSV write_csv
Excel openxlsx::write.xlsx
Database RODBC::sqlSave , DBI::dbWriteTable
SPSS / SAS / Stata foreign::write.foreign

Examples

All of the following examples use packages from the Tidyverse. I prefer readr and readxl to their basic R counterparts.

Read Data In:

read_csv() #readr

Write Data out:

write_csv() #readr

Excel Files:

This is a helper function I’ve written to make sure that the column names of my data frame are usable in the future code (i.e. so that column names don’t have spaces or punctuation.)

#readxl
read_excel(file, sheet = sheetname, na = "", col_names = TRUE, col_types="text") %>%
  clean_names()

Other file types

As well standard formats, there’s a lot of connector packages out there, including a suite for Hadoop.