Before you try reading data into R, make sure you understand it first.
Depending on the answers to some of these questions, you may need to change the default behavior of the readr functions (e.g. different column types, separators other than ",", an NA value that is not blank, etc.).
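As a rough illustration of overriding those defaults, here is a minimal sketch. The data, the column names, and the "N/A" marker are made up for the example, and it assumes readr 2.0 or later (for `I()` to pass literal text instead of a file path).

```r
library(readr)

# Hypothetical semicolon-delimited data where "N/A" marks a missing value.
# I() tells readr to treat the string as literal data rather than a file path.
raw <- I("id;score;joined\n001;4.5;2020-01-31\n002;N/A;2020-02-15\n")

dat <- read_delim(
  raw,
  delim = ";",                 # separator is not a comma
  na = c("", "N/A"),           # treat blanks and "N/A" as missing
  col_types = cols(
    id     = col_character(),  # keep leading zeros instead of parsing as a number
    score  = col_double(),
    joined = col_date(format = "%Y-%m-%d")
  )
)
```

Spelling out `col_types` means readr doesn't have to guess, which is usually what you want once you know what the data should look like.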
Format | Read functions |
---|---|
CSV | read.csv, readr::read_csv |
Excel | readxl::read_excel |
Database | RODBC::sqlQuery, DBI::dbGetQuery |
SPSS / SAS / Stata | haven::read_sav, haven::read_sas, haven::read_dta |

Format | Write functions |
---|---|
CSV | readr::write_csv |
Excel | openxlsx::write.xlsx |
Database | RODBC::sqlSave, DBI::dbWriteTable |
SPSS / SAS / Stata | foreign::write.foreign |
All of the following examples use packages from the Tidyverse. I prefer readr and readxl to their base R counterparts.
library(readr)
read_csv(file)        # read a CSV file into a tibble
write_csv(df, file)   # write a data frame out to a CSV file
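To show the two functions working together, here is a small round-trip sketch. The data frame and the temporary file are invented for the example.

```r
library(readr)

# Hypothetical example data and a temporary file path
df <- data.frame(id = c("a", "b"), value = c(1.5, NA))
path <- tempfile(fileext = ".csv")

# Write missing values as empty fields rather than the default "NA"
write_csv(df, path, na = "")

# Read it back; "cd" is shorthand for character, double
back <- read_csv(path, col_types = "cd")
```

Because `read_csv()` treats both empty fields and "NA" as missing by default, the blank written for the missing `value` comes back as `NA`.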
This is a helper function I've written to make sure that the column names of my data frame are usable in later code (i.e. so that column names don't contain spaces or punctuation).
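The helper's definition isn't shown here, so the following is only a hypothetical stand-in for what such a function might look like, using base R alone. The cleaning rules (lower-case, underscores, de-duplication) are my assumptions and not necessarily what the original does.

```r
# Hypothetical stand-in for the helper described above (not the original code)
clean_names <- function(df) {
  nm <- tolower(names(df))
  nm <- gsub("[^a-z0-9]+", "_", nm)        # runs of spaces/punctuation become "_"
  nm <- gsub("^_+|_+$", "", nm)            # trim leading and trailing "_"
  names(df) <- make.unique(nm, sep = "_")  # disambiguate duplicated names
  df
}

# Made-up data frame with awkward column names
messy <- data.frame(`First Name` = "Ada", `Score (%)` = 90, check.names = FALSE)
names(clean_names(messy))
# → "first_name" "score"
```

This sketch doesn't handle names that start with a digit or that end up empty. The janitor package also provides a `clean_names()` function that covers more edge cases, if you'd rather not maintain your own.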
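library(readxl)    # read_excel()
library(magrittr)  # %>%
# Read every column as text, then tidy the column names
read_excel(file, sheet = sheetname, na = "", col_names = TRUE, col_types = "text") %>%
  clean_names()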
As well as standard formats, there are a lot of connector packages out there, including a suite for Hadoop.