Unlike most other statistical software packages, R doesn't have a native data file format. You can certainly import and export data in any number of formats, but there's no native "R data file format". The closest equivalent is the saveRDS/readRDS function pair, which allows you to serialize an R object to a file and then load it back into a later R session. But these files don't hew to a standardized format (it's essentially a dump of R's in-memory representation of the object), and so you can't read the data with any software other than R.
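As a minimal sketch of that round trip, using base R's saveRDS() and readRDS() (the temporary file path is chosen only for illustration):

```r
# Serialize a data frame to a temporary .rds file, then restore it.
rds_path <- tempfile(fileext = ".rds")
saveRDS(mtcars, rds_path)
restored <- readRDS(rds_path)

# Check that the restored object matches the original, attributes included.
same_object <- identical(restored, mtcars)
print(same_object)
```

The file written here is only meaningful to R, which is the limitation feather sets out to address.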
The goal of the feather project, a collaboration of Wes McKinney and Hadley Wickham, is to create a standard data file format that can be used for data exchange by and between R, Python, and any other software that implements its open-source format. Data are stored in a computer-native binary format, which makes the files small (a 10-digit integer takes just 4 bytes, instead of the 10 ASCII characters required by a CSV file), and fast to read and write (no need to convert numbers to text and back again). Another reason why feather is fast is that it's a column-oriented file format, which matches R's internal representation of data. (In fact, feather is based on the Apache Arrow framework for working with columnar data stores.) When reading or writing traditional data files, R must spend significant time translating the data from column format to row format and back again; with feather the entire second step in the process below is eliminated.
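The size arithmetic and the column-oriented layout described above can be checked in base R. This is only an illustrative sketch: the variable names are made up for the example, and it inspects R's own in-memory objects, not a feather file.

```r
# A 10-digit value that still fits in a 32-bit integer.
x <- 1234567890L

# Bytes used by the native binary form versus characters needed as text.
binary_bytes <- length(writeBin(x, raw()))
text_chars   <- nchar(as.character(x))

# A data frame is a list with one vector per column,
# which is why a columnar file format maps onto it directly.
column_oriented <- is.list(mtcars) && length(mtcars) == ncol(mtcars)

print(c(binary_bytes = binary_bytes, text_chars = text_chars))
print(column_oriented)
```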
For users of R 3.3.0 and later, the feather package is now available on CRAN. (Users of older versions of R can install feather from GitHub.) With feather installed, you can read and write R data frames to feather files using simple functions:
library(feather)
write_feather(mtcars, "mtcars.feather")
mtcars2 <- read_feather("mtcars.feather")
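To confirm that the values survive the trip through a feather file, a check along these lines may help. It is a sketch under two assumptions not stated in the post: that read_feather() returns a tibble, and that row names are not stored in the file, so the comparison looks at column values only. The block skips itself if feather is not installed.

```r
if (requireNamespace("feather", quietly = TRUE)) {
  path <- tempfile(fileext = ".feather")
  feather::write_feather(mtcars, path)
  mtcars2 <- feather::read_feather(path)

  # Compare values only; attributes such as row names are ignored
  # because they are assumed not to survive the round trip.
  round_trip_ok <- isTRUE(all.equal(as.data.frame(mtcars2), mtcars,
                                    check.attributes = FALSE))
  print(round_trip_ok)
} else {
  message("feather is not installed; skipping the round-trip check")
}
```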
Better yet, the mtcars.feather file can easily be read into Python, using its feather-format package. This example uses the small built-in mtcars data frame, but you should see a significant performance improvement when working with larger data. Eduardo Ariño de la Rubia performed some benchmarking of feather, and found it to be significantly faster for ingesting data than other popular R functions. The chart below compares using feather, the data.table package, and readRDS to import a 508 MB file of 8.5 million rows and 7 columns:
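To try a comparison of this kind on your own machine, a rough sketch follows. It is not the benchmark referred to above: the data are randomly generated, the row count is far smaller, base R's read.csv() stands in as the text-format baseline, and the feather timing is included only if the package is installed. Timings will vary by machine, so no expected numbers are shown.

```r
set.seed(1)
n <- 1e5  # much smaller than the 8.5 million rows in the benchmark
df <- data.frame(id = seq_len(n), x = rnorm(n), y = runif(n))

# Write the same data in a text format and in R's serialized format.
csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")
write.csv(df, csv_path, row.names = FALSE)
saveRDS(df, rds_path)

# Elapsed seconds to read each file back.
timings <- c(
  read.csv = system.time(read.csv(csv_path))[["elapsed"]],
  readRDS  = system.time(readRDS(rds_path))[["elapsed"]]
)

# Add feather to the comparison when it is available.
if (requireNamespace("feather", quietly = TRUE)) {
  feather_path <- tempfile(fileext = ".feather")
  feather::write_feather(df, feather_path)
  timings["read_feather"] <-
    system.time(feather::read_feather(feather_path))[["elapsed"]]
}

print(timings)
```

A single system.time() call is a coarse measurement; repeating each read several times and taking the median gives steadier numbers.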
Feather wasn't the fastest function benchmarked for writing data (data.table's fwrite function generally performed a bit better), but given that you typically read a file more often than you write it, the speedups should be very noticeable in day-to-day data science activities.
For more on the feather package, check out its announcement from the RStudio blog linked below.
RStudio blog: Feather: A Fast On-Disk Format for Data Frames for R and Python, powered by Apache Arrow
I tried to install the package on my 3.2.2 R version but there is a message that:
" Package which is only available in source form, and may need compilation of
C/C++/Fortran: ‘feather’
These will not be installed"
I tried downloading the tar.gz package from CRAN but also there was a problem, and exit with status 1...
Walter
Posted by: Walter Humberto Subiza Pina | May 24, 2016 at 04:43
New R users may be confused by the statement that "R doesn't have a native data file format." The key word here is data, not native. The RData (RDA) and RDS formats are native file formats in R. But they are not data file formats because they store R object serializations, not generic data structures.
Posted by: Ian Cook | May 24, 2016 at 06:22
@Ian Good point, that's exactly what I was trying to convey. Thanks for clarifying.
Posted by: David Smith | May 24, 2016 at 08:43
@Walter, feather is tricky to install from source, and it's only available in binary form for R 3.3.0 and later (I held this post until it was available on CRAN for that reason). Unless you're experienced with C++11 toolchains, I recommend you use feather with R 3.3.0 and install from CRAN.
In fact, it's such a recent package that it may not even be compatible with R 3.2.2. (Anyone tried it?)
Posted by: David Smith | May 24, 2016 at 08:46
@David, with R 3.2.3 I was given a "compilation fail" error when I tried installing feather using both install.packages() and devtools::install_github(). Installed R 3.3.0 and it successfully installed from CRAN.
Posted by: Luke Smith | May 24, 2016 at 11:08