Hadley Wickham, co-author (with Garrett Grolemund) of R for Data Science and RStudio's Chief Scientist, has focused much of his R package development on the un-sexy but critically important part of the data science process: data management. In the Tidy Tools Manifesto, he proposes four basic principles for any computer interface for handling data:
Reuse existing data structures.
Compose simple functions with the pipe.
Embrace functional programming.
Design for humans.
Those principles are realized in a new collection of his R packages: the tidyverse. Now, with a simple call to library(tidyverse) (after installing the package from CRAN), you can load into your R session a suite of tools that make managing data easier, including dplyr (data manipulation), tidyr (data tidying), readr (data import), and tibble (a modern take on data frames).
The tidyverse also loads purrr, for functional programming with data, and ggplot2, for data visualization using the grammar of graphics.
Installing the tidyverse package also installs for you (but doesn't automatically load) a raft of other packages to help you work with dates/times, strings, factors (with the new forcats package), and statistical models. It also provides various packages for connecting to remote data sources and data file formats.
Simply put, tidyverse puts a complete suite of modern data-handling tools into your R session, and provides an essential toolbox for any data scientist using R. (Also, it's a lot easier to simply add library(tidyverse) to the top of your script rather than the dozen or so library(...) calls previously required!) Hadley regularly updates these packages, and you can easily update them in your R installation using the provided tidyverse_update() function.
For more on tidyverse, check out Hadley's post on the RStudio blog, linked below.
RStudio Blog: tidyverse 1.0.0
Take a satellite image, and extract the pixels into a uniform 3-D color space. Then run a clustering algorithm on those pixels to extract a number of clusters. The centroids of those clusters then make a representative palette of the image. Here's the palette of Chicago:
The R package earthtones by Will Cornwell, Mitch Lyons, and Nick Murray — now available on CRAN — does all this for you. Pass the get_earthtones function a latitude and longitude, and it will grab the Google Earth tile at the requested zoom level (8 works well for cities) and generate a palette with the desired number of colors. This Shiny app by Homer Strong uses the earthtones package to make the process even easier: it grabs your current location for the first palette, or you can pass in an address and it geolocates it for another. That's what I used to create the image above. (Another Shiny app by Andrew Clark shows the size of the clusters as a bar chart, but I prefer the simple palettes.) There are a few more examples below, and you can see more in the earthtones vignette. If you find more interesting palettes, let us know where in the world you found them in the comments.
Will Cornwell (github): earthtones
If you have dense data on a continuous scale, an effective way of representing the data visually is to use a heatmap, where the values are represented by a color on a continuous scale. For example, this chart from a Wall Street Journal interactive feature (and mentioned in Tal Galili's useR!2016 talk) represents the number of measles cases in each US state and year by a colored square:
(Here's how to create that chart in R.) But note the scale at the bottom of the chart, mapping measles cases to a color on the rainbow. Here, we'll zoom in on it:
The scale you choose for a heat map is very important, and has a major impact on how the viewer will interpret the data presented. This scale has been chosen with care: while most of the scale is red, very few of the data cells are red (because the distribution of measles cases is skewed, thanks in particular to the introduction of a vaccine in 1964). A naively chosen scale would wash out the data.
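To see the point in miniature, here is an illustrative sketch with simulated skewed data (not the measles chart itself): with right-skewed values, equal-width color breaks put nearly every cell in the lowest band, while quantile-based breaks spread the cells evenly across the palette.

```r
set.seed(1)
x <- matrix(rexp(100), 10, 10)        # right-skewed "case count" values
breaks_equal    <- seq(min(x), max(x), length.out = 11)
breaks_quantile <- quantile(x, probs = seq(0, 1, by = 0.1))
# How many cells land in each of the 10 color bands?
table(cut(x, breaks_equal,    include.lowest = TRUE))  # piled into band 1
table(cut(x, breaks_quantile, include.lowest = TRUE))  # 10 cells per band
image(x, breaks = breaks_quantile, col = heat.colors(10))
```

With the equal-width breaks most of the palette would go unused, washing out the variation among the low values where the data actually live.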
The actual colors you choose are important too. The physics, technology, and neuroscience behind the interpretation of colors is surprisingly complex, but this talk on the default color schemes used in Python's matplotlib does a great job of explaining them:
You can easily use the viridis color scales in R as well, thanks to the viridis package by Simon Garnier, which is available on CRAN. The package provides the viridis, magma, plasma, and inferno color scales, all carefully chosen for perceptual uniformity and usefulness for color-impaired viewers.
You can find several examples of using the viridis color palettes in the package vignette, both for base R graphics (including raster) and ggplot2. To get started, just run install.packages("viridis") to install the package from CRAN.
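As a minimal base-graphics sketch (assuming the viridis package is installed from CRAN), the palette functions each return a vector of hex colors you can drop into any plotting call:

```r
library(viridis)

# viridis(n) returns n hex color codes along the perceptually uniform scale;
# magma(), plasma(), and inferno() work the same way
pal <- viridis(100)
head(pal, 3)

# Use it anywhere R accepts a color vector, e.g. a base-graphics heatmap
image(volcano, col = pal)
```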
Github (Simon Garnier): viridis
You download the data and complete your analysis with ample time to spare. Then, just before deadline, your collaborator lets you know that they've "fixed a data error". Now, you have to do your analysis all over again. This is the reproducibility horror story:
R provides many tools to make reproducibility easy, and the creators of the above video, Ecoinformática - AEET, provide a useful list of tutorials and guides. Chief amongst these is using the knitr package for R: the R language automates the process of importing, preparing and analyzing the data, while knitr automates the process of assembling text, code, tables and charts into a Word, PDF, HTML and many other document formats.
But while knitr solves a good chunk[*] of the reproducibility problem, there's one complicating factor it doesn't deal with: updated R packages. In the same way that a collaborator updating the data triggers a restart, someone updating an R package your script uses can also affect your results. (That someone was likely you, working on a different R project.) The checkpoint package for R solves that problem by letting you "lock in" the package versions you use with a project. It's easy to use: all you need to do is add a line like checkpoint("2016-08-31") to the beginning of your script, which:
scans your project for the packages it uses;
installs those packages as they existed on CRAN on the given date, using a daily snapshot; and
makes your script use those versions, kept in a project-specific library.
It does some clever things to avoid re-downloading packages if it doesn't need to, and to avoid keeping multiple copies of the same package version, but that's the basic gist. Checkpoint also makes it really easy to share code with others, because you can be confident they'll also get the packages they need to make your script work. You can learn more about the checkpoint package here and in this vignette, and just install it from CRAN to get started. (If you use Microsoft R Open, you don't even need to download it; it's already included.)
[*] pun intended
R has some good tools for importing data from spreadsheets, among them the readxl package for Excel and the googlesheets package for Google Sheets. But these only work well when the data in the spreadsheet are arranged as a rectangular table, and not overly encumbered with formatting or generated with formulas. As Jenny Bryan pointed out in her recent talk at the useR!2016 conference (embedded below, or download PDF slides here), in practice few spreadsheets have "a clean little rectangle of data in the upper-left corner", because most people use spreadsheets not just as a file format for data retrieval, but also as a reporting/visualization/analysis tool.
Nonetheless, for a practicing data scientist, there's a lot of useful data locked up in these messy spreadsheets that needs to be imported into R before we can begin analysis. As just one example given by Jenny in her talk, this spreadsheet was included as one of 15,000 spreadsheet attachments (one with 175 tabs!) in the Enron Corpus.
To make it easier to import data into R from messy spreadsheets like this, Jenny and co-author Richard G. FitzJohn created the jailbreakr package. The package is in its early stages, but it can already import Excel (xlsx format) and Google Sheets into R as new "linen" objects from which small sub-tables can easily be extracted as data frames. It can also print spreadsheets in a condensed text-based format with one character per cell — useful if you're trying to figure out why an apparently simple spreadsheet isn't importing as you expect. (Check out the "weekend getaway winner" story near the end of Jenny's talk for a great example.)
The jailbreakr package isn't yet on CRAN, but if you want to try it out you can download it from the Github repository (or even contribute!) at the link below.
Github (rsheets): jailbreakr
by Joseph Rickert
Data Science is all about getting access to interesting data, and it is really nice when some kind soul not only points out an interesting data set but also makes it easy for you to access it. Below is a list of 17 R packages that appeared on CRAN between May 1st and August 8th that, in one way or another, provide access to publicly available data.
bigQueryR: Provides an interface to Google's BigQuery. The vignette shows how to use it.
blscrapeR: Provides an API wrapper for Bureau of Labor Statistics data sets. There is a vignette showing how to access inflation and price data, one for accessing Wages and Benefits data, and one for mapping BLS data.
cdlTools: Provides functions to download USDA National Agricultural Statistics Service (NASS) cropscape data for a specified state.
dataone: The dataone R package enables R scripts to search, download and upload science data and metadata from/to the DataONE Federation. The website describes DataOne as "a community driven project providing access to data across multiple member repositories, supporting enhanced search and discovery of Earth and environmental data". The package comes with several vignettes including this overview.
dataRetrieval: Package to retrieve USGS and EPA hydrologic and water quality data, officially supported by USGS. The vignette gives several examples of downloading interesting data sets.
eechidna: Provides the data from the 2013 Australian Federal Election and tools to analyze it. There are several nicely done vignettes. The following plot, which shows election results by polling place, comes from the vignette on plotting polling stations.
There are also vignettes on census and election data, shapefiles, and mapping Australia's electorates. The following plot, showing unemployment data by state, comes from the vignette on census data.
getHFdata: Provides functions to download and aggregate high-frequency trading data for Brazilian instruments directly from the Bovespa FTP site. There is a vignette to get you started.
googleAnalyticsR: Provides an interface to the Google Analytics Reporting API. There is a vignette.
googleway: Provides functions to retrieve data from 6 Google Maps APIs. The vignette shows how.
gutenbergr: Search and download public domain works from the Project Gutenberg collection. The vignette shows you how to search and download texts.
ie2miscdata: Contains a collection of USGS environmental and water resources data sets. There is a vignette showing how to create plots from the data. (See also: dataRetrieval.)
macleish: Provides functions to access data from the Ada & Archibald MacLeish Field Station in Whately, MA. The vignette shows how to obtain weather data.
muckrock: Contains public domain information on requests made through MuckRock under the US Freedom of Information Act.
nasadata: Provides an interface to NASA's Earth Imagery and Assets API and the Earth Observatory Natural Event Tracker (EONET).
oec: Provides an interface to the Observatory of Economic Complexity.
osi: Provides a connector to the Open Source Initiative API, which provides machine-readable data about open source software licenses.
pewdata: Provides for reproducible, programmatic retrieval of survey data sets from the Pew Research Center. The vignette shows how to set up and use the package. Look here for an interesting poll about what Americans know about science.
TCGAretriever: Provides an interface to data sets from The Cancer Genome Atlas (TCGA) via the Cancer Genomic Data Server web service.
For more packages that provide APIs to data sets, have a look at the CRAN Task View on Web Technologies and Services. For a list of interesting data sets out in the wild, see the MRAN Data Sources page.
[Update: added the dataRetrieval package, at the suggestion of Laura DeCicco.]
Editor's note: This is Joe's last post to Revolutions as a member of the Microsoft team: he is heading on for further adventures in the world of R. We want to thank Joe for his many contributions to the blog over the past 6 years, and please join us in wishing him well!
Hadley Wickham's dplyr package is an amazing tool for restructuring, filtering, and aggregating data sets using its elegant grammar of data manipulation. By default, it works on in-memory data frames, which means you're limited to the amount of data you can fit into R's memory. Hadley also provided an extension mechanism to make dplyr work with external data sources, and so Hong Ooi created the dplyrXdf package to work with Xdf data files. With dplyrXdf you can manipulate data files of virtually unlimited size using R, and even use the pipe operator %>% from the magrittr package.
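For comparison, here is that grammar applied to an ordinary in-memory data frame with dplyr itself (a generic sketch, assuming dplyr is installed; the pipeline is illustrative, not taken from dplyrXdf's documentation). dplyrXdf supplies the same verbs for out-of-memory Xdf files:

```r
library(dplyr)

# filter, group_by, and summarise composed with the pipe
res <- mtcars %>%
  filter(cyl == 6) %>%                  # keep the 6-cylinder cars
  group_by(gear) %>%                    # one group per number of gears
  summarise(mean_mpg = mean(mpg)) %>%
  arrange(gear)
res
```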
To use the dplyrXdf package, you will need to use Microsoft R Client (free download for Windows) or Microsoft R Server (on Windows, Linux, Hadoop or HDInsight with Spark). The Xdf files you create can then be used with the big-data functions of the included ScaleR package, enabling you to use R to perform statistical analysis of files hundreds of gigabytes in size.
To help you get started with the dplyrXdf package, Hong has created a new dplyrXdf cheat sheet (PDF). This handy, printable 2-page document explains how dplyrXdf works.
It also includes some extended examples of working with big data with dplyrXdf and analyzing them with the ScaleR package. To download the cheat-sheet, click on the link below.
Microsoft Advanced Analytics: dplyrXdf cheat sheet
by Joseph Rickert
My guess is that a good many statistics students first encounter the bivariate Normal distribution as one or two hastily covered pages in an introductory textbook, and then don't think much about it again until someone asks them to generate two random variables with a given correlation structure. Fortunately for R users, a little searching on the internet will turn up several nice tutorials with R code explaining various aspects of the bivariate Normal. For this post, I have gathered together a few examples and tweaked the code a little to make comparisons easier.
Here are five different ways to simulate random samples from a bivariate Normal distribution with a given mean and covariance matrix.
To set up for the simulations, this first block of code defines N (the number of random samples to simulate), the means of the random variables, and the covariance matrix. It also provides a small function for drawing confidence ellipses on the simulated data.
library(mixtools) #for ellipse
N <- 100 # Number of random samples per method
set.seed(123)
# Target parameters for univariate normal distributions
rho <- -0.6
mu1 <- 1; s1 <- 2
mu2 <- 1; s2 <- 8
# Parameters for bivariate normal distribution
mu <- c(mu1,mu2) # Mean
sigma <- matrix(c(s1^2, s1*s2*rho, s1*s2*rho, s2^2),
2) # Covariance matrix
# Function to draw ellipse for bivariate normal data
ellipse_bvn <- function(bvn, alpha){
Xbar <- apply(bvn,2,mean)
S <- cov(bvn)
ellipse(Xbar, S, alpha = alpha, col="red")
}
The first method, the way to go if you just want to get on with it, is to use the mvrnorm() function from the MASS package.
library(MASS)
bvn1 <- mvrnorm(N, mu = mu, Sigma = sigma ) # from MASS package
colnames(bvn1) <- c("bvn1_X1","bvn1_X2")
It takes so little code to do the simulation that it might be possible to tweet a homework solution.
A look at the source code for mvrnorm() shows that it uses eigenvectors to generate the random samples. The documentation for the function states that this method was selected because it is more stable than the alternative of using a Cholesky decomposition, which might be faster.
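As a sketch of that idea (simplified; mvrnorm()'s actual implementation handles tolerances and dimensions more carefully), the eigen decomposition of sigma yields a symmetric matrix A whose square is sigma, and A can then play the same role as the Cholesky factor used in the next method:

```r
# Symmetric square root of sigma via its eigen decomposition
sigma <- matrix(c(4, -9.6, -9.6, 64), 2)   # the same sigma as defined above
ev <- eigen(sigma, symmetric = TRUE)
A <- ev$vectors %*% diag(sqrt(pmax(ev$values, 0))) %*% t(ev$vectors)
all.equal(A %*% A, sigma)                  # A %*% A recovers sigma
```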
For the second method, let's directly generate bivariate Normal random variates with the Cholesky decomposition. Remember that the Cholesky decomposition of sigma (a positive definite matrix) yields a matrix M such that M times its transpose gives sigma back again. Multiplying M by a matrix of standard Normal random variates and adding the desired mean gives a matrix of the desired random samples. A lecture from Colin Rundel covers some of the theory.
M <- t(chol(sigma))
# M %*% t(M)
Z <- matrix(rnorm(2*N),2,N) # 2 rows, N columns
bvn2 <- t(M %*% Z) + matrix(rep(mu,N), byrow=TRUE,ncol=2)
colnames(bvn2) <- c("bvn2_X1","bvn2_X2")
For the third method we make use of a special property of the bivariate Normal that is discussed in almost all of those elementary textbooks. If X1 and X2 are jointly bivariate Normal random variables, then the conditional distribution of X2 given X1 is itself Normal, with mean mu2 + rho * (s2/s1) * (X1 - mu1) and variance (1 - rho^2) * s2^2.
Hence, a sample from a bivariate Normal distribution can be simulated by first simulating a point from the marginal distribution of one of the random variables and then simulating from the second random variable conditioned on the first. A brief proof of the underlying theorem is available here.
rbvn <- function(n, m1, s1, m2, s2, rho)
{
  # Simulate X1 from its marginal, then X2 from its conditional given X1
  X1 <- rnorm(n, m1, s1)
  X2 <- rnorm(n, m2 + (s2/s1) * rho *
                (X1 - m1), sqrt((1 - rho^2)*s2^2))
  cbind(X1, X2)
}
bvn3 <- rbvn(N,mu1,s1,mu2,s2,rho)
colnames(bvn3) <- c("bvn3_X1","bvn3_X2")
The fourth method, my favorite, comes from Professor Darren Wilkinson's Gibbs sampler tutorial. This is a very nice idea: using the familiar bivariate Normal distribution to illustrate the basics of the Gibbs sampling algorithm. Note that this looks very much like the previous method, except that now we are alternately sampling from the full conditional distributions.
gibbs<-function (n, mu1, s1, mu2, s2, rho)
{
mat <- matrix(ncol = 2, nrow = n)
x <- 0
y <- 0
mat[1, ] <- c(x, y)
for (i in 2:n) {
x <- rnorm(1, mu1 +
(s1/s2) * rho * (y - mu2), sqrt((1 - rho^2)*s1^2))
y <- rnorm(1, mu2 +
(s2/s1) * rho * (x - mu1), sqrt((1 - rho^2)*s2^2))
mat[i, ] <- c(x, y)
}
mat
}
bvn4 <- gibbs(N,mu1,s1,mu2,s2,rho)
colnames(bvn4) <- c("bvn4_X1","bvn4_X2")
The fifth and final way uses the rmvnorm() function from the mvtnorm package with the singular value decomposition method selected. The functions in this package are overkill for what we are doing here, but mvtnorm is probably the package you would want to use if you are calculating probabilities from high-dimensional multivariate distributions. It implements numerical methods for carefully calculating the high-dimensional integrals involved, based on papers by Professor Alan Genz dating from the early '90s. These methods are briefly explained in the package vignette.
library (mvtnorm)
bvn5 <- mvtnorm::rmvnorm(N,mu,sigma, method="svd")
colnames(bvn5) <- c("bvn5_X1","bvn5_X2")
Note that I have used the :: operator here to make sure that R uses the rmvnorm() function from the mvtnorm package. There is also an rmvnorm() function in the mixtools package, which I loaded to get the ellipse function. Loading the packages in the wrong order could lead to the rookie mistake of having the function you want inadvertently masked.
Next, we plot the results of drawing just 100 random samples for each method. This allows us to see how the algorithms spread data over the sample space as they are just getting started.
bvn <- list(bvn1,bvn2,bvn3,bvn4,bvn5)
par(mfrow=c(3,2))
plot(bvn1, xlab="X1",ylab="X2",main= "All Samples")
for(i in 2:5){
points(bvn[[i]],col=i)
}
for(i in 1:5){
item <- paste("bvn",i,sep="")
plot(bvn[[i]],xlab="X1",ylab="X2",main=item, col=i)
ellipse_bvn(bvn[[i]],.5)
ellipse_bvn(bvn[[i]],.05)
}
par(mfrow=c(1,1))
The first plot shows all 500 random samples, color coded by the method with which they were generated. The remaining plots show the samples generated by each method. In each of these plots the ellipses mark the 0.5 and 0.95 probability regions, i.e. the areas within the ellipses should contain 50% and 95% of the points respectively. Note that bvn4, which uses the Gibbs sampling algorithm, looks like all of the rest. In most use cases for the Gibbs sampler it takes the algorithm some time to converge to the target distribution. In our case, we start out with a pretty good guess.
Finally, a word about accuracy: nice coverage of the sample space is not sufficient to produce accurate results. A little experimentation will show that, for all of the methods outlined above, regularly achieving a sample covariance matrix that is close to the target, sigma, requires something on the order of 10,000 samples, as illustrated below.
> sigma
[,1] [,2]
[1,] 4.0 -9.6
[2,] -9.6 64.0
for(i in 1:5){
print(round(cov(bvn[[i]]),1))
}
bvn1_X1 bvn1_X2
bvn1_X1 4.0 -9.5
bvn1_X2 -9.5 63.8
bvn2_X1 bvn2_X2
bvn2_X1 3.9 -9.5
bvn2_X2 -9.5 64.5
bvn3_X1 bvn3_X2
bvn3_X1 4.1 -9.8
bvn3_X2 -9.8 63.7
bvn4_X1 bvn4_X2
bvn4_X1 4.0 -9.7
bvn4_X2 -9.7 64.6
bvn5_X1 bvn5_X2
bvn5_X1 4.0 -9.6
bvn5_X2 -9.6 65.3
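As a quick check of that claim, here is a sketch that re-runs the first method with 10,000 samples (the seed and sample count are arbitrary); the sample covariance now lands within rounding distance of the target:

```r
library(MASS)

set.seed(123)
sigma <- matrix(c(4, -9.6, -9.6, 64), 2)   # the target covariance from above
big <- mvrnorm(10000, mu = c(1, 1), Sigma = sigma)
round(cov(big), 1)                         # close to sigma
```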
Many people coming to R for the first time find it disconcerting to realize that there are several ways to do even fundamental calculations in R. My take is that rather than being a point of frustration, having multiple options indicates the richness of the R language. A close look at the package documentation will often show that yet another method to do something is a response to some subtle need that was not previously addressed. Enjoy the diversity!
by Joseph Rickert
My impression is that the JSM has become ever more R-friendly in recent years. With two sessions organized around R tools and several talks featuring R packages, this year may turn out to be the beginning of a new era in which conference organizers see value in putting R on the agenda, and prospective speakers perceive it to be advantageous to mention R, an R package, or a Shiny app in their abstracts.
As should be expected, the vast majority of the presentations will focus on statistics or the application of statistical methods, and not on the underlying computational platform. Nevertheless, based on past experience I would be very surprised if there is not quite a bit more R talk buzzing around the conference.
If you are going to Chicago please stop by the Microsoft booth 232. We would be happy to tell you how we are using R at Microsoft and even more interested in hearing your opinion about what Microsoft should be doing with R. Also look for us at the opening night mixer (Sunday 6 - 8PM in the Expo Hall) and the Student Mixer (Monday 6 - 7:30PM in the Chicago Hilton Hotel).
Here follows my R Users Guide to JSM 2016. I have organized the talks by session number and included information on times and room numbers.
Session 21 Statistical Computing and Graphics Student Awards – Contributed papers
Sun, 7/31/2016, 2:00 PM - 3:50 PM – Room: CCW175b
2:25 PM |
The PICASSO Package for High-Dimensional Nonconvex Sparse Learning in R — Xingguo Li ; Tuo Zhao, The Johns Hopkins University ; Tong Zhang, Rutgers University ; Han Liu, Princeton |
3:05 PM |
Using the Geomnet Package: Visualizing African Slave Trade, 1514-1866 — Samantha Tyner, Iowa State University |
3:25 PM |
Xgboost: An R Package for Fast and Accurate Gradient Boosting — Tong He, Simon Fraser University |
Session: 47 Making the Most of R Tools - Invited papers
Sun, 7/31/2016, 4:00 PM - 5:50 PM – Room: CC-W183b
4:05 PM |
Thinking with Data Using R and RStudio: Powerful Idioms for Analysts — Nicholas Jon Horton, Amherst College; Randall Pruim, Calvin College ; Daniel Kaplan, Macalester College |
4:35 PM |
Transform Your Workflow and Deliverables with Shiny and R Markdown — Garrett Grolemund, RStudio |
Session 127: R Tools for Statistical Computing – Contributed papers
Mon, 8/1/2016, 8:30 AM - 10:20 AM – Room: CC-W196c
8:35 AM |
The Biglasso Package: Extending Lasso Model Fitting to Big Data in R — Yaohui Zeng, University of Iowa ; Patrick Breheny, University of Iowa |
8:50 AM |
Independent Sampling for a Spatial Model with Incomplete Data — Harsimran Somal, University of Iowa ; Mary Kathryn Cowles, University of Iowa |
9:05 AM |
Introduction to the TextmineR Package for R — Thomas Jones, Impact Research |
9:20 AM |
Vector-Generalized Time Series Models — Victor Miranda Soberanis, University of Auckland ; Thomas Yee, University of Auckland |
9:35 AM |
New Computational Approaches to Large/Complex Mixed Effects Models — Norman Matloff, University of California at Davis |
9:50 AM |
Broom: An R Package for Converting Statistical Modeling Objects Into Tidy Data Frames — David G. Robinson, Stack Overflow |
10:05 AM |
Exact Parametric and Nonparametric Likelihood-Ratio Tests for Two-Sample Comparisons — Yang Zhao, SUNY Buffalo ; Albert Vexler, SUNY Buffalo ; Alan Hutson, SUNY Buffalo ; Xiwei Chen, SUNY Buffalo |
Session 247: Better Communication with Statistical Graphics – Contributed papers
Mon, 8/1/2016, 2:00 PM - 3:50 PM – Room CC W184bc
2:35 PM |
The Linked Microposter Plot as a New Means for the Visualization of Eye-Tracking Data — Chunyang Li, Utah State University ; Juergen Symanzik, Utah State University |
2:50 PM |
Optimizing Diffusion Cartograms for Areal Data Using a New Evaluation Method — Xiaoyue Cheng, University of Nebraska - Omaha |
3:35 PM |
Interactive Graphics for Functional Data Analyses — Julia Wrobel, Columbia University ; Jeff Goldsmith, Columbia Mailman School of Public Health |
Session 349: Applications of Regression Trees on Sample Data – Contributed papers
Tue, 8/2/2016, 10:30 AM - 12:20 PM – Room W184a
10:55 AM |
Modeling Survey Data with Regression Trees — Daniell Toth, Bureau of Labor Statistics |
Session 530: Applications in Drug Development – Contributed papers
Wed, 8/3/2016, 10:30 AM - 12:20 PM – Room: CC W187a
11:15 AM |
Facilitating Clinical Trial Simulation in Alzheimer's Disease Using the CAMD IPD, Literature Summary Level Data, and the 'adsim' R Package — Daniel Polhamus, Metrum Research Group |
Session 353: Statistical Learning and Data Science – Contributed speed presentations
Tue, 8/2/2016, 10:30 AM - 12:20 PM – Room: CC-W181a
10:55 AM |
An R Package Enabling Likelihood-Based Inference for Generalized Linear Mixed Models — Christina Knudson |
Session 354: Business, Finance and Economic Statistics – Contributed speed presentations
Tue, 8/2/2016, 10:30 AM - 12:20 PM – Room: CCW181b
12:00 PM |
Optimal Stratification of Univariate Populations via StratifyR Package — Karuna Garan Reddy, University of the South Pacific ; Mohammed G. M. Khan, University of the South Pacific |
Posters
Session 88: The Extraordinary Power of Data – Invited Poster Presentations
Sun, 7/31/2016, 6:00 PM - 8:00 PM – Room CC-Hall F1 West
1: Communicate Better with R, R Markdown, and Shiny — Garrett Grolemund, RStudio
Session 203: Environmental Statistics – Contributed Poster Presentations
Mon, 8/1/2016, 11:35 AM - 12:20 PM – Room: CC- Hall F1 West
6: Using the R Caret Package as a Teaching Tool for Topics in Classification and Prediction Methods: A Case Study — Keith Williams, University of Arkansas for Medical Sciences
Session 376: Posters on Statistics in Genomics and Genetics
Tue, 8/2/2016, 10:30 AM - 12:20 PM – Room: CC Hall F1 West
51: BANFF: An R Package for BAyesian Network Feature Finder — Zhou Lan, North Carolina State University ; Yize Zhao, Statistical and Applied Mathematical Sciences Institute ; Jian Kang, University of Michigan ; Tianwei Yu, Emory University
Session 449: Posters on Statistical Learning and Data Science
Tue, 8/2/2016, 2:00 PM - 2:45 PM – Room: CC Hall F1 West
15: An R Package Enabling Likelihood-Based Inference for Generalized Linear Mixed Models — Christina Knudson |
Session 453: Posters on Business and Economic Statistics
Tue, 8/2/2016, 3:05 PM - 3:50 PM – Room: CC Hall F1 West
27: Optimal Stratification of Univariate Populations via StratifyR Package — Karuna Garan Reddy, University of the South Pacific ; Mohammed G. M. Khan, University of the South Pacific
Session 556: Posters on Statistical Computing
Wed, 8/3/2016, 10:30 AM - 12:20 PM – Room CC-Hall F1 West
31: Lucid: An R Package for Pretty Printing Floating Point Numbers — Kevin Wright, DuPont Pioneer