by John Mount

Data Scientist, Win-Vector LLC

R has a number of very good packages for manipulating and aggregating data (plyr, sqldf, RevoScaleR, data.table, and more), but when it comes to accumulating results the beginning R user is often at sea. The R execution model is a bit exotic so many R users are very uncertain which methods of accumulating results are efficient and which are inefficient.

In this latest "R as it is" we will quickly become expert at efficiently accumulating results in R. To read more please click here.
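As a rough illustration of why the choice of accumulation method matters, here is a minimal sketch comparing three common patterns. The function names (`grow`, `prealloc`, `applied`) are made up for this example, and it is not necessarily the set of methods the linked article covers.

```r
# Three ways to accumulate n squared values; all names here are illustrative.
n <- 10000

# 1. Grow the result one element at a time; each c() call may copy the vector.
grow <- function(n) {
  res <- numeric(0)
  for (i in seq_len(n)) res <- c(res, i^2)
  res
}

# 2. Pre-allocate the result and fill it in place.
prealloc <- function(n) {
  res <- numeric(n)
  for (i in seq_len(n)) res[i] <- i^2
  res
}

# 3. Let vapply() allocate and fill the result.
applied <- function(n) vapply(seq_len(n), function(i) i^2, numeric(1))

r1 <- grow(n)
r2 <- prealloc(n)
r3 <- applied(n)

# All three return the same values; only the cost of getting there differs.
stopifnot(identical(r1, r2), identical(r2, r3))
```

Wrapping each call in `system.time()` is one way to see the difference on your own machine; the gap typically widens as `n` grows.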

by Joseph Rickert

The XXXV Sunbelt Conference of the International Network for Social Network Analysis (INSNA) was held last month at Brighton beach in the UK. (And I am still bummed out that I was not there.)

A run of 35 conferences is impressive indeed, but the social network analysts have been at it for an even longer time than that:

and today they are still on the cutting edge of the statistical analysis of networks. The conference presentations have not been posted yet, but judging from the conference workshops program there was plenty of R action in Brighton.

Social network analysis at this level involves some serious statistics and mastering a very specialized vocabulary. However, it seems to me that some knowledge of this field will become important to everyone working in data science. Supervised learning models and statistical models that assume independence among the predictors will most likely represent only the first steps that data scientists will take in exploring the complexity of large data sets.

And, maybe of equal importance is the fact that working with network data is great fun. Moreover, software tools exist in R and other languages that make it relatively easy to get started with just a few pointers.

From a statistical inference point of view, what you need to know is that Exponential Random Graph Models (ERGMs) are at the heart of modern social network analysis. An ERGM is a statistical model that enables one to predict the probability of observing a given network from a specified class of networks, based on both observed structural properties of the network and covariates associated with the vertices of the network. The exponential part of the name comes from the exponential family of functions used to specify the form of these models. ERGMs are analogous to generalized linear models except that ERGMs take into account the dependency structure of ties (edges) between vertices. For a rigorous definition of ERGMs see sections 3 and 4 of the paper by Hunter et al. in the 2008 special issue of the JSS, or Chapter 6 in Kolaczyk and Csárdi's book *Statistical Analysis of Network Data with R*. (I have found this book to be very helpful and highly recommend it. Not only does it provide an accessible introduction to ERGMs, it also begins with basic network statistics and the igraph package and then goes on to introduce some more advanced topics such as modeling processes that take place on graphs and network flows.)
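To make the analogy with generalized linear models concrete, here is a minimal base-R sketch of the simplest possible ERGM, one whose only statistic is the number of edges. In that special case ties are independent, so the coefficient reduces to the log-odds of the network density and can be recovered with an intercept-only logistic regression over the dyads. The five-vertex network is made-up data; models with dependence terms (triangles, for example) do not reduce this way and need the dedicated statnet tools.

```r
# A tiny undirected network as an adjacency matrix (hypothetical data).
adj <- matrix(0, 5, 5)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1   # symmetrize: ties are undirected

# One observation per dyad (unordered pair of vertices).
tie <- adj[upper.tri(adj)]

# Edges-only model: the maximum likelihood estimate is logit(density).
density <- mean(tie)
theta_closed <- log(density / (1 - density))

# The same estimate from an intercept-only logistic regression on the dyads.
fit <- glm(tie ~ 1, family = binomial)
theta_glm <- unname(coef(fit))

c(closed_form = theta_closed, glm = theta_glm)
```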

In the R world, the place to go to work with ERGMs is statnet.org. statnet is a suite of 15 or so CRAN packages that provide a complete infrastructure for working with ERGMs. statnet.org is a real gem of a site that contains documentation for all of the statnet packages along with tutorials, presentations from past Sunbelt conferences and more.

I am particularly impressed with the Shiny based GUI for learning how to fit ERGMs. Try it out on the Shiny webpage or in the box below. Click the **Get Started** button. Then select "built-in network" and "ecoli 1" under **File type**. After that, click the right arrow in the upper right corner. You should see a plot of the ecoli graph.

--------------------------------------------------------------------------------------------------------------------------

You will be fitting models in no time. And since the commands used to drive the GUI are similar to specifying the parameters for the functions in the ergm package you will be writing your own R code shortly after that.

by Bill Jacobs, Director Technical Sales, Microsoft Advanced Analytics

In the course of working with our Hadoop users, we are often asked, what's the best way to integrate R with Hadoop?

The answer, in nearly all cases is, It depends.

Alternatives range from open source R on workstations to parallelized commercial products like Revolution R Enterprise, with many steps in between. Between these extremes lies a range of options, each offering a different balance of data scale, performance, capability and ease of use.

And so, the right choice or choices depend on your data size, budget, skill, patience and governance limitations.

In this post, I’ll summarize the alternatives using pure open source R and some of their advantages. In a subsequent post, I’ll describe the options for achieving even greater scale, speed, stability and ease of development by combining open source and commercial technologies.

These two posts are written to help current R users who are novices at Hadoop understand and select solutions to evaluate.

As with most things open source, the first consideration is of course monetary. Isn’t it always? The good news is that there are multiple alternatives that are free, and additional capabilities under development in various open source projects.

We generally see four options for building R-to-Hadoop integration using entirely open source stacks.

This baseline approach’s greatest advantage is simplicity and cost. It’s free. End to end free. What else in life is?

Through packages Revolution contributed to open source including rhdfs and rhbase, R users can directly ingest data from both the hdfs file system and the hbase database subsystems in Hadoop. Both connectors are part of the RHadoop package created and maintained by Revolution and are a go-to choice.

Additional options exist as well. The RHive package executes Hive’s HQL SQL-like query language directly from R, and provides functions for retrieving metadata from Hive such as database names, table names, column names, etc.

The rhive package, in particular, has the advantage that its data operations allow some work to be pushed down into Hadoop, avoiding data movement and parallelizing operations for big speed increases. Similar “push-down” can be achieved with rhbase as well. However, neither is a particularly rich environment, and invariably, complex analytical problems will reveal some gaps in capability.

Beyond the somewhat limited push-down capabilities, R’s best at working on modest data sampled from hdfs, hbase or hive, and in this way, current R users can get going with Hadoop quickly.

Once you tire of R’s memory barriers on your laptop, the obvious next path is a shared server. With today’s technologies, you can equip a powerful server for only a few thousand dollars, and easily share it between a few users. Using Windows or Linux with 256GB or 512GB of RAM, R can be used to analyze files into the hundreds of gigabytes, albeit perhaps not as fast as you’d like.

Like option 1, R on a shared server can also leverage push-down capabilities of the rhbase and rhive packages to achieve parallelism and avoid data movement. However, as with workstations, the pushdown capabilities of rhive and rhbase are limited.

And of course, while lots of RAM keeps the dreaded out-of-memory exhaustion at bay, it does little for compute performance, and sharing a server depends on skills learned [or perhaps not learned] in kindergarten. For these reasons, consider a shared server to be a great add-on to R on workstations but not a complete substitute.

Replacing the CRAN download of R with the R distribution: Revolution R Open (RRO) enhances performance further. RRO is, like R itself, open source and 100% R and free for the download. It accelerates math computations using the Intel Math Kernel Libraries and is 100% compatible with the algorithms in CRAN and other repositories like BioConductor. No changes are required to R scripts, and the acceleration the MKL libraries offer varies from negligible to an order of magnitude for scripts making intensive use of certain math and linear algebra primitives. You can anticipate that RRO can double your average performance if you’re doing math operations in the language.

As with options 1 and 2, Revolution R Open can be used with connectors like rhdfs, and can connect and push work down into Hadoop through rhbase and rhive.

Once you find that your problem set is too big, or your patience is being taxed on a workstation or server and the limitations of rhbase and rhive push down are impeding progress, you’re ready for running R inside of Hadoop.

The open source RHadoop project that includes rhdfs, rhbase and plyrmr also includes a package rmr2 that enables R users to build Hadoop map and reduce operations using R functions. Using mappers, R functions are applied to all of the data blocks that compose an hdfs file, an hbase table or other data sets, and the results can be sent to a reducer, also an R function, for aggregation or analysis. All work is conducted inside of Hadoop but is built in R.

Let’s be clear. Applying R functions on each hdfs file segment is a great way to accelerate computation. But for most, it is the avoidance of moving data that really accentuates performance. To do this, rmr2 applies R functions to the data residing on Hadoop nodes rather than moving the data to where R resides.
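The mapper and reducer pattern can be imitated locally in plain R to get a feel for how the work is split up. The sketch below is an in-memory illustration only: it does not use rmr2 or Hadoop, and the data, the four-way block split and the function names `mapper` and `reducer` are all assumptions made for the example.

```r
# Local, in-memory imitation of the map/reduce pattern; this is NOT the rmr2 API.
set.seed(1)
temps <- data.frame(
  station = sample(c("A", "B", "C"), 1000, replace = TRUE),
  temp    = rnorm(1000, mean = 15, sd = 5),
  stringsAsFactors = FALSE
)

# Pretend the data lives in four blocks, as the pieces of an hdfs file would.
blocks <- split(temps, rep(1:4, length.out = nrow(temps)))

# Mapper: sees one block only and emits per-station partial sums and counts.
mapper <- function(block) {
  sums   <- tapply(block$temp, block$station, sum)
  counts <- tapply(block$temp, block$station, length)
  data.frame(station = names(sums),
             sum     = as.vector(sums),
             n       = as.vector(counts),
             stringsAsFactors = FALSE)
}

# Reducer: combines the partial results for each station into a mean.
reducer <- function(partials) {
  all    <- do.call(rbind, partials)
  sums   <- tapply(all$sum, all$station, sum)
  counts <- tapply(all$n, all$station, sum)
  data.frame(station = names(sums),
             mean    = as.vector(sums / counts),
             stringsAsFactors = FALSE)
}

result <- reducer(lapply(blocks, mapper))

# Check against computing the means directly on the full data set.
direct <- tapply(temps$temp, temps$station, mean)
stopifnot(isTRUE(all.equal(result$mean, as.vector(direct[result$station]))))
```

Note that the mapper emits sums and counts rather than means, because a mean of block means is not the overall mean when blocks differ in size. This is the kind of algorithmic bookkeeping that falls to the programmer in a map/reduce setting.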

While rmr2 gives essentially unlimited capabilities, as a data scientist or statistician, your thoughts will soon turn to computing entire algorithms in R on large data sets. Using rmr2 in this way complicates development for the R programmer, because he or she must write the entire logic of the desired algorithm or adapt existing CRAN algorithms. She or he must then validate that the algorithm is accurate and reflects the expected mathematical result, and write code for the myriad corner cases such as missing data.

rmr2 requires coding on your part to manage parallelization. This may be trivial for data transformation operations, aggregates, etc., or quite tedious if you’re trying to train predictive models or build classifiers on large data.

While rmr2 can be more tedious than other approaches, it is not untenable, and most R programmers will find rmr2 much easier than resorting to Java-based development of Hadoop mappers and reducers. While somewhat tedious, it a) is fully open source, b) helps to parallelize computation to address larger data sets, c) skips painful data movement, d) is broadly used so you’ll find help available, and e) is free. Not bad.

rmr2 is not the only option in this category – a similar package called rhipe is also available and provides similar capabilities. rhipe is described here and here and is downloadable from GitHub.

The range of open source-based options for using R with Hadoop is expanding. The Apache Spark community, for example, is rapidly improving R integration via the predictably named SparkR. Today, SparkR provides access to Spark from R much as rmr2 and rhipe do for Hadoop MapReduce.

We expect that, in the future, the SparkR team will add support for Spark’s MLlib machine learning library, providing execution directly from R. Availability dates haven’t been widely published.

Perhaps the most exciting observation is that R has become “table stakes” for platform vendors. Our partners at Cloudera, Hortonworks, MapR and others, along with database vendors and others, are all keenly aware of the dominance of R among the large and growing data science community, and R’s importance as a means to extract insights and value from the burgeoning data repositories built atop Hadoop.

In a subsequent post, I’ll review the options for creating even greater performance, simplicity, portability and scale available to R users by expanding the scope from open source only solutions to those like Revolution R Enterprise for Hadoop.

by Joseph Rickert

It is incredibly challenging to keep up to date with R packages. As of today (6/16/15), there are 6,789 listed on CRAN. Of course, the CRAN Task Views are probably the best resource for finding what's out there. A tremendous amount of work goes into maintaining and curating these pages and we should all be grateful for the expertise, dedication and efforts of the task view maintainers. But, R continues to grow at a tremendous rate. (Have a look at the growth curve in Bob Muenchen's 5/22/15 post R Now Contains 150 Times as Many Commands as SAS.) CRANberries, a site that tracks new packages and package updates, indicates that over the last few months the list of R packages has been growing by about 100 packages per month. How can anybody hope to keep current?

So, on any given day, expect that finding out what R packages exist that may pertain to any particular topic will require some work. What follows is a beginner's guide to fishing for packages in CRAN. This example looks for "Bayesian" packages using some simple web page scraping and elementary text mining.

The Bayesian Inference Task View lists 144 packages. This is probably everything that is really important, but let's see what else is to be found that has anything at all to do with Bayesian Inference. In the first block of code, R's available.packages() function fetches the list of packages available from my Windows PC. (This is an extremely interesting function and I don't do justice to it here.) Then, this list is used to scrape the package descriptions from the various package webpages. The loop takes some time to run, so I saved the package descriptions both in a csv file and in a .RData workspace.

    library(svTools)
    library(RCurl)
    library(tm)
    #-----------------------------------------
    # TWO HELPER FUNCTIONS
    # Function to get package description from CRAN package page
    getDesc <- function(package){
      l1 <- regexpr("</h2>",package)
      ind1 <- as.integer(l1[[1]]) + 9
      l2 <- regexpr("Version",package)
      ind2 <- as.integer(l2[[1]]) - (46 + nchar("package"))
      desc <- substring(package,ind1,ind2)
      return(desc)
    }
    # Function to get CRAN package page
    getPackage <- function(name){
      url <- paste("http://cran.r-project.org/web/packages/",name,"/index.html",sep="")
      txt <- getURL(url,ssl.verifypeer=FALSE)
      return(txt)
    }
    #--------------------------------------------
    # SCRAPE PACKAGE DATA FROM CRAN
    # Get the list of R packages
    packages <- as.data.frame(available.packages())
    head(packages)
    dim(packages)
    pkgNames <- rownames(packages)
    rm(packages) # Don't need this any more
    # Pre-allocate the result, then fill it one package page at a time
    pkgDesc <- character(length(pkgNames))
    for (i in seq_along(pkgNames)){
      pkgDesc[i] <- getDesc(getPackage(pkgNames[i]))
    }
    length(pkgDesc) #6598
    #----------------------------------------------
    # SOME HOUSEKEEPING
    # cranP <- data.frame(pkgNames,pkgDesc)
    # write.csv(cranP,"C:/DATA/CRAN/CRAN_pkgs_6_15_15")
    # save.image("pkgs.RData")
    # load("pkgs.RData")

When I did this a few days ago 6,598 packages were available. The next section of code turns the vector of package descriptions into a document corpus and creates a document term matrix with a row for each package and a column for each of 20,781 terms. Taking the transpose of the term matrix makes it easier to see what is going on. The matrix is extremely sparse (only one 1 shows up), as this small portion of the matrix illustrates, and all of the terms are pretty much useless. Removing the sparse terms cuts the matrix down to only 372 terms.

    # SOME SIMPLE TEXT MINING
    # Make a corpus out of package descriptions
    pCorpus <- VCorpus(VectorSource(pkgDesc))
    pCorpus
    inspect(pCorpus[1:3])
    # Function to prepare corpus
    prepC <- function(corpus){
      c <- tm_map(corpus, stripWhitespace)
      c <- tm_map(c,content_transformer(tolower))
      c <- tm_map(c,removeWords,stopwords("english"))
      c <- tm_map(c,removePunctuation)
      c <- tm_map(c,removeNumbers)
      return(c)
    }
    pCorpusPrep <- prepC(pCorpus)
    #------------------------------------------------------------
    # Create the document term matrix
    dtm <- DocumentTermMatrix(pCorpusPrep)
    dtm
    # <<DocumentTermMatrix (documents: 6598, terms: 20781)>>
    # Non-/sparse entries: 142840/136970198
    # Sparsity           : 100%
    # Maximal term length: 83
    # Weighting          : term frequency (tf)
    # Work with the transpose to list keywords as rows
    inspect(t(dtm[100:105,90:105]))
    #               Docs
    # Terms          100 101 102 103 104 105
    #   accomodated    0   0   0   0   0   0
    #   accompanied    0   0   0   0   0   0
    #   accompanies    0   0   0   0   0   0
    #   accompany      0   0   0   0   0   0
    #   accompanying   0   0   0   0   0   0
    #   accomplished   0   0   0   0   0   0
    #   accomplishes   0   0   0   0   0   0
    #   accordance     0   0   0   0   0   0
    #   according      0   0   1   0   0   0
    #   accordingly    0   0   0   0   0   0
    #   accordinglyp   0   0   0   0   0   0
    #   account        0   0   0   0   0   0
    #   accounted      0   0   0   0   0   0
    #   accounting     0   0   0   0   0   0
    #   accountp       0   0   0   0   0   0
    #   accounts       0   0   0   0   0   0
    # Reduce the number of sparse terms
    dtms <- removeSparseTerms(dtm,0.99)
    dim(dtms) # 6598 372

I am pretty much counting on some luck here, hoping that "Bayesian" will be one of the remaining 372 terms. This last bit of code finds 229 packages associated with the keyword "Bayesian".

    # Find the Bayesian packages
    dtmsT <- t(dtms)
    keywords <- row.names(dtmsT)
    bi <- which(keywords == "bayesian") # Find the index of an interesting keyword
    bayes <- as.matrix(dtmsT)[bi,] # as.matrix() avoids the console printing that inspect() does
    bayes_packages_index <- names(bayes[bayes==1])
    # Here are the "Bayesian" packages
    bayes_packages <- pkgNames[as.numeric(bayes_packages_index)]
    length(bayes_packages) #229
    # Here are the descriptions of the "Bayesian" packages
    bayes_pkgs_desc <- pkgDesc[bayes==1]
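For readers who want to experiment with the keyword search without scraping CRAN or installing tm, here is a minimal base-R sketch of the same idea. The four package names and descriptions are invented stand-ins for the scraped `pkgDesc` vector, and the cleaning steps only approximate what the tm pipeline does.

```r
# Hypothetical package descriptions standing in for the scraped pkgDesc vector.
descs <- c(
  pkgA = "Bayesian estimation of hierarchical models; fully Bayesian inference.",
  pkgB = "Fast reading of delimited text files.",
  pkgC = "Tools for bayesian networks and graphical models.",
  pkgD = "Linear mixed models fitted by maximum likelihood."
)

# Lower-case, replace punctuation and digits with spaces, split on whitespace.
clean  <- gsub("[[:punct:][:digit:]]+", " ", tolower(descs))
tokens <- strsplit(trimws(clean), "\\s+")

# Count occurrences of the keyword in each description.
counts <- vapply(tokens, function(tk) sum(tk == "bayesian"), integer(1))

# "> 0" also keeps descriptions that use the keyword more than once,
# which a filter on "== 1" would miss.
hits <- names(counts)[counts > 0]
hits
```

The comparison against zero is a deliberate design choice here: a description that says "Bayesian" twice has a term count of 2, so an equality test against 1 would silently drop it.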

Here is the list of packages found.

Not all of these "fish" are going to be worth keeping, but at least we have reduced the search to something manageable. In 10 or 15 minutes of fishing you might catch something interesting.

R is an environment for programming with data, so unless you're doing a simulation study you'll need some data to work with. If you don't have data of your own, we've made a list of open data sets you can use with R to accompany the latest release of Revolution R Open.

At the Data Sources on the Web page on MRAN, you can find links to dozens of open data sources both large and small. You'll find some classics of data science and machine learning, like the Enron emails data set, and the famous Airlines data. You can find official statistics on economics and government from countries around the world, including links to every country's official data repositories at UNdata. There are links to scientific data, including several sources from the social sciences. And of course you'll find links to various financial data sources (but not all of these are 100% free to use).

Many of the data sets are indicated as ready-to-use in R format; for the others, you can use R's various data import tools to access the data (for which there is a great guide at ComputerWorld).

Got other suggestions for great open data sources? Let us know in the comments below, or send an email to mran@revolutionanalytics.com.

MRAN: Data Sources on the Web

Computerworld's Sharon Machlis published today a very useful list of R packages that every R user should know. The list covers packages for data import, data wrangling, data visualization and package development, but for beginning R users the biggest challenge is usually just dealing with data. To that end, I thought it was worth listing the packages for data access and manipulation, which I thoroughly endorse:

**Data import/access**: readr (text data files), rio (many binary data file formats), readxl (Excel spreadsheets), googlesheets (Google Sheets), RMySQL (MySQL databases), quantmod (economic and financial data sources)

**Data manipulation**: dplyr (general data frame processing), data.table (aggregation and filtering), tidyr (tidying messy data into row/col format), sqldf (SQL queries on data frames), zoo (time series data wrangling)

Check out Sharon's complete list below for details on these and many other useful R packages.

ComputerWorld: Great R packages for data import, wrangling & visualization

R is already in use in well over 100 cities around the world, and now we can add another to the list: Yangon, Myanmar. Ben Marwick is a trainer with Software Carpentry (a non-profit organization devoted to improving basic computing skills among researchers in science, engineering, medicine, and other disciplines), and last month he visited the University of Yangon to teach 23 archaeologists how to use R.

Software Carpentry maintains a number of useful resources for teaching R, and Ben began with the "Programming in R" tutorial to get the students familiar with the R command line. The Data Carpentry Lessons in R provided further depth around creating data frames, and analyzing and plotting data.

Despite the recent democratic reforms, reliable internet access appears to still be a challenge in parts of Myanmar, so Ben had to improvise to get all the students up and running with R during the workshop. Rather than downloading R, he provided USB sticks for the students to install it directly onto their PCs. For the graphics portion of the workshop, Ben wanted to use the ggplot2 package, which has 18 dependencies:

Getting all those dependencies installed via the Web wasn't a practical option so Ben used the miniCRAN package to bundle all of the needed packages and dependencies, which could then be easily installed via the USB sticks. Said Ben:

Had the entire class attempted to get the packages in the usual way, from an online CRAN mirror, we'd probably still be there waiting for the downloads to finish. Although there are no doubt other solutions, I highly recommend local repositories with miniCRAN to anyone going in to a similar situation of needing to use contributed packages with limited (or no) internet access.
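The core bookkeeping behind building such a local repository is working out the full set of packages to bundle, not just the one you asked for. Here is a minimal sketch of that dependency walk in base R. The package names and the `deps` table are hypothetical; miniCRAN does this against real CRAN metadata, and this sketch is not its implementation.

```r
# Hypothetical direct-dependency table; real values would come from CRAN metadata.
deps <- list(
  plotpkg   = c("scalespkg", "tablepkg"),
  scalespkg = c("colorpkg"),
  tablepkg  = character(0),
  colorpkg  = character(0)
)

# Walk the dependency graph breadth-first, collecting every package needed.
all_deps <- function(pkg, deps) {
  seen  <- character(0)
  queue <- pkg
  while (length(queue) > 0) {
    current <- queue[1]
    queue   <- queue[-1]
    if (current %in% seen) next          # already handled
    seen  <- c(seen, current)
    queue <- c(queue, setdiff(deps[[current]], seen))
  }
  seen
}

# Everything that would need to go on the USB stick for "plotpkg".
bundle <- all_deps("plotpkg", deps)
bundle
```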

You can read all of Ben's stories about teaching R in Myanmar at the link below.

Software Carpentry: Teaching in Yangon

Hadley Wickham and the RStudio team have created some new packages for R, which will be very useful for anyone who needs to read data into R (that is, everyone). The readr package provides functions for reading text data into R, and the readxl package provides functions for reading Excel spreadsheet data into R. Both are much faster than the functions you're probably using now.

The readr package provides several functions for reading tabular text data into R. This is a task normally accomplished with the read.table family of functions in R, and readr provides a number of replacement functions that provide additional functionality and are *much* faster.

First, there's read_table which provides a near-replacement for read.table. Here's a comparison of using both functions on a file with 4 million rows of data (which I created by stacking copies of this file):

    library(readr)

    dat <- read_table("biggerfile.txt",
                      col_names=c("DAY","MONTH","YEAR","TEMP"))

    dat2 <- read.table("biggerfile.txt",
                       col.names=c("DAY","MONTH","YEAR","TEMP"))

The commands look quite similar, but while read.table took just over 30 seconds to complete, readr's read_table accomplished the same task in less than a second. The trick is that read_table treats the data as a fixed-format file, and uses C++ to process the data quickly. (One small caveat is that read.table supports arbitrary amounts of whitespace between columns, while read_table requires the columns be lined up exactly. In practice, this isn't much of a restriction.)

Base R has a function for reading fixed-width data too, and here readr *really* shines:

    dat <- read_fwf("biggerfile.txt",
                    fwf_widths(c(3,15,16,12),
                               col_names=c("DAY","MONTH","YEAR","TEMP")))

    dat2 <- read.fwf("biggerfile.txt", c(3,15,16,12),
                     col.names=c("DAY","MONTH","YEAR","TEMP"))

While readr's read_fwf again accomplished the task in about a second, the standard read.fwf took over 3 minutes — almost 200 times as long.
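The difference between whitespace-delimited and fixed-width parsing is easy to try on a small scale with base R alone. The sketch below writes a three-line file to a temporary location and reads it both ways; the data values and the column widths `c(3, 4, 5, 6)` are made up for this example and are not those of the benchmark file.

```r
# Write a small fixed-width file (hypothetical data) to a temporary location.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "  1   1 1995  41.1",
  "  1   2 1995  38.4",
  " 12  31 1995  29.7"
), path)

cols <- c("DAY", "MONTH", "YEAR", "TEMP")

# Whitespace-delimited parse: the exact column positions do not matter.
dat_ws <- read.table(path, col.names = cols)

# Fixed-width parse: each field is cut at an exact character offset.
dat_fw <- read.fwf(path, widths = c(3, 4, 5, 6), col.names = cols)

# Both approaches recover the same values from this well-aligned file.
stopifnot(all(dat_ws$TEMP == dat_fw$TEMP), all(dat_ws$DAY == dat_fw$DAY))

unlink(path)
```

If one line were shifted by a single character, the whitespace-delimited read would be unaffected while the fixed-width read would cut fields in the wrong place, which is the alignment caveat in practice.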

Other functions in the package include read_csv (and a European-friendly variant read_csv2) for comma-separated data, read_tsv for tab-separated data, and read_lines for line-by-line file extraction (great for complicated post-processing). The package also makes it much easier to read columns of dates in various formats, and sensibly always handles text data as strings (no more stringsAsFactors=FALSE).

For data in Excel format, there's also the new readxl package. This package provides functions to read Excel worksheets in both .xls and .xlsx formats. I haven't benchmarked the read_excel function myself, but like the readr functions it's based on a C++ library so should be quite snappy. And best of all, it has no external dependencies, so you can use it to read Excel data on just about any platform: there's no requirement that Excel itself be installed.

The readr package is on CRAN now, and readxl can be installed from GitHub. If you try them yourself, let us know how it goes in the comments.

RStudio blog: readr 0.1.0

by Joseph Rickert

There have been well over a hundred books on R published within the last ten years. Most of these texts, with titles like “Introductory Statistics with R” or “Time Series with R”, offer the reader a way to jump right in and perform some concrete statistical analysis using R’s myriad built-in functions and extensive visualization features. And, while it is true that some R books appear to be little more than a rehash of basic documentation, there are nevertheless scores of carefully written texts from experts that not only illuminate some area of statistics but also demonstrate some good R programming as well. In no small way, I believe these works have contributed to R’s popularity and growth by providing quality application level documentation.

Comparatively few books, however, are focused on teaching R programming itself. So it was a pleasant surprise when a copy of Garrett Grolemund’s “Hands-On Programming with R: Write Your Own Functions and Simulations” (O’Reilly 2015) came my way. This is a superb book: well conceived, unusual in the choice of material and sufficiently streamlined (185 pages not including the appendices) to make it a non-stop beginning-to-end read.

At the very beginning Garrett says:

I want to help you become a data scientist, as well as a computer scientist, so this book will focus on programming skills that are most related to data science.

These skills have to do with solving what Garrett refers to as the "logistical problems" of data science. In the context of the R language, they include acquiring data, manipulating R objects, constructing custom functions, negotiating the R environment and above all, writing vectorized code.

Given the ambitious agenda, "Hands-On Programming with R" starts surprisingly slowly with arithmetic, assignment, useful R functions and basic housekeeping chores: getting help and looking for packages. Then, still slowly and deliberately the text discusses R objects, atomic vectors, data types and data structures. 48 pages in and Garrett is still lingering on attributes. But this discussion is more sophisticated than most authors attempt. The presentation of type, attributes and class, in particular the insight that the concept of class follows directly from attributes, is meant to cultivate a programmer's mindset.

Around page 65 when Garrett gets excited about subsetting the pace really picks up. If you are hooked and still reading like I was by page 112 you will have acquired a working knowledge of scoping rules and environments and be ready for the beguilingly lucid discussion of the S3 class system that begins on page 139. Even if you are an experienced R programmer you may want to borrow a copy of the book and read this. If you really know your stuff, you may not learn anything new, but I bet you will be hard pressed to do a better job of explaining S3 classes to someone else.
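The idea that class is just another attribute, and that S3 dispatch follows from it, fits in a few lines of R. This is a minimal sketch of my own; the class name `dice` and the method body are invented for illustration and are not taken from the book.

```r
# A plain numeric vector carries no attributes at all.
roll <- c(3, 5)
attributes(roll)                 # NULL

# Setting a "class" attribute is all it takes to change how R dispatches on it.
attr(roll, "class") <- "dice"    # "dice" is a made-up class name

# An S3 method is simply a function named generic.class.
print.dice <- function(x, ...) {
  values <- unclass(x)
  cat("Dice roll:", values, "- total", sum(values), "\n")
  invisible(x)
}

print(roll)                      # dispatches to print.dice

# Capture the printed line to confirm that the custom method was used.
out <- capture.output(print(roll))
stopifnot(inherits(roll, "dice"), grepl("total 8", out))
```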

After S3 the text moves to considering loops as a prelude to its presentation of vectorized code. This section, which is really the final destination of the book, is exceptionally well done. First, vectorized code is characterized as code that takes advantage of three great features of the R language: fast logical tests, powerful subsetting operations and a multitude of built-in functions that permit element-wise execution. Then the text demonstrates how to put these ideas into practice.
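Those three ingredients are easy to see side by side on a small task. The sketch below, with function names chosen for this example, replaces negative values by their absolute values, first with an explicit loop and then with one logical test, one subsetting assignment and element-wise negation.

```r
# Loop version: test and update one element at a time.
abs_loop <- function(vec) {
  for (i in seq_along(vec)) {
    if (vec[i] < 0) vec[i] <- -vec[i]
  }
  vec
}

# Vectorized version of the same task.
abs_vectorized <- function(vec) {
  negs <- vec < 0            # one logical test over the whole vector
  vec[negs] <- -vec[negs]    # subsetting plus element-wise arithmetic
  vec
}

x <- c(-2, 1, -3, 4)
stopifnot(identical(abs_loop(x), abs_vectorized(x)))
```

On long vectors the vectorized form is usually much faster, since the looping happens in compiled code rather than in the R interpreter; `system.time()` on a vector of a few million values makes the difference visible.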

As you can gather, I was impressed by the conceptual formulation of the material. However, the real strength of the text is its sharp presentation of essential elements of the R language through a well-crafted, extended example that forms the spine of the book. "Hands-On Programming with R" is indeed a “hands-on” text that guides and challenges the reader to write good R code. A reader / coder who makes it to the end will have worked through several refinements of a small collection of functions that implement a fairly complex slot machine simulation. This example significantly raises the bar for selecting code examples in any R book. The simulation is rich enough to illustrate all of the R features presented in the text while allowing for refinement and polishing as the final form of the slot machine takes shape. The whole presentation is very tight. Garrett tells a pretty good story. During the final vectorized-code chapter I found myself reading with the delight of anticipation: “Just how is he going to make this code better?”.

I should also mention that the book is notable for what it does not include. This might be the first R book I have encountered that doesn’t develop any statistical models. Not a single regression is fit and there are no plots to speak of (3 histograms and a scatter plot). Certainly, this is the only R book I have come across that mentions data science in the preface that is not replete with Random Forest models and the like. Presumably, all of this will show up in the follow up book that Garrett promises in the preface.

"Hands-On Programming with R" presents but one carefully thought-through trajectory of many possible R language excursions. It is not to be compared with Hadley Wickham’s encyclopedic "Advanced R" and it contains only a fraction of the material you can find in Norm Matloff’s "The Art of R Programming". But, having worked through "Hands-On Programming with R" both of these texts should be accessible.

Garrett's book is a good read: a technical story with a plot and a few surprises that could help anyone starting out with the R language learn to write some pretty slick code.

Computerworld's Sharon Machlis has done a great service for the R community, and especially for R novices, by creating the on-line Beginner's Guide to R. You can read our overview of her guide from 2013 here, but it's been regularly updated since then.

As an added bonus, the guide is now available as a downloadable PDF for your offline-reading pleasure. You'll need to provide your email address to download it, but that's a tiny price to pay for this excellent hands-on guide. Put it on your e-reader (or print it out if you're going the old-school route), put it next to your laptop, and type in the R commands from its 45 pages of worked examples. (Yes, you could cut-and-paste them if you like, but I find actually *typing* commands is an effective way to learn a new language.) Download the PDF from the link below, and then check out some of the other beginner's tips for R from our archive.

Computerworld: Learn R for beginners with our PDF