Posted by David Smith at 14:36 in random | Permalink | Comments (0)

The scientific process has been going through a welcome period of introspection recently, with a focus on understanding just how reliable the results of scientific studies are. We're not talking here about scientific fraud, but about how the scientific process itself, with its focus on p-values (which not even statisticians can easily explain) as the criterion for a positive result, leads to a surprisingly large number of false positives being published. On top of that, there's the issue of publication bias (especially in the pharmaceutical industry), an area where Ben Goldacre has taken a lead. The whole issue is wrapped in the concept of reproducibility — the idea that independent researchers should be able to replicate the results of published studies — for which David Spiegelhalter gives a great primer in the video below.

So it's welcome news that one of the top science breakthroughs of 2015 according to *Science* and *Nature* is Brian Nosek's project to reproduce the results of 100 scientific studies published in psychology journals. The detailed methodology is described in this paper, but in short Nosek recruited replication teams to recreate the studies as described in the carefully-selected papers, and analyze the data they collect:

Moreover, to maximize reproducibility and accuracy, the analyses for every replication study were reproduced by another analyst independent of the replication team using the R statistical programming language and a standardized analytic format. A controller R script was created to regenerate the entire analysis of every study and recreate the master data file.

R is a natural fit for a reproducibility project like this: as a scripting language, the R script itself provides a reproducible documentation of every step of the process. (Revolution R Open, Microsoft's enhanced R distribution, additionally includes features to facilitate reproducibility when using R packages.) The R script used for the psychology replication project describes and executes the process for checking the results of the papers.

Of the 100 studies, 97 reported statistically significant effects. (This is itself a reflection of publication bias; studies where there is no effect rarely get published.) Yet for 61 of those 97 papers, the reported significant results could not be replicated when the study was repeated. Their conclusion:

A large portion of replications produced weaker evidence for the original findings despite using materials provided by the original authors, review in advance for methodological fidelity, and high statistical power to detect the original effect sizes.
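As an aside, R makes it easy to put an uncertainty band on those headline numbers; a quick sketch (not part of the original analysis) using an exact binomial test on the 61-of-97 failure rate:

```r
# Exact 95% confidence interval for the proportion of the 97
# significant results that failed to replicate (61 of 97)
bt <- binom.test(61, 97)
round(unname(bt$estimate), 2)   # 0.63
round(bt$conf.int[1:2], 2)
```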

Research like this into the scientific method itself can only improve the scientific process, and is deserving of its accolade as a breakthrough. Read more about the project and the replicated studies at the link below.

Open Science Framework: Estimating the Reproducibility of Psychological Science (via Solomon Messing)

Posted by David Smith at 09:43 in applications, R, statistics | Permalink | Comments (0)

by Joseph Rickert

There are a number of R packages devoted to sophisticated applications of Markov chains. These include msm and SemiMarkov for fitting multistate models to panel data, mstate for survival analysis applications, TPmsm for estimating transition probabilities for 3-state progressive disease models, heemod for applying Markov models to health care economic applications, HMM and depmixS4 for fitting Hidden Markov Models, and mcmc for working with Markov chain Monte Carlo. All of these assume some considerable knowledge of the underlying theory. To my knowledge only DTMCPack and the relatively recent package, markovchain, were written to facilitate basic computations with Markov chains.

In this post, we’ll explore some basic properties of discrete time Markov chains using the functions provided by the markovchain package, supplemented with standard R functions and a few functions from other contributed packages. Chapter 11 of Snell’s online probability book will be our guide. The calculations displayed here illustrate some of the theory developed in that document, and section numbers in the text below refer to it.

A large part of working with discrete time Markov chains involves manipulating the matrix of transition probabilities associated with the chain. This first section of code replicates the Oz transition probability matrix from section 11.1 and uses the plotmat() function from the diagram package to illustrate it. Then, the efficient operator %^% from the expm package is used to raise the Oz matrix to the third power. Finally, left multiplication of Oz^3 by the distribution vector u = (1/3, 1/3, 1/3) gives the weather forecast three days ahead.

```r
library(expm)
library(markovchain)
library(diagram)
library(pracma)

stateNames <- c("Rain","Nice","Snow")
Oz <- matrix(c(.5,.25,.25,.5,0,.5,.25,.25,.5),
             nrow=3, byrow=TRUE)
row.names(Oz) <- stateNames; colnames(Oz) <- stateNames
Oz
#      Rain Nice Snow
# Rain 0.50 0.25 0.25
# Nice 0.50 0.00 0.50
# Snow 0.25 0.25 0.50

plotmat(Oz, pos = c(1,2),
        lwd = 1, box.lwd = 2,
        cex.txt = 0.8,
        box.size = 0.1,
        box.type = "circle",
        box.prop = 0.5,
        box.col = "light yellow",
        arr.length = .1,
        arr.width = .1,
        self.cex = .4,
        self.shifty = -.01,
        self.shiftx = .13,
        main = "")

Oz3 <- Oz %^% 3
round(Oz3, 3)
#       Rain  Nice  Snow
# Rain 0.406 0.203 0.391
# Nice 0.406 0.188 0.406
# Snow 0.391 0.203 0.406

u <- c(1/3, 1/3, 1/3)
round(u %*% Oz3, 3)
# 0.401 0.198 0.401
```

The igraph package can also be used to draw Markov chain diagrams, but I prefer the “drawn on a chalkboard” look of plotmat.
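For comparison, here's a minimal sketch of the igraph alternative, assuming the Oz matrix defined above (redefined here so the snippet is self-contained):

```r
library(igraph)

stateNames <- c("Rain","Nice","Snow")
Oz <- matrix(c(.5,.25,.25,.5,0,.5,.25,.25,.5), nrow=3, byrow=TRUE,
             dimnames=list(stateNames, stateNames))
# Edges are the transitions with non-zero probability;
# edge weights carry the transition probabilities
g <- graph_from_adjacency_matrix(Oz, mode="directed", weighted=TRUE)
plot(g, edge.label=round(E(g)$weight, 2))
```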

This next block of code reproduces the 5-state Drunkard's Walk example from section 11.2, which presents the fundamentals of absorbing Markov chains. First, the transition matrix describing the chain is instantiated as an object of the S4 class markovchain. Then, functions from the markovchain package are used to identify the absorbing and transient states of the chain and place the transition matrix, P, into canonical form.

```r
p <- c(.5, 0, .5)
dw <- c(1, rep(0,4), p, 0, 0, 0, p, 0, 0, 0, p, rep(0,4), 1)
DW <- matrix(dw, 5, 5, byrow=TRUE)
DWmc <- new("markovchain",
            transitionMatrix = DW,
            states = c("0","1","2","3","4"),
            name = "Drunkard's Walk")
DWmc
# Drunkard's Walk
# A 5 - dimensional discrete Markov Chain with following states
# 0 1 2 3 4
# The transition matrix (by rows) is defined as follows
#     0   1   2   3   4
# 0 1.0 0.0 0.0 0.0 0.0
# 1 0.5 0.0 0.5 0.0 0.0
# 2 0.0 0.5 0.0 0.5 0.0
# 3 0.0 0.0 0.5 0.0 0.5
# 4 0.0 0.0 0.0 0.0 1.0

# Determine transient states
transientStates(DWmc)
# [1] "1" "2" "3"

# Determine absorbing states
absorbingStates(DWmc)
# [1] "0" "4"
```

In canonical form, the transition matrix, P, is partitioned into the identity matrix, I, a matrix of 0's, the matrix, Q, containing the transition probabilities among the transient states, and a matrix, R, containing the probabilities of moving from the transient states into the absorbing states.

Next, we find the fundamental matrix, N, by inverting (I – Q). For each transient state j, n_{ij} gives the expected number of times the process is in state j given that it started in transient state i. u_{i} is the expected time until absorption given that the process starts in state i. Finally, we compute the matrix B, where b_{ij} is the probability that the process will be absorbed in state j given that it starts in state i.

```r
# Extract matrix Q (or R) from a transition matrix in canonical form
getRQ <- function(M, type="Q"){
  if(length(absorbingStates(M)) == 0) stop("Not Absorbing Matrix")
  tm <- M@transitionMatrix
  d <- diag(tm)
  m <- max(which(d == 1))
  n <- length(d)
  ifelse(type == "Q",
         A <- tm[(m+1):n, (m+1):n],
         A <- tm[(m+1):n, 1:m])
  return(A)
}

# Put DWmc into canonical form
P <- canonicForm(DWmc)
P
Q <- getRQ(P)

# Find fundamental matrix
I <- diag(dim(Q)[2])
N <- solve(I - Q)
N
#     1 2   3
# 1 1.5 1 0.5
# 2 1.0 2 1.0
# 3 0.5 1 1.5

# Calculate time to absorption
c <- rep(1, dim(N)[2])
u <- N %*% c
u
# 1 3
# 2 4
# 3 3

R <- getRQ(P, "R")
B <- N %*% R
B
#      0    4
# 1 0.75 0.25
# 2 0.50 0.50
# 3 0.25 0.75
```

For section 11.3, which deals with regular and ergodic Markov chains, we return to Oz, and provide four options for calculating the steady state, or limiting probability distribution for this regular transition matrix. The first three options involve standard methods which are readily available in R. Method 1 uses %^% to raise the matrix Oz to a sufficiently high power. Method 2 calculates the eigenvector associated with the eigenvalue 1, and method 3 uses the nullspace() function from the pracma package to compute the null space, or kernel, of the linear transformation associated with the matrix. The fourth method uses the steadyStates() function from the markovchain package. To use this function, we first convert Oz into a markovchain object.

```r
# 11.3 Ergodic Markov Chains
# Four methods to get steady states

# Method 1: compute powers of the matrix
round(Oz %^% 6, 2)
#      Rain Nice Snow
# Rain  0.4  0.2  0.4
# Nice  0.4  0.2  0.4
# Snow  0.4  0.2  0.4

# Method 2: compute eigenvector of eigenvalue 1
eigenOz <- eigen(t(Oz))
ev <- eigenOz$vectors[,1] / sum(eigenOz$vectors[,1])
ev

# Method 3: compute null space of (P - I)
I <- diag(3)
ns <- nullspace(t(Oz - I))
ns <- round(ns / sum(ns), 2)
ns

# Method 4: use function in markovchain package
OzMC <- new("markovchain",
            states = stateNames,
            transitionMatrix =
              matrix(c(.5,.25,.25,.5,0,.5,.25,.25,.5),
                     nrow=3,
                     byrow=TRUE,
                     dimnames=list(stateNames,stateNames)))
steadyStates(OzMC)
```

The steadyStates() function seems to be reasonably efficient for fairly large Markov chains. The following code creates a 5,000 row by 5,000 column regular Markov matrix. On my modest Lenovo ThinkPad ultrabook it took a little less than 2 minutes to create the markovchain object and about 11 minutes to compute the steady state distribution.

```r
# Create a large random regular matrix
randReg <- function(N){
  M <- matrix(runif(N^2, min=1, max=N), nrow=N, ncol=N)
  rowS <- rowSums(M)
  regM <- M / rowS
  return(regM)
}

N <- 5000
M <- randReg(N)
#rowSums(M)

system.time(regMC <- new("markovchain", states = as.character(1:N),
                         transitionMatrix = M,
                         name = "M"))
#    user  system elapsed
#   98.33    0.82   99.46

system.time(ss <- steadyStates(regMC))
#    user  system elapsed
#  618.47    0.61  640.05
```

We conclude this little Markov Chain excursion by using the rmarkovchain() function to simulate a trajectory from the process represented by this large random matrix and plot the results. It seems that this is a reasonable method for simulating a stationary time series in a way that makes it easy to control the limits of its variability.

```r
# Sample from regMC
regMCts <- rmarkovchain(n=1000, object=regMC)
regMCtsDf <- as.data.frame(regMCts, stringsAsFactors = FALSE)
regMCtsDf$index <- 1:1000
regMCtsDf$regMCts <- as.numeric(regMCtsDf$regMCts)

library(ggplot2)
p <- ggplot(regMCtsDf, aes(index, regMCts))
p + geom_line(colour="dark red") +
  xlab("time") +
  ylab("state") +
  ggtitle("Random Markov Chain")
```
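As a sanity check on the simulation approach (using the small Oz chain from earlier rather than the 5,000-state matrix), the empirical state frequencies of a long simulated trajectory should approximate the steady-state distribution:

```r
library(markovchain)

stateNames <- c("Rain","Nice","Snow")
OzMC <- new("markovchain", states = stateNames,
            transitionMatrix = matrix(c(.5,.25,.25,.5,0,.5,.25,.25,.5),
                                      nrow=3, byrow=TRUE,
                                      dimnames=list(stateNames,stateNames)))
set.seed(42)
traj <- rmarkovchain(n=20000, object=OzMC)
# Steady state is (Rain, Nice, Snow) = (0.4, 0.2, 0.4)
round(table(traj)/length(traj), 2)
```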

For more on the capabilities of the markovchain package do have a look at the package vignette. For more theory on both discrete and continuous time Markov processes illustrated with R see Norm Matloff's book: From Algorithms to Z-Scores: Probabilistic and Statistical Modeling in Computer Science.

Posted by Joseph Rickert at 08:30 in beginner tips, packages, R, statistics | Permalink | Comments (4)

If you want to get started doing data science with R in the cloud, a good place to start is Stephen Elston's free O'Reilly report, Data Science in the Cloud with Azure ML and R. But if you learn better with a show-and-tell approach, he now also has an O'Reilly Video Training course, Data Science with Microsoft Azure and R. The first part of the course is free, and includes an overview of Azure ML Studio (the browser-based drag-and-drop data science workflow tool), using the built-in data import, manipulation, and modeling modules in Azure ML, and using the Execute R Script node to run custom R code. Stephen takes you through the step-by-step process of writing and testing the R code in RStudio, then running it as part of the workflow with ML Studio.

The remainder of the course must be purchased to view (current price is USD$119.99), and covers advanced R topics including the dplyr and ggplot2 packages, statistical modeling (including regression, time series and random forests), and writing functions in R. There's also a chapter on publishing Azure ML models as Web services. To get started with the free part of the course, follow the link below.

O'Reilly Video Training: Data Science with Microsoft Azure and R

Posted by David Smith at 11:38 in courses, Microsoft, R | Permalink | Comments (0)

By Virgilio Gómez Rubio, Spanish R Users Organizing Committee

As every autumn since 2009, Spanish R users gathered at their annual meeting. It was organised by the Spanish R users group 'Comunidad R-Hispano' and took place on 5-6 November in the historic city of Salamanca. The 7th Meeting of Spanish R Users attracted more than 100 R enthusiasts and provided a mix of tutorials and contributed talks within the quarters of the University of Salamanca.

First of all, from Comunidad R-Hispano we would like to thank our sponsors Revolution Analytics, Telefónica, Instituto de Ingeniería del Conocimiento, Open Sistemas and Datatons for their financial support. Publishers Springer, Wiley and CRC/Taylor & Francis also supported this meeting by providing flyers, discounts and samples of recent books.

Tutorials presented at the conference covered topics such as R on Spark, the analysis of data from telephone surveys, analysis of survival data, analysis of data obtained from randomised surveys, analysis of network data with igraph, and how to use R for the analysis of questionnaire data. All these tutorials provided a hands-on approach to the subject, and the materials are available from the conference website.

Contributed talks focused on four main areas: Applications, Interfaces/Data Mining, Statistical Methodology and Biostatistics. Altogether, there were 22 oral presentations on these topics.

Among the Applications, Marcos Fernández Arias showed how to use R to gather information available on-line to pay a fair price for a new car. Teresa González Arteaga also shared some of her teaching experiences with R in a Degree in Statistics. In the Interfaces/Data Mining section, Christian González Martel and co-authors explored using R to analyse Wikipedia searches of top Spanish companies to inform investment in the Spanish stock market.

Regarding the contributed session on Statistical Methodology, José Luis Vicente Villardón talked about his experience migrating his code for multivariate analysis using biplots from Matlab to R. Finally, in the Biostatistics section, Carlos Prieto and co-authors presented some interactive plots developed with R and other tools to visualize genomic data.

The prize to the best presentation by a young presenter was awarded to Karel López Quintero for his work on a Price Sensitivity Meter (PSM) with R.

Comunidad R-Hispano is already preparing the 8th Meeting, which will take place in November 2016 at the University of Castilla-La Mancha in Albacete, locally organised by the same team that took care of useR! 2013.

Posted by Joseph Rickert at 08:12 in events, R, user groups | Permalink | Comments (0)

One of the themes of the Christmas movie classic Love Actually is the interconnections between people of different communities and cultures, from the Prime Minister of the UK to a young student in London. StackOverflow's David Robinson brings these connections to life by visualizing the network diagram of 20 characters in the movie, based on scenes in which they appear together:

That graph is based on all but the last scene in the movie (where most of the characters come together in the airport, which makes for a less interesting cluster diagram). Until that point, Billy and Joe's story takes place independently of all the other characters, while five other characters are connected to the rest by just one scene (Mark and Sarah's conversation at a wedding). David even created an interactive Shiny app (from where I grabbed the chart above) that allows you to step through the movie scene by scene and watch the connections develop as the movie unfolds.

The network analysis behind the chart and the app was done entirely in the R language. David began by parsing the text of the movie script, which yields a data file of each character's lines labelled by scene number. From there, he created a co-occurrence matrix counting the number of times each pair of characters shared a scene, from which it was a simple process to generate the network diagram using the igraph package. David helpfully provided the R code, so if you have another movie script at hand, it should be easy to adapt. You can learn more about the details of the analysis in David's blog post, linked below.
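Here's a minimal sketch of that co-occurrence approach, using a made-up stand-in for the parsed script data (the character and scene values below are illustrative, not from the actual script):

```r
library(igraph)

# Hypothetical parsed-script data: one row per line of dialogue,
# labelled by scene number (illustrative only)
lines_df <- data.frame(
  scene     = c(1, 1, 2, 2, 2, 3),
  character = c("Billy", "Joe", "Mark", "Sarah", "Juliet", "Billy"),
  stringsAsFactors = FALSE)

# Scene-by-character incidence: did the character appear in the scene?
inc <- unclass(table(lines_df$scene, lines_df$character)) > 0

# Co-occurrence matrix: number of scenes each pair of characters shares
cooc <- crossprod(inc)
diag(cooc) <- 0

g <- graph_from_adjacency_matrix(cooc, mode = "undirected", weighted = TRUE)
plot(g)
```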

Variance Explained: Analyzing networks of characters in 'Love Actually'

Posted by David Smith at 12:26 in data science, R | Permalink | Comments (0)

Happy New Year everyone! It's hard to believe that this blog has now been going since 2008: our anniversary was on December 9. Thanks to everyone who has supported this blog over the past 7 years by reading, sharing and commenting on our posts, and an extra special thanks to my co-bloggers Joe Rickert and Andrie de Vries and all the guest bloggers from Microsoft and elsewhere that have contributed this year.

2015 was a busy year for the blog, with an 8% increase in users and a 13% increase in page views compared to 2014. The most popular posts of the year, starting with the most popular, were:

- Revolution Analytics joins Microsoft (January 23)
- In-database R coming to SQL Server 2016 (May 15)
- R at Microsoft (June 26)
- Using R with Jupyter Notebooks, by Andrie de Vries (September 9)
- Association Rules and Market Basket Analysis with R (April 8)
- New packages for reading data into R — fast (Apr 10)
- SparkR: Distributed data frames with Spark and R (Jun 12)
- Because it's Friday: Visualizing the Discrete Fourier Transform (Sep 18)
- Revolution Analytics ∈ Microsoft == TRUE (Apr 6)
- Parallel Programming with GPUs and R, by Norman Matloff (Jan 27)

That's all from the team here at *Revolutions* for this week, and indeed for this year! We'll be back in the New Year with more news, tips and tricks about R, but in the meantime we'll let R have the last word, thanks to some careful seed selection by Berry Boessenkool:

```r
> set.seed(31612310)
> paste0(sample(letters, 5, T))
[1] "h" "a" "p" "p" "y"
> set.seed(12353)
> sample(0:9, 4, T)
[1] 2 0 1 6
```

Posted by David Smith at 04:00 in R | Permalink | Comments (1)

It's been a banner year for the R project in 2015, with frequent new releases, ever-growing popularity, a flourishing ecosystem, and accolades from both users and press. Here's a roundup of the big events for R from 2015.

R continues to advance under the new leadership of the R Foundation. There were *five* updates in 2015: R 3.1.3 in March, R 3.2.0 in April, R 3.2.1 in June, R 3.2.2 in August, and R 3.2.3 in December. That's an impressive release rate, especially for a project that's been in active development for 18 years!

R's popularity continued unabated in 2015. R is the most popular language for data scientists according to the 2015 Rexer survey, and the most popular Predictive Analytics / Data Mining / Data Science software in the KDnuggets software poll. While R's popularity amongst data scientists is no surprise, R ranked highly even amongst general-purpose programming languages. In July, R placed #6 in the IEEE list of top programming languages, rising 3 places from its 2014 ranking. It also continues to rank highly amongst StackOverflow users, where it is the 8th most popular language by activity, and the fastest-growing language by number of questions. R was also a top-ranked language on GitHub in 2015.

The R Consortium, a trade group dedicated to the support and growth of the R community, was founded in June. Already, the group has published best practices for secure use of R, and formed the Infrastructure Steering Committee to fund and oversee community projects. Its first project (a hub for R package developers) was funded in November, and proposals are being accepted for future projects.

2015 was the year that Microsoft put its weight behind R, beginning with the acquisition of Revolution Analytics in April and prominent R announcements at the BUILD Conference in May. Microsoft continues the steady pace of open-source R project releases, with regular updates to Revolution R Open, DeployR Open and the foreach and checkpoint packages. Revolution R Enterprise saw updates, and new releases of several Microsoft platforms have integrated R, including SQL Server 2016, Cortana Analytics, PowerBI, Azure and the Data Science Virtual Machine.

Activity within local R user groups accelerated in 2015, with 18 new groups founded for a total of 174. Microsoft expanded its R user group sponsorship with the Microsoft Data Science User Group Program. Community conferences also boasted record attendance, including at useR! 2015, R/Finance, EARL Boston, and EARL London. Meanwhile, companies including Betterment, Zillow, Buzzfeed, the New York Times and many others shared how they benefit from R.

R also got some great coverage in the media this year, with features in Priceonomics, TechCrunch, Nature, Inside BigData, Mashable, The Economist, opensource.com and many other publications.

That's a pretty big year ... and we expect even more from R in 2016. A big thanks go out to everyone in the R community, and especially the R Core group, for making R the standout success it is today. Happy New Year!

Posted by David Smith at 08:44 in R, roundups | Permalink | Comments (0)

by Andrie de Vries

A week ago my high school friend, @XLRunner, sent me a link to the article "How Zach Bitter Ran 100 Miles in Less Than 12 Hours". Zach's effort was rewarded with the American record for the 100 mile event.

This reminded me of some analysis I did, many years ago, of the world record speeds for various running distances. The International Association of Athletics Federations (IAAF) keeps track of world records for distances from 100m up to the marathon (42km). The distances longer than 42km do not fall in the IAAF event list, but these are also tracked by various other organisations.

You can find a list of IAAF world records at Wikipedia, and a list of ultramarathon world best times at Wikipedia.

I extracted only the men's running events from these lists, and used R to plot the average running speeds for these records:
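As a rough sketch of the kind of calculation involved, here's average speed (distance divided by time) for a handful of well-known, approximate 2015-era men's records, rather than the full lists:

```r
# Average speed = distance / time for a few world records
# (times are approximate and for illustration only)
records <- data.frame(
  event  = c("100m", "1500m", "10000m", "Marathon"),
  dist_m = c(100, 1500, 10000, 42195),
  time_s = c(9.58, 206.00, 1577.53, 7377))
records$speed_ms <- records$dist_m / records$time_s
plot(records$dist_m, records$speed_ms, log = "x", type = "b",
     xlab = "Distance (m, log scale)", ylab = "Average speed (m/s)")
```

Record speed falls off steadily as distance increases, which is what makes the piecewise (segmented) regression in the full analysis interesting.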

Continue reading "Using segmented regression to analyse world record running times" »

Posted by Andrie de Vries at 09:37 in R, sports, statistics | Permalink | Comments (3)

by Matt Parker, Data Scientist at Microsoft

One of the great advantages of R's openness is its extensibility. R's abundant packages are the most conspicuous example of that extensibility, and Revolution R Enterprise is a powerful example of how far it can stretch.

But R is also part of an entire ecosystem of open tools that can be linked together. For example, Markdown, Pandoc, and knitr combine to make R an incredible tool for dynamic reporting and reproducible research. If your chosen output format is HTML, you've linked into yet another open ecosystem with countless further extensions.

One of those extensions - and the focus of this post - is jQuery UI. jQuery UI makes a set of JavaScript's most useful moves available to developers as a robust, easy-to-implement toolkit ideal for adding a bit of interactivity to your knitr reports.

For example: it's easy to use jQuery UI's Tabs widget to split a long report across several tabs of a webpage. Tabs are great for splitting complex reports up by topic, or for providing different types of users with customized views of the results.

To get a sense of what this conversion might look like, here's a simple R-Markdown report without tabs (Rmd source):

... and the same report with tabs (source):

Here's how I added tabs to the report.

1) First, I downloaded jQuery UI. Picking the right place to store the library can be tricky, but as long as the jQuery UI files are accessible to knitr when it's building the report, you'll be okay. For this demo report, I just unzipped the files right next to the .rmd source.

2) Next, I added a few lines to the `<head>` element of the report. Every webpage has a `<head>` element. knitr would typically build this for you, but in this case we need to write it manually to be sure that the jQuery UI scripts and CSS are linked in the HTML output.

```
<head>
<meta charset="utf-8">
<title>Reported Active Tuberculosis Cases in the United States, 1993-2013</title>
<link rel="stylesheet" href="jquery-ui/jquery-ui.min.css">
<script src="jquery-ui/external/jquery/jquery.js"></script>
<script src="jquery-ui/jquery-ui.js"></script>
<script>
$(function() {
$( "#tabs" ).tabs();
});
</script>
</head>
```

3) Next, I created the navigation bar by creating an HTML chunk (`div`) with a list inside of it (`ul`). Each item in that list (`li`) represents one tab that I'd like the page to have. Finally, I make each of those list items a link with a unique tag (`<a href="#nation">`), and give the link a title (`Nationally`, `By State`, `Treatment Completion`).

```
<div id="tabs">
<ul>
<li><a href="#nation">Nationally</a></li>
<li><a href="#states">By State</a></li>
<li><a href="#treatment">Treatment Completion</a></li>
</ul>
```

Don't worry if you don't understand the HTML syntax here - you can just copy and edit the code above.

4) Finally, I marked out which sections of R-Markdown I wanted to put on each tab by surrounding that section with a `div`:

````
<div id="nation">
## Reported Active TB Cases in the United States, 1993-2013

```{r nation}
tbstats %>%
  group_by(Year) %>%
  summarise(n_cases = sum(Count)) %>%
  ggplot(aes(x = Year, y = n_cases)) +
  geom_line(size = 2) +
  labs(x = "Year Reported",
       y = "Number of Cases",
       title = "Reported Active TB Cases in the United States") +
  expand_limits(y = 0)
```
</div>
````

There are two crucial details here:

- the `div` has an `id` that corresponds to one of the tabs I've created (`href="#nation"` corresponds to `<div id="nation">`)
- the `div` is closed with a `</div>` tag. Without this, the entire report would be included on the first tab.

5) Click the "Knit HTML" button! knitr will convert your R-Markdown into plain Markdown, and then call Pandoc to complete the conversion into gloriously-tabbed HTML.
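The first half of that conversion can also be driven from the R console rather than the button; a minimal sketch (the report content and filename here are illustrative), which knits R-Markdown to plain Markdown, the file Pandoc would then turn into HTML:

```r
library(knitr)

# Write a one-line R-Markdown file and knit it to plain Markdown
rmd <- file.path(tempdir(), "report.Rmd")
writeLines("Two plus two is `r 2+2`.", rmd)
md <- knit(rmd, output = file.path(tempdir(), "report.md"), quiet = TRUE)
readLines(md)
# [1] "Two plus two is 4."
```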

Tabs are very handy for reporting - but the whole HTML/CSS/JavaScript ecosystem is at your disposal. If you've seen other good reporting tricks in HTML, let us know in the comments below.

Posted by David Smith at 08:18 in developer tips, R | Permalink | Comments (6)

Got comments or suggestions for the blog editor?

Email David Smith.

