by Joseph Rickert
Broadly speaking, a meta-analysis is any statistical analysis that attempts to combine the results of several individual studies. The term was apparently coined by statistician Gene V Glass in a 1976 speech to the American Educational Research Association. Since that time, meta-analysis has not only become a fundamental tool in medicine, but it is also becoming popular in economics, finance, the social sciences and engineering. Organizations responsible for setting standards for evidence-based medicine, such as the United Kingdom’s National Institute for Health and Care Excellence (NICE), make extensive use of meta-analysis.
The application of meta-analysis to medicine is intuitive and, on the surface, compelling. Clinical trials designed to test the efficacy of a new treatment for a disease against the standard treatment tend to be based on relatively small samples. (For example, the largest of the four trials for Respiratory Tract Diseases currently listed on ClinicalTrials.gov has an estimated enrollment of 533 patients.) It would seem to be a “no-brainer” to use “all of the information” to get more accurate results. However, as with so many things, the devil is in the details. The preliminary tasks of establishing a rigorous protocol for guiding the meta-analysis and conducting the systematic review to search for relevant studies are themselves far from trivial. One has to work hard to avoid “selection bias”, “publication bias” and other even more subtle difficulties.
In my limited experience with meta-analysis, I found it extraordinarily difficult to determine whether patient populations from different clinical trials were sufficiently homogeneous to be included in the same meta-analysis. Even when working with well-written papers published in quality journals, a considerable amount of medical expertise was required to interpret the data. I came away with the strong impression that a good meta-analysis requires collaboration from a team of experts.
Historically, it has probably been the case that most meta-analyses were conducted either with general tools such as Excel or with specialized software like RevMan from the Cochrane Collaboration. However, R is the natural platform for meta-analysis, both because of the myriad possibilities for statistical analyses that are not generally available through the specialized software, and because of the many packages devoted to various aspects of meta-analysis. The CRAN MetaAnalysis Task View is exceptionally well-organized, listing R packages according to the different stages of conducting a meta-analysis and also calling out some specialized techniques such as meta-regression and network meta-analysis.
In a future post, I hope to explore some of these packages more closely. For now, let’s look at a very simple analysis based on Thomas Lumley’s rmeta package, which dates back to 1999. The following simple meta-analysis is written up very nicely in the book by Chen and Peace titled Applied Meta-Analysis with R.
The cochrane data set in the rmeta package contains the results from seven randomized clinical trials designed to test the effectiveness of corticosteroid therapy in preventing neonatal deaths in premature labor. The columns of the data set are: the name of the trial center, the number of deaths in the treatment group, the total number of patients in the treatment group, the number of deaths in the control group and the total number of patients in the control group.
The null hypothesis is that there is no difference between treatment and control. Following Chen and Peace, we fit both fixed effects and random effects models to look at the odds ratios.
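The fitting step looks like this (a sketch: the calls follow the Call line shown in the rmeta output below; treating model.FE and model.RE as the fixed-effects and random-effects objects is my naming assumption):

```r
# Fit fixed-effects (Mantel-Haenszel) and random-effects (DerSimonian-Laird)
# models with rmeta; both functions take the same arguments.
library(rmeta)
data(cochrane)
model.FE <- meta.MH(ntrt = n.trt, nctrl = n.ctrl, ptrt = ev.trt,
                    pctrl = ev.ctrl, names = name, data = cochrane)
model.RE <- meta.DSL(ntrt = n.trt, nctrl = n.ctrl, ptrt = ev.trt,
                     pctrl = ev.ctrl, names = name, data = cochrane)
summary(model.FE)
summary(model.RE)
```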
The summary for the fixed effects model shows that while only two studies, Auckland and Doran, individually show a significant effect, the overall confidence interval from the Mantel-Haenszel test does indicate a benefit from the treatment.
Fixed effects ( Mantel-Haenszel ) meta-analysis
Call: meta.MH(ntrt = n.trt, nctrl = n.ctrl, ptrt = ev.trt,
    pctrl = ev.ctrl, names = name, data = cochrane)
------------------------------------
                OR (lower  95% upper)
Auckland      0.58   0.38       0.89
Block         0.16   0.02       1.45
Doran         0.25   0.07       0.81
Gamsu         0.70   0.34       1.45
Morrison      0.35   0.09       1.41
Papageorgiou  0.14   0.02       1.16
Tauesch       1.02   0.37       2.77
------------------------------------
Mantel-Haenszel OR = 0.53  95% CI ( 0.39, 0.73 )
Test for heterogeneity: X^2( 6 ) = 6.9 ( p-value 0.3303 )
The summary for the random effects model for this data is identical except, as one would expect, the overall confidence interval is somewhat wider: SummaryOR = 0.53 95% CI ( 0.37, 0.78 ). A slightly modified version of the forest plot code provided by Chen and Peace (which works for both the fixed effects and random effects model objects) shows the typical way to present these results.
CPplot <- function(model){
  # text columns of the forest plot
  c1 <- c("", "Study", model$names, NA, "Summary")
  c2 <- c("Deaths", "(Steroid)", cochrane$ev.trt, NA, NA)
  c3 <- c("Deaths", "(Placebo)", cochrane$ev.ctrl, NA, NA)
  c4 <- c("", "OR", format(exp(model[[1]]), digits=2), NA,
          format(exp(model[[3]]), digits=2))
  tableText <- cbind(c1, c2, c3, c4)
  # log odds ratios and 95% confidence limits
  mean   <- c(NA, NA, model[[1]], NA, model[[3]])
  stderr <- c(NA, NA, model[[2]], NA, model[[4]])
  low <- mean - 1.96*stderr
  up  <- mean + 1.96*stderr
  forestplot(tableText, mean, low, up, zero=0,
             is.summary=c(TRUE, TRUE, rep(FALSE, 8), TRUE),
             clip=c(log(0.1), log(2.5)), xlog=TRUE)
}
CPplot(model.FE)
The whole idea of meta-analysis is intriguing. However, because of the challenges I mentioned above, I would be remiss not to point out that it elicits considerable criticism. The article “Meta-analysis and its problems” by H J Eysenck captures the issues and is well worth reading. Also, have a look at the review article by Walker, Hernandez and Kattan in the Cleveland Clinic Journal of Medicine.
With the growing popularity of R, there is an associated increase in the popularity of online forums to ask questions. One of the most popular sites is StackOverflow, where more than 60 thousand questions have been asked and tagged to be related to R.
On the same page, you can also find related tags. Among the top 15 tags associated with R, several are also packages you can find on CRAN:
It is very easy to install these packages directly from CRAN using the R function install.packages(), but this will also install all of their package dependencies.
This leads to the question: How can one determine all these dependencies?
It is possible to do this using the function available.packages() and then querying the resulting object.
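For instance (a sketch using base R plus the tools package; the mirror URL follows the one used later in this post, and the exact result will change as CRAN evolves):

```r
# Fetch the CRAN package database and query it for the recursive
# dependencies of a single package.
pkgs <- available.packages(contriburl = contrib.url("http://cran.revolutionanalytics.com"))
deps <- tools::package_dependencies("ggplot2", db = pkgs,
                                    which = c("Depends", "Imports", "LinkingTo"),
                                    recursive = TRUE)
length(deps[["ggplot2"]])  # number of packages ggplot2 ultimately requires
```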
But it is easier to answer this question using the functions in a new package, called miniCRAN, that I am working on. I have designed miniCRAN to allow you to create a mini version of CRAN behind a corporate firewall. You can use some of the functions in miniCRAN to list packages and their dependencies, in particular:
I illustrate these functions in the following scripts.
Start by loading miniCRAN and retrieving the available packages on CRAN. Use the function pkgAvail() to do this:
library(miniCRAN)
pkgdata <- pkgAvail(repos = c(CRAN="http://cran.revolutionanalytics.com"), type="source")
head(pkgdata[, c("Depends", "Suggests")])
##                                              Depends              Suggests
## A3                  "R (>= 2.15.0), xtable, pbapply" "randomForest, e1071"
## abc              "R (>= 2.10), nnet, quantreg, MASS"                    NA
## abcdeFBA    "Rglpk,rgl,corrplot,lattice,R (>= 2.10)"           "LIM,sybil"
## ABCExtremes              "SpatialExtremes, combinat"                    NA
## ABCoptim                                          NA                    NA
## ABCp2                                         "MASS"                    NA
Next, use the function pkgDep() to get dependencies of the 7 popular tags on StackOverflow:
tags <- c("ggplot2", "data.table", "plyr", "knitr", "shiny", "xts", "lattice")
pkgList <- pkgDep(tags, availPkgs=pkgdata, suggests=TRUE)
pkgList
##  [1] "abind"        "bit64"        "bitops"       "Cairo"
##  [5] "caTools"      "chron"        "codetools"    "colorspace"
##  [9] "data.table"   "dichromat"    "digest"       "evaluate"
## [13] "fastmatch"    "foreach"      "formatR"      "fts"
## [17] "ggplot2"      "gtable"       "hexbin"       "highr"
## [21] "Hmisc"        "htmltools"    "httpuv"       "iterators"
## [25] "itertools"    "its"          "KernSmooth"   "knitr"
## [29] "labeling"     "lattice"      "mapproj"      "maps"
## [33] "maptools"     "markdown"     "MASS"         "mgcv"
## [37] "mime"         "multcomp"     "munsell"      "nlme"
## [41] "plyr"         "proto"        "quantreg"     "RColorBrewer"
## [45] "Rcpp"         "RCurl"        "reshape"      "reshape2"
## [49] "rgl"          "RJSONIO"      "scales"       "shiny"
## [53] "stringr"      "testit"       "testthat"     "timeDate"
## [57] "timeSeries"   "tis"          "tseries"      "XML"
## [61] "xtable"       "xts"          "zoo"
Wow: these 7 packages, together with their dependencies, span 63 packages in total!
You can visualise these dependencies in a graph by using the function makeDepGraph():
p <- makeDepGraph(pkgList, availPkgs=pkgdata)
library(igraph)
plotColours <- c("grey80", "orange")
topLevel <- as.numeric(V(p)$name %in% tags)
par(mai=rep(0.25, 4))
set.seed(50)
vColor <- plotColours[1 + topLevel]
plot(p, vertex.size=8, edge.arrow.size=0.5, vertex.label.cex=0.7,
     vertex.label.color="black", vertex.color=vColor)
legend(x=0.9, y=0.9, legend=c("Dependencies", "Initial list"),
       col=c(plotColours, NA), pch=19, cex=0.9)
text(0.9, 0.75, expression(xts %->% zoo), adj=0, cex=0.9)
text(0.9, 0.8, "xts depends on zoo", adj=0, cex=0.9)
title("Package dependency graph")
So, if you wanted to install the 7 most popular R packages (according to StackOverflow), R would in fact download and install up to 63 different packages!
The annual worldwide user conference useR! 2014 is underway at UCLA, beginning with a full day of tutorials. This year's useR! conference is a record-breaker with more than 700 attendees, so most of the tutorial sessions have been jam-packed. The tutorials cover a diverse array of R applications: data management, visualization, statistics and biostatistics, programming, and interactive applications. Follow the links below for more details about the packages and methods covered — some authors have already provided slides for their tutorials (and those that haven't probably will soon).
useR! 2014: Tutorials
Hadley Wickham's been working on the next-generation update to ggplot2 for a while, and now it's available on CRAN. The ggvis package is completely new, and combines a chaining syntax reminiscent of dplyr with the grammar of graphics concepts of ggplot2. The resulting charts are web-ready in scalable SVG format, and can easily be made interactive thanks to RStudio's shiny package.
For example, here's the code to create a scatterplot with a smoothing line from the mtcars data set:
mtcars %>%
ggvis(~wt, ~mpg) %>%
layer_points() %>%
layer_smooths()
And here's the corresponding SVG image:
SVG graphics are great online, because they're compact (this one's just 25Kb) and look great whatever size they're displayed at (it's a vector format, so you never get pixellation). The only thing SVGs don't work well for is charts with millions of elements (points, lines, etc.), because then they can be large and slow to render. (The only other downside is that our blogging platform, TypePad, doesn't support SVG with its image tools, so I had to insert an <image> element into the HTML directly.)
You can easily add interactivity to a chart, by specifying parameters as input controls rather than numbers. Here's the code for the same chart, with a slider to specify the smoothing parameter and point size:
mtcars %>%
ggvis(~wt, ~mpg) %>%
layer_smooths(span = input_slider(0.5, 1, value = 1)) %>%
layer_points(size := input_slider(100, 1000, value = 100))
If you run that code in RStudio you'll get an interactive chart, or go here to see the same interactivity on a web page, rendered with RStudio's Shiny. For more details, check out the ggvis website linked below.
RStudio: ggvis 0.3 overview
To play in a World Cup national soccer team, a player must be a citizen of that country. But most World Cup players don't regularly play in the nation of their World Cup team. Some hold dual citizenship; others simply play for a league team in a foreign country where citizenship rules don't apply.
In this elegant chart, Guy Abel, a statistician and R programmer at the Vienna Institute of Demography, illustrates how the World Cup national teams are drawn from League players from around the world. (Click to enlarge.)
The arrows on the chart flow FROM the World Cup national teams TO the countries where the players currently play in league teams. Most of the players in Australia's World Cup team, for example, actually play for teams in the USA, South Korea, and European league teams. By contrast, about a third of Italy's team and almost all of Russia's play for domestic leagues (note the arrows folding back on themselves indicating players who play in home leagues).
The chart was created in the R language using the circlize package. The underlying data was scraped from Wikipedia, and the code to create this plot is available on GitHub. Guy gives several other examples (with R code) of creating such "circular migration flow plots" on his blog.
Guy Abel: 2014 World Cup Squads
by Ilya Kipnis
In this post, I will demonstrate how to obtain, stitch together, and clean futures data from Quandl for backtesting. Quandl was previously introduced on the Revolutions blog. The functions I will be using can be found in my IKTrading package, available on my GitHub page.
With backtesting, it is often easy to get data for equities and ETFs. However, ETFs are fairly recent financial instruments, making it difficult to conduct long-running backtests (most of the ETFs launched before 2003 are equity ETFs). Furthermore, equities are all correlated in some way, shape, or form to their respective index (S&P 500, Russell, etc.), and their correlations generally go to 1 right when you want to be diversified.
An excellent source of diversification is the futures markets, which contain contracts on instruments ranging as far and wide as metals, forex, energies, and more. Unfortunately, futures are not continuous in nature, and data for futures are harder to find.
Thanks to Quandl, however, there is some freely available futures data. The link can be found here.
The way Quandl structures its futures is that it uses two separate time series: the first is the front month, which is the contract nearest expiry, and the second is the back month, which is the next contract. Quandl’s rolling algorithm can be found here.
In short, Quandl rolls in a very simple manner; however, for all practical purposes, it is also incorrect. The reason is that no practical trader holds a contract to expiry. Instead, traders roll their contracts some time before the expiry of the front month, based on some metric.
This algorithm uses the open interest cross to roll from front to back month and then lags that by a day (since open interest is observed at the end of trading days), and then “rolls” back when the front month open interest overtakes back month open interest (in reality, this is the back month contract becoming the new front month contract). Furthermore, the algorithm does absolutely no adjusting to contract prices. That is, if the front month is more expensive than the back month, a long position would lose the roll premium and a short position would gain it. This is in order to prevent the introduction of a dominating trend bias. The reason that the open interest is chosen is displayed in the following graph:
This is the graph of the open interest of the front month of oil in 2000 (the black time series), with the open interest of the back month contract in red. They cross under and over each other in repeatable fashion, making the open interest cross a good signal for when to roll the contract.
Let’s look at the code:
quandClean <- function(stemCode, start_date=NULL, end_date=NULL, verbose=FALSE, ...) {
The arguments to the function are a stem code, a start date, an end date, and a verbose flag (for debugging purposes). The stem code takes the form CHRIS/<<EXCHANGE>>_<<CONTRACT STEM>>, such as “CHRIS/CME_CL” for oil.
require(Quandl)
if(is.null(start_date)) {start_date <- Sys.Date() - 365*1000}
if(is.null(end_date)) {end_date <- Sys.Date() + 365*1000}
frontCode <- paste0(stemCode, 1)
backCode <- paste0(stemCode, 2)
front <- Quandl(frontCode, type="xts", start_date=start_date, end_date=end_date, ...)
interestColname <- colnames(front)[grep(pattern="Interest", colnames(front))]
front <- front[,c("Open","High","Low","Settle","Volume",interestColname)]
colnames(front) <- c("O","H","L","C","V","OI")
back <- Quandl(backCode, type="xts", start_date=start_date, end_date=end_date, ...)
back <- back[,c("Open","High","Low","Settle","Volume",interestColname)]
colnames(back) <- c("BO","BH","BL","BS","BV","BI") #B for Back
#combine front and back for comparison
both <- cbind(front,back)
This code simply fetches both futures contracts from Quandl and combines them into one xts object. Although Quandl takes a type argument, I have programmed this function specifically for xts objects, since I will use xts-dependent functionality later.
Let's move along.
#impute NAs in open interest with -1
both$BI[is.na(both$BI)] <- -1
both$OI[is.na(both$OI)] <- -1
both$lagBI <- lag(both$BI)
both$lagOI <- lag(both$OI)
#impute bad back-month open-interest prints --
#if it is truly a low quantity, it won't make a
#difference in the computation.
both$OI[both$OI==-1] <- both$lagOI[both$OI==-1]
both$BI[both$BI==-1] <- both$lagBI[both$BI==-1]
This is the first of the countermeasures in the function taken to counteract messy data. It imputes any open-interest NAs with the placeholder value -1, and then replaces those placeholders with the previous day's open interest. Usually, days on which open interest is not available are days on which the contract is lightly traded, so the values imputed for days when the contract was not traded will be negligible. However, imputing an NA value with a zero in the midst of heavy trading has the potential to display the wrong contract as the one with the higher open interest.
both$OIdiff <- both$OI - both$BI
both$tracker <- NA
#the formal open interest cross from front to back
both$tracker[both$OIdiff < 0] <- -1
both$tracker <- lag(both$tracker)
#since we have to observe the OI cross, we roll the next day
#any time we're not on the back contract, we're on the front contract
both$tracker[both$OIdiff > 0] <- 1
both$tracker <- na.locf(both$tracker)
This code sets up the system for keeping track of which contract is in use. When the difference in open interest crosses under zero, that's the formal open interest cross, and we roll a day later. On the other hand, when the open interest difference crosses back over zero, that isn't a cross. That is the back month contract becoming the front month contract. For instance, assume that you rolled to the June contract in the third week of May. Quandl would display the June contract as the back contract in May, but come June, that June contract is now the front contract instead. So therefore, there is no lag on the computation in the second instance.
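The roll rule is easy to check on a toy example (synthetic open-interest numbers, plain vectors instead of xts columns, and a simple loop standing in for na.locf()):

```r
OI <- c(100, 90, 60, 40, 20, 10)   # front-month open interest
BI <- c( 10, 30, 70, 80, 90, 95)   # back-month open interest
OIdiff <- OI - BI                  # crosses below zero on day 3
tracker <- rep(NA_real_, length(OIdiff))
tracker[OIdiff < 0] <- -1             # days the back month dominates...
tracker <- c(NA, head(tracker, -1))   # ...lagged a day: roll after the observed cross
tracker[OIdiff > 0] <- 1              # days the front month dominates (no lag)
for (i in seq_along(tracker)[-1])     # carry the last value forward, like na.locf()
  if (is.na(tracker[i])) tracker[i] <- tracker[i - 1]
tracker  # 1 1 1 -1 -1 -1: the front contract is held one day past the cross
```

Note that day 3, the day of the cross itself, is still tagged as the front contract; the switch to -1 only happens on day 4, matching the "observe at the close, roll the next day" logic described above.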
frontRelevant <- both[both$tracker==1, c(1:6)]
backRelevant <- both[both$tracker==-1, c(7:12)]
colnames(frontRelevant) <- colnames(backRelevant) <- c("Open","High","Low","Close","Volume","OpenInterest")
relevant <- rbind(frontRelevant, backRelevant)
relevant[relevant==0] <- NA
# remove any incomplete days, print a message saying
# how many removed days
# print them if desired
instrument <- gsub("CHRIS/", "", stemCode)
relevant$Open[is.na(relevant$Open)] <- relevant$Close[(which(is.na(relevant$Open)) - 1)]
NAs <- which(is.na(relevant$Open) | is.na(relevant$High) | is.na(relevant$Low) | is.na(relevant$Close))
if(verbose) {
  message(paste(instrument, "had", length(NAs), "incomplete days removed from data."))
  print(relevant[NAs,])
}
if(length(NAs) > 0) {
  relevant <- relevant[-NAs,]
}
Using the previous tracker variable, the code is then able to compile the relevant data for the futures contract. That is, front contract when the front contract is more heavily traded, and vice versa.
This code uses xts-dependent functionality with the rbind call. In this instance, there are two separate streams: the front month stream and the back month stream. Through the use of xts functionality, it's possible to merge the two streams indexed by time.
Next, the code imputes any NA open values with the close (settle) from the previous trading day. In cases where the open is the only missing field, I opted for this over removing the observation entirely. After that, any observation with a missing open, high, low, or close value gets removed. This is simply my personal preference, rather than attempting to take liberties by imputing the highs, lows, and closes from the previous day, or some other pattern thereof.
If verbose is enabled, the function will print the actual data removed.
ATR <- ATR(HLC=HLC(relevant))
#Technically somewhat cheating, but could be stated in terms of
#lags 2, 1, and 0.
#A spike is defined as a data point on Close that's more than
#5 ATRs away from both the preceding and following day.
spikes <- which(abs((relevant$Close - lag(relevant$Close))/ATR$atr) > 5
                & abs((relevant$Close - lag(relevant$Close, -1))/ATR$atr) > 5)
if(verbose) {
  message(paste(instrument, "had", length(spikes), "spike days removed from data."))
  print(relevant[spikes,])
}
if(length(spikes) > 0){
  relevant <- relevant[-spikes,]
}
out <- relevant
return(out)
}
Finally, some countermeasures against spiky types of data. I define a spike as a price move in the closing price which is 5 ATRs (in this case, n=14) away in either direction from both the previous and next day. Spikes are removed. After this, the code is complete.
To put this into perspective visually, here is a plot of the 30day Federal Funds rate (CHRIS/CME_FF), from 2008, demonstrating all the improvements my process makes to Quandl’s raw data in comparison to the front month continuous (current) contract.
The raw, front-month data is displayed in black (the long lines are missing data from Quandl, displayed as zeroes but modified in scale for the sake of the plot). The results of the algorithm are presented in blue.
At the very beginning, it’s apparent that the more intelligent rolling algorithm adapts to what would be the new contract prices sooner. Secondly, all of those long bars on which Quandl had missing data have been removed so as not to interfere with calculations. Lastly, at the very end, that downward “spike” in prices has also been dealt with, making for what appears to be a significantly more correct pricing series.
To summarize, here's what the code does:
1) Downloads the two data streams
2) Keeps track of the proper contract at all time periods
3) Imputes or removes bad data, bad data being defined as incomplete observations or spikes in the data.
The result is an xts object practically identical to one downloaded for more commonly found data, such as equities or ETFs, which allows for a greater array of diversification in terms of the instruments on which to backtest trading strategies, such as with the quantstrat package.
The results of such backtests can be found on my blog, and my two R packages (this functionality will be available in my IKTrading package) can be found on my Github page.
by Joseph Rickert
Last week, I had the opportunity to participate in the Second Academy of Science and Engineering (ASE) Conference on Big Data Science and Computing at Stanford University. Since the conference was held simultaneously with two other conferences, one on Social Computing and the other on Cyber Security, it was definitely not an R crowd, and not even a typical Big Data crowd. Talks from the three programs were intermixed throughout the day, so at any given moment you could find yourself looking for common ground in a conversation with mostly R-aware but language-impartial fellow attendees. I don’t know whether this method of organization was the desperate result of necessity or a stroke of genius, but I thought it worked out very well and made for a stimulating interaction dynamic. The ASE conference must have been a difficult program to set up. The organizers, however, did a wonderful job mashing talks and themes together to make for an excellent experience.
There were several very good talks at the conference; however, the tutorial on Deep Learning and Natural Language Processing given by Richard Socher was truly outstanding. Richard is a PhD student in Stanford’s Computer Science Department studying under Chris Manning and Andrew Ng. Very rarely do you come across such a polished speaker with complete and casual command of complex material. And while the delivery was impressive, the content was jaw-dropping. Richard walked through the Deep Learning methodology and tools being developed in Stanford’s AI lab and showed a number of areas where Deep Learning techniques are yielding notable results; for example, a system for single-sentence sentiment detection that improved positive/negative sentence classification by 5.4%. Have a look at Andrew Ng’s or Christopher Manning’s lists of publications to get a good idea of the outstanding work being done in this area.
A key concept covered in the tutorial is the ability to represent natural language structures, parsing trees for example, in a finite dimensional vector space, and to build the theoretical and software tools in such a way that the same methods can be used to deconstruct and represent other hierarchies. The following slide indicates how structures built for Natural Language Processing (NLP) can also be used to represent images.
This ability to bring a powerful, integrated set of tools to many different areas seems to be a key reason why neural nets and Deep Learning are suddenly getting so much attention. In a tutorial similar to the one Richard gave on Saturday, Richard and Chris Manning attribute the recent resurgence of Deep Learning to three factors:
The software used in the NLP and Deep Learning work at Stanford seems to be mostly based on Python and C. (See Theano and SENNA, for example.) So far, it does not appear that much Deep Learning work is being done with R. However, things are looking up. 0xdata’s H2O Deep Learning implementation is showing impressive results, and this algorithm is available in the h2o R package. Also, the R package darch and the very recent deepnet package, both of which offer implementations of Restricted Boltzmann Machines, indicate that Deep Learning researchers are working in R.
Finally, to get a quick overview of the area, have a look at the book Deep Learning: Methods and Applications by Li Deng and Dong Yu of Microsoft Research, which is available online.
by Joseph Rickert
I was very happy to have been able to attend R/Finance 2014, which wrapped up a couple of weeks ago. In general, the talks were at a very high level of play, some dealing with brand new ideas and many presented at a significant level of technical or mathematical sophistication. Fortunately, most of the slides from the presentations are quite detailed and available at the conference site. Collectively, these presentations provide a view of the boundaries of the conceptual space imagined by the leaders in quantitative finance. Some of this space covers infrastructure issues involving ideas for pushing the limits of R (Some Performance Improvements for the R Engine), building new infrastructure (New Ideas for Large Network Analysis), or (Building Simple Data Caches), for example. Others are involved with new computational tools (Solving Cone Constrained Convex Programs) or attempt to push the limits of getting actionable insight from mathematical abstractions: (Portfolio Inference with One Weird Trick) or (Twinkle, Twinkle, Little STAR: Smooth Transition AR Models in R), for example.
But while the talks may be illuminating, the real takeaways from the conference are the R packages. These tools embody the work of the thought leaders in the field of computational finance and are the means for anyone sufficiently motivated to understand this cutting-edge work. By my count, 20 of the 44 tutorials and talks given at the conference were based on a particular R package. Some of the packages listed below are well-established, and others are works-in-progress sitting out on R-Forge or GitHub, providing opportunities for the interested to get involved.
R/Finance 2014 Talks and the packages they describe:

- Introduction to data.table — extension of the data frame
- An Example-Driven Hands-on Introduction to Rcpp — functions to facilitate integrating R with C++
- Portfolio Optimization: Utility, Computation, Equities Applications — environment for teaching Financial Engineering and Computational Finance
- Re-Evaluation of the Low Risk Anomaly via Matching — implementation of the Coarsened Exact Matching algorithm
- BCP Stability Analytics: New Directions in Tactical Asset Management — Bayesian analysis of change point problems
- On the Persistence of Cointegration in Pairs Trading — Engle-Granger cointegration models
- Tests for Robust Versus Least Squares Factor Model Fits — robust methods
- The R Package cccp: Solving Cone Constrained Convex Programs — solver for convex problems with cone constraints
- Twinkle, Twinkle, Little STAR: Smooth Transition AR Models in R — modeling smooth transition models
- Asset Allocation with Higher Order Moments and Factor Models — global optimization by differential evolution / numerical methods for portfolio optimization
- Event Studies in R — event study and extreme event analysis
- An R Package on Credit Default Swaps — tools for pricing credit default swaps
- New Ideas for Large Network Analysis, Implemented in R — implicitly restarted Lanczos methods for R
- Intermediate and Long Memory Time Series — simulate and detect intermediate and long memory processes (in development)
- stochvol: Dealing with Stochastic Volatility in Time Series — efficient Bayesian inference for stochastic volatility (SV) models
- Divide and Recombine for the Analysis of Large Complex Data with R — package for using R with Hadoop
- gpusvcalibration: Fast Stochastic Volatility Model Calibration using GPUs — fast calibration of stochastic volatility models for option pricing
- The FlexBayes Package — an MCMC engine for the class of hierarchical generalized linear models, with connections to WinBUGS and OpenBUGS
- Building Simple Redis Data Caches — Rcpp bindings for Redis, connecting R to the Redis key/value store
- pbo: Probability of Backtest Overfitting — uses combinatorial symmetric cross-validation to implement performance tests
Many of these packages / projects also have supplementary material that is worth chasing down. Be sure to take a look at Alexios Ghalanos's recent post, which provides an accessible introduction to his stellar keynote address.
Many thanks to the organizers of the conference who, once again, did a superb job, and to the many professionals attending who graciously attempted to explain their ideas to a dilettante. My impression was that most of the attendees thoroughly enjoyed themselves and that the general sentiment was expressed by the last slide of Stephen Rush's presentation:
by Matt Sundquist, Plotly Co-founder
Here at Plotly, we are on a mission to build a platform where data scientists can analyze data, create beautiful graphs and collaborate: like a GitHub for data, where you can share and find plots, data, and code. The benefits are:
Supporting open science is central to Plotly's mission. We are part of the rOpenSci project, dedicated to giving scientists access to tools and data in a way that promotes collaboration and enables reproducible work. We’re thrilled to support data and figure sharing.
Plotly lets you make ggplot2 plots, and then with one additional line of code, turn your plot into an interactive, online data visualization that you can edit with others. Plotly's R API lets users interact with plotly functions from their desktop R environment to create online graphs. Edit your graphs with others online with R or Plotly's online application.
Getting Started
The following code can get you started with Plotly and the plotly package.
install.packages("devtools")
library(devtools)
install_github("plotly", "ropensci")
library(plotly)
## Loading required package: RCurl
## Loading required package: bitops
## Loading required package: RJSONIO
## Loading required package: ggplot2
library(ggplot2)
Sign up online at Plot.ly, or like this:
signup("new_username", "your_email@domain.com")
That should have responded with your key. Plug in your own account data, or you can use our “RgraphingAPI” account and key:
py <- plotly("RgraphingAPI", "ektgzomjbx")
Interactive Plots
First we'll draw a basic graph from the CO2 data set.
a <- qplot(conc, uptake, data = CO2, colour = Type) + scale_colour_discrete(name = "")
py$ggplotly(a)
When you run the command, the plot will open in your browser as a graph with its own URL, like this:
Note: here is the iframe that is rendering the plot in this post:
<div class = "iframe_container">
<iframe src="https://plot.ly/~RgraphingAPI/1232/650/550" width="650" height="550" frameBorder="0" seamless="seamless" scrolling="no"></iframe>
</div>
Iframes are a way to serve web content from one webpage onto another. In this case, you can serve Plotly graphs onto your blog, website, or RPubs, using this line of HTML code (and swapping in your URL):
You can fork that graph, edit it, and use the underlying data. Here’s the data from that plot.
And you can use the GUI to analyze your data, make new plots, and copy and paste data to add it to a current plot. You can also update the figure using R.
Lines, Scatters and More
Next up, we'll run an example from an excellent tutorial by Toby Hocking. Toby is part of the ggplotly team and is building support for ggplot2 syntax.
# Generate data
data <- data.frame(x = rep(1:10, times = 5), group = rep(1:5, each = 10))
data$lt <- c("even", "odd")[(data$group %% 2 + 1)]  # linetype
data$group <- as.factor(data$group)
data$y <- rnorm(length(data$x), data$x, .5) + rep(rnorm(5, 0, 2), each = 10)
d <- ggplot() + geom_line(data = data, aes(x = x, y = y, colour = group, group = group, linetype = group)) + ggtitle("geom_line + scale_linetype automatic")
py$ggplotly(d)
The plot should look like this:
You can also edit the plot from the GUI, and save and share the new version, which will update your data, plot, and the figure:
The goal behind plotly is to make it easy to plug in your existing code, knowledge, and workflow from ggplot2 and R. The project is new and a good way to learn the ropes for Plotly’s own R API. We’d love to hear your thoughts, feedback, and ideas; we're on GitHub and welcome issues and pull requests.
by Joseph Rickert
R/Finance 2014 is just about a week away. Over the past four or five years this has become my favorite conference. It is small (300 people this year), exceptionally well-run, and always offers an eclectic mix of theoretical mathematics, efficient, practical computing, industry best practices and trading “street smarts”. This clip of Blair Hull delivering a keynote speech at R/Finance 2012 is an example of the latter. It ought to resonate with anyone who has followed some of the hype surrounding Michael Lewis's recent book Flash Boys.
In any event, I thought it would be a good time to look at the relationship between R and Finance and to highlight some resources that are available to students, quants and data scientists looking to do computational finance with R.
First off, consider what computational finance has done for R. From the point of view of the development and growth of the R language, I think it is pretty clear that computational finance has played the role of the ultimate “Killer App” for R. This high stakes, competitive environment where a theoretical edge or a marginal computational advantage can mean big rewards has led to R package development in several areas including time series, optimization, portfolio analysis, risk management, high performance computing and big data. Additionally, challenges and crises in the financial markets have helped accelerate R’s growth into big data. In this podcast, Michael Kane talks about the analysis of the 2010 Flash Crash he did with Casey King and Richard Holowczak and describes using R with large financial datasets.
Conversely, I think that it is also clear that R has done quite a bit to further computational finance. R’s ability to facilitate rapid data analysis and visualization, its great number of available functions and algorithms and the ease with which it can interface to new data sources and other computing environments has made it a flexible tool that evolves and adapts at a pace that matches developments in the financial industry. The list of packages in the Finance Task View on CRAN indicates the symbiotic relationship between the development of R and the needs of those working in computational finance. On the one hand, there are over 70 packages under the headings Finance and Risk Management that were presumably developed to directly respond to a problem in computational finance. But, the task view also mentions that packages in the Econometrics, Multivariate, Optimization, Robust, SocialSciences and TimeSeries task views may also be useful to anyone working in computational finance. (The High Performance Computing and Machine Learning task views should probably also be mentioned.) The point is that while a good bit of R is useful to problems in computational finance, R has greatly benefited from the contributions of the computational finance community.
If you are just getting started with R and computational finance, have a look at John Nolan’s R as a Tool in Computational Finance. Other resources for R and computational finance that you may find helpful are:
Package Vignettes
Several of the Finance-related packages have very informative vignettes or associated websites. For example, have a look at those for the packages portfolio, rugarch, RQuantLib (check out the cool rotating distributions), PerformanceAnalytics, and MarkowitzR.
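To get a quick feel for one of these packages, here is a short sketch using PerformanceAnalytics and its bundled edhec data set of monthly hedge-fund index returns; the column chosen is just an example:

```r
library(PerformanceAnalytics)

data(edhec)                          # monthly hedge-fund index returns (an xts object)
R <- edhec[, "Convertible Arbitrage"]

SharpeRatio.annualized(R)            # annualized risk-adjusted return
maxDrawdown(R)                       # worst peak-to-trough loss over the sample
charts.PerformanceSummary(R)         # cumulative return, monthly return, and drawdown panels
```

A few lines like these are often all it takes to start exploring; the package vignettes go much deeper.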
Data
Quandl has become a major source for financial data, which can be easily accessed from R.
Websites
Relevant websites include the Rmetrics site, The R Trader, Burns Statistics and Guy Yollin’s repository of presentations.
YouTube
Three videos that I found particularly interesting are recordings of the presentations “Finance with R” by Ronald Hochreiter, “Using R in Academic Finance” by Sanjiv Das and “Portfolio Construction in R” by Elliot Norma.
Blogs
Over the past couple of years, R-Bloggers has posted quite a few finance-related applications. Prominent among these is the series on Quantitative Finance Applications in R by Daniel Harrison on the Revolutions Blog.
Books
Books on R and Finance include the excellent Rmetrics series of ebooks, Statistics and Data Analysis for Financial Engineering by David Ruppert, Financial Risk Modelling and Portfolio Optimization with R by Bernhard Pfaff, Introduction to R for Quantitative Finance by Daróczi et al. and a brand new title, Computational Finance: An Introductory Course with R by Argimiro Arratia.
Coursera
This August, Eric Zivot will teach the course Introduction to Computational Finance and Financial Econometrics, which will emphasize R.
The R Journal
The R Journal frequently publishes finance-related papers. The current issue, Volume 5/2, December 2013, contains three relevant papers: “Performance Attribution for Equity Portfolios” by Yang Lu and David Kane; “Temporal Disaggregation of Time Series” by Christoph Sax and Peter Steiner; and “betategarch: Simulation, Estimation and Forecasting of Beta-Skew-t-EGARCH Models” by Genaro Sucarrat.
Conferences
In addition to R/Finance (Chicago) and useR! 2014 (Los Angeles), look for R-based computational finance expertise at the 8th R/Rmetrics Workshop (Paris).
Community
R-SIG-Finance is one of R’s most active special interest groups.