by Joseph Rickert
If I had to pick just one application to be the “killer app” for the digital computer I would probably choose Agent Based Modeling (ABM). Imagine creating a world populated with hundreds, or even thousands of agents, interacting with each other and with the environment according to their own simple rules. What kinds of patterns and behaviors would emerge if you just let the simulation run? Could you guess a set of rules that would mimic some part of the real world? This dream is probably much older than the digital computer, but according to Jan Thiele’s brief account of the history of ABMs that begins his recent paper, R Marries NetLogo: Introduction to the RNetLogo Package in the Journal of Statistical Software, academic work with ABMs didn’t really take off until the late 1990s.
Now people are using ABMs for serious studies in economics, sociology, ecology, social psychology, anthropology, marketing and many other fields. No less a complexity scientist than Doyne Farmer (of dynamical systems and Prediction Company fame) has argued in Nature for using ABMs to model the complexity of the US economy, and has published on using ABMs to drive investment models. In the following clip from a 2006 interview, Doyne talks about building ABMs to explain the role of subprime mortgages in the housing crisis. (Note that when asked how one would calibrate such a model, Doyne explains the need to collect massive amounts of data on individuals.)
Fortunately, the tools for building ABMs seem to be keeping pace with the ambition of the modelers. There are now dozens of platforms for building ABMs, and it is somewhat surprising that NetLogo, a tool with some whimsical terminology (e.g. agents are called turtles) that was designed for teaching children, has apparently become a de facto standard. NetLogo is Java-based, has an intuitive GUI, ships with dozens of useful sample models, is easy to program, and is available under the GPL 2 license.
As you might expect, R is a perfect complement for NetLogo. Doing serious simulation work requires a considerable amount of statistics for calibrating models, designing experiments, performing sensitivity analyses, reducing data, exploring the results of simulation runs and much more. The recent JASS paper Facilitating Parameter Estimation and Sensitivity Analysis of Agent-Based Models: A Cookbook Using NetLogo and R by Thiele and his collaborators describes the R / NetLogo relationship in great detail and points to a decade's worth of reading. But the real fun is that Thiele's RNetLogo package lets you jump in and start analyzing NetLogo models in a matter of minutes.
Here is part of an extended example from Thiele's JSS paper that shows R interacting with the Fire model that ships with NetLogo. Using some very simple logic, Fire models the progress of a forest fire.
Snippet of NetLogo Code that drives the Fire model
to go
  if not any? turtles  ;; either fires or embers
    [ stop ]
  ask fires
    [ ask neighbors4 with [pcolor = green]
        [ ignite ]
      set breed embers ]
  fade-embers
  tick
end

;; creates the fire turtles
to ignite  ;; patch procedure
  sprout-fires 1
    [ set color red ]
  set pcolor black
  set burned-trees burned-trees + 1
end
The general idea is that turtles representing the frontier of the fire run through a grid of randomly placed trees. Not shown in the snippet above is the logic establishing that the entire model is controlled by a single parameter representing the density of the trees.
This next bit of R code shows how to launch the Fire model from R, set the density parameter, and run the model.
# Launch RNetLogo and control an initial run of the
# NetLogo Fire model
library(RNetLogo)
nlDir <- "C:/Program Files (x86)/NetLogo 5.0.5"
setwd(nlDir)
nl.path <- getwd()
NLStart(nl.path)
model.path <- file.path("models", "Sample Models", "Earth Science", "Fire.nlogo")
NLLoadModel(file.path(nl.path, model.path))
NLCommand("set density 70")   # set density value
NLCommand("setup")            # call the setup routine
NLCommand("go")               # launch the model from R
Here we see the Fire model running in the NetLogo GUI after it was launched from RStudio.
This next bit of code tracks the progression of the fire as a function of time (model "ticks"), returns results to R and plots them. The plot shows the nonlinear behavior of the system.
# Investigate percentage of forest burned as simulation proceeds and plot
library(ggplot2)
NLCommand("set density 60")
NLCommand("setup")
burned <- NLDoReportWhile("any? turtles", "go",
                          c("ticks", "(burned-trees / initial-trees) * 100"),
                          as.data.frame = TRUE,
                          df.col.names = c("tick", "percent.burned"))
# Plot with ggplot2
p <- ggplot(burned, aes(x = tick, y = percent.burned))
p + geom_line() + ggtitle("Nonlinear forest fire progression with density = 60")
As with many dynamical systems, the Fire model displays a phase transition. Setting the density lower than 55 will not result in the complete destruction of the forest, while setting density above 75 will very likely result in complete destruction. The following plot shows this behavior.
RNetLogo makes it very easy to programmatically run multiple simulations and capture the results for analysis in R. The following two lines of code run the Fire model twenty times for each value of density between 55 and 65, the region surrounding the phase transition.
d <- seq(55, 65, 1)    # vector of densities to examine
res <- rep.sim(d, 20)  # run the simulation
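The rep.sim function comes from Thiele's JSS paper; the version below is my own minimal sketch of what such a function might look like, not the paper's code. The single-run helper sim.fire is an assumption for illustration (burned-trees and initial-trees are reporters from the Fire model, and the NetLogo session is assumed to be open as in the code above):

```r
# Hedged sketch: run the Fire model once at a given density and report
# the percent of trees burned.
sim.fire <- function(density) {
  NLCommand("set density", density)
  NLCommand("setup")
  NLDoCommandWhile("any? turtles", "go")
  NLReport("(burned-trees / initial-trees) * 100")
}

# Run `reps` replicates at each density and stack the results into
# one long-format data frame, convenient for plotting with ggplot2.
rep.sim <- function(densities, reps) {
  do.call(rbind, lapply(densities, function(d) {
    data.frame(density = d,
               percent.burned = replicate(reps, sim.fire(d)))
  }))
}
```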
The plot below shows the variability of the percent of trees burned as a function of density in the transition region.
My code to generate plots is available in the file: Download NelLogo_blog while all of the code from Thiele's JSS paper is available from the journal website.
Finally, here are a few more interesting links related to ABMs.
by Daniel Hanson
Recap and Introduction
Last time in part 1 of this topic, we used the xts and lubridate packages to interpolate a zero rate for every date over the span of 30 years of market yield curve data. In this article, we will look at how we can implement the two essential functions of a term structure: the forward interest rate, and the forward discount factor.
Definitions and Notation
We will apply a mix of notation adopted in the lecture notes Interest Rate Models: Introduction, pp. 3-4, from the New York University Courant Institute (2005), along with chapter 1 of the book Interest Rate Models — Theory and Practice (2nd edition, Brigo and Mercurio, 2006). A presentation by Damiano Brigo from 2007, which covers some of the essential background found in the book, is available here, from the Columbia University website.
First, t ≥ 0 and T ≥ 0 represent time values in years.
P(t, T) represents the forward discount factor at time t ≤ T, where T ≤ 30 years (in our case), as seen at time = 0 (i.e., our anchor date). In other words, again in US Dollar parlance, this means the value at time t of one dollar to be received at time T, based on continuously compounded interest. Note then that, trivially, we must have P(T, T) = 1.
R(t, T) represents the continuously compounded forward interest rate, as seen at time = 0, paid over the period [t, T]. This is also sometimes written as F(0; t, T) to indicate that this is the forward rate as seen at the anchor date (time = 0), but to keep the notation lighter, we will use R(t, T) as is done in the NYU notes.
We then have the following relationships between P(t, T) and R(t, T), based on the properties of continuously compounded interest:
P(t, T) = exp(-R(t, T)・(T - t))    (A)
R(t, T) = -log(P(t, T)) / (T - t)    (B)
Finally, the interpolated market yield curve we constructed last time allows us to find the value of R(0, T) for any T ≤ 30. Then, since by properties of the exponential function we have
P(t, T) = P(0, T) / P(0, t) (C)
we can determine any discount factor P(t, T) for 0 ≤ t ≤ T ≤ 30, and therefore any R(t, T), as seen at time = 0.
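These relationships are easy to verify numerically. Here is a quick check in R, using made-up zero rates (2% at one year and 3% at five years are illustrative values, not market data):

```r
# Quick numerical check of equations (A), (B), and (C),
# using made-up zero rates.
t1 <- 1;  r1 <- 0.02   # R(0, 1) = 2%
t2 <- 5;  r2 <- 0.03   # R(0, 5) = 3%

P0t1 <- exp(-r1 * t1)          # (A): P(0, t1)
P0t2 <- exp(-r2 * t2)          # (A): P(0, t2)
PtT  <- P0t2 / P0t1            # (C): forward discount factor P(t1, t2)
RtT  <- -log(PtT) / (t2 - t1)  # (B): forward rate R(t1, t2)

RtT                                  # 0.0325
all.equal(exp(-RtT * (t2 - t1)), PtT)  # TRUE: (A) recovers P(t1, t2)
```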
Converting from Dates to Year Fractions
By now, one might be wondering: when we constructed our interpolated market yield curve, we used actual dates, but here we're talking about time in units of years. What's up with that? The answer is that we need to convert from dates to year fractions. While this may seem like a rather trivial proposition (for example, why not just divide the number of days between the start date and maturity date by 365.25?), it turns out that with financial instruments such as bonds, options, and futures, in practice we need to be much more careful. Each of these comes with a specified day count convention, and if it is not followed properly, the result can be the loss of millions for a trading desk.
For example, consider the Actual / 365 Fixed day count convention:
Year Fraction (i.e., T - t) = (Days between Date1 and Date2) / 365
This is one commonly used convention and is very simple to calculate; however, for certain bond calculations, it can become much more complicated, as leap years are considered, as well as local holidays in the country in which the bond is traded, plus more esoteric conditions that may be imposed. To get an idea, look up day count conventions used for government bonds in various countries.
In the book by Brigo and Mercurio noted above, the authors in fact replace the “T - t” expression with a function (tau) τ(t, T), which represents the difference in time based upon the day count convention in effect.
Equation (A) then becomes
P(t, T) = exp(-R(t, T)・τ(t, T))
where τ(t, T) might be, for example, the Actual / 365 Fixed day count convention.
For the remainder of this article, we will implement the “T - t” above as a day count function, as demonstrated in the example to follow.
Implementation in R
We will first revisit the example from our previous article on interpolation of market zero rates, and then use this to demonstrate the implementation of term structure functions to calculate forward discount factors and forward interest rates.
a) The setup from part 1
Let’s first go back to the example from part 1 and construct our interpolated 30-year market yield curve, using cubic spline interpolation. Both the xts and lubridate packages need to be loaded. The code is republished here for convenience:
require(xts)
require(lubridate)
ad <- ymd(20140514, tz = "US/Pacific")
marketDates <- c(ad, ad + days(1), ad + weeks(1), ad + months(1),
                 ad + months(2), ad + months(3), ad + months(6),
                 ad + months(9), ad + years(1), ad + years(2),
                 ad + years(3), ad + years(5), ad + years(7),
                 ad + years(10), ad + years(15), ad + years(20),
                 ad + years(25), ad + years(30))
# Use substring(.) to get rid of "UTC"/time zone after the dates
marketDates <- as.Date(substring(marketDates, 1, 10))
# Convert percentage formats to decimal by multiplying by 0.01:
marketRates <- c(0.0, 0.08, 0.125, 0.15, 0.20, 0.255, 0.35, 0.55, 1.65,
                 2.25, 2.85, 3.10, 3.35, 3.65, 3.95, 4.65, 5.15, 5.85) * 0.01
numRates <- length(marketRates)
marketData.xts <- as.xts(marketRates, order.by = marketDates)

createEmptyTermStructureXtsLub <- function(anchorDate, plusYears)
{
  # anchorDate is a lubridate here:
  endDate <- anchorDate + years(plusYears)
  numDays <- endDate - anchorDate
  # We need to convert anchorDate to a standard R date to use
  # the "+ 0:numDays" operation.
  # Also, note that we need a total of numDays + 1
  # in order to capture both end points.
  xts.termStruct <- xts(rep(NA, numDays + 1),
                        as.Date(anchorDate) + 0:numDays)
  return(xts.termStruct)
}

termStruct <- createEmptyTermStructureXtsLub(ad, 30)
for(i in (1:numRates)) termStruct[marketDates[i]] <-
  marketData.xts[marketDates[i]]
termStruct.spline.interpolate <- na.spline(termStruct, method = "hyman")
colnames(termStruct.spline.interpolate) <- "ZeroRate"
b) Check the plot
plot(x = termStruct.spline.interpolate[, "ZeroRate"], xlab = "Time",
     ylab = "Zero Rate",
     main = "Interpolated Market Zero Rates 2014-05-14 -
             Cubic Spline Interpolation",
     ylim = c(0.0, 0.06), major.ticks = "years",
     minor.ticks = FALSE, col = "darkblue")
This gives us a reasonably smooth curve, preserving the monotonicity of our data points:
c) Implement functions for discount factors and forward rates
We will now implement these functions, utilizing equations (A), (B), and (C) above. We will also take advantage of R's functional programming features by incorporating the Actual / 365 Fixed day count as a function argument, as an example. One could of course implement any other day count convention as a function of two lubridate dates and pass it in as an argument.
First, let’s implement the Actual / 365 Fixed day count as a function:
# Simple example of a day count function: Actual / 365 Fixed
# date1 and date2 are assumed to be lubridate dates, so that we can
# easily carry out the subtraction of two dates.
dayCountFcn_Act365F <- function(date1, date2)
{
  yearFraction <- as.numeric((date2 - date1) / 365)
  return(yearFraction)
}
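As a quick sanity check (the function is repeated here so the snippet is self-contained), a span of exactly 365 days gives a year fraction of 1. Note that base R Date objects work as well as lubridate dates, since subtracting two dates yields a difftime in days:

```r
dayCountFcn_Act365F <- function(date1, date2)
{
  yearFraction <- as.numeric((date2 - date1) / 365)
  return(yearFraction)
}

# 2014-05-14 to 2015-05-14 is exactly 365 days:
dayCountFcn_Act365F(as.Date("2014-05-14"), as.Date("2015-05-14"))  # 1
# 73 days is exactly one fifth of a year under Actual / 365 Fixed:
dayCountFcn_Act365F(as.Date("2014-05-14"), as.Date("2014-07-26"))  # 0.2
```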
Next, since the forward rate R(t, T) depends on the forward discount factor P(t, T), let’s implement the latter first:
# date1 and date2 are again assumed to be lubridate dates.
fwdDiscountFactor <- function(anchorDate, date1, date2, xtsMarketData, dayCountFunction)
{
  # Convert lubridate dates to base R dates in order to use as xts indices.
  xtsDate1 <- as.Date(date1)
  xtsDate2 <- as.Date(date2)
  if((xtsDate1 > xtsDate2) || xtsDate2 > max(index(xtsMarketData)) ||
     xtsDate1 < min(index(xtsMarketData)))
  {
    stop("Error in date order or range")
  }
  # 1st, get the corresponding market zero rates from our
  # interpolated market rate curve:
  rate1 <- as.numeric(xtsMarketData[xtsDate1])  # R(0, T1)
  rate2 <- as.numeric(xtsMarketData[xtsDate2])  # R(0, T2)
  # P(0, T) = exp(-R(0, T) * (T - 0))  (A), with t = 0 <=> anchorDate
  discFactor1 <- exp(-rate1 * dayCountFunction(anchorDate, date1))
  discFactor2 <- exp(-rate2 * dayCountFunction(anchorDate, date2))
  # P(t, T) = P(0, T) / P(0, t)  (C), with t <=> date1 and T <=> date2
  fwdDF <- discFactor2 / discFactor1
  return(fwdDF)
}
Finally, we can then write a function to compute the forward interest rate:
# date1 and date2 are assumed to be lubridate dates here as well.
fwdInterestRate <- function(anchorDate, date1, date2, xtsMarketData, dayCountFunction)
{
  if(date1 == date2) {
    fwdRate <- 0.0  # the trivial case
  } else {
    fwdDF <- fwdDiscountFactor(anchorDate, date1, date2,
                               xtsMarketData, dayCountFunction)
    # R(t, T) = -log(P(t, T)) / (T - t)  (B)
    fwdRate <- -log(fwdDF) / dayCountFunction(date1, date2)
  }
  return(fwdRate)
}
d) Calculate discount factors and forward interest rates
As an example, suppose we want to get the five-year forward three-month discount factor and interest rate:
# Five-year forward three-month discount factor and forward rate:
date1 <- ad + years(5)
date2 <- date1 + months(3)
fwdDiscountFactor(ad, date1, date2, termStruct.spline.interpolate,
                  dayCountFcn_Act365F)
fwdInterestRate(ad, date1, date2, termStruct.spline.interpolate,
                dayCountFcn_Act365F)
# Results are:
# [1] 0.9919104
# [1] 0.03222516
We can also check the trivial case for P(T, T) and R(T, T), where we get 1.0 and 0.0 respectively, as expected:
# Trivial case:
fwdDiscountFactor(ad, date1, date1, termStruct.spline.interpolate,
                  dayCountFcn_Act365F)  # returns 1.0
fwdInterestRate(ad, date1, date1, termStruct.spline.interpolate,
                dayCountFcn_Act365F)    # returns 0.0
Finally, we can verify that we can recover the market rates at various points along the curve; here, we look at 1Y and 30Y, and can check that we get 0.0165 and 0.0585, respectively:
# Check that we recover market data points:
oneYear <- ad + years(1)
thirtyYears <- ad + years(30)
fwdInterestRate(ad, ad, oneYear,
                termStruct.spline.interpolate,
                dayCountFcn_Act365F)  # returns 1.65%
fwdInterestRate(ad, ad, thirtyYears,
                termStruct.spline.interpolate,
                dayCountFcn_Act365F)  # returns 5.85%
Concluding Remarks
We have shown how one can implement a term structure of interest rates utilizing tools available in the R packages lubridate and xts. We have, however, limited the example to interpolation within the 30-year range of the given market data, without discussing extrapolation in cases where forward rates are needed beyond the endpoint. This case does arise in risk management for longer-term financial instruments such as variable annuity and life insurance products, for example. One simple-minded, but sometimes used, method is to fix the zero rate given at the endpoint for all dates beyond that point. A more sophisticated approach is to use the financial cubic spline method described in the paper by Adams (2001), cited in part 1 of the current discussion. However, xts unfortunately does not provide this interpolation method for us out of the box. Writing our own implementation might make for an interesting topic for discussion down the road, something to keep in mind. For now, however, we have a working term structure implementation in R that we can use to demonstrate derivatives pricing and risk management models in upcoming articles.
by Ilya Kipnis
In this post, I will demonstrate how to obtain, stitch together, and clean data for backtesting using futures data from Quandl. Quandl was previously introduced in the Revolutions Blog. Functions I will be using can be found in my IKTrading package, available on my GitHub page.
With backtesting, it’s often easy to get data for equities and ETFs. However, ETFs are fairly recent financial instruments, which makes it difficult to conduct long-running backtests (most of the ETFs launched before 2003 are equity ETFs). Equities, meanwhile, are all correlated in some way, shape, or form to their respective index (S&P 500, Russell, etc.), and their correlations generally go to 1 right when you want to be diversified.
An excellent source of diversification is the futures markets, which contain contracts on instruments ranging as far and wide as metals, forex, energies, and more. Unfortunately, futures are not continuous in nature, and data for futures are harder to find.
Thanks to Quandl, however, there is some freely available futures data. The link can be found here.
The way Quandl structures its futures data is as two separate time series: the first is the front month, which is the contract nearest expiry, and the second is the back month, which is the next contract out. Quandl’s rolling algorithm can be found here.
In short, Quandl rolls in a very simple manner; however, it is also incorrect for all practical purposes. The reason is that no practical trader holds a contract to expiry. Instead, traders roll their contracts some time before the expiry of the front month, based on some metric.
This algorithm uses the open interest cross to roll from the front to the back month, lagged by a day (since open interest is observed at the end of trading days), and then “rolls” back when the front month open interest overtakes the back month open interest (in reality, this is the back month contract becoming the new front month contract). Furthermore, the algorithm does absolutely no adjusting of contract prices. That is, if the front month is more expensive than the back month, a long position would lose the roll premium and a short position would gain it. This is in order to prevent the introduction of a dominating trend bias. The reason that open interest is chosen is displayed in the following graph:
This is the graph of the open interest of the front month of oil in 2000 (the black time series), with the open interest of the back month contract in red. They cross under and over each other in repeatable fashion, making the cross a good signal for when to roll the contract.
Let’s look at the code:
quandClean <- function(stemCode, start_date = NULL, end_date = NULL,
                       verbose = FALSE, ...) {
The arguments to the function are a stem code, a start date, an end date, and a verbose argument (for debugging purposes). The stem code takes the form of CHRIS/<<EXCHANGE>>_<<CONTRACT STEM>>, such as “CHRIS/CME_CL” for oil.
require(Quandl)
if(is.null(start_date)) { start_date <- Sys.Date() - 365 * 1000 }
if(is.null(end_date)) { end_date <- Sys.Date() + 365 * 1000 }
frontCode <- paste0(stemCode, 1)
backCode <- paste0(stemCode, 2)
front <- Quandl(frontCode, type = "xts", start_date = start_date, end_date = end_date, ...)
interestColname <- colnames(front)[grep(pattern = "Interest", colnames(front))]
front <- front[, c("Open", "High", "Low", "Settle", "Volume", interestColname)]
colnames(front) <- c("O", "H", "L", "C", "V", "OI")
back <- Quandl(backCode, type = "xts", start_date = start_date, end_date = end_date, ...)
back <- back[, c("Open", "High", "Low", "Settle", "Volume", interestColname)]
colnames(back) <- c("BO", "BH", "BL", "BS", "BV", "BI")  # B for Back
# combine front and back for comparison
both <- cbind(front, back)
This code simply fetches both futures contracts from Quandl and combines them into one xts object. Although Quandl takes a type argument, I have programmed this function specifically for xts objects, since I will use xts-dependent functionality later.
Let's move along.
# impute NAs in open interest with -1
both$BI[is.na(both$BI)] <- -1
both$OI[is.na(both$OI)] <- -1
both$lagBI <- lag(both$BI)
both$lagOI <- lag(both$OI)
# impute bad back-month open-interest prints --
# if it is truly a low quantity, it won't make a
# difference in the computation.
both$OI[both$OI == -1] <- both$lagOI[both$OI == -1]
both$BI[both$BI == -1] <- both$lagBI[both$BI == -1]
This is the first instance of countermeasures in the function taken to counteract messy data. It imputes any open interest NAs with the sentinel value -1, and then replaces the first such value after a non-NA day with the previous day's open interest. Usually, days on which open interest is not available are days on which the contract is lightly traded, so the values imputed in cases where the contract was not traded will be negligible. However, imputing an NA value with a zero in the midst of heavy trading has the potential to display the wrong contract as the one with the higher open interest.
both$OIdiff <- both$OI - both$BI
both$tracker <- NA
# the formal open interest cross from front to back
both$tracker[both$OIdiff < 0] <- -1
both$tracker <- lag(both$tracker)
# since we have to observe the OI cross, we roll the next day;
# any time we're not on the back contract, we're on the front contract
both$tracker[both$OIdiff > 0] <- 1
both$tracker <- na.locf(both$tracker)
This code sets up the system for keeping track of which contract is in use. When the difference in open interest crosses under zero, that's the formal open interest cross, and we roll a day later. On the other hand, when the open interest difference crosses back over zero, that isn't a cross; that is the back month contract becoming the front month contract. For instance, assume that you rolled to the June contract in the third week of May. Quandl would display the June contract as the back contract in May, but come June, that June contract is now the front contract instead. Therefore, there is no lag on the computation in the second instance.
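To make the tracker logic concrete, here is a minimal synthetic example in base R (plain vectors rather than xts, with a manual shift standing in for xts's lag and a hand-rolled carry-forward standing in for na.locf):

```r
# Synthetic front/back open interest: the back month overtakes the
# front on day 4 (OIdiff goes negative), so we roll on day 5.
OI <- c(100, 90, 80, 40, 20, 10)   # front-month open interest
BI <- c( 10, 20, 30, 60, 80, 90)   # back-month open interest
OIdiff <- OI - BI

tracker <- rep(NA_real_, length(OIdiff))
tracker[OIdiff < 0] <- -1                    # back month dominates
tracker <- c(NA, tracker[-length(tracker)])  # lag by one day: roll next day
tracker[OIdiff > 0] <- 1                     # front month dominates (no lag)

# carry the last observation forward (what na.locf does in the xts version)
for (i in seq_along(tracker)[-1]) {
  if (is.na(tracker[i])) tracker[i] <- tracker[i - 1]
}
tracker
# [1]  1  1  1  1 -1 -1
```

The cross is observed at the end of day 4, so the switch to the back contract (tracker = -1) takes effect on day 5.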
frontRelevant <- both[both$tracker == 1, c(1:6)]
backRelevant <- both[both$tracker == -1, c(7:12)]
colnames(frontRelevant) <- colnames(backRelevant) <-
  c("Open", "High", "Low", "Close", "Volume", "OpenInterest")
relevant <- rbind(frontRelevant, backRelevant)
relevant[relevant == 0] <- NA
# remove any incomplete days, print a message saying
# how many days were removed,
# and print them if desired
instrument <- gsub("CHRIS/", "", stemCode)
relevant$Open[is.na(relevant$Open)] <- relevant$Close[(which(is.na(relevant$Open)) - 1)]
NAs <- which(is.na(relevant$Open) | is.na(relevant$High) |
             is.na(relevant$Low) | is.na(relevant$Close))
if(verbose) {
  message(paste(instrument, "had", length(NAs), "incomplete days removed from data."))
  print(relevant[NAs, ])
}
if(length(NAs) > 0) {
  relevant <- relevant[-NAs, ]
}
Using the previous tracker variable, the code is then able to compile the relevant data for the futures contract. That is, front contract when the front contract is more heavily traded, and vice versa.
This code uses xts-dependent functionality with the rbind call. In this instance, there are two separate streams: the front month stream and the back month stream. Through the use of xts functionality, it's possible to merge the two streams indexed by time.
Next, the code imputes all NA open values with the close (settle) from the previous trading day. In the case that opens are the only missing field, I opted for this over removing the observation entirely. Next, any observation with a missing open, high, low, or close value gets removed. This is simply my personal preference, rather than attempting to take some form of liberty with imputing data to the highs, lows, and closes based on the previous day, or some other pattern thereof.
If verbose is enabled, the function will print the actual data removed.
ATR <- ATR(HLC = HLC(relevant))  # from the TTR package, default n = 14
# Technically somewhat cheating, but could be stated in terms of
# lags 2, 1, and 0.
# A spike is defined as a data point on Close that's more than
# 5 ATRs away from both the preceding and following day.
spikes <- which(abs((relevant$Close - lag(relevant$Close)) / ATR$atr) > 5
                & abs((relevant$Close - lag(relevant$Close, -1)) / ATR$atr) > 5)
if(verbose) {
  message(paste(instrument, "had", length(spikes), "spike days removed from data."))
  print(relevant[spikes, ])
}
if(length(spikes) > 0) {
  relevant <- relevant[-spikes, ]
}
out <- relevant
return(out)
}
Finally, some countermeasures against spiky data. I define a spike as a move in the closing price that is 5 ATRs (in this case, n = 14) away, in either direction, from both the previous and the following day's close. Spikes are removed. After this, the code is complete.
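To make the spike filter concrete, here is a minimal synthetic example with plain vectors, using a constant value as a stand-in for TTR's ATR calculation:

```r
# Synthetic closes with one bad print on day 4; assume ATR = 1 throughout.
close <- c(100, 101, 100, 150, 101, 102)
atr <- rep(1, length(close))

lagClose  <- c(NA, close[-length(close)])  # previous day's close
leadClose <- c(close[-1], NA)              # following day's close

# More than 5 ATRs away from both neighbors => spike
spikes <- which(abs((close - lagClose) / atr) > 5 &
                abs((close - leadClose) / atr) > 5)
spikes          # 4 -- only the bad print qualifies
close[-spikes]  # the series with the spike removed
```

Note that day 5 is not flagged: it is far from the bad print on day 4, but close to day 6, so requiring the move on both sides protects legitimate data adjacent to a spike.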
To put this into perspective visually, here is a plot of the 30-day Federal Funds rate (CHRIS/CME_FF) from 2008, demonstrating all the improvements my process makes to Quandl’s raw data in comparison to the front-month continuous (current) contract.
The raw front-month data is displayed in black (the long lines are missing data from Quandl, displayed as zeroes but modified in scale for the sake of the plot). The results of the algorithm are presented in blue.
At the very beginning, it’s apparent that the more intelligent rolling algorithm adapts to what would be the new contract prices sooner. Secondly, all of those long bars on which Quandl had missing data have been removed so as not to interfere with calculations. Lastly, at the very end, that downward “spike” in prices has also been dealt with, making for what appears to be a significantly more correct pricing series.
To summarize, here's what the code does:
1) Downloads the two data streams
2) Keeps track of the proper contract at all time periods
3) Imputes or removes bad data, bad data being defined as incomplete observations or spikes in the data.
The result is an xts object practically identical to one downloaded for more commonly available data, such as equities or ETFs, which allows for a greater array of diversification in terms of the instruments on which to backtest trading strategies, such as with the quantstrat package.
The results of such backtests can be found on my blog, and my two R packages (this functionality will be available in my IKTrading package) can be found on my GitHub page.
by Daniel Hanson
Introduction
Last time, we used the discretization of a Brownian Motion process with a Monte Carlo method to simulate the returns of a single security, with the (rather strong) assumption of a fixed drift term and fixed volatility. We will return to this topic in a future article, as it relates to basic option pricing methods, which we will then expand upon.
For more advanced derivatives pricing methods, however, as well as an important topic in its own right, we will talk about implementing a term structure of interest rates using R. This will be broken up into two parts: 1) working with dates and interpolation (the subject of today’s article), and 2) calculating forward interest rates and discount factors (the topic of our next article), using the results presented below.
Working with Dates in R
The standard date objects in base R, to be honest, are not the most user-friendly when it comes to basic date calculations such as adding days, months, or years. For example, just to add, say, five years to a given date, we would need to do the following:
startDate <- as.Date('2014-05-27')
pDate <- as.POSIXlt(startDate)
endDate <- as.Date(paste(pDate$year + 1900 + 5, "-", pDate$mon + 1, "-",
                         pDate$mday, sep = ""))
So, you’re probably asking yourself, “wouldn’t it be great if we could just add the years like this?”:
endDate <- startDate + years(5)  # ?
Well, the good news is that we can, by using the lubridate package. In addition, instantiating a date is also easier, simply by indicating the date format (e.g., ymd(.) for year-month-day) as the function. Below are some examples:
require(lubridate)
startDate <- ymd(20140527)
startDate                # Result is: "2014-05-27 UTC"
anotherDate <- dmy(26102013)
anotherDate              # Result is: "2013-10-26 UTC"
startDate + years(5)     # Result is: "2019-05-27 UTC"
anotherDate - years(40)  # Result is: "1973-10-26 UTC"
startDate + days(2)      # Result is: "2014-05-29 UTC"
anotherDate - months(5)  # Result is: "2013-05-26 UTC"
Remark: Note that “UTC” is appended to the end of each date, which indicates the Coordinated Universal Time time zone (the default). While it is not an issue in these examples, it will be important to specify a particular time zone when we set up our interpolated yield curve, as we shall see shortly.
Interpolation with Dates in R
When interpolating values in a time series in R, we revisit our old friend the xts package, which provides both linear and cubic spline interpolation. We will demonstrate this with a somewhat realistic example.
Suppose the market yield curve data on 2014-05-14 appears on a trader’s desk as follows:
Overnight ON 0.08%
One week 1W 0.125%
One month 1M 0.15%
Two months 2M 0.20%
Three months 3M 0.255%
Six months 6M 0.35%
Nine months 9M 0.55%
One year 1Y 1.65%
Two years 2Y 2.25%
Three years 3Y 2.85%
Five years 5Y 3.10%
Seven years 7Y 3.35%
Ten years 10Y 3.65%
Fifteen years 15Y 3.95%
Twenty years 20Y 4.65%
Twenty-five years 25Y 5.15%
Thirty years 30Y 5.85%
This is typical of yield curve data, where the dates get spread out farther over time. Each rate is a zero (coupon) rate, meaning, in US Dollar parlance, the rate paid on $1 of debt today at a given point in the future, with no intermediate coupon payments; principal is returned and interest is paid in full at the end date. In order to have a fully functional term structure (that is, to be able to calculate forward interest rates and forward discount factors off of the yield curve for any two dates), we will need to interpolate the zero rates. The date from which the time periods are measured is often referred to as the “anchor date”, and we will adopt this terminology.
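For instance, given the one-year zero rate of 1.65% above, the value today of one dollar to be received in one year, under continuous compounding, is:

```r
# Present value of $1 received in one year, at a continuously
# compounded zero rate of 1.65%:
r <- 0.0165  # 1Y zero rate
t <- 1       # time in years
exp(-r * t)  # 0.9836354
```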
To start, we will use dates generated with lubridate operations to replicate the above yield curve data schedule. We will then match the dates up with the corresponding rates and put them into an xts object. We will use 20140514 as our anchor date.
# ad = anchor date, tz = time zone
# (see http://en.wikipedia.org/wiki/List_of_tz_database_time_zones)
ad <- ymd(20140514, tz = "US/Pacific")
marketDates <- c(ad, ad + days(1), ad + weeks(1), ad + months(1),
                 ad + months(2), ad + months(3), ad + months(6),
                 ad + months(9), ad + years(1), ad + years(2),
                 ad + years(3), ad + years(5), ad + years(7),
                 ad + years(10), ad + years(15), ad + years(20),
                 ad + years(25), ad + years(30))
# Use substring(.) to get rid of "UTC"/time zone after the dates
marketDates <- as.Date(substring(marketDates, 1, 10))
# Convert percentage formats to decimal by multiplying by 0.01:
marketRates <- c(0.0, 0.08, 0.125, 0.15, 0.20, 0.255, 0.35, 0.55, 1.65,
                 2.25, 2.85, 3.10, 3.35, 3.65, 3.95, 4.65, 5.15, 5.85) * 0.01
numRates <- length(marketRates)
marketData.xts <- as.xts(marketRates, order.by = marketDates)
head(marketData.xts)
# Gives us the result:
#               [,1]
# 2014-05-14 0.00000
# 2014-05-15 0.00080
# 2014-05-21 0.00125
# 2014-06-14 0.00150
# 2014-07-14 0.00200
# 2014-08-14 0.00255
Note that in this example, we specified the time zone. This is important, as lubridate will automatically convert to your local time zone from UTC. If we hadn’t specified the time zone, then out here on the US west coast, depending on the time of day, we could get the following result from the head(.) command, where the dates get converted to Pacific time; note how the dates end up shifted back one day:
              [,1]
2014-05-13 0.00000
2014-05-14 0.00080
2014-05-20 0.00125
2014-06-13 0.00150
2014-07-13 0.00200
2014-08-13 0.00255
Some might call this a feature, and others may call it a quirk, but in any case, it is better to specify the time zone in order to get consistent results.
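The underlying behavior can be reproduced in base R alone. A minimal sketch (the 17:00 clock time assumes Pacific Daylight Time, UTC-7, as in May):

```r
# A midnight instant pinned to UTC:
utc.midnight <- as.POSIXct("2014-05-14 00:00:00", tz = "UTC")

# Viewed from the US west coast, the same instant falls on the previous
# calendar day, which is exactly the one-day shift seen above:
format(utc.midnight, tz = "US/Pacific")   # "2014-05-13 17:00:00"
as.Date(utc.midnight, tz = "US/Pacific")  # "2014-05-13"
```

The instant in time never changes; only its calendar label does, which is why pinning the time zone up front gives consistent dates.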
If we take a little trip back to our earlier post on plotting xts data from a few months ago (see the section Using plot(.) in the xts package), we can have a look at a plot of our market data:
colnames(marketData.xts) <- "ZeroRate"
plot(x = marketData.xts[, "ZeroRate"], xlab = "Time", ylab = "Zero Rate",
     main = "Market Zero Rates 2014-05-14", ylim = c(0.0, 0.06),
     major.ticks = "years", minor.ticks = FALSE, col = "red")
From here, the next steps will be to create an empty xts object with daily dates spanning the full 30-year horizon, substitute in the known market rates, and interpolate the remaining values.
To create the empty xts object, we borrow an idea from the xts vignette (Section 3.1, “Creating new data: the xts constructor”), and come up with the following function:
createEmptyTermStructureXtsLub <- function(anchorDate, plusYears)
{
  # anchorDate is a lubridate date-time here:
  endDate <- anchorDate + years(plusYears)
  numDays <- as.numeric(endDate - anchorDate)
  # We need to convert anchorDate to a standard R date to use
  # the "+ 0:numDays" operation.
  # Also, note that we need a total of numDays + 1 values in order to
  # capture both end points.
  xts.termStruct <- xts(rep(NA, numDays + 1), as.Date(anchorDate) + 0:numDays)
  return(xts.termStruct)
}
Then, using our anchor date ad (2014-05-14), we generate an empty xts object going out daily for 30 years:
termStruct < createEmptyTermStructureXtsLub(ad, 30)
head(termStruct)
tail(termStruct)
# Results are (as desired):
# > head(termStruct)
#            [,1]
# 2014-05-14   NA
# 2014-05-15   NA
# 2014-05-16   NA
# 2014-05-17   NA
# 2014-05-18   NA
# 2014-05-19   NA
# > tail(termStruct)
#            [,1]
# 2044-05-09   NA
# 2044-05-10   NA
# 2044-05-11   NA
# 2044-05-12   NA
# 2044-05-13   NA
# 2044-05-14   NA
Next, substitute in the known rates from our market yield curve. While there is likely a slicker way to do this, a loop is transparent, easy to write, and doesn’t take all that long to execute in this case:
for(i in (1:numRates)) termStruct[marketDates[i]] <- marketData.xts[marketDates[i]]
head(termStruct, 8)
tail(termStruct)
# Results are as follows. Note that we capture the market rates
# at ON, 1W, and 30Y:
# > head(termStruct, 8)
#               [,1]
# 2014-05-14 0.00000
# 2014-05-15 0.00080
# 2014-05-16      NA
# 2014-05-17      NA
# 2014-05-18      NA
# 2014-05-19      NA
# 2014-05-20      NA
# 2014-05-21 0.00125
# > tail(termStruct)
#              [,1]
# 2044-05-09     NA
# 2044-05-10     NA
# 2044-05-11     NA
# 2044-05-12     NA
# 2044-05-13     NA
# 2044-05-14 0.0585
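As an aside on the “slicker way”: xts subsetting also accepts a vector of dates, so the loop above can, we believe, be collapsed into a single vectorized assignment. A self-contained sketch on a toy series (in the article’s setting this would read termStruct[marketDates] <- marketRates):

```r
library(xts)

# A week of NAs, with two known rates inserted in one assignment:
dates <- as.Date("2014-05-14") + 0:6
ts.toy <- xts(rep(NA_real_, 7), order.by = dates)
knownDates <- as.Date(c("2014-05-14", "2014-05-20"))
knownRates <- c(0.0000, 0.0012)
ts.toy[knownDates] <- knownRates  # no explicit loop needed
```

Either way the result is the same; the loop simply makes the intent more obvious.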
Finally, we use interpolation methods provided in xts to fill in the rates in between. We have two choices: either linear interpolation, using the xts function na.approx(.), or cubic spline interpolation, using the function na.spline(.). As the names suggest, these functions will replace NA values in the xts object with interpolated values. Below, we show both options:
# Name the column so we can refer to it in the plots below:
colnames(termStruct) <- "ZeroRate"
termStruct.lin.interpolate <- na.approx(termStruct)
termStruct.spline.interpolate <- na.spline(termStruct, method = "hyman")
head(termStruct.lin.interpolate, 8)
head(termStruct.spline.interpolate, 8)
tail(termStruct.lin.interpolate)
tail(termStruct.spline.interpolate)
# Results are as follows. Note again that we capture the market rates
# at ON, 1W, and 30Y:
# > head(termStruct.lin.interpolate, 8)
#            ZeroRate
# 2014-05-14 0.000000
# 2014-05-15 0.000800
# 2014-05-16 0.000875
# 2014-05-17 0.000950
# 2014-05-18 0.001025
# 2014-05-19 0.001100
# 2014-05-20 0.001175
# 2014-05-21 0.001250
# > head(termStruct.spline.interpolate, 8)
#                ZeroRate
# 2014-05-14 0.0000000000
# 2014-05-15 0.0008000000
# 2014-05-16 0.0009895833
# 2014-05-17 0.0011166667
# 2014-05-18 0.0011937500
# 2014-05-19 0.0012333333
# 2014-05-20 0.0012479167
# 2014-05-21 0.0012500000
# > tail(termStruct.lin.interpolate)
#              ZeroRate
# 2044-05-09 0.05848084
# 2044-05-10 0.05848467
# 2044-05-11 0.05848851
# 2044-05-12 0.05849234
# 2044-05-13 0.05849617
# 2044-05-14 0.05850000
# > tail(termStruct.spline.interpolate)
#              ZeroRate
# 2044-05-09 0.05847347
# 2044-05-10 0.05847877
# 2044-05-11 0.05848407
# 2044-05-12 0.05848938
# 2044-05-13 0.05849469
# 2044-05-14 0.05850000
We can also have a look at the plots of the interpolated curves. Note that the linearly interpolated curve (in green) is the same as what we saw when we did a line plot of the market rates above:
plot(x = termStruct.lin.interpolate[, "ZeroRate"], xlab = "Time",
     ylab = "Zero Rate", main = "Interpolated Market Zero Rates 2014-05-14",
     ylim = c(0.0, 0.06), major.ticks = "years", minor.ticks = FALSE,
     col = "darkgreen")
lines(x = termStruct.spline.interpolate[, "ZeroRate"], col = "darkblue")
legend(x = 'topleft', legend = c("Lin Interp", "Spline Interp"),
       lty = 1, col = c("darkgreen", "darkblue"))
One final note: When we calculated the interpolated values using cubic splines earlier, we set method = "hyman" in the xts function na.spline(.). By doing this, we are able to preserve the monotonicity in the data points. Without it, using the default, we get dips in the curve between some of the data points, as shown here:
# Using the default method for cubic spline interpolation:
termStruct.spline.interpolate.default <- na.spline(termStruct)
colnames(termStruct.spline.interpolate.default) <- "ZeroRate"
plot(x = termStruct.spline.interpolate.default[, "ZeroRate"], xlab = "Time",
     ylab = "Zero Rate",
     main = "Interpolated Market Zero Rates 2014-05-14 (Default Cubic Spline)",
     ylim = c(0.0, 0.06), major.ticks = "years",
     minor.ticks = FALSE, col = "darkblue")
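Since na.spline(.) delegates to stats::splinefun(.), the monotonicity issue can be isolated in a few lines. A sketch on made-up monotone data, not the yield curve itself:

```r
# Strictly increasing sample points with an abrupt jump in the middle:
x <- 1:8
y <- c(0, 0.001, 0.002, 0.003, 1, 1.001, 1.002, 1.003)

f.default <- splinefun(x, y)                    # default method = "fmm"
f.hyman   <- splinefun(x, y, method = "hyman")  # monotonicity-preserving

grid <- seq(1, 8, by = 0.01)
min(diff(f.default(grid))) < 0        # TRUE: the default spline dips between knots
all(diff(f.hyman(grid)) >= -1e-12)    # TRUE: Hyman filtering keeps it monotone
```

The dips we saw in the default yield curve plot are the same phenomenon: a classical cubic spline may oscillate between knots even when the data themselves are monotone.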
Summary
In this article, we have demonstrated how one can take market zero rates, place them into an xts object, and then interpolate the rates in between the data points using the xts functions for linear and cubic spline interpolation. In an upcoming post (part 2), we will discuss the essential term structure functions for calculating forward rates and forward discount factors.
For further and mathematically more detailed reading on the subject, the paper Smooth Interpolation of Zero Curves (Ken Adams, 2001) is highly recommended. A “financial cubic spline” as described in the paper would in fact be a useful option to have as a method in xts cubic spline interpolation.
by Joseph Rickert
I was very happy to have been able to attend R/Finance 2014, which wrapped up a couple of weeks ago. In general, the talks were at a very high level of play, some dealing with brand new ideas and many presented at a significant level of technical or mathematical sophistication. Fortunately, most of the slides from the presentations are quite detailed and available at the conference site. Collectively, these presentations provide a view of the boundaries of the conceptual space imagined by the leaders in quantitative finance. Some of this space covers infrastructure issues involving ideas for pushing the limits of R (Some Performance Improvements for the R Engine) or building new infrastructure (New Ideas for Large Network Analysis, Building Simple Data Caches), for example. Others are involved with new computational tools (Solving Cone Constrained Convex Programs) or attempt to push the limits on getting some actionable insight from the mathematical abstractions (Portfolio Inference with This One Weird Trick, Twinkle Twinkle Little STAR: Smooth Transition AR Models in R), for example.
But while the talks may be illuminating, the real takeaways from the conference are the R packages. These tools embody the work of the thought leaders in the field of computational finance and are the means for anyone sufficiently motivated to understand this cutting-edge work. By my count, 20 of the 44 tutorials and talks given at the conference were based on a particular R package. Some of the packages listed in the following table are well-established, and others are works-in-progress sitting out on R-Forge or GitHub, providing opportunities for the interested to get involved.
R/Finance 2014 Talk | Package | Description
--- | --- | ---
Introduction to data.table | data.table | Extension of the data frame
An Example-Driven Hands-on Introduction to Rcpp | Rcpp | Functions to facilitate integrating R with C++
Portfolio Optimization: Utility, Computation, Equities Applications | | Environment for teaching Financial Engineering and Computational Finance
Re-Evaluation of the Low Risk Anomaly via Matching | | Implementation of the Coarsened Exact Matching algorithm
BCP Stability Analytics: New Directions in Tactical Asset Management | | Bayesian analysis of change point problems
On the Persistence of Cointegration in Pairs Trading | | Engle-Granger cointegration models
Tests for Robust Versus Least Squares Factor Model Fits | | Robust methods
The R Package cccp: Solving Cone Constrained Convex Programs | cccp | Solver for convex problems with cone constraints
Twinkle, Twinkle Little STAR: Smooth Transition AR Models in R | | Modeling smooth transition models
Asset Allocation with Higher Order Moments and Factor Models | | Global optimization by differential evolution / numerical methods for portfolio optimization
Event Studies in R | | Event study and extreme event analysis
An R Package on Credit Default Swaps | | Tools for pricing credit default swaps
New Ideas for Large Network Analysis, Implemented in R | | Implicitly restarted Lanczos methods for R
Package “Intermediate and Long Memory Time Series” | | Simulate and detect intermediate and long memory processes (in development)
Stochvol: Dealing with Stochastic Volatility in Time Series | stochvol | Efficient Bayesian inference for stochastic volatility (SV) models
Divide and Recombine for the Analysis of Large Complex Data with R | | Package for using R with Hadoop
gpusvcalibration: Fast Stochastic Volatility Model Calibration using GPUs | gpusvcalibration | Fast calibration of stochastic volatility models for option pricing
The FlexBayes Package | FlexBayes | MCMC engine for hierarchical generalized linear models, with connections to WinBUGS and OpenBUGS
Building Simple Redis Data Caches | | Rcpp bindings for Redis, connecting R to the Redis key/value store
Package pbo: Probability of Backtest Overfitting | pbo | Uses combinatorial symmetric cross-validation to implement performance tests
Many of these packages / projects also have supplementary material that is worth chasing down. Be sure to take a look at Alexios Ghalanos' recent post that provides an accessible introduction to his stellar keynote address.
Many thanks to the organizers of the conference who, once again, did a superb job, and to the many professionals attending who graciously attempted to explain their ideas to a dilettante. My impression was that most of the attendees thoroughly enjoyed themselves and that the general sentiment was expressed by the last slide of Stephen Rush's presentation:
by Joseph Rickert
R/Finance 2014 is just about a week away. Over the past four or five years this has become my favorite conference. It is small (300 people this year), exceptionally well-run, and always offers an eclectic mix of theoretical mathematics; efficient, practical computing; industry best practices; and trading “street smarts”. This clip of Blair Hull delivering a keynote speech at R/Finance 2012 is an example of the latter. It ought to resonate with anyone who has followed some of the hype surrounding Michael Lewis's recent book Flash Boys.
In any event, I thought it would be a good time to look at the relationship between R and Finance and to highlight some resources that are available to students, quants and data scientists looking to do computational finance with R.
First off, consider what computational finance has done for R. From the point of view of the development and growth of the R language, I think it is pretty clear that computational finance has played the role of the ultimate “Killer App” for R. This high-stakes, competitive environment, where a theoretical edge or a marginal computational advantage can mean big rewards, has led to R package development in several areas including time series, optimization, portfolio analysis, risk management, high performance computing and big data. Additionally, challenges and crises in the financial markets have helped accelerate R's growth into big data. In this podcast, Michael Kane talks about the analysis of the 2010 Flash Crash he did with Casey King and Richard Holowczak and describes using R with large financial datasets.
Conversely, I think that it is also clear that R has done quite a bit to further computational finance. R’s ability to facilitate rapid data analysis and visualization, its great number of available functions and algorithms and the ease with which it can interface to new data sources and other computing environments has made it a flexible tool that evolves and adapts at a pace that matches developments in the financial industry. The list of packages in the Finance Task View on CRAN indicates the symbiotic relationship between the development of R and the needs of those working in computational finance. On the one hand, there are over 70 packages under the headings Finance and Risk Management that were presumably developed to directly respond to a problem in computational finance. But, the task view also mentions that packages in the Econometrics, Multivariate, Optimization, Robust, SocialSciences and TimeSeries task views may also be useful to anyone working in computational finance. (The High Performance Computing and Machine Learning task views should probably also be mentioned.) The point is that while a good bit of R is useful to problems in computational finance, R has greatly benefited from the contributions of the computational finance community.
If you are just getting started with R and computational finance, have a look at John Nolan's R as a Tool in Computational Finance. Other resources for R and computational finance that you may find helpful are:
Package Vignettes
Several of the finance-related packages have very informative vignettes or associated websites. For example, have a look at those for the packages portfolio, rugarch, RQuantLib (check out the cool rotating distributions), PerformanceAnalytics, and MarkowitzR.
Data
Quandl has become a major source for financial data, which can be easily accessed from R.
Websites
Relevant websites include the RMetrics site, The R Trader, Burns Statistics and Guy Yollin's repository of presentations.
YouTube
Three videos that I found to be particularly interesting are recordings of the presentations “Finance with R” by Ronald Hochreiter, “Using R in Academic Finance” by Sanjiv Das and “Portfolio Construction in R” by Elliot Norma.
Blogs
Over the past couple of years, R-Bloggers has posted quite a few finance-related applications. Prominent among these is the series on Quantitative Finance Applications in R by Daniel Hanson on the Revolutions blog.
Books
Books on R and finance include the excellent RMetrics series of e-books, Statistics and Data Analysis for Financial Engineering by David Ruppert, Financial Risk Modeling and Portfolio Optimization with R by Bernhard Pfaff, Introduction to R for Quantitative Finance by Daróczi et al., and a brand new title, Computational Finance: An Introductory Course with R by Argimiro Arratia.
Coursera
This August, Eric Zivot will teach the course Introduction to Computational Finance and Financial Econometrics which will emphasize R.
The R Journal
The R Journal frequently publishes finance-related papers. The present issue, Volume 5/2, December 2013, contains three relevant papers: Performance Attribution for Equity Portfolios by Yang Lu and David Kane; Temporal Disaggregation of Time Series by Christoph Sax and Peter Steiner; and betategarch: Simulation, Estimation and Forecasting of Beta-Skew-t-EGARCH Models by Genaro Sucarrat.
Conferences
In addition to R/Finance (Chicago) and useR! 2014 (Los Angeles), look for R-based computational finance expertise at the 8th R/RMetrics Workshop (Paris).
Community
R-SIG-Finance is one of R's most active special interest groups.
by Daniel Hanson
Last time, we looked at the four-parameter Generalized Lambda Distribution as a method of incorporating skew and kurtosis into an estimated distribution of market returns, capturing the typical fat tails that the normal distribution cannot. Having said that, however, the normal distribution can be useful in constructing Monte Carlo simulations, and it is still commonly found in applications such as calculating the Value at Risk (VaR) of a portfolio, pricing options, and estimating the liabilities in variable annuity contracts.
We will start here with a simple example using R, focusing on a single security. Although perhaps seemingly trivial, this lays the foundation for introducing more complexity, such as multiple correlated securities and stochastic interest rates. Discussion of these topics is planned for articles to come, as well as topics in option pricing.
Single Security Example
Under the oft-used assumption of Brownian motion dynamics, the return of a single security (e.g., an equity) over a period of time Δt is approximately [see Pelsser, for example]
μΔt + σZ√Δt    (*)
where μ is the mean annual return of the equity (also called the drift), and σ is its annualized volatility (i.e., standard deviation). Z is a standard Normal random variable, which makes the second term in the expression stochastic. The time t is measured in units of years, so for quarterly returns, for example, Δt = 0.25.
As μ, σ, and Δt are all known values, generating a simulated distribution of returns is a simple task. As an example, suppose we are interested in constructing a distribution of quarterly returns, where μ = 10% and σ = 15%. In order to get a reasonable approximation of the distribution, we will generate n = 10,000 returns.
n <- 10000
# Fixing the seed gives us a consistent set of simulated returns
set.seed(106)
z <- rnorm(n)   # mean = 0 and sd = 1 are defaults
mu <- 0.10
sd <- 0.15
delta_t <- 0.25
# Apply to expression (*) above
qtr_returns <- mu*delta_t + sd*z*sqrt(delta_t)
Note that R is “smart enough” here to add the scalar mu*delta_t to each element of the vector in the second term, thus giving us a set of 10,000 simulated returns. Finally, let's check our results. First, we plot a histogram:
hist(qtr_returns, breaks = 100, col = "green")
This gives us the following:
The symmetric bell shape of the histogram is consistent with the Normal assumption. Checking the annualized mean and variance of the simulated returns,
stats <- c(mean(qtr_returns) * 4, sd(qtr_returns) * 2)  # annualize: mean x 4, sd x sqrt(4)
names(stats) <- c("mean", "volatility")
stats
We get:
mean volatility
0.09901252 0.14975805
which is very close to our original parameter settings of μ = 10% and σ = 15%.
Again, this is a rather simple example, but in future discussions we will see how it extends to using Monte Carlo simulation for option pricing and risk management models.
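As a small taste of the risk-management direction (our own extension of the example, not part of the original), the simulated quarterly returns translate directly into an empirical Value at Risk estimate:

```r
set.seed(106)  # same seed and parameters as in the example above
qtr_returns <- 0.10 * 0.25 + 0.15 * rnorm(10000) * sqrt(0.25)

# 95% one-quarter VaR on a hypothetical $1,000,000 position: the loss
# that is exceeded in only 5% of the simulated scenarios.
position <- 1e6
var95 <- -as.numeric(quantile(qtr_returns, probs = 0.05)) * position
var95  # on the order of a $100,000 loss for these parameters
```

That is, we simply read off the 5th percentile of the simulated return distribution and scale it by the position size.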
The BitCoin cryptocurrency has been much in the news of late. What, you don't have BitCoins? (Don't worry, neither do I.) Unless you have a supercomputer in your back yard and a cheap source of power, it's no longer really feasible to mine them yourself. But if you want some, several online exchanges will let you buy BitCoins for real money. Be warned, though: the price of BitCoins has been wildly volatile over the last year or so, so it's not really clear whether buying BitCoins would be a good long-term investment.
But what if you could make money with BitCoins without having to hold any over the long term? One way, at least on paper, is to exploit discrepancies in exchange rates: buy BitCoins with dollars, sell them for yen, and convert the yen back to dollars.
Now, exchange rates (especially BitCoin exchange rates) vary all the time, so you'll need to do a real-time arbitrage analysis to find a profitable sequence of trades at any given moment. R programmer Tom Johnson showed at the Bay Area R User Group an R script he wrote to do exactly that, pulling real-time foreign-exchange rates from Quandl (with the Quandl package for R) and solving the necessary equations to find the arbitrage opportunity. He's wrapped this R script into an easy-to-use Shiny app, so you can find BitCoin arbitrage opportunities via JPY and USD at any time:
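The core computation behind such an analysis is just multiplying rates around a currency cycle and checking whether the product beats 1 (before fees and slippage). A sketch with entirely made-up rates, not live Quandl data:

```r
# Hypothetical, made-up quotes (NOT live market data):
btc_per_usd <- 1 / 455.00   # buy BitCoin with dollars
jpy_per_btc <- 47500        # sell BitCoin for yen
usd_per_jpy <- 1 / 102.50   # convert yen back to dollars

# Value of $1 after the round trip USD -> BTC -> JPY -> USD:
round_trip <- 1 * btc_per_usd * jpy_per_btc * usd_per_jpy
round_trip      # about 1.018: a 1.8% paper profit per cycle
round_trip > 1  # TRUE signals an arbitrage opportunity (ignoring costs)
```

A real script would search over all available currency cycles and refresh the quotes continuously, which is exactly what the Shiny app automates.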
So why isn't everyone rushing to Mt. Gox to profit from the "Sushi-Burger Shuffle"? Well, as with most free-money schemes, real-life problems intervene. The main issue is that BitCoin currency exchanges routinely take upwards of 10 minutes to clear, by which time the exchange rates may have changed, eliminating the profit opportunity or even leading to a loss. And that's on a good day: in recent days, it's been difficult to access BitCoin exchanges at all. So this is more of an interesting puzzle than a real money-making opportunity. Nonetheless, it's also a great example of real-time financial analysis using R.
Tom Johnson: Best 2-Currency Arbitrage with Bitcoin
by Daniel Hanson, QA Data Scientist, Revolution Analytics
Last time, we included a couple of examples of plotting a single xts time series using the plot(.) function (i.e., the version of the function included in the xts package). Today, we'll look at some quick and easy methods for plotting overlays of multiple xts time series in a single graph. As this information is not explicitly covered in the examples provided with xts and base R, this discussion may save you a bit of time.
To start, let’s look at five sets of cumulative returns for the following ETF’s:
SPY: SPDR S&P 500 ETF Trust
QQQ: PowerShares NASDAQ QQQ Trust
GDX: Market Vectors Gold Miners ETF
DBO: PowerShares DB Oil Fund (ETF)
VWO: Vanguard FTSE Emerging Markets ETF
We first obtain the data using quantmod, going back to January 2007:
library(quantmod)
tckrs <- c("SPY", "QQQ", "GDX", "DBO", "VWO")
getSymbols(tckrs, from = "2007-01-01")
Then, extract just the closing prices from each set:
SPY.Close <- SPY[,4]
QQQ.Close <- QQQ[,4]
GDX.Close <- GDX[,4]
DBO.Close <- DBO[,4]
VWO.Close <- VWO[,4]
What we want is the set of cumulative returns for each, in the sense of the cumulative value of $1 over time. To do this, it is simply a case of dividing each daily price in the series by the price on the first day of the series. As SPY.Close[1], for example, is itself an xts object, we need to coerce it to numeric in order to carry out the division:
SPY1 <- as.numeric(SPY.Close[1])
QQQ1 <- as.numeric(QQQ.Close[1])
GDX1 <- as.numeric(GDX.Close[1])
DBO1 <- as.numeric(DBO.Close[1])
VWO1 <- as.numeric(VWO.Close[1])
Then, it’s a case of dividing each series by the price on the first day, just as one would divide an R vector by a scalar. For convenience of notation, we’ll just save these results back into the original ETF ticker names and overwrite the original objects:
SPY <- SPY.Close/SPY1
QQQ <- QQQ.Close/QQQ1
GDX <- GDX.Close/GDX1
DBO <- DBO.Close/DBO1
VWO <- VWO.Close/VWO1
We then merge all of these xts time series into a single xts object (à la a matrix):
basket <- cbind(SPY, QQQ, GDX, DBO, VWO)
Note that is.xts(basket) returns TRUE. We can also have a look at the data and its structure:
> head(basket)
           SPY.Close QQQ.Close GDX.Close DBO.Close VWO.Close
2007-01-03 1.0000000  1.000000 1.0000000        NA 1.0000000
2007-01-04 1.0021221  1.018964 0.9815249        NA 0.9890886
2007-01-05 0.9941289  1.014107 0.9682540 1.0000000 0.9614891
2007-01-08 0.9987267  1.014801 0.9705959 1.0024722 0.9720154
2007-01-09 0.9978779  1.019889 0.9640906 0.9929955 0.9487805
2007-01-10 1.0012025  1.031915 0.9526412 0.9517923 0.9460847
> tail(basket)
           SPY.Close QQQ.Close GDX.Close DBO.Close VWO.Close
2014-01-10  1.302539        NA 0.5727296  1.082406 0.5118100
2014-01-13  1.285209  1.989130 0.5893833  1.068809 0.5053915
2014-01-14  1.299215  2.027058 0.5750716  1.074166 0.5110398
2014-01-15  1.306218  2.043710 0.5826177  1.092707 0.5109114
2014-01-16  1.304520  2.043941 0.5886027  1.089411 0.5080873
2014-01-17  1.299003  2.032377 0.6070778  1.090647 0.5062901
Note that we have a few NA values here. This will not be of any significant consequence for demonstrating plotting functions, however.
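If the NAs did matter (say, for computing daily returns), the zoo package that underlies xts provides na.locf(.) to carry the last observation forward. A minimal sketch on a toy series:

```r
library(zoo)

# A toy price series with interior NAs:
z <- zoo(c(1.00, NA, 1.02, NA, NA, 1.05), order.by = 1:6)

coredata(na.locf(z))                   # 1.00 1.00 1.02 1.02 1.02 1.05
coredata(na.locf(z, fromLast = TRUE))  # 1.00 1.02 1.02 1.05 1.05 1.05
```

The fromLast = TRUE variant fills backward instead, which is handy for patching leading NAs such as those in the DBO column above.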
We will now look at how we can plot all five series, overlaid on a single graph. In particular, we will look at the plot(.) functions in both the zoo and xts packages.
The xts package is an extension of the zoo package, so coercing our xts object basket to a zoo object is a simple task:
zoo.basket <- as.zoo(basket)
Looking at head(zoo.basket) and tail(zoo.basket), we will get output that looks the same as what we got for the original xts basket object, as shown above; the date-to-data mapping is preserved. The plot(.) function provided in zoo is very simple to use, as we can use the whole zoo.basket object as input, and the plot(.) function will overlay the time series and scale the vertical axis for us with the help of a single parameter setting, namely the screens parameter.
Let’s now look at the code and the resulting plot in the following example, and then explain what’s going on:
# Set a color scheme:
tsRainbow <- rainbow(ncol(zoo.basket))
# Plot the overlaid series
plot(x = zoo.basket, xlab = "Time", ylab = "Cumulative Return",
     main = "Cumulative Returns", col = tsRainbow, screens = 1)
# Set a legend in the upper left hand corner to match color to return series
legend(x = "topleft", legend = c("SPY", "QQQ", "GDX", "DBO", "VWO"),
       lty = 1, col = tsRainbow)
We started by setting a color scheme, using the rainbow(.) command that is included in the base R installation. It is convenient as R will take in an arbitrary positive integer value and select a sequence of distinct colors up to the number specified. This is a nice feature for the impatient or lazy among us (yes, guilty as charged) who don’t want to be bothered with picking out colors and just want to see the result right away.
Next, in the plot(.) command, we assign to x our “matrix” of time series in the zoo.basket object, labels for the horizontal and vertical axes (xlab, ylab), a title for the graph (main), and the colors (col). Last, but crucial, is the parameter setting screens = 1, which tells the plot command to overlay each series in a single graph.
Finally, we include the legend(.) command to place a color legend at the upper left hand corner of the graph. The position (x) may be chosen from the list of keywords "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right" and "center"; in our case, we chose "topleft". The legend parameter is simply the list of ticker names. The lty parameter refers to “line type”, and by setting it to 1, the lines in the legend are shown as solid lines, and as in the plot(.) function, the same color scheme is assigned to the parameter col.
Back to the color scheme, we may at some point need to show our results to a manager or a client, so in that case, we probably will want to choose colors that are easier on the eye. In this case, one can just store the colors into a vector, and then use it as an input parameter. For example, set
myColors <- c("red", "darkgreen", "goldenrod", "darkblue", "darkviolet")
Then, just replace col = tsRainbow with col = myColors in the plot and legend commands:
plot(x = zoo.basket, xlab = "Time", ylab = "Cumulative Return",
     main = "Cumulative Returns", col = myColors, screens = 1)
legend(x = "topleft", legend = c("SPY", "QQQ", "GDX", "DBO", "VWO"),
       lty = 1, col = myColors)
We then get a plot that looks like this:
While the plot(.) function in zoo gave us a quick and convenient way of plotting multiple time series, it didn’t give us much control over the scale used along the horizontal axis. Using plot(.) in xts remedies this; however, it involves doing more work. In particular, we can no longer input the entire “matrix” object; we must add each series separately in order to layer the plots. We also need to specify the scale along the vertical axis, as in the xts case, the function will not do this on the fly as it did for us in the zoo case.
We will use individual columns from our original xts object, basket. By using basket rather than zoo.basket, we tell R to use the xts version of the function rather than the zoo version (à la an overloaded function in traditional object-oriented programming). Let's again look at an example and the resulting plot, and then discuss how it works:
plot(x = basket[, "SPY.Close"], xlab = "Time", ylab = "Cumulative Return",
     main = "Cumulative Returns", ylim = c(0.0, 2.5), major.ticks = "years",
     minor.ticks = FALSE, col = "red")
lines(x = basket[, "QQQ.Close"], col = "darkgreen")
lines(x = basket[, "GDX.Close"], col = "goldenrod")
lines(x = basket[, "DBO.Close"], col = "darkblue")
lines(x = basket[, "VWO.Close"], col = "darkviolet")
legend(x = "topleft", legend = c("SPY", "QQQ", "GDX", "DBO", "VWO"),
       lty = 1, col = myColors)
As mentioned, we need to add each time series separately in this case in order to get the desired overlays. If one were to try x = basket in the plot function, the graph would only display the first series (SPY), and a warning message would be returned to the R session. So, we first use the SPY series as input to the plot(.) function, and then add the remaining series with the lines(.) command. The color for each series is also included at each step (the same colors in our myColors vector).
As for the remaining arguments in the plot command, we use the same axis and title settings in xlab, ylab, and main. We set the scale of the vertical axis with the ylim parameter; noting from our previous example that VWO hovered near zero at the low end, and that DBO reached almost as high as 2.5, we set this range from 0.0 to 2.5. Two new arguments here are the major.ticks and minor.ticks settings. The major.ticks argument represents the periods in which we wish to chop up the horizontal axis; it is chosen from the set
{"years", "months", "weeks", "days", "hours", "minutes", "seconds"}
In the example above, we chose "years". The minor.ticks parameter can take values of TRUE/FALSE, and as we don’t need this for the graph, we choose FALSE. The same legend command that we used in the zoo case can be used here as well (using myColors to indicate the color of each time series plot). Just to compare, let’s change the major.ticks parameter to "months" in the previous example. The result is as follows:
A new package, called xtsExtra, includes a new plot(.) function that provides added functionality, including a legend generator. However, while it is available on R-Forge, it has not yet made it into the official CRAN repository. More sophisticated time series plotting capability can also be found in the quantmod and ggplot2 packages, and we will look at the ggplot2 case in an upcoming post. However, for plotting xts objects quickly and with minimal fuss, the plot(.) function in the zoo package fills the bill, and with a little more effort, we can refine the scale along the horizontal axis using the xts version of plot(.). R help files for each of these can be found by selecting plot.zoo and plot.xts respectively in help searches.