by Joseph Rickert

In a recent post focused on plotting time series with the new dygraphs package, I did not show how easy it is to read financial data into R. However, in a thoughtful comment to the post, Achim Zeileis pointed out a number of features built into the basic R time series packages that everyone ought to know. In this post, I will just elaborate a little on what Achim sketched out. First off, I began the previous post with url strings that point to stock data for IBM and LinkedIn. Yahoo Finance makes this sort of thing easy. A quick search for a stock will bring you to a page with historical stock prices. Then, it is only a matter of copying the url to the associated csv file, and as Achim points out:

If you already have the download URLs ibm_url and lnkd_url, then you can also simply use zoo::read.zoo() and merge the resulting closing prices:
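A minimal sketch of that approach might look like the following. It assumes the zoo package is installed, and it uses a few inline rows of made-up prices in the Yahoo csv layout instead of fetching ibm_url and lnkd_url; the helper name read_close is hypothetical.

```r
library(zoo)

# Made-up rows in the Yahoo csv layout (newest first); these are not real prices.
ibm_csv <- "Date,Open,High,Low,Close,Volume
2011-05-20,100.0,101.0,99.0,100.5,1000
2011-05-19,99.0,100.5,98.5,100.0,1100
2011-05-18,98.0,99.5,97.5,99.0,900"

# The second series starts one day later, like a stock that listed later.
lnkd_csv <- "Date,Open,High,Low,Close,Volume
2011-05-20,50.0,52.0,49.0,51.0,2000
2011-05-19,48.0,51.0,47.0,50.0,2500"

# Hypothetical helper: read one csv into a zoo series and keep the Close column.
# With real data, the url would be passed as the first argument instead of text =.
read_close <- function(csv_text) {
  read.zoo(text = csv_text, header = TRUE, sep = ",", format = "%Y-%m-%d")[, "Close"]
}

# merge() aligns the series on the date index; dates missing from one
# series are filled with NA in that column.
z <- merge(IBM = read_close(ibm_csv), LNKD = read_close(lnkd_csv))
print(z)
```

Because merge() on zoo objects is an outer join by default, the series that starts later simply shows NA for the earlier dates rather than causing an error.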

And the ggplot2 figure can just be drawn with the autoplot() method for zoo series:

    library("ggplot2")
    autoplot(z, facets = NULL)

The resulting plot takes you most of the way to the ggplot produced in the post.

The final formatting can be accomplished with the additional ggplot commands used in my post. This is just delightful: 3 lines of code that fetch the data and prepare it for plotting, and a half-line to get a sophisticated default plot.

Of course, the manual step of hunting for urls is completely unnecessary. The get.hist.quote function in the tseries package will fetch a time series object for you that can also be plotted with the zoo autoplot function.

If you are really interested in working like a quant, then the defaults in the quantmod package will give you the look and feel of a trader's screen. The function:

getSymbols("IBM",src="google")

will bring the "xts" / "zoo" time series object, IBM, containing historic IBM stock data directly into your work space with no need even to make an assignment!

    head(IBM)
               IBM.Open IBM.High IBM.Low IBM.Close IBM.Volume
    2007-01-03    97.18    98.40   96.26     97.27    9199500
    2007-01-04    97.25    98.79   96.88     98.31   10557200
    2007-01-05    97.60    97.95   96.91     97.42    7222900
    2007-01-08    98.50    99.50   98.35     98.90   10340100
    2007-01-09    99.08   100.33   99.07    100.07   11108900
    2007-01-10    98.50    99.05   97.93     98.89    8744900

Moreover, IBM is an "OHLC" object that, with the right plotting function like chartSeries from the quantmod package, will produce the kind of open-high-low-close charts favored by stock analysts for charting financial instruments. (There is even an R function to determine if you have an OHLC object.)

    is.OHLC(IBM)
    #[1] TRUE
    chartSeries(IBM,type="candle",subset='2010-08-24::2015-09-02')

The getSymbols function will fetch data from the Yahoo, Google, FRED and Oanda financial services sites, and will also read from MySQL databases and from .csv and RData files.

Quandl, however, is probably the best place to go for free (and premium) financial data. Once you sign up for an account with Quandl, the following code will get a data frame with several columns of IBM stock information.

    token <- "your_token_string"
    Quandl.auth(token) # Authenticate your token
    ibmQ = Quandl("WIKI/IBM", start_date="2010-08-24", end_date="2015-09-03")
    head(ibmQ)
            Date   Open   High    Low  Close  Volume Ex-Dividend Split Ratio Adj. Open Adj. High Adj. Low Adj. Close Adj. Volume
    1 2015-09-02 144.92 145.08 143.18 145.05 4243473           0           1    144.92    145.08   143.18     145.05     4243473
    2 2015-09-01 144.84 144.98 141.85 142.68 5258877           0           1    144.84    144.98   141.85     142.68     5258877
    3 2015-08-31 147.26 148.40 146.26 147.89 4093078           0           1    147.26    148.40   146.26     147.89     4093078
    4 2015-08-28 147.75 148.20 147.18 147.98 4058832           0           1    147.75    148.20   147.18     147.98     4058832
    5 2015-08-27 148.63 148.97 145.66 148.54 4762003           0           1    148.63    148.97   145.66     148.54     4762003
    6 2015-08-26 144.09 146.98 142.14 146.70 6186742           0           1    144.09    146.98   142.14     146.70     6186742

Here, I have presented just some of the very basics, still not coming close to describing all that R offers for acquiring and manipulating financial time series information.

As a final note: a couple of years ago I posted a short tutorial for getting started with the Quandl R API. Unfortunately, since that time Quandl has changed its coding scheme, so my R code from that tutorial will not run without changes. The code in the file Quandl_code, however, produces the following plot of Asian currency exchange rates and may serve as an updated example.

Look here to decipher the Quandl currency codes.

by Joseph Rickert

I recently rediscovered the Timely Portfolio post on R Financial Time Series Plotting. If you are not familiar with this gem, it is well worth the time to stop and have a look at it now. Not only does it contain some useful examples of time series plots mixing different combinations of time series packages (ts, zoo, xts) with multiple plotting systems (base R, lattice, etc.) but it provides an instructive, historical perspective that illustrates the nonlinear nature of progress in software development: new code is written to solve certain technical problems with the current software. Progress is made, and the new code makes it possible to do some things that couldn't be done before, but there are tradeoffs. Design choices for the new system make it a little more difficult to do something that was easy before. The net result: all of the software continues to advance in a messy mix, confusing the newcomer and providing critics with the opportunity to complain that there is not just one way to solve a problem.

Because this is turning out to be a week when more than a few people are likely to be plotting financial time series, I thought it would be helpful to call attention to this time series resource and also take a look at the current state of the R art for performing a relatively simple task: plotting closing prices for two stocks on the same chart.

The following code just reads stock price data from Yahoo Finance for both IBM and LinkedIn from 8/24/2010 through 8/24/2015 and picks out the closing prices. I cheated a little here because I already knew the urls for the two series. I picked these two stocks because they both traded in about the same range for the period in question and because I wanted to see if the fact that one stock, LinkedIn, wasn't trading at the beginning of the selected period caused any problems.

    # Time Series Plotting
    library(ggplot2)
    library(xts)
    library(dygraphs)

    # Get IBM and Linkedin stock data from Yahoo Finance
    ibm_url <- "http://real-chart.finance.yahoo.com/table.csv?s=IBM&a=07&b=24&c=2010&d=07&e=24&f=2015&g=d&ignore=.csv"
    lnkd_url <- "http://real-chart.finance.yahoo.com/table.csv?s=LNKD&a=07&b=24&c=2010&d=07&e=24&f=2015&g=d&ignore=.csv"

    yahoo.read <- function(url){
      dat <- read.table(url,header=TRUE,sep=",")
      df <- dat[,c(1,5)]
      df$Date <- as.Date(as.character(df$Date))
      return(df)}

    ibm <- yahoo.read(ibm_url)
    lnkd <- yahoo.read(lnkd_url)

To my mind, the "go to" method for simple plotting that you will show to someone else is ggplot(). The following code suggested by Didzis Elferts, in answer to a StackOverflow question, accomplishes the task with great economy, using just a few more features than what the defaults would give you.

    ggplot(ibm,aes(Date,Close)) +
      geom_line(aes(color="ibm")) +
      geom_line(data=lnkd,aes(color="lnkd")) +
      labs(color="Legend") +
      scale_colour_manual("", breaks = c("ibm", "lnkd"),
                          values = c("blue", "brown")) +
      ggtitle("Closing Stock Prices: IBM & Linkedin") +
      theme(plot.title = element_text(lineheight=.7, face="bold"))

This next plot, which uses the dygraphs package, represents the new frontier for creating interactive time series plots in R.

    # Plot with the htmlwidget dygraphs
    # dygraph() needs xts time series objects
    ibm_xts <- xts(ibm$Close,order.by=ibm$Date,frequency=365)
    lnkd_xts <- xts(lnkd$Close,order.by=lnkd$Date,frequency=365)

    stocks <- cbind(ibm_xts,lnkd_xts)

    dygraph(stocks,ylab="Close",
            main="IBM and Linkedin Closing Stock Prices") %>%
      dySeries("..1",label="IBM") %>%
      dySeries("..2",label="LNKD") %>%
      dyOptions(colors = c("blue","brown")) %>%
      dyRangeSelector()

Building on the work done for rCharts profiled in the Timely Portfolio piece, the dygraphs R package provides an interface to the dygraphs JavaScript library. With just a few lines of R code, it is now possible to produce charts that approach the polished look of the professional stock charting services, all without any knowledge of JavaScript.

For anyone who works with financial data and has access to a Bloomberg terminal, there is a new R package to interface to Bloomberg data services: Rblpapi. (If you had searched for an R connection to Bloomberg you wouldn’t have found this one: Bloomberg is happy to have software that connects to its public API, but not to use its name, apparently.)

One of the authors, Dirk Eddelbuettel, gave a presentation at the R/Finance conference last month (you can take a look at the slides here). Here are a few examples of the types of queries you can perform, which generate time series data objects in R:

bdp(c("ESA Index", "SPY US Equity"), c("PX_LAST", "VOLUME"))

bds("GOOG US Equity", "TOP_20_HOLDERS_PUBLIC_FILINGS")

bdh("SPY US Equity", c("PX_LAST", "VOLUME"), start.date=Sys.Date()-31)

getBars("ESA Index", startTime=ISOdatetime(2015,1,1,0,0,0))

getTicks("ESA Index", "TRADE", Sys.time()-60*60)

fieldSearch("VWAP")

The package is fast and lightweight (no Java required), and works well on Linux-based systems. Dirk reports that it's currently tricky to build on Windows, though: suggestions welcome in the comments or via the Github project linked below.

The R/Finance 2015 Conference wrapped up last Saturday at UIC. It has been seven years already, but R/Finance still has the magic! - mostly very high quality presentations and the opportunity to interact and talk shop with some of the most accomplished R developers, financial modelers and even a few industry legends such as Emanuel Derman and Blair Hull.

Emanuel Derman led off with a provocative but extraordinary keynote talk. Derman began way out there, somewhere well beyond the left field wall recounting the struggle of Johannes Kepler to formulate his three laws of planetary motion and closed with some practical advice on how to go about the business of financial modeling. Along the way he shared some profound, original thinking in an attempt to provide a theoretical context for evaluating and understanding the limitations of financial models. His argument hinged on making and defending the distinction between theories and models. Theories such as physical theories of Kepler, Newton and Einstein are ontological: they attempt to say something about how the world is. A theory attempts to provide "absolute knowledge of the world". A model, on the other hand, "tells you about what some aspect of the world is like". Theories can be wrong, but they are not the kinds of things you can interrogate with "why" questions.

Models work through analogies and similarities. They compare something we understand to something we don't. Spinoza's Theory of emotions is a theory because it attempts to explain human emotions axiomatically from first principles.

The Black Scholes equation, by contrast, is a model that tries to provide insight through the analogy with Brownian motion. As I understood it, the practical advice from all of this is to avoid the twin traps of attempting to axiomatize financial models as if they directly captured reality, and of believing that analyzing data, no matter how many terabytes you plow through, is a substitute for an educated intuition about how the world is.

The following table lists the remaining talks in alphabetical order by speaker.

| # | Presentation | Package | Package Location |
|---|---|---|---|
| 1 | Rohit Arora: Inefficiency of Modified VaR and ES | | |
| 2 | Kyle Balkissoon: A Framework for Integrating Portfolio-level Backtesting with Price and Quantity Information | PortfolioAnalytics | |
| 3 | Mark Bennett: Gaussian Mixture Models for Extreme Events | | |
| 4 | Oleg Bondarenko: High-Frequency Trading Invariants for Equity Index Futures | | |
| 5 | Matt Brigida: Markov Regime-Switching (and some State Space) Models in Energy Markets | code for regime switching | GitHub |
| 6 | John Burkett: Portfolio Optimization: Price Predictability, Utility Functions, Computational Methods, and Applications | DEoptim | CRAN |
| 7 | Matthew Clegg: The partialAR Package for Modeling Time Series with both Permanent and Transient Components | partialAR | CRAN |
| 8 | Yuanchu Dang: Credit Default Swaps with R (with Zijie Zhu) | CDS | GitHub |
| 9 | Gergely Daroczi: Network analysis of the Hungarian interbank lending market | | |
| 10 | Sanjiv Das: Efficient Rebalancing of Taxable Portfolios | | |
| 11 | Sanjiv Das: Matrix Metrics: Network-Based Systemic Risk Scoring | | |
| 12 | Emanuel Derman: Understanding the World | | |
| 13 | Matthew Dixon: Risk Decomposition for Fund Managers | | |
| 14 | Matt Dowle: Fast automatic indexing with data.table | data.table | CRAN |
| 15 | Dirk Eddelbuettel: Rblpapi: Connecting R to the data service that shall not be named | Rblpapi | GitHub |
| 16 | Markus Gesmann: Communicating risk - a perspective from an insurer | | |
| 17 | Vincenzo Giordano: Quantifying the Risk and Price Impact of Energy Policy Events on Natural Gas Markets Using R (with Soumya Kalra) | | |
| 18 | Chris Green: Detecting Multivariate Financial Data Outliers using Calibrated Robust Mahalanobis Distances | CerioliOutlierDetection | CRAN |
| 19 | Rohini Grover: The informational role of algorithmic traders in the option market | | |
| 20 | Marius Hofert: Parallel and other simulations in R made easy: An end-to-end study | simsalapar | CRAN |
| 21 | Nicholas James: Efficient Multivariate Analysis of Change Points | ecp | CRAN |
| 22 | Kresimir Kalafatic: Financial network analysis using SWIFT and R | | |
| 23 | Michael Kapler: Follow the Leader - the application of time-lag series analysis to discover leaders in S&P 500 | SIT | other |
| 24 | Ilya Kipnis: Flexible Asset Allocation With Stepwise Correlation Rank | | |
| 25 | Rob Krzyzanowski: Building Better Credit Models through Deployable Analytics in R | | |
| 26 | Bryan Lewis: More thoughts on the SVD and Finance | | |
| 27 | Yujia Liu and Guy Yollin: Fundamental Factor Model DataBrowser using Tableau and R | factorAnalytics | R-Forge |
| 28 | Louis Marascio: An Outsider's Education in Quantitative Trading | | |
| 29 | Doug Martin: Nonparametric vs Parametric Shortfall: What are the Differences? | | |
| 30 | Alexander McNeil: R Tools for Understanding Credit Risk Modelling | | |
| 31 | William Nicholson: Structured Regularization for Large Vector Autoregression | BigVAR | GitHub |
| 32 | Steven Pav: Portfolio Cramer-Rao Bounds (why bad things happen to good quants) | SharpeR | CRAN |
| 33 | Jerzy Pawlowski: Are High Frequency Traders Prudent and Temperate? | HighFreq | GitHub |
| 34 | Bernhard Pfaff: The sequel of cccp: Solving cone constrained convex programs | cccp | CRAN |
| 35 | Stephen Rush: Information Diffusion in Equity Markets | | |
| 36 | Mark Seligman: The Arborist: a High-Performance Random Forest Implementation | Rborist | CRAN |
| 37 | Majeed Simaan: Global Minimum Variance Portfolio: a Horse Race of Volatilities | | |
| 38 | Anthoney Tsou: Implementation of Quality Minus Junk | qmj | GitHub |
| 39 | Marjan Wauters: Characteristic-based equity portfolios: economic value and dynamic style allocation | | |
| 40 | Hadley Wickham: Data ingest in R | readr | CRAN |
| 41 | Eric Zivot: Price Discovery Share-An Order Invariant Measure of Price Discovery with Application to Exchange-Traded Funds | | |

I particularly enjoyed Sanjiv Das' talks on *Efficient Rebalancing of Taxable Portfolios* and *Matrix Metrics: Network Based Systemic Risk Scoring*, both of which are approachable by non-specialists. Sanjiv became the first person to present two talks at an R/Finance conference, and thus the first person to win one of the best presentation prizes with the judges unwilling to say which of his two presentations secured the award.

Bryan Lewis' talk: *More thoughts on the SVD and Finance* was also notable for its exposition. Listening to Bryan you can almost fool yourself into believing that you could develop a love for numerical analysis and willingly spend an inordinate amount of your time contemplating the stark elegance of matrix decompositions.

Alexander McNeil's talk: *R Tools for Understanding Credit Risk Modeling* was a concise and exceptionally coherent tutorial on the subject, an unusual format for a keynote talk, but something that I think will be valued by students when the slides for all of the presentations become available.

Going out on a limb a bit, I offer a few un-researched, but strong impressions of the conference. This year, to a greater extent than I remember in previous years, talks were built around particular packages; talks 5, 7 and 8 for example. Also, it seemed that authors were more comfortable highlighting and sharing packages that are works in progress, residing not on CRAN but on GitHub, R-Forge and other platforms. This may reflect a larger trend in R culture.

This is the year that cointegration replaced correlation as the operative concept in many models. The quants are way out ahead of the statisticians and data scientists on this one. Follow the money!

Speaking of data scientists: if you are a Random Forests fan do check out Mark Seligman's Rborist package, a high-performance and extensible implementation of the Random Forests algorithm.

Network analysis also seemed to be an essential element of many presentations. Gergely Daróczi's Shiny app for his analysis of the Hungarian interbank lending network is a spectacular example of how interactive graphics can enhance an analysis.

Finally, I'll finish up with some suggested reading in preparation for studying the slides of the presentations when they become available.

Sanjiv Das: Efficient Rebalancing of Taxable Portfolios

Sanjiv Das: Matrix Metrics: Network-Based Systemic Risk Scoring

Emanuel Derman: Models.Behaving.Badly

Jurgen A. Doornik and R.J. O'Brien: Numerically Stable Cointegration Analysis (A recommendation from Bryan Lewis)

Arthur Koestler: The Sleepwalkers (I am certain this is the book whose title Derman forgot.)

Alexander J. McNeil and Rudiger Frey: Quantitative Risk Management Concepts, Techniques and Tools

Bernhard Pfaff: Analysis of Integrated and Cointegrated Time Series with R (Use R!)

Bruno Rodrigues teaches a class on applied econometrics at the University of Strasbourg, with a focus on implementing econometric concepts in the R language. Since many of the students don't have any previous programming background, he's put together a tutorial on the basics of applied econometrics with R. The first two chapters serve as a general-purpose beginners' introduction to R, while chapter 3 explores basic applied econometrics with R (primarily data summaries and linear models). A fourth chapter to come promises a focus on reproducible research, so check back for updates to this free document at the link below. (And if you're looking for a more advanced tutorial on econometrics with R, check out Econometrics in R by Grant Farnsworth.)

Bruno Rodrigues: Introduction to Programming Econometrics with R

by Tammer Kamel

Quandl's Founder

About 22 months ago I had the privilege of introducing Quandl to the world on this blog. At that time Quandl had about 2 million datasets and a few hundred users. (And we thought that was fabulous.) Now, at the end of 2014, we have some 12 million datasets on the site and tens of thousands of registered users. On most days we serve about 1 million API requests.

One thing that has not changed, however, is the simplicity with which R users can access Quandl. Joseph’s post last year and Ilya’s this year both demonstrated the ease of connecting to Quandl via R.

Adoption of Quandl in the R community was perhaps the biggest factor in our early success. Thus it is fitting that I am back guest-blogging here at this moment in time because we are actually at the dawn of a new chapter at Quandl: We’re adding commercial data to the site. We are going to make hundreds of commercial databases from domain experts like Zacks, ORATS, OptionWorks, Corre Group, MP Maritime, DelphX, Benzinga and many others available via the same simple API.

What makes this new foray interesting is that we won't be playing by the rules that the incumbent oligarchy of data distributors have established. Their decades-old model has not served consumers well: it keeps data prices artificially high, it cripples innovation, and it is antithetical to modern patterns of data consumption and usage. In fact, the business models around commercial data predate the internet itself. They can and should be disrupted. So we're going to give that a go.

Our plan is nothing less than democratizing supply and demand of commercial data. Anyone will be able to buy data on Quandl. There will be no compulsory bundling, forcing you to pay for extra services you don’t need; no lock-in to expensive long-term contracts; no opaque pricing; no usage monitoring or consumption limits; no artificial scarcity or degradation. Users will be able to buy just the datasets they need, a la carte, as and when they need them. They will get their data delivered precisely the way they want, with generous free previews, minimal usage restrictions and all the advantages of the Quandl platform. And of course, the data itself will be of the highest quality; professional grade data manufactured by the best curators in the world.

We will also democratize the supply of data. Anyone, from existing data vendors and primary data producers to individuals and entrepreneurs, will have equal access to the Quandl platform and the unmet demand of the Quandl user base. We want to create a situation where anyone capable of curating and maintaining a database can monetize their work. In time, we hope that competition among vendors will force prices to their economic minimum. This is the best possible way to deliver the lowest possible prices to our users.

At the same time this democratization should empower capable curators to realize the full value of their skills: If someone can build and maintain a database that commands $25 a month from 1000 people, then Quandl can be the vehicle that transforms that person from skilled analyst to successful data vendor.

If you were to characterize what we are doing as a marketplace for data you would be absolutely correct. We are convinced that fair and open competition will do great things, both for data consumers who are, frankly, being gouged, and for existing and aspirational data vendors who are disempowered. Open and fair competition is a panacea for both ills: it effects lower prices, wider distribution, better data quality, better documentation and better customer service.

Our foray into commercial data has already started with 6 pilot vendors. They range from entrepreneurially-minded analysts who are building databases to rival what the incumbents currently sell for exorbitant fees, to long-established data vendors progressive enough to embrace Quandl’s modern paradigm. We have no less than 25 vendors coming online in Q1 2015.

So, Quandl in 2015 should very quickly become everything an analyst needs: A free and unlimited API, dozens of package connections including to R, 12 million (and growing) free and open datasets, and access to commercial data from the best companies in the world at ever decreasing prices. Wish us luck!

by Don Boyd, Senior Fellow, Rockefeller Institute of Government

The Rockefeller Institute of Government is excited to be developing models to simulate the finances of public pension funds, using R.

Public pension funds invest contributions from governments and public sector workers in an effort to ensure that they can pay all promised benefits when due. State and local government pension funds in the United States currently have more than $3 trillion invested, more than $2 trillion of which is in equity-like investments. For example, NYC has over $158 billion invested. Governments usually act as a backstop: if pension fund investment returns do better than expected, governments will be able to contribute less, but if investment returns fall short they will have to contribute more. When that happens, politicians must raise taxes or cut spending programs. These risks often are not well understood or widely discussed. (For a discussion of many of the most significant issues, see *Strengthening the Security of Public Sector Defined Benefit Plans*.)

We are building stochastic simulation models in R to help quantify the investment risks and their potential consequences. We are modeling the finances of specific pension plans, taking into account all of the main flows such as current and expected benefit payouts to workers, contributions from governments and from workers, and investment returns, and how they affect liabilities and investible assets. The models will take into account the changing demographics of the workforce and retiree populations. We are modeling investment returns stochastically, examining different return scenarios and different economic environments, as well as different governmental contribution policies. We will use these models to evaluate the risks currently being taken and to help provide policy advice to governments, pension funds, and others. (For a full description of our approach, see *Modeling and Disclosing Public Pension Fund Risk, and Consequences for Pension Funding Security*)
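To give a flavor of what a stochastic simulation of this kind involves, here is a deliberately simplified sketch in base R. It is not the Institute's model: every number and name below (starting assets, return assumptions, flat contributions and benefits, the function simulate_funded_ratio) is a hypothetical placeholder, and demographics are ignored entirely.

```r
set.seed(42)

# All inputs are hypothetical, chosen only to make the sketch run.
n_sims        <- 1000   # number of simulated return paths
n_years       <- 30     # projection horizon in years
assets0       <- 80     # starting assets ($ billions)
liabilities0  <- 100    # starting liabilities ($ billions)
mean_return   <- 0.07   # assumed expected annual investment return
sd_return     <- 0.12   # assumed annual return volatility
discount_rate <- 0.07   # rate at which existing liabilities grow
contributions <- 6      # annual contributions (governments plus workers)
benefits      <- 7      # annual benefit payouts
new_accruals  <- 3      # new benefits earned by workers each year

# Project one path of assets and liabilities and return the final funded ratio.
simulate_funded_ratio <- function() {
  assets <- assets0
  liabilities <- liabilities0
  for (year in seq_len(n_years)) {
    r <- rnorm(1, mean = mean_return, sd = sd_return)  # random annual return
    # Assets earn the return, then receive contributions and pay benefits;
    # they are floored at zero once the fund is exhausted.
    assets <- max(assets * (1 + r) + contributions - benefits, 0)
    # Liabilities accrue interest and new benefits, less benefits paid.
    liabilities <- liabilities * (1 + discount_rate) + new_accruals - benefits
  }
  assets / liabilities
}

final_ratio <- replicate(n_sims, simulate_funded_ratio())

# Spread of outcomes, and the share of paths ending below 50% funded
print(quantile(final_ratio, c(0.05, 0.50, 0.95)))
print(mean(final_ratio < 0.5))
```

Even in a toy version like this, the value of the approach is visible: the output is a distribution of funded ratios rather than a single projection, so contribution policies can be compared by how they shift the bad tail.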

We have chosen R because:

- It is extremely flexible, allowing us to do data collection, data management, exploratory data analysis, and other essential non-modeling tasks.
- Manipulating matrices is easy.
- It has sophisticated tools for modeling investment returns and for analyzing and presenting results of simulations. And it has great tools for visualizing results.
- The work can be completely open and reproducible, which is essential to the success of this project.

All programming languages have weaknesses. R’s great flexibility means that it is easy to write ill-organized programs that are hard to understand and debug. And poorly written programs that do not take advantage of R’s strengths can be extremely slow. We believe we can compensate for these weaknesses by making our programs modular, using a consistent programming style with appropriate documentation, and by using R features smartly and speed-testing where appropriate.

R analysts and programmers interested in learning about the opportunity to work on this project should examine the programmer/analyst position description and related materials at the Rockefeller Institute’s web site.

*The latest in a series by Daniel Hanson*

**Introduction**

Correlations between holdings in a portfolio are of course a key component in financial risk management. Borrowing a tool common in fields such as bioinformatics and genetics, we will look at how to use heat maps in R for visualizing correlations among financial returns, and examine behavior in both a stable and down market.

While base R contains its own heatmap(.) function, the reader will likely find the heatmap.2(.) function in the R package gplots to be a bit more user friendly. A very nicely written companion article entitled A short tutorial for decent heat maps in R (Sebastian Raschka, 2013), which covers more details and features, is available on the web; we will also refer to it in the discussion below.

We will present the topic in the form of an example.

**Sample Data**

As in previous articles, we will make use of R packages Quandl and xts to acquire and manage our market data. Here, in a simple example, we will use returns from the following global equity indices over the period 1998-01-05 to the present, and then examine correlations between them:

S&P 500 (US)

RUSSELL 2000 (US Small Cap)

NIKKEI (Japan)

HANG SENG (Hong Kong)

DAX (Germany)

CAC (France)

KOSPI (Korea)

First, we gather the index values and convert to returns:

    library(xts)
    library(Quandl)

    my_start_date <- "1998-01-05"
    SP500.Q <- Quandl("YAHOO/INDEX_GSPC", start_date = my_start_date, type = "xts")
    RUSS2000.Q <- Quandl("YAHOO/INDEX_RUT", start_date = my_start_date, type = "xts")
    NIKKEI.Q <- Quandl("NIKKEI/INDEX", start_date = my_start_date, type = "xts")
    HANG_SENG.Q <- Quandl("YAHOO/INDEX_HSI", start_date = my_start_date, type = "xts")
    DAX.Q <- Quandl("YAHOO/INDEX_GDAXI", start_date = my_start_date, type = "xts")
    CAC.Q <- Quandl("YAHOO/INDEX_FCHI", start_date = my_start_date, type = "xts")
    KOSPI.Q <- Quandl("YAHOO/INDEX_KS11", start_date = my_start_date, type = "xts")

    # Depending on the index, the final price for each day is either
    # "Adjusted Close" or "Close Price". Extract this single column for each:
    SP500 <- SP500.Q[,"Adjusted Close"]
    RUSS2000 <- RUSS2000.Q[,"Adjusted Close"]
    DAX <- DAX.Q[,"Adjusted Close"]
    CAC <- CAC.Q[,"Adjusted Close"]
    KOSPI <- KOSPI.Q[,"Adjusted Close"]
    NIKKEI <- NIKKEI.Q[,"Close Price"]
    HANG_SENG <- HANG_SENG.Q[,"Adjusted Close"]

    # The xts merge(.) function will only accept two series at a time.
    # We can, however, merge multiple columns by downcasting to *zoo* objects.
    # Remark: "all = FALSE" uses an inner join to merge the data.
    z <- merge(as.zoo(SP500), as.zoo(RUSS2000), as.zoo(DAX), as.zoo(CAC),
               as.zoo(KOSPI), as.zoo(NIKKEI), as.zoo(HANG_SENG), all = FALSE)

    # Set the column names; these will be used in the heat maps:
    myColnames <- c("SP500","RUSS2000","DAX","CAC","KOSPI","NIKKEI","HANG_SENG")
    colnames(z) <- myColnames

    # Cast back to an xts object:
    mktPrices <- as.xts(z)

    # Next, calculate log returns:
    mktRtns <- diff(log(mktPrices), lag = 1)
    head(mktRtns)
    mktRtns <- mktRtns[-1, ] # Remove resulting NA in the 1st row
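For readers new to the diff(log(.)) idiom used in the last step, here is a tiny worked example in base R with made-up prices. On a plain numeric vector diff() simply returns one fewer element, whereas on an xts object it pads the first row with NA by default, which is why that row is dropped in the code above.

```r
# Made-up closing prices for four consecutive days
prices <- c(100, 102, 99, 105)

# Log return for day t is log(P_t / P_{t-1}) = log(P_t) - log(P_{t-1})
log_returns <- diff(log(prices), lag = 1)

# Simple (arithmetic) returns, for comparison
simple_returns <- diff(prices) / head(prices, -1)

# Log returns add up over time: their sum is the log return over the whole period
print(all.equal(sum(log_returns), log(prices[4] / prices[1])))

# The two kinds of return are linked by log(1 + simple return)
print(all.equal(log_returns, log(1 + simple_returns)))
```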

**Generate Heat Maps**

As noted above, heatmap.2(.) is the function in the gplots package that we will use. For convenience, we’ll wrap this function inside our own generate_heat_map(.) function, as we will call this parameterization several times to compare market conditions.

As for the parameterization, the comments should be self-explanatory, but we’re keeping things simple by eliminating the dendrogram, and leaving out the trace lines inside the heat map and the density plot inside the color legend. Note also the setting Rowv = FALSE; this ensures the ordering of the rows and columns remains consistent from plot to plot. We’re also just using the default color settings; for customized colors, see the Raschka tutorial linked above.

    require(gplots)

    generate_heat_map <- function(correlationMatrix, title)
    {
      heatmap.2(x = correlationMatrix,        # the correlation matrix input
                cellnote = correlationMatrix, # places correlation value in each cell
                main = title,                 # heat map title
                symm = TRUE,                  # configure diagram as standard correlation matrix
                dendrogram="none",            # do not draw a row dendrogram
                Rowv = FALSE,                 # keep ordering consistent
                trace="none",                 # turns off trace lines inside the heat map
                density.info="none",          # turns off density plot inside color legend
                notecol="black")              # set font color of cell labels to black
    }

Next, let’s calculate three correlation matrices using the data we have obtained:

- Correlations based on the entire data set from 1998-01-05 to the present
- Correlations of market indices during a reasonably calm period -- January through December 2004
- Correlations of falling market indices in the midst of the financial crisis - October 2008 through May 2009
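As a rough sketch of how three such matrices could be computed, the base R example below uses simulated random returns in place of the real index data, so the resulting correlations are meaningless in themselves. The helper corr_between and the object names are hypothetical. With the xts object mktRtns built earlier, the same windows could be selected with xts date-range strings such as mktRtns["2004-01-01/2004-12-31"] before calling cor().

```r
set.seed(1)

# Simulated weekday returns for three indices; stand-ins for the real data
dates <- seq(as.Date("1998-01-05"), as.Date("2009-12-31"), by = "day")
dates <- dates[!(as.POSIXlt(dates)$wday %in% c(0, 6))]  # drop Sundays and Saturdays
returns <- matrix(rnorm(length(dates) * 3, sd = 0.01), ncol = 3,
                  dimnames = list(NULL, c("SP500", "DAX", "NIKKEI")))

# Hypothetical helper: correlation matrix of the returns between two dates,
# rounded to two decimals so the values stay readable as cell labels
corr_between <- function(returns, dates, from, to) {
  in_window <- dates >= as.Date(from) & dates <= as.Date(to)
  round(cor(returns[in_window, , drop = FALSE]), 2)
}

corr_all    <- corr_between(returns, dates, min(dates), max(dates))       # entire data set
corr_calm   <- corr_between(returns, dates, "2004-01-01", "2004-12-31")   # calm period
corr_crisis <- corr_between(returns, dates, "2008-10-01", "2009-05-31")   # financial crisis

print(corr_crisis)
```

Rounding before plotting is a design choice rather than a requirement: heatmap.2 prints whatever is passed as cellnote, so unrounded correlations would crowd each cell with digits.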

Now, let’s call our heat map function using the total market data set:

generate_heat_map(corr1, "Correlations of World Market Returns, Jan 1998 - Present")

And then, examine the result:

As expected, we trivially have correlations of 100% down the main diagonal. Note that, as shown in the color key, the darker the color, the lower the correlation. By design, using the parameters of the heatmap.2(.) function, we set the title with the main = title parameter setting, and the correlations shown in black by using the notecol="black" setting.

Next, let’s look at a period of relative calm in the markets, namely the year 2004:

generate_heat_map(corr2, "Correlations of World Market Returns, Jan - Dec 2004")

This gives us:


Note that in this case, a glance at the darker colors in the cells shows that the correlations are even lower than those from our entire data set. This may of course be verified by comparing the numerical values.

Finally, let’s look at the opposite extreme, during the upheaval of the financial crisis in 2008-2009:

generate_heat_map(corr3, "Correlations of World Market Returns, Oct 2008 - May 2009")

This yields the following heat map:

Note that in this case, again just at first glance, we can tell the correlations have increased compared to 2004, by the colors changing from dark to light nearly across the board. While there are some correlations that do not increase all that much, such as the SP500/Nikkei and the Russell 2000/Kospi values, there are others across international and capitalization categories that jump quite significantly, such as the SP500/Hang Seng correlation going from about 21% to 41%, and that of the Russell 2000/DAX moving from 43% to over 57%. So, in other words, portfolio diversification can take a hit in down markets.

**Conclusion**

In this example, we only looked at seven market indices, but for a closer look at how correlations were affected during 2008-09 -- and how heat maps among a greater number of market sectors compared -- this article, entitled *Diversification is Broken*, is a recommended and interesting read.

by Joseph Rickert

If I had to pick just one application to be the “killer app” for the digital computer I would probably choose Agent Based Modeling (ABM). Imagine creating a world populated with hundreds, or even thousands of agents, interacting with each other and with the environment according to their own simple rules. What kinds of patterns and behaviors would emerge if you just let the simulation run? Could you guess a set of rules that would mimic some part of the real world? This dream is probably much older than the digital computer, but according to Jan Thiele’s brief account of the history of ABMs that begins his recent paper, *R Marries NetLogo: Introduction to the RNetLogo Package* in the *Journal of Statistical Software,* academic work with ABMs didn’t really take off until the late 1990s.

Now, people are using ABMs for serious studies in economics, sociology, ecology, socio-psychology, anthropology, marketing and many other fields. No less a complexity scientist than Doyne Farmer (of Dynamic Systems and Prediction Company fame) has argued in *Nature* for using ABMs to model the complexity of the US economy, and has published on using ABMs to drive investment models. In the following clip of a 2006 interview, Doyne talks about building ABMs to explain the role of subprime mortgages in the Housing Crisis. (Note that when asked how one would calibrate such a model, Doyne explains the need to collect massive amounts of data on individuals.)

Fortunately, the tools for building ABMs seem to be keeping pace with the ambition of the modelers. There are now dozens of platforms for building ABMs, and it is somewhat surprising that NetLogo, a tool with some whimsical terminology (e.g. agents are called turtles) that was designed for teaching children, has apparently become a de facto standard. NetLogo is Java based, has an intuitive GUI, ships with dozens of useful sample models, is easy to program, and is available under the GPL 2 license.

As you might expect, R is a perfect complement for NetLogo. Doing serious simulation work requires a considerable amount of statistics for calibrating models, designing experiments, performing sensitivity analyses, reducing data, exploring the results of simulation runs and much more. The recent *JASSS* paper *Facilitating Parameter Estimation and Sensitivity Analysis of Agent-Based Models: a Cookbook Using NetLogo and R* by Thiele and his collaborators describes the R / NetLogo relationship in great detail and points to a decade’s worth of reading. But the real fun is that Thiele’s RNetLogo package lets you jump in and start analyzing NetLogo models in a matter of minutes.

Here is part of an extended example from Thiele's *JSS* paper that shows R interacting with the Fire model that ships with NetLogo. Using some very simple logic, Fire models the progress of a forest fire.

Snippet of NetLogo Code that drives the Fire model

to go
  if not any? turtles  ;; either fires or embers
    [ stop ]
  ask fires
    [ ask neighbors4 with [pcolor = green]
        [ ignite ]
      set breed embers ]
  fade-embers
  tick
end

;; creates the fire turtles
to ignite  ;; patch procedure
  sprout-fires 1
    [ set color red ]
  set pcolor black
  set burned-trees burned-trees + 1
end

The general idea is that turtles, representing the frontier of the fire, run through a grid of randomly placed trees. Not shown in the above snippet is the logic showing that the entire model is controlled by a single parameter: the density of the trees.

This next bit of R code shows how to launch the Fire model from R, set the density parameter, and run the model.

# Launch RNetLogo and control an initial run of the
# NetLogo Fire Model
library(RNetLogo)
nlDir <- "C:/Program Files (x86)/NetLogo 5.0.5"
setwd(nlDir)
nl.path <- getwd()
NLStart(nl.path)
model.path <- file.path("models", "Sample Models", "Earth Science", "Fire.nlogo")
NLLoadModel(file.path(nl.path, model.path))
NLCommand("set density 70")  # set density value
NLCommand("setup")           # call the setup routine
NLCommand("go")              # launch the model from R

Here we see the Fire model running in the NetLogo GUI after it was launched from RStudio.

This next bit of code tracks the progression of the fire as a function of time (model "ticks"), returns results to R and plots them. The plot shows the non-linear behavior of the system.

# Investigate percentage of forest burned as simulation proceeds and plot
library(ggplot2)
NLCommand("set density 60")
NLCommand("setup")
burned <- NLDoReportWhile("any? turtles", "go",
                          c("ticks", "(burned-trees / initial-trees) * 100"),
                          as.data.frame = TRUE,
                          df.col.names = c("tick", "percent.burned"))
# Plot with ggplot2
p <- ggplot(burned, aes(x = tick, y = percent.burned))
p + geom_line() + ggtitle("Non-linear forest fire progression with density = 60")

As with many dynamical systems, the Fire model displays a phase transition. Setting the density lower than 55 will not result in the complete destruction of the forest, while setting density above 75 will very likely result in complete destruction. The following plot shows this behavior.

RNetLogo makes it very easy to programmatically run multiple simulations and capture the results for analysis in R. The following two lines of code run the Fire model twenty times for each value of density between 55 and 65, the region surrounding the phase transition.

d <- seq(55, 65, 1)    # vector of densities to examine
res <- rep.sim(d, 20)  # Run the simulation
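The rep.sim helper itself is not listed in this post. As a rough sketch of what such a helper might look like (this is an assumption, not necessarily the code from Thiele's paper), one function runs a single simulation to completion and another repeats it for each density. The names sim.fire and rep.sim.sketch are hypothetical, and the sketch assumes NetLogo has already been started and the Fire model loaded as shown earlier.

```r
# Run one Fire simulation at the given density and report the percentage
# of trees burned. Assumes NLStart() and NLLoadModel() were called before.
sim.fire <- function(density) {
  RNetLogo::NLCommand(paste("set density", density))
  RNetLogo::NLCommand("setup")
  RNetLogo::NLDoCommandWhile("any? turtles", "go")
  RNetLogo::NLReport("(burned-trees / initial-trees) * 100")
}

# Repeat a simulation `reps` times for each density. Returns a list with
# one numeric vector of results per density value.
rep.sim.sketch <- function(densities, reps, sim.fun = sim.fire) {
  lapply(densities, function(d) replicate(reps, sim.fun(d)))
}
```

Passing the simulation function as an argument keeps the repetition logic independent of NetLogo, so it can be tried out with any stand-in function before a model is attached.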

The plot below shows the variability of the percent of trees burned as a function of density in the transition region.

My code to generate plots is available in the file: Download NelLogo_blog while all of the code from Thiele's JSS paper is available from the journal website.

Finally, here are a few more interesting links related to ABMs.

- On validating ABMs
- ABMs and

by Daniel Hanson

**Recap and Introduction**

Last time in part 1 of this topic, we used the xts and lubridate packages to interpolate a zero rate for every date over the span of 30 years of market yield curve data. In this article, we will look at how we can implement the two essential functions of a term structure: the forward interest rate, and the forward discount factor.

**Definitions and Notation**

We will apply a mix of notation adopted in the lecture notes Interest Rate Models: Introduction, pp 3-4, from the New York University Courant Institute (2005), along with chapter 1 of the book Interest Rate Models — Theory and Practice (2nd edition, Brigo and Mercurio, 2006). A presentation by Damiano Brigo from 2007, which covers some of the essential background found in the book, is available here, from the Columbia University website.

First, t ≧ 0 and T ≧ 0 represent time values in years.

P(t, T) represents the forward discount factor at time t ≦ T, where T ≦ 30 years (in our case), as seen at time = 0 (ie, our anchor date). In other words, again in US Dollar parlance, this means the value at time t of one dollar to be received at time T, based on continuously compounded interest. Note then that, trivially, we must have P(T, T) = 1.

R(t, T) represents the continuously compounded forward interest rate, as seen at time = 0, paid over the period [t, T]. This is also sometimes written as F(0; t, T) to indicate that this is the forward rate as seen at the anchor date (time = 0), but to keep the notation lighter, we will use R(t, T) as is done in the NYU notes.

We then have the following relationships between P(t, T) and R(t, T), based on the properties of continuously compounded interest:

P(t, T) = exp(-R(t, T)・(T - t)) (A)

R(t, T) = -log(P(t, T)) / (T - t) (B)

Finally, the interpolated market yield curve we constructed last time allows us to find the value of R(0, T) for any T ≦ 30. Then, since by properties of the exponential function we have

P(t, T) = P(0, T) / P(0, t) (C)

we can determine any discount factor P(t, T) for 0 ≦ t ≦ T ≦ 30, and therefore any R(t, T), as seen at time = 0.
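To make relationships (A), (B), and (C) concrete, here is a small worked example in base R. It is only an illustration: the variable names are ours, time is measured directly in years, and the two zero rates are the 5-year and 7-year market rates from the data set used later in this article.

```r
t1 <- 5        # t, in years
t2 <- 7        # T, in years
r1 <- 0.0310   # R(0, t1), continuously compounded zero rate
r2 <- 0.0335   # R(0, t2)

# (A): discount factors as seen at time 0
P_0_t1 <- exp(-r1 * t1)
P_0_t2 <- exp(-r2 * t2)

# (C): forward discount factor P(t1, t2)
P_t1_t2 <- P_0_t2 / P_0_t1

# (B): forward rate R(t1, t2) implied by the forward discount factor
R_t1_t2 <- -log(P_t1_t2) / (t2 - t1)
R_t1_t2   # 0.03975
```

The result agrees with the closed form (r2 * t2 - r1 * t1) / (t2 - t1) = (0.2345 - 0.155) / 2, which is what (A), (B), and (C) reduce to when combined.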

**Converting from Dates to Year Fractions**

By now, one might be wondering -- when we constructed our interpolated market yield curve, we used actual dates, but here, we’re talking about time in units of years -- what’s up with that? The answer is that we need to convert from dates to year fractions. While this may seem like a rather trivial proposition -- for example, why not just divide the number of days between the start date and maturity date by 365.25 -- it turns out that, with financial instruments such as bonds, options, and futures, in practice we need to be much more careful. Each of these comes with a specified day count convention, and if not followed properly, it can result in the loss of millions for a trading desk.

For example, consider the Actual / 365 Fixed day count convention:

Year Fraction (ie, T - t) = (Days between Date1 and Date2) / 365

This is one commonly used convention and is very simple to calculate; however, for certain bond calculations, day counting can become much more complicated, as leap years are considered, as well as local holidays in the country in which the bond is traded, plus more esoteric conditions that may be imposed. To get an idea, look up day count conventions used for government bonds in various countries.
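As a quick illustration of why the choice of convention matters, the following base R snippet computes the year fraction over a single calendar year that happens to contain a leap day. The dates are chosen purely for the example; under Actual / 365 Fixed the result exceeds one year, while the naive division by 365.25 gives a slightly different number.

```r
d1 <- as.Date("2016-01-01")
d2 <- as.Date("2017-01-01")

actualDays <- as.numeric(d2 - d1)   # 366, since 2016 is a leap year

yearFrac_Act365F <- actualDays / 365      # about 1.0027
yearFrac_naive   <- actualDays / 365.25   # about 1.0021
```

The difference looks small, but it feeds directly into the exponent of the discount factor, so it scales with the notional amount being discounted.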

In the book by Brigo and Mercurio noted above, the authors in fact replace the “T - t” expression with a function (tau) τ(t, T), which represents the difference in time based upon the day count convention in effect.

Equation (A) then becomes

P(t, T) = exp(-R(t, T)・ τ(t, T))

where τ(t, T) might be, for example, the Actual / 365 Fixed day count convention.

For the remainder of this article, we will implement the “T - t” term above as a day count function, as demonstrated in the example to follow.

**Implementation in R**

We will first revisit the example from our previous article on interpolation of market zero rates, and then use this to demonstrate the implementation of term structure functions to calculate forward discount factors and forward interest rates.

*a) The setup from part 1*

Let’s first go back to the example from part 1 and construct our interpolated 30-year market yield curve, using cubic spline interpolation. Both the xts and lubridate packages need to be loaded. The code is republished here for convenience:

require(xts)

require(lubridate)

ad <- ymd(20140514, tz = "US/Pacific")

marketDates <- c(ad, ad + days(1), ad + weeks(1), ad + months(1),

ad + months(2), ad + months(3), ad + months(6),

ad + months(9), ad + years(1), ad + years(2),

ad + years(3), ad + years(5), ad + years(7),

ad + years(10), ad + years(15), ad + years(20),

ad + years(25), ad + years(30))

# Use substring(.) to get rid of "UTC"/time zone after the dates

marketDates <- as.Date(substring(marketDates, 1, 10))

# Convert percentage formats to decimal by multiplying by 0.01:

marketRates <- c(0.0, 0.08, 0.125, 0.15, 0.20, 0.255, 0.35, 0.55, 1.65,

2.25, 2.85, 3.10, 3.35, 3.65, 3.95, 4.65, 5.15, 5.85) * 0.01

numRates <- length(marketRates)

marketData.xts <- as.xts(marketRates, order.by = marketDates)

createEmptyTermStructureXtsLub <- function(anchorDate, plusYears)

{

# anchorDate is a lubridate here:

endDate <- anchorDate + years(plusYears)

numDays <- endDate - anchorDate

# We need to convert anchorDate to a standard R date to use

# the "+ 0:numDays" operation

# Also, note that we need a total of numDays + 1

# in order to capture both end points.

xts.termStruct <- xts(rep(NA, numDays + 1),

as.Date(anchorDate) + 0:numDays)

return(xts.termStruct)

}

termStruct <- createEmptyTermStructureXtsLub(ad, 30)

for(i in (1:numRates)) termStruct[marketDates[i]] <-

marketData.xts[marketDates[i]]

termStruct.spline.interpolate <- na.spline(termStruct, method = "hyman")

colnames(termStruct.spline.interpolate) <- "ZeroRate"

*b) Check the plot*

plot(x = termStruct.spline.interpolate[, "ZeroRate"], xlab = "Time",

ylab = "Zero Rate",

main = "Interpolated Market Zero Rates 2014-05-14 -\nCubic Spline Interpolation",

ylim = c(0.0, 0.06), major.ticks= "years",

minor.ticks = FALSE, col = "darkblue")

This gives us a reasonably smooth curve, preserving the monotonicity of our data points:

*c) Implement functions for discount factors and forward rates*

We will now implement these functions, utilizing equations (A), (B), and (C) above. We will also take advantage of the functional programming feature in R, by incorporating the Actual / 365 Fixed day count as a functional argument, as an example. One could of course implement any other day count convention as a function of two lubridate dates, and pass it in as an argument.

First, let’s implement the Actual / 365 Fixed day count as a function:

# Simple example of a day count function: Actual / 365 Fixed

# date1 and date2 are assumed to be lubridate dates, so that we can

# easily carry out the subtraction of two dates.

dayCountFcn_Act365F <- function(date1, date2)

{

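# Fix the units to days so the result does not depend on difftime's automatic unit choice

yearFraction <- as.numeric(difftime(date2, date1, units = "days"))/365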

return(yearFraction)

}
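Any other convention can be written with the same two-argument signature and passed in instead. As a sketch, here is the Actual / 360 convention (actual days divided by 360) alongside a local copy of Actual / 365 Fixed for comparison. The function names and the example dates are ours, and base R dates are used so that the snippet runs without additional packages.

```r
# Actual / 360: actual number of days divided by a 360-day year
dayCountFcn_Act360 <- function(date1, date2)
{
  as.numeric(difftime(date2, date1, units = "days")) / 360
}

# Local copy of Actual / 365 Fixed, for comparison only
dayCount_Act365F_copy <- function(date1, date2)
{
  as.numeric(difftime(date2, date1, units = "days")) / 365
}

d1 <- as.Date("2014-05-14")
d2 <- as.Date("2014-08-14")   # 92 actual days later

# Apply each convention to the same pair of dates
conventions <- list(Act365F = dayCount_Act365F_copy, Act360 = dayCountFcn_Act360)
yearFractions <- sapply(conventions, function(f) f(d1, d2))
yearFractions   # Act365F is about 0.2521, Act360 is about 0.2556
```

Because every convention shares the same signature, switching between them is just a matter of which function gets passed as the argument.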

Next, since the forward rate R(t, T) depends on the forward discount factor P(t, T), let’s implement the latter first:

# date1 and date2 are again assumed to be lubridate dates.

fwdDiscountFactor <- function(anchorDate, date1, date2, xtsMarketData, dayCountFunction)

{

# Convert lubridate dates to base R dates in order to use as xts indices.

xtsDate1 <- as.Date(date1)

xtsDate2 <- as.Date(date2)

if((xtsDate1 > xtsDate2) | xtsDate2 > max(index(xtsMarketData)) |

xtsDate1 < min(index(xtsMarketData)))

{

stop("Error in date order or range")

}

# 1st, get the corresponding market zero rates from our

# interpolated market rate curve:

rate1 <- as.numeric(xtsMarketData[xtsDate1]) # R(0, T1)

rate2 <- as.numeric(xtsMarketData[xtsDate2]) # R(0, T2)

# P(0, T) = exp(-R(0, T) * (T - 0)) (A), with t = 0 <=> anchorDate

discFactor1 <- exp(-rate1 * dayCountFunction(anchorDate, date1))

discFactor2 <- exp(-rate2 * dayCountFunction(anchorDate, date2))

# P(t, T) = P(0, T) / P(0, t) (C), with t <=> date1 and T <=> date2

fwdDF <- discFactor2/discFactor1

return(fwdDF)

}

Finally, we can then write a function to compute the forward interest rate:

# date1 and date2 are assumed to be lubridate dates here as well.

fwdInterestRate <- function(anchorDate, date1, date2, xtsMarketData, dayCountFunction)

{

if(date1 == date2) {

fwdRate = 0.0 # the trivial case

} else {

fwdDF <- fwdDiscountFactor(anchorDate, date1, date2,

xtsMarketData, dayCountFunction)

# R(t, T) = -log(P(t, T)) / (T - t) (B)

fwdRate <- -log(fwdDF)/dayCountFunction(date1, date2)

}

return(fwdRate)

}

*d) Calculate discount factors and forward interest rates*

As an example, suppose we want to get the five year forward three-month discount factor and interest rate:

# Five year forward 3-month discount factor and forward rate:

anchorDate <- ad # the anchor date defined in the setup above

date1 <- anchorDate + years(5)

date2 <- date1 + months(3)

fwdDiscountFactor(anchorDate, date1, date2, termStruct.spline.interpolate,

dayCountFcn_Act365F)

fwdInterestRate(anchorDate, date1, date2, termStruct.spline.interpolate,

dayCountFcn_Act365F)

# Results are:

# [1] 0.9919104

# [1] 0.03222516

We can also check the trivial case for P(T, T) and R(T, T), where we get 1.0 and 0.0 respectively, as expected:

# Trivial case:

fwdDiscountFactor(anchorDate, date1, date1, termStruct.spline.interpolate,

dayCountFcn_Act365F) # returns 1.0

fwdInterestRate(anchorDate, date1, date1, termStruct.spline.interpolate,

dayCountFcn_Act365F) # returns 0.0

Finally, we can verify that we can recover the market rates at various points along the curve; here, we look at 1Y and 30Y, and can check that we get 0.0165 and 0.0585, respectively:

# Check that we recover market data points:

oneYear <- anchorDate + years(1)

thirtyYears <- anchorDate + years(30)

fwdInterestRate(anchorDate, anchorDate, oneYear,

termStruct.spline.interpolate,

dayCountFcn_Act365F) # returns 1.65%

fwdInterestRate(anchorDate, anchorDate, thirtyYears,

termStruct.spline.interpolate,

dayCountFcn_Act365F) # returns 5.85%

**Concluding Remarks**

We have shown how one can implement a term structure of interest rates utilizing tools available in the R packages lubridate and xts. We have, however, limited the example to interpolation within the 30 year range of given market data without discussing extrapolation in cases where forward rates are needed beyond the endpoint. This case does arise in risk management for longer term financial instruments such as variable annuity and life insurance products, for example. One simple-minded -- but sometimes used -- method is to fix the zero rate that is given at the endpoint for all dates beyond that point. A more sophisticated approach is to use the financial cubic spline method as described in the paper by Adams (2001), cited in part 1 of the current discussion. However, xts unfortunately does not provide this interpolation method for us out of the box. Writing our own implementation might make for an interesting topic for discussion down the road -- something to keep in mind. For now, however, we have a working term structure implementation in R that we can use to demonstrate derivatives pricing and risk management models in upcoming articles.
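For reference, the simple flat extrapolation mentioned above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the tenors are approximated as plain year fractions rather than derived from actual dates and a day count function, the curve is built with stats::splinefun using the "hyman" method instead of the xts object from the article, and the names zeroCurve and zeroRateFlatExtrap are hypothetical.

```r
# Market tenors approximated in years (an assumption for this sketch)
tenors <- c(0, 1/365, 7/365, 1/12, 2/12, 3/12, 6/12, 9/12,
            1, 2, 3, 5, 7, 10, 15, 20, 25, 30)

# The market zero rates used in the article, converted to decimals
rates <- c(0.0, 0.08, 0.125, 0.15, 0.20, 0.255, 0.35, 0.55, 1.65,
           2.25, 2.85, 3.10, 3.35, 3.65, 3.95, 4.65, 5.15, 5.85) * 0.01

# Monotone cubic spline through the market points
zeroCurve <- splinefun(tenors, rates, method = "hyman")

# Flat extrapolation: beyond the last tenor, reuse the endpoint zero rate
zeroRateFlatExtrap <- function(t, maxTenor = max(tenors))
{
  zeroCurve(pmin(t, maxTenor))
}

zeroRateFlatExtrap(c(30, 40))   # both equal the 30Y market rate, 0.0585
```

Clamping the time argument keeps the interpolation inside the 30-year range untouched while holding the zero rate constant beyond it.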