June 17, 2011

Listed below are links to weblogs that reference Big-Data PCA: 50 years of stock data:

Comments

It would be nice to see the time series of the two components; we might be able to relate them to some observables.

I might have said this here before, and I will say it again: when doing "Big Data" kind of stuff, do show timing comparisons, otherwise it's meaningless. PCA in general is not that interesting. Also, 9 million rows by a handful of columns don't make big data. Pick some gene data (a billion rows by thousands of columns) and then see how good this is compared to some other tools (e.g. SAS, SPSS, R). Then this PCA would be interesting.

I suspect you want to be looking at log prices as otherwise your errors are going to be dominated by recent prices.
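
To illustrate the point about log prices, here is a minimal sketch on made-up data (the series, growth rate, and wiggle size are all assumptions, not the stock data from the post). It builds a price series with a constant relative wiggle around an exponential trend and asks what fraction of the total squared error comes from the most recent half of the sample, first on raw prices and then on log prices.

```python
import math

# Hypothetical series: 5% growth per step with an alternating +/-2% wiggle.
# These numbers are invented for illustration only.
steps = 100
trend = [100.0 * (1.05 ** t) for t in range(steps)]
prices = [m * (1.02 if t % 2 == 0 else 0.98) for t, m in enumerate(trend)]

def late_share(errors):
    """Fraction of the total squared error contributed by the second half."""
    squared = [e * e for e in errors]
    half = len(squared) // 2
    return sum(squared[half:]) / sum(squared)

# Errors measured on the raw price scale grow with the price level.
raw_errors = [p - m for p, m in zip(prices, trend)]

# Errors measured on the log scale stay the same size over time.
log_errors = [math.log(p) - math.log(m) for p, m in zip(prices, trend)]

raw_share = late_share(raw_errors)
log_share = late_share(log_errors)
print("late share, raw prices:", raw_share)
print("late share, log prices:", log_share)
```

On this toy series almost all of the squared error on the raw scale sits in the recent half, while on the log scale the two halves contribute equally. Real stock data will be less clean, but the scaling effect is the same.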

I would even say you have to look at returns, not prices. The nonstationarity of stock prices (or log prices) will make the correlation coefficient meaningless. After you obtain the principal component of the returns, you can obtain the principal component of the stock prices by transforming returns back to prices.
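
One way to read the suggestion above is sketched below: compute log returns, take the first principal component of their covariance matrix, then cumulate the component's returns back into a price-like index. Everything here is an assumption for illustration: the three price series are synthetic, the helper names are hypothetical, and power iteration stands in for whatever PCA routine one would actually use.

```python
import math

def log_returns(prices):
    """Log return between each pair of consecutive prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]

def covariance_matrix(columns):
    """Sample covariance matrix of equally long columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[x - m for x in c] for c, m in zip(columns, means)]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (n - 1)
             for j in range(k)] for i in range(k)]

def first_component(cov, iterations=200):
    """Leading eigenvector by power iteration (unit length).

    Starts from an all-ones direction, which assumes the leading
    eigenvector is not orthogonal to it.
    """
    k = len(cov)
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Synthetic prices: one common factor plus a small stock-specific term.
factor = [0.02 * math.sin(1.3 * t) for t in range(60)]
betas = [0.8, 1.0, 1.2]
prices = []
for i, beta in enumerate(betas):
    series = [50.0 * (i + 1)]
    for t, f in enumerate(factor):
        specific = 0.003 * math.cos(2.1 * t + i)
        series.append(series[-1] * math.exp(beta * f + specific))
    prices.append(series)

returns = [log_returns(p) for p in prices]
loadings = first_component(covariance_matrix(returns))

# Component return at each date: loading-weighted sum of the stock returns.
scores = [sum(l * r[t] for l, r in zip(loadings, returns))
          for t in range(len(factor))]

# Transform back to a price-like index by cumulating the log returns.
level = [1.0]
for s in scores:
    level.append(level[-1] * math.exp(s))

print("loadings:", loadings)
print("final index level:", level[-1])
```

The `scores` list is also the time series of the component, which is what one would plot to relate it to observables. The scores use uncentered returns so that the cumulated index keeps any drift present in the data.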

The comments to this entry are closed.

