
January 08, 2014



This is very cool! Could you do an example of GMM following the same principles?


This is a nice parallel method. However, with simple models like the normal distribution, you should check whether sufficient statistics exist. For the normal distribution, there is no need to loop through each data point, compute each point's log pdf, and sum the results. If you work out the math, you will find that you only need the count, the sum, and the sum of squares of the entire data set, regardless of its size, to get the likelihood. Therefore you only need one initial pass over the data to get the sufficient statistics, as opposed to a pass over the data for each optimization iteration.

However, this depends on the underlying statistical model. Complicated or convoluted likelihoods, or distributions without a full set of sufficient statistics, will need something like this.
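The sufficient-statistics point in the comment above can be illustrated with a minimal Python sketch. It is not the code from the post. The function names (`sufficient_stats`, `loglik_from_stats`, `loglik_pointwise`) and the sample data are made up for illustration. The sketch assumes i.i.d. normal data and compares the log-likelihood computed from the count, sum, and sum of squares against the point-by-point sum.

```python
import math

def sufficient_stats(data):
    """One pass over the data: count, sum, and sum of squares."""
    n, s1, s2 = 0, 0.0, 0.0
    for x in data:
        n += 1
        s1 += x
        s2 += x * x
    return n, s1, s2

def loglik_from_stats(mu, sigma, n, s1, s2):
    """Normal log-likelihood using only (n, sum x, sum x^2).

    Uses the identity sum((x - mu)^2) = s2 - 2*mu*s1 + n*mu^2,
    so no further pass over the data is needed per evaluation.
    """
    var = sigma * sigma
    sum_sq_dev = s2 - 2.0 * mu * s1 + n * mu * mu
    return -0.5 * n * math.log(2.0 * math.pi * var) - sum_sq_dev / (2.0 * var)

def loglik_pointwise(mu, sigma, data):
    """Reference version: sums the log pdf of every data point."""
    var = sigma * sigma
    return sum(
        -0.5 * math.log(2.0 * math.pi * var) - (x - mu) ** 2 / (2.0 * var)
        for x in data
    )

# Hypothetical sample data, for illustration only.
data = [1.0, 2.0, 4.0, 7.0]
n, s1, s2 = sufficient_stats(data)

# Each optimizer iteration can now call loglik_from_stats at O(1) cost.
fast = loglik_from_stats(2.5, 1.5, n, s1, s2)
slow = loglik_pointwise(2.5, 1.5, data)
```

Here `fast` and `slow` agree up to floating-point rounding. One caveat: when the mean is large relative to the spread, the sum-of-squares form can lose precision through cancellation, so centering the data first may be worth considering.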

Richard, thanks for the comment. However, your point about sufficient statistics is already addressed in the post: "This is of course silly since there is a closed form for the mean and standard deviation, but it does help clearly demonstrate optimization scaled up."

The comments to this entry are closed.
