
August 13, 2014



SAS "beautiful graphics"?! I think they meant Stata.

I would not trust this. For example, Stata does support nonlinear regression, at least in some form, but the corresponding field does not have an entry (not even "limited").

For other methods, it does not take into account user-written programs for the commercial packages. Does that make any sense? Almost all methods in R are user-written, so why count them for R but not for the others?

I miss weighting. We use it quite often, and despite the large number of statistical methods in R, weighting can be used only rarely, whereas in SPSS it is available everywhere. From this point of view, SPSS contains much more statistics than R.

to Martin: Why do you think weighting can be used "rarely" in R? Which kind of weighting do you think is not available in R? It is almost certainly there and easy to do, but it may be a little hard to find out how to do it. I don't think there is anything in SPSS that you can't do in R.
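To make the weighting question concrete, here is a minimal sketch of what case weights do in the simplest setting: a weighted mean and a weighted straight-line fit. It is written in plain Python purely for illustration, and the function names `weighted_mean` and `weighted_linear_fit` are made up for this example. In R itself the same idea is available through `weighted.mean()` and the `weights` argument of `lm()` and `glm()`.

```python
def weighted_mean(values, weights):
    """Mean of `values`, each counted in proportion to its weight."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


def weighted_linear_fit(x, y, w):
    """Weighted least squares for y = a + b * x (hypothetical helper).

    Minimises sum(w_i * (y_i - a - b * x_i) ** 2) and returns (a, b).
    """
    xbar = weighted_mean(x, w)
    ybar = weighted_mean(y, w)
    sxx = sum(wi * (xi - xbar) ** 2 for xi, wi in zip(x, w))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for xi, yi, wi in zip(x, y, w))
    if sxx == 0:
        raise ValueError("x has no weighted variation")
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b


# A row with weight 2 gives the same coefficients as listing that row twice.
weighted = weighted_linear_fit([0, 1, 2], [0, 1, 4], [1, 1, 2])
replicated = weighted_linear_fit([0, 1, 2, 2], [0, 1, 4, 4], [1, 1, 1, 1])
print(weighted, replicated)
```

Note that this only covers point estimates. Whether the weights are frequency counts or sampling weights changes the standard errors, which may be part of why weighting feels less uniform in R than a single global weight setting.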

Dear Thomas Speidel, the Stata implementation of non-linear regression is quite inflexible. If you had a project that required implementing a non-linear regression for a complex functional form, you would use R, Matlab or a similar programming language. Following the general vibe of the responses, I changed the “Non-linear Regression / Stata” field to “Limited” to avoid potential misinterpretations of the table. However, the truth is: the Stata implementation of non-linear regression is unsatisfactory for most industry-level research.
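For readers unsure what "implementing a non-linear regression" involves when it is coded by hand, the following is a rough sketch for one simple functional form, y = a * exp(b * x). It uses a damped Gauss-Newton iteration started from a log-linear fit. This is an illustrative assumption about how such a fit might be written, in plain Python with made-up function names, and not how R's `nls()` or Stata's `nl` command work internally.

```python
import math


def sse(x, y, a, b):
    """Sum of squared residuals for the model y = a * exp(b * x)."""
    return sum((yi - a * math.exp(b * xi)) ** 2 for xi, yi in zip(x, y))


def log_linear_start(x, y):
    """Starting values from a linear regression of log(y) on x (needs y > 0)."""
    n = len(x)
    logs = [math.log(yi) for yi in y]
    xbar = sum(x) / n
    lbar = sum(logs) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (li - lbar) for xi, li in zip(x, logs)) / sxx
    return math.exp(lbar - b * xbar), b


def fit_exponential(x, y, max_iter=100, tol=1e-10):
    """Least-squares fit of y = a * exp(b * x); returns (a, b)."""
    a, b = log_linear_start(x, y)
    for _ in range(max_iter):
        # Accumulate the 2x2 normal equations (J'J) step = J'r.
        s_aa = s_ab = s_bb = g_a = g_b = 0.0
        for xi, yi in zip(x, y):
            e = math.exp(b * xi)
            r = yi - a * e          # residual
            da, db = e, a * xi * e  # partial derivatives of the model
            s_aa += da * da
            s_ab += da * db
            s_bb += db * db
            g_a += da * r
            g_b += db * r
        det = s_aa * s_bb - s_ab * s_ab
        if det == 0.0:
            break
        step_a = (s_bb * g_a - s_ab * g_b) / det
        step_b = (s_aa * g_b - s_ab * g_a) / det
        # Damping: halve the step while it would increase the residual sum.
        current = sse(x, y, a, b)
        scale = 1.0
        while scale > 1e-8 and sse(x, y, a + scale * step_a, b + scale * step_b) > current:
            scale /= 2.0
        a += scale * step_a
        b += scale * step_b
        if abs(scale * step_a) + abs(scale * step_b) < tol:
            break
    return a, b


# Usage on noise-free data generated from a = 2, b = 0.5.
xs = [0, 1, 2, 3, 4]
ys = [2.0 * math.exp(0.5 * xi) for xi in xs]
print(fit_exponential(xs, ys))
```

The derivatives and the starting values are specific to this one model. Swapping in a more complex functional form means rewriting both, and that flexibility is what a general-purpose language makes easy.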

Also, as I mention right above the table on the original website, the “table compares the standard procedures of the five packages in detail. By "standard" I mean built-in or readily available from the official or widely known and reliable public web-sites.” Most R libraries fall into that category, but for most Stata libraries you would need to browse through numerous (semi-)personal pages of obscure people with little documented programming experience. If you were to apply their libraries to any serious cause, you would have to go through their code line by line, making sure it does the right thing. There are some notable exceptions, and I do include those in my table.

On a different note, I have updated the table recently to include recent advances, like attempts of the SAS Institute to address Boosting and Random Forests…. Any further feedback is appreciated.

Sergey - Just to clarify, you were responding to user Baumark, not me (Thomas Speidel).

Regarding the non-linear regression comment, non-linear is a broad generalization. One can add most splines in Stata simply by using official Stata commands. Overall, I do agree that R is more suitable for this.

Also, notice that the appropriate terminology is not libraries, but rather user-written commands (or programs) for Stata and packages for R. For Stata, the SSC archive hosted at Boston College is the main repository.

Comparing statistical software packages is very tricky. For instance, where would R stand without packages? Also, if one wants the latest, chances are it will be in R. Stata's philosophy is not to keep up with the newest bleeding-edge methods. I think the comparison could be improved by pointing out the general areas of strength of each software package.

Finally, regarding graphs, R obviously has an edge, especially with ggplot2, rCharts/d3, gvis etc. Stata has some solid graphing capabilities and follows the spirit of best visual evidence display as suggested by Edward Tufte.


Thomas, sorry for confusing you with another user...

Thank you for your comments. I guess some aspects of the software can be assessed only subjectively, like the visualization capabilities. I, personally, would still view the linked Stata graphs as not so cute compared to their counterparts in SAS or Matlab. But that is a matter of taste.

Regarding the terminology for libraries, I am open to other terms. We can call them programs, add-ons, packages, etc. The point is that there is a central depository for such libraries in R, one which is regularly checked, tested, commented on and updated. There is no such central and reliable depository in Stata in the following sense: oftentimes when we need a feature not implemented in the base version, we find ourselves on a semi-professional page of a user whose skills and work have to be verified. Even if there is a link to the library from the Boston depository, the Boston depository community is not very active in checking the code and pushing the authors to correct any flaws.

I do think Stata has many strengths. This is my favorite tool for panel data analysis. Many methods developed on the econometrics side have rich and flexible implementations. The ability to handle large data sets is very helpful at times...

I am co-author of "R for Stata Users" (Springer, 2010), wrote the initial logistic and glm functions for Stata back in the early 1990s, and a host of later programs, many of which are now in the official Stata package. I also have three CRAN packages, and usually employ both Stata and R for examples in my books. "Methods of Statistical Model Estimation" (2013, Chapman & Hall/CRC) is for R programmers. I am rather surprised to see that user-authored Stata statistical programs are thought to be unreliable, but user-authored R functions are presumably error free. Not so. Software that is published in the Stata Journal (an indexed publication) is tested and retested by referees. The same goes for software that is produced for the Stata user-group conferences held around the world. I have used a number of user-authored functions in my books, finding them to be accurate and well structured, e.g. fmm (finite mixture models). I always test these procedures against other software to assure accuracy. There are some well-used functions in the default R download that are quite poor and provide users with what I regard as poor statistics. I have moved to R from Stata, but usually use the function which works best for a particular research study. Believe me, Stata (extended by user programs - typically authored by professors who have a need to use that statistical function in their own research, and they choose to share it with others) has a number of functions that are not in R, or that are superior to R in terms of associated fit statistics that are provided with the output. On the other hand, R has capabilities that Stata does not yet have -- written by R users. Your statistics are not at all fair to the capabilities of Stata.

@Joseph, just to be clear, I didn't create the table; it was created by "Stanford PhD". But it has generated some very interesting discussion -- thanks for contributing.

The comments to this entry are closed.
