In a poll with 570 respondents conducted last month at KDNuggets, the R software was the most frequent response to the question, "What programming languages you used for data mining / data analysis in the past 12 months?". The results are tabulated below (respondents could select more than one response):
Getting a plane boarded quickly is important: when the plane boards quickly, you, the passenger, spend less time stuck on the ground and more time getting to where you need to go. It's also important for cost-sensitive airlines to minimize turnaround time: the average cost to an airline for each minute spent at the terminal is roughly $30. Airlines employ various methods to speed up the boarding process: boarding from the rear of the plane first, boarding by pre-assigned "zones", or even doing away with assigned seating altogether (Southwest). But can they do better?
An interesting paper submitted to the Journal of Air Transport Management tries to find out. Using 72 actor "passengers" and a mock single-aisle cabin on a Hollywood soundstage, the investigators timed boardings under the standard methods used by airlines today, plus one new boarding process, the Steffen method:
The Steffen method ... orders the passengers in such a way that adjacent passengers in line are sitting in corresponding seats two rows apart from each other (e.g., 12A, 10A, 8A, 6A, etc.). This method trades a small number of aisle interferences at the front of the cabin, for the benefit of having multiple passengers stowing their luggage simultaneously.
You can see the Steffen Method in action below:
In these tests, the worst-performing method was boarding in blocks (at 6 minutes 54 seconds for these 72 passengers), followed by boarding from back to front (6:11). (The authors also claim, but did not test, that boarding from the back to the front of the cabin is nearly as bad as boarding from the front to the back.) Having passengers board in random order and then find their pre-assigned seats did better than either of these methods (4:44): sometimes no structure is better, after all. Boarding windows first, then middle seats, then aisles gave a further small improvement (4:11). But the best method overall was the Steffen method, at 3 minutes 36 seconds. The benefit seems to come mainly from making it possible for many passengers to stow their bags in parallel. I do wonder how this would work in practice, though, especially for larger planes: you'd need to assign 12 distinct boarding zones to passengers (odd windows left, even windows right, even windows left, etc.), which seems like it might discretize the boarding process too much. But given the time and money savings at stake, it's worth more investigation.
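For the curious, here is a minimal Python sketch of the kind of ordering the quote above describes. It is an illustration under my own assumptions, not the authors' code: I assume a 12-row cabin with seats A to F (which matches 72 passengers), that A and F are the window seats, and that waves run windows first, then middles, then aisles. The function name `steffen_order` and its parameters are hypothetical, and the exact sequence of waves in the paper may differ.

```python
def steffen_order(n_rows=12, seat_pairs=(("A", "F"), ("B", "E"), ("C", "D"))):
    """Return seat labels in a Steffen-style boarding order.

    Assumed layout (not taken from the paper): rows are numbered 1..n_rows
    from front to back, and each pair holds the matching seats on either
    side of the aisle, listed from the windows inward.
    """
    order = []
    for pair in seat_pairs:                 # windows, then middles, then aisles
        for start in (n_rows, n_rows - 1):  # back row's parity first, then the other rows
            for seat in pair:               # one side of the aisle, then the other
                # Walk from the back of the cabin forward, skipping every other
                # row, so neighbours in line sit two rows apart.
                order.extend(f"{row}{seat}" for row in range(start, 0, -2))
    return order


order = steffen_order()
print(order[:4])   # the first passengers in line
print(len(order))  # total number of passengers
```

With these defaults the line starts 12A, 10A, 8A, 6A, as in the example from the paper, and the three nested loops produce 3 x 2 x 2 = 12 waves, which is where the 12 boarding zones come from.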
There's an interesting discussion thread on LinkedIn going on now on the relative benefits of R versus SAS in the commercial sector. Oleg Okun kicks off the discussion with this question:
Did anyone have to justify to a prospect/customer why R is better than SAS? What arguments did you provide? Did your prospect/customer agree with them? Why do you think, despite being free and having a lot of packages, R is still not a favorite in Data Mining/Predictive Analytics in the corporate world?
What follows is an in-depth discussion (more than 130 comments so far) comparing the two statistical software systems. Steve Miller condenses the discussion in a great post at the Information Management blog. Themes covered include: the benefits and purported risks of using open-source software vs commercial software; dealing with large data sets (one R user notes: "I've used a very fast (~16Tb RAM) computer to run simulations on hundreds of billions of observations"); availability of skills for new hires ("Many of our customers have the problem of needing to spend the time and money to train new hires in SAS because their new hires have only used R"); availability of support for R (Revolution Analytics provides support for R); and many other topics. One sub-thread focused on quality in open-source software, for which Steve had an excellent riposte:
There's little argument that the vast international R community provides access to the latest statistical models and procedures before they're available in proprietary SAS. But SAS proponents counter that R users assume more risk with software quality than do those of SAS. In fact, an oft-quoted comment from a SAS executive on the "benefits" of R goes something like “I think it addresses a niche market for high-end data analysts that want free, readily available code. We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet.” My take after 8 years of heavy R usage is that I've never worked with a more stable, bug-free piece of software.
Check out the full thread and contribute to the discussion on LinkedIn.
R is already used in many companies around the world, but many people who could benefit from using R still don't know what it is or how it could help them. That's why we're reaching out to the expertise of the R community to help us showcase the many applications of R in business. We're putting up a pool of $20,000 in prizes to encourage members of the R community to create "use cases" of R applied to business problems that would help convince statisticians, data analysts, IT administrators and project leaders to switch to R. Applications which make use of large data sets are especially encouraged, and we're also offering a bonus prize for applications which make use of the unique features of Revolution R Enterprise, such as its big-data statistics package or the Web Services API.
You can read more about the contest and its rules at the R community site inside-R.org, which is also where contest entries will be published (under a Creative Commons license). You can also learn more in this press release. The contest is open now, and the deadline for entries is October 31.