You're at the supermarket. Which line should you choose for fastest service?
(The numbers are the number of items in each cart.) Dan Meyer wondered about this and, rather than merely speculating or diving into queueing theory, went out and collected data. (A very Mythbusters attitude, of which I approve.) He spent ninety minutes watching the checkout lines at his local supermarket, recording the number of items in each shopper's cart, the time it took to check them out (from loading their first item to completing the financial transaction), and the method of payment.
The conclusion? In the example above, you're likely better off in the shorter line with one loaded cart than in the "express" lane with several carts holding a few items each. The reason is that it takes about 3 seconds to scan each item, but on average about 35 seconds to process each shopper, regardless of how many items they have. In the example above, the express line has 4 shoppers and 11 items for a wait time of about 176 seconds, versus 1 shopper and 19 items for a wait time of about 96 seconds. Here's the calculation, as done in R:
> shopping <- read.csv(
+ "http://spreadsheets.google.com/pub?key=tE9pXlYLwTAeiDWxL8h_viA&single=true&gid=0&range=A1%3AE37&output=csv",
+ as.is=TRUE)
> shopping$seconds <- as.numeric(as.difftime(shopping$Total.Time))
> p1 <- coef(lm(seconds ~ Number.of.Items, shopping,subset=-8))
> p1
(Intercept) Number.of.Items
  35.309942        3.191313
> p1 %*% c(4,11) # express lane
         [,1]
[1,] 176.3442
> p1 %*% c(1,19) # short lane
        [,1]
[1,] 95.9449
Note that there's one outlier (row 8) in the data file, which I've deleted (as Dan apparently did, too). The "(Intercept)" in our regression is the average fixed processing time per shopper, and "Number.of.Items" is the time to scan each item. I used matrix math to calculate the wait times in our example (the product multiplies the intercept by the number of shoppers and the per-item time by the number of items, then adds the two), because I'm that awesome (or lazy, depending on your perspective).
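If the matrix product looks too terse, the same arithmetic can be spelled out. This is only a sketch: the coefficients are copied by hand from the output above so that it runs without downloading the spreadsheet, and `wait_time` is a name I've made up for illustration, not part of the original analysis.

```r
# Coefficients copied from the regression output above (hard-coded so the
# snippet runs without fetching the spreadsheet).
p1 <- c("(Intercept)" = 35.309942, "Number.of.Items" = 3.191313)

# wait_time() is a hypothetical helper: every shopper ahead of you costs
# the intercept, and every item ahead of you costs the per-item time.
wait_time <- function(shoppers, items, coefs = p1) {
  unname(shoppers * coefs[1] + items * coefs[2])
}

express <- wait_time(shoppers = 4, items = 11)  # express lane
short   <- wait_time(shoppers = 1, items = 19)  # short lane
print(c(express = express, short = short))
```

The two values should agree with the `p1 %*% c(4,11)` and `p1 %*% c(1,19)` results above, up to rounding of the copied coefficients.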
The situation is complicated a bit by the method of payment: cash is by far the fastest (about 18 seconds) compared to credit card or check payments (41 and 54 seconds respectively). If you're curious, here's how I got those numbers in R:
> fit <- lm(seconds ~ Number.of.Items + Payment - 1, shopping,subset=-8)
> coef(summary(fit))
                   Estimate Std. Error  t value     Pr(>|t|)
Number.of.Items    2.955684  0.2997510 9.860464 6.340141e-11
Paymentcard       41.198885  6.7016800 6.147546 9.237407e-07
Paymentcard/cash 128.310215 22.1575721 5.790807 2.505366e-06
Paymentcash       17.974740  7.4723845 2.405489 2.252260e-02
Paymentcheck      53.997121 16.9930735 3.177596 3.430851e-03
Even if every express lane patron paid with cash and the single patron in the shorter line paid with a credit card, the shorter line would still be faster (about 97 seconds of waiting, versus 104 seconds in the express lane).
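For the curious, here's roughly how that comparison works out. It's a sketch under the same assumptions as the model above, with the estimates copied by hand from the `coef(summary(fit))` table; the variable names are mine.

```r
# Estimates copied from the coef(summary(fit)) table above.
per_item <- 2.955684
payment  <- c(card = 41.198885, cash = 17.974740, check = 53.997121)

# Express lane: 4 shoppers, all paying cash, with 11 items between them.
express_cash <- 4 * payment[["cash"]] + 11 * per_item

# Short lane: 1 shopper paying by credit card, with 19 items.
short_card <- payment[["card"]] + 19 * per_item

print(round(c(express_cash = express_cash, short_card = short_card)))
```

Because this model has no intercept, each payment coefficient plays the role of the per-shopper processing time for that payment method.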
Another reason to choose the line with one loaded cart when you have but a few items: the person with the loaded cart will usually offer that you go ahead since you have relatively few items...
Posted by: Avram | September 25, 2009 at 14:17
I believe someone has done this study using scanner data: the Moretti and Mas "Peers at work" paper in:
http://www.stanford.edu/group/SITE/archive/SITE_2006/Web%20Session%207/Session_7_Program.htm
The paper used the results to test whether there was peer pressure among cashiers. (I admired the creativity of using scanner data, generally used only to measure preferences, as a source of timestamps.)
Cheers,
JCS
Posted by: Jose C Silva | September 25, 2009 at 21:09
Right after I clicked "post," I realized that my contribution would be made a lot more positive by pointing out that there's publicly available scanner data (it may require an email asking for it) at the University of Chicago Business School Center for Marketing.
It may make an interesting exercise for a class on R in business analytics -- again, for the creative use of data that is usually used only for measuring preferences or response functions (to marketing actions).
JCS
Posted by: Jose C Silva | September 25, 2009 at 21:22
I am most fascinated by the reading of data directly from a spreadsheet published in Google Docs. I have been using the RGoogleDocs package, which lets me read data, but then I have to sign in.
I am fascinated by your line
shopping <- read.csv(
+ "http://spreadsheets.google.com/pub?key=tE9pXlYLwTAeiDWxL8h_viA&single=true&gid=0&range=A1%3AE37&output=csv",
+ as.is=TRUE)
Where can I read more about the arguments (or whatever you call them, such as the "single", "gid", "range")?
Posted by: Farrel Buchinsky | September 26, 2009 at 16:21
Farrel, I got that link from Dan Meyer's post. I'll investigate more about how to read a Google Docs spreadsheet in R using this method, and post about it.
Posted by: David Smith | September 28, 2009 at 09:26
There was surprisingly little information on the Web on how to link to a Google Spreadsheet export, but I think I've figured it out.
Posted by: David Smith | September 28, 2009 at 11:49
I saw your post. Thank you very much.
RGoogleDocs is a little bit more of a pain, but sometimes it is imperative that the data not be publicly available, or at least not yet.
Posted by: Farrel Buchinsky | September 28, 2009 at 12:25