
June 16, 2010

The distribution of online data usage

Comments


Cellphone data usage is almost certainly a fat-tailed distribution, not a normal one. The second graph says absolutely nothing about the first, and shame on this article for promoting the idea that every distribution is normal.

I hate the cellphone companies too, but this is silly,
Bill Mill

It would be great to see the data. This can't be normal and must be fat-tailed (as Bill stated): with that SD, the 99th percentile of a normal distribution would be below 700, not the 800 shown in the chart.
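The percentile claim can be checked with a quick calculation. This is a minimal sketch in Python (standard library only), assuming the figures quoted in this thread (an average of 200 MB and a standard deviation of 159 MB) and assuming, for the sake of argument, that usage really is normal:

```python
from statistics import NormalDist

# Figures quoted in the comments; treating 200 MB as the mean is an assumption.
MEAN_MB = 200.0
SD_MB = 159.0

usage = NormalDist(mu=MEAN_MB, sigma=SD_MB)

# 99th percentile implied by a normal model with these parameters.
p99 = usage.inv_cdf(0.99)
print(f"99th percentile under a normal model: {p99:.0f} MB")
```

The result is roughly 570 MB, short of both 700 and the 800 in the chart, which is what you would expect if the real data have a fatter right tail than a normal curve.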

Not to mention that the probability of using less than 0 MB of data is 0. With an "average" of 200 MB and a standard deviation of 159 MB, a normal distribution is going to be significantly wrong. My guess is a gamma distribution, but others could make sense.

Also, I'd take the "average" of 200 MB as the mode, not the mean, given the graph.
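To put a number on the negative-usage problem, here is a small sketch using Python's standard library. It assumes the quoted 200 MB is the mean (which, as noted, it may not be) and 159 MB is the standard deviation:

```python
from statistics import NormalDist

# Normal model with the parameters quoted in the thread (assumed, not fitted).
usage = NormalDist(mu=200.0, sigma=159.0)

# Probability mass the normal model places below zero megabytes.
p_negative = usage.cdf(0.0)
print(f"Share of users below 0 MB under a normal model: {p_negative:.1%}")
```

Under these assumptions about one user in ten would have negative usage, so the normal model is wrong by a large margin at the left edge.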

You can't model these data by a normal distribution. Note that your model implies that a large number of cellphone users use NEGATIVE megabytes. The real usage is bounded below by zero, but is unbounded above. Simple models for this situation include lognormal and gamma.
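As a rough illustration of what those alternatives would look like, the sketch below matches a gamma and a lognormal distribution to the two quoted summary figures by the method of moments. This is only an assumption-laden starting point: it treats 200 MB as the mean, and a real fit would use the raw data, which nobody in this thread has.

```python
import math

MEAN_MB = 200.0  # quoted "average"; assumed here to be the mean
SD_MB = 159.0    # quoted standard deviation

# Gamma(shape k, scale theta): mean = k * theta, variance = k * theta**2.
gamma_shape = (MEAN_MB / SD_MB) ** 2
gamma_scale = SD_MB ** 2 / MEAN_MB

# Lognormal with log-scale parameters mu and sigma:
# mean = exp(mu + sigma**2 / 2), variance = (exp(sigma**2) - 1) * mean**2.
ln_sigma2 = math.log(1.0 + (SD_MB / MEAN_MB) ** 2)
ln_mu = math.log(MEAN_MB) - ln_sigma2 / 2.0

print(f"gamma: shape={gamma_shape:.3f}, scale={gamma_scale:.1f} MB")
print(f"lognormal: mu={ln_mu:.3f}, sigma={math.sqrt(ln_sigma2):.3f}")
```

Both distributions put zero probability on negative usage and have a long right tail, which is the shape the comments above are arguing for.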

Use an empirical cumulative distribution function (ecdf() in R) or something similar instead.
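For anyone who wants to see what that involves, here is a minimal sketch of an empirical CDF in Python, roughly analogous to what R's ecdf() returns. The usage figures are made up purely for illustration; they are not the data behind the chart.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return a function F where F(x) is the fraction of observations <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must not be empty")

    def F(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(ordered, x) / n

    return F


# Hypothetical monthly usage figures in MB, for illustration only.
usage_mb = [12, 35, 60, 90, 150, 180, 220, 310, 480, 2400]
F = ecdf(usage_mb)
print(F(200))  # fraction of these users at or below 200 MB
```

Because the ECDF is built directly from the observations, it makes no distributional assumption, and a single heavy user like the last value simply shows up as a long flat stretch before the final step.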

I use about 18 GB of mobile data monthly with a 7 Mbit/s modem. New modems are 21 Mbit/s. Theoretically my plan is capped at 3 GB/month, but the operator doesn't seem to care. This costs 13 euros a month.

hmmmm....

It's highly likely (i.e., it's a common mistake) that the top distribution is a fit to data with some super-user peaks that are not apparent (it should have been smoothed with loess).

So the 97th percentile is (probably) in the right place, but the curve itself is a poor representation.

Agree with the above comments: this is not a normal distribution. The leftmost bound of the x-axis should be 0 (not negative), and using the standard deviation of a normal distribution is not correct. You need to use another distribution...

-Ralph Winters


