News is starting to leak that the Large Hadron Collider may have accomplished its primary mission of confirming the existence of the hypothesised and heretofore elusive subatomic particle, the Higgs Boson. And sure, billions of Euros worth of state-of-the-art high-energy machinery and an army of experimental and theoretical physicists probably had something to do with the discovery. But did you know Statistics played a part as well? Check out this explainer video from PhD comics, below (an R chart even appears at the 00:27 mark):
The basic method the LHC uses to detect the Higgs Boson is to generate decay products from subatomic collisions, and to create charts like the one below:
Depending on whether or not the Higgs Boson exists, this chart will look slightly different. But the difference between the chart you'd see if the Higgs Boson exists and the one you'd see if it doesn't is very, very small:
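To get a feel for just how small, here's a toy simulation in R (the shapes, rates and the 125 GeV bump location are my own illustrative assumptions, not actual ATLAS or CMS data): a smoothly falling background spectrum of "invariant masses", plus, in one version, a tiny Gaussian bump standing in for a Higgs-like signal. Overlaying the two histograms shows how subtle the difference is.

```r
## Toy illustration only -- not LHC code. Background: exponentially falling
## "invariant mass" spectrum starting at 100 GeV. Signal: a small Gaussian
## bump near 125 GeV. All numbers here are made up for illustration.
set.seed(42)

simulate_masses <- function(n_background = 100000, n_signal = 300) {
  background <- 100 + rexp(n_background, rate = 1/20)   # falling background
  signal <- rnorm(n_signal, mean = 125, sd = 1.5)       # narrow bump at 125 GeV
  c(background, signal)
}

with_higgs    <- simulate_masses(n_signal = 300)  # "Higgs exists"
without_higgs <- simulate_masses(n_signal = 0)    # "no Higgs"

# Overlay the two histograms: the bump is a barely visible excess
breaks <- seq(100, 180, by = 1)
hist(with_higgs[with_higgs < 180], breaks = breaks, col = rgb(1, 0, 0, 0.4),
     main = "Toy invariant-mass spectrum", xlab = "Mass (GeV)")
hist(without_higgs[without_higgs < 180], breaks = breaks,
     col = rgb(0, 0, 1, 0.4), add = TRUE)
```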
As the physicist interviewed in the video says, to resolve this difference "what you need is a HUGE amount of data". And we're talking REALLY huge: experiments like this are run 40 million times a second every day. (That's more than 40 terabytes of new data every day: now, that's Big Data!) Every day since the LHC was turned on, more evidence for the Higgs model has been accumulating, and it seems that now enough has accumulated for the researchers to be confident the Higgs Boson does indeed exist. Look for the formal confirmation in the next couple of days.
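To see why piling up data helps, here's a back-of-the-envelope R sketch. The daily event rates below are invented for illustration, and the s/sqrt(b) rule of thumb is a crude stand-in for the collaborations' actual likelihood-based analysis, but the intuition carries over: if every day of running adds roughly the same expected number of signal and background events, the approximate significance grows like the square root of the running time, eventually crossing the 5-sigma threshold physicists conventionally require before claiming a discovery.

```r
## Back-of-the-envelope sketch with assumed (not real) event rates.
s_per_day <- 6      # assumed expected signal events per day in the search window
b_per_day <- 400    # assumed expected background events per day

days <- 1:365
significance <- (s_per_day * days) / sqrt(b_per_day * days)  # crude s/sqrt(b)

plot(days, significance, type = "l",
     xlab = "Days of data-taking", ylab = "Approximate significance (sigma)",
     main = "Why a HUGE amount of data helps")
abline(h = 5, lty = 2)  # the conventional 5-sigma discovery threshold

min(days[significance >= 5])      # first day this toy setup crosses 5 sigma
pnorm(5, lower.tail = FALSE)      # one-sided p-value at 5 sigma: about 2.9e-7
```

In this toy setup the 5-sigma line is crossed after roughly nine months of running; the real analysis is far messier, but the square-root-of-time growth in significance is the same basic story.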
I read it more as, "They found a trace of the God particle," but did not nail it down. Somewhat of a difference, but nonetheless they have been making steady improvements over the past few years.
Which is good considering the money they have spent. In the billions.
Posted by: Steve (Construction Contractor) | July 03, 2012 at 13:34
I really hope that, with all those billions of dollars and cutting-edge technologies, they don't use an outdated/backward statistical approach like null-hypothesis testing, which is what I suspect when I hear them talking about "sigma" levels...
Posted by: Mike Lawrence | July 04, 2012 at 04:47
This discovery assumes that the measurement errors are distributed normally, right? What are the chances that this is not the case (dependent or unbounded errors, etc.)?
Posted by: Statistician | July 05, 2012 at 10:23
Let's assume, shall we, that these people are smart enough to know what kind of statistical methods they should be using.
Posted by: Another Statistician | July 06, 2012 at 03:29
Btw this:
"what you need is a HUGE amount of data"
reminded me of this:
"In some ways I think that scientists have misled themselves into thinking that if you collect enormous amounts of data you are bound to get the right answer. You are not bound to get the right answer unless you are enormously smart. You can narrow down your questions; but enormous sets of data often consist of enormous numbers of small sets of data, none of which by themselves are enough to solve the thing you are interested in, and they fit together in some complicated way."
Brad Efron (2010), Significance magazine
Posted by: Statistician | July 06, 2012 at 09:50