
March 06, 2018



Thank you for this wonderful article. I read Bradley Effort’s article “Bayes’ Theorem in the 21st Century” that you recommend, but I had a problem following his reasoning, and I would greatly appreciate it if you could help me.

Mr. Effort presents a case where 6,033 cases are examined and 28 of them have a z-score higher than 3.4. He then makes the following statements: **“The determining fact here is that if indeed all the genes were Null, we would expect only 2.8 z-values exceeding 3.40, that is, only 10% of the actual number observed.
This brings us back to Bayes. Another interpretation of the FDR algorithm is that the Bayesian probability of nullness given a z-value exceeding 3.40 is 10%.”**

To begin with, the probability of z > 3.4 is about 0.00034 (more precisely, 0.000337) in the standard normal distribution. Therefore, assuming that all 6,033 genes were Null, we would expect on average 6,033 × 0.000337 ≈ 2.03 of them to have z > 3.4. Why does Mr. Effort raise this number to 2.8?
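The tail-probability arithmetic above can be checked with a short Python sketch. It uses only the standard library; the names `upper_tail`, `n_genes`, and `expected_null` are illustrative and do not come from the article.

```python
import math

def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

n_genes = 6033                      # number of genes in the example
p_tail = upper_tail(3.4)            # one-sided tail probability beyond z = 3.4
expected_null = n_genes * p_tail    # expected exceedances if every gene were Null

print(f"P(Z > 3.4) = {p_tail:.6f}")
print(f"expected count under the theoretical null = {expected_null:.2f}")
```

Under the theoretical N(0, 1) null this gives roughly 2.03, matching the figure in the comment rather than the 2.8 quoted from the article.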

Now let’s assume that 2.8 is the correct value and continue with Mr. Effort’s argument. Mr. Effort says that the “Bayesian probability of nullness given a z-value exceeding 3.40 is 10%.” How does he define the Bayesian probability in this case? In general, Bayes’ theorem states that P(A|B) = P(A) * P(B|A)/P(B), where A and B are events. How does he define A and B in this context, and how does he assign probabilities to them?
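One possible reading, offered here as an assumption rather than as the article's stated definition, takes A = "the gene is Null" and B = "its z-value exceeds 3.4". The sketch below plugs numbers into Bayes' theorem under two further assumptions: the prior P(A) is set conservatively to 1 (the variable `pi0`), and P(B) is estimated by the observed fraction 28/6,033. All variable names are illustrative.

```python
import math

def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

n_genes = 6033       # genes examined
n_exceeding = 28     # genes observed with z > 3.4
pi0 = 1.0            # ASSUMPTION: prior P(A) = P(Null), taken conservatively as 1

p_b_given_a = upper_tail(3.4)      # P(B | A): tail probability under the theoretical null
p_b = n_exceeding / n_genes        # P(B): estimated by the observed fraction (assumption)

# Bayes' theorem: P(A | B) = P(A) * P(B | A) / P(B)
posterior_null = pi0 * p_b_given_a / p_b

# The same ratio using the expected null count of 2.8 quoted from the article
posterior_quoted = 2.8 / n_exceeding

print(f"P(Null | z > 3.4), theoretical null: {posterior_null:.3f}")
print(f"P(Null | z > 3.4), quoted 2.8:       {posterior_quoted:.2f}")
```

With these assumptions, the posterior reduces to (expected Null count) / (observed count), so the quoted 2.8 gives 10%, while the theoretical-null count of about 2.03 gives roughly 7%.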

I took a different tack and reasoned as follows: assuming the Null is true, the probability p of an observation with z > 3.4 is p = 0.00034. The number of observations with z > 3.4 under the Null, when the sample size is 6,033, can then be viewed as the number of successes in 6,033 trials where each success has probability 0.00034. This number is modeled by the Binomial distribution with p = 0.00034 and size = 6,033. Using the Binomial distribution, one can show that the probability of observing 28 cases with z > 3.4 in a sample of 6,033 is practically zero and certainly much less than 10%.
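The Binomial calculation described above can be sketched in Python with the standard library. The helper `binom_log_pmf` is a hypothetical name introduced for this illustration; it works on the log scale to avoid overflow in the binomial coefficients. The sketch assumes independent genes, as the Binomial model in the comment does, and reports the probability of 28 or more exceedances.

```python
import math

def binom_log_pmf(k: int, n: int, p: float) -> float:
    """Log of the Binomial(n, p) probability mass at k, computed with log-gamma."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)

n = 6033        # sample size
p = 0.00034     # P(z > 3.4) under the Null, as rounded in the comment

# P(X >= 28): probability of at least 28 exceedances if every gene were Null
tail = sum(math.exp(binom_log_pmf(k, n, p)) for k in range(28, n + 1))

# Sanity check: the mass over all k should sum to (almost exactly) 1
total = sum(math.exp(binom_log_pmf(k, n, p)) for k in range(0, n + 1))

print(f"P(X >= 28 | all Null) = {tail:.3e}")
print(f"total probability mass = {total:.6f}")
```

Note that this quantity is the probability of the observed count given that all genes are Null. It is a different conditional probability from the probability of nullness given z > 3.4, so the two numbers need not agree.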

I would greatly appreciate your response.

I am sorry, I have misspelled Mr. Efron's name as Effort.

Dear Alexander, to make a long story short: the local fdr value near z = 3.4 is locfdr(z, nulltype = 0)$fdr, approximately 0.2; thus, the Fdr value is half of it (see Exercise 2.3: http://statweb.stanford.edu/~ckirby/brad/LSI/chapter2.pdf).

However, the main message of Prof. Efron, I guess, is something much deeper than just computing some FDR numbers (for mere thresholding!). There is something much more serious at stake: How can we operationalize Bayes' theorem for data analysis in the 21st century? How can we distill a sensible prior that we can defend? How can we uncover the blind spots of the conventional wisdom-based prior? A casual “go-as-you-like” attitude in prior-building can potentially undermine the statistical findings as a whole (compare the accuracy of Nate Silver's [Bayesian] presidential election forecasts in 2008 and 2016).

