Comments on 'Pairwise-complete correlation considered dangerous' (TypePad feed, Blog Administrator, https://blog.revolutionanalytics.com/, 2015-05-30T16:50:57Z)

Robert (http://twitter.com/riquiapaza) commented on 'Pairwise-complete correlation considered dangerous' (2015-06-21T15:34:25Z, updated 2015-06-23T20:36:40Z):
<p>Look at this simple simulation: your point may hold only for very simple simulations with a very high proportion of missing values. In other cases this procedure performs better than mean or bootstrap imputation.<br />
The plots are here: https://twitter.com/riquiapaza/status/612642731530301440</p>

Bryan W. Lewis (http://illposed.net) commented on 'Pairwise-complete correlation considered dangerous' (2015-06-19T12:06:11Z, updated 2015-06-19T17:31:20Z):
<p>Thanks for these thoughtful comments. The intentionally silly pathological<br />
example, like most textbook examples, is intended to get to the heart of the<br />
matter and clearly illustrate the problem that pairwise deletion of cases leads<br />
to incomparable correlation values.</p>
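The incomparability can be seen in a minimal Python sketch. It uses Python rather than R's cor(..., use = "pairwise.complete.obs"), and the helper names pearson, pairwise_complete_cor and det3, as well as the six-row toy data, are hypothetical. Each pair of columns overlaps on a different pair of rows, so the resulting matrix claims that x1 and x2 are perfectly correlated and x2 and x3 are perfectly correlated, yet x1 and x3 are perfectly anti-correlated. No real data set has that structure: a valid correlation matrix is positive semidefinite, so its determinant cannot be negative, but this one has determinant -4.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pairwise_complete_cor(cols):
    """Hypothetical helper: correlation matrix in which each entry uses only
    the rows where both columns are observed (None marks a missing value)."""
    k = len(cols)
    r = [[1.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            pairs = [(a, b) for a, b in zip(cols[i], cols[j])
                     if a is not None and b is not None]
            xs, ys = zip(*pairs)  # assumes each pair overlaps on >= 2 rows
            r[i][j] = r[j][i] = pearson(xs, ys)
    return r

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# Hypothetical toy data: no row is complete, and every pair of columns
# overlaps on a different two-row subset.
x1 = [1, 2, None, None, 1, 2]
x2 = [1, 2, 1, 2, None, None]
x3 = [None, None, 1, 2, 2, 1]

R = pairwise_complete_cor([x1, x2, x3])
for row in R:
    print([round(v, 3) for v in row])
# [1.0, 1.0, -1.0]
# [1.0, 1.0, 1.0]
# [-1.0, 1.0, 1.0]
print("determinant:", det3(R))  # determinant: -4.0
```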
<p>I very much agree that mean imputation can often be a bad idea; that's why I<br />
suggest a number of alternatives (including multiple imputation).<br />
But really the whole point of the article, I<br />
hope, is to provoke the reader to *think carefully* about their problem when<br />
missing values are involved!<br />
</p>

Kovla123 (http://profile.typepad.com/kovla123) commented on 'Pairwise-complete correlation considered dangerous' (2015-06-18T12:28:39Z, updated 2015-06-18T12:28:45Z):
<p>I will join the others in respectfully disagreeing. The two widely accepted state-of-the-art techniques for treating missing data in various settings are (1) maximum likelihood estimation and (2) multiple imputation. You give some good advice by discouraging the use of pairwise deletion, which can lead to problems in terms of both bias and calculation. Yet that good advice is cancelled out entirely by the recommendation to use a single imputation method, which has long been discredited in the statistical literature. </p>

Casper Albers (http://www.casperalbers.nl) commented on 'Pairwise-complete correlation considered dangerous' (2015-06-17T07:19:34Z, updated 2015-06-18T22:08:50Z):
<p>I want to join Prof. Matloff in respectfully disagreeing.</p>
<p>In this tiny-n example, things do indeed become silly. But this is not due to the pairwise.complete.obs problem; it is because computing correlations for n = 3 is silly in itself.</p>
<p><br />
Furthermore, your suggestion to impute the mean can lead to *serious* underestimation of the correlation.<br />
When you have a lot of data, e.g. n = 100 for x[,1] and x[,2] and n = 90 for x[,3], there really is no methodological issue with computing a correlation between x[,2] and x[,3] on the basis of the 90 overlapping observations. Admittedly, n = 90 is smaller than the n = 100 observations you have for x[,1] and x[,2], but n = 100 is itself nothing more than a (random) sample from a larger population. As long as there is no specific mechanism that decides which values are missing (Missing-Not-At-Random, MNAR), you can simply regard the n = 90 as a regular random sample and compute r for it.</p>
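A small simulation sketch of this point, assuming the values are missing completely at random (MCAR): x3 loses 10 of its 100 values at random, and the pairwise-complete correlation on the 90 overlapping rows is compared with the full-data value. The seed, the sample size, the assumed population correlation of 0.6 and the helper pearson are illustrative choices, not taken from the comment; no particular output is claimed, but under MCAR the two estimates should typically be close.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

random.seed(1)      # illustrative seed
n, rho = 100, 0.6   # assumed sample size and population correlation

# Simulate a bivariate normal pair whose population correlation is rho.
x2 = [random.gauss(0, 1) for _ in range(n)]
x3 = [rho * a + math.sqrt(1 - rho ** 2) * random.gauss(0, 1) for a in x2]

# Delete 10 values of x3 completely at random (MCAR); None marks a missing value.
missing = set(random.sample(range(n), 10))
x3_obs = [None if i in missing else v for i, v in enumerate(x3)]

# Pairwise-complete estimate: keep only the rows where both values are observed.
pairs = [(a, b) for a, b in zip(x2, x3_obs) if b is not None]
xs, ys = zip(*pairs)

r_full = pearson(x2, x3)      # estimate with no missing data
r_pairwise = pearson(xs, ys)  # estimate from the 90 overlapping rows
print(len(pairs), round(r_full, 3), round(r_pairwise, 3))
```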
<p><br />
Your suggestion to impute the mean is plain wrong. By imputing the mean, you make the vector of observations as flat as possible. Suppose x[,3] had been -2, -1, 0, 1, 2: a straight line with slope +1 and correlation +1 with x[,1]. Because -2 and -1 are unknown, you propose to impute them with +1 (the mean of the observed values 0, 1, 2), making the series +1, +1, 0, +1, +2: much flatter, and now with a correlation with x[,1] much closer to zero (.44). Of course you do not know whether the two missing values are -2 and -1 or something else, but as long as you assume that the values are missing-completely-at-random (MCAR) or missing-at-random (MAR), you can easily prove that this imputation method will yield, on average, estimates of the correlation biased towards zero.<br />
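The arithmetic in this example can be checked with a short Python sketch. Taking x[,1] = 1, ..., 5 is an assumption (any increasing linear x[,1] gives the same correlations), and pearson is a hypothetical helper. Imputing the two missing values of x[,3] with the mean of the observed values drops the correlation from exactly 1 to 2/√20 ≈ 0.447, the .44 quoted above.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x1 = [1, 2, 3, 4, 5]         # assumed; any increasing linear x[,1] works
x3_true = [-2, -1, 0, 1, 2]  # perfectly correlated with x1

# The first two values of x3 are missing; mean-impute them from the
# observed values 0, 1, 2, whose mean is 1.
observed = x3_true[2:]
fill = sum(observed) / len(observed)
x3_imputed = [fill, fill] + observed  # [1.0, 1.0, 0, 1, 2]

print(pearson(x1, x3_true))                # 1.0
print(round(pearson(x1, x3_imputed), 4))   # 0.4472, i.e. 2 / sqrt(20)
```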
</p>

D (http://profile.typepad.com/d123787003640256267) commented on 'Pairwise-complete correlation considered dangerous' (2015-06-16T23:24:13Z, updated 2015-06-16T23:24:14Z):
<p>Sorry, that last post got mangled. The URL for my paper is http://www.amstat.org/meetings/jsm/2015/onlineprogram/AbstractDetails.cfm?abstractid=316343</p>

D (http://profile.typepad.com/d123787003640256267) commented on 'Pairwise-complete correlation considered dangerous' (2015-06-16T23:22:48Z, updated 2015-06-16T23:22:49Z):
<p>Hi, Bryan, Norm Matloff here. I must respectfully disagree with your post.</p>
<p>One should never rely on correlations in small data sets (or tiny ones, as you call your example). Second-order moments are a lot harder to estimate than first-order ones, e.g. the variance of a sample variance is large.</p>
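The point about second-order moments can be made concrete with a standard result, under the added assumption of independent normal observations with variance σ² (the comment itself does not assume normality). The sample variance S² then satisfies:

```latex
\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}
\;\Longrightarrow\;
\operatorname{Var}(S^2) = \frac{\sigma^4}{(n-1)^2}\cdot 2(n-1) = \frac{2\sigma^4}{n-1},
\qquad
\frac{\operatorname{sd}(S^2)}{\sigma^2} = \sqrt{\frac{2}{n-1}},
\qquad
\frac{\operatorname{sd}(\bar X)}{\sigma} = \frac{1}{\sqrt{n}}.
```

For n = 3 the relative standard deviation of S² is √(2/2) = 1, so the estimate's own standard deviation is as large as the variance being estimated. A sample correlation is also built from second moments and is similarly unstable at tiny n, although the formula above is exact only for S², not for r.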
<p>The Available Cases method for dealing with missing values, as exemplified by the <b>pairwise.complete.obs</b> option you cite, is a lot more useful than many people realize. It does tacitly make strong assumptions, but so do all of the missing-value methods, including those in <b>Amelia</b>.</p>
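Why Available Cases can retain much more information than complete-case (listwise) analysis is easy to see by counting rows. In this minimal sketch the data are hypothetical and only the number of rows each approach would use is computed, not the correlations themselves; the tacit assumptions mentioned above still apply and are not checked here. With missing values scattered across three columns, listwise deletion keeps 2 of 8 rows, while each pair of columns still has 4 usable rows.

```python
from itertools import combinations

# Hypothetical data with scattered missing values (None = missing).
data = {
    "x1": [1.2, None, 0.7, 2.1, 1.5, None, 0.9, 1.8],
    "x2": [3.4, 2.9, None, 3.8, 3.1, 2.7, None, 3.6],
    "x3": [None, 5.0, 4.6, 5.9, None, 4.8, 5.1, 5.5],
}
n_rows = len(data["x1"])

# Complete cases (listwise deletion): rows observed in every column.
complete = [i for i in range(n_rows)
            if all(col[i] is not None for col in data.values())]

# Available cases (pairwise deletion): each pair of columns keeps its own overlap.
pair_counts = {}
for a, b in combinations(data, 2):
    pair_counts[(a, b)] = sum(
        1 for i in range(n_rows)
        if data[a][i] is not None and data[b][i] is not None
    )

print("complete cases:", len(complete))     # complete cases: 2
for pair, count in pair_counts.items():
    print(pair, "available cases:", count)  # 4 for every pair
```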
<p>I have <a href="http://www.amstat.org/meetings/jsm/2015/onlineprogram/AbstractDetails.cfm?abstractid=316343" rel="nofollow">a paper</a> on this topic at the JSM in August, and will release an R package in the next couple of weeks.</p>