by Juan M. Lavista Ferres , Senior Director of Data Science at Microsoft
In what was one of the most viral episodes of 2017, political science Professor Robert E Kelly was live on BBC World News talking about the South Korean president being forced out of office when both his kids decided to take an easy path to fame by showing up in their dad’s interview.
The video immediately went viral, and the BBC reported that within five days more than 100 million people from all over the world had watched it. Many people around the globe via Facebook, Twitter and reporters from reliable sources like Time.com thought the woman that went after the children was her nanny, when in fact, the woman in the video was Robert’s wife, Jung-a Kim, who is Korean.
The confusion over this episode caused a second viral wave calling out that people that thought she was the nanny should feel bad for being stereotypical.
We decided to embrace the uncertainty and take a data science based approach to estimating the chances that the person was the nanny or the mother of the child, based on the evidence people had from watching the news.
@David_Waddell What would that mean, please? Re-broadcasting it on BBC TV, or just here on Twitter? Is this kinda thing that goes 'viral' and gets weird?
— Robert E Kelly (@Robert_E_Kelly) March 10, 2017
What evidence did viewers of the video have?
- the person is American Caucasian
- the person is professional
- there are two kids
- the caretaker is Asian
We then look for probability values for these statistics. (Given that Professor Kelly is American, all statistics are based on US data.)
- Probability (Asian Wife | Caucasian Husband) = 1% [Married couples in the United States in 2010]
- Probability of (Household has Nanny | husband is professional) = 3.5% [The Three Faces of Work-Family Conflict, page 9, Figure 3]
- Probability of (Asian | Nanny) = 6% [Caregiver Statistics: Demographics]
- Probability of (Stay at home mom) = 14% and Probability of (Stay at home mom | Asian Wife) = 30% [Stay-at-Home Mothers by Demographic Group ]
We define the following Bayesian network using the bnlearn package for R. We create the network using the model2network function and then we input the conditional probability tables (CPTs) that we know at each node.
library(bnlearn) set.seed(3) net <- model2network("[HusbandDemographics][HusbandIsProfessional][NannyDemographics][WifeDemographics|HusbandDemographics][StayAtHomeMom|HusbandIsProfessional:WifeDemographics][HouseholdHasNanny|StayAtHomeMom:HusbandIsProfessional][Caretaker|StayAtHomeMom:HouseholdHasNanny][CaretakerEthnicity|WifeDemographics:Caretaker:NannyDemographics]") plot(net)
The last step is to fit the parameters of the Bayesian network conditional on its structure, the bn.fit
function runs the EM algorithm to learn CPT for all different nodes in the above graph.
yn <- c("yes", "no") ca <- c("caucacian","other") ao <- c("asian","other") nw <- c("nanny","wife") cptHusbandDemographics <- matrix(c(0.85, 0.15), ncol=2, dimnames=list(NULL, ca)) #[1] cptHusbandIsProfessional <- matrix(c(0.81, 0.19), ncol=2, dimnames=list(NULL, yn)) #[2] cptNannyDemographics <- matrix(c(0.06, 0.94), ncol=2, dimnames=list(NULL, ao)) # [3] cptWifeDemographics <- matrix(c(0.01, 0.99, 0.33, 0.67), ncol=2, dimnames=list("WifeDemographics"=ao, "HusbandDemographics"=ca)) #[1] cptStayAtHomeMom <- c(0.3, 0.7, 0.14, 0.86, 0.125, 0.875, 0.125, 0.875) #[4] dim(cptStayAtHomeMom) <- c(2, 2, 2) dimnames(cptStayAtHomeMom) <- list("StayAtHomeMom"=yn, "WifeDemographics"=ao, "HusbandIsProfessional"=yn) cptHouseholdHasNanny <- c(0.01, 0.99, 0.035, 0.965, 0.00134, 0.99866, 0.00134, 0.99866) #[5] dim(cptHouseholdHasNanny) <- c(2, 2, 2) dimnames(cptHouseholdHasNanny) <- list("HouseholdHasNanny"=yn, "StayAtHomeMom"=yn, "HusbandIsProfessional"=yn) cptCaretaker <- c(0.5, 0.5, 0.999, 0.001, 0.01, 0.99, 0.001, 0.999) dim(cptCaretaker) <- c(2, 2, 2) dimnames(cptCaretaker) <- list("Caretaker"=nw, "StayAtHomeMom"=yn, "HouseholdHasNanny"=yn) cptCaretakerEthnicity <- c(0.99, 0.01, 0.99, 0.01, 0.99, 0.01, 0.01, 0.99, 0.01,0.99,0.99,0.01,0.01,0.99,0.01,0.99) dim(cptCaretakerEthnicity) <- c(2, 2, 2,2) dimnames(cptCaretakerEthnicity) <- list("CaretakerEthnicity"=ao,"Caretaker"=nw, "WifeDemographics"=ao ,"NannyDemographics"=ao) net.disc <- custom.fit(net, dist=list(HusbandDemographics=cptHusbandDemographics, HusbandIsProfessional=cptHusbandIsProfessional, WifeDemographics=cptWifeDemographics, StayAtHomeMom=cptStayAtHomeMom, HouseholdHasNanny=cptHouseholdHasNanny, Caretaker=cptCaretaker, NannyDemographics=cptNannyDemographics,CaretakerEthnicity=cptCaretakerEthnicity))
Once we have the model, we can query the network using cpquery
to estimate the probability of the events and calculate the probability that the person is the nanny or the wife based on the evidence we have (husband is Caucasian and professional, caretaker is Asian). Based on this evidence the output is that the probability that she is the wife is 90% vs. 10% that she is the nanny.
probWife <- cpquery(net.disc, (Caretaker=="wife"),HusbandDemographics=="caucacian" & HusbandIsProfessional=="yes" & CaretakerEthnicity=="asian",n=1000000) probNanny <- cpquery(net.disc, (Caretaker=="nanny"),HusbandDemographics=="caucacian" & HusbandIsProfessional=="yes" & CaretakerEthnicity=="asian",n=1000000) [1] "The probability that the caretaker is his wife = 0.898718647764449" [1] "The probability that the caretaker is the nanny = 0.110892031547457"
In conclusion, if you thought the woman in the video was the nanny, you may need to review your priors!
The bnlearn package is available on CRAN. You can find the R code behind this post here on GitHub or here as a Jupyter Notebook.
Twenty years ago, R did not exist. You took pen and paper and you applied Bayes theorem. If you need to use a program with a lot of code lines you are lost in any city without google maps. Is this AI or A stupidity?
On the other hand, I feel that the stereotypical thinkings are in the mind of Robert Kelly. Why does not he stop to talk and take his son out of the room? His wife appears as a nanny, really, surprised, submissive, scared... Are we in XXI century?
Posted by: Lluis | May 28, 2017 at 06:49
Very interesting post!
When i first saw the video i find in hilarious, a very nice reminder that we should not only focus on gloomy things as situation in North Korea.
However i think you missed a rather important piece of 'evidence'. Namely, that Mrs Kim looks so much younger than Professor Kelly. Yes, i assumed that Mrs Kim is a nanny, not because of ethnicity but because she looks so young. Would be interesting what the result would be if you consider even this aspect.
On a side note, an irrelevant information. i'm proud father of a mixed girl, European father Asian mother.
Posted by: Crane | May 30, 2017 at 04:52