by Michael Helbraun
Michael is a member of the Revolution Analytics Sales Support team. In the following post, he shows how to synthesize a probability distribution from the opinions of multiple experts: an excellent way to construct a Bayesian prior.
There are lots of different ways to forecast. Depending on whether there's historical data, trend, or seasonality, you might choose to start with a particular technique. Assuming good domain expertise, one effective method is to combine expert opinion via Monte Carlo simulation to generate a stochastic forecast. While this example is set up to combine three different people's perspectives on what the number might be, the same technique could also be used to combine domain expertise with traditional analytic techniques like time series, regression, neural networks, etc.
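As a minimal plain-R sketch of the idea (the numbers here are made up, and Michael's actual script further down does this with the triangle, distr and RevoScaleR packages): draw from one triangular distribution per expert, then for every trial keep a single expert's draw with probability proportional to that expert's weight.

library(triangle)

trials <- 1000

# Hypothetical estimates: one (min, most likely, max) triple per expert,
# plus a relative credibility weight for each expert
mins  <- c(80, 70, 90); modes <- c(100, 95, 110); maxs <- c(130, 140, 125)
wts   <- c(3, 2, 1)

# one triangular draw per expert per trial (trials x experts matrix)
draws <- sapply(seq_along(mins), function(i)
  rtriangle(trials, a = mins[i], b = maxs[i], c = modes[i]))

# for each trial, keep one expert's draw, chosen with probability
# proportional to that expert's weight
pick   <- sample(length(wts), trials, replace = TRUE, prob = wts / sum(wts))
merged <- draws[cbind(seq_len(trials), pick)]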
First we grab some estimates from our three experts:
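The script below expects that file to hold one row per expert with Min, MostLikely, Max and Weighting columns (the actual file is linked at the end of the post); a hypothetical Expert Estimates.csv with that shape, using made-up numbers, would look like:

Min,MostLikely,Max,Weighting
80,100,130,3
70,95,140,2
90,110,125,1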
Next we generate triangular distributions based on each of our experts' opinions; within each trial we then randomly select one of those values:
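The weighted pick uses the distr package: the experts' values for a trial form the support of a discrete distribution whose probabilities come from the weights, and r() returns a sampler for that distribution. A standalone illustration for a single trial, with made-up values:

library(distr)

# Hypothetical draws from the three experts for one trial, and their weights
vals    <- c(91.5, 98.2, 104.7)
weights <- c(1, 3, 2)

oneTrial <- DiscreteDistribution(supp = vals, prob = weights / sum(weights))
r(oneTrial)(1)   # keep one expert's value, chosen in proportion to the weights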
The end result is a nicely merged stochastic estimate:
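Once the script below has run, the merged column holds the combined draws, so a point estimate and interval can be read straight off them, for example:

# Summarise the merged stochastic forecast produced by the script below
mergedDraws <- as.numeric(revoFcast$merged)
mean(mergedDraws)                           # expected value of the forecast
quantile(mergedDraws, c(0.05, 0.5, 0.95))   # median and a 90% interval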
Michael's code (below) uses Revolution's RevoScaleR library. Notice that the rxSetComputeContext() call instructs the computer to set up for parallel computation using the resources of the local machine, and the rxExec() calls execute the rtriangle() and combinedDist() functions in parallel. By just changing the compute context, this same code could run in parallel using all of the resources of an LSF or Hadoop cluster.
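As a rough sketch of that switch (cluster compute contexts take site-specific arguments that are not shown here, so treat the commented lines as the general shape rather than a working configuration):

library(RevoScaleR)   # provides rxSetComputeContext(), rxExec() and rxImport()

# Parallel execution on the local machine, as in the script below
rxSetComputeContext("localpar")

# To target a cluster instead, construct the appropriate compute context object
# (e.g. an LSF or Hadoop context) and pass it to rxSetComputeContext();
# the rxExec() calls themselves stay the same.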
###############################################################################
##                                                                           ##
##   Revolution R Enterprise - MCS Forecasting, combining expert opinion    ##
##                                                                           ##
###############################################################################

# Clear out memory for a fresh run and load required packages
rm(list = ls())
library(triangle); library(distr); library(ggplot2)

# read input parameters
bigDataDir <- "C:/Data/Demos/Datasets"
bigDataDir <- "C:/..."
inDataFile <- file.path(bigDataDir, "/Expert Estimates.csv")
expertOpinion <- rxImport(inData = inDataFile)
View(expertOpinion)

# Set simulation parameters
trials <- 1000
rxOptions(numCoresToUse = -1)
rxSetComputeContext("localpar")

# create individual triangular distributions
orderedTri <- function(expertNum, trials) {
  revoFcast <- rxExec(FUN = rtriangle, timesToRun = 1, n = trials,
                      a = expertOpinion$Min[expertNum],
                      b = expertOpinion$Max[expertNum],
                      c = expertOpinion$MostLikely[expertNum],
                      packagesToLoad = "triangle")
  return(revoFcast)
}

# create distribution for each of our experts
revoFcast <- NULL
for (i in 1:nrow(expertOpinion)) {
  if (is.null(revoFcast)) {
    revoFcast <- orderedTri(i, trials)
  } else {
    revoFcast <- c(revoFcast, orderedTri(i, trials))
  }
}

# prepare the results
revoFcast <- data.frame(revoFcast)
names(revoFcast) <- paste("Expert", 1:nrow(expertOpinion), sep = "")

# ensure that the results are uncorrelated
cor(revoFcast)

# create a combined probability distribution and select a forecast value
# from the prob weighted dist
combinedDist <- function(trialNum) {
  cDist <- DiscreteDistribution(supp = as.double(revoFcast[trialNum, ]),
                                prob = expertOpinion$Weighting / sum(expertOpinion$Weighting))
  rD <- r(cDist)   # variable to generate values from the dist
  return(rD(1))    # generate/select 1 value
}
merged <- rxExec(FUN = combinedDist, trialNum = rxElemArg(c(1:trials)),
                 execObjects = c("revoFcast", "expertOpinion"),
                 packagesToLoad = "distr")

# add the forecast to our working data set
merged <- data.frame(merged)
names(merged) <- NULL
revoFcast$merged <- t(merged)

# chart the output
View(revoFcast)   # Look at our combined data set

# restructure the data for plotting
histVals <- data.frame(Value = c(revoFcast$Expert1, revoFcast$Expert2,
                                 revoFcast$Expert3, revoFcast$merged),
                       Source = c(rep(c("Expert1", "Expert2", "Expert3",
                                        "Merged Opinion"), each = trials)))
names(histVals) <- c("Value", "Source")

# draw our combined plot
ggplot(histVals, aes(Value, fill = Source)) +
  geom_density(alpha = 0.25) +
  ggtitle("Combined Expert Opinion")
Download Expert Estimates, the small data file used to drive Michael's simulation.
Hi Michael,
Nice post! When I look at the plot of the distributions, I wonder why they're not symmetric. With 1000 trials, I would expect the simulated distributions to be much closer to the symmetrical shape of the ideal triangle distribution. Furthermore, the fact that the deviation from symmetry is very similar for all three expert distributions points to some systematic reason for this deviation. Do you have an idea what this reason could be?
Kind regards,
Michael
Posted by: Michael Allgöwer | January 10, 2014 at 11:03
Hi Michael,
Nice post. It's really helpful. Thanks.
Posted by: Aly | January 12, 2014 at 07:41
Hi
Can you explain how the weights are used in computing the merged column? It's a little urgent.
Posted by: RAJESH P | January 30, 2014 at 23:57