by Andrie de Vries
Every once in a while I try to remember how to do interpolation using R. This is not something I do frequently in my workflow, so I do the usual sequence of finding the appropriate help page:
??interpolate
Help pages:
stats::approx Interpolation Functions
stats::NLSstClosestX Inverse Interpolation
stats::spline Interpolating Splines
So, the help tells me to use approx() to perform linear interpolation. This is an interesting function, because the help page also describes approxfun(), which does the same thing as approx() except that approxfun() returns a function that does the interpolation, whilst approx() returns the interpolated values directly.
(In other words, approxfun() acts a little bit like a predict() method for approx().)
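A minimal sketch of the difference, using a few made-up points on the line y = 2x:

```r
# A few known points on the line y = 2x
x <- c(1, 2, 3, 4)
y <- c(2, 4, 6, 8)

# approx() returns a list with components x and y holding the interpolated values
res <- approx(x, y, xout = c(1.5, 2.5))
print(res$y)  # 3 5

# approxfun() returns a function that performs the same linear interpolation
f <- approxfun(x, y)
print(f(c(1.5, 2.5)))  # 3 5

# Outside the range of x, both return NA by default (rule = 1)
print(f(5))  # NA
```

Passing `rule = 2` to either function returns the value at the nearest end of the data instead of `NA`.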
Other functions in the interpolation family
The help page for approx() also points to stats::spline() to do spline interpolation and from there you can find smooth.spline() for smoothing splines.
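As a rough illustration (the seed and noise level are arbitrary assumptions), spline() follows the same pattern as approx(), and splinefun() plays the role that approxfun() plays for linear interpolation. Because it is an interpolating spline, it passes through every data point, noise included:

```r
set.seed(1)  # arbitrary seed, only for reproducibility
x <- 1:10
y <- sin(x) + rnorm(10, sd = 0.2)

# spline() returns a list (x, y) of interpolated points; n sets how many points
# to return across the range of x (the default is 3 * length(x))
s <- spline(x, y, n = 50)

# splinefun() is the function-returning counterpart, much as approxfun() is for approx()
sf <- splinefun(x, y)

# An interpolating spline reproduces every data point exactly
print(isTRUE(all.equal(sf(x), y)))  # TRUE
```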
Speaking of smoothing, base R also contains the function smooth(), an implementation of the running median smoothers proposed by Tukey.
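A small sketch of what running medians do, using a made-up series with two isolated spikes. The `kind` argument selects the smoother; the default is Tukey's compound "3RS3R", while `kind = "3"` is a single running median of span 3:

```r
# A short series with two isolated spikes (made-up data)
y <- c(1, 2, 30, 4, 5, 6, -20, 8, 9, 10)

# Default compound smoother ("3RS3R")
ys <- smooth(y)
print(as.numeric(ys))

# A single running median of span 3 already removes isolated spikes,
# because a lone outlier is never the median of three neighbours
y3 <- smooth(y, kind = "3")
print(as.numeric(y3))
```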
Finally, I want to mention loess(), a function that fits a local polynomial regression. (loess() is one of the default smoothing methods behind stat_smooth() in the ggplot2 package.)
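A minimal sketch of a loess() fit on a noisy sine wave (seed, sample size and noise level are arbitrary assumptions). Like approxfun(), the fitted object is evaluated at new x values, here via predict():

```r
set.seed(42)  # arbitrary seed
x <- seq(0, 2 * pi, length.out = 100)
y <- sin(x) + rnorm(100, sd = 0.3)
df <- data.frame(x = x, y = y)

# Fit a local polynomial regression; span = 0.75 and degree = 2 are the defaults
fit <- loess(y ~ x, data = df)

# Evaluate the fitted curve near the peak, the middle and the trough of the sine wave
new_x <- data.frame(x = c(pi / 2, pi, 3 * pi / 2))
p <- predict(fit, newdata = new_x)
print(p)
```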
Trying the different interpolation and smoothing methods
I set up a little experiment to see how the different functions behave. To do this, I simulate some random data in the shape of a sine wave. Then I use each of these functions to interpolate or smooth the data.
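A minimal sketch of this kind of experiment follows. The seed, sample size, noise level and plotting layout are assumptions chosen for illustration, so the resulting plots will not exactly match the ones described below:

```r
set.seed(1)  # arbitrary seed; sample size and noise level are also assumptions
n <- 100
x <- sort(runif(n, 0, 4 * pi))
y <- sin(x) + rnorm(n, sd = 0.5)
grid <- seq(min(x), max(x), length.out = 200)

# Fitted values on `grid`, except smooth(), which returns one value per input point
fits <- list(
  "approx()"        = approx(x, y, xout = grid)$y,
  "spline()"        = spline(x, y, xout = grid)$y,
  "smooth()"        = as.numeric(smooth(y)),
  "smooth.spline()" = predict(smooth.spline(x, y), x = grid)$y,
  "loess(span=0.1)" = predict(loess(y ~ x, span = 0.1), newdata = data.frame(x = grid)),
  "loess(span=0.5)" = predict(loess(y ~ x, span = 0.5), newdata = data.frame(x = grid))
)

# One panel per method: raw data in grey, fitted curve in red
op <- par(mfrow = c(2, 3))
for (nm in names(fits)) {
  plot(x, y, col = "grey", main = nm)
  xs <- if (nm == "smooth()") x else grid
  lines(xs, fits[[nm]], col = "red", lwd = 2)
}
par(op)
```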
Results
On my generated data, the interpolation functions approx() and spline() give a rather ragged interpolation, since they pass through every noisy data point. The running median smoother smooth() doesn't do much better: there is simply too much variance in the data.
The smooth.spline() function does a great job of finding a smooth curve using its default settings.
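With its defaults, smooth.spline() chooses the amount of smoothing by generalized cross-validation (`cv = FALSE`). A rough sketch on simulated data (seed and noise level are arbitrary assumptions) shows the automatically chosen equivalent degrees of freedom, and how `df` can be fixed by hand instead:

```r
set.seed(7)  # arbitrary seed
x <- seq(0, 4 * pi, length.out = 100)
y <- sin(x) + rnorm(100, sd = 0.5)

# Default: smoothing parameter chosen by generalized cross-validation
fit_auto <- smooth.spline(x, y)
print(fit_auto$df)  # equivalent degrees of freedom chosen automatically

# The amount of smoothing can also be set directly through df
fit_stiff <- smooth.spline(x, y, df = 4)    # close to a low-order curve
fit_wiggly <- smooth.spline(x, y, df = 30)  # follows the noise much more closely
print(c(stiff = fit_stiff$df, wiggly = fit_wiggly$df))
```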
The last two plots illustrate loess(), the local regression estimator. Notice that loess() takes a tuning parameter, span, which sets the fraction of the data used in each local fit. The lower the value of span, the fewer points each local regression uses, and the more closely the curve follows the noise. Thus a value of 0.1 gives a much more ragged fit than a value of 0.5.
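One way to check the effect of span numerically, as a sketch on simulated data (seed and noise level are arbitrary assumptions): a smaller span yields a larger equivalent number of parameters (`enp`), and a rougher fitted curve. The roughness measure here, the sum of squared second differences of the fitted values, is a hypothetical helper chosen for illustration:

```r
set.seed(3)  # arbitrary seed
x <- seq(0, 4 * pi, length.out = 100)
y <- sin(x) + rnorm(100, sd = 0.5)

fit_01 <- loess(y ~ x, span = 0.1)
fit_05 <- loess(y ~ x, span = 0.5)

# Equivalent number of parameters: higher means a more flexible, less smooth fit
print(c(span_0.1 = fit_01$enp, span_0.5 = fit_05$enp))

# Hypothetical roughness measure: sum of squared second differences of the
# fitted values (x is equally spaced, so this tracks curvature)
roughness <- function(fit) sum(diff(fitted(fit), differences = 2)^2)
print(c(span_0.1 = roughness(fit_01), span_0.5 = roughness(fit_05)))
```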
The code
Here is the code:
Comments
For all those on the bleeding edge, you're going to get some (or most/all) of these for "free" in the next ggplot2 release with the forthcoming ggalt package: https://github.com/hrbrmstr/ggalt
Right now it has a 'geom_xspline()' but once I get some cycles the others will be added (or y'all can follow the idiom in geom_xspline() and submit a PR :-)
Posted by: Bob Rudis | September 23, 2015 at 11:38
What is the preferred approach to interpolating points in three-dimensional space in order to create a smooth estimated plane/wireframe out of a noisy data sample? Is there an elegant way to get rid of outliers in order to avoid local distortions in the interpolated plane shape and allow for smooth curvature in the result?
Posted by: Maxim | September 23, 2015 at 13:51
Good summary! I just want to mention that smoothing can have detrimental effects if you want to quantify (predict) some response value on the smoothed values, especially when using running means.
See our paper on that:
http://www.clinchem.org/content/61/2/379.abstract
Cheers,
Andrej
Posted by: D | September 23, 2015 at 23:18