by Srini Kumar, VP of Product Management and Data Science, LevaData; and Bob Horton, Senior Data Scientist, Microsoft
A rational function is defined as the ratio of two polynomials. The Padé approximant uses such a ratio to approximate functions:
$$
R(x)= \frac{\sum_{j=0}^m a_j x^j}{1+\sum_{k=1}^n b_k x^k}=\frac{a_0+a_1x+a_2x^2+\cdots+a_mx^m}{1+b_1 x+b_2x^2+\cdots+b_nx^n}
$$
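To make the notation concrete, here is a small helper (an illustrative sketch we've added; `pade_eval` is a made-up name, not part of the fitting method below) that evaluates \(R(x)\) for given coefficient vectors:

```r
# Evaluate a Pade-style rational function at x, given numerator
# coefficients a = c(a0, a1, ..., am) and denominator coefficients
# b = c(b1, ..., bn); the denominator's constant term is fixed at 1.
pade_eval <- function(x, a, b){
  num <- as.vector(outer(x, seq_along(a) - 1, `^`) %*% a)
  den <- as.vector(1 + outer(x, seq_along(b), `^`) %*% b)
  num / den
}

pade_eval(2, a=c(1, 2), b=c(3))  # (1 + 2*2)/(1 + 3*2) = 5/7
```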
Here we show a way to fit this type of function to a set of data points using the `lm` function in R.

We'll use a data-simulating function based on a ratio of polynomials to generate a set of `y` values for some random `x` values, add some noise, then see how well we can fit a curve to these noisy points.
```r
set.seed(123)
N <- 1000
x <- rnorm(N)
f <- function(x) 50*x^2/(1 + 4*x)  # data-simulating function
y <- f(x) + rnorm(N, sd=3)
point_data <- data.frame(x, y)

library("tidyverse")

ggplot(point_data, aes(x=x, y=y)) +
  geom_point() +
  ylim(-100, 100) +
  ggtitle("simulated data points")
```
Note that in the formula for the Padé approximant the numerator polynomial has a zeroth-order coefficient (\(a_0\)), while the denominator's constant term is fixed at \(1\). This makes it easy to rearrange the equation to express \(y\) as a linear combination of terms.
$$ y = \frac{a_0 + a_1 x + a_2 x^2}{1 + b_1 x + b_2 x^2} $$

$$ y (1 + b_1 x + b_2 x^2) = a_0 + a_1 x + a_2 x^2 $$

$$ y = a_0 + a_1 x + a_2 x^2 - b_1 x y - b_2 x^2 y $$
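As a quick sanity check (an illustration we've added, with arbitrary coefficients), a noise-free \(y\) computed from the ratio satisfies the rearranged equation exactly:

```r
# Verify the rearrangement at one point, using arbitrary coefficients
a0 <- 2; a1 <- -1; a2 <- 3; b1 <- 0.5; b2 <- -0.2
xx <- 1.3
yy <- (a0 + a1*xx + a2*xx^2)/(1 + b1*xx + b2*xx^2)
all.equal(yy, a0 + a1*xx + a2*xx^2 - b1*xx*yy - b2*xx^2*yy)  # TRUE
```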
Now we have a form that `lm` can work with. We just need to specify a set of inputs that are powers of `x` (as in a traditional polynomial fit), and a set of inputs that are `y` times powers of `x`. This may seem like a strange thing to do, because we are making a model where we would need to know the value of `y` in order to predict `y`. But the trick here is that we will not try to use the fitted model to predict anything; we will just take the coefficients out and rearrange them into a function. The `fit_pade` function below takes a data frame with `x` and `y` values, fits an `lm` model, and returns a function of `x` that uses the coefficients from the model to predict `y`:
```r
fit_pade <- function(point_data){
  fit <- lm(y ~ x + I(x^2) + I(y*x) + I(y*x^2), point_data)
  lm_coef <- as.list(coef(fit))
  names(lm_coef) <- c("a0", paste0(rep(c('a','b'), each=2), 1:2))
  with(lm_coef, function(x) (a0 + a1*x + a2*x^2)/(1 - b1*x - b2*x^2))
}
```
We'll use `ggplot2` to visualize the results. Since it drops points that fall outside the plotted y-range, it does not try to connect the fitted values on either side of the discontinuity, as long as the fitted y values near the asymptote extend beyond those limits.
```r
plot_fitted_function <- function(x_data, fitted_fun, title){
  x_data$y_hat <- fitted_fun(x_data$x)
  g <- ggplot(x_data, aes(x=x, y=y)) +
    geom_point() +
    ylim(-100, 100) +
    geom_line(aes(y=y_hat), col="red", size=1) +
    ggtitle(title)
  plot(g)
}
```
When we draw the fitted values over the input data points, we see that in this case the function describes the points quite well:
```r
pade_approx <- fit_pade(point_data)
plot_fitted_function(point_data, pade_approx, title="fitted function")
```
Here are some more examples, showing that this approach can work with some interesting patterns of points:
```r
function_list <- list(
  function(x) (100 - 50*x - 100*x^2)/(1 - 10*x - 10*x^2),
  function(x) (100 - 50*x - 100*x^2)/(1 - 10*x - 5*x^2),
  function(x) (100 - 50*x - 100*x^2)/(1 - 50*x - 5*x^2)
)

for (f in function_list){
  sim_data <- point_data %>% mutate(y=f(x) + rnorm(nrow(point_data), sd=3))
  plot_fitted_function(sim_data, fit_pade(sim_data), title=as.character(deparse(f))[2])
}
```
Here we have used linear regression by ordinary least squares (with `lm`) to fit distinctly nonlinear rational functions. For simplicity, these examples focus on equations of second order (or less) in both numerator and denominator, but the idea extends to higher orders. On a cautionary note, this approach seems to have numerical stability issues with some inputs. For example, if you take one of the data-simulating equations above and make selected coefficients much larger, you can create datasets that are fitted poorly by this method. And if the data-simulating function does not have the correct form (for example, if the zeroth-order term in the denominator is not 1), the fitted curves can be completely wrong. For practical purposes it might be preferable to use a nonlinear least squares approach (e.g., the `nls` function). Still, this approach works well in many examples, and it lets you fit some curves that cannot be represented by ordinary polynomials.
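For comparison, here is a sketch (our addition, not from the post above) of fitting the same second-order rational function directly with `nls`. Nonlinear least squares needs starting values, and convergence can be fragile when the denominator has a root within the range of the data; the values below are rough guesses near the coefficients of the first data-simulating function:

```r
# Fit the rational function directly by nonlinear least squares.
# Starting values are rough guesses; in practice, the coefficients
# recovered by fit_pade() would make good starting values.
nls_fit <- nls(y ~ (a0 + a1*x + a2*x^2)/(1 + b1*x + b2*x^2),
               data  = point_data,
               start = list(a0=0, a1=0, a2=40, b1=3, b2=0))
coef(nls_fit)  # compare to a2=50, b1=4 in the data-simulating function f
```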
Nice methodology. I'll just point out that this only works with single-valued functions. I've been playing with algorithms to fit conic sections, and they are quite a different kind of mess!
Posted by: Carl Witthoft | April 06, 2017 at 11:02
Agree that this is cool. If I'm not mistaken, this works nicely for getting the expected relationship but will mess up inference (because the assumptions about independence, Gaussianity, etc. no longer hold), by analogy with older methods of fitting hyperbolic functions as 1/y ~ 1/x ?
Posted by: Ben Bolker | April 07, 2017 at 14:46