Loren on the Art of MATLAB

Turn ideas into MATLAB


Natural Neighbor – A Superb Interpolation Method

Posted by Loren Shure

I'm happy to welcome Damian Sheehy as this week's guest blogger. Damian works on the development of geometry-related features at MathWorks. He will provide answers to two frequently asked questions; one on scattered data interpolation that he will cover in this blog and the other on Delaunay triangulation that he will cover in the next. Over to you, Damian...


An Email from Customer Support

I occasionally get an email from customer support with a title similar to this one: "griddata gives different results. . .". The support engineers are great; they really know how to choose a good subject line that will get a developer's attention and get a response back to the customer quickly. The subject line could equally well cite scatteredInterpolant, as it shares the same underlying code as griddata. Before I open the email I have a strong suspicion about the cause of the difference. If the first line opens with "A customer upgraded to MATLAB R20** and griddata gives different results than the previous . . .", then I'm fairly confident what the problem is. I look at the customer's dataset, perform a couple of computations, create a plot, and bingo! I have a canned response and recommendation at the ready, and I turn the question around in a matter of minutes.

Why griddata or scatteredInterpolant May Be Inconsistent

So why should griddata or scatteredInterpolant give different answers after upgrading MATLAB? What has MathWorks done to address this problem? Are there issues with scattered data interpolation that users should be aware of? Yes, there are some subtle behaviors associated with the Nearest Neighbor and Linear interpolation methods for scattered data interpolation. These problems present themselves in specific datasets and the effects may show up as numerical differences after a MATLAB upgrade. I will explain the origin of these problems and the options you have as a user to avoid them altogether. I will also highlight what MathWorks has done to address the problems.

First, let's take a look at the behavior and the data that triggers the problem. To demonstrate, I will choose a simple data set where we have four points at the corners of a square. Each sample point has a different value associated with it and our goal is to compute an interpolated value at some query point within the square. Here's a diagram:

Px = [0; 1; 1; 0];
Py = [0; 0; 1; 1];
V = [10; 1000; 50; 100];
plot(Px, Py, 'or')
hold on
text(Px+0.02, Py+0.02, {'P1 (10)', 'P2 (1000)', 'P3 (50)', 'P4 (100)'})
pq = [0.5 0.5];
plot(pq(:,1), pq(:,2), '*b')
hold off
axis equal

First, let's consider the Nearest Neighbor interpolation method. For any query point within the square, the interpolated value is the value associated with the nearest sample point. The figure shown above illustrates the configuration, with sample values in parentheses. We can see an ambiguity arises when the query point lies at the exact center of the square. There are four nearest neighbors, and by the definition of the method any of the four values is a valid solution. Ideally, we would like to get the same result no matter what computer MATLAB is running on and no matter what version.
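To see the tie concretely, here's a minimal sketch using the 'nearest' method of scatteredInterpolant, reusing Px, Py, and V from above; which corner value you get at the exact center depends on internal tie-breaking, so it may differ across platforms and releases.

F = scatteredInterpolant([Px Py], V, 'nearest');
nearestVq = F([0.5 0.5])   % any of 10, 1000, 50, or 100 is a valid result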

This type of problem can also arise with the Linear interpolation method. To perform linear interpolation, the scattered dataset is first triangulated using a Delaunay triangulation. The interpolated value at a query point is then derived from the values at the vertices of the triangle that encloses the point. But a Delaunay triangulation of this dataset is not unique; the illustration below shows two valid configurations.

subplot(1,2,1);
Px = [0; 1; 1; 0; 0; 1];
Py = [0; 0; 1; 1; 0; 1];
pq = [0.5 0.25];
plot(Px, Py, '-b')
hold on
plot(pq(:,1), pq(:,2),'or')
hold off
axis equal
subplot(1,2,2);
Px = [0; 0; 1; 1; 0; 1];
Py = [1; 0; 0; 1; 1; 0];
plot(Px, Py, '-b')
hold on
plot(pq(:,1), pq(:,2),'or')
hold off
axis equal

Example of Inconsistent Behavior in Linear Interpolation

The interpolated result is different in each scenario. The code to demonstrate this is given in the frame below; I have perturbed one data point to flip the diagonal of the triangulation and illustrate the effect.

P = [0 0; 1 0; 1 1; eps (1-eps)];
V = [10; 1000; 50; 100];
pq = [0.5 0.25];

F = scatteredInterpolant(P,V);
linearVq = F(pq)
linearVq =
        527.5

P = [0 0; 1 0; 1 1; -eps (1+eps)];
F.Points = P;
linearVq = F(pq)
linearVq =
        267.5

Why Natural Neighbor Interpolation is Superior

The interpolated value at the query point, linearVq, is sensitive to how the triangulation edge is created in the tiebreak case. This tiebreak is called a degeneracy, and the problem arises in Delaunay triangulations when we have four or more cocircular points in 2-D or five or more cospherical points in 3-D. Now observe the behavior when we choose the Natural Neighbor interpolation method. This method is also based on an underlying Delaunay triangulation, but it produces the same result in the presence of a diagonal edge swap. Here it is:

P = [0 0; 1 0; 1 1; eps (1-eps)];
F.Points = P;
F.Method = 'natural';
naturalVq = F(pq)
naturalVq =
        397.5

P = [0 0; 1 0; 1 1; -eps (1+eps)];
F.Points = P;
naturalVq = F(pq)
naturalVq =
        397.5

Natural Neighbor is also a smoother interpolating function, so it makes a lot of sense to favor Natural Neighbor over Linear. Well, that raises the question: shouldn't the Natural Neighbor method be the default? It would be our preferred choice of default, but this method was added to MATLAB long after the griddata Linear method was introduced. Changing a default has a significant impact on existing code, so the change would be more disruptive than the problem it addresses. The Natural Neighbor method is also more computationally expensive, so for large datasets Linear may be preferred for performance reasons.
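To get a feel for that trade-off, here is a rough timing sketch; the data is random and the absolute timings are machine-dependent, so treat the numbers only as an indication that 'natural' evaluation costs more than 'linear'.

rng default                          % reproducible random data
P = rand(1e5, 2);
V = rand(1e5, 1);
pq = rand(1e4, 2);
F = scatteredInterpolant(P, V, 'linear');
tic; Vq1 = F(pq); toc                % linear evaluation
F.Method = 'natural';
tic; Vq2 = F(pq); toc                % natural neighbor evaluation, typically slower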

The example we just reviewed highlights the nature of the problem and gives you a more stable alternative to avoid potential differences from scattered data interpolation after you upgrade MATLAB. For our part here in development, we have long recognized that these types of issues create problems for users, and we have adopted better underlying algorithms to address them. If you are a long-time user of MATLAB and the griddata function, you may recall even more annoying behavior in the past. Prior to R2009a, repeated calls to the function using the now-redundant {'QJ'} Qhull option could give different results on each call. That problem was resolved in R2009a, along with the introduction of Natural Neighbor interpolation for its stability and superior interpolation properties. Since then, improvements in the underlying triangulation algorithms have led to stable and consistent results across all platforms, first in 2-D and then in 3-D. Unfortunately, introducing these improvements meant a potential change in behavior on upgrade for specific datasets, but that's like having to break eggs to make a cake. Looking ahead, the behavior of Delaunay-triangulation-based functions should be much more stable. Of course, fundamental changes to the underlying algorithms may still trigger changes like the ones we've seen, though the scope of the problem has been reduced substantially.

You Tell Me!

I haven't received many support escalation questions related to Natural Neighbor interpolation since it shipped in R2009a. I often wonder whether that's because it is working well for users or because users aren't using it. The similarity in name to Nearest Neighbor interpolation may cause some to confuse the two methods. How about you: have you used Natural Neighbor interpolation, and how has it worked out? Let me know here.



Published with MATLAB® R2015a


16 Comments (Oldest to Newest)

Raphaël replied on : 1 of 16
Hi Loren, we use 3D natural neighbor interpolation extensively with one of our sensors and it works great! As you mentioned, it is computationally expensive and this step is often the bottleneck of our algorithms. It seems that natural neighbor interpolation is not multithreaded yet in 2015a, that would be a great improvement.
Damian Sheehy replied on : 2 of 16
Hi Raphaël, Thank you for sharing your experience with Natural Neighbor interpolation, this is very useful feedback! You're correct, the interpolation methods are not currently multithreaded. This appears to be a viable and certainly very worthwhile enhancement. Support for the Natural Neighbor method was prompted by a user request. User-driven feedback like this helps us prioritize the features you need most. Great suggestion!
Eric S replied on : 3 of 16
Damian, I'm just happy not to have to use Qhull any more :) I noticed that griddedInterpolant has some additional nice information about the speed/memory tradeoffs of the various algorithms, but this is missing for the scatteredInterpolant doc page - maybe that would be nice to add? Also a general question re function user interface design, what are your thoughts about not having defaults for parameters similar to interpolation method, so that users have to explicitly choose a method which prevents the problem you describe where you can't choose to improve/change the way it works due to backwards compatibility requirements?
Matthew Bondy replied on : 4 of 16
Is scatteredInterpolant suitable for searching for the closest node in a finite element mesh to a given location and reporting the nodal ID number? I tried both scatteredInterpolant with nearest and natural interpolation and the results were very different and completely unreasonable. It is a very large data set, just under 2 million nodes.
Damian Sheehy replied on : 5 of 16
Hi Eric, You’re right, the Method table for griddedInterpolant contains useful memory/performance information that the scatteredInterpolant table doesn’t have. That’s a good observation and good suggestion, we’ll align the scatteredInterpolant documentation to improve that. Requesting a user like you to choose an interpolation method is perfectly fine, because you understand the pros and cons of the various methods. But for some users, interpolating scattered data is something that just happened to crop up when working on a larger data analysis project. That user’s immediate interest is interpolating the data and moving on to the next step. The default helps to keep things moving without getting sidetracked into different subject matter. Exploring the various methods may then be something they do at the end when refining the overall workflow. Some users like to work like that; it’s gratifying to get something working and gratifying again to improve it. It’s great to hear from you again, Eric! Damian
Damian Sheehy replied on : 6 of 16
Hi Matthew, That’s a good question and it’s something that comes up from time to time. You could use scatteredInterpolant and the nearest method, not natural. The value assigned to each point would need to be the node ID. This approach involves recomputing the triangulation in order to perform the search for the nearest points. If you are working with a triangle or tetrahedral mesh, then I would recommend the triangulation class in MATLAB; it has nearestNeighbor and pointLocation methods that were added in R2014b for your exact use case! If your mesh is quadratic you will need to convert it to linear by removing the mid-side nodes from the connectivity matrix. For example, if you have quadratic triangles you can convert them to linear by removing columns 4-6, but that depends on the numbering convention. . . You do not need to strip the mid-side nodes from the points matrix. Give it a shot if you have R2014b or later. Damian
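A minimal sketch of that workflow with a toy linear triangle mesh; the node coordinates and connectivity here are made up for illustration.

nodes = [0 0; 1 0; 1 1; 0 1; 0.5 0.5];    % toy node coordinates
tri   = [1 2 5; 2 3 5; 3 4 5; 4 1 5];     % linear triangle connectivity
TR = triangulation(tri, nodes);
nodeID = nearestNeighbor(TR, [0.3 0.7])   % ID of the node closest to the query point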
Eric S replied on : 7 of 16
Likewise Damian! I understand what you mean regarding defaults being very useful when doing exploratory analysis/programming. Here's a blue-sky idea that has crossed my mind a number of times in order to achieve a balance between frictionless exploration and robustness in programs; having function arguments that have defaults when used interactively at the command prompt, but not when called from another function/script... What do you think? :)
Sean de Wolski replied on : 8 of 16
Eric, this makes it so you can't drag and drop command history into a script while expecting it to work.. It's a fairly common workflow: prototype at command line, when ready move to script.
Damian Sheehy replied on : 9 of 16
Hi Eric, I see where you're coming from. Your suggestion would work well in a GUI-based application that journals commands. The GUI could supply a default and journal that default in the command. Inputting the command directly would require an explicit choice. I've seen this approach used in some applications, your experience with these apps may be influencing your thoughts :-)
Eric S replied on : 10 of 16
Sean, I understand what you're saying but I'd argue that what you describe is exactly the point - with my thought proposal, when you drag a section of prototype code into a script/function, running it would error out wherever default parameters were used, forcing the programmer to then take a look at the functions in question and decide if the defaults are truly appropriate, and either way making an explicit choice in the 'production' code. Now maybe in an ideal world it would be a little more subtle such that when the code snippet is dragged into the editor the latter would auto-insert default parameters into the code and then M-lint would underline them to bring them to the programmer's attention. Anyway it's a fun thought experiment :)
Sean de Wolski replied on : 11 of 16
Eric, I think the problem is this doesn't scale well and could be confusing because of the lack of consistency. Here's the other extreme. At the command prompt: plot(x,y). Now I move this into a script and you want me to specify: Color, LineStyle, LineWidth, Marker, MarkerSize, MarkerEdgeColor, MarkerFaceColor, Clipping, AlignVertexCenters, XData, YData, ZData, UIContextMenu, BusyAction, BeingDeleted, Interruptible, CreateFcn, DeleteFcn, ButtonDownFcn, Type, Tag, UserData, Selected, SelectionHighlight, HitTest, PickableParts, DisplayName, Parent, Visible, HandleVisibility. These properties all have defaults :) Even for things like fmincon: most of the time I don't want to change the algorithm, the default is fine, and I'm getting good results. So why force me to specify it in order to reproduce them? So the argument is, well, with interpolation this could matter because the user doesn't necessarily want linear. But if this were the only place MATLAB exhibited this behavior, people would be very confused. It's hard enough to encourage users to read the documentation or error messages, and I wouldn't want it to be any more confusing. I think your Code Analyzer idea is the best one, but just have it on by default. For a function like this, where other defaults could commonly be wise, provide a harmless orange underline that you can ignore, turn off, or listen to religiously (like me!). % My $0.02, not the views of my employer
Eric S replied on : 12 of 16
Good point Sean, as usual. I was envisioning that this behavior would be opt-in on a per-parameter basis for the developers of the called function. For functions like PLOT whose main purpose is to provide defaults to pass to low-level implementation functions, it wouldn't make much sense to require the end user to specify all arguments.
Jan replied on : 13 of 16
Hi Loren, When I'm performing (natural neighbor) interpolation, I often have to perform the interpolation multiple times using the same data point locations and the same interpolation locations, but with different values at the data points. Is it possible to obtain the interpolation functions from MATLAB? What I do now to get this is a bit of a dirty hack:
- I set all of the values at the data points to zero, except data point i, which I set to one.
- Then I perform the interpolation and store it as P(:,i).
- Then I set all data points to zero except point i+1, and interpolate again.
- I continue until I have the functions for all data points.
- Now I can interpolate each field as A = P * V, where V are the values of the field I want to interpolate.
Obviously, I'm still computing the shape functions N times (once for each point) while in the scatteredInterpolant each of them is already known. It would be nice if I could split the interpolation provided by the MATLAB function into two steps: 1: get the basis functions; 2: apply the basis functions. Anyhow, thanks for the blog post, natural neighbor interpolation is an interesting topic. Greetings, Jan
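A minimal runnable sketch of the work-around Jan describes, with made-up data and query locations; B here plays the role of his basis matrix P.

rng default
X  = rand(20, 2);                        % data point locations (made up)
Xq = rand(50, 2);                        % query locations (made up)
n  = size(X, 1);
F  = scatteredInterpolant(X, zeros(n,1), 'natural');
B  = zeros(size(Xq,1), n);               % basis matrix, one column per data point
for i = 1:n
    v = zeros(n, 1);  v(i) = 1;          % unit value at data point i, zero elsewhere
    F.Values = v;
    B(:, i) = F(Xq);                     % i-th natural neighbor basis function
end
V  = rand(n, 1);                         % any field defined at the data points
Vq = B * V;                              % interpolate with a single matrix product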
Damian Sheehy replied on : 14 of 16
Hi Jan, We currently do not expose the natural neighbor coordinates for natural neighbor interpolation, but we do expose the Barycentric coordinates used for linear interpolation. The delaunayTriangulation feature provides the Barycentric computation, and the following example from the documentation illustrates usage: https://www.mathworks.com/help/matlab/math/interpolation-using-a-specific-delaunay-triangulation.html
If we were to expose NN coordinates, you would use them in a similar way to Barycentric coordinates, as each of these systems represents a partition of unity. When we compute Barycentric coordinates for a query point in 2-D, we get three weights corresponding to the three vertices of the triangle that encloses the query. Each weight is applied to the corresponding value at each vertex, and the sum is the interpolated value. Natural neighbor coordinates are used in a similar way, except in 2-D you have approximately six vertices in the neighborhood of the query point and six corresponding weights. The product and sum give the interpolated value as before. One nuance with NN coordinates is that the number of coordinates for each query point can vary, so a cell array would be required to hold the data. Also, you would have to loop over the query points and the NN coordinates to do the interpolation. In contrast, we can do the Barycentric interpolation more elegantly with a dot product.
We debated exposing this feature, as we weren't sure if it would help users like you or be a feature that never gets used. If you feel this is a feature you would use, please let me know and I will file an enhancement request and revisit this case. As you probably know, you can mitigate the overhead in your scenario by replacing the values in the interpolant. The documentation provides an example that shows how to do that when interpolating complex data; here's the link: https://www.mathworks.com/help/matlab/math/interpolating-scattered-data.html#btdzf16
Basically, this is done in two passes; the first pass interpolates the real component and the second pass interpolates the imaginary component. This avoids computing the triangulation twice, but you still have the overhead of computing the Natural Neighbor coordinates twice, and I understand that's what you would like to avoid. Thanks for taking the time to report your experience and highlight your use case! Damian
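A minimal sketch of the Barycentric approach Damian describes, using the four corner points from the post above; pointLocation returns the enclosing triangle and the three weights.

P  = [0 0; 1 0; 1 1; 0 1];
V  = [10; 1000; 50; 100];
DT = delaunayTriangulation(P);
pq = [0.5 0.25];
[ti, bc] = pointLocation(DT, pq);             % enclosing triangle ID and Barycentric weights
Vq = dot(bc, V(DT.ConnectivityList(ti,:)))    % weighted sum of the three vertex values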
Jan replied on : 15 of 16
Hi Damian, Thank you for your quick response; also, the links you provided are very nice and good reading material, thank you for those. For my use case the NN coordinates (or, as I would call them, basis functions) are more useful than the interpolation results, since, as you highlighted, they have the partition-of-unity property and thus can be applied in many more ways than just interpolation. However, I feel my case is not representative of the MATLAB user base, so I think your decision not to provide the feature was correct. If I (or somebody in my lab) needed such a feature, we would just program the NN algorithm in a MEX file. Maybe an intermediate solution would be to allow giving multiple value fields to scatteredInterpolant, for example one column for each field. This way the method is similar to the complex example you gave, but would allow more than two fields. P.S. I would not store the NN coordinates as a cell, but rather as a sparse matrix, but that's a side issue.
Damian Sheehy replied on : 16 of 16
Hi Jan, You're right, NN coordinates have useful applications outside of data interpolation, and I've needed them myself for prototyping one of those applications. Developing an efficient implementation of the algorithm in a MEX file is a bit tedious and a digression from what users want to focus on. It's tricky to come up with an elegant way to handle multiple value fields that makes sense to the many users who deal with a single set of values. It's the balance of compromises like that that makes these design cases challenging. . . I like your idea of representing the NNCs using a sparse matrix! Damian