
I have two correlated Nx3 datasets (one is xyz points, the other is the normal vectors for those points). I have a point in my first dataset and now I want to find the matching row in the second dataset. What's the best way to do this? I was thinking of printing out the row number, but I'm not sure exactly what the code to do that would be.


1 Answer

Given that you have a point from your first dataset that is of size 1 x 3, there are two possible ways you can do this.

Method #1 - Using knnsearch

The easiest way would be to use knnsearch from the Statistics Toolbox.

knnsearch stands for K-Nearest Neighbour search. Given an input query point, knnsearch finds the k closest points in your dataset to that query point. In your case, k=1. The distance metric is the Euclidean distance by default, but seeing as your points are in 3D Cartesian space, I don't see this being a problem.

Therefore, assuming your xyz points are stored in X and the query point is in y, just do this:

IDX = knnsearch(X, y);

The above defaults to k=1. If you'd like more than 1 point returned, you'd do this:

IDX = knnsearch(X, y, 'K', n);

n is the number of points you want returned, i.e. the n closest points to the query y. IDX contains the indices of the points in X that are closest to y. I would also like to point out that X must be arranged such that each row is a point and each column is a variable (coordinate).

Therefore, the closest point using IDX would be:

closest_point = X(IDX,:);
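
As a quick end-to-end sketch, here's what this could look like with some made-up placeholder data (the values in X and y are purely for illustration, not from your actual datasets):

% Toy xyz dataset - one point per row (placeholder values)
X = [0 0 0;
     1 1 1;
     2 2 2;
     5 5 5];

% Query point from your first dataset (placeholder values)
y = [1.1 0.9 1.0];

IDX = knnsearch(X, y);       % index of the closest point; 2 for this toy data
closest_point = X(IDX,:);    % the matching row itself

% If you just want to print the row number, as you mentioned:
fprintf('Matching row: %d\n', IDX);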

Method #2 - Using bsxfun

If you don't have the Statistics Toolbox, you can very easily achieve the same thing using bsxfun. Bear in mind that the code I will write is only for returning the closest point, or k=1:

dists = sqrt(sum(bsxfun(@minus, X, y).^2, 2));
[~,IDX] = min(dists);

The bsxfun call first determines the component-wise difference between y and every point in X. Once we have this, we square each component, sum the components along each row, then take the square root. This computes the Euclidean distance between y and every point in X, giving us N distances where N is the total number of points in the dataset. We then find the minimum distance with min and take its second output to get the index of the closest matching point, i.e. the point in X that is closest to y.
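
To make those intermediate steps concrete, here's a small sketch on made-up data (X and y are placeholders, not your real arrays):

% Placeholder data: 3 points and a query point
X = [0 0 0;
     3 4 0;
     1 1 1];
y = [0 0 0];

diffs = bsxfun(@minus, X, y);      % component-wise differences, N x 3
dists = sqrt(sum(diffs.^2, 2));    % Euclidean distances: [0; 5; sqrt(3)]
[~, IDX] = min(dists);             % IDX is 1 - the first row matches exactly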

If you'd like to extend this to more than one point, you'd sort the distances in ascending order, then retrieve the n points with the smallest distances. Remember, smaller Euclidean distances mean that the points are more similar, which is why we sort in ascending order. Something like this:

dists = sqrt(sum(bsxfun(@minus, X, y).^2, 2));
[~,ind] = sort(dists);
IDX = ind(1:n);

This is just a small step up from what we had before. Instead of using min, you use sort and take its second output to determine the locations of the smallest distances. We then index into ind to get the n closest indices and finally index into X to get the actual points.

You would again do the same thing to retrieve the actual points that are closest:

closest_point = X(IDX,:);
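
As before, here's a short sketch of the k > 1 case with placeholder data and an arbitrary choice of n:

% Placeholder data, query point and number of neighbours to keep
X = [0 0 0;
     1 0 0;
     2 0 0;
     10 10 10];
y = [0.2 0 0];
n = 2;

dists = sqrt(sum(bsxfun(@minus, X, y).^2, 2));
[~, ind] = sort(dists);            % ascending: smallest distances first
IDX = ind(1:n);                    % indices of the n closest points
closest_point = X(IDX,:);          % rows 1 and 2 of X for this toy data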

Some Bonus Material

If you'd like to read more about how K-Nearest Neighbour works, I encourage you to read my post about it here:

Finding K-nearest neighbors and its implementation

Good luck!
