
This is a somewhat unusual question.

There are several objects (from, say, 3 up to maybe 20) that are characterized by 2D coordinates. Then there is a set of new points with 2D coordinates (the same number as objects), which the objects must occupy.

And there has to be an algorithm (which I don't have) that decides which object goes where, so that the total movement is minimal, or at least as close to minimal as reasonably possible.

As far as I know, there's nothing in the Unity engine (which I'm using) to process that data, so I thought of using an outside Python/pandas script. It seemed trivial at first: calculate the distances from each object to each of the new points, fill a matrix with the results, then find the minimums across rows/columns, aaand... suddenly, it's not that simple.

Because one object may be closest to several new points, and one new point can be closest to several old object positions. Which means that when the minimums are calculated, some points are not taken at all, and more than one object tries to occupy a single point. Or, the other way around, some objects are not closest to any point, while some have multiple choices.
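The collision described above is easy to reproduce. A minimal sketch with made-up coordinates, where naive "pick the nearest point per object" sends both objects to the same target:

```python
import numpy as np
from scipy.spatial.distance import cdist

# Two objects and two new points; both objects are nearest to point 0.
objs = np.array([[0.0, 0.0], [1.0, 0.0]])
new_points = np.array([[0.4, 0.0], [10.0, 0.0]])

d = cdist(objs, new_points)   # 2x2 distance matrix
nearest = d.argmin(axis=1)    # nearest new point for each object
print(nearest)                # [0 0] -> both objects pick point 0, point 1 is orphaned
```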

For example, here is one iteration that has to be processed:

             0          1          2   ...         9          10         11
18     7.305648   8.026363   5.710035  ...  17.495671  23.595815  17.204084
86    25.021771  29.697289  21.702557  ...  26.896933  40.934203  38.877976
201   34.131078  25.249168  25.399301  ...  45.216286  42.278724  24.386605
284   11.190541  20.365760  22.147781  ...   2.798607  18.822864  26.294566
351   34.563530  28.478652  21.878108  ...  45.000284  48.300513  33.026741
393   33.080871  33.124070  22.894585  ...  39.383093  50.000740  41.647761
586   22.026731  15.960542   9.826235  ...  32.721902  35.796940  21.730920
657   22.539747  26.626457  34.888454  ...  18.464566   6.979699  24.862667
664   18.628092  18.880408   8.922881  ...  26.259204  35.471028  27.797514
1067  21.810369  12.634722  17.729529  ...  32.417406  27.951390  10.075525
1113  16.393673  24.246701  20.782216  ...  14.116179  29.544246  32.455269
1196  17.042118  13.581524  23.814823  ...  23.102679  12.043330   7.017820

Closest object for each new point:

0       18
1       18
2       18
3       18
4     1067
5      351
6      393
7       86
8     1113
9      284
10     657
11    1196

Closest new point for each object:

18       2
86       7
201      4
284      9
351      5
393      6
586      5
657     10
664      2
1067     4
1113     8
1196    11

As can be seen, for points 0, 1, 2, 3 the closest object is #18, while, say, object #201 isn't the closest one to any of the new points. Point 2 is the closest for object #18, but at the same time it's the closest for object #664... while point 1 is not the closest for any of the objects.

I've been thinking for a workday now about finding a solution that is better than "pick them at random!", but haven't come up with anything at all.

Any thoughts would be appreciated, pretty please.

Paul Alex
    _"... suddenly, it's not that simple"_ True! I'm no expert, but I'm thinking this may be an example of the [Assignment Problem](https://en.wikipedia.org/wiki/Assignment_problem). Given your distances matrix, it seems [`scipy.optimize.linear_sum_assignment`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.linear_sum_assignment.html) looks promising. – rickhg12hs Jul 07 '23 at 02:59
  • It pretty much **IS** the Assignment Problem! Currently, I'm using this https://stackoverflow.com/questions/53294115/get-row-and-column-with-minimum-value-in-entire-pandas-dataframe and dropping a column/row pair each iteration, like: `obj2pos = {}; while 0 < distances.shape[0]: ot, pt = distances.stack().idxmin(); obj2pos[ot] = pt; distances.drop(columns=[pt], index=[ot], inplace=True)` ...but I have a _very bad feeling_ it's not optimal and does not generate the best possible result... – Paul Alex Jul 07 '23 at 03:47
    If you want to stick to C#, you can probably look into [linearassignment](https://github.com/fuglede/linearassignment) (basically a C#-level port of scipy.optimize); for performance, going either native or using the Unity Job System might be nicer though – derHugo Jul 07 '23 at 09:36
  • I don't understand your data. 0 through 11 are object indices. Where are your 2D coordinates? – Reinderien Jul 07 '23 at 12:20
    I'm guessing that what you've shown are _not_ 2D coordinates, but already-calculated Euclidean distances. Which, fine. Unless I'm missing something, this is quite straightforward. – Reinderien Jul 07 '23 at 12:26
  • Yes, the dataframe contains already-calculated distances from every object's current position to every newly generated point. As I said, "calculate distances between everything, fill matrix..." and **then** it gets complicated. – Paul Alex Jul 09 '23 at 01:03
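The "very bad feeling" in the comments above is justified: repeatedly taking the global minimum and dropping its row/column is not guaranteed to be optimal. A contrived 2×2 matrix (not the question's data) where that greedy scheme loses badly to `linear_sum_assignment`:

```python
import pandas as pd
from scipy.optimize import linear_sum_assignment

# Cost matrix where grabbing the global minimum (18 -> 0, cost 1) forces 86 -> 1 (cost 100).
distances = pd.DataFrame([[1.0, 2.0],
                          [2.0, 100.0]], index=[18, 86], columns=[0, 1])

# Greedy scheme from the comment: take the global minimum, drop its row/column, repeat.
greedy = {}
work = distances.copy()
while work.shape[0] > 0:
    ot, pt = work.stack().idxmin()
    greedy[ot] = pt
    work = work.drop(columns=[pt], index=[ot])
greedy_total = sum(distances.loc[o, p] for o, p in greedy.items())

# Optimal assignment for comparison.
rows, cols = linear_sum_assignment(distances.values)
optimal_total = distances.values[rows, cols].sum()

print(greedy_total, optimal_total)  # 101.0 vs 4.0
```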

2 Answers


Here's one way you could do it using SciPy's linear_sum_assignment.

import numpy as np
from scipy.spatial.distance import cdist
from scipy.optimize import linear_sum_assignment

# initialize random number generator (needed for random objects and new points)
rng = np.random.default_rng(seed=42)

# create 10 random x,y pairs for objs
objs = rng.integers(low=-10, high=10, size=(10,2), endpoint=True)

# create 10 random x,y pairs for new_points
new_points = rng.integers(low=-10, high=10, size=(10,2), endpoint=True)

# create distance matrix for every obj to every new_point
distances = cdist(objs, new_points)

# get assignments to minimize the sum of distances for all points
obj_idx, new_point_idx = linear_sum_assignment(distances)

# print the total distance for best assignment
print(distances[obj_idx, new_point_idx].sum())
# 43.421262032835045
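To act on the result (e.g. back in Unity), the two index arrays can be zipped into an object → point mapping. A small sketch with made-up coordinates (not the seeded data above):

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.optimize import linear_sum_assignment

objs = np.array([[0.0, 0.0], [5.0, 5.0], [9.0, 0.0]])
new_points = np.array([[8.0, 1.0], [1.0, 1.0], [4.0, 6.0]])

distances = cdist(objs, new_points)
obj_idx, new_point_idx = linear_sum_assignment(distances)

# Pair each object with its assigned target point.
assignment = {int(o): int(p) for o, p in zip(obj_idx, new_point_idx)}
print(assignment)  # {0: 1, 1: 2, 2: 0}
```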
rickhg12hs
    Thank you very much! It does exactly what I needed! Never thought I'll need scipy for some unity scripting ;) – Paul Alex Jul 09 '23 at 01:13

Lots of different methods; this is not exhaustive:

from io import StringIO

import numpy as np
import pandas as pd
from scipy.optimize import linprog, linear_sum_assignment
import scipy.sparse


def method_greedy(df: pd.DataFrame) -> dict[int, int]:
    """
    Method A:
    keep a set of available targets. greedy-pop them until everyone is assigned.
    not terribly efficient, but it works and it's easy to understand.
    """
    available = list(df.index)
    assigned = {}
    for col_name, full_col in df.items():
        col = full_col.loc[available]
        y = col.argmin()
        assigned[int(col_name)] = col.index[y]
        del available[y]
    return assigned


def method_lp(df: pd.DataFrame) -> np.ndarray:
    """
    Method B: heavy-ish.
    Run a mixed-integer linear program to find the true minimum sum.
    """

    m, n = df.shape
    result = linprog(
        c=df.values.ravel(), integrality=True, bounds=(0, 1),
        # Column sums must equal 1
        A_eq=scipy.sparse.hstack((scipy.sparse.eye(n),)*m),
        b_eq=np.ones(n),
        # Row sums must be at most 1 (kronecker)
        A_ub=scipy.sparse.kron(scipy.sparse.eye(m), np.ones(n)),
        b_ub=np.ones(m),
    )
    assert result.success
    return np.broadcast_to(
        df.index.values[np.newaxis, :],
        (n, m),
    )[(result.x.reshape(df.shape) > 0.5).T]


def method_lsa(df: pd.DataFrame) -> pd.Int64Index:
    """
    Method C (courtesy @rickhg12hs): linear sum assignment
    """
    dest, source = linear_sum_assignment(cost_matrix=df.values)
    dest = dest[source.argsort()]
    return df.index[dest]


def test() -> None:
    with StringIO('''              0          1          2          9          10         11
18     7.305648   8.026363   5.710035  17.495671  23.595815  17.204084
86    25.021771  29.697289  21.702557  26.896933  40.934203  38.877976
201   34.131078  25.249168  25.399301  45.216286  42.278724  24.386605
284   11.190541  20.365760  22.147781   2.798607  18.822864  26.294566
351   34.563530  28.478652  21.878108  45.000284  48.300513  33.026741
393   33.080871  33.124070  22.894585  39.383093  50.000740  41.647761
586   22.026731  15.960542   9.826235  32.721902  35.796940  21.730920
657   22.539747  26.626457  34.888454  18.464566   6.979699  24.862667
664   18.628092  18.880408   8.922881  26.259204  35.471028  27.797514
1067  21.810369  12.634722  17.729529  32.417406  27.951390  10.075525
1113  16.393673  24.246701  20.782216  14.116179  29.544246  32.455269
1196  17.042118  13.581524  23.814823  23.102679  12.043330   7.017820
'''
    ) as f:
        df = pd.read_fwf(f)

    df.columns.name = 'source_point'
    df = df.set_index('Unnamed: 0')
    df.index.name = 'dest_point'

    print(method_greedy(df))
    print(method_lp(df))
    print(method_lsa(df))


if __name__ == '__main__':
    test()
{0: 18, 1: 1067, 2: 664, 9: 284, 10: 657, 11: 1196}
[  18 1067  664  284  657 1196]
Int64Index([18, 1067, 664, 284, 657, 1196], dtype='int64', name='dest_point')
Reinderien
  • Thank you! I already came up with a similar algorithm to what you called `method_greedy`. But since I don't know scipy (other than "_it exists, it's used for data analysis and complex mathematics_"), producing the other two on my own is beyond my current level of pythonic knowledge. `method_lsa` is similar to the answer @rickhg12hs provided above. I've run some tests; the method that uses linear_sum_assignment seems to be fastest. Thank you very much! – Paul Alex Jul 09 '23 at 01:11