
I have a Series of Shapely Polygons (>1000) that do not overlap. I want to introduce a new Shapely Point and quickly find which polygon it falls in. I have a for loop for this, but I am looking for a faster method.

import pandas as pd
from shapely.geometry import Point
from shapely.geometry.polygon import Polygon

test_points = pd.Series([[(0, 1), (1, 1), (1, 0)], [(0, 0), (0, 1), (1, 0)]])

# a DataFrame containing my polygons and an id
polygonized_points = pd.DataFrame({"polygons": test_points.map(Polygon), "id": range(len(test_points))})

# a new point
new_point = Point(0.4, 0.3)

# allocation of the point to a polygon (this is the part I want to be faster)
for idx, row in polygonized_points.iterrows():
    if row.polygons.contains(new_point):
        new_point_in_id = row.id  # imagine this were a df with an empty column for the id variable

I'm fairly sure I missed something to speed this up, because I don't think the for loop scales well. Thank you for your help! Best

Racooneer

1 Answer


The for loop is not the problem here: the point-in-polygon test is what is slow. Optimizing your code means reducing the number of point-in-polygon tests, which is typically done with a spatial index. This answer from GIS Stack Exchange, https://gis.stackexchange.com/a/119935, does a good job of listing a number of possible spatial index strategies. With only a few thousand iterations, the for loop itself is of no concern. A very good option is an R-tree, for example from this Python package: https://toblerity.org/rtree/. The R-tree searches efficiently after being fitted with the bounding boxes of your polygons. Afterwards you perform the costly point-in-polygon test only for the polygons whose bounding box contains the point (usually 2-5).
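A minimal sketch of this approach, using the rtree package linked above (the variable names are illustrative, not taken from the question): the index is built once from the polygons' bounding boxes, and the exact `contains` test runs only on the bounding-box candidates.

```python
from rtree import index
from shapely.geometry import Point
from shapely.geometry.polygon import Polygon

# the two triangles from the question
polygons = [
    Polygon([(0, 1), (1, 1), (1, 0)]),
    Polygon([(0, 0), (0, 1), (1, 0)]),
]

# build the R-tree once from the polygons' bounding boxes
idx = index.Index()
for i, poly in enumerate(polygons):
    idx.insert(i, poly.bounds)  # bounds is (minx, miny, maxx, maxy)

new_point = Point(0.4, 0.3)

# bounding-box prefilter: only these candidates need the exact test
candidates = idx.intersection((new_point.x, new_point.y, new_point.x, new_point.y))

# costly point-in-polygon test only on the candidates
match = next((i for i in candidates if polygons[i].contains(new_point)), None)
```

Note that both triangles here share the bounding box (0, 0, 1, 1), so the index returns both as candidates and the exact `contains` test then picks the right one; with many small, spread-out polygons the prefilter would discard almost all of them.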

oekopez
  • That's an interesting strategy, which will take me some time to test... I already tried splitting the area into subsections, which accelerated the process significantly, and I thought about using the minimum Euclidean distance instead of the polygon test as a kind of fuzzy method... but everything seems to end in bulky code that could potentially produce edge cases I cannot predict. Thanks a lot! – Racooneer Aug 06 '20 at 13:46
  • after intense testing, I have to say that rtree is quite expensive in computation time, which makes it slower than simple splitting methods on the initial map – Racooneer Aug 29 '20 at 14:52