My answer below reformats the data slightly, storing latitude/longitude pairs as tuples in some of the columns. I hope that's OK; if not, reply and we'll work up the answer from there.
1. Simulate some plausible patient locations
```python
from haversine import haversine
import pandas as pd
import numpy as np
import random

# number of simulated patients
NUM_PATIENTS = 10

# corners of a box to draw patient locations from; zipping the two
# linspaces in step 3 places the patients on the diagonal between the corners
COORD_LL = (51.578099, -0.232274)
COORD_UR = (52.797460, 1.556070)
GRID_LAT = np.linspace(COORD_LL[0], COORD_UR[0], num=NUM_PATIENTS)
GRID_LONG = np.linspace(COORD_LL[1], COORD_UR[1], num=NUM_PATIENTS)
GRID_LAT = np.around(GRID_LAT, decimals=4)
GRID_LONG = np.around(GRID_LONG, decimals=4)
```
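Zipping `GRID_LAT` and `GRID_LONG` together (as step 3 does) puts every simulated patient on the straight diagonal between the two corners rather than on a full grid. If you'd rather have patients scattered across the whole box, here is a minimal sketch. `sample_patient_locations` is a hypothetical helper, and it samples uniformly in degrees, which is not equal-area on the sphere but is probably fine for a region this small.

```python
import random


def sample_patient_locations(n, lower_left, upper_right, seed=None):
    """Draw n (lat, long) tuples inside a lat/long bounding box.

    Hypothetical helper: latitude and longitude are drawn independently and
    uniformly in degrees (not equal-area), rounded to 4 decimal places to
    match the precision used for GRID_LAT / GRID_LONG.
    """
    rng = random.Random(seed)  # private generator, so a given seed repeats exactly
    return [
        (
            round(rng.uniform(lower_left[0], upper_right[0]), 4),
            round(rng.uniform(lower_left[1], upper_right[1]), 4),
        )
        for _ in range(n)
    ]
```

In step 3 you could then use `patient_latlongs = sample_patient_locations(NUM_PATIENTS, COORD_LL, COORD_UR, seed=42)` in place of the `zip`.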
2. Store locations of hospitals
Next we store the names and lat/long coordinates of the hospitals. In your example these would be your 24 UK hospitals; again, I've just made some up here.
```python
# names and locations of hospitals
HOSPITALS = dict(
    ADDBR=(52.1779, 0.1464),
    BURY=(52.2412, 0.6939),
    PBOROUGH=(52.5548, -0.2613),
    NWICH=(52.6091, 1.2609),
    LONDON=(51.5553, -0.0993),
)
```
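With 24 real hospitals you probably won't want to type the dict by hand. If they already sit in a DataFrame, a sketch like the following builds the same name to (lat, long) mapping. The column names `name`, `lat` and `long` are assumptions, so adjust them to your data.

```python
import pandas as pd

# Hypothetical hospital table; the column names are assumed, not from your data.
hospitals_df = pd.DataFrame(
    {
        "name": ["ADDBR", "BURY", "PBOROUGH"],
        "lat": [52.1779, 52.2412, 52.5548],
        "long": [0.1464, 0.6939, -0.2613],
    }
)

# pair each name with a (lat, long) tuple built from the two coordinate columns
hospitals = dict(zip(hospitals_df["name"], zip(hospitals_df["lat"], hospitals_df["long"])))
print(hospitals)
```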
3. Assemble dataframe
Now we use the data above to create some lists of data and a dataframe.
```python
# Simulate patient data: generate lists + dataframe
patient_latlongs = list(zip(GRID_LAT.tolist(), GRID_LONG.tolist()))
patient_id = list(range(len(patient_latlongs)))
# hospital each patient attended, picked at random
# (call random.seed(...) first if you want repeatable runs)
unit_visited = [random.choice(list(HOSPITALS)) for _ in patient_latlongs]
unit_visited_latlong = [HOSPITALS[name] for name in unit_visited]
df = pd.DataFrame.from_dict(
    {
        "patient_ID": patient_id,
        "patient_latlong": patient_latlongs,
        "unit_visited": unit_visited,
        "unit_visited_latlong": unit_visited_latlong,
    }
)
print(df)
```
Output:
```
patient_ID patient_latlong unit_visited unit_visited_latlong
0 0 (51.5781, -0.2323) LONDON (51.5553, -0.0993)
1 1 (51.7136, -0.0336) PBOROUGH (52.5548, -0.2613)
2 2 (51.8491, 0.1651) ADDBR (52.1779, 0.1464)
3 3 (51.9846, 0.3638) LONDON (51.5553, -0.0993)
4 4 (52.12, 0.5625) BURY (52.2412, 0.6939)
5 5 (52.2555, 0.7613) PBOROUGH (52.5548, -0.2613)
6 6 (52.391, 0.96) LONDON (51.5553, -0.0993)
7 7 (52.5265, 1.1587) LONDON (51.5553, -0.0993)
8 8 (52.662, 1.3574) LONDON (51.5553, -0.0993)
9 9 (52.7975, 1.5561) BURY (52.2412, 0.6939)
```
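If your real data stores latitude and longitude in separate columns, converting to the tuple layout used here (and back again) takes one line each way. A sketch, assuming hypothetical column names `patient_lat` and `patient_long`:

```python
import pandas as pd

# Hypothetical input with separate coordinate columns
raw = pd.DataFrame({"patient_lat": [51.5781, 51.7136], "patient_long": [-0.2323, -0.0336]})

# separate columns -> one column of (lat, long) tuples
raw["patient_latlong"] = list(zip(raw["patient_lat"], raw["patient_long"]))

# tuple column -> separate columns again, keeping the original index
split = pd.DataFrame(raw["patient_latlong"].tolist(), columns=["lat", "long"], index=raw.index)
print(raw)
print(split)
```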
4. Find nearest hospital
Next we write a function to find the nearest hospital. It's probably a bit bespoke to our example. As mentioned in the comments above, haversine is a very convenient library for this. The function returns the key (name) of the nearest hospital, which we can then look up in our `HOSPITALS` dict.
```python
def find_nearest_hospital(latlong: tuple, hospitals: dict) -> str:
    """
    Calculate the nearest hospital and return its name. Assumes a dict
    storing hospital names as keys and lat/long tuples as values.

    latlong: tuple
        input (lat, long) tuple
    hospitals: dict
        key / value pairs with hospital names as keys and (lat, long) tuples as values
    returns:
        name of the closest hospital
    """
    # great-circle distance (km, haversine's default unit) to every hospital
    distances = {hospital: haversine(latlong, location) for hospital, location in hospitals.items()}
    # key with the smallest distance
    return min(distances, key=distances.get)
```
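For reference, the `haversine` package computes the great-circle distance between two points on a sphere. If you'd rather avoid the dependency, a minimal pure-Python version of the same formula is below. It is not the package's own code, and the mean Earth radius of 6371.0088 km is an assumption (as far as I know it matches the package's kilometre default).

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def haversine_km(point1, point2):
    """Great-circle distance in km between two (lat, long) tuples given in degrees."""
    lat1, lon1 = map(math.radians, point1)
    lat2, lon2 = map(math.radians, point2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # haversine formula: a = sin^2(dlat/2) + cos(lat1) * cos(lat2) * sin^2(dlon/2)
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # central angle is 2 * asin(sqrt(a)); multiply by the radius for distance
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

You could pass `haversine_km` in place of `haversine` inside `find_nearest_hospital` without changing anything else.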
5. Compute distances
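Next we assign new columns to the dataframe holding the nearest hospital for each patient. As far as I can tell, `transform` with a plain function behaves much like `apply` here: the function is still called once per row, so don't expect a real speed-up. A NumPy-vectorized calculation would be faster on large datasets, but this may well be fast enough for your use case. If not, write back and we can take a look.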
```python
df = df.assign(
    # name of the nearest hospital to each patient
    closest_unit=df["patient_latlong"].transform(lambda x: find_nearest_hospital(x, HOSPITALS)),
    # look up that hospital's latitude and longitude by name
    closest_unit_lat=lambda x: x["closest_unit"].map({k: v[0] for k, v in HOSPITALS.items()}),
    closest_unit_long=lambda x: x["closest_unit"].map({k: v[1] for k, v in HOSPITALS.items()}),
    # did the patient attend their nearest hospital?
    visited_closest=lambda x: x["closest_unit"] == x["unit_visited"],
)
print(df)
```
Output (`unit_visited` is picked at random, so it differs from the run shown in step 3):
```
patient_ID patient_latlong unit_visited unit_visited_latlong \
0 0 (51.5781, -0.2323) PBOROUGH (52.5548, -0.2613)
1 1 (51.7136, -0.0336) BURY (52.2412, 0.6939)
2 2 (51.8491, 0.1651) ADDBR (52.1779, 0.1464)
3 3 (51.9846, 0.3638) NWICH (52.6091, 1.2609)
4 4 (52.12, 0.5625) ADDBR (52.1779, 0.1464)
5 5 (52.2555, 0.7613) ADDBR (52.1779, 0.1464)
6 6 (52.391, 0.96) NWICH (52.6091, 1.2609)
7 7 (52.5265, 1.1587) LONDON (51.5553, -0.0993)
8 8 (52.662, 1.3574) LONDON (51.5553, -0.0993)
9 9 (52.7975, 1.5561) PBOROUGH (52.5548, -0.2613)
closest_unit closest_unit_lat closest_unit_long visited_closest
0 LONDON 51.5553 -0.0993 False
1 LONDON 51.5553 -0.0993 False
2 ADDBR 52.1779 0.1464 True
3 ADDBR 52.1779 0.1464 False
4 BURY 52.2412 0.6939 False
5 BURY 52.2412 0.6939 False
6 BURY 52.2412 0.6939 False
7 NWICH 52.6091 1.2609 False
8 NWICH 52.6091 1.2609 False
9 NWICH 52.6091 1.2609 False
```
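Step 5 mentions that a NumPy-vectorized function would be the ideal for speed. Here is a minimal sketch of that idea: it computes the haversine distance from every patient to every hospital in one broadcast array operation and takes the `argmin` of each row. `nearest_hospitals` is a hypothetical name, and the 6371.0088 km Earth radius is an assumption intended to match the `haversine` package's kilometre default.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def nearest_hospitals(patient_latlongs, hospitals):
    """Return the nearest hospital name for each patient (vectorized sketch).

    patient_latlongs: sequence of (lat, long) tuples in degrees
    hospitals: dict mapping name -> (lat, long) tuple in degrees
    """
    names = list(hospitals)
    p = np.radians(np.asarray(patient_latlongs, dtype=float))  # shape (n_patients, 2)
    h = np.radians(np.asarray([hospitals[k] for k in names], dtype=float))  # shape (n_hospitals, 2)
    # broadcast (n_patients, 1) against (1, n_hospitals) -> (n_patients, n_hospitals)
    p_lat, p_lon = p[:, None, 0], p[:, None, 1]
    h_lat, h_lon = h[None, :, 0], h[None, :, 1]
    a = np.sin((p_lat - h_lat) / 2) ** 2 + np.cos(p_lat) * np.cos(h_lat) * np.sin((p_lon - h_lon) / 2) ** 2
    dist_km = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))
    # index of the smallest distance along the hospital axis, mapped back to names
    return [names[i] for i in dist_km.argmin(axis=1)]
```

Usage would be something like `df["closest_unit"] = nearest_hospitals(df["patient_latlong"].tolist(), HOSPITALS)`. The intermediate array holds one float per patient/hospital pair, so with 24 hospitals it stays small even for a lot of patients.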
On a related/unrelated note, I'm writing this from a neurological ward.