I would like to match names and emails that are spelled differently across various datasets in our BigQuery data warehouse.
I've done some cursory research on fuzzy matching in BigQuery:
- This question is 6 years old.
- I do not think I can use metaphones as the majority of user names are transliterated from non-English names.
- I found this exciting article series, but it seems far too advanced, and perhaps over the top, for this modest use case (reconciling user identities). It also depends on non-SQL scripts.
So back to SO: how can I fuzzy match emails and/or names? Here are some fictitious examples:
- benjamin.mckinnen ~ benjamin.mackinnen
- muhammad.aktar ~ mohammad.akhtar
- georgerobert.martin ~ george.martin
- voravut.pisombhon ~ worawuth.pisombong
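
To give a sense of how far apart these pairs are, here is a minimal Python sketch of the plain Levenshtein distance. It runs outside BigQuery and is for illustration only. The function name `levenshtein` is my own and is not a BigQuery built-in.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance.

    Counts the minimum number of single-character insertions,
    deletions, and substitutions needed to turn `a` into `b`.
    """
    # prev[j] holds the distance between the processed prefix of `a`
    # and the first j characters of `b`.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca -> cb (or keep it)
            ))
        prev = curr
    return prev[-1]


# The fictitious pairs from the list above.
pairs = [
    ("benjamin.mckinnen", "benjamin.mackinnen"),
    ("muhammad.aktar", "mohammad.akhtar"),
    ("georgerobert.martin", "george.martin"),
    ("voravut.pisombhon", "worawuth.pisombong"),
]

for left, right in pairs:
    print(f"{left} ~ {right}: {levenshtein(left, right)}")
```

The first pair is a single insertion apart, while the third needs six deletions, so a fixed distance threshold alone would probably not cover all of these cases.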
Can Levenshtein distance be calculated in pure SQL? Are there alternatives?