I have been trying to solve this issue for a while, but I can't seem to come up with the right solution.
Basically, I am parsing a few PDFs, and depending on the source of each PDF, the terminology used is different. For example, source A1 writes 'Batman' as 'The Batman'. Source B2 writes it as 'bat man'.
So what I tried to do is create a dictionary:
voc_dict = {'Batman': 'Batman',
            'the Batman': 'Batman',
            'bat man': 'Batman'}
Assume this dictionary extends to other superhero names.
So, I am trying to standardize the following 2D list:
super_list = [['among the heros with daddy issues, the bat man shines'], ['Bat man protects the city with everything he gots']]
You get the picture.
Apologies for the formatting and the silly example. I couldn't come up with a more relatable one, and this is my first time using the mobile app.
Thanks, guys.
What I did is the following: loop through the list and, for each item, loop through the dictionary.
for i in super_list:
    for key, value in voc_dict.items():
        i.replace(voc_dict[key], voc_dict[value])
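
For comparison, here is a minimal sketch of one way the standardization could be done. It is not the only approach. It assumes that matching should be case-insensitive (so 'Bat man' and 'bat man' both count), that every inner list contains only strings, and that variants should match on word boundaries. The names `pattern`, `canonical`, `standardize`, and `standardized` are made up for illustration. Two things it works around: Python strings are immutable, so `str.replace` returns a new string instead of changing the original, and a list has no `replace` method, so the inner strings have to be rebuilt one by one.

```python
import re

# Variant spellings mapped to the canonical name (taken from the question).
voc_dict = {
    'Batman': 'Batman',
    'the Batman': 'Batman',
    'bat man': 'Batman',
}

super_list = [
    ['among the heros with daddy issues, the bat man shines'],
    ['Bat man protects the city with everything he gots'],
]

# One regex with all variants, longest first so that a longer variant
# such as 'the Batman' is tried before the shorter 'Batman'.
variants = sorted(voc_dict, key=len, reverse=True)
pattern = re.compile(
    r'\b(?:' + '|'.join(re.escape(v) for v in variants) + r')\b',
    re.IGNORECASE,  # assumption: case differences should be ignored
)

# Lower-cased lookup table so a match like 'Bat man' finds its entry.
canonical = {key.lower(): value for key, value in voc_dict.items()}

def standardize(text):
    """Return a copy of text with every known variant replaced."""
    return pattern.sub(lambda m: canonical[m.group(0).lower()], text)

# Build a new 2D list; the original strings are left untouched.
standardized = [[standardize(s) for s in row] for row in super_list]
print(standardized)
```

With the sample data above, both 'bat man' and 'Bat man' become 'Batman'. If case-sensitive matching is wanted instead, the `re.IGNORECASE` flag and the lower-cased lookup can be dropped.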