I have always had trouble with loops, so I am asking here. I have two dataframes: one very large and one much smaller. Sample versions are below.
Dataframe 1
ID Value
1 apples
1 apples
1 bananas
1 grapes
1 mangoes
1 oranges
1 grapes
1 apples
1 grapes
2 apples
2 apples
2 passionfruits
2 bananas
2 apples
2 apples
2 passionfruits
2 grapes
2 mangoes
2 apples
3 apples
3 bananas
3 oranges
3 apples
3 grapes
3 grapes
3 passionfruits
3 passionfruits
3 oranges
4 apples
4 oranges
4 mangoes
4 bananas
4 grapes
4 grapes
4 grapes
4 apples
4 oranges
4 grapes
4 mangoes
4 mangoes
4 apples
4 oranges
5 passionfruits
5 apples
5 oranges
5 oranges
5 mangoes
5 grapes
5 apples
5 bananas
Dataframe 2
Value
apples
apples
bananas
grapes
mangoes
mangoes
grapes
apples
apples
The different IDs in dataframe 1 are treated as sets. Dataframe 2 in its entirety will be an approximate or exact match to one of those sets. I know there is plenty of code for filtering dataframe 1 against all of dataframe 2 at once, but that is not what I need. I need to filter sequentially, value by value, with a condition attached: whether the previous value also matches.
So in this example, nothing happens with the first value because all IDs contain 'apples'. The second value = 'apples', given that the previous value = 'apples', filters out IDs 3, 4 and 5 because they don't contain 'apples' occurring twice in a row. Then we search the filtered dataframe 1 for the third value, and so on. It stops only when one ID set remains in dataframe 1, so in this case after the 3rd iteration. The result should be:
Dataframe 1
ID Value
1 apples
1 apples
1 bananas
1 grapes
1 mangoes
1 oranges
1 grapes
1 apples
1 grapes
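
To make the rule concrete, here is a rough sketch of how I read it, written in plain Python rather than against any particular dataframe library. It assumes that "the previous value matches" means the pair (previous value, current value) occurs on adjacent rows somewhere within an ID, and that the very first value only needs to occur at all. The name `filter_sequentially` and the dict layout are made up for illustration. With the sample data it ends on ID 1 after three steps; if no ID satisfies a step, it simply returns an empty list.

```python
def filter_sequentially(groups, pattern):
    """Narrow down IDs one pattern value at a time.

    groups  : dict mapping ID -> list of values in row order (Dataframe 1)
    pattern : list of values in row order (Dataframe 2)
    Returns (remaining_ids, steps_used).
    """
    # Assumption: a step matches an ID if (previous, current) appears
    # as two adjacent rows anywhere inside that ID.
    pairs = {gid: set(zip(vals, vals[1:])) for gid, vals in groups.items()}
    members = {gid: set(vals) for gid, vals in groups.items()}

    remaining = list(groups)
    prev = None
    steps = 0
    for cur in pattern:
        if len(remaining) <= 1:  # stop once a single ID (or none) is left
            break
        steps += 1
        if prev is None:
            # First value: the ID only has to contain it somewhere.
            remaining = [g for g in remaining if cur in members[g]]
        else:
            remaining = [g for g in remaining if (prev, cur) in pairs[g]]
        prev = cur
    return remaining, steps


# Sample data from the question. If the data lives in pandas, a dict like
# this could be built with df1.groupby("ID")["Value"].apply(list).to_dict().
groups = {
    1: ["apples", "apples", "bananas", "grapes", "mangoes", "oranges",
        "grapes", "apples", "grapes"],
    2: ["apples", "apples", "passionfruits", "bananas", "apples", "apples",
        "passionfruits", "grapes", "mangoes", "apples"],
    3: ["apples", "bananas", "oranges", "apples", "grapes", "grapes",
        "passionfruits", "passionfruits", "oranges"],
    4: ["apples", "oranges", "mangoes", "bananas", "grapes", "grapes",
        "grapes", "apples", "oranges", "grapes", "mangoes", "mangoes",
        "apples", "oranges"],
    5: ["passionfruits", "apples", "oranges", "oranges", "mangoes",
        "grapes", "apples", "bananas"],
}
pattern = ["apples", "apples", "bananas", "grapes", "mangoes", "mangoes",
           "grapes", "apples", "apples"]

remaining, steps = filter_sequentially(groups, pattern)
print(remaining, steps)
```

Precomputing the adjacent pairs once per ID keeps each step to a set lookup, which should matter when dataframe 1 is very large. The final rows can then be selected by keeping only the rows whose ID is in `remaining`.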