
Suppose my dictionary contains more than 100 elements, and one or two of them have values that differ from the rest; most values are the same (12 in the example below). How can I remove those few elements?

Diction = {1:12,2:12,3:23,4:12,5:12,6:12,7:12,8:2}

I want to end up with this dictionary:

Diction = {1:12,2:12,4:12,5:12,6:12,7:12}
Z. Zhang
  • What defines “different than other values”? That it doesn’t end in `:12`? – Chase McDougall Nov 18 '22 at 02:56
  • The value 12 is the same for most elements, and two elements are different; I want to remove those two. It is a Python dictionary. – Z. Zhang Nov 18 '22 at 02:59
  • Does this answer your question? [Removing entries from a dictionary based on values](https://stackoverflow.com/questions/15158599/removing-entries-from-a-dictionary-based-on-values) – Chase McDougall Nov 18 '22 at 03:02
  • I do not think so. That example removes elements with a fixed, known value, but my example has no fixed value; I want to remove elements with outlier values. For example, most elements (more than 100) have the value 2345 and one or two have the value 1223; these minority elements should be removed. – Z. Zhang Nov 18 '22 at 03:05

2 Answers

d = {1:12,2:12,3:23,4:12,5:12,6:12,7:12,8:2}
new_d = {}

unique_values = []
unique_count = []
most_occurence = 0

# Find unique values
for k, v in d.items():
    if v not in unique_values:
        unique_values.append(v)

# Count how many times a given value occurs in the dictionary
def count(dictionary, unique_value):
    total = 0
    for k, v in dictionary.items():
        if v == unique_value:
            total += 1

    return total

for value in unique_values:
    occurrences = count(d, value)
    unique_count.append( (value, occurrences) )

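# Find which value has the most occurrences
# (compare counts with counts, and remember the value that goes with the best count)
highest_count = 0
for value, occurrences in unique_count:
    if occurrences > highest_count:
        highest_count = occurrences
        most_occurence = value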

# Create new dict with keys of most occurred value
for k, v in d.items():
    if v == most_occurence:
        new_d[k] = v

print(new_d)

Nothing fancy, but straight to the point. There are likely many ways to optimize this.

Output: {1: 12, 2: 12, 4: 12, 5: 12, 6: 12, 7: 12}
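
As one possible optimization, here is a minimal sketch using `collections.Counter` from the standard library, which counts all values in a single pass. The function name `keep_most_common` is made up for illustration, and the sketch assumes the values are hashable and that "most common" means the single most frequent value (ties are broken by whichever value `Counter` lists first).

```python
from collections import Counter

def keep_most_common(d):
    """Return a new dict holding only the entries whose value is the most frequent one."""
    if not d:
        return {}
    # most_common(1) returns a one-element list: [(value, count)]
    most_common_value, _ = Counter(d.values()).most_common(1)[0]
    return {k: v for k, v in d.items() if v == most_common_value}

d = {1: 12, 2: 12, 3: 23, 4: 12, 5: 12, 6: 12, 7: 12, 8: 2}
print(keep_most_common(d))  # {1: 12, 2: 12, 4: 12, 5: 12, 6: 12, 7: 12}
```

The original dictionary is left untouched; the comprehension builds a new one.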
Niko

This may be a bit slow because of the looping (especially as the dictionary gets very large), and it requires numpy, but it works:

import numpy as np

Diction = {1:12,2:12,3:23,4:12,5:12,6:12,7:12,8:2}

dict_list = []
for x in Diction:
    dict_list.append(Diction[x])
    
dict_array = np.array(dict_list)
unique, counts = np.unique(dict_array, return_counts=True)
most_common = unique[np.argmax(counts)]

new_Diction = {}
for x in Diction:
    if Diction[x] == most_common:
        new_Diction[x] = most_common
        
print(new_Diction)

Output

{1: 12, 2: 12, 4: 12, 5: 12, 6: 12, 7: 12}
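
Both answers keep only the single most frequent value. If more than one value can be legitimate and only the rare ones should go, a sketch along the following lines might fit better. The function name `drop_rare_values` and the `min_count` cutoff are assumptions for illustration, not something from the question; the right cutoff depends on the data.

```python
from collections import Counter

def drop_rare_values(d, min_count=3):
    """Return a new dict without entries whose value occurs fewer than min_count times."""
    counts = Counter(d.values())  # value -> number of keys that hold it
    return {k: v for k, v in d.items() if counts[v] >= min_count}

d = {1: 12, 2: 12, 3: 23, 4: 12, 5: 7, 6: 7, 7: 7, 8: 2}
print(drop_rare_values(d))  # {1: 12, 2: 12, 4: 12, 5: 7, 6: 7, 7: 7}
```

Here 12 and 7 each occur three times and are kept, while 23 and 2 occur once and are dropped.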