0

I have a list of lists that looks like:

[['chr1', '3088', '1', 744, 'L1MCc_dup1']
['chr1', '3089', '1', 744, 'L1MCc_dup1']
['chr1', '3090', '1', 744, 'L1MCc_dup1']
['chr1', '15037', '1', 96, 'MER63B']
['chr1', '15038', '1', 96, 'MER63B']
['chr1', '15039', '1', 96, 'MER63B']
['chr1', '15040', '1', 96, 'MER63B']
['chr1', '19465', '1', 418, 'MLT2B4_dup1']
['chr1', '19466', '1', 418, 'MLT2B4_dup1']
['chr1', '19467', '1', 418, 'MLT2B4_dup1']]

I need to make the equivalent of a sumifs function in python (as the file is too big for excel) to sum the contents of column 3 based on the identifier in column 5 (output can be some version of L1MCc_dup1 is 3, MER63B is 4 and MLT2B4_dup1 is 3).

Any advice/help to make this function?

Gareth Latty
  • 86,389
  • 17
  • 178
  • 183
user1456501
  • 11
  • 1
  • 1
  • 2
    http://www.whathaveyoutried.com - SO is not a place to get someone else to write your code for you. What have you tried, why doesn't it work? – Gareth Latty Jun 14 '12 at 14:56

2 Answers2

1

Use a dictionary:

d = {}
for row in my_list:
    key = row[4]
    value = int(row[2])
    d[key] = d.get(key, 0) + value

After this loop, d will map the key values in the last column to the desired sums.

You could also use collections.defaultdict instead of a normal dictionary.

Sven Marnach
  • 574,206
  • 118
  • 941
  • 841
1
>>> d =[['chr1', '3088', '1', 744, 'L1MCc_dup1'],
['chr1', '3089', '1', 744, 'L1MCc_dup1'],
['chr1', '3090', '1', 744, 'L1MCc_dup1'],
['chr1', '15037', '1', 96, 'MER63B'],
['chr1', '15038', '1', 96, 'MER63B'],
['chr1', '15039', '1', 96, 'MER63B'],
['chr1', '15040', '1', 96, 'MER63B'],
['chr1', '19465', '1', 418, 'MLT2B4_dup1'],
['chr1', '19466', '1', 418, 'MLT2B4_dup1'],
['chr1', '19467', '1', 418, 'MLT2B4_dup1']]
>>> sum(map(lambda x: x[3], filter(lambda x: x[4] == 'MLT2B4_dup1', d)))
1254

Sum of all column 4 values (I assume you meant that because it was the only int column), where the last column equals to 'MLT2B4_dup1'. You can change that to any other condition of course.

poke
  • 369,085
  • 72
  • 557
  • 602