
I am trying to extract values between two underscores. For that I have written this code:

import re

patient_ids = []
for file in files:
    print(file)
    patient_id = re.findall("_(.*?)_", file)
    patient_ids.append(patient_id)

print(patient_ids)

Output:

PT_112_NIM 26-04-2017_merged.csv
PT_114_NIM_merged.csv
PT_115_NIM_merged.csv
PT_116_NIM_merged.csv
PT_117_NIM_merged.csv
PT_118_NIM_merged.csv
PT_119_NIM_merged.csv
[['112'], ['114'], ['115'], ['116'], ['117'], ['118'], ['119'], ['120'], ['121'], ['122'], ['123'], ['124'], ['125'], ['126'], ['127'], ['128'], ['129'], ['130'], ['131'], ['132'], ['133'], ['134'], ['135'], ['136'], ['137'], ['138'], ['139'], ['140'], ['141'], ['142'], ['143'], ['144'], ['145'], ['146'], ['147'], ['150'], ['151'], ['152'], ['153'], ['154'], ['155'], ['156'], ['157'], ['158'], ['159'], ['160'], ['161'], ['162'], ['163'], ['165']]

So the extracted values are in this form: ['121']. I want them in this form: 121, i.e., just the number between the two underscores.

What change should I make to my code?

Debbie

4 Answers


An easy way is to use extend instead of append. findall returns a list, so append adds that whole list as a single element, whereas extend adds its items to patient_ids directly:

patient_ids = []
for file in files:
    print(file)
    patient_ids.extend(re.findall("_(.*?)_", file))

print(patient_ids) 
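
To make the difference concrete, here is a minimal sketch comparing the two calls on plain lists. The sample values and the names matches_per_file, appended and extended are assumptions made up for illustration; they are not from the question's code.

```python
# One findall result per file, shaped like the question's output (sample values).
matches_per_file = [["112"], ["114"], ["115"]]

appended = []
extended = []
for matches in matches_per_file:
    appended.append(matches)  # adds the whole list as one element
    extended.extend(matches)  # adds each item of the list

print(appended)  # [['112'], ['114'], ['115']]
print(extended)  # ['112', '114', '115']
```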
miike3459

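Just replace the last line of your for loop with:

patient_ids.extend(int(pid) for pid in patient_id)

extend flattens your results, and int(pid) converts each matched string to an int. The conversion has to be applied per item because patient_id is a list, not a single string.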
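
As a self-contained sketch of the whole loop with the conversion applied per match: the files list below is an assumption, filled with three file names copied from the question's output, and the name pid is only illustrative.

```python
import re

# Sample file names taken from the question's output (assumed input).
files = [
    "PT_112_NIM 26-04-2017_merged.csv",
    "PT_114_NIM_merged.csv",
    "PT_115_NIM_merged.csv",
]

patient_ids = []
for file in files:
    # findall returns a list of strings; convert each one and add it directly.
    patient_ids.extend(int(pid) for pid in re.findall(r"_(.*?)_", file))

print(patient_ids)  # [112, 114, 115]
```

Note that int() raises a ValueError if a match is not purely numeric, so this only suits file names where the first underscore-delimited field is a number.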

Matina G

You need to flatten your results, e.g. like this:

 flat_list = [item for sublist in patient_ids for item in sublist]
 print(flat_list)
 # => ['112', '114', '115', '116', '117', '118', '119', '120', '121', '122', '123', '124', '125', '126', '127', '128', '129', '130', '131', '132', '133', '134', '135', '136', '137', '138', '139', '140', '141', '142', '143', '144', '145', '146', '147', '150', '151', '152', '153', '154', '155', '156', '157', '158', '159', '160', '161', '162', '163', '165']
mrzasa

You have a list of findall results (it seems there is only ever one result per file). You can either just convert the strings to integers, or convert them and also flatten the result:

patient_ids= [['112'], ['114','4711'], ['115'], ['116'], ['117'], ['118'], ['119']]
#                       ^^^^^ ^^^^^^  modified to have 2 ids for demo-purposes


# if you want to keep the nesting
numms = [list(map(int, m)) for m in patient_ids]

# converted and flattened
numms2 = [int(x) for m in patient_ids for x in m]


print(numms) 

print(numms2) 

Output:

# this keeps the findall results together in inner lists
[[112], [114, 4711], [115], [116], [117], [118], [119]]

# this flattens all results
[112, 114, 4711, 115, 116, 117, 118, 119]
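
If each file really only has one id, a different approach (not the one used above) is to skip findall and take just the first match with re.search, so no inner list is created in the first place. This is a rough sketch: the files list and the stricter digits-only pattern are assumptions for illustration.

```python
import re

# Sample file names taken from the question's output (assumed input).
files = ["PT_112_NIM 26-04-2017_merged.csv", "PT_114_NIM_merged.csv"]

patient_ids = []
for file in files:
    # First run of digits enclosed by underscores; None if there is none.
    match = re.search(r"_(\d+)_", file)
    if match:  # skip names that do not contain an id
        patient_ids.append(int(match.group(1)))

print(patient_ids)  # [112, 114]
```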


Patrick Artner