I have a data set of the format key1=value1, key2=value2, key3=value3... where each key-value pair is separated from the others by a ", ".

However, some values are long strings that contain ", " as part of the value.

How can I correctly parse this data and convert it to CSV?

I've tried using csv.reader, but it doesn't work: it splits on every comma, including the ones inside values.

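 import csv
 from io import StringIO

 # `row` is one raw input line, optionally wrapped in parentheses
 data = row.lstrip('(').rstrip(')\n')
 reader = csv.reader(StringIO(data))
 for row2 in reader:
     my_dict = {}
     for d in row2:
         # Fails when a value contains ", ": that fragment has no '='
         my_dict[d.split('=')[0].lstrip()] = d.split('=', 1)[1]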
Jonathan Allen Grant
  • I think a few sample rows would be useful. – it's-yer-boy-chet Sep 20 '19 at 20:24
  • Can the value contain a `=`? If so, how can you tell if `key1=1, key2=2` sets `key1` to `1` and `key2` to `2`, or if it sets `key1` to `1, key2=2`? – chepner Sep 20 '19 at 20:34
  • Possible duplicate of [How to split comma-separated key-value pairs with quoted commas](https://stackoverflow.com/questions/27575708/how-to-split-comma-separated-key-value-pairs-with-quoted-commas) – Dishin H Goyani Oct 07 '19 at 05:44
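
The ambiguity raised in the comments can be made concrete. The sketch below assumes that a pair boundary is any ", " immediately followed by a word and "="; that rule and the helper name `split_pairs` are illustrative assumptions, not something the question specifies. Under this rule, a value can never contain text that looks like ", key=".

```python
import re

def split_pairs(line):
    # Hypothetical helper: split on ", " only when a "word=" token follows.
    return re.split(r", (?=\w+=)", line)

# Unambiguous input: two pairs.
print(split_pairs("key1=1, key2=2"))
# If "see a, b=c for details" was meant as ONE value, this rule misreads it
# as two pairs, because ", b=" looks like the start of a new key.
print(split_pairs("note=see a, b=c for details"))
```

If values really can contain ", key=" sequences, no splitting rule based on the text alone can recover the original pairs; the data would need quoting or escaping at the source.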

2 Answers

You can use re.findall with itertools.groupby:

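import re, itertools as it
def get_vals(d):
   # Tokenize into "key=" markers and value fragments, then group consecutive tokens by kind
   r = [(a, list(b)) for a, b in it.groupby(re.findall(r'\w+=|[^\s,]+', d), key=lambda x: x[-1] == '=')]
   # Groups alternate between a key and its value fragments; rejoin the fragments with ", "
   return {r[i][-1][0][:-1]: ', '.join(r[i+1][-1]) for i in range(0, len(r), 2)}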

tests = ['key1=value1, key2=value2, key3=value3', 'key1=va, lue1, key2=valu, e2, test, key3=value3']
print(list(map(get_vals, tests)))

Output:

[{'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}, 
{'key1': 'va, lue1', 'key2': 'valu, e2, test', 'key3': 'value3'}]
Ajax1234

Using @Ajax1234's sample, re.split() and lookahead:

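import re
# Named `s` rather than `str` to avoid shadowing the built-in
s = "key1=value1, key2=value2, key3=value3, key1=va, lue1, key2=valu, e2, test, key3=value3"
# Split on ", " only when it is followed by a "key=" token
print(re.split(r", (?=[^ ]+=)", s))
# ['key1=value1', 'key2=value2', 'key3=value3', 'key1=va, lue1', 'key2=valu, e2, test', 'key3=value3']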
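
To get from the split pairs to the CSV the question asks for, something like the following may work. It is a minimal sketch, assuming each input line is one record and that a new pair starts at ", " followed by non-space characters and "="; the helper names `parse_line` and `to_csv` are made up for illustration, and stripping any surrounding parentheses is left out.

```python
import csv
import io
import re

def parse_line(line):
    # Assumption: a new pair starts at ", " followed by non-space chars and "=".
    pairs = re.split(r", (?=[^ ]+=)", line.strip())
    # partition() splits on the first "=" only, so any later "=" stays in the value.
    return {key: value for key, _, value in (p.partition("=") for p in pairs)}

def to_csv(lines):
    rows = [parse_line(line) for line in lines]
    # Collect column names in first-seen order across all rows.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)  # fields containing commas are quoted by the csv module
    return out.getvalue()

sample = [
    "key1=value1, key2=value2, key3=value3",
    "key1=va, lue1, key2=valu, e2, test, key3=value3",
]
print(to_csv(sample))
```

Letting `csv.DictWriter` do the writing means values such as `va, lue1` are quoted in the output, so the resulting file can be read back with `csv.reader` without the original splitting problem.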
James Brown