Split a single excel cell content into different cells using Python or Excel vba

Question

I have a data in a format as shown below which can be genarated using the Python code given below

df = pd.DataFrame({'Person_id':[1,2,3,4],
'Values':['father:1.Yes 2.No 3.Do not Know','Mother:1.Yes 2.No 3.Do not 
 Know','sons:1.Yes 2.No 3.Do not Know','daughter:1.Yes 2.No 3.Do not Know'],
   'Ethnicity':['dffather','dfmother','dfson','dfdaughter']})

What I would like to do is split the 'Values' cell content into 3 separate rows/ n seperate rows based on the number of values available in the cell. In this case, we have 3 options (1.Yes, 2.No and 3. Do not Know). I don't wish to retain the text like father, mother, son etc. I only wish to have the options

How can I get my output to be like as shown below

Please note that the options and values might differ in real time. What I have shown is a sample and there is no pattern that exists in terms of answer options.

Cool. A task. Your task? What did you do to solve it? Where is your specific problem with your solution? Please show us your [mce]. — Patrick Artner, Jun 08 '19 at 16:30
I did some Pandas work and I got the data back into VBA as a Variant array (ready to be pasted into cells). https://exceldevelopmentplatform.blogspot.com/2018/06/python-pandas-vba-return-pandas-pivot.html That's my blog and I use the Python gateway class pattern. `toList()` is key to converting a Python data structure to be something consumable by VBA — S Meaden, Jun 08 '19 at 18:52
Both the answers provided below is working perfectly fine. I am marking Python approach as it's easy to understand for a beginner like me. Nonetheless, Excel VBA is also a valid and working solution. Upvoted both the answers — The Great, Jun 09 '19 at 05:09

anky · Accepted Answer · 2019-06-08T16:34:28.263

1

One way can be (pandas):

s=df.Values.str.findall('(\D+)').str[1:]
m=(df.reindex(df.index.repeat(s.str.len()))
   .assign(new_Value=np.concatenate(s.values)).drop('Values',1))
m.new_Value=m.groupby('Person_id').cumcount().add(1).astype(str)+m.new_Value
print(m)

   Person_id   Ethnicity      new_Value
0          1    dffather         1.Yes 
0          1    dffather          2.No 
0          1    dffather  3.Do not Know
1          2    dfmother         1.Yes 
1          2    dfmother          2.No 
1          2    dfmother  3.Do not Know
2          3       dfson         1.Yes 
2          3       dfson          2.No 
2          3       dfson  3.Do not Know
3          4  dfdaughter         1.Yes 
3          4  dfdaughter          2.No 
3          4  dfdaughter  3.Do not Know

edited Jun 08 '19 at 16:34

answered Jun 08 '19 at 16:29

anky

74,114
11
41
70

1

Hello @anky_91 - I did try your answer. It works fine. However to get the output as expected, I added the below two lines as I don't wish to repeat the values in Ethnicity and Person_id columns. Though I did this, felt it is a lengthy/round about way to do. Here is my additional piece of code to make it NA for repeating values "m['Ethnicity']=m[('Ethnicity')].mask(~m['new_Value'].str.startswith('1.')) m['Person_id']=m[('Person_id')].mask(~m['new_Value'].str.startswith('1.'))". – The Great Jun 09 '19 at 05:05
1

@AVLES i guess you then need `m.loc[m.duplicated(['Person_id','Ethnicity']),['Person_id','Ethnicity']]=''` and then print m to check. :) – anky Jun 09 '19 at 05:08
In the above example, if I would like to order inputs based on their source value, can you let me know how can it be done? Ex: 1:Yes, 2: No, 777:Not Applicable, 999: Do not Know. As you can see instead of 3 and 4 it is 777 and 999. I would like to retain these values as it is – The Great Jun 11 '19 at 04:05
@AVLES umm.. i think that would require a different(complex) approach. Since this question is answered, i would suggest you to post a fresh question. – anky Jun 11 '19 at 04:26
Thank you @anky_91. Here is the link https://stackoverflow.com/questions/56536954/split-rows-of-dataframe-and-store-them-as-individual-rows-in-the-same-dataframe – The Great Jun 11 '19 at 05:20
@AVLES thanks i will try solving this once i an free at work – anky Jun 11 '19 at 05:26

score 1 · Answer 2 · answered Jun 08 '19 at 16:29

From a VBA perspective, this is one way you could do it:

Option Explicit

Sub splitVals()

Dim ws As Worksheet: Set ws = ActiveWorkbook.Sheets("Sheet Name")
Dim arrData As Variant, arrValues() As String
Dim arrTmp() As String: ReDim arrTmp(1 To 2, 1 To 1)
Dim arrFinal() As String

Dim lrow As Long: lrow = ws.Cells(Rows.Count, 1).End(xlUp).Row
Dim R As Long, C As Long, X As Long, Z As Long

arrData = ws.Range("A1:C" & lrow)

For R = LBound(arrData) + 1 To UBound(arrData)
    arrData(R, 2) = Replace("df" & arrData(R, 2), arrData(R, 3) & ":", "")
    arrValues = Split(arrData(R, 2), ".")
    For X = LBound(arrValues) To UBound(arrValues)
        If X + 1 = UBound(arrValues) Then
            arrValues(X) = X + 1 & "." & arrValues(X + 1)
            ReDim Preserve arrValues(X)
            Exit For
        Else
            arrValues(X) = X + 1 & "." & Left(arrValues(X + 1), Len(arrValues(X + 1)) - 2)
        End If
    Next X

    For X = LBound(arrValues) To UBound(arrValues)
        Z = Z + 1
        ReDim Preserve arrTmp(1 To 2, 1 To Z)
        If X = 0 Then arrTmp(1, Z) = R - 1
        arrTmp(2, Z) = arrValues(X)
    Next X
Next R

ReDim arrFinal(LBound(arrTmp, 2) To UBound(arrTmp, 2), LBound(arrTmp) To UBound(arrTmp))
For R = LBound(arrFinal) To UBound(arrFinal)
    For C = LBound(arrFinal, 2) To UBound(arrFinal, 2)
        arrFinal(R, C) = arrTmp(C, R)
    Next C
Next R

With ws.Range("E1")
    .Resize(UBound(arrFinal), UBound(arrFinal, 2)) = arrFinal
End With

End Sub

And the result:

Split a single excel cell content into different cells using Python or Excel vba

2 Answers2