1

I have a data in a format as shown below which can be genarated using the Python code given below

df = pd.DataFrame({'Person_id':[1,2,3,4],
'Values':['father:1.Yes 2.No 3.Do not Know','Mother:1.Yes 2.No 3.Do not 
 Know','sons:1.Yes 2.No 3.Do not Know','daughter:1.Yes 2.No 3.Do not Know'],
   'Ethnicity':['dffather','dfmother','dfson','dfdaughter']})

enter image description here

What I would like to do is split the 'Values' cell content into 3 separate rows/ n seperate rows based on the number of values available in the cell. In this case, we have 3 options (1.Yes, 2.No and 3. Do not Know). I don't wish to retain the text like father, mother, son etc. I only wish to have the options

How can I get my output to be like as shown below

enter image description here

Please note that the options and values might differ in real time. What I have shown is a sample and there is no pattern that exists in terms of answer options.

The Great
  • 7,215
  • 7
  • 40
  • 128
  • Cool. A task. Your task? What did you do to solve it? Where is your specific problem with your solution? Please show us your [mce]. – Patrick Artner Jun 08 '19 at 16:30
  • 1
    I did some Pandas work and I got the data back into VBA as a Variant array (ready to be pasted into cells). https://exceldevelopmentplatform.blogspot.com/2018/06/python-pandas-vba-return-pandas-pivot.html That's my blog and I use the Python gateway class pattern. `toList()` is key to converting a Python data structure to be something consumable by VBA – S Meaden Jun 08 '19 at 18:52
  • Will try the answers and update the solution here soon. – The Great Jun 09 '19 at 01:30
  • Both the answers provided below is working perfectly fine. I am marking Python approach as it's easy to understand for a beginner like me. Nonetheless, Excel VBA is also a valid and working solution. Upvoted both the answers – The Great Jun 09 '19 at 05:09

2 Answers2

1

One way can be (pandas):

s=df.Values.str.findall('(\D+)').str[1:]
m=(df.reindex(df.index.repeat(s.str.len()))
   .assign(new_Value=np.concatenate(s.values)).drop('Values',1))
m.new_Value=m.groupby('Person_id').cumcount().add(1).astype(str)+m.new_Value
print(m)

   Person_id   Ethnicity      new_Value
0          1    dffather         1.Yes 
0          1    dffather          2.No 
0          1    dffather  3.Do not Know
1          2    dfmother         1.Yes 
1          2    dfmother          2.No 
1          2    dfmother  3.Do not Know
2          3       dfson         1.Yes 
2          3       dfson          2.No 
2          3       dfson  3.Do not Know
3          4  dfdaughter         1.Yes 
3          4  dfdaughter          2.No 
3          4  dfdaughter  3.Do not Know
anky
  • 74,114
  • 11
  • 41
  • 70
  • 1
    Hello @anky_91 - I did try your answer. It works fine. However to get the output as expected, I added the below two lines as I don't wish to repeat the values in Ethnicity and Person_id columns. Though I did this, felt it is a lengthy/round about way to do. Here is my additional piece of code to make it NA for repeating values "m['Ethnicity']=m[('Ethnicity')].mask(~m['new_Value'].str.startswith('1.')) m['Person_id']=m[('Person_id')].mask(~m['new_Value'].str.startswith('1.'))". – The Great Jun 09 '19 at 05:05
  • 1
    @AVLES i guess you then need `m.loc[m.duplicated(['Person_id','Ethnicity']),['Person_id','Ethnicity']]=''` and then print m to check. :) – anky Jun 09 '19 at 05:08
  • In the above example, if I would like to order inputs based on their source value, can you let me know how can it be done? Ex: 1:Yes, 2: No, 777:Not Applicable, 999: Do not Know. As you can see instead of 3 and 4 it is 777 and 999. I would like to retain these values as it is – The Great Jun 11 '19 at 04:05
  • @AVLES umm.. i think that would require a different(complex) approach. Since this question is answered, i would suggest you to post a fresh question. – anky Jun 11 '19 at 04:26
  • Thank you @anky_91. Here is the link https://stackoverflow.com/questions/56536954/split-rows-of-dataframe-and-store-them-as-individual-rows-in-the-same-dataframe – The Great Jun 11 '19 at 05:20
  • @AVLES thanks i will try solving this once i an free at work – anky Jun 11 '19 at 05:26
1

From a VBA perspective, this is one way you could do it:

Option Explicit

Sub splitVals()

Dim ws As Worksheet: Set ws = ActiveWorkbook.Sheets("Sheet Name")
Dim arrData As Variant, arrValues() As String
Dim arrTmp() As String: ReDim arrTmp(1 To 2, 1 To 1)
Dim arrFinal() As String

Dim lrow As Long: lrow = ws.Cells(Rows.Count, 1).End(xlUp).Row
Dim R As Long, C As Long, X As Long, Z As Long

arrData = ws.Range("A1:C" & lrow)

For R = LBound(arrData) + 1 To UBound(arrData)
    arrData(R, 2) = Replace("df" & arrData(R, 2), arrData(R, 3) & ":", "")
    arrValues = Split(arrData(R, 2), ".")
    For X = LBound(arrValues) To UBound(arrValues)
        If X + 1 = UBound(arrValues) Then
            arrValues(X) = X + 1 & "." & arrValues(X + 1)
            ReDim Preserve arrValues(X)
            Exit For
        Else
            arrValues(X) = X + 1 & "." & Left(arrValues(X + 1), Len(arrValues(X + 1)) - 2)
        End If
    Next X

    For X = LBound(arrValues) To UBound(arrValues)
        Z = Z + 1
        ReDim Preserve arrTmp(1 To 2, 1 To Z)
        If X = 0 Then arrTmp(1, Z) = R - 1
        arrTmp(2, Z) = arrValues(X)
    Next X
Next R

ReDim arrFinal(LBound(arrTmp, 2) To UBound(arrTmp, 2), LBound(arrTmp) To UBound(arrTmp))
For R = LBound(arrFinal) To UBound(arrFinal)
    For C = LBound(arrFinal, 2) To UBound(arrFinal, 2)
        arrFinal(R, C) = arrTmp(C, R)
    Next C
Next R

With ws.Range("E1")
    .Resize(UBound(arrFinal), UBound(arrFinal, 2)) = arrFinal
End With

End Sub

And the result: enter image description here

FAB
  • 2,505
  • 1
  • 10
  • 21