As I'm fairly new to python I've tried various ways based on answers found here but can't manage to normalize my json file.
As I checked in Postman it has 4 levels of nested arrays. For suppliers I want to expand all the levels of data.The problem I have is with the score_card
subtree for which I want to pull out all risk_groups
, then name
,risk_score
and all risks
together with name
,risk_score
and indicators
with the deepest level containing again name
and risk_score
.
and I'm not sure whether it is possible in one go based on ambiguous names between levels.
Below I'm sharing a reproducible example:
data = {"suppliers": [
{
"id": 1260391,
"name": "2712270 MANITOBA LTD",
"duns": "rm-071-7291",
"erp_number": "15189067;15189067",
"material_group_ids": [
176069
],
"business_unit_ids": [
13728
],
"responsible_user_ids": [
37761,
37760,
37759,
37758,
37757,
37756,
36520,
37587,
36494,
22060,
37742,
36446,
36289
],
"address": {
"address1": "BOX 550 NE 16-20-26",
"address2": None,
"zip_code": "R0J 1W0",
"city": "RUSSELL",
"state": None,
"country_code": "ca",
"latitude": 50.7731176,
"longitude": -101.2862461
},
"score_card": {
"risk_score": 26.13,
"risk_groups": [
{
"name": "Viability",
"risk_score": 43.33,
"risks": [
{
"name": "Financial stability supplier",
"risk_score": None,
"indicators": [
{
"name": "Acquisitions",
"risk_score": None
}
]
}
]
},
]
}
}
]
}
And here is how it should look:
expected = [[1260391,'2712270 MANITOBA LTD','rm-071-7291',176069,13728,[37761,37760,
37759,37758,37757,37756,36520,37587,36494,22060,37742,36446,36289],
'BOX 550 NE 16-20-26','None','R0J 1W0','RUSSELL','None','ca',50.7731176,
-101.2862461,26.13,'Viability',43.33,'Financial stability supplier','None',
'Acquisitions','None']]
df = pd.DataFrame(expected,columns=['id','name','duns','material_groups_ids',
'business_unit_ids','responsible_user_ids',
'address.address1','address.address2','address.zip_code',
'address.city','address.state','address.country_code','address.latitude',
'address.longitude','risk_score','risk_groups.name',
'risk_groups.risk_score','risk_groups.risks.name',
'risk_groups.risks.risk_score','risk_groups.risks.indicators.name',
'risk_groups.risks.indicators.risk_score'])
What I tried since last 3 days is to use json_normalize:
example = pd.json_normalize(data['suppliers'],
record_path=['score_card','risk_groups','risks',
'indicators'],meta = ['id','name','duns','erp_number','material_group_ids',
'business_unit_ids','responsible_user_ids','address',['score_card','risk_score']],record_prefix='_')
but when I specify the fields for record_path which seems to be a vital param here it always return this prompt:
TypeError: {'name': 'Acquisitions', 'risk_score': None} has non list value Acquisitions for path name. Must be list or null.
What I want to get a a table with columns as presented below:
So it seems that python doesn't know how to treat most granular level which contains null values.
I've tried approach from here: Use json_normalize to normalize json with nested arrays but unsuccessfully. I would like to get the same view which I was able to 'unpack' in Power BI but this way of having the data is not satisfying me. All risk indicators are the components of risks which are part of risk groups as You can see on attached pictures. Going to the most granular level it goes this way: Risk groups -> Risks -> Indicators so I would like to unpack it all.
Any help more than desperately needed. Kindly thank You for any support here.