Python or Pandas Data Summarization(Converting Table into Dictionary of row_name:[col_name, value]....)

Question

Please find the data below where the Index name is repeated 3 to 5 times as the monthly report got consolidated for 3months improperly, before being handed over to me.

Name    Score   Rating
Peralta 0   40
Peralta 20  0
Peralta 0   0
Amy 0   40
Amy 20  40
Amy -20 40
Terry   0   0
Terry   -20 40
Terry   0   -40
Gina    20  0
Gina    0   0
Gina    -20 40

I have come so far with only pulling the column name as shown in the code in the end. But I am unable to bring the "Column Name" and "Cell Value" in place together.

I need to summarize the data in the below format: (the results can have tabs or commas, as per your comfort.)

### Final Result
# IndexName [col_name, cell_value]  [sum of positive numbers, result] [sum of negative numbers, result]


Peralta [Rating, 40]    [Score, 20]     [Sum_Total_of_positive_numbers, 60]
Amy     [Rating, 40]    [Score, 20]     [Rating, 40]    [Score, -20]    [Rating, 40]    [Sum_Total_of_positive_numbers, 140]    [Sum_Total_of_negative_numbers, -20]
Terry   [Score, -20]    [Rating, 40]    [Rating, -40]   [Sum_Total_of_positive_numbers, 40] [Sum_Total_of_negative_numbers, -60]
Gina    [Score, 20]     [Score, -20]    [Rating, 40]    [Sum_Total_of_positive_numbers, 60] [Sum_Total_of_negative_numbers, -20]

All the standard pandas functions for summarizing data have not helped, and/or the other libraries available to bring the summary in the above format doesn't exist.

The following code actually helped in bringing the column name down, but then the value of the Cell, pertaining to that column+row, isn't coming in..

for k, v in dff_dict.items():               # k: name of index, v: is a df
    check = v.columns[(v == 20).any()]
    if len(check) > 0:
        print((k, check.to_list()), file=open("output.txt", "a"))

Is there some way that I can achieve the "Final Result" as shown in the 2nd table, please? (the end result not necessarily in any particular format, like a dataframe or table) Thank you..

Note:

The original data may contain more than 20 columns hundreds of rows, but I have simplified the data, so pleassse do not limit to only a few rows or columns.'

Also the last 2 bracketed value shows the sum total of all the positive and negative numbers in the row, after bringing down the values in place. Positive numbers and negative numbers separately.

There is a helpful article here (Closest Possible answer), but then I need [column_name, value] unlike just the [row:value]

your "Final Result" doesn't seem to be a tabular format, do you want it out as a list of dictionaries ? — Jimmar, Aug 07 '20 at 16:12
Dear @Jimmar, yes please, it can be of any format and does not need to be a pandas dataframe or any table at all.. Because it will be finally read and analysed by non-programmers who have absolutely no coding knowledge nor any interest in excel.. They just need the data in <"Person name" and their details> format with the last item being the Sum total of all the positive numbers separately and negative number separately.... You solution is welcome in any format — Lokkii9, Aug 07 '20 at 18:38
@Jimmar Kindly find a minor edit in the above, where the sum of + and - are separately bracketed please... Thank you so much again... — Lokkii9, Aug 07 '20 at 18:46
please edit this question for clarity. There are grammar issues and I don't know what the desired output is — anon01, Aug 08 '20 at 05:17
@anon01 the desired result is the second coded section ,, starting with the comment **"### Final Result"**... Result is again demonstrated below the comment in second coded section... — Lokkii9, Aug 08 '20 at 05:43

score 1 · Accepted Answer · answered Aug 08 '20 at 06:09

I think this is easier if you can split the overall data set into two, given the "positive and negative numbers sum separately" requirement.

Starting with your sample data:

import pandas as pd
import numpy as np
data = [
{"Name": "Peralta", "Score": 0, "Rating": 40},
{"Name": "Peralta", "Score": 20, "Rating": 0},
{"Name": "Peralta", "Score": 0, "Rating": 0},
{"Name": "Amy", "Score": 0, "Rating": 40},
{"Name": "Amy", "Score": 20, "Rating": 40},
{"Name": "Amy", "Score": -20, "Rating": 40},
{"Name": "Terry", "Score": 0, "Rating": 0},
{"Name": "Terry", "Score": -20, "Rating": 40},
{"Name": "Terry", "Score": 0, "Rating": -40},
{"Name": "Gina", "Score": 20, "Rating": 0},
{"Name": "Gina", "Score": 0, "Rating": 0},
{"Name": "Gina", "Score": -20, "Rating": 40},
]
df = pd.DataFrame(data).set_index("Name")

we can get positive- and negative- valued projections to work with:

df_pos = df.where(df>=0, other=0)
df_neg = df.where(df<0, other=0)

and then group and sum to get your desired results:

df_pos = df_pos.groupby(by="Name").sum()
df_pos["total_positive"] = df_pos.apply(np.sum, axis=1)

df_neg = df_neg.groupby(by="Name").sum()
df_neg["total_negative"] = df_neg.apply(np.sum, axis=1)

Note - at this stage the data is still in two dataframes, not flattened out to the [field, value] format you showed.

Dear @Josh, thanks so much for the solution.. Actually the data gets loaded from an external pandas df (from csv).. would it be fine if I would just do ** data = pd.read_csv("input.csv", index="Name") -__{ actually when I tried it, it gives me error:: **parser_f() got an unexpected keyword argument 'index'** Kindly help pls... — Lokkii9, Aug 08 '20 at 06:22
That should work fine, with minor changes. You're replacing the `data` with the file, so you can remove the `data = [...]` and start with the assignment of df, like so: `df = pd.read_csv("data.csv", index_col="Name")`. Please note that this assumes the data is in a file named data.csv, and is in the same directory as your script - you may need to make modifications there. Also note the argument is `index_col`, not `index`. — Josh, Aug 08 '20 at 15:22
thanks again, and also kindly help me of how I can see this new data in a '.csv' or '.txt' file please.. I have tried to print it out to a file, but I am unable to work it out please... really sorry, and thank you again, please let me know how to bring it out to a file... — Lokkii9, Aug 09 '20 at 05:35
Does something like `df_pos.to_csv("df_pos.csv")` work for you? — Josh, Aug 10 '20 at 05:05

Python or Pandas Data Summarization(Converting Table into Dictionary of row_name:[col_name, value]....)

1 Answers1