0

this is my first public question here because I am absolutely lost in this case...

I have an excel file that looks like this:

excel file image

I applied the following commands to transform the excel file into a dataframe:

df = pd.ExcelFile(r'file_path/excel_file.xlsx')
df = df.parse(sheet_name, header = [0, 1, 2])

Now the problem is that the empty header spaces (in the 'Name', 'ID' and 'Test mode' columns) are being filled with 'Unnamed: 0_level_1' and 'Unnamed: 1_level_1' in the first row and 'Unnamed: 0_level_2' and 'Unnamed: 1_level_2' and 'Unnamed: 2_level_2' in the second row.

How can I merge these rows together for every header column so that it just says 'Name', 'ID' and 'Test mode' under the condition that I don't have the opportunity to edit the excel file? So basically I want to merge all header rows where this happens.

this is how I want the dataframe header to look like

Can anyone help me?

Thanks for the help!

mozway
  • 194,879
  • 13
  • 39
  • 75
  • Does this answer your question? [Pandas: parse merged header columns from Excel](https://stackoverflow.com/questions/27420263/pandas-parse-merged-header-columns-from-excel) – not_speshal Nov 09 '21 at 16:40
  • You whant to merge these cells in the read DataFrame? – Glauco Nov 09 '21 at 17:39
  • Not entirely... I have a few cells in the second and third header row that are empty and with the fill method `df.iloc[0:2] = df.iloc[0:2].fillna(method='ffill', axis=1)` the program fills all these empty spaces with the previous entry, which I don't want. I tried filling the empty spaces with `'', inplace = True` but that makes the values in the subcolumns inaccessible... – hackerboi1337 Nov 10 '21 at 11:38
  • @Glauco if that gives me the solution I want why not! My main problem is just that I want to access the values from specific columns by using the 'right' column names/paths so e.g. `ID = df['ID']` or `Test_mode = df[('Preconditions', 'Test mode')]` or `Current = df[('Preconditions', 'Electronic Load', 'Current')]` @not_speshal this indexing works when I use the 'ffill' method but like I said it gives me unwanted header descriptions... – hackerboi1337 Nov 10 '21 at 11:44
  • Ok now is clear, thank's for explanation, the resulting DataFrame has a `Multilevel Column Index` https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html. You can remove the index during the parse phase and add right name or starting from your df you could flatten the column index. maybe `to_flat_index()` useful – Glauco Nov 10 '21 at 13:09

0 Answers0