I am trying to replicate in Python, using the plotnine library, a table that is currently produced in R. I am using facet_grid with two variables (CBRegion and CBIndustry). I found a similar problem, but it is also solved in R. I applied code similar to that in the linked question and produced the following table:

Table using R code

I tried to use exactly the same code in Python with the plotnine library, but the final output looks much worse. This is my Python code so far:

from plotnine import (
    aes, element_blank, element_rect, element_text, facet_grid,
    geom_segment, geom_text, ggplot, labs, theme, theme_bw,
)

# df_data_bar (the data) and title (a string) are defined elsewhere
myplot = (
    ggplot(df_data_bar, aes(x="CCR100PDMid %", y="CBSector"))
    + geom_segment(aes(yend="CBSector", xend=0), color="black", size=2)
    + geom_text(aes(label="label"))
    + facet_grid("CBIndustry ~ CBRegion", scales="free_y", space="free")
    + labs(x="", y="", title=title)
    + theme_bw()
    # theme() comes after theme_bw() so the complete theme does not override these settings
    + theme(
        panel_grid_major_y=element_blank(),
        plot_title=element_text(linespacing=0.8, weight="bold", size=20, va="center"),
        axis_text_x=element_text(color="#333333", size=12, rotation=0, ha="center", va="top", weight="bold"),
        axis_text_y=element_text(color="#333333", size=12, rotation=0, ha="right", va="center", weight="bold"),
        axis_title_x=element_blank(),
        axis_title_y=element_blank(),
        legend_position="none",
        strip_text_x=element_text(size=12, weight="bold", color="black", rotation=0),
        strip_text_y=element_text(size=8, weight="bold", color="black", rotation=0, ha="left"),
        # backgrounds are rectangles, so element_rect rather than element_text
        strip_background_y=element_rect(width=0.2),
        figure_size=(30, 20),
    )
)

The image from plotnine is as follows:

Table using Python code

Comparing Python and R, the y-axis labels clearly overlap in the plotnine version. In addition, the Europe / Community Groups panel is the same height as panels that contain multiple sectors, which is unnecessary. I also tried different aspect ratios, but that did not resolve the problem. In short, I would like the same plot that R produces. It does not need to be produced in plotnine; alternatives are also welcome. The first ten rows of the data are:

{'CBRegion': {0: 'Europe', 1: 'Europe', 2: 'Europe', 3: 'Europe', 4: 'Europe', 5: 'Europe', 6: 'Europe', 7: 'Europe', 8: 'Europe', 9: 'Europe'}, 'CBSector': {0: 'Aerospace & Defense', 1: 'Alternative Energy', 2: 'Automobiles & Parts', 3: 'Banks', 4: 'Beverages', 5: 'Chemicals', 6: 'Colleges & Universities', 7: 'Community Groups', 8: 'Construction & Materials', 9: 'Electricity'}, 'CBIndustry': {0: 'Industrials', 1: 'Oil & Gas', 2: 'Consumer Goods', 3: 'Financials', 4: 'Consumer Goods', 5: 'Basic Materials', 6: 'NPO', 7: 'Community Groups', 8: 'Industrials', 9: 'Utilities'}, 'CCR100PDMid': {0: 0.015545818181818181, 1: 0.003296, 2: 0.012897471223021583, 3: 0.008079544600938968, 4: 0.008716597402597401, 5: 0.0094617476340694, 6: 0.008897475862068967, 7: 0.000821, 8: 0.012205547455295736, 9: 0.0050264210526315784}, 'CCR100PDMid %': {0: 1.554581818181818, 1: 0.3296, 2: 1.2897471223021584, 3: 0.8079544600938968, 4: 0.8716597402597401, 5: 0.9461747634069401, 6: 0.8897475862068966, 7: 0.0821, 8: 1.2205547455295735, 9: 0.5026421052631579}, 'label': {0: '1.6%', 1: '0.3%', 2: '1.3%', 3: '0.8%', 4: '0.9%', 5: '0.9%', 6: '0.9%', 7: '0.1%', 8: '1.2%', 9: '0.5%'}}
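To make the uneven panel sizes concrete, here is a minimal sketch that counts the distinct sectors in each industry for the ten rows above. It assumes pandas is available; the variable names `sample` and `sectors_per_industry` are only illustrative. These counts are the relative row heights a free-space facet layout would be expected to use.

```python
import pandas as pd

# Sector and industry columns copied from the ten-row sample above
sample = pd.DataFrame({
    "CBSector": [
        "Aerospace & Defense", "Alternative Energy", "Automobiles & Parts",
        "Banks", "Beverages", "Chemicals", "Colleges & Universities",
        "Community Groups", "Construction & Materials", "Electricity",
    ],
    "CBIndustry": [
        "Industrials", "Oil & Gas", "Consumer Goods", "Financials",
        "Consumer Goods", "Basic Materials", "NPO", "Community Groups",
        "Industrials", "Utilities",
    ],
})

# Number of distinct sectors (y-axis entries) in each industry (facet row)
sectors_per_industry = sample.groupby("CBIndustry")["CBSector"].nunique()
print(sectors_per_industry.sort_values(ascending=False))
```

In this sample, Industrials and Consumer Goods each hold two sectors while every other industry holds one, so equal-height panels leave the single-sector rows mostly empty.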

If necessary, I can upload the entire dataset, but I just read the MRE guidelines, which say that I should include only a subset of the data. I am new to SO and hope that I have included all the vital information. I will be grateful for any help. Thank you in advance!

tdy
Carpe Diem
  • Please trim your code to make it easier to find your problem. Follow these guidelines to create a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example). – Community Oct 23 '22 at 20:38
  • I think if you want a plotnine/python solution, it would be good to create some synthetic data that kind of mirrors your existing data in python. The above is R code so requires some translation first. – brb Oct 24 '22 at 10:16
  • Thank you for your comment. I already changed data so that it is in a python format. – Carpe Diem Oct 24 '22 at 20:03

1 Answer

The other issues (colours, overlapping labels, text wrapping, etc.) can be fixed, but space = 'free' is not currently supported in plotnine; see the plotnine documentation for facet_grid. Unfortunately, that is a deal-breaker for your table, because without it every facet row gets the same height regardless of how many sectors it contains. You will need to do this in R's ggplot2.

brb
  • Thank you. Is there an alternative way to produce this table? I just noticed that using matplotlib would be very complicated. I also had a look at seaborn & plotly, but none of them seem to be easy in comparison to plotnine which is quite straightforward. – Carpe Diem Oct 27 '22 at 19:32
  • I did look at Seaborn (https://seaborn.pydata.org/generated/seaborn.FacetGrid.html), but setting 'sharey=False' produces similar issues: really fat bars rather than a thinner plot. Both Seaborn and plotnine are built on Matplotlib, which means it might be a limitation in the Matplotlib layout manager that prevents plotnine from implementing it. Either way, I do not know Seaborn or Matplotlib well enough to assist further. Apologies. – brb Oct 31 '22 at 07:41
  • I suggest you either do this in R's ggplot, or change the question to ask how to do this in Matplotlib so that those more familiar with that package can assist... – brb Oct 31 '22 at 07:42
  • It is fine. I appreciate any kind of answers. I already changed the title. – Carpe Diem Oct 31 '22 at 13:58
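
For readers who follow the Matplotlib suggestion, the sketch below shows one way the free-space layout might be approximated by hand. It is not the R output and not a plotnine feature: it assumes pandas and Matplotlib are installed, uses the ten sample rows with percentages rounded to two decimals, and the function name `plot_free_space_grid` is hypothetical. The idea is to pass the number of sectors per industry as `height_ratios` so each facet row is only as tall as it needs to be.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Ten sample rows from the question; percentages rounded to two decimals
df = pd.DataFrame({
    "CBRegion": ["Europe"] * 10,
    "CBSector": [
        "Aerospace & Defense", "Alternative Energy", "Automobiles & Parts",
        "Banks", "Beverages", "Chemicals", "Colleges & Universities",
        "Community Groups", "Construction & Materials", "Electricity",
    ],
    "CBIndustry": [
        "Industrials", "Oil & Gas", "Consumer Goods", "Financials",
        "Consumer Goods", "Basic Materials", "NPO", "Community Groups",
        "Industrials", "Utilities",
    ],
    "CCR100PDMid %": [1.55, 0.33, 1.29, 0.81, 0.87, 0.95, 0.89, 0.08, 1.22, 0.50],
})


def plot_free_space_grid(data, row="CBIndustry", col="CBRegion",
                         y="CBSector", x="CCR100PDMid %"):
    """Hypothetical helper: facet grid whose row heights follow the sector count."""
    rows = sorted(data[row].unique())
    cols = sorted(data[col].unique())
    # Sectors shown in each facet row, reversed so they read top to bottom
    sectors = {r: sorted(data.loc[data[row] == r, y].unique(), reverse=True)
               for r in rows}
    heights = [len(sectors[r]) for r in rows]  # one height unit per sector

    fig, axes = plt.subplots(
        len(rows), len(cols), sharex=True, squeeze=False,
        figsize=(4 * len(cols) + 4, 0.5 * sum(heights) + 1.5),
        gridspec_kw={"height_ratios": heights, "hspace": 0.15},
    )
    fig.subplots_adjust(left=0.3, right=0.8)

    for i, r in enumerate(rows):
        pos = {s: k for k, s in enumerate(sectors[r])}  # sector -> y position
        for j, c in enumerate(cols):
            ax = axes[i, j]
            sub = data[(data[row] == r) & (data[col] == c)]
            for s, value in zip(sub[y], sub[x]):
                ax.hlines(pos[s], 0, value, color="black", linewidth=2)
                ax.text(value, pos[s], f" {value:.1f}%", va="center")
            ax.set_xlim(0, data[x].max() * 1.3)  # leave room for the labels
            ax.set_ylim(-0.5, len(pos) - 0.5)
            ax.set_yticks(list(range(len(pos))))
            ax.set_yticklabels(sectors[r])
            ax.tick_params(labelleft=(j == 0))  # sector names on first column only
            if i == 0:
                ax.set_title(c, fontweight="bold")
        # Industry name to the right of the row, like a facet strip
        axes[i, -1].annotate(r, xy=(1.02, 0.5), xycoords="axes fraction",
                             va="center", fontsize=8)
    return fig, axes, heights


fig, axes, heights = plot_free_space_grid(df)
```

Calling `fig.savefig("facets.png", dpi=150)` afterwards writes the figure to disk (the filename is arbitrary). Fonts, margins, and the strip styling would still need tuning to match the R version.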