0

I have the following table:

Year    VTP VTT VKB VDP VCL VLX VVL VCT
2007
2008    50  54  55  56  56  56  56  55
2011    49  49  48  46  46  48  48  49
2013    44  48  47  49  48  49  50  51
2015    47   47 50  49  48  49  50  48

And I want to plot it, but before doing that, I want to make sure that all of the value in my table are numeric, so I use the following code:

df = df1.apply(pd.to_numeric, errors='coerce')

When I print it, it turns some values in the table to NaN (please see table below, especially for year 2015).

      VTP   VTT   VKB    VDP   VCL  VLX    VVL   VCT
Year
2007   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN
2008  50.0  54.0  55.0  56.0  56.0  56.0  56.0  55.0
2011  49.0  49.0  48.0  46.0  46.0  48.0  48.0  49.0
2013  44.0  48.0  47.0  49.0  48.0  49.0  50.0  51.0
2015  47.0   NaN  50.0  49.0   NaN  49.0   NaN  48.0

Thus, I would like to ask why does it behave that way and how to solve it?

Brad Solomon
  • 38,521
  • 31
  • 149
  • 235
Vanna
  • 191
  • 2
  • 2
  • 10
  • Use `fillna` (replace by e.g. `0`) or `dropna` (drop rows you don't like) – Amadan Nov 17 '17 at 02:12
  • Thank Mr. Martineau. But if I drop nan, I will lost my value there (e.g. 47 in last row of column 3). so could you please any way? Many thanks – Vanna Nov 17 '17 at 02:21
  • Not sure but, you seem to have a non-regular space between 47 and 47 in the 2015 year. Remove that and see :) – Nabin Nov 17 '17 at 02:26
  • 2
    @Fan_of_Martijn_Pieters `to_numeric` handles spaces. – cs95 Nov 17 '17 at 02:30
  • @cᴏʟᴅsᴘᴇᴇᴅ: could you please help me with that? Thanks – Vanna Nov 17 '17 at 02:54

1 Answers1

1

Let's try replace + to_numeric:

df.astype(str).replace('[^\d.]', '', regex=True)\
                   .apply(pd.to_numeric, errors='coerce')

   Year   VTP   VTT   VKB   VDP   VCL   VLX   VVL   VCT
0  2007   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN
1  2008  50.0  54.0  55.0  56.0  56.0  56.0  56.0  55.0
2  2011  49.0  49.0  48.0  46.0  46.0  48.0  48.0  49.0
3  2013  44.0  48.0  47.0  49.0  48.0  49.0  50.0  51.0
4  2015  47.0  47.0  50.0  49.0  48.0  49.0  50.0  48.0
cs95
  • 379,657
  • 97
  • 704
  • 746
  • I tried, but it shows error as text below: UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 0: ordinal not in range(128) – Vanna Nov 17 '17 at 03:18
  • 2
    @Vanna HAH! Add encoding=“latin-1” when reading your dataframe. – cs95 Nov 17 '17 at 03:24
  • @Vanna You’re welcome. For reference, I figured out that your CSV contains unprintable junk, which my code removes for you. – cs95 Nov 17 '17 at 04:00