
My goal is to create an object that behaves the same as a Pandas DataFrame, but with a few extra methods of my own on top of it. As far as I understand, one approach would be to extend the class, which I first tried to do as follows:

class CustomDF(pd.DataFrame):
    def  __init__(self, filename):
        self = pd.read_csv(filename)

But I get errors when trying to view this object, saying: 'CustomDF' object has no attribute '_data'.

My second iteration was to not inherit from DataFrame at all, but instead to store a DataFrame in one of the object's attributes and have the methods work on it, like this:

class CustomDF():

    def  __init__(self, filename):
        self.df = pd.read_csv(filename)

    def custom_method_1(self,a,b,...):
        ...

    def custom_method_2(self,a,b,...):
        ...

This is fine, except that for all custom methods, I need to access the self.df attribute first to do anything on it, but I would prefer that my custom dataframe were just self.

Is there a way that this can be done? Or is this approach not ideal anyway?

teepee
  • In your first example, you probably just need to call `super().__init__(...)` in your overridden init so that the rest of the setup that occurs in `pd.DataFrame.__init__()` also happens in your custom class. – Randy Nov 29 '18 at 17:42
  • @Randy: I tried doing this, but still got strange results from the output. How would I want to implement the `pd.read_csv(filename)` portion of this code? – teepee Nov 29 '18 at 18:00
  • See the [Subclassing pandas Data Structures](http://pandas.pydata.org/pandas-docs/stable/extending.html#subclassing-pandas-data-structures) section of the docs; using inheritance is very much non-trivial. – root Nov 29 '18 at 19:12
  • @root: I was finally able to get what I was hoping for by doing `super(CustomDF, self).__init__(pd.read_csv(filename))`. Would this be an appropriate solution, or does this introduce any problems? – teepee Nov 29 '18 at 19:46
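
The docs page linked in the comments above also covers registering custom accessors, which add methods under a namespace without subclassing at all. A minimal sketch, assuming a recent pandas; the accessor name `custom` and the method `column_ranges` are hypothetical names chosen for illustration:

```python
import pandas as pd


# "custom" is a hypothetical namespace; any valid identifier works.
@pd.api.extensions.register_dataframe_accessor("custom")
class CustomAccessor:
    def __init__(self, pandas_obj):
        # pandas passes in the DataFrame the accessor is used on.
        self._obj = pandas_obj

    def column_ranges(self):
        # Illustrative method: max minus min for each numeric column.
        numeric = self._obj.select_dtypes("number")
        return numeric.max() - numeric.min()


df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 40]})
ranges = df.custom.column_ranges()
```

Every ordinary DataFrame, including one returned by `pd.read_csv`, then exposes the extra methods as `df.custom.<method>()`, so no special constructor is needed.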

4 Answers


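Your first example overrides __init__ without ever calling the parent initializer, so the DataFrame's internal state is never set up. Assigning to self inside the method only rebinds a local name; it does not change the object being constructed.

Call super().__init__ first, and then add your custom code: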

class CustomDF(pd.DataFrame):
    def __init__(self, *args, **kw):
        super(CustomDF, self).__init__(*args, **kw)
        # Your code here

    def custom_method_1(self,a,b,...):
        ...
Kamil Niski
  • Thanks, I tried this though and didn't quite get what I expected. How would I put in my line where I want to assign `pd.read_csv(filename)` to `self`? – teepee Nov 29 '18 at 17:59
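
One way to combine this with `pd.read_csv` is to leave `__init__` alone and load the file in a classmethod. This is a minimal sketch, assuming a recent pandas: `_constructor` is the hook the pandas subclassing docs describe, while `load_csv` and `column_ranges` are hypothetical names used only for illustration.

```python
import io

import pandas as pd


class CustomDF(pd.DataFrame):
    @property
    def _constructor(self):
        # Tells pandas to build results of operations as CustomDF objects.
        return CustomDF

    @classmethod
    def load_csv(cls, path_or_buffer, **kwargs):
        # Hypothetical helper: read the file, then wrap the result.
        return cls(pd.read_csv(path_or_buffer, **kwargs))

    def column_ranges(self):
        # Illustrative custom method: max minus min per numeric column.
        numeric = self.select_dtypes("number")
        return numeric.max() - numeric.min()


# An in-memory buffer stands in for a real CSV file.
csv_text = "a,b\n1,10\n2,20\n3,40\n"
cdf = CustomDF.load_csv(io.StringIO(csv_text))
subset = cdf[cdf["a"] > 1]
ranges = cdf.column_ranges()
```

Keeping the inherited `__init__` signature matters because pandas calls `_constructor` internally with its own arguments; an `__init__(self, filename)` would receive those instead of a file name.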

Is this what you were looking for?

import pandas as pd

class CustomDF:

    def __init__(self, filename):
        self.df = pd.read_csv(filename)

    def custom_method_1(self, *args, **kwargs):
        result_1 = do_custom_operations_on(self.df, *args, **kwargs)
        return result_1

    def custom_method_2(self, *args, **kwargs):
        result_2 = do_custom_operations_on(self.df, *args, **kwargs)
        return result_2

    ...
yongsheng

I would probably go with the decorator pattern here. The accepted answer for this post will put you on the right track.

Your first iteration would be really cool, but it seems to me that you would need to know quite a lot about Pandas' internals, e.g., that this _data attribute needs to be set in a certain way.
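
One reading of the decorator pattern here is a wrapper that holds the DataFrame and forwards anything it does not define itself. A minimal sketch under that assumption; the attribute name `_df` and the method `row_count_label` are hypothetical:

```python
import io

import pandas as pd


class CustomDF:
    def __init__(self, df):
        self._df = df

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name == "_df":
            # Avoid infinite recursion if _df has not been set yet.
            raise AttributeError(name)
        return getattr(self._df, name)

    # Special methods bypass __getattr__, so forward them explicitly.
    def __getitem__(self, key):
        return self._df[key]

    def __len__(self):
        return len(self._df)

    def __repr__(self):
        return repr(self._df)

    def row_count_label(self):
        # Illustrative custom method.
        return f"{len(self)} rows"


# An in-memory buffer stands in for a real CSV file.
csv_text = "a,b\n1,10\n2,20\n3,40\n"
wrapped = CustomDF(pd.read_csv(io.StringIO(csv_text)))
head = wrapped.head(2)
```

The trade-off is that forwarded methods such as `head` return plain DataFrames, so the custom methods are not available on their results.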

Cheers.

manu

In my project I did something similar and used decorators, as manu suggested. The @property decorator might work for you: it turns the method .df() into a property .df, so the file is only read when .df is actually accessed. Note that it is read again on every access. This only works on instances of the class, though.

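import pandas as pd

class CustomDF:

    def __init__(self, filename):
        self.filename = filename

    @property
    def df(self):
        # Re-reads the CSV file each time the property is accessed.
        return pd.read_csv(self.filename)

    # do_custom_operations_on is a placeholder for your own logic.
    def custom_method_1(self, *args, **kwargs):
        result_1 = do_custom_operations_on(self.df, *args, **kwargs)
        return result_1

    def custom_method_2(self, *args, **kwargs):
        result_2 = do_custom_operations_on(self.df, *args, **kwargs)
        return result_2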
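
If re-reading the file on every access is a concern, `functools.cached_property` (Python 3.8+) keeps the lazy loading but stores the result after the first read. A minimal sketch; the attribute name `_source` and the method `row_total` are hypothetical:

```python
import functools
import io

import pandas as pd


class CustomDF:
    def __init__(self, source):
        # source may be a file path or a file-like object.
        self._source = source

    @functools.cached_property
    def df(self):
        # Runs once, on first access; the result is stored on the instance.
        return pd.read_csv(self._source)

    def row_total(self, column):
        # Illustrative custom method built on the cached DataFrame.
        return self.df[column].sum()


# An in-memory buffer stands in for a real CSV file.
csv_text = "a,b\n1,10\n2,20\n3,40\n"
wrapper = CustomDF(io.StringIO(csv_text))
first = wrapper.df
second = wrapper.df
total = wrapper.row_total("b")
```

Because the DataFrame is cached, later changes to the file are not picked up unless the cached value is deleted with `del wrapper.df`.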
Yehla