Creating a class based on pandas.DataFrame, initialized with pandas.read_csv()

Question

My goal is to create an object that behaves the same as a Pandas DataFrame, but with a few extra methods of my own on top of it. As far as I understand, one approach would be to extend the class, which I first tried to do as follows:

class CustomDF(pd.DataFrame):
    def  __init__(self, filename):
        self = pd.read_csv(filename)

But I get an error when trying to view this object: 'CustomDF' object has no attribute '_data'.

My second iteration was to not inherit from DataFrame at all, but instead to store the DataFrame in an instance attribute and have the methods work on that attribute, like this:

import pandas as pd

class CustomDF:

    def __init__(self, filename):
        self.df = pd.read_csv(filename)

    def custom_method_1(self, *args, **kwargs):
        ...

    def custom_method_2(self, *args, **kwargs):
        ...

This works, except that every custom method has to go through the self.df attribute to do anything, whereas I would prefer that my custom DataFrame simply be self.

Is there a way that this can be done? Or is this approach not ideal anyway?

In your first example, you probably just need to call `super().__init__(...)` in your overridden init so that the rest of the setup that occurs in `pd.DataFrame.__init__()` also happens in your custom class. — Randy, Nov 29 '18 at 17:42
@Randy: I tried doing this, but still got strange results from the output. How would I want to implement the `pd.read_csv(filename)` portion of this code? — teepee, Nov 29 '18 at 18:00
See the [Subclassing pandas Data Structures](http://pandas.pydata.org/pandas-docs/stable/extending.html#subclassing-pandas-data-structures) section of the docs; using inheritance is very much non-trivial. — root, Nov 29 '18 at 19:12
@root: I was finally able to get what I was hoping for by doing `super(CustomDF, self).__init__(pd.read_csv(filename))`. Would this be an appropriate solution, or does this introduce any problems? — teepee, Nov 29 '18 at 19:46

score 1 · Answer 1 · answered Nov 29 '18 at 17:41


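In your first example, `__init__` is overridden but never calls `pd.DataFrame.__init__`, so pandas' internal state (such as the `_data` attribute in the error) is never set up. Assigning to `self` inside `__init__` does not help either: it only rebinds a local variable and leaves the object itself untouched.

Use `super()` to run the parent initializer first, and then add your custom code: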

import pandas as pd

class CustomDF(pd.DataFrame):
    def __init__(self, *args, **kw):
        # Run pandas' own setup first so the internal state exists.
        super(CustomDF, self).__init__(*args, **kw)
        # Your code here

    def custom_method_1(self, *args, **kwargs):
        ...

Kamil Niski

Thanks, I tried this though and didn't quite get what I expected. How would I put in my line where I want to assign `pd.read_csv(filename)` to `self`? – teepee Nov 29 '18 at 17:59
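One way to answer the follow-up about where `pd.read_csv(filename)` should go, based on the subclassing section of the pandas docs linked in the comments: leave `__init__` alone, override the `_constructor` property so that operations return the subclass, and put the CSV reading in a classmethod. `from_csv` and `column_ranges` are hypothetical names chosen for this sketch, and an in-memory CSV stands in for a file.

```python
import io

import pandas as pd


class CustomDF(pd.DataFrame):
    # No __init__ override: keeping DataFrame's own constructor signature
    # lets pandas rebuild the subclass internally.

    @property
    def _constructor(self):
        # Tell pandas to return CustomDF (not DataFrame) from operations
        # such as slicing or head().
        return CustomDF

    @classmethod
    def from_csv(cls, filename, **kwargs):
        # Hypothetical helper: read with pandas, then wrap the result.
        # Extra keyword arguments are forwarded to pd.read_csv.
        return cls(pd.read_csv(filename, **kwargs))

    def column_ranges(self):
        # Hypothetical custom method: max - min of each numeric column.
        numeric = self.select_dtypes("number")
        return numeric.max() - numeric.min()


df = CustomDF.from_csv(io.StringIO("a,b\n1,10\n4,20\n2,30\n"))
top = df.head(2)
print(type(top).__name__)
print(top.column_ranges())
```

Because `__init__` keeps the DataFrame signature, `CustomDF(some_dataframe)` and `CustomDF({"a": [1, 2]})` still work, and derived frames keep the custom methods.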

Answer 2 · answered Oct 16 '20 at 06:49 by yongsheng

Is this what you were looking for?

import pandas as pd

def do_custom_operations_on(df, *args, **kwargs):
    # Hypothetical placeholder for your own logic.
    return df

class CustomDF:

    def __init__(self, filename):
        self.df = pd.read_csv(filename)

    def custom_method_1(self, *args, **kwargs):
        result_1 = do_custom_operations_on(self.df, *args, **kwargs)
        return result_1

    def custom_method_2(self, *args, **kwargs):
        result_2 = do_custom_operations_on(self.df, *args, **kwargs)
        return result_2

    ...
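To get closer to the question's wish that the object "just be self", a wrapper like this can forward unknown attribute lookups to the wrapped DataFrame via `__getattr__`. This is plain Python delegation rather than anything pandas-specific, sketched under the assumption that forwarding every unknown name is acceptable; `column_ranges` is a hypothetical custom method and `io.StringIO` stands in for a file.

```python
import io

import pandas as pd


class CustomDF:
    """Wraps a DataFrame and forwards unknown attributes to it."""

    def __init__(self, filename):
        self.df = pd.read_csv(filename)

    def __getattr__(self, name):
        # Only called when normal lookup fails. Guard "df" so a missing
        # attribute (e.g. during copying) cannot recurse forever.
        if name == "df":
            raise AttributeError(name)
        return getattr(self.df, name)

    def __getitem__(self, key):
        # Special methods bypass __getattr__, so forward indexing explicitly.
        return self.df[key]

    def __repr__(self):
        return repr(self.df)

    def column_ranges(self):
        # Hypothetical custom method: max - min of each numeric column.
        numeric = self.df.select_dtypes("number")
        return numeric.max() - numeric.min()


wrapped = CustomDF(io.StringIO("a,b\n1,10\n4,20\n2,30\n"))
print(wrapped.shape)            # forwarded to the DataFrame
print(wrapped["a"].sum())       # forwarded via __getitem__
print(wrapped.column_ranges())  # custom method
```

Forwarded calls such as `wrapped.head()` return ordinary DataFrames, and other special methods (`len()`, `+`, iteration) each need their own forwarder like `__getitem__`, because Python looks them up on the class rather than through `__getattr__`.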

score 0 · Answer 3 · answered Nov 29 '18 at 17:52

I would probably go with the decorator pattern here. The accepted answer for this post will put you on the right track.

I can see that your first iteration would be really neat, but it seems to me you would need to know quite a lot about pandas' internals, e.g., that this _data attribute needs to be set in a certain way.

Cheers.
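If the goal is simply to add extra methods, pandas also ships a decorator for this purpose: `pd.api.extensions.register_dataframe_accessor`, which attaches custom methods under a namespace on every DataFrame without subclassing. Note this is a pandas extension mechanism, not the classic wrapper-style decorator pattern. The sketch registers a hypothetical `custom` accessor with a hypothetical `column_ranges` method.

```python
import io

import pandas as pd


@pd.api.extensions.register_dataframe_accessor("custom")
class CustomAccessor:
    """Adds a hypothetical `custom` namespace to every DataFrame."""

    def __init__(self, pandas_obj):
        # pandas passes in the DataFrame the accessor was used on.
        self._obj = pandas_obj

    def column_ranges(self):
        # Hypothetical custom method: max - min of each numeric column.
        numeric = self._obj.select_dtypes("number")
        return numeric.max() - numeric.min()


df = pd.read_csv(io.StringIO("a,b\n1,10\n4,20\n2,30\n"))
print(df.custom.column_ranges())
# Derived frames are ordinary DataFrames, so the accessor works on them too.
print(df.head(2).custom.column_ranges())
```

The trade-off is that custom methods are reached as `df.custom.method()` instead of `df.method()`, but nothing about pandas' internals needs to be known.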

score 0 · Answer 4 · answered Feb 19 '21 at 14:41

In my project I did something similar and used decorators, as manu suggested. The @property decorator might work for you: it turns the method df() into a property, so it is accessed as self.df without parentheses. The CSV is then only read when the attribute is actually accessed. Note that this only works on instances of the class.

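import pandas as pd

def do_custom_operations_on(df, *args, **kwargs):
    # Hypothetical placeholder for your own logic.
    return df

class CustomDF:

    def __init__(self, filename):
        self.filename = filename

    @property
    def df(self):
        # Read on access; as written, the file is re-read every time.
        return pd.read_csv(self.filename)

    def custom_method_1(self, *args, **kwargs):
        result_1 = do_custom_operations_on(self.df, *args, **kwargs)
        return result_1

    def custom_method_2(self, *args, **kwargs):
        result_2 = do_custom_operations_on(self.df, *args, **kwargs)
        return result_2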
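A plain property re-reads the CSV every time `self.df` is accessed. If the file should be read once and then reused, `functools.cached_property` (Python 3.8+) is a close alternative. The sketch assumes a temporary CSV file and a hypothetical `column_ranges` method.

```python
import functools
import os
import tempfile

import pandas as pd


class CustomDF:
    def __init__(self, filename):
        # Only remember where the data lives; nothing is read yet.
        self.filename = filename

    @functools.cached_property
    def df(self):
        # Runs on first access only; the result is then stored on the
        # instance, so later accesses do not re-read the file.
        return pd.read_csv(self.filename)

    def column_ranges(self):
        # Hypothetical custom method: max - min of each numeric column.
        numeric = self.df.select_dtypes("number")
        return numeric.max() - numeric.min()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.csv")
    with open(path, "w") as f:
        f.write("a,b\n1,10\n4,20\n2,30\n")
    obj = CustomDF(path)
    print(obj.column_ranges())
    print(obj.df is obj.df)  # True: the CSV was read once and cached
```

If the file can change on disk, deleting the cached value (`del obj.df`) forces a fresh read on the next access.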
