-1

I am new with Python and have a problem that I don't know how to solve. I have a dataframe (localdf["where_condition"]) which contains the following values:

0    FirstName ='sonali'
1             Gender='F'
2                   NaN

function:

def filter_record_count_check(localdf):
    try:
        sql = 'select count(*) as CNT from ' + localdf["schema_table"] + ' where ' + localdf["where_condition"]
        sql = sql.to_dict()
        print(sql)
        df_list=[]
        for i in sql.values():
            df = pd.read_sql_query(i, db_connection)
            df_list.append(df.CNT[0])
            print("df_list")
            print(df_list)
        return df_list

sql output:

{0: "select count(*) as CNT from testdb.DimCurrency where FirstName ='sonali'", 1: "select count(*) as CNT from testdb.banking_fraud where Gender='F'", 2: nan}

df_list output:

[2, 208]

issue is when any null value/nan coming in sql whole output coming as nan when calling same function in another function. sample output

      table_name column_name  ... record_count  **filter_column_cnt**
0    DimCurrency    PersonID  ...            6                NaN
1  banking_fraud         NaN  ...         1000                NaN
2  banking_fraud         NaN  ...         1000                NaN

expected output is :

      table_name column_name  ... record_count  **filter_column_cnt**
0    DimCurrency    PersonID  ...            6                2
1  banking_fraud         NaN  ...         1000                208
2  banking_fraud         NaN  ...         1000                NaN

Alia
  • 33
  • 5
  • Does this answer your question? [Python pandas Filtering out nan from a data selection of a column of strings](https://stackoverflow.com/questions/22551403/python-pandas-filtering-out-nan-from-a-data-selection-of-a-column-of-strings) – Iguananaut Jul 25 '22 at 08:47
  • Above link solution is not working for my case. – Alia Jul 25 '22 at 09:16
  • can someone please suggest – Alia Jul 25 '22 at 11:43
  • HI do you mind to share `df.head().to_dict()` and the expected output? – rpanai Jul 25 '22 at 12:58
  • 2
    Your question is not very clear: (1) Is this an actual `NaN` or a string? With `data != 'NaN'` you make it seem like it's a string. (2) What does _"if any nan in dataframe then my query will not run else run"_ mean? Do you mean if there's at least one `NaN` in the column then none of the queries should run (even if they could be build)? – Timus Jul 25 '22 at 13:00
  • 2
    (3) Why are you using a `dict` here `sql = sql.to_dict()`, when you're only using its values? – Timus Jul 25 '22 at 13:04
  • 2
    @Alia I suggest you reword your question to 1) Provide reproducible data so it's clear exactly what you're working with. @rpanai's suggestion of `df.head().to_dict()` is a good one here. 2) Focus just on the problem at hand: Filtering NaNs out of a pandas dataframe column. Make sure you get that part right first before trying to do anything else with your data. – Iguananaut Jul 25 '22 at 15:26
  • I have modify my question and provide outputs as well. Hope its helpful. Please have a look. – Alia Jul 26 '22 at 06:58
  • @Iguananaut please have a look. – Alia Jul 26 '22 at 07:41

1 Answers1

0

Due to the try, every exception in its block results in a filter_record_count_check-return of None. So if a NaN in sql leads to an exception (which I suspect), you'll get a full set of NaNs.

Let's say you have a dataframe df like

  schema_table      where_condition
0       table1  FirstName ='sonali'
1       table2           Gender='F'
2       table3                  NaN

then

sql = (
    "select count(*) as CNT from " + df["schema_table"]
    + " where " + df["where_condition"]
)

results in

0    select count(*) as CNT from table1 where First...
1    select count(*) as CNT from table2 where Gende...
2                                                  NaN
dtype: object

Which means the NaN is still there. But you could restrict further processing to the non-NaN values, for example by

df["Count"] = sql[sql.notna()].apply(
    lambda s: pd.read_sql_query(s, db_connection).CNT[0]
)

which should result in

  schema_table      where_condition  Count
0       table1  FirstName ='sonali'    2.0
1       table2           Gender='F'  208.0
2       table3                  NaN    NaN
Timus
  • 10,974
  • 5
  • 14
  • 28