5

how do I elegantly port the below recursive SQL query to Pandas python code? Somehow I don't see a straightforward way without writing own recursive function?

Python sample code:

import datetime
import numpy as np
import pandas as pd
import pandas.io.data
from pandas import Series, DataFrame

data = {
        'ID': [1,2,3,4,5,6,7,8],
        'Name': ['Keith','Josh','Robin','Raja','Tridip','Arijit','Amit','Dev'],
        'MgrID': [0,1,1,2,0,5,5,6]
    }

df = pd.DataFrame.from_dict(data)
df.set_index('ID', inplace=True, drop=False, append=False)
df.ix[df.query('MgrID >0')['MgrID']]

Trying to get this:

nLevel      ID          Name
================================
1           6            Arijit
2           8               Dev
1           1            Keith
2           2               Josh
2           3               Robin
3           4                 Raja
1           5            Tridip
2           7               Amit

Recursive SQL Query:

;WITH Employee (ID, Name, MgrID) AS 
(
    SELECT 1,      'Keith',      NULL   UNION ALL
    SELECT 2,      'Josh',       1      UNION ALL
    SELECT 3,      'Robin',      1      UNION ALL
    SELECT 4,      'Raja',       2      UNION ALL
    SELECT 5,      'Tridip',     NULL   UNION ALL
    SELECT 6,      'Arijit',     NULL      UNION ALL
    SELECT 7,      'Amit',       5      UNION ALL
    SELECT 8,      'Dev',        6   
)
,Hierarchy AS
(
    --  Anchor
    SELECT   ID
            ,Name
            ,MgrID
            ,nLevel = 1
            ,Family = ROW_NUMBER() OVER (ORDER BY Name)
    FROM Employee
    WHERE MgrID IS NULL

    UNION ALL
    --  Recursive query
    SELECT   E.ID
            ,E.Name
            ,E.MgrID
            ,H.nLevel+1
            ,Family
    FROM Employee   E
    JOIN Hierarchy  H ON E.MgrID = H.ID
)
SELECT nLevel ,ID,space(nLevel+(CASE WHEN nLevel > 1 THEN nLevel ELSE 0 END))+Name Name FROM Hierarchy ORDER BY Family, nLevel
user3615048
  • 51
  • 1
  • 6

1 Answers1

3

First, you need to correct the typo in python code MgrID list: [0,1,1,2,0,0,5,6]

Second, if this job is done recursively in SQL, why do you expect Python/Pandas can do it without recursive method? It isn't too hard:

def nlevel(id, mgr_dict=df.MgrID, _cache={0:0}):
    if id in _cache:
        return _cache[id]

    return 1+nlevel(mgr_dict[id],mgr_dict)

df['nLevel'] = df.ID.map(nlevel)

print df[['nLevel','ID','Name']]

Then the output(nLevel) is what you need (except the order, which I don't understand from your SQL):

    nLevel  ID    Name
ID                    
1        1   1   Keith
2        2   2    Josh
3        2   3   Robin
4        3   4    Raja
5        1   5  Tridip
6        1   6  Arijit
7        2   7    Amit
8        2   8     Dev

[8 rows x 3 columns]
Happy001
  • 6,103
  • 2
  • 23
  • 16
  • Thank you very much! This is what I was looking for. One followup question if you could be so kind. How do I go about the situation when an employee is having two managers? E.g. 'Dev' is under 'Keith' and 'Amit'. Or same employee works for the same manager twice, but is connected via different path (Project)? Is it possible to do dictionary lookups with compound keys(Manager+Project)? – user3615048 May 21 '14 at 21:12
  • One employee under multiple manager may result in different nLevel. You need to define that first. For compound keys, probably you can use tuple as keys? But in this case, it's more about compound values right? i.e., `mgr_dict={'Dev':('Kenith','Proj1')}` – Happy001 May 22 '14 at 01:59
  • So great Do you have some documentation about _cache ? – Charles R Nov 19 '18 at 14:59