
So I have a program where I preload some data into an LRU cache and then run multiple processes in parallel. There are two main pre-loaded cached calls: `openExcelWithCache` and `returnReferenceDataWithCache`. The two functions are implemented like this:

```python
from functools import lru_cache

import openpyxl

@lru_cache(maxsize=32)
def openExcelWithCache(pathFile, blDataOnly):
    # Load each (pathFile, blDataOnly) workbook only once
    return openpyxl.load_workbook(pathFile, data_only=blDataOnly)

@lru_cache(maxsize=32)
def returnReferenceDataWithCache(tableName, id):
    # logAtLevel and interface come from the rest of my code (not shown)
    logAtLevel("INFO", "Retrieving uncached reference for Id: " + str(id) + " (" + str(type(id)) + ") table: " + tableName + ".")
    return interface.referencequery(tableName, id)
```
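For reference, a minimal sketch of how `functools.lru_cache` builds its keys and what `cache_info()` counts. `fake_reference_query` is a hypothetical stand-in for the real database call. One detail worth knowing: in CPython, passing the same value positionally and by keyword produces two separate cache entries.

```python
from functools import lru_cache

call_count = 0  # how many times the uncached body actually runs

@lru_cache(maxsize=32)
def fake_reference_query(table_name, ref_id):
    # Hypothetical stand-in for the real reference-data lookup
    global call_count
    call_count += 1
    return (table_name, ref_id)

fake_reference_query('Tbl1', 1)         # miss: body runs
fake_reference_query('Tbl1', 1)         # hit: identical positional arguments
fake_reference_query('Tbl1', ref_id=1)  # miss in CPython: keyword call builds a different key
print(fake_reference_query.cache_info())
# CacheInfo(hits=1, misses=2, maxsize=32, currsize=2)
```

So "same call" has to mean the same argument pattern, not just the same values.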

My parallel code is set up like this and calls the above functions for pre-loading:

```python
from multiprocessing.dummy import Pool  # note: multiprocessing.dummy wraps the threading module

def worker(clientFolder):
    iterateThroughFiles = IterateThroughFiles()
    iterateThroughFiles.runProcess(clientFolder)

# Pre-loading: populate the caches once before starting the pool
openExcelWithCache('File1', True)
openExcelWithCache('File1', False)
openExcelWithCache('File2', True)

returnReferenceDataWithCache('Tbl1', 1)

lsFolders = ['Folder1', 'Folder2']

pool = Pool(processes=6)
# Each folder is handed to worker() by the pool
pool.map(worker, lsFolders)
pool.close()
pool.join()
```
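`multiprocessing.dummy.Pool` runs its workers as threads inside a single process, so a module-level `lru_cache` filled before `pool.map` is the same object every worker sees (unlike a process-based `multiprocessing.Pool`, where each worker process has its own copy). A minimal sketch, with a hypothetical `load_reference` standing in for the real lookups:

```python
from functools import lru_cache
from multiprocessing.dummy import Pool  # thread pool with the multiprocessing API

@lru_cache(maxsize=32)
def load_reference(table_name, ref_id):
    # Hypothetical stand-in for an expensive lookup
    return f"{table_name}:{ref_id}"

def worker(folder):
    # Every worker requests the same, already pre-loaded entry
    return load_reference('Tbl1', 1)

load_reference('Tbl1', 1)  # pre-load: one miss

with Pool(processes=6) as pool:
    results = pool.map(worker, ['Folder1', 'Folder2'])

print(results)                      # ['Tbl1:1', 'Tbl1:1']
print(load_reference.cache_info())  # CacheInfo(hits=2, misses=1, maxsize=32, currsize=1)
```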

The IterateThroughFiles class is:

```python
class IterateThroughFiles(object):

    def runProcess(self, folder):
        openExcelWithCache('File1', True)
        print(openExcelWithCache.cache_info())  # outputs: CacheInfo(hits=1, misses=3, maxsize=32, currsize=3)
        openExcelWithCache('File1', False)
        print(openExcelWithCache.cache_info())  # outputs: CacheInfo(hits=2, misses=3, maxsize=32, currsize=3)
        openExcelWithCache('File2', True)
        print(openExcelWithCache.cache_info())  # outputs: CacheInfo(hits=3, misses=3, maxsize=32, currsize=3)

        returnReferenceDataWithCache('Tbl1', 1)
        print(returnReferenceDataWithCache.cache_info())  # outputs: CacheInfo(hits=0, misses=2, maxsize=32, currsize=2)
```

So for some reason the `openExcelWithCache` cache works as expected, but the `returnReferenceDataWithCache` one does not: the same call still causes a miss and a new cache entry is created. I checked that the type of `1` is the same in both calls, so I am not sure what is going on here.
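Checking the argument types is relevant because of `lru_cache`'s `typed` flag. A minimal sketch with a hypothetical `lookup` function: with the default `typed=False`, `1` and `1.0` compare and hash equal and so share an entry, a string `'1'` never matches `1`, and `typed=True` keeps `int` and `float` apart.

```python
from functools import lru_cache

@lru_cache(maxsize=32)
def lookup(table_name, ref_id):
    # Hypothetical lookup; only the caching behaviour matters here
    return (table_name, ref_id)

lookup('Tbl1', 1)    # miss
lookup('Tbl1', 1.0)  # hit: 1 == 1.0 and hash(1) == hash(1.0)
lookup('Tbl1', '1')  # miss: the string '1' is not equal to 1
print(lookup.cache_info())  # CacheInfo(hits=1, misses=2, maxsize=32, currsize=2)

@lru_cache(maxsize=32, typed=True)
def lookup_typed(table_name, ref_id):
    return (table_name, ref_id)

lookup_typed('Tbl1', 1)    # miss
lookup_typed('Tbl1', 1.0)  # miss: typed=True adds the argument types to the key
print(lookup_typed.cache_info())  # CacheInfo(hits=0, misses=2, maxsize=32, currsize=2)
```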

Apologies for the somewhat pseudocode-like example; let me know if anything else needs clarification.

EliSquared
  • Your processes are not sharing state, and sharing state between processes is incredibly non-trivial, especially for something like a cache. I'm not sure this is the root of the issue, but it very well could be. – juanpa.arrivillaga Nov 14 '17 at 20:47
  • Also, why is there a bare call to `pool.map()`? Pretty sure that will give you a `TypeError`. – juanpa.arrivillaga Nov 14 '17 at 20:48
  • IOW, you'll have independent caches in each process. – juanpa.arrivillaga Nov 14 '17 at 20:52
  • @juanpa.arrivillaga, I have deleted the `pool.map()` line; that was a typo. OK, so you are saying that my processes are not sharing state, but then how come the cache works as expected for the `openExcelWithCache` function? And are you saying there is no realistic solution for this, or that I need to use a different approach? – EliSquared Nov 14 '17 at 20:53
  • Maybe something like this would work: https://stackoverflow.com/a/13694262/8651755 –  Nov 14 '17 at 20:55
  • @EliSquared it's hard for me to say, your example isn't really a [mcve]. – juanpa.arrivillaga Nov 14 '17 at 20:59
