I have a Flask application where I use the flask_executor library to run code in the background while returning an immediate response to the client. One of the endpoints in my application encounters a RuntimeError with the following message:
RuntimeError: cannot schedule new futures after interpreter shutdown.
Here are the key details of my situation:
The application is built on Flask, and the endpoint in question uses the flask_executor library to run code asynchronously. The error is raised from the submit method of a ThreadPoolExecutor that PyArrow uses internally while converting a pandas DataFrame in the background thread. The Parquet file is written with pandas' to_parquet, which delegates to PyArrow. The relevant code snippet is as follows:
from flask import Flask
from flask_executor import Executor

app = Flask(__name__)
executor = Executor(app)

@app.route('/my_endpoint', methods=['POST'])
def my_endpoint():
    # ... other code ...
    # Run the work asynchronously via flask_executor
    future = executor.submit(my_background_function, args)
    # ... other code ...
    return 'Immediate response to the client'

def my_background_function(args):
    for account in accounts:
        my_background_save_function(account)
    # ... more background code ...

def my_background_save_function(args):
    # ... code running in the background ...
    # Writing to a Parquet file (data is a pandas DataFrame)
    data.to_parquet(parquet_file)
    # ... more background code ...
I have read that this error can stem from PyArrow scheduling work on its own internal thread pool during the pandas-to-Arrow conversion, which fails once the interpreter (or its thread pool) has begun shutting down. I have tried various solutions, but I am still encountering the RuntimeError.
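As I understand it, the underlying guard is in concurrent.futures itself: an executor refuses new work once shutdown has begun. The same class of failure can be reproduced with the standard library alone, independent of PyArrow (a minimal repro, not my application code):

```python
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=1)
pool.shutdown(wait=True)  # after this, the pool accepts no new work

try:
    pool.submit(print, "too late")
except RuntimeError as exc:
    # Same guard that fires in my traceback; at interpreter shutdown the
    # message reads "... after interpreter shutdown" instead.
    print(exc)  # cannot schedule new futures after shutdown
```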
error trace:

Traceback (most recent call last):
    data.to_parquet(parquet_buffer)
  File "/usr/local/lib/python3.9/site-packages/pandas/core/frame.py", line 2889, in to_parquet
    return to_parquet(
  File "/usr/local/lib/python3.9/site-packages/pandas/io/parquet.py", line 411, in to_parquet
    impl.write(
  File "/usr/local/lib/python3.9/site-packages/pandas/io/parquet.py", line 159, in write
    table = self.api.Table.from_pandas(df, **from_pandas_kwargs)
  File "pyarrow/table.pxi", line 3681, in pyarrow.lib.Table.from_pandas
  File "/usr/local/lib/python3.9/site-packages/pyarrow/pandas_compat.py", line 620, in dataframe_to_arrays
    arrays.append(executor.submit(convert_column, c, f))
  File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 169, in submit
RuntimeError: cannot schedule new futures after interpreter shutdown
Is there a recommended approach to avoid the "cannot schedule new futures after interpreter shutdown" error when using Flask, flask_executor, and PyArrow's Parquet file writing together? How can I ensure the smooth execution of the Parquet writing process in the background while maintaining an immediate response to the client?
Any insights, suggestions, or alternative solutions would be greatly appreciated.
Thank you.