
We are starting a new project in our company where we basically run a few Python scripts for each client, twice a day. The idea is that twice a day a Cloud Function will be triggered, and the function will run the Python script for each client by creating new instances of App Engine / Cloud Run or any other serverless service Google offers.

At the beginning we thought of using Cloud Functions, but we quickly found out they are not suited for long-running Python scripts. Each script eventually calculates and collects different information for its client and writes it to Firebase.

The flow of the process would be: Cloud Function triggered -> function triggers a GCP instance for each client -> script runs for each client -> output is saved to Firebase.

What would be the recommended way to do this without a dedicated server? Which GCP serverless services would fit best?

ia244
  • Do you run the script for each client in parallel? How long does each script take per client? – guillaume blaquiere Jan 22 '22 at 14:26
  • Hey @guillaumeblaquiere, the duration of each script is around 15 minutes per client. We run one script per client, so each client will have its own instance of the 15-minute script – ia244 Jan 24 '22 at 17:34

3 Answers


You can execute "long" running Google App Engine (GAE) Tasks using Cloud Tasks.

How long (which is why I have it in quotes) depends on the kind of scaling you are using for your GAE project's instances. Instances set to 'automatic scaling' are limited to a maximum of 10 minutes, while instances set to 'manual' or 'basic' scaling have up to 24 hours of execution time.

From the earlier link

....all workers must send an HTTP response code (200-299) to the Cloud Tasks service, in this instance before a deadline based on the instance scaling type of the service: 10 minutes for automatic scaling or up to 24 hours for manual scaling. If a different response is sent, or no response, the task is retried....

Adding an update (there seems to be some confusion between 30 minutes vs 24 hours):

Standard HTTP requests have a maximum execution time of 30 minutes (source), while App Engine endpoints can run for up to 24 hours if you're using manual scaling (source).
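As a minimal sketch of what such a task looks like, here is the JSON body of a Cloud Tasks `tasks.create` call targeting an App Engine endpoint. The handler path and client ID below are made-up placeholders, not something from the question:

```python
import base64

def build_gae_task(relative_uri: str, client_id: str) -> dict:
    """Build the REST body for a Cloud Tasks task targeting a GAE handler.

    With manual or basic scaling on the target service, the handler may
    run for up to 24 hours before Cloud Tasks considers it failed.
    """
    return {
        "task": {
            "appEngineHttpRequest": {
                "httpMethod": "POST",
                "relativeUri": relative_uri,  # hypothetical handler path
                # The REST API expects the request body as a base64 string
                "body": base64.b64encode(client_id.encode()).decode(),
            }
        }
    }

task = build_gae_task("/run-client-script", "client-123")
```

In a real project this dict would be POSTed to the queue's `tasks.create` endpoint, or built with the `google-cloud-tasks` client library instead.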

NoCommandLine
  • From my understanding (I might be completely wrong) Cloud Run will achieve the exact same thing, won't it? In that case, why use one over the other? – ia244 Jan 21 '22 at 20:45
  • Maybe. I'm not as familiar with Cloud Run as I am with GAE (we built an App for GAE). If you can achieve the same thing with Cloud Run, then go for it. If Cloud Run executes long running tasks, it might end up being cheaper than GAE since GAE requires manual scaling for anything > 10 minutes which means the instance won't go down to 0 when it is not servicing any request. Cloud Run goes to 0 when it is not servicing any request. – NoCommandLine Jan 21 '22 at 23:36
  • Also, if I understand correctly, Cloud Tasks has a maximum timeout of 30 minutes. Does that mean that if my long-running GAE task doesn't finish within those 30 minutes, Cloud Tasks will retry and re-run it? – ia244 Jan 24 '22 at 17:48
  • No. 30 minutes is for 'standard' HTTP requests. Requests with an App Engine endpoint can run for up to 24 hours - https://cloud.google.com/tasks/docs/dual-overview#appe updated my answer to provide source – NoCommandLine Jan 24 '22 at 20:09

@NoCommandLine's answer is a good recommendation, and Cloud Run is also a good option if you want longer-running operations: the request timeout can be set between 5 minutes (the default) and 60 minutes. You can set or update the request timeout through the Cloud Console, the command line, or YAML.
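For example, the 60-minute maximum could be set in the service's Knative YAML like this (the service name is a placeholder, not from the question):

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: client-script          # hypothetical service name
spec:
  template:
    spec:
      timeoutSeconds: 3600     # 60 minutes, the Cloud Run maximum
```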

Meanwhile, the execution time for Cloud Functions is only 1 minute by default and can be set to a maximum of 9 minutes.

You can check out the full documentation for both services.

You can also check a related SO question through this link.

Robert G
  • Thank you for your answer. We find Cloud Run to fit most of our requirements; for now we will set up the script in both GAE and Cloud Run so we can compare them. Hopefully I will share my answer here. – ia244 Jan 24 '22 at 17:35
  • Be careful: Cloud Tasks has a timeout of 30 minutes even if Cloud Run accepts a longer timeout. But it's enough here. – guillaume blaquiere Jan 24 '22 at 19:15

There are a lot of great answers already! The key here is to decouple and to distribute the processing.

For the decoupling you can use Cloud Tasks (which gives you flow control, with rate limits and the ability to postpone a task into the future) or Pub/Sub (a simpler message-queueing solution).

And Cloud Run is the right fit for running the 15 minutes of processing. But you will have to fine-tune it (see my tips below).

So, to summarize the process:

  • You have to trigger a Cloud Function twice a day. You can use Cloud Scheduler for that.
  • The triggered Cloud Function gets the list of clients (from a database?) and, for each client, creates a task in Cloud Tasks (or a message in Pub/Sub).
  • Each task (or message) calls an HTTP endpoint on Cloud Run that performs the processing for that client. Set the timeout to 30 minutes on Cloud Run.
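The fan-out step above could be sketched like this. The Cloud Run URL is hypothetical, and a real version would pass each dict to `CloudTasksClient.create_task` instead of just returning them:

```python
import json

# Hypothetical Cloud Run service URL; in practice read it from config.
SERVICE_URL = "https://client-script-xyz.a.run.app"

def fan_out(clients: list[str], endpoint: str = "/process") -> list[dict]:
    """Return one Cloud Tasks task definition per client."""
    return [
        {
            "http_request": {
                "http_method": "POST",
                "url": f"{SERVICE_URL}{endpoint}",
                "body": json.dumps({"client_id": c}),
            },
            # Match the 30-minute Cloud Run timeout mentioned above
            "dispatch_deadline": {"seconds": 1800},
        }
        for c in clients
    ]

tasks = fan_out(["acme", "globex"])
```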

However, if your processing is compute-intensive, you have to tune Cloud Run. If the processing takes 15 minutes for 1 client on 1 vCPU, that means you can't process more than 1 client per CPU without risking the timeout (2 clients on the same CPU can take about 30 minutes combined and hit the timeout). For that, I recommend setting Cloud Run's concurrency parameter to 1, to process only one request at a time. (Of course, if you set 2 or 4 CPUs on Cloud Run, you can also increase the concurrency parameter to 2 or 4 to allow parallel processing on the same instance, but on different CPUs.)
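In `gcloud` terms, that tuning would look roughly like this (the service name is a placeholder):

```shell
gcloud run services update client-script \
    --cpu=1 \
    --concurrency=1 \
    --timeout=1800
```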

If the processing is not CPU-intensive (you make API calls and wait for the answers), it's harder to say. Try a concurrency of 5, 10, 30, ... and observe the behaviour/latency of the processed requests. No worries: with Cloud Tasks and Pub/Sub you can set retry policies in case of timeout.

One last thing: is your processing idempotent? I mean, if you run the same process twice for the same client, is the result still correct, or is it a problem? Try to make the solution idempotent, to be resilient to retries and, more generally, to the issues that can happen in distributed computing (including replays).
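A tiny sketch of what idempotent output writing means here, with a plain dict standing in for Firebase (the client ID and result are made up):

```python
def save_result(store: dict, client_id: str, result: dict) -> None:
    """Keyed write (like a Firestore document set): last run wins.

    Because each client's result lives under a fixed key, a Cloud Tasks
    retry or a replayed Pub/Sub message overwrites the previous result
    instead of appending a duplicate.
    """
    store[client_id] = result

store = {}
save_result(store, "acme", {"score": 42})
save_result(store, "acme", {"score": 42})  # replayed task: same final state
```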

guillaume blaquiere
  • 66,369
  • 2
  • 47
  • 76
  • Thank you very much for your detailed answer; I think this one covers pretty much everything. As for your question, there is no issue if the task runs twice, as the results will just correct the ones already inserted in the database. Also, the task is not CPU-intensive, it is "API intensive": basically waiting for answers from different sources. – ia244 Jan 25 '22 at 20:09