I recently started playing around with AWS ParallelCluster and noticed that when I submit a job that requires more instances than are currently available in my region/AZ, the instances that are available are launched and then sit idle until the remaining instances become available. This can sometimes take a very long time. Slurm reports the following in /var/log/parallelcluster/slurm_resume.log:
ERROR - Error in CreateFleet request (...): InsufficientInstanceCapacity - We currently do not have sufficient c6i.metal capacity in the Availability Zone you requested (us-east-1a)
The problem is that I still pay for the nodes that are up and waiting. Is there a way to cancel the job after a certain timeout instead, so that I can try again later?