
We have the following scenario:

  1. A file is uploaded to S3, which triggers a Lambda function and passes it the bucket name and file key.
  2. The Lambda function downloads the file, reads its content, and uses it to update records in a database.

The problem is that most of the files uploaded to S3 are too large (>3 GB) for the Lambda function to read at once, so we're aiming for a stream-like approach: the file is read in parts, each part is processed and then discarded, and the whole file is never downloaded or kept in memory or on disk.
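To illustrate, this is a minimal sketch of the kind of streaming we're after, using GetObject from aws-sdk-go (v1) and reading the response body in fixed-size chunks. The bucket, key, and handleChunk callback are placeholders for our real event plumbing and record updates:

    package main

    import (
        "fmt"
        "io"
        "os"

        "github.com/aws/aws-sdk-go/aws"
        "github.com/aws/aws-sdk-go/aws/session"
        "github.com/aws/aws-sdk-go/service/s3"
    )

    // streamObject reads the object body sequentially, one buffer at a time,
    // so the whole file is never held in memory or written to disk.
    func streamObject(svc *s3.S3, bucket, key string, handleChunk func([]byte) error) error {
        out, err := svc.GetObject(&s3.GetObjectInput{
            Bucket: aws.String(bucket),
            Key:    aws.String(key),
        })
        if err != nil {
            return err
        }
        defer out.Body.Close()

        buf := make([]byte, 1<<20) // read 1 MiB at a time
        for {
            n, rerr := out.Body.Read(buf)
            if n > 0 {
                if err := handleChunk(buf[:n]); err != nil {
                    return err
                }
            }
            if rerr == io.EOF {
                return nil
            }
            if rerr != nil {
                return rerr
            }
        }
    }

    func main() {
        svc := s3.New(session.Must(session.NewSession()))
        // Placeholder bucket/key; in the real handler these come from the S3 event.
        err := streamObject(svc, "my-bucket", "my-key", func(part []byte) error {
            fmt.Printf("processing %d bytes\n", len(part)) // stand-in for the DB update
            return nil
        })
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
        }
    }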

We've come across this implementation, where the downloaded chunks are written to your own pipe. However, the process keeps failing with a signal: killed error. We know this could be due to the Lambda function running out of memory (as pointed out here), and we're still trying to make it work. However, we're not sure this is the best approach, as we're somewhat "tricking" the aws-sdk-go package. If someone has done something similar and can point us in the right direction, it would be very helpful.
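For reference, the pattern we understand from that implementation looks roughly like the sketch below (not the exact linked code; the part size and the placeholder bucket/key are ours): one end of an io.Pipe is wrapped in a type that satisfies io.WriterAt, and the downloader is forced to write parts in order.

    package main

    import (
        "fmt"
        "io"

        "github.com/aws/aws-sdk-go/aws"
        "github.com/aws/aws-sdk-go/aws/session"
        "github.com/aws/aws-sdk-go/service/s3"
        "github.com/aws/aws-sdk-go/service/s3/s3manager"
    )

    // sequentialWriterAt adapts an io.Writer to io.WriterAt by ignoring the
    // offset. This is only safe when parts arrive in order, i.e. Concurrency = 1.
    type sequentialWriterAt struct{ w io.Writer }

    func (s sequentialWriterAt) WriteAt(p []byte, _ int64) (int, error) {
        return s.w.Write(p)
    }

    // streamViaPipe downloads the object part by part and exposes the bytes
    // through an io.Reader, so only one part is in flight at a time.
    func streamViaPipe(bucket, key string) io.ReadCloser {
        sess := session.Must(session.NewSession())
        downloader := s3manager.NewDownloader(sess, func(d *s3manager.Downloader) {
            d.Concurrency = 1            // keep parts in order for the pipe
            d.PartSize = 5 * 1024 * 1024 // 5 MiB parts
        })

        pr, pw := io.Pipe()
        go func() {
            _, err := downloader.Download(sequentialWriterAt{w: pw}, &s3.GetObjectInput{
                Bucket: aws.String(bucket),
                Key:    aws.String(key),
            })
            pw.CloseWithError(err) // propagate any download error to the reader
        }()
        return pr
    }

    func main() {
        body := streamViaPipe("my-bucket", "my-key") // placeholder bucket/key
        defer body.Close()
        n, err := io.Copy(io.Discard, body) // replace with real per-part processing
        fmt.Println(n, err)
    }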

We have also searched the official S3 documentation but haven't found anything; we can only find how to download the whole file at once.

Thanks in advance!

We're using Go 1.19.5.

Jairo Lozano
  • Note that your client can issue ranged GETs to S3, so your initial Lambda function could, for example, initiate a Step Functions workflow, running multiple Lambda functions in parallel against ranges of bytes in the S3 object. Alternatively, create a workflow that launches EC2, downloads or streams the entire object, then self-terminates. – jarmod Feb 20 '23 at 17:22

1 Answer


S3 supports byte-range requests.

The downloadRange function in aws-sdk-go looks like a Go implementation of this.
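For example, a single byte range can be fetched by setting the Range field on GetObjectInput. A minimal sketch with aws-sdk-go (v1); the bucket, key, and byte range are placeholders:

    package main

    import (
        "fmt"
        "io"
        "os"

        "github.com/aws/aws-sdk-go/aws"
        "github.com/aws/aws-sdk-go/aws/session"
        "github.com/aws/aws-sdk-go/service/s3"
    )

    func main() {
        svc := s3.New(session.Must(session.NewSession()))

        // Request only the first 64 MiB; the Range header uses standard HTTP
        // byte-range syntax and both ends are inclusive.
        out, err := svc.GetObject(&s3.GetObjectInput{
            Bucket: aws.String("my-bucket"), // placeholder
            Key:    aws.String("my-key"),    // placeholder
            Range:  aws.String("bytes=0-67108863"),
        })
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
        defer out.Body.Close()

        n, err := io.Copy(io.Discard, out.Body) // replace with real processing
        fmt.Printf("read %d bytes from range (err=%v)\n", n, err)
    }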

Theofilos Papapanagiotou