I have a systems design challenge that I would like to get some community feedback on.
Basic system structure:
[Client] ---HTTP-POST--> [REST Service] ---> [Queue] ---> [Processors]
- [Client] POSTs json to [REST Service] for processing.
- Based on the request, the [REST Service] sends data to various queues, where it is picked up by processors written in various languages and running in different processes.
- Work is parallelized in each processor but can still take up to 30 seconds. The processing time is a function of the complexity of the data and cannot be sped up.
- The result cannot be streamed back to the client as it is completed because there is a final post processing step that can only be completed once all the sub steps are completed.
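
To make the fan-in constraint in the last bullet concrete, here is a minimal sketch in Python (not the real stack). The names `JobAggregator`, `start`, `record` and `post_process` are made up for illustration, and the merge step is only a placeholder for the real post-processing:

```python
from dataclasses import dataclass, field


def post_process(parts):
    # Placeholder for the real final step: merge results in step-name order.
    return [parts[name] for name in sorted(parts)]


@dataclass
class JobAggregator:
    """Collects sub-step results per job; post-processes once all have arrived."""

    expected: dict = field(default_factory=dict)  # job_id -> step names still awaited
    partials: dict = field(default_factory=dict)  # job_id -> {step name: result}

    def start(self, job_id, step_names):
        self.expected[job_id] = set(step_names)
        self.partials[job_id] = {}

    def record(self, job_id, step_name, result):
        """Store one sub-step result; return the final result only when complete."""
        self.partials[job_id][step_name] = result
        self.expected[job_id].discard(step_name)
        if self.expected[job_id]:
            return None  # still waiting on other processors
        del self.expected[job_id]
        return post_process(self.partials.pop(job_id))
```

Nothing can be returned to the client until `record` has seen every sub-step, which is why streaming partial results is not an option here.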
Key challenge: Once the post processing is complete, the client either needs to:
- be sent the results after the client has been waiting
- be notified async that the job is completed and passed an id to request the final result
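
As a rough illustration of the second option, the sketch below hands out a job id on submission and lets the client ask for the result later. It is only a sketch under assumptions: `JobStore` and its methods are hypothetical names, and the in-memory dict stands in for whatever shared store all [REST Service] instances could reach.

```python
import uuid


class JobStore:
    """In-memory stand-in for a store shared by all REST service instances."""

    def __init__(self):
        self._results = {}  # job_id -> result, or None while still processing

    def submit(self):
        """Register a new job and return the id the client would be given."""
        job_id = str(uuid.uuid4())
        self._results[job_id] = None
        return job_id

    def complete(self, job_id, result):
        """Called once post-processing has produced the final result."""
        self._results[job_id] = result

    def status(self, job_id):
        """What a 'get result by id' request would return to the client."""
        if job_id not in self._results:
            return {"state": "unknown"}
        result = self._results[job_id]
        if result is None:
            return {"state": "pending"}
        return {"state": "done", "result": result}
```

With this shape, any [REST Service] instance can answer the follow-up request, because the correlation lives in the job id rather than in a particular process. This simplified version treats `None` as "not finished", so a real result of `None` would need a separate state flag.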
Design requirements
I don't want to block the [REST Service]. It needs to take the incoming request, route the data to the appropriate queues for processing in other processes, and then be immediately available for the next incoming request.
Normally I would have used actors and/or futures/promises so the [REST Service] is not blocked while waiting for background workers to complete. The challenge here is that the workers doing the background work are running in separate processes/VMs and are written in various technology stacks. In order to pass these messages between heterogeneous systems and to ensure the integrity of the request lifetime, a durable queue is being used (not in-memory message passing or RPC).
One final consideration: in order to scale, there is a load-balanced pool of [REST Services] and a load-balanced pool of [Processors]. Since messages from the [REST Service] to the [Processor] have to be sent asynchronously via a queue (and everything is running in separate processes), there is no obvious way to correlate the work done in a background [Processor] back to the [REST Service] instance that originally accepted the request, so that the final processed data can be returned in a promise or actor message and passed back to the original client.
So the question is: how do I make this correlation? Once all the background processing is completed, I need to get the result back to the client, either via a response the client has been waiting on or via a notification (I do not want to use something like UrbanAirship, as most of the clients are browsers or other services).
I hope this is clear; if not, please ask for clarification.
Edit: Possible solution - thoughts?
I think I can pass a spray RequestContext to any actor, which can then respond to the client (it does not have to be the original actor that received the HTTP request). If this is true, can I cache the RequestContext and use it later to asynchronously send the response to the appropriate client once the processing is completed?
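
To show the shape of what I mean, here is a small Python sketch rather than actual spray code. An `asyncio` future plays the role of the cached RequestContext, and `PendingRequests`, `handle_request` and `demo` are hypothetical names. It also assumes the result message is delivered to the same [REST Service] instance that holds the cached entry (for example via a per-instance reply queue), which is exactly the part I have not solved above.

```python
import asyncio


class PendingRequests:
    """Per-instance cache of open requests, keyed by correlation id."""

    def __init__(self):
        self._pending = {}  # correlation_id -> Future awaiting the final result

    def register(self, correlation_id):
        future = asyncio.get_running_loop().create_future()
        self._pending[correlation_id] = future
        return future

    def resolve(self, correlation_id, result):
        """Called when a result message arrives; returns False if nobody is waiting."""
        future = self._pending.pop(correlation_id, None)
        if future is None or future.done():
            return False
        future.set_result(result)
        return True

    def discard(self, correlation_id):
        self._pending.pop(correlation_id, None)


async def handle_request(pending, correlation_id, timeout=60.0):
    """Park the request without blocking the event loop until its result arrives."""
    future = pending.register(correlation_id)
    try:
        return await asyncio.wait_for(future, timeout)
    finally:
        pending.discard(correlation_id)  # drop the entry on timeout or cancellation


async def demo():
    pending = PendingRequests()
    waiter = asyncio.create_task(handle_request(pending, "req-1"))
    await asyncio.sleep(0)  # let the handler register itself
    pending.resolve("req-1", {"status": "done"})  # simulated result message
    return await waiter
```

Running `asyncio.run(demo())` completes the parked request with the simulated result. The timeout matters because a cached entry would otherwise leak if a [Processor] never replies.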