I'm evaluating Amazon SWF as an option for building a distributed workflow system. The main language will be Java, so the Flow framework is an obvious choice. There's just one thing that keeps puzzling me, and I would like to get some other opinions before I recommend it as a key component in our architecture:
The examples are all about tasks that produce a result after a deterministic, relatively short period of time (i.e. after a few minutes). In our real-life business workflow, things look different: we have tasks that could potentially take weeks to complete. I already checked the calculator, and workflows that live for 30 days or so do not lead to a cost explosion, so it seems that possibility has been accounted for.
Has anybody used SWF for a scenario like this, and can you share your experience? Are there any recommendations or best practices for designing a workflow like this? Is Flow the right choice here?
It seems to me that Activity implementations are expected to eventually return a value synchronously. For long-running transactions, however, we would rather use messages to send worker results back asynchronously.
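To make clear what I mean by asynchronous completion, here is a minimal plain-Java sketch of the pattern I have in mind. It does not use the SWF or Flow API at all; `PendingActivities` and its methods are hypothetical names invented for illustration. The idea is that starting a task returns a correlation token right away, and the result is delivered later by whichever component receives the worker's message.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical illustration only; none of these names come from the SWF SDK. */
class PendingActivities {
    // Token -> placeholder for the result of work that has started but not finished.
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    /** Registers a long-running task and returns immediately with a correlation token. */
    String start() {
        String token = UUID.randomUUID().toString();
        pending.put(token, new CompletableFuture<>());
        return token;
    }

    /** The future the workflow side would wait on; null if the token is unknown. */
    CompletableFuture<String> resultFor(String token) {
        return pending.get(token);
    }

    /** Called when the worker's result message arrives, possibly weeks later. */
    boolean complete(String token, String result) {
        CompletableFuture<String> future = pending.remove(token);
        return future != null && future.complete(result);
    }
}
```

In this sketch, the thread that calls `start()` never blocks on the task itself; what I'm unsure about is whether Flow supports an equivalent of `complete(token, result)` for its Activities.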
Any helpful feedback is appreciated.