3

I am building a distributed application on top of Java and Cassandra. To generate unique sequential 32bit & 64 bit IDs, is an approach like using Flickr's ticket servers to generate primary IDs, a good one? I am particularly excited about this as it can help me reduce the size of the IDs to 32 bits or 64 bits as required, which otherwise may go up to 128 bits with UUIDs. I do not want these IDs to be perfectly sequential, but increasing at least!

Using a single database server may however introduce a single point of failure that was eliminated by Cassandra. However this may be OK for the initial stage of our application. Later we may introduce two servers to alleviate those problems.

Does this sound like a good strategy? In short, we are mixing MYSQL and Cassandra in one application. I know, if mySQL is down for some reason then we can't go ahead with Cassandra alone.

We have looked to other solutions like snowflake however it did not perfectly matched our requirements.

EDIT : I am seeking advice on whether using MySQL for generating unique primary IDs to key the data/ entities stored in Cassandra database is a good approach. What are the downsides, if any, of an approach like Flickr's Ticket servers?

bobs
  • 21,844
  • 12
  • 67
  • 78
Rajat Gupta
  • 25,853
  • 63
  • 179
  • 294
  • I think there's a question here, but it's difficult to see exactly what you're asking. Please clarify – Cfreak Feb 28 '11 at 16:49
  • dupe: http://stackoverflow.com/questions/3935915/how-to-create-auto-increment-ids-in-cassandra (but with no solution, just for reference) – Unreason Feb 28 '11 at 17:58
  • I checked it out.. but no use since contains no answer – Rajat Gupta Feb 28 '11 at 17:59
  • introducing single point of failure indeed messes up the main advantages of distributed system. so what are your requirements? why didn't snowflake match them? why is it important for IDs to be sequential? – Unreason Feb 28 '11 at 18:01
  • I need to generate Ids 32bits & 64bits that are roughly sorted in increasing order. I also want some free bits(around 4 in 64 bit Id) to split the data for a single entity in two rows according to the category of data. I would append some extra bits at the end of Id hence some free bits are required. – Rajat Gupta Feb 28 '11 at 18:15
  • I do not however demand perfectly sequential ids, necessarily. – Rajat Gupta Feb 28 '11 at 18:21
  • And... The single point of failure can be removed by introducing more servers ofcourse as we move to a later stage, but is not neccessary for the initial stages – Rajat Gupta Feb 28 '11 at 18:28

2 Answers2

3

I'm not a big fan of trying to attach meaning to surrogate keys (which you're trying to do if you want them to increase over time). As you're seeing, it makes your problem of generating keys more complicated. Assuming that you want the keys to increase over time simply so that you can sort data, why not include a timestamp of when the object was created and store that in your data store? This simplifies the key generation significantly and allows you to do pretty much everything you could do with keys that increase over time, with the added bonus of the fact it will be crystal clear to whoever has to maintain your code how objects should be sorted.

nojo
  • 1,065
  • 6
  • 12
  • I'll add that the same goes for inserting a few bits of meaningful data in your keys - why not have separate fields for those few bits of meaningful data? – nojo Feb 28 '11 at 21:21
1

In general, you can't have both "always increasing" and "no SPOF and no complex synchronization".

If you want to have several ID-generators which do not have to ask each other on every new ID, each of them really need a separate ID-pool.

A really simple example is mentioned in the article linked by you, where one server creates odd ones while the other one create even ones. (You can expand this to more servers trivially). Of course, then you can't be sure that one server doesn't run ahead of the other, which can lead to a non-increasing sequence like 111, 120, 113, 122, 115, 124 ...

If you only want "roughly increasing", you can implement a scheme where each server in some intervals (like each minute or each 10000 IDs) tells the other one(s) his current ID, and the other one then jumps its own ID (only forward) if he hangs back too far. This should be done in a way which does not interrupt the ID-generation, for robustness if the other server is down.


Ah, for the "free bits at the end", simply multiply your ID by some number (the same one each time, and a power of two if you really want "free bits" and not only "space for data"), then add your data (which should be less than number). But of course then you'll run out of ID space quite a bit earlier (by factor number).

Paŭlo Ebermann
  • 73,284
  • 20
  • 146
  • 210