From How is data deleted? in the Cassandra documentation:
Cassandra allows you to set a default_time_to_live property for an entire table. Columns and rows marked with regular TTLs are processed as described above; but when a record exceeds the table-level TTL, Cassandra deletes it immediately, without tombstoning or compaction.
This is also answered here:
If a table has default_time_to_live on it then rows that exceed this time limit are deleted immediately without tombstones being written.
And it is commented on in The Last Pickle's post About deletes and tombstones:
Another clue to explore would be to use the TTL as a default value if that's a good fit. TTLs set at the table level with 'default_time_to_live' should not generate any tombstone at all in C*3.0+. Not tested on my hand, but I read about this.
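As I understand the distinction being drawn, a "regular" TTL is one set on the write itself with USING TTL, while the table-level one is the default_time_to_live table property; roughly (the table name here is only for illustration):
INSERT INTO some_table (key, value) VALUES ('k1', 'v1') USING TTL 180;
ALTER TABLE some_table WITH default_time_to_live = 180;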
I've made the simplest test that I could imagine using LeveledCompactionStrategy:
CREATE KEYSPACE IF NOT EXISTS temp WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};
CREATE TABLE IF NOT EXISTS temp.test_ttl (
    key text,
    value text,
    PRIMARY KEY (key)
) WITH compaction = { 'class': 'LeveledCompactionStrategy' }
  AND default_time_to_live = 180;
INSERT INTO temp.test_ttl (key,value) VALUES ('k1','v1');
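As a sanity check (assuming the table default shows up as a regular per-cell TTL), the remaining TTL can be queried right after the insert:
SELECT key, value, TTL(value) FROM temp.test_ttl WHERE key = 'k1';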
nodetool flush temp
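(sstabledump has to be pointed at the sstable file in the table's data directory; on a default install that is something like /var/lib/cassandra/data/temp/test_ttl-<table id>/, where the exact directory name depends on the generated table id.)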
sstabledump mc-1-big-Data.db
- wait for 180 seconds (default_time_to_live)
sstabledump mc-1-big-Data.db
The tombstone isn't created yet
nodetool compact temp
sstabledump mc-2-big-Data.db
The tombstone is created (and not dropped on compaction due to gc_grace_seconds)
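Presumably the tombstone can only be dropped once it is older than gc_grace_seconds at compaction time; a follow-up I haven't run would be to lower that value and compact again (the new sstable generation number is a guess):
ALTER TABLE temp.test_ttl WITH gc_grace_seconds = 0;
nodetool compact temp
sstabledump mc-3-big-Data.db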
The test was performed using Apache Cassandra 3.0.13.
From this example I conclude that the claim that default_time_to_live does not require tombstones is not true, at least for version 3.0.13.
However, this is a very simple test, and I'm forcing a major compaction with nodetool compact, so I may not be recreating the scenario where the default_time_to_live magic comes into play.
But how would C* delete data without tombstones? And why would this be a different scenario from using a TTL per insert?