
I have an NDB model. Once data in the model becomes stale, I want to exclude the stale items from searches and updates. I could simply delete them, as explained in this SO post, but I need to keep the old data for later analysis. I see two choices:

  1. add a boolean status field and simply mark entities as deleted
  2. move entities to a different model

My understanding of the trade-off between these two options:

  1. mark-deleted is faster
  2. mark-deleted is more error-prone: an extra column means every query has to be modified to exclude entities marked as deleted, which increases complexity and the probability of bugs.

Question: Can the move-entities option be made fast enough to be comparable to mark-deleted? Is there any sample code showing how to move entities between models efficiently?

Update 2014-05-14: I decided to use mark-deleted for the time being. I figure it has the additional benefit of fewer RPCs.

Related:

  1. How to delete all entities for NDB Model in Google App Engine for python?
Edited by Community. Asked by Michael.
I think I would implement the same solution if I had the same need. Adding a boolean to mark obsolete entities is a nice solution. If you use a cron task to clean your datastore, you can purge obsolete entities at a chosen interval. – olituks May 23 '14 at 06:28

1 Answer


You can use a combination of the solutions you propose, although in my opinion that is over-engineering.

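1) First, write a task queue job that updates all of your existing entities with the new field is_deleted set to False. Entities written before the field existed have no stored value for it, and the datastore leaves entities that lack a property out of queries that filter on that property, so without this backfill they would not be returned by a query on is_deleted = False.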

2) Write your queries at the model level, so you don't have to alter them every time you change your model; just pass the extra parameter you want to filter on when you make the relevant query. You can get an idea from the models in the bootstrap project gae-init. Then query with is_deleted = False.

3) Datastore (built on BigTable) query performance scales with the size of the result set, not with the number of stored entities, so querying among 10 entities or 10 million makes no real difference. If you still want to move the deleted ones into a new model, you can create a cron job that, at the end of each day or so, copies them somewhere else and removes the originals. Don't forget that this uses your quota, so you might end up literally paying for the clean-up.
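A rough sketch of the nightly move from step 3 follows. It is not real NDB code: to keep it runnable anywhere, two plain dicts stand in for the live model and a hypothetical archive model, and `archive_deleted` and `batch_size` are illustrative names. In NDB, each batch would roughly correspond to one `ndb.put_multi` call for the archive copies followed by one `ndb.delete_multi` call for the original keys, so the RPC count grows with the number of batches rather than the number of entities.

```python
def archive_deleted(live, archive, batch_size=100):
    """Move entities flagged is_deleted from `live` to `archive` in batches.

    `live` and `archive` are dicts mapping key -> property dict. They are a
    hypothetical in-memory stand-in for the datastore, for illustration only.
    Returns the number of entities moved.
    """
    # Snapshot the keys to move first; deleting from `live` while iterating it would fail.
    flagged = [key for key, props in live.items() if props.get("is_deleted")]
    moved = 0
    for start in range(0, len(flagged), batch_size):
        batch = flagged[start:start + batch_size]
        # Write the copies first (one put_multi per batch in NDB).
        for key in batch:
            archive[key] = dict(live[key])
        # Delete the originals only after the copies exist (one delete_multi
        # per batch in NDB), so a crash in between leaves duplicates, not data loss.
        for key in batch:
            del live[key]
        moved += len(batch)
    return moved


live = {
    "a": {"name": "old report", "is_deleted": True},
    "b": {"name": "current report", "is_deleted": False},
    "c": {"name": "older report", "is_deleted": True},
}
archive = {}
print(archive_deleted(live, archive, batch_size=1))  # 2
print(sorted(live), sorted(archive))  # ['b'] ['a', 'c']
```

The sketch reuses the original keys for simplicity. In NDB the archived copy would belong to a different kind and therefore get a different key, which is why anything that references a moved entity has to be updated.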

Also keep in mind that if other entities depend on the ones you move (for example, by storing their keys), you will have to update them as well. So in my opinion it's better to leave them flagged and index the flag.
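If you keep entities flagged, steps 1 and 2 above can be sketched as follows. This uses plain Python objects instead of NDB so it runs anywhere; `Entity`, `backfill_is_deleted` and `query` are hypothetical names. In an actual NDB model the flag would be declared as `is_deleted = ndb.BooleanProperty(default=False)` (NDB indexes it by default), and the helper would be a classmethod that starts from `cls.query(cls.is_deleted == False)` and adds the caller's filters.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entity:
    # Hypothetical stand-in for an NDB entity. Entities written before the
    # flag existed have no stored value for it, modelled here as None.
    name: str
    is_deleted: Optional[bool] = None


def backfill_is_deleted(entities: List[Entity]) -> int:
    """Step 1: give every pre-existing entity an explicit is_deleted=False.

    Returns how many entities were updated. In NDB this would run in a task,
    writing each batch back with ndb.put_multi.
    """
    updated = 0
    for entity in entities:
        if entity.is_deleted is None:
            entity.is_deleted = False
            updated += 1
    return updated


def query(entities, include_deleted=False, **filters):
    """Step 2: the single place where queries are built.

    Callers never repeat the is_deleted filter; they pass include_deleted=True
    only when analysing old data. Entities with no stored flag (None) are
    skipped, mirroring how the datastore omits entities that lack a filtered property.
    """
    results = []
    for entity in entities:
        if not include_deleted and entity.is_deleted is not False:
            continue
        if all(getattr(entity, field) == value for field, value in filters.items()):
            results.append(entity)
    return results


store = [Entity("legacy"), Entity("fresh", is_deleted=False), Entity("stale", is_deleted=True)]
print([e.name for e in query(store)])  # ['fresh'] (legacy is missed before the backfill)
print(backfill_is_deleted(store))  # 1
print([e.name for e in query(store)])  # ['legacy', 'fresh']
print([e.name for e in query(store, include_deleted=True)])  # ['legacy', 'fresh', 'stale']
```

Keeping the flag check inside one helper addresses the error-prone part of mark-deleted raised in the question: only the helper knows about `is_deleted`, so individual queries cannot forget it.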

Edited by Community. Answered by topless.