I have the following class to keep my records:

class List(ndb.Model):
    '''
    Index
      Key:              sender
    '''
    sender = ndb.StringProperty()
    ...
    counter = ndb.IntegerProperty(default=0)
    ignore = ndb.BooleanProperty(default=False)

    added = ndb.DateTimeProperty(auto_now_add=True, indexed=False)
    updated = ndb.DateTimeProperty(auto_now=True, indexed=False)

The following code is used to return all entities I need:

entries = List.query()
entries = entries.filter(List.counter > 5)
entries = entries.filter(List.ignore == False)
entries = entries.fetch()

How should I modify the code to get 10 random records from entries? I am planning to have a daily cron task to extract random records, so they should be truly random. What is the best way to get these records while minimizing the number of datastore read operations?

I don't think that the following code is the best:

entries = random.sample(entries, 10)
LA_
  • Check this: http://stackoverflow.com/questions/17289752/query-random-row-in-ndb/17291209#17291209 – Jimmy Kane Feb 08 '14 at 14:51
  • Thanks, @JimmyKane, but it will not work in my case - as given in my question, (1) ids are not auto generated, (2) I should filter entities. – LA_ Feb 08 '14 at 18:01
  • If you already have the entities and you want to choose 10 at random among them, then why isn't random.sample suitable? – Jimmy Kane Feb 08 '14 at 18:11
  • @JimmyKane, since I do `List.query().filter(..).fetch()`, I believe it makes too many datastore reads. Am I wrong? – LA_ Feb 08 '14 at 18:43

1 Answer

After reading the comments, the only improvement I can see is to fetch only the keys and, if possible, put a limit on the query.

I haven't tested this, but something like:

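import random

list_query = List.query()
list_query = list_query.filter(List.counter > 5)
list_query = list_query.filter(List.ignore == False)
# Keys-only: returns just the matching keys, not the full entities.
list_keys = list_query.fetch(keys_only=True) # maybe put a limit here.

# random.sample raises ValueError if fewer than 10 keys matched.
list_keys = random.sample(list_keys, min(10, len(list_keys)))
lists = [list_key.get() for list_key in list_keys]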
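
As a rough, untested-on-App-Engine sketch, the sample-then-fetch step could also be wrapped in a small helper that loads the chosen keys with one batched call. The names `pick_random_entities`, `fake_store` and `fake_get_multi` are made up for illustration; the dictionary only stands in for the datastore so the snippet runs without the SDK, and on App Engine the callable passed in would presumably be `ndb.get_multi`.

```python
import random


def pick_random_entities(keys, count, get_multi, rng=random):
    """Sample up to `count` keys, then load them with one batched call.

    `keys` is the result of a keys-only query, `get_multi` is any callable
    that maps a list of keys to a list of entities, and `rng` is anything
    exposing `sample()` (a hypothetical hook, handy for seeding in tests).
    """
    keys = list(keys)
    # random.sample raises ValueError if asked for more items than exist.
    chosen = rng.sample(keys, min(count, len(keys)))
    # One batched lookup instead of one get() call per key.
    return get_multi(chosen)


# Stand-in for the datastore (assumption: not part of the original code).
fake_store = {"key-%d" % i: {"sender": "user%d" % i} for i in range(50)}


def fake_get_multi(keys):
    return [fake_store[k] for k in keys]


picked = pick_random_entities(sorted(fake_store), 10, fake_get_multi)
print(len(picked))
```

With the query above, the call would look like `pick_random_entities(list_query.fetch(keys_only=True), 10, ndb.get_multi)`.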
Jimmy Kane
  • Had some typos there. – Jimmy Kane Feb 08 '14 at 19:17
  • It works, thanks. I'll wait more for other answers and if no other/better solutions, will accept yours ;). – LA_ Feb 09 '14 at 18:33
  • @LA_ - Seven months, no answer. If you've come up with a better technique, I'd like to hear it; otherwise it would be good to accept @jimmy-kane's answer. – Jakub Mendyk Sep 20 '14 at 18:39
  • @KubaBest Still no better implementation. I will check the GAE changelog to see if there are any new features, but I don't think so. – Jimmy Kane Sep 22 '14 at 09:09
  • Would you lose anything by changing `lists = [key.get() for key in list_keys]` to `lists = ndb.get_multi(list_keys)`? I believe the latter is more performant: 10 RPC's vs. 1 RPC. – Zach Young May 26 '15 at 19:23