App engine - Datastore, NOT EQUAL (!=), 30 Subquery limit

Question

Using Java, Google App Engine and its Datastore. I know plenty of these questions have been asked before, but I can't quite grasp whether what I'm doing is possible, or if I'm doing it the wrong way.

What I want to do: Let's say I have hundreds of questions, each stored as a Question entity. I want a user to download, for example, 20 questions and answer them. Later on the user is supposed to download another 20 questions, but I want to make sure they are not the same questions as before. What's the best approach?

I'm currently sending the user 20 questions, and each question has a unique ID. When they request 20 more questions, the user also sends back which questions they answered (in other words, the IDs of the 20 questions). When I then make the "query" to retrieve 20 new questions, I set a filter by calling query.setFilter("id != 1 && id != 2 && ... && id != 20"). Here's the "problem": is there a limit on how many '!=' (NOT EQUAL) conditions I can have in this kind of query? The 3rd time a user requests new questions, the filter is no longer 20 conditions long but 60. Is this possible? And is it a valid way of doing things? Does each '!=' create a new subquery every time it's used?

Thank you!

score 0 · Accepted Answer · answered Dec 02 '13 at 11:17

It sounds like a 'cursor' is what you need.

Query cursors allow an application to retrieve a query's results in convenient batches without incurring the overhead of a query offset. After performing a retrieval operation, the application can obtain a cursor, which is an opaque base64-encoded string marking the index position of the last result retrieved. The application can save this string (for instance in the Datastore, in Memcache, in a Task Queue task payload, or embedded in a web page as an HTTP GET or POST parameter), and can then use the cursor as the starting point for a subsequent retrieval operation to obtain the next batch of results from the point where the previous retrieval ended. A retrieval can also specify an end cursor, to limit the extent of the result set returned.

https://developers.google.com/appengine/docs/java/datastore/queries#Java_Query_cursors

You'll have to be careful how you use it; see the link for more details.

Paul Collingwood

Thank you for a VERY good answer! I might use this if I can't find a better solution, but what approach would one take if the 20 questions were "randomly" chosen? (I know I didn't include this in the question, because I didn't want to limit the possible answers.) Also, my other question: is there a limit to how many '!=' I can have? – Whyser Dec 02 '13 at 11:23
np. If you don't want to ask questions that have already been asked, you'll have to store which questions the user has already answered. Where/how you do that really depends on how many questions we're talking about. If you run a query it'll return 20 "random" results anyway (unless you specify a sort order), and you can use the cursor from that point on. I don't know how many != you can have, as I use Python/ndb in any case. – Paul Collingwood Dec 02 '13 at 11:29
Thank you! Even if I store which questions a certain user has answered, I would still end up in the same boat: I would still have to filter out the ones they have already answered. :( (I seem to be getting the results in the same order each time without any order specified, though this is not really relevant to the case.) – Whyser Dec 02 '13 at 11:44
Yes, you'll generally get the questions (records) in the same order, but it's not guaranteed. Instead of filtering them out, why not try it the other way around? Say you have 100 questions in total, each stored with a key that is also the number of the question: 1, 2, 3, etc. To pick some random questions you can simply generate a random number from 1 to 100 and then get that question by key. Before you do that, check whether the question picked has already been answered by the user; if so, generate another pick and try again. This is not efficient or elegant, but just something to think about. – Paul Collingwood Dec 02 '13 at 11:47
I ended up using your solution! I used two cursors: one that always points at the last _known_ element and one that continuously moves from start to end. If new questions were added while moving from start to end, it would also retrieve the latest questions and send them to the user first! – Whyser Dec 03 '13 at 11:41
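
For reference, the pick-and-retry idea from the comments above could look roughly like the sketch below. It runs purely in memory and does not touch the Datastore. The class name `RandomUnansweredPicker`, the method `pick`, and the assumption that question IDs are exactly the integers 1..totalQuestions are all made up for illustration. The returned IDs would then be turned into keys and fetched in one batch get.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class RandomUnansweredPicker {

    /**
     * Picks up to `count` distinct question IDs from 1..totalQuestions that
     * are not in `answered`. Assumes `answered` only holds IDs in that range.
     */
    public static List<Integer> pick(int totalQuestions, Set<Integer> answered,
                                     int count, Random rng) {
        // Cap the target so the retry loop cannot spin forever when fewer
        // than `count` unanswered questions remain.
        int available = Math.max(totalQuestions - answered.size(), 0);
        int target = Math.min(count, available);

        Set<Integer> chosen = new HashSet<>();
        List<Integer> result = new ArrayList<>();
        while (result.size() < target) {
            int id = rng.nextInt(totalQuestions) + 1; // uniform in 1..totalQuestions
            // Retry if the user already answered it or we already picked it.
            if (!answered.contains(id) && chosen.add(id)) {
                result.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> answered = new HashSet<>();
        for (int i = 1; i <= 20; i++) {
            answered.add(i); // pretend questions 1..20 were answered earlier
        }
        System.out.println(pick(100, answered, 20, new Random()));
    }
}
```

As the comment says, this gets slower as the user approaches having answered everything, because most random picks are then rejected.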

score 0 · Answer 2 · edited May 23 '17 at 12:16

I would have a look at these two questions about fetching random records.

Get a random entity in Google App Engine datastore which does not belong to a list

and

Query random row in ndb

In your case, select a random set and then filter out the already answered questions. This will be more efficient than the queries you are trying to perform at the moment.

If you pre-allocate IDs in a known integer range for questions, you could store a bitmap for each user, set bits based on the IDs of the questions that have been completed, and check question keys against it. This will potentially be quicker than doing gets on keys.
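
A minimal sketch of the bitmap idea, using `java.util.BitSet` from the standard library. The class `AnsweredBitmap` and its method names are hypothetical, and the sketch assumes question IDs are small positive integers. How the bytes are stored on the user's record (for example as a blob-type property) is left out, since that depends on the persistence API in use.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class AnsweredBitmap {
    // Bit i is set when the question with ID i has been completed.
    private final BitSet answered;

    public AnsweredBitmap() {
        this.answered = new BitSet();
    }

    private AnsweredBitmap(BitSet bits) {
        this.answered = bits;
    }

    public void markAnswered(int questionId) {
        answered.set(questionId);
    }

    public boolean isAnswered(int questionId) {
        return answered.get(questionId);
    }

    /** Returns up to `count` unanswered IDs in 1..maxId, lowest first. */
    public List<Integer> nextUnanswered(int maxId, int count) {
        List<Integer> ids = new ArrayList<>();
        for (int id = answered.nextClearBit(1);
             id <= maxId && ids.size() < count;
             id = answered.nextClearBit(id + 1)) {
            ids.add(id);
        }
        return ids;
    }

    /** Compact form that could be saved with the user's record. */
    public byte[] toBytes() {
        return answered.toByteArray();
    }

    public static AnsweredBitmap fromBytes(byte[] bytes) {
        return new AnsweredBitmap(BitSet.valueOf(bytes));
    }
}
```

The IDs returned by `nextUnanswered` can be turned directly into keys, so the unanswered questions can be fetched by key without any `!=` filter. A few hundred questions fit in well under a hundred bytes per user.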

answered Dec 02 '13 at 13:23

Tim Hoffman

Thank you! (Also thanks for considering the random element of this question without me including it in the original question.) So if I understood you correctly, I should pick 20 random questions, and if 10 of those have already been answered by the user I should either: 1. Be happy with the 10 I got and send them to the user, OR... 2. Pick another 10 random questions (following your logic above) and then send them to the user. Also, does each '!=' in the filter cause a subquery, or can I use an unlimited number of '!=' in the filter? – Whyser Dec 02 '13 at 13:59
Each `!=` creates a subquery, so by selecting ranges, i.e. < or >, you only need a single query. If you use a bitmap, you could even create keys from the bit positions and fetch a bunch of questions (assuming they exist) that haven't been completed. If your questions are sparsely distributed, the random query is more likely to give you results. I would then fetch 40 or 50 questions and discard the rest once you have 20 unanswered questions, rather than do multiple queries. Or at least profile the two strategies to see which performs better most of the time. – Tim Hoffman Dec 02 '13 at 14:02
Okay! I will try this tomorrow and see if I can get it to work in a 'clean' way. I'm leaning a bit more towards Paul's solution because of its simplicity (the randomness wasn't necessary, just preferable). One more question: when I tried, I seemed to be able to have more than 30 '!='. Does this rule only apply when they are used on different "fields"? – Whyser Dec 02 '13 at 14:39
Inequality filters are limited to at most one property. I would read up on inequality filters: https://developers.google.com/appengine/docs/python/datastore/queries#Python_Restrictions_on_queries The same restrictions will apply to Java. – Tim Hoffman Dec 02 '13 at 14:46
Uhh, yeah, sorry about that last part. I don't think my question came out as I wanted it to. I know inequality filters are limited to at most one property (field). What I was really wondering was: why does my query work even though I try with 60 inequalities, for example setFilter("id != 1 && id != 2 && .. && id != 60")? – Whyser Dec 02 '13 at 14:58
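
The "fetch 40 or 50 and discard" suggestion from the comments above can be sketched as a plain in-memory filter. `OverFetchFilter` and `firstUnanswered` are hypothetical names. `fetchedIds` stands in for the IDs of whatever a single over-sized query returned, and `answeredIds` for whatever record of answered questions is kept per user.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class OverFetchFilter {

    /**
     * Walks the over-fetched batch in order and keeps the first `wanted`
     * IDs the user has not answered; the remainder is simply dropped.
     */
    public static List<Long> firstUnanswered(List<Long> fetchedIds,
                                             Set<Long> answeredIds,
                                             int wanted) {
        List<Long> keep = new ArrayList<>();
        for (Long id : fetchedIds) {
            if (keep.size() == wanted) {
                break; // enough questions, ignore the rest of the batch
            }
            if (!answeredIds.contains(id)) {
                keep.add(id);
            }
        }
        return keep;
    }
}
```

If the result comes back with fewer than `wanted` entries, the batch was too small for this user, and a second fetch (or a larger over-fetch factor) would be needed. That is the trade-off worth profiling.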
