2

I'm working on an application in CakePHP 3.5.13. It is tied to a "legacy" database - i.e. one which hasn't been written according to CakePHP naming conventions. Nonetheless I've baked it and produced the Table and Entity files.

I'm having problems constructing an SQL query using the ORM.

The query - in plain SQL - looks like this:

SELECT s.id FROM substances s WHERE s.id 
IN
    (SELECT fs.substance_id FROM filters_substances fs WHERE fs.filter_id IN (147))
AND s.id IN 
    (SELECT fs.substance_id FROM filters_substances fs WHERE fs.filter_id IN (47,93))
AND s.id NOT IN
    (SELECT fs.substance_id FROM filters_substances fs WHERE fs.filter_id IN (148,354))

I've managed to write the following using the ORM:

$subquery_in = $FiltersSubstances
    ->find()
    ->select(['FiltersSubstances.substance_id'])
    ->where(['FiltersSubstances.filter_id IN' => [47,93]]);

$subquery_not_in = $FiltersSubstances
    ->find()
    ->select(['FiltersSubstances.substance_id'])
    ->where(['FiltersSubstances.filter_id IN' => [354]]);

$Substances = TableRegistry::get('Substances');

$query = $Substances
    ->find()
    ->where([
        'Substances.id IN' => $subquery_in,
        'Substances.id NOT IN ' => $subquery_not_in
    ]);

debug($query);
debug($query->all());

The above query works fine but is incomplete: It will give me all of the results but doesn't take into account the line (in the plain SQL) which contains fs.filter_id IN (147).

The problem I'm having is that I don't know how to produce multiple AND IN sub-queries.

I've attempted the following:

// Same as PHP given previously, plus:

$subquery_in2 = $FiltersSubstances
    ->find()
    ->select(['FiltersSubstances.substance_id'])
    ->where(['FiltersSubstances.filter_id IN' => [147]]);

// Modified $query:

$query = $Substances
    ->find()
    ->where([
        'Substances.id IN' => [$subquery_in, $subquery_in2],
        'Substances.id NOT IN ' => $subquery_not_in]
    );

// Also tried:

$query = $Substances
    ->find()
    ->where([
        'Substances.id IN' => $subquery_in,
        'Substances.id IN' => $subquery_in2,
        'Substances.id NOT IN ' => $subquery_not_in
    ]);

The above attempts do not work - SQL errors.

Please can anyone advise if/how it's possible to construct this query in the ORM?

I have considered going down the "raw SQL" route of writing it by hand. This would seem a great shame as the ORM is doing 95% of what I need.

I have read Cakephp 3 NOT IN query which is how I managed to get the first part of what I needed to work done. However the query in that example is arguably less complex and doesn't meet my needs here.

Important: The raw SQL produces the correct results. I know some people might say, why not just group all of the IN ID's into one query. However, that would produce very different - and incorrect - results. The IN... AND IN is extremely important here. I know that the query is fine, it's how to write it in Cake's ORM I'm interested in.

ndm
  • 59,784
  • 9
  • 71
  • 110
Andy
  • 5,142
  • 11
  • 58
  • 131
  • I understand that you cannot combine the `IN` and `NOT IN` in case the same `substance_id` might occour in the groups, but the first two `IN`'s should be mergeable, given that they are combined via `AND`? – ndm Mar 21 '18 at 16:18
  • @ndm It will give very different results if you try that. That's why I added the "important" note. For example if I execute - in raw SQL - `WHERE fs.filter_id IN (147,47,93))` (all in one condition) it produces over 6000 rows. Whereas it produces 2 when they are separate (`fs.filter_id IN (147)` followed by `fs.filter_id IN (47,93)`). Unfortunately I cannot provide the database. But there are subtle differences when running queries in this way. – Andy Mar 21 '18 at 16:25
  • Yeah, sorry, I got a little distracted, of course it's different, given that `IN` is basically a list of `OR`'s... – ndm Mar 21 '18 at 16:36
  • No problem, I always appreciate any help. I guess my question is just whether conditions like this can be written in the ORM, or if I'd be better going down the "write your own SQL in Cake" road for this? It's a shame because the ORM is making writing most of that query pretty easy, and I'd hate to have to write something that constructs all of those parts of the SQL. The application allows users to select various filters, so there are variations of this too, but for now I'm most interested in knowing if this is possible via the ORM. – Andy Mar 21 '18 at 16:38

1 Answers1

3

Your problem basically is that you need to use the same identifier multiple times, which is invalid in a PHP array.

You can solve this by either nesting the conditions:

->where([
    ['Substances.id IN' => $subquery_in],
    ['Substances.id IN' => $subquery_in2],
    'Substances.id NOT IN ' => $subquery_not_in
])

by issuing multiple where() calls:

->where(['Substances.id IN' => $subquery_in])
->where(['Substances.id IN' => $subquery_in2])
->where(['Substances.id NOT IN' => $subquery_not_in])

or by using expressions:

->where(function (\Cake\Database\Expression\QueryExpression $exp) use (
    $subquery_in,
    $subquery_in2,
    $subquery_not_in
) {
    return $exp
        ->in('Substances.id', $subquery_in)
        ->in('Substances.id', $subquery_in2)
        ->notIn('Substances.id', $subquery_not_in);
})

It is also possible to mix expressions with the regular array syntax:

$inExpressions = $query
    ->newExpr()
    ->in('Substances.id', $subquery_in)
    ->in('Substances.id', $subquery_in2);

// ...

->where([
    $inExpressions,
    'Substances.id NOT IN ' => $subquery_not_in
])

See also

ndm
  • 59,784
  • 9
  • 71
  • 110
  • This is really good. The vanilla PHP used to construct such statements is much more complex. I'm pleased the ORM is capable of doing this. – Andy Mar 22 '18 at 12:15