0

I have a table where several reporting entitities store several versions of their data (indexed by an integer version number). I created a view for that table that selects only the latest version:

SELECT * FROM MYTABLE NATURAL JOIN 
(
  SELECT ENTITY, MAX(VERSION) VERSION FROM MYTABLE 
  GROUP BY ENTITY
)

Now I want to create another view that always selects the one version before the latest for comparison purposes. I thought about using MAX()-1 for this (see below), and it generally works but the problem is that this excludes entries from entities who reported only one version.

SELECT * FROM MYTABLE NATURAL JOIN 
(
  SELECT ENTITY, MAX(VERSION) - 1 VERSION FROM MYTABLE 
  GROUP BY ENTITY
)

Edit: for clarity, if there is only one version available, I would like it to report that one. As an example, consider the following table:

ENTITY  VERSION VALUE1
10000   1       10
10000   2       11
12000   1       50
14000   1       15
14000   2       16
14000   3       17

Now what I would like to get with my query would be

ENTITY  VERSION VALUE1
10000   1       10
12000   1       50
14000   2       16

But with my current query, the entry for 12000 drops out.

Chris7b
  • 167
  • 1
  • 3
  • 8
  • 2
    So what should it do if there is only one version? Report null, or zero, or the first (only) version? Adding some sample data and expected output to the question might clarify things. And can versions ever be deleted - so if you have version 8, there might not be a version 7? – Alex Poole Mar 15 '16 at 15:57
  • 1
    Don't use `NATURAL JOIN`. It is a bug waiting to happen. – Gordon Linoff Mar 15 '16 at 16:04
  • @GordonLinoff I was using `NATURAL JOIN` because `INNER JOIN` gave me additional fields (like VALUE1_1) which I dont't want. What would you recommend instead? – Chris7b Mar 15 '16 at 16:12
  • @Chris7b . . . `USING` or `ON`. You probably need to choose the right `join` keys. And that is the problem with `NATURAL JOIN`: it chooses all columns that happen to have the same name. Dangerous. – Gordon Linoff Mar 15 '16 at 21:29

3 Answers3

1

You can formulate the task as: Get the two highest available versions per entity and from these take the minimum version per entity. You determine the n highest versions by ranking the records with ROW_NUMBER.

select entity, min(version)
from
(
  select 
    entity, 
    version, 
    row_number() over (partition by entity order by version desc) as rn
  from mytable 
)
where rn <= 2
group by entity;

This works no matter if there is only one record or two or more for an entity and regardless of any possible gaps.

Thorsten Kettner
  • 89,309
  • 7
  • 49
  • 73
0

You can join to MAX(VERSION) in case there is only one VERSION value for an ENTITY:

SELECT * 
FROM MYTABLE 
NATURAL JOIN 
(
  SELECT ENTITY, 
         CASE 
            WHEN MAX(VERSION) <> MIN(VERSION) THEN MAX(VERSION) - 1 
            ELSE MAX(VERSION)
         END VERSION 
  FROM MYTABLE 
  GROUP BY ENTITY
)
Giorgos Betsos
  • 71,379
  • 9
  • 63
  • 98
  • Great, thanks! It does the job perfectly, but I accepted the other answer as it was recommended to avoid the `NATURAL JOIN` altogether. – Chris7b Mar 15 '16 at 16:33
0

You could avoid the self-join with an analytic query:

SELECT ENTITY, VERSION, LAST_VERSION
FROM (
  SELECT ENTITY, VERSION,
    NVL(LAG(VERSION) OVER (PARTITION BY ENTITY ORDER BY VERSION), VERSION) AS LAST_VERSION,
    RANK() OVER (PARTITION BY ENTITY ORDER BY VERSION DESC) AS RN
  FROM MYTABLE
)
WHERE RN = 1;

That finds the current and previous version at the same time, so you could have a single view to get both if you want.

The LAG(VERSION) OVER (PARTITION BY ENTITY ORDER BY VERSION) gets the previous version number for each entity, which will be null for the first recorded version; so NVL is used to take the current version again in that case. (You can also use the more standard COALESCE function). This also allows for gaps in the version numbers, if you have any.

The RANK() OVER (PARTITION BY ENTITY ORDER BY VERSION DESC) assigns a sequential number to each entity/version pair, with the DESC meaning the highest version is ranked 1, the second highest is 2, etc. I'm assuming you won't have duplicate versions for an entity - you can use DENSE_RANK and decide how to break ties if you do, but it seems unlikely.

For your data you can see what that produces with:

SELECT ENTITY, VERSION, VALUE1,
  LAG(VERSION) OVER (PARTITION BY ENTITY ORDER BY VERSION) AS LAG_VERSION,
  NVL(LAG(VERSION) OVER (PARTITION BY ENTITY ORDER BY VERSION), VERSION) AS LAST_VERSION,
  RANK() OVER (PARTITION BY ENTITY ORDER BY VERSION DESC) AS RN
FROM MYTABLE
ORDER BY ENTITY, VERSION;

    ENTITY    VERSION     VALUE1 LAG_VERSION LAST_VERSION         RN
---------- ---------- ---------- ----------- ------------ ----------
     10000          1         10                        1          2
     10000          2         11           1            1          1
     12000          1         50                        1          1
     14000          1         15                        1          3
     14000          2         16           1            1          2
     14000          3         17           2            2          1

All of that is done in an inline view, with the outer query only returning those ranked first - that is, the row with the highest version for each entity.

You can include the VALUE1 column as well, e.g. just to show the previous values:

SELECT ENTITY, VERSION, VALUE1
FROM (
  SELECT ENTITY,
    NVL(LAG(VERSION) OVER (PARTITION BY ENTITY ORDER BY VERSION), VERSION) AS VERSION,
    NVL(LAG(VALUE1) OVER (PARTITION BY ENTITY ORDER BY VERSION), VALUE1) AS VALUE1,
    RANK() OVER (PARTITION BY ENTITY ORDER BY VERSION DESC) AS RN
  FROM MYTABLE
)
WHERE RN = 1
ORDER BY ENTITY;

    ENTITY    VERSION     VALUE1
---------- ---------- ----------
     10000          1         10
     12000          1         50
     14000          2         16
Alex Poole
  • 183,384
  • 11
  • 179
  • 318
  • Great, thanks to all the replies!! I'll accept his answer as it avoids my original `NATURAL JOIN` which appearantly is poor practice. – Chris7b Mar 15 '16 at 16:32
  • @Chris7b - there's some discussion of the reasons not to use natural joins in [the answers to this question](http://stackoverflow.com/q/8696383/266304). Your self-join query isn't really vulnerable to schema changes - more than any other join would be - but it's not a good habit to get into. You could have used an inner join if you didn't use `select *`, which is another bad habit to avoid. But in this case, a join isn't needed at all. – Alex Poole Mar 15 '16 at 16:36
  • Thanks for the additional info! Having read some of the answers, I feel more comfortable about the natural join again. But it's true that I was lazy with the * (the actual table has over 20 fields, in my defense ;)). – Chris7b Mar 15 '16 at 16:46