Ambiguity in the SQL query

Question

There are two tables: table customer consists of information about customers and table payment consists of information about payments. Primary key customer_id in the customer table is a foreign key in the table payment_id. The following two queries return identical results:

SELECT
  payment.customer_id,
  last name,
  amount
FROM customer
INNER JOIN payment ON customer.customer_id = payment.customer_id

SELECT
  customer.customer_id,
  last_name,
  amount
FROM customer
INNER JOIN payment ON customer.customer_id = payment.customer_id

The only difference between the queries is in the first argument in the SELECT clause: payment.customer_id vs customer.customer_id. As the customer_id is the column on which the tables are joining on, the distinction between payment.customer_id and customer.customer_id seems meaningless. However, if I try to omit the table in the query:

SELECT
  customer_id,
  last_name,
  amount
FROM customer
INNER JOIN payment ON customer.customer_id = payment.customer_id

I receive

[42702] ERROR: column reference "customer_id" is ambiguous

Could you please describe where is ambiguity in the query?

How can DBMS engine know which (common) column do you prefer without any addressing to individual table(s)? — Barbaros Özhan, Jul 01 '18 at 07:40

score 4 · Answer 1 · edited Jul 05 '18 at 06:10

The error means there are two columns have same name customer_id, let DB engine didn't know Which one column do you want to query.

You need to explicitly tell the DB engine the name of the column you want to query.

Tables may be added a new column after table has been created, the new column may be the same as the old column name if you didn't clearly specify SELECT table columns of the query in the selection the will be an error on your original query.

Here is some suggestion for you

You can give query table an Alias name, let your query Clearer.
Clearly specify SELECT table columns of the query in the selection from tables name, because of tables

If last_name column in payment table and amount column in customer

you can do this.

SELECT
  c.customer_id,
  p.last_name,
  c.amount
FROM customer c
INNER JOIN payment p ON c.customer_id = p.customer_id

Jasen · Accepted Answer · 2018-07-01T10:49:46.737

Just because two columns match using the equality test, does not mean that they have the same value.

The two columns could be different types eg integer and float, or numeric etc.

Or they could be citext which does case insensitive comparisons (one table could have 'RedRum' and other 'redruM').

often the join condition might not be a strict equality (eg a network range comparison, or prefix match)

In all these cases which table you use for the result column is significant.

if you're doing an outer join table name is again significant.

Postgresql does not know when = means that the table can be implied and when it cannot, it requires it always.

Rule of thumb, when joining tables specify the table of every column you use in the query. that way things won't break if someone adds some columns to the other table.

score 2 · Answer 3 · answered Jul 01 '18 at 07:47

2

You answered your own question by omitting the table in the select statement. By not specifying it, SQL doesn't know Which table is customer_id referring to.

answered Jul 01 '18 at 07:47

Songtham T.

185
1
2
15

score 2 · Answer 4 · answered Jul 01 '18 at 08:54

Could you please describe where is ambiguity in the query?

Logically there is no ambiguity in the query, as both the columns must have the same values. However, an ambiguity may appear when you use LEFT JOIN instead of INNER JOIN, e.g.:

INSERT INTO customer (customer_id, last_name) VALUES
(1, 'Smith'),
(2, 'Jones');

INSERT INTO payment (customer_id, amount) VALUES
(1, 100);

SELECT
    customer.customer_id,
    payment.customer_id,
    last_name,
    amount
FROM customer
LEFT JOIN payment ON customer.customer_id = payment.customer_id

 customer_id | customer_id | last_name | amount 
-------------+-------------+-----------+--------
           1 |           1 | Smith     |    100
           2 |             | Jones     |       
(2 rows)

The parser just follows general rules and does not analyse a query to find out when a potential ambiguity can come to light.

score 1 · Answer 5 · answered Jul 01 '18 at 08:22

1

It is a good practice to always prefix your column with table/subquery alias.

But in your case(only PK/FK names are shared among both tables) you could also use USING clause:

SELECT
  customer_id,
  last_name,
  amount
FROM customer
JOIN payment USING(customer_id);

DBFiddle Demo

There is also third possible solution, but I strongly recommend not to use it:

SELECT
  customer_id,
  last_name,
  amount
FROM customer
NATURAL JOIN payment

answered Jul 01 '18 at 08:22

Lukasz Szozda

162,964
23
234
275

I strongly recommend the `NATURAL JOIN` 'solution' because I consider it best practice to avoid duplicate columns rather than using range variables to work around the problems they cause. – onedaywhen Jul 02 '18 at 12:35
@onedaywhen I prefer to avoid natural join, [here](https://stackoverflow.com/a/8696399/5070879) you can find some reasons why. – Lukasz Szozda Jul 02 '18 at 16:55
@onedaywhen Feel free to use any approach that suits you best. In my opinion `NATURAL JOIN` does not solve anything plus it could cause more problems in the future. My advice is simple: don't be lazy (use table aliases for each column and it will pay off in the long run). – Lukasz Szozda Jul 06 '18 at 07:01
First, what you call "table aliases" are actually range variables. The idea is they 'range over' the rows in the table and represent a row, not a table. Therefore, "table aliases" is most inappropriate. Second, range variables were the workaround to the problem of duplicate columns generated by pre-1992 join types. The problem was fixed in 1992 with `NATURAL JOIN`, a superior join type that replaces older join types... – onedaywhen Jul 06 '18 at 10:34
If I chose to use a pre-1992 join type, it would generate duplicate columns, which I'd then need to use range variables to workaround. Instead, I choose to avoid the problem entirely by using `NATURAL JOIN`. Your accusation of laziness is well wide of the mark. – onedaywhen Jul 06 '18 at 10:34
@onedaywhen "NATURAL JOIN, a superior join type that replaces older join types" `NATURAL JOIN` is not "superior" at all. I prefer EIBTI rule simply because it is easier to maintain in the long run. If you feel that natual join is strongly recommended I accept that. Based on my experience I wouldn't use it on production code(simply when schema drifts and new column is added with the same name) many bad things may happen. – Lukasz Szozda Jul 06 '18 at 15:37
@onedaywhen If I had to specify each table columns with CTE, as you proposed, then I would simply use `JOIN ON` and fully qualified names. `consider that adding new columns to an existing table is probably not a good idea in principle` Still I cannot agree with that statement. Views may be good if you read only date otherwise you have to do some clumsy things like making columns nullable or add triggers. – Lukasz Szozda Jul 10 '18 at 16:06
Noting a table is analogous to a type (and, at a stretch, its key is its interface), take a look at the [open-closed principle](https://en.wikipedia.org/wiki/Open%E2%80%93closed_principle). But principle aside, if doing something in practice (adding a column to a table...) is a breaking change (...breaks a customer's query) then doing so is reckless. How does one 'extend' a table while avoiding nulls and triggers? Extending table has same key as extended table and with new attribute. I think the closest to 'principle' here is [6NF](https://en.wikipedia.org/wiki/Sixth_normal_form). – onedaywhen Jul 12 '18 at 08:35

score 1 · Answer 6 · answered Jul 02 '18 at 12:33

Legacy joins such as INNER JOIN create duplicate columns. Using INNER JOIN in your query generates two columns named customer_id. The SQL language has a workaround for this: you must prefix the column with a range variable, as others have suggested here (though using the misleading term 'table alias').

Thankfully, the SQL language also has a fix for this problem: NATURAL JOIN creates no duplicate columns, therefore you don't need to disambiguate them:

SELECT
  customer_id,
  last_name,
  amount
FROM customer
NATURAL JOIN payment

Joins that produce duplicate columns remain because nothing is ever removed from the SQL language (the "shackles of compatibility"). But you don't need any join other than NATURAL JOIN.

The idea is that your data element names mean the same thing throughout your data dictionary e.g. amount means one thing (pertaining to payments) and one things only (there is no amount that pertains to customers or any other type).

Sometimes you may need to 'project away' columns you don't want to participate in NATURAL JOIN e.g.

WITH
C AS ( SELECT customer_id, last_name FROM customer ),
P AS ( SELECT customer_id, amount FROM payment )
SELECT
  customer_id,
  last_name,
  amount
FROM C 
NATURAL JOIN P

This also 'defends' your code e.g. in the unlikely event of someone adding a last_name attribute to payments.

Ambiguity in the SQL query

6 Answers6