8

A clustered index stores the actual data rows at the leaf level of the index. Returning to the example above, that would mean that the entire row of data associated with the primary key value of 123 would be stored in that leaf node.

enter image description here

Question - in case the primary key does not exists and I set the Name column as clustered index. In this case, will the above statement becomes contradictory?

Nilish
  • 1,066
  • 3
  • 12
  • 26
  • 1
    I believe non-unique clustered keys will also get up to 4 bytes of uniqueifying data. So your name column does not have to be unique. – ta.speot.is May 29 '12 at 12:47
  • 1
    Consider asking questions like this on [DBA.SE](http://dba.stackexchange.com/). We have a very targeted audience that would love to answer questions like these. – Nick Chammas May 29 '12 at 19:34

3 Answers3

23

No - why?

The clustered index will still store the actual data pages at its leaf level, (initially) physically sorted by the name column.

The index navigation structure above the leaf level will contain the name column values for all rows.

So overall: nothing changes.

The primary key is a logical construct, designed to uniquely identify each row in your table. That's why it has to be unique and non-null.

The clustering index is a physical construct that will (initially) phyiscally sort your data by the clustering key and arrange the SQL Server pages accordingly.

While in SQL Server, the primary is used by default as the clustering key, the two do not have to fall together - nor does one have to exist with the other. You can have a table with a non-clustered primary key, or a clustered table without primary key. Both is possible. Whether it's sensible to have that is another discussion - but it's technically possible.

Update: if your primary key is your clustering key, uniqueness is guaranteed (since the primary key must be unique). If you're choosing some column that is not the primary key as your clustering key, and that column does not guarantee uniqueness, SQL Server will - behind the scenes - add a 4-byte (INT) uniqueifier column to those duplicates values to make them unique. So you might have Smith, Smith1, Smith2 and so forth in your clustered index navigation structure for your Smith's.

See:

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
  • So how are row locators defined in this case that point to the actual data rows ? I mean name can be duplicate. right? – Nilish May 29 '12 at 12:49
  • can you share the link that states this fact? thanks. This is my last comment. – Nilish May 29 '12 at 12:51
  • can you share some knowledge [here](http://stackoverflow.com/q/10803226/1415094) ? – Nilish May 29 '12 at 16:55
  • Hey! What about bulk insert? Is there any difference when inserting into a table with clustered index only or in a table with primary key? – Dmitriy Dokshin Feb 02 '16 at 11:01
3

If the clustered index is not unique, SQL Server creates a 4-byte uniqueifier and adds it to the clustered index value. The uniqueifier is added only if the clustered index value is duplicate, not for all clustered index values. All nonclustered indexes will contain this value in its leaf level, and non-unique nonclustered index will also have this uniqueifier value in its non-leaf level entry, as a part of bookmark.

Cavar
  • 31
  • 1
2

Difference between a Primary key and a unique index (or constraint) is that Null values are not allowed in a the primary key column. There is no need to have a primary key on a table but it make things easier for external application to edit the rows in the table and even then, it's not really a necessity with most external applications.

In term of performance, this change nothing. The important is the presence or absence of indexes (either unique or not, clustered or not and with null values or not) and the primary key is essentially simply one more unique index without null value.

For the clustered index, the column doesn't need to be unique and/or without null. A column with duplicates and null values is fine for creating a clustered index.

For a foreign key, it must reference a column with a unique index on it but not necessarily a primary key or without null value. It's perfectly legal to reference a column that is not a primary key and is allowing null value a long as there is a unique index on it. Notice that because there must be an unique index on it, this column cannot have more than a single null value.

There is no limitation on the foreign key column itself (the column on the foreign table) but performance wise, setting an index on it is often a good thing.

Gaurav Agrawal
  • 4,355
  • 10
  • 42
  • 61
  • 1
    if the column(s) used for the clustered index is not unique, SQL Server will *make* it unique - by adding a 4-byte uniqueifier. It's additional overhead you need to be aware of. You *can* use a non-unique column - I don't think it's a very good idea in general, to do so. – marc_s May 29 '12 at 12:59