What do clustered and non-clustered indexes actually mean? ~ Discussion of Coding


I have limited exposure to databases and have only used them as an application programmer. I want to know about clustered and non-clustered indexes. I googled, and what I found was:

A clustered index is a special type of index that reorders the way records in the table are physically stored. Therefore, a table can have only one clustered index. The leaf nodes of a clustered index contain the data pages. A nonclustered index is a special type of index in which the logical order of the index does not match the physical stored order of the rows on disk. The leaf nodes of a nonclustered index do not consist of the data pages. Instead, the leaf nodes contain index rows.

On Stack Overflow I found the question "What are the differences between a clustered and a non-clustered index?"

Can someone explain this in plain English?

Answer by user151323 for What do Clustered and Non clustered index actually mean?

A clustered index means you are telling the database to store rows with close index values actually close to one another on the disk. This has the benefit of rapid scanning and retrieval of records that fall into a range of clustered index values.

For example, you have two tables, Customer and Order:

Customer
----------
ID
Name
Address

Order
----------
ID
CustomerID
Price

If you wish to quickly retrieve all orders of one particular customer, you may wish to create a clustered index on the "CustomerID" column of the Order table. This way the records with the same CustomerID will be physically stored close to each other on disk (clustered) which speeds up their retrieval.
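The Customer/Order setup above can be sketched in T-SQL roughly as follows. This is a minimal sketch assuming SQL Server 2016 or later (for DROP TABLE IF EXISTS); the column types, sample rows and the index name cix_Order_CustomerID are illustrative assumptions, and Order is bracketed because it is a reserved word.

```sql
-- Hypothetical schema for the Customer/Order example (SQL Server 2016+).
DROP TABLE IF EXISTS dbo.[Order];
DROP TABLE IF EXISTS dbo.Customer;

CREATE TABLE dbo.Customer
(
    ID      INT           NOT NULL PRIMARY KEY,
    Name    NVARCHAR(100) NOT NULL,
    Address NVARCHAR(200) NULL
);

CREATE TABLE dbo.[Order]   -- ORDER is a reserved word, hence the brackets
(
    ID         INT           NOT NULL,
    CustomerID INT           NOT NULL,
    Price      DECIMAL(10,2) NOT NULL
);

-- Non-unique clustered index: the table's rows are logically ordered by
-- CustomerID, so all orders of one customer sit together at the leaf level.
CREATE CLUSTERED INDEX cix_Order_CustomerID ON dbo.[Order] (CustomerID);

INSERT INTO dbo.Customer (ID, Name) VALUES (1, N'Alice'), (2, N'Bob');
INSERT INTO dbo.[Order] (ID, CustomerID, Price)
VALUES (10, 2, 5.00), (11, 1, 7.50), (12, 2, 3.25), (13, 1, 1.00);

-- Reads one contiguous range of clustered index keys.
SELECT ID, CustomerID, Price
FROM dbo.[Order]
WHERE CustomerID = 2;
```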

P.S. The index on CustomerID will obviously not be unique, so you either need to add a second field to "uniquify" the index or let the database handle that for you, but that's another story.
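One way to "uniquify" the key yourself, sketched under the same assumptions (SQL Server 2016+, hypothetical names), is to append the order's own ID, assumed here to be unique per order, as a second key column:

```sql
DROP TABLE IF EXISTS dbo.[Order];

CREATE TABLE dbo.[Order]
(
    ID         INT           NOT NULL,  -- assumed unique per order
    CustomerID INT           NOT NULL,
    Price      DECIMAL(10,2) NOT NULL
);

-- CustomerID leads, so a customer's orders stay together;
-- ID makes every key unique, so no hidden uniquifier is needed.
CREATE UNIQUE CLUSTERED INDEX cix_Order_CustomerID_ID
    ON dbo.[Order] (CustomerID, ID);

INSERT INTO dbo.[Order] (ID, CustomerID, Price)
VALUES (10, 2, 5.00), (11, 1, 7.50), (12, 2, 3.25);
```

Letting the database handle it instead simply means declaring a plain CREATE CLUSTERED INDEX on CustomerID alone.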

Regarding multiple indexes: you can have only one clustered index per table because it defines how the data is physically arranged. If you want an analogy, imagine a big room with many tables in it. You can either arrange these tables in several rows or pull them all together to form a big conference table, but not both at the same time. A table can have other indexes; they will then point to the entries in the clustered index, which in turn will finally say where to find the actual data.
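A sketch of one clustered index plus several other (nonclustered) indexes on the same table, assuming SQL Server 2016+ and hypothetical table and index names. On a table that has a clustered index, each nonclustered index stores the clustered key as its pointer back to the row:

```sql
DROP TABLE IF EXISTS dbo.[Order];

CREATE TABLE dbo.[Order]
(
    ID         INT           NOT NULL,
    CustomerID INT           NOT NULL,
    Price      DECIMAL(10,2) NOT NULL
);

-- The one clustered index: it is the table's row storage.
CREATE CLUSTERED INDEX cix_Order_CustomerID ON dbo.[Order] (CustomerID);

-- Additional nonclustered indexes: separate structures whose leaf rows
-- carry the clustered key (CustomerID plus any uniquifier) as a pointer.
CREATE NONCLUSTERED INDEX ix_Order_Price ON dbo.[Order] (Price);
CREATE NONCLUSTERED INDEX ix_Order_ID    ON dbo.[Order] (ID);

SELECT index_id, name, type_desc
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.[Order]')
ORDER BY index_id;
```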

Answer by Shiraz Bhaiji for What do Clustered and Non clustered index actually mean?

With a clustered index the rows are stored physically on the disk in the same order as the index. Therefore, there can be only one clustered index.

With a non-clustered index, there is a second list that has pointers to the physical rows. You can have many non-clustered indexes, although each new index will increase the time it takes to write new records.

It is generally faster to read from a clustered index if you want to get back all the columns. You do not have to go first to the index and then to the table.

Writing to a table with a clustered index can be slower if there is a need to rearrange the data.
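The "index first, then table" step shows up when a non-clustered index lacks a column the query needs. In this minimal sketch (SQL Server 2016+, hypothetical names), the first index can find rows by CustomerID but has to look Price up in the clustered index, while the INCLUDE variant copies Price into its own leaf level so that lookup can be skipped. Which index the optimizer actually picks depends on the data and statistics.

```sql
DROP TABLE IF EXISTS dbo.[Order];

CREATE TABLE dbo.[Order]
(
    ID         INT           NOT NULL,
    CustomerID INT           NOT NULL,
    Price      DECIMAL(10,2) NOT NULL,
    CONSTRAINT PK_Order PRIMARY KEY CLUSTERED (ID)
);

-- Key only: reading Price requires a lookup into the clustered index (PK_Order).
CREATE NONCLUSTERED INDEX ix_Order_CustomerID
    ON dbo.[Order] (CustomerID);

-- Covering: Price is stored in the leaf rows and ID comes along as the
-- clustered key, so the query below can be answered from this index alone.
CREATE NONCLUSTERED INDEX ix_Order_CustomerID_cover
    ON dbo.[Order] (CustomerID) INCLUDE (Price);

SELECT ID, CustomerID, Price
FROM dbo.[Order]
WHERE CustomerID = 42;
```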

Answer by Dan Diplo for What do Clustered and Non clustered index actually mean?

A very simple, non-technical rule of thumb would be that clustered indexes are usually used for your primary key (or, at least, a unique column) and non-clustered indexes are used for other situations (maybe a foreign key). Indeed, SQL Server will by default create a clustered index on your primary key column(s) if the table does not already have one. As you will have learnt, the clustered index relates to the way data is physically sorted on disk, which means it's a good all-round choice for most situations.
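A sketch showing the default and how to opt out of it, assuming SQL Server 2016 or later; the table names PkDefault and PkNonclustered are hypothetical:

```sql
DROP TABLE IF EXISTS dbo.PkDefault;
DROP TABLE IF EXISTS dbo.PkNonclustered;

-- No CLUSTERED/NONCLUSTERED keyword: the primary key is enforced by a
-- clustered index because the table does not have one yet.
CREATE TABLE dbo.PkDefault
(
    ID   INT          NOT NULL PRIMARY KEY,
    Name NVARCHAR(50) NULL
);

-- Explicit opt-out: the primary key gets a nonclustered index and the
-- table itself remains a heap (no clustered index at all).
CREATE TABLE dbo.PkNonclustered
(
    ID   INT          NOT NULL PRIMARY KEY NONCLUSTERED,
    Name NVARCHAR(50) NULL
);

-- index_id 0 = heap, 1 = clustered index, 2 and above = nonclustered indexes
SELECT OBJECT_NAME(object_id) AS table_name, index_id, type_desc, is_primary_key
FROM sys.indexes
WHERE object_id IN (OBJECT_ID('dbo.PkDefault'), OBJECT_ID('dbo.PkNonclustered'))
ORDER BY table_name, index_id;
```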

Answer by Anirudh Sood for What do Clustered and Non clustered index actually mean?

Below are some characteristics of clustered and non-clustered indexes:

Clustered Indexes

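Clustered indexes identify each row in an SQL table uniquely (if the index is not declared unique, SQL Server adds a hidden uniquifier to duplicate keys).
Every table can have at most one clustered index; a table without one is called a heap.
You can create a clustered index that covers more than one column. For example: CREATE CLUSTERED INDEX index_name ON table_name (col1, col2, ...).
By default, declaring a primary key also creates a clustered index on the key column(s), unless the table already has a clustered index or the key is declared NONCLUSTERED.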
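A sketch of the multi-column syntax and of the one-clustered-index limit, assuming SQL Server 2016+; dbo.OneClustered and its columns are hypothetical. The second CREATE CLUSTERED INDEX is expected to fail, so it is wrapped in TRY/CATCH and the error message is printed:

```sql
DROP TABLE IF EXISTS dbo.OneClustered;

CREATE TABLE dbo.OneClustered
(
    col1 INT NOT NULL,
    col2 INT NOT NULL,
    col3 INT NULL
);

-- Composite clustered index: rows are ordered by col1, then col2.
CREATE CLUSTERED INDEX cix_OneClustered ON dbo.OneClustered (col1, col2);

-- A second clustered index on the same table is rejected.
BEGIN TRY
    CREATE CLUSTERED INDEX cix_OneClustered_2 ON dbo.OneClustered (col3);
END TRY
BEGIN CATCH
    PRINT ERROR_MESSAGE();
END CATCH;
```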

Non-clustered Indexes

Non-clustered indexes are separate structures used for fast retrieval of data. Their key values do not need to be unique.
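A sketch of the difference between a plain non-clustered index and one declared UNIQUE, assuming SQL Server 2016+; dbo.NciDemo and its data are hypothetical. The duplicate Email insert is expected to fail and is caught:

```sql
DROP TABLE IF EXISTS dbo.NciDemo;

CREATE TABLE dbo.NciDemo
(
    ID    INT           NOT NULL PRIMARY KEY,   -- clustered by default
    Email NVARCHAR(100) NOT NULL,
    City  NVARCHAR(50)  NOT NULL
);

-- Plain nonclustered index: duplicate City values are allowed.
CREATE NONCLUSTERED INDEX ix_NciDemo_City ON dbo.NciDemo (City);

-- Uniqueness is enforced only when the index is declared UNIQUE.
CREATE UNIQUE NONCLUSTERED INDEX ux_NciDemo_Email ON dbo.NciDemo (Email);

INSERT INTO dbo.NciDemo (ID, Email, City)
VALUES (1, N'a@example.com', N'Paris'),
       (2, N'b@example.com', N'Paris');   -- same City: accepted

BEGIN TRY
    -- same Email as ID 1: violates ux_NciDemo_Email
    INSERT INTO dbo.NciDemo (ID, Email, City)
    VALUES (3, N'a@example.com', N'Lyon');
END TRY
BEGIN CATCH
    PRINT ERROR_MESSAGE();
END CATCH;
```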

Answer by Martin Smith for What do Clustered and Non clustered index actually mean?

In SQL Server row-oriented storage, both clustered and nonclustered indexes are organized as B-trees.

[Image: diagram of the B-tree index structure (linked image source not preserved)]

The key difference between clustered and nonclustered indexes is that the leaf level of the clustered index is the table. This has two implications:

1. The rows on the clustered index leaf pages always contain something for each of the (non-sparse) columns in the table (either the value, or a pointer to the actual value).
2. The clustered index is the primary copy of a table.

Nonclustered indexes can also achieve point 1 by using the INCLUDE clause (available since SQL Server 2005) to explicitly include all non-key columns, but they are secondary representations and there is always another copy of the data around (the table itself).

CREATE TABLE T
(
    A INT,
    B INT,
    C INT,
    D INT
)

CREATE UNIQUE CLUSTERED INDEX ci ON T(A,B)
CREATE UNIQUE NONCLUSTERED INDEX nci ON T(A,B) INCLUDE (C,D)

The two indexes above will be nearly identical, with the upper-level index pages containing values for the key columns A, B and the leaf-level pages containing A, B, C, D.
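The catalog views show how SQL Server records the two definitions. In this sketch (SQL Server 2016+, hypothetical table dbo.CiVsNci with the same columns as T), the clustered index lists only its key columns A, B (its leaf level holds every column anyway, because it is the table), while the nonclustered index lists A, B as keys and C, D as included columns:

```sql
DROP TABLE IF EXISTS dbo.CiVsNci;

CREATE TABLE dbo.CiVsNci (A INT, B INT, C INT, D INT);

CREATE UNIQUE CLUSTERED INDEX ci ON dbo.CiVsNci (A, B);
CREATE UNIQUE NONCLUSTERED INDEX nci ON dbo.CiVsNci (A, B) INCLUDE (C, D);

-- key_ordinal > 0 marks key columns; is_included_column = 1 marks INCLUDE columns.
SELECT i.name AS index_name,
       i.type_desc,
       c.name AS column_name,
       ic.key_ordinal,
       ic.is_included_column
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
  ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns AS c
  ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID('dbo.CiVsNci')
ORDER BY i.index_id, ic.is_included_column, ic.key_ordinal, c.column_id;
```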

There can be only one clustered index per table, because the data rows themselves can be sorted in only one order.

The above quote from SQL Server Books Online causes much confusion (as it is, frankly, misleading). In my opinion, it would be much better phrased as:

There can be only one clustered index per table, because the leaf level rows of the clustered index are the table rows.

It is trivially correct that the CI rows and the table rows are ordered the same way, as they are the same thing (in the same way as the kittens in the picture above are ordered the same as the baby cats are). However, the commonly held belief that with a clustered index the rows are always stored physically on the disk in the same order as the index key is false.

This would be an absurd implementation. For example, if a row is inserted into the middle of a 4 GB table, SQL Server does not have to copy 2 GB of data up in the file to make room for the newly inserted row.

Instead, a page split occurs. Each page at the leaf level of both clustered and nonclustered indexes has the address (File:Page) of the next and previous page in logical key order. These pages need not be either contiguous or in key order.

For example, the linked page chain might be 1:2000 <-> 1:157 <-> 1:7053.

When a page split happens, a new page is allocated from anywhere in the filegroup (from either a mixed extent, for small tables, or a non-empty uniform extent belonging to that object, or a newly allocated uniform extent). This might not even be in the same file if the filegroup contains more than one.
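To look at the leaf-level page chain yourself, one option is the undocumented function sys.dm_db_database_page_allocations (SQL Server 2012 and later). Because it is undocumented, its columns may change between versions, so treat this as a sketch; dbo.PageChain is a hypothetical table whose CHAR(3000) column limits each 8 KB page to about two rows:

```sql
DROP TABLE IF EXISTS dbo.PageChain;

CREATE TABLE dbo.PageChain
(
    X INT        NOT NULL,
    Y CHAR(3000) NOT NULL DEFAULT 'y'   -- wide fixed-length filler column
);

CREATE CLUSTERED INDEX ix_PageChain ON dbo.PageChain (X);

-- 20 rows at roughly two rows per page spread over about ten leaf pages.
INSERT INTO dbo.PageChain (X)
SELECT number
FROM master..spt_values
WHERE type = 'P' AND number BETWEEN 1 AND 20;

-- Undocumented DMF: one row per data (leaf) page with its previous/next
-- page pointers, i.e. the doubly linked list described above. Sorting by
-- physical page id makes it easy to compare physical and logical order.
SELECT allocated_page_file_id,
       allocated_page_page_id,
       previous_page_page_id,
       next_page_page_id
FROM sys.dm_db_database_page_allocations(
         DB_ID(), OBJECT_ID('dbo.PageChain'), 1, NULL, 'DETAILED')
WHERE page_type_desc = 'DATA_PAGE'
ORDER BY allocated_page_page_id;
```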

The degree to which the logical order and contiguity differs from the idealised physical version is the degree of logical fragmentation.
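Logical fragmentation can be measured with the documented function sys.dm_db_index_physical_stats. In this sketch (SQL Server 2016+, hypothetical table dbo.FragDemo), keys are inserted one row at a time in descending order, which is one assumed way to provoke page splits; the exact percentage reported will vary.

```sql
DROP TABLE IF EXISTS dbo.FragDemo;

CREATE TABLE dbo.FragDemo
(
    X INT        NOT NULL,
    Y CHAR(3000) NULL        -- fixed-length, so it occupies 3000 bytes even when NULL
);

CREATE CLUSTERED INDEX ix_FragDemo ON dbo.FragDemo (X);

-- Each new key sorts before every existing key, so inserts keep landing
-- on the first leaf page and splitting it whenever it fills up.
DECLARE @i INT = 100;
WHILE @i >= 1
BEGIN
    INSERT INTO dbo.FragDemo (X) VALUES (@i);
    SET @i -= 1;
END;

-- Leaf level (index_level 0) of the clustered index (index_id 1).
SELECT page_count,
       fragment_count,
       avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.FragDemo'), 1, NULL, 'DETAILED')
WHERE index_level = 0;
```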

In a newly created database with a single file, I ran the following:

CREATE TABLE T
(
    X TINYINT NOT NULL,
    Y CHAR(3000) NULL
);

CREATE CLUSTERED INDEX ix
  ON T(X);

GO

--Insert 100 rows with values 1 - 100 in random order
DECLARE @C1 AS CURSOR,
        @X  AS INT

SET @C1 = CURSOR FAST_FORWARD
FOR SELECT number
    FROM   master..spt_values
    WHERE  type = 'P'
           AND number BETWEEN 1 AND 100
    ORDER  BY CRYPT_GEN_RANDOM(4)

OPEN @C1;

FETCH NEXT FROM @C1 INTO @X;

--One INSERT per row, so each row lands wherever its key belongs at that moment
WHILE @@FETCH_STATUS = 0
  BEGIN
      INSERT INTO T (X)
      VALUES (@X);

      FETCH NEXT FROM @C1 INTO @X;
  END

CLOSE @C1;
DEALLOCATE @C1;

Then I checked the page layout with:

SELECT page_id,
       X,
       geometry::Point(page_id, X, 0).STBuffer(1)
FROM   T
       CROSS APPLY sys.fn_PhysLocCracker(%%physloc%%)
ORDER  BY page_id

The results were all over the place. The first row in key order (with value 1, highlighted with an arrow below) was on nearly the last physical page.

[Image: spatial results plot of page_id against X for the 100 rows before the rebuild]

Fragmentation can be reduced or removed by rebuilding or reorganising an index to increase the correlation between logical order and physical order.
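Besides REBUILD, the other option, REORGANIZE, compacts and reorders the existing leaf pages in place. A sketch (SQL Server 2016+, hypothetical table dbo.ReorgDemo) that fragments a small index and then reorganizes it; the before and after numbers will vary, so compare them rather than expecting particular values:

```sql
DROP TABLE IF EXISTS dbo.ReorgDemo;

CREATE TABLE dbo.ReorgDemo (X INT NOT NULL, Y CHAR(3000) NULL);
CREATE CLUSTERED INDEX ix_ReorgDemo ON dbo.ReorgDemo (X);

-- Descending single-row inserts to provoke page splits.
DECLARE @i INT = 60;
WHILE @i >= 1
BEGIN
    INSERT INTO dbo.ReorgDemo (X) VALUES (@i);
    SET @i -= 1;
END;

SELECT 'before' AS stage, avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.ReorgDemo'), 1, NULL, 'LIMITED');

-- Reorders and compacts the existing leaf pages in place.
ALTER INDEX ix_ReorgDemo ON dbo.ReorgDemo REORGANIZE;

SELECT 'after' AS stage, avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.ReorgDemo'), 1, NULL, 'LIMITED');
```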

After running

ALTER INDEX ix ON T REBUILD;

I got the following

[Image: the same plot of page_id against X after ALTER INDEX ix ON T REBUILD]

If the table has no clustered index, it is called a heap.

Nonclustered indexes can be built on either a heap or a clustered index. They always contain a row locator back to the base table. In the case of a heap, this is a physical row identifier (RID) consisting of three components (File:Page:Slot). In the case of a clustered index, the row locator is logical (the clustered index key).
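A sketch of a heap with a nonclustered index, assuming SQL Server 2016+ and a hypothetical table dbo.HeapDemo. The last query uses %%physloc%% and sys.fn_PhysLocFormatter, which are undocumented, to display each heap row's physical location in the same (File:Page:Slot) shape as the RID:

```sql
DROP TABLE IF EXISTS dbo.HeapDemo;

-- No clustered index, so the table is stored as a heap.
CREATE TABLE dbo.HeapDemo (ID INT NOT NULL, Name NVARCHAR(50) NULL);

-- On a heap, this index's row locator is the physical RID (File:Page:Slot).
CREATE NONCLUSTERED INDEX ix_HeapDemo_ID ON dbo.HeapDemo (ID);

INSERT INTO dbo.HeapDemo (ID, Name) VALUES (1, N'a'), (2, N'b');

-- index_id 0 is the heap itself; the nonclustered index has index_id >= 2.
SELECT index_id, name, type_desc
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.HeapDemo');

-- Undocumented: formats each row's physical location as (file:page:slot).
SELECT ID, sys.fn_PhysLocFormatter(%%physloc%%) AS file_page_slot
FROM dbo.HeapDemo;
```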

In the latter case, if the nonclustered index already naturally includes the CI key column(s), either as NCI key columns or as INCLUDE-d columns, then nothing is added. Otherwise, the missing CI key column(s) are silently added to the NCI.

SQL Server always ensures that the key columns are unique for both types of index. However, the mechanism by which this is enforced for indexes not declared as unique differs between the two index types.

Clustered indexes get a uniquifier added for any rows with key values that duplicate an existing row. This is just an ascending integer.

For nonclustered indexes not declared as unique, SQL Server silently adds the row locator into the nonclustered index key. This applies to all rows, not just those that are actually duplicates.
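A sketch of both mechanisms from the outside, assuming SQL Server 2016+ and a hypothetical table dbo.DupDemo. Neither index is declared unique, yet duplicate keys are accepted; the uniquifier and the extended nonclustered key are internal and are not visible through normal queries:

```sql
DROP TABLE IF EXISTS dbo.DupDemo;

CREATE TABLE dbo.DupDemo (K INT NOT NULL, V INT NOT NULL);

-- Not declared UNIQUE: rows that repeat an existing K value get a hidden uniquifier.
CREATE CLUSTERED INDEX cix_DupDemo ON dbo.DupDemo (K);

-- Not declared UNIQUE either: its key is silently extended with the row locator
-- (here the clustered key K plus any uniquifier).
CREATE NONCLUSTERED INDEX ix_DupDemo_V ON dbo.DupDemo (V);

INSERT INTO dbo.DupDemo (K, V)
VALUES (1, 10), (1, 20), (1, 30), (2, 10);

SELECT name, type_desc, is_unique
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.DupDemo');
```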

The clustered vs. nonclustered nomenclature is also used for columnstore indexes. The paper "Enhancements to SQL Server Column Stores" states:

Although column store data is not really "clustered" on any key, we decided to retain the traditional SQL Server convention of referring to the primary index as a clustered index.
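For comparison, a clustered columnstore index is created without any key columns, which fits the quote: "clustered" here only marks it as the primary storage for the table. A minimal sketch, assuming SQL Server 2016 SP1 or later (where columnstore indexes are available in all editions) and a hypothetical table dbo.CsDemo:

```sql
DROP TABLE IF EXISTS dbo.CsDemo;

CREATE TABLE dbo.CsDemo
(
    SaleDate  DATE          NOT NULL,
    ProductID INT           NOT NULL,
    Amount    DECIMAL(10,2) NOT NULL
);

-- No column list: the columnstore stores every column and becomes the table itself.
CREATE CLUSTERED COLUMNSTORE INDEX ccsi_CsDemo ON dbo.CsDemo;

INSERT INTO dbo.CsDemo (SaleDate, ProductID, Amount)
VALUES ('2016-09-01', 1, 9.99), ('2016-09-02', 2, 4.50);

SELECT index_id, name, type_desc
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.CsDemo');
```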


Friday, September 2, 2016