A surrogate key (or synthetic key, entity identifier, system-generated key, database sequence number, factless key, technical key, or arbitrary unique identifier) in a database is a unique identifier for either an entity in the modeled world or an object in the database. The surrogate key is not derived from application data, unlike a natural (or business) key, which is derived from application data.[1]
May 30, 2005: There are many other possible variations on those themes. But I have to say that SSIS could be improved very much to make life easier for the BI developer: a real "definitive" surrogate key generation task could be the right answer to this kind of need.
Mar 19, 2012: The CSUM function will generate the next surrogate key number only if the highest surrogate key already generated is provided as part of the equation. This option can be implemented by developing a surrogate key generation process via a stored procedure together with a surrogate key table containing the natural key plus the surrogate key.
Definition
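There are at least two definitions of a surrogate:
Surrogate (1), from Hall, Owlett and Todd (1976): a surrogate represents an entity in the outside world. It is generated internally by the system but is nevertheless visible to the user or application.
Surrogate (2), from Wieringa and De Jonge (1991): a surrogate represents an object in the database itself. It is generated internally by the system and is invisible to the user or application.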
The Surrogate (1) definition relates to a data model rather than a storage model and is used throughout this article. See Date (1998).
An important distinction between a surrogate and a primary key depends on whether the database is a current database or a temporal database. Since a current database stores only currently valid data, there is a one-to-one correspondence between a surrogate in the modeled world and the primary key of the database. In this case the surrogate may be used as a primary key, resulting in the term surrogate key. In a temporal database, however, there is a many-to-one relationship between primary keys and the surrogate. Since there may be several objects in the database corresponding to a single surrogate, we cannot use the surrogate as a primary key; another attribute is required, in addition to the surrogate, to uniquely identify each object.
Although Hall et al. (1976) say nothing about this, others[specify] have argued that a surrogate should have the following characteristics:
Surrogates in practice
In a current database, the surrogate key can be the primary key, generated by the database management system and not derived from any application data in the database. The only significance of the surrogate key is to act as the primary key. It is also possible that the surrogate key exists in addition to the database-generated UUID (for example, an HR number for each employee other than the UUID of each employee).
A surrogate key is frequently a sequential number (e.g. a Sybase or SQL Server 'identity column', a PostgreSQL or Informix serial, an Oracle or SQL Server SEQUENCE, or a column defined with AUTO_INCREMENT in MySQL). Some databases provide UUID/GUID as a possible data type for surrogate keys (e.g. PostgreSQL UUID or SQL Server UNIQUEIDENTIFIER).
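The two styles of generated key can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the employee and badge tables and their columns are hypothetical, SQLite's INTEGER PRIMARY KEY stands in for the identity, serial, and AUTO_INCREMENT features named above, and the UUID is generated by the application rather than by the database.

```python
import sqlite3
import uuid


def demo_generated_keys():
    """Insert rows whose keys are generated rather than taken from the data."""
    conn = sqlite3.connect(":memory:")
    # INTEGER PRIMARY KEY lets SQLite assign the next integer key on insert.
    conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    # A UUID surrogate: produced by the application, stored as text.
    conn.execute("CREATE TABLE badge (id TEXT PRIMARY KEY, label TEXT NOT NULL)")

    first = conn.execute("INSERT INTO employee (name) VALUES (?)", ("Ada",)).lastrowid
    second = conn.execute("INSERT INTO employee (name) VALUES (?)", ("Grace",)).lastrowid

    badge_id = str(uuid.uuid4())
    conn.execute("INSERT INTO badge (id, label) VALUES (?, ?)", (badge_id, "visitor"))
    conn.close()
    return first, second, badge_id
```

Neither key says anything about the employee or the badge; it only identifies the row.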
Having the key independent of all other columns insulates the database relationships from changes in data values or database design (making the database more agile) and guarantees uniqueness.
In a temporal database, it is necessary to distinguish between the surrogate key and the business key. Every row would have both a business key and a surrogate key. The surrogate key identifies one unique row in the database; the business key identifies one unique entity of the modeled world. One table row represents a slice of time holding all the entity's attributes for a defined timespan. Those slices depict the whole lifespan of one business entity. For example, a table EmployeeContracts may hold temporal information to keep track of contracted working hours. If a contract's working hours change once, the table holds two rows for that contract: the business key is identical (non-unique) in both rows, whereas the surrogate key of each row is unique.
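A minimal sketch of such a table, using Python's sqlite3 module. Only the table name EmployeeContracts comes from the text; the column names, the contract number "C-100", and the dates are assumptions made for illustration.

```python
import sqlite3


def contract_history():
    """Store two time slices of one contract and return them."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE EmployeeContracts (
            contract_id     INTEGER PRIMARY KEY,  -- surrogate key: unique per row
            contract_number TEXT NOT NULL,        -- business key: unique per contract
            valid_from      TEXT NOT NULL,
            valid_to        TEXT,                 -- NULL marks the current slice
            weekly_hours    INTEGER NOT NULL
        )""")
    slices = [
        ("C-100", "2019-01-01", "2019-12-31", 40),
        ("C-100", "2020-01-01", None, 32),
    ]
    conn.executemany(
        "INSERT INTO EmployeeContracts "
        "(contract_number, valid_from, valid_to, weekly_hours) VALUES (?, ?, ?, ?)",
        slices,
    )
    rows = conn.execute(
        "SELECT contract_id, contract_number, weekly_hours "
        "FROM EmployeeContracts ORDER BY contract_id"
    ).fetchall()
    conn.close()
    return rows
```

Both rows carry the same contract_number, while contract_id differs, which is the distinction the paragraph above draws.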
Some database designers use surrogate keys systematically regardless of the suitability of other candidate keys, while others will use a key already present in the data, if there is one.
Some of the alternate names ('system-generated key') describe the way of generating new surrogate values rather than the nature of the surrogate concept.
Approaches to generating surrogates include:
Advantages
Immutability
Surrogate keys do not change while the row exists. This has the following advantages:
Requirement changes
Attributes that uniquely identify an entity might change, which might invalidate the suitability of natural keys. Consider the following example:
In these cases, generally a new attribute must be added to the natural key (for example, an original_company column). With a surrogate key, only the table that defines the surrogate key must be changed. With natural keys, all tables (and possibly other, related software) that use the natural key will have to change.
Some problem domains do not clearly identify a suitable natural key. Surrogate keys avoid choosing a natural key that might be incorrect.
Performance
Surrogate keys tend to be a compact data type, such as a four-byte integer. This allows the database to query the single key column faster than it could multiple columns. Furthermore, a non-redundant distribution of keys causes the resulting B-tree index to be completely balanced. Surrogate keys are also less expensive to join (fewer columns to compare) than compound keys.
Compatibility
When using several database application development systems, drivers, and object-relational mapping systems, such as Ruby on Rails or Hibernate, it is much easier to use integer or GUID surrogate keys for every table instead of natural keys, in order to support database-system-agnostic operations and object-to-row mapping.
Uniformity
When every table has a uniform surrogate key, some tasks can be easily automated by writing the code in a table-independent way.
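As a rough illustration of table-independent code, the sketch below fetches a row from any table by its surrogate key. It assumes every table names its key column id; the customer and product tables, and both helper functions, are hypothetical.

```python
import sqlite3


def fetch_by_id(conn, table, row_id):
    """Fetch one row from any table whose surrogate key column is named id."""
    # Table names cannot be bound as parameters, so check the catalog first.
    known = {r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    if table not in known:
        raise ValueError(f"unknown table: {table}")
    return conn.execute(
        f'SELECT * FROM "{table}" WHERE id = ?', (row_id,)).fetchone()


def make_uniform_db():
    """Build two unrelated tables that share the same key convention."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("INSERT INTO customer (name) VALUES ('Ada')")
    conn.execute("INSERT INTO product (title) VALUES ('Widget')")
    return conn
```

With natural keys, the same helper would need to know each table's key columns and their types.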
Validation
It is possible to design key values that follow a well-known pattern or structure which can be automatically verified. For instance, the keys that are intended to be used in some column of some table might be designed to 'look different from' those that are intended to be used in another column or table, thereby simplifying the detection of application errors in which the keys have been misplaced. However, this characteristic of the surrogate keys should never be used to drive any of the logic of the applications themselves, as this would violate the principles of database normalization.
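One way such verifiable keys might look is sketched below. The prefixes CUS and INV, the six-digit width, and both function names are assumptions chosen for the example, not a convention taken from the text.

```python
import re

# Hypothetical key formats: a table-specific prefix plus a zero-padded number.
KEY_PREFIXES = {"customer": "CUS", "invoice": "INV"}
KEY_PATTERNS = {
    kind: re.compile(rf"{prefix}-\d{{6}}") for kind, prefix in KEY_PREFIXES.items()
}


def make_key(kind, number):
    """Format a generated number as a key for the given kind of table."""
    return f"{KEY_PREFIXES[kind]}-{number:06d}"


def looks_like(kind, key):
    """Return True if key has the shape expected for this kind of table."""
    return KEY_PATTERNS[kind].fullmatch(key) is not None
```

A check such as looks_like("invoice", key) can flag a customer key that was passed where an invoice key was expected. In line with the caution above, it is meant for catching mistakes, not for deciding application behavior.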
Disadvantages
Disassociation
The values of generated surrogate keys have no relationship to the real-world meaning of the data held in a row. When inspecting a row holding a foreign key reference to another table using a surrogate key, the meaning of the surrogate key's row cannot be discerned from the key itself. Every foreign key must be joined to see the related data item. If appropriate database constraints have not been set, or if data was imported from a legacy system where referential integrity was not enforced, it is possible to have a foreign-key value that does not correspond to a primary-key value and is therefore invalid. (In this regard, C.J. Date regards the meaninglessness of surrogate keys as an advantage.[5])
To discover such errors, one must perform a query that uses a left outer join between the table with the foreign key and the table with the primary key, showing both key fields in addition to any fields required to distinguish the record; all invalid foreign-key values will have the primary-key column as NULL. The need to perform such a check is so common that Microsoft Access actually provides a 'Find Unmatched Query' wizard that generates the appropriate SQL after walking the user through a dialog. (It is, however, not too difficult to compose such queries manually.) 'Find Unmatched' queries are typically employed as part of a data cleansing process when inheriting legacy data.
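The "find unmatched" query described above can be sketched as follows, using Python's sqlite3 module. The customers and orders tables and their sample rows are hypothetical; the orders table deliberately has no foreign-key constraint, to mimic imported legacy data.

```python
import sqlite3


def find_orphans(conn):
    """Return orders whose customer_id matches no row in customers."""
    return conn.execute("""
        SELECT o.order_id, o.customer_id, c.customer_id
        FROM orders AS o
        LEFT OUTER JOIN customers AS c ON c.customer_id = o.customer_id
        WHERE c.customer_id IS NULL   -- no matching primary key was found
        ORDER BY o.order_id""").fetchall()


def make_legacy_db():
    """Build sample data in which orders 11 and 13 point at missing customers."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [(10, 1), (11, 7), (12, 2), (13, 9)])
    return conn
```

Each returned row shows the foreign-key value next to a NULL primary-key column, which is how the invalid references are recognized.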
Surrogate keys are unnatural for data that is exported and shared. A particular difficulty is that tables from two otherwise identical schemas (for example, a test schema and a development schema) can hold records that are equivalent in a business sense, but have different keys. This can be mitigated by NOT exporting surrogate keys, except as transient data (most obviously, in executing applications that have a 'live' connection to the database).
When surrogate keys supplant natural keys, then domain specific referential integrity will be compromised. For example, in a customer master table, the same customer may have multiple records under separate customer IDs, even though the natural key (a combination of customer name, date of birth, and E-mail address) would be unique. To prevent compromise, the natural key of the table must NOT be supplanted: it must be preserved as a unique constraint, which is implemented as a unique index on the combination of natural-key fields.
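A minimal sketch of keeping the natural key as a unique constraint alongside the surrogate key. The column names and the helper functions are assumptions; the choice of name, date of birth, and e-mail address as the natural key follows the example in the text.

```python
import sqlite3


def make_customer_db():
    """Create a customer table with a surrogate key and a guarded natural key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,        -- surrogate key
            name        TEXT NOT NULL,
            birth_date  TEXT NOT NULL,
            email       TEXT NOT NULL,
            UNIQUE (name, birth_date, email)        -- natural key stays unique
        )""")
    return conn


def insert_customer(conn, name, birth_date, email):
    """Insert a customer; return the new surrogate key, or None for a duplicate."""
    try:
        cur = conn.execute(
            "INSERT INTO customer (name, birth_date, email) VALUES (?, ?, ?)",
            (name, birth_date, email))
        return cur.lastrowid
    except sqlite3.IntegrityError:
        return None
```

Without the UNIQUE clause, the second insert of the same person would succeed and receive a fresh customer_id, which is exactly the duplication the paragraph warns about.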
Query optimization
Relational databases assume a unique index is applied to a table's primary key. The unique index serves two purposes: (i) to enforce entity integrity, since primary key data must be unique across rows, and (ii) to quickly search for rows when queried. Since surrogate keys replace a table's identifying attributes (the natural key), and since the identifying attributes are likely to be those queried, the query optimizer is forced to perform a full table scan when fulfilling likely queries. The remedy to the full table scan is to apply indexes on the identifying attributes, or sets of them. Where such sets are themselves a candidate key, the index can be a unique index.
These additional indexes, however, will take up disk space and slow down inserts and deletes.
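The effect can be observed in SQLite with EXPLAIN QUERY PLAN, as in the sketch below. The customer table, the customer_email index name, and the helper functions are hypothetical, and the exact plan wording varies between SQLite versions, so the example only inspects the leading keyword of the plan.

```python
import sqlite3


def make_lookup_db():
    """Create a table keyed by a surrogate, with no index on email yet."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
    return conn


def email_lookup_plan(conn):
    """Return SQLite's plan description for a lookup by email."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT customer_id FROM customer WHERE email = ?",
        ("ada@example.com",)).fetchall()
    return rows[0][3]  # the fourth column holds the human-readable detail


def add_email_index(conn):
    """Index the identifying attribute so lookups no longer scan the table."""
    conn.execute("CREATE UNIQUE INDEX customer_email ON customer (email)")
```

Before add_email_index is called the plan begins with SCAN; afterwards it begins with SEARCH and names the index. The cost noted above still applies: the index has to be maintained on every insert and delete.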
Normalization
Surrogate keys can result in duplicate values in any natural keys. To prevent duplication, one must preserve the role of the natural keys as unique constraints, either when defining the table with SQL's CREATE TABLE statement or, if the constraints are added as an afterthought, with an ALTER TABLE ... ADD CONSTRAINT statement.
Business process modeling
Because surrogate keys are unnatural, flaws can appear when modeling the business requirements. Business requirements, relying on the natural key, then need to be translated to the surrogate key. A strategy is to draw a clear distinction between the logical model (in which surrogate keys do not appear) and the physical implementation of that model, to ensure that the logical model is correct and reasonably well normalised, and to ensure that the physical model is a correct implementation of the logical model.
Inadvertent disclosure
Proprietary information can be leaked if sequential key generators are used. By subtracting a previously generated sequential key from a recently generated sequential key, one could learn the number of rows inserted during that time period. This could expose, for example, the number of transactions or new accounts per period. There are a few ways to overcome this problem:
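The arithmetic behind the leak, and one possible countermeasure, can be sketched as follows. Using random UUIDs is an assumption on my part: it is a commonly used option, not necessarily one of the remedies this passage originally listed. Both function names are hypothetical.

```python
import uuid


def rows_inserted_between(earlier_key, later_key):
    """Estimate how many rows were created between two sequential keys."""
    return later_key - earlier_key


def new_opaque_key():
    """Generate a random key that reveals nothing about insertion volume."""
    return str(uuid.uuid4())
```

For example, an outsider who receives key 1000 on Monday and key 1450 on Friday can infer that roughly 450 rows were created in between. Two values from new_opaque_key() support no such subtraction.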
Inadvertent assumptions
Sequentially generated surrogate keys can imply that events with a higher key value occurred after events with a lower value. This is not necessarily true: such values do not guarantee time sequence, because inserts can fail and leave gaps that may be filled at a later time. If chronology is important, then date and time must be recorded separately.
Retrieved from 'https://en.wikipedia.org/w/index.php?title=Surrogate_key&oldid=949211050'
This article demonstrates how to "roll your own" surrogate keys and sequences in a platform-independent way, using standard SQL.
Surrogate keys
Relational theory talks about something called a "candidate key." In SQL terms, a candidate key is any combination of columns that uniquely identifies a row (SQL and the relational model aren't the same thing, but I'll put that aside for this article). The data's primary key is the minimal candidate key. Many people think a primary key is something the DBA defines, but that's not true. The primary key is a property of the data, not the table that holds the data.
Unfortunately, the minimal candidate key is sometimes not a good primary key in the real world. For example, if the primary key is 6 columns wide and I need to refer to a row from another table, it's impractical to make a 6-column wide foreign key. For this reason, database designers sometimes introduce a surrogate key, which uniquely identifies every row in the table and is "more minimal" than the inherently unique aspect of the data. The usual choice is a monotonically increasing integer, which is small and easy to use in foreign keys.
Every RDBMS of which I'm aware offers a feature to make surrogate keys easier by automatically generating the next larger value upon insert. In SQL Server, it's called an IDENTITY column. In MySQL, it's called AUTO_INCREMENT. It's possible to generate the value in SQL, but it's easier and generally safer to let the RDBMS do it instead. This does lead to some issues itself, such as the need to find out the value that was generated by the last insertion, but those are usually not hard to solve (LAST_INSERT_ID() and similar functions, for example).
It's sometimes desirable not to use the provided feature. For instance, I might want to be sure I always use the next available number. In that case, I can't use the built-in features, because they don't generate the next available number under some circumstances. For example, SQL Server doesn't decrement the internal counter when transactions are rolled back, leaving holes in the data (see my article on finding missing numbers in a sequence). Neither MySQL nor SQL Server decrements the counter when rows are deleted.
In these cases, it's possible to generate the next value in the insert statement. Suppose my table looks like this:
The next value for c1 is simply the maximum value + 1. If there is no maximum value, it is 1, which is the same as 0 + 1.
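The original table definition and query are not reproduced here, so the following is only a sketch under assumptions: a two-column table t1 with an integer key c1 and an integer column c2, run through Python's sqlite3 module. It computes "maximum + 1, or 1 if the table is empty" with the standard COALESCE function.

```python
import sqlite3


def make_t1():
    """Create the assumed two-column table t1 (c1 is the key)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t1 (c1 INTEGER NOT NULL PRIMARY KEY, c2 INTEGER NOT NULL)")
    return conn


def next_c1(conn):
    """Return the next key: MAX(c1) + 1, or 0 + 1 when t1 has no rows."""
    # MAX over an empty table yields NULL; COALESCE turns that into 0.
    return conn.execute("SELECT COALESCE(MAX(c1), 0) + 1 FROM t1").fetchone()[0]
```

On an empty table next_c1 returns 1; after a row with c1 = 5 exists, it returns 6.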
There are platform-dependent ways to write that statement as well, such as using SQL Server's ISNULL function or MySQL's IFNULL. This code can be combined into an INSERT statement, such as the following statement to insert 3 into the second column:
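A sketch of what such a combined statement could look like, again assuming a table t1 with an integer key c1 and a second integer column c2 (the original statement is not reproduced here). The helper names are hypothetical.

```python
import sqlite3


def make_t1_for_insert():
    """Create the assumed two-column table t1 (c1 is the key)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t1 (c1 INTEGER NOT NULL PRIMARY KEY, c2 INTEGER NOT NULL)")
    return conn


def insert_with_next_key(conn, c2_value):
    """Compute the next c1 and insert the row in one INSERT ... SELECT."""
    conn.execute(
        "INSERT INTO t1 (c1, c2) SELECT COALESCE(MAX(c1), 0) + 1, ? FROM t1",
        (c2_value,))


def all_rows(conn):
    """Return every row of t1 in key order."""
    return conn.execute("SELECT c1, c2 FROM t1 ORDER BY c1").fetchall()
```

Calling insert_with_next_key(conn, 3) on an empty table stores the row (1, 3). Because the aggregate query always yields exactly one row, the statement inserts one row even when t1 is empty.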
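The code above is a single statement, which avoids the race between finding the next value in one statement and using it in another. Whether it fully prevents two concurrent inserts from getting the same value for c1 depends on the RDBMS's locking and isolation level, so a primary key or unique constraint on c1 remains a sensible safeguard. It is not safe to find the next value in one statement and use it in another, unless both statements are in a transaction. I would consider that a bad idea, though. There's no need for a transaction in the statement above.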
Downsides to this approach are inability to find the value of c1 immediately after inserting, and inability to insert multiple rows at once. The first problem is inherently caused by inserting meaningless data, and is always a problem, even with the built-in surrogate keys where the RDBMS provides a mechanism to retrieve the value.
Sequences: a better surrogate key
Surrogate keys are often considered very bad practice, for a variety of good reasons I won't discuss here. Sometimes, though, there is just nothing for it but to artificially unique-ify the data. In these cases, a sequence number can often be a less evil approach. A sequence is just a surrogate key that restarts at 1 for each group of related records. For example, consider a table of log entries related to records in my t1 table:
At this point I might want to enter some more records (0, 11) into t1:
Now suppose I want the following three log entries for the first row in t1:
There's no good primary key in this data. I will have to add a surrogate key. It might seem I could add a date-time column instead, but that's a dangerous design. It breaks as soon as two records are inserted within a timespan less than the maximum resolution of the data type. It also breaks if two records are inserted in a single transaction where the time is consistent from the first to the last statement. I'm much happier with a sequence column. The following statement will insert the log records as desired:
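The original log table and statements are not reproduced here, so this is a sketch under assumptions: a hypothetical table t1_log whose primary key is (c1, seq), where c1 refers to a row in t1 and seq is the per-row sequence. It applies the same COALESCE(MAX(...), 0) + 1 technique, restricted to one group by a WHERE clause.

```python
import sqlite3


def make_log_db():
    """Create the assumed log table, keyed by parent row and sequence number."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE t1_log (
            c1      INTEGER NOT NULL,   -- the t1 row this entry belongs to
            seq     INTEGER NOT NULL,   -- restarts at 1 for every c1
            message TEXT NOT NULL,
            PRIMARY KEY (c1, seq)
        )""")
    return conn


def add_log(conn, c1, message):
    """Append a log entry, numbering it within its own c1 group."""
    conn.execute(
        "INSERT INTO t1_log (c1, seq, message) "
        "SELECT ?, COALESCE(MAX(seq), 0) + 1, ? FROM t1_log WHERE c1 = ?",
        (c1, message, c1))


def log_rows(conn):
    """Return all log entries ordered by parent row and sequence."""
    return conn.execute(
        "SELECT c1, seq, message FROM t1_log ORDER BY c1, seq").fetchall()
```

Three calls to add_log for c1 = 1 produce sequence numbers 1, 2 and 3, and the first call for c1 = 2 starts again at 1.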
If I want to enter a log record on another record in t1, the sequence will start at 1 for it:
MySQL actually allows an AUTO_INCREMENT value to serve as a sequence for certain table types (MyISAM and BDB). To do this, just make the column the last column in a multi-column primary key. I'm not aware of any other RDBMS that does this.