We’ll be covering the following topics in this tutorial:
What is Key?
A database consists of tables, which consist of records, which further consist of fields.
This below figure provides an example of a typical table consisting of STUDENT details:
Here, table containing five records and each containing four fields.
Table name = Student
A row =a record
A column = a field
A cell = a data item; a particular value in a field for a particular record.
A table in a database may be empty, containing no records or contain many millions of them. Similarly, a single record may consist of one or thousands of fields. Each field in turn contains a single data item. In order to uniquely identify each record of the table, there must be some field or combination of fields, which should have only unique values.
For example: In above table more than one student may have the same Name, Class and Marks but they must have the different Roll numbers. So, we can distinguish one record from the other with the help of Roll_number column. Here, the column Roll_number that is used to uniquely identify each record of the table is called as Key Field.
Key is a set of one or more columns whose combined values are unique among all occurrences in a given table. A key is the relational means of specifying uniqueness. In patient database, a patient number could be used as a key field to uniquely identify each patient’s record in the doctors’ patient file. The patient’s name could not be a key field as there may be more than one patient with the same name. There must be at least one key field in each table. Sometimes a record may contain more than one key field. For example, the doctor’s patient’s file may contain both a patient number and a National Insurance number for each patient. Both of these are key fields.
Types of Keys
Every key which has the property of uniqueness can be distinguished as following:
• Candidate Key
• Super Key
• Primary Key
Candidate key: Candidate keys are those attributes of a relation, which have the properties of· uniqueness and irreducibly, whose explanation is given below:
Let R be a relation. By definition, the set of all attributes of R has the uniqueness property, meaning that, at any given time, no two tuples in the value of R at that time are duplicates of one another. As in the case of the STUDENT relation, for example, the subset containing just attribute Roll number has that property. These facts constitute the intuition behind the definition of candidate key.
Let K be a set of attributes of relation R. Then K is a candidate key for R .if and only if it possesses both of the following properties:
a) Uniqueness: No legal value of R ever contains two distinct tuples with the same value for K.
b) Irreducibility: No proper subset of K has the uniqueness property.
Note that every relation does have at least one candidate key. The uniqueness property of such keys is self-explanatory. Irreducibility property means tJ1atif a candidate key is a composite key (consists of more than one attribute) then no individual attribute of candidate key, which participate into it, is unique. For example, if the combination of (Name, Class) is unique, then it can be identified as the candidate key if and only if Name and Class individually are not unique. Since a null value is not guaranteed to be unique, no component of a candidate key is allowed to be null. There can be any number of candidate keys in a table.
Super Key: A super key has the uniqueness property but not necessarily the irreducibility property. A candidate key is a special case of a super key.
For example, if Roll_number is unique in relation STUDENT then, the set of attributes
(Roll_number, Name, Class) is a super key for a relation STUDENT, these set of attributes are also unique, but this combination of keys (composite key) is not having the property of irreducibility because Roll_number which is one subset of the composite key is also unique itself. Thus, this composite key is called as super key because it has the property of uniqueness but not the irreducibility.
Consider a relation of Patient in which Patient_number is unique. Then, Patient_number is a candidate key and (Patient number, Patient name) is a super key. Thus, we can say that “A superset of a candidate key is a super key.”
Primary Key: The primary key is an attribute or a set of attributes that uniquely identify a specific instance of an entity. Every entity in the data model must have a primary key whose values uniquely identify instances of the entity. Sometimes a record may contain more than one key field.
For example, the doctor’s patient’s file may contain both a patient number and a National Insurance number for each patient. Both of these are key fields. We therefore choose one of them and call it the primary key field. Primary key cannot contain any Null value because we cannot uniquely identify multiple Null values.
Properties of Primary Key
To qualify as a primary key for an entity, an attribute must have the following properties:
• Stable: The value of a primary key must not change or should not become null Through out the life of an entity. A stable primary key helps to keep the model stable.
For example: if we consider a patient record, the value for the primary key (Patientnumber) must not change with time as would happen with the age field.
• Minimal: The primary key should be composed of the minimum number of fields that ensures the occurrences are unique.
• Definitive: A value must exist for every record at creation time. Because an entity occurrence cannot be substantiated unless the primary key value also exists.
• Accessible: Anyone who wants to create, read or delete a record must be able to see the Primary key value.
Once candidate keys are identified, choose one aw’ only one, primary key for each.
Candidate keys, which are not chosen as the primary key, are known as Alternate Keys.
An example of an entity that could have several possible primary keys is Employee. Let’s assume that for each employee in an organization there are three candidate keys: Employee ID, Social Security Number, and Name. Name is the least desirable candidate. ‘While it might work for a small department where it would be unlikely that two people would have exactly the same name, it would not work for a large organization that had hundreds or thousands of employees. Moreover, there is the possibility that an employee’s name could change because of marriage. Employee ID would be a good candidate as long as each employee was assigned a unique identifier at the time of hire. Social Security would work best since every employee is required to have one before being hired.
The primary key of any table is any candidate key of that table which the database designer arbitrarily designates as “primary”. The primary ~ey may be selected for convenience, comprehension, performance, or any other reasons.
Alternate Key
The alternate keys of any table are simply those candidate keys, which are not currently selected as the primary key.
Exactly one of those candidate keys is chosen as the primary key and the remainders, if any, are then called alternate keys. An alternate key is a function of all candidate keys minus the primary key.
The primary key is unique and often programmers assign a primary key just to obtain uniqueness. If this is the only requirement of the primary key, then there is a danger that at some point someone else may eventually change the primary key (for example, to a system assigned number) and lose the original enforcement of uniqueness on what had been the primary key earlier.
Let look a table below which holds student class enrollment:
This table has a compound primary key
Here, the designer knows that each student-class pair will be unique; i.e., each student may enroll in each class only once. Several years later a new DBA decides that it is inefficient to use two columns for the primary key where one would do. He/She adds a “row id” column and makes it the primary key by loading it with a system counter.
This is fine as far as an identity for each row. But now nothing prevents a student from Enrolling in the same class multiple times. This happened because the data model did not retain a candidate key property on the two original columns when the primary key was changed. Therefore the new DBA had no direct way of knowing (other than text notes somewhere) that these two columns must still remain unique, even though they are no longer part of the primary key.
Notice here how the model could have handled this automatically, if it had captured candidate keys in the first place and then generated alternate keys as a function of those candidates not in the primary key. The original two columns remain unique even after they are no longer primary.
So, do not confuse a candidate key specification of uniqueness with the arbitrary selection of primary key.
Composite Keys
Sometimes it requires more than one attribute to uniquely identify an entity. A primary key that is made up of more than one attribute is known as a composite key.
Table below shows an example of a composite key. Each instance of the entity Work can be uniquely identified only by a composite key composed of Employee ID and Project ill,
Example of Composite Key Artificial Keys
An artificial key is one that has no meaning to the business or organization.
Artificial keys are permitted when
I) no attribute has all the primary key properties, or
2) The primary key is large and complex.
For Example: Enrollment table from business point of view has (student, c1ass) combination as primary key. But this primary key is large and complex, so DBA decides to add row_id column as primary key. Here, row_id can also be called as Artificial Key.
Foreign Keys
Foreign keys are the attributes of a table, which refers to the primary key of some another table. Foreign Keys permit only those values, which appears in the primary key of the table to which it refers or may be null. Foreign keys are used to link together two or more different tables which have some form of relationship with each other. The foreign key is a reference to the tuple of a table from which it was taken, this tuple being called the Referenced or Target tuple. The table containing the referenced tuple will be called as Target table.
Thus, a foreign key is an attribute that completes a relationship by identifying the parent entity. Foreign keys provide a method for maintaining integrity in the data (called referential integrity) and for navigating between different instances of an entity. A foreign key must support every relationship in the model. The matter of integrity of foreign keys is referred to as Referential Integrity.
Foreign keys values should always be matched by corresponding primary key values.
For example:
Consider the below, containing two tables (EMP, DEPT)
If we try to insert information of employee with deptno 50, then this is an invalid information, because there is no deptno 50 exists in the company(as shown in table DEPT).Then this invalid information should be prevented from insertion, which would only be possible if deptno of EMP table refer the deptno of DEPT table. It means that only those values are permitted in deptno of EMP table, which appears in the deptno attribute of the DEPT table. Thus, we can say that deptno of EMP table is the foreign key which refers the primary key deptno of DEPT table. Thus, we can insert the empno 6 with any deptno from 10, 20, 30 and 40.
Null may also be permitted in the deptno of EMP table. Here, deptno of DEPT table
IS Target attribute and DEPT is the target table.
Consider another example:
Here, college has three valid classes with class code 1, 2 and 3. The class code (foreign key) of student table refers class_code of class table.
Relational Integrity Rules
There are two important integrity rules, which are constraints or restrictions that apply to all instances of the database. The two principal rules for the operational model are known as entity integrity and referential integrity. Before we define these terms, it is necessary to understand the concept of nulls.
Null or Unknown Value
Null represents a value for an attribute that is currently unknown or is not applicable for this tuple. A null can be taken to mean the logical value ‘unknown’. It can ~ean that a value is not applicable to a particular tuple, or it could merely mean that. no value has yet been supplied. Nulls are a way to deal with incomplete or exceptional data. However a null is not the same as a zero numeric value or a text string filled with spaces; zeros and spaces are values, but a null represents the absence of a value. Therefore, nulls should be treated differently from other values.
For Example: Consider the following table
There is no entry for Rajesh’s age in the table, obviously not because he does not have one, but simply because it was unknown to the organization at the point of time the ‘snapshot’ was taken. Raja might have just joined the organization and has not yet been assigned a job; therefore there is no value for the attribute ‘job’ which can be assigned to his row in the table. In time, it is to be expected that Rajesh’s age will be determined and Raja will be assigned a job. A snapshot of the database at that time would reveal values in the currently blank places in the table.
Difference between Null and Not Applicable
In order to understand above concept Let us consider the following example:
In this table the attribute ‘date-pension fund’ has been added which indicates the date when an employee joined the company pension fund. There are blank places for this as well as the original blanks under ‘age’ and ‘job’, but there could be different reasons for the blanks in the ‘date-pension fund’ column. The blank against Rajesh might indicate simply that, as for his age, the information about him is as yet incomplete. The reason for the blank against admit on the other hand, might be that he is a part time employee and, as such, ineligible to join the fund. In the first case, the information was simply unknown, in the second case the information was irrelevant to that particular employee. Where blanks occur in a relational table simply because the relevant facts are unknown, they are referred to as nulls. Thus the above table might be more properly represented as:
The problem with this is that the date values in the ‘date-pension fund’ column and the legend ‘not applicable’ in the same column are not syntactically or semantically compatible. This leads to practical difficulties. The problem can be resolved in this particular instance simply by having a ‘nonsense’ date to indicate non-applicability as in:
But not every case is so simple and it is very easy to confuse ‘missing’ with no applicable information. The important thing to note about nulls is that they are not values. It is not possible, for example, to say that one null equals another or is greater than another and so on.
Entity Integrity rule
Entity Integrity rule states that in a base relation, value of attribute of a primary key cannot be null. Here, a base relation is a relation that corresponds to an entity in the conceptual schema. By definition, a primary key is a minimal identifier that is used to identify tuples uniquely. This means that no subset of the primary key is sufficient to provide unique identification to tuples. If we allow a null for any part of a primary key, we are implying that not all the attributes are needed to distinguish between tuples, which contradicts the definition of the primary key. For example, as branch No is the primary key of the Branch relation, we should not be able to insert a tuple into the Branch relation with a null for the branch_No attribute.
In order to understand the concept of primary key and null, consider the following of STUDENT database.Now record 3 and record 5 are not distinguishable as there Rollno is Null and it is not possible to exactly locate one of the students as he is having common name, class and marks with some other student. Record number 3 and 5 violates the rule of primary key because the Rollno is primary key, all the value of attributes are unique but their primary key entries are NULL and this violates the integrity rule 1, so these records are not allowed in RDBMS.
Referential Integrity
Referential Integrity rule states that, if a foreign key exists in a relation, either the foreign key value must match a candidate key value of some tuple in its home relation or the foreign key value must be wholly null.
Consider the following database to understand the referential integrity rule:
Here, branch_No in the Staff relation is a foreign key targeting the branch_No attribute in the· home relation, Branch. It should not be possible to create a staff record with branch number B025, unless there is already a record for branch number B025 in the Branch relation. However, we should be able to create a new staff record with a null branch number, to cater for the situation where a new member of staff has joined the company but has not yet been assigned to particular branch office.
Enterprise Constraints
Enterprise Constraints are additional rules specified by the users or database administrators of a database. It is also possible for users to specify additional constraints that the data must satisfy. For example, in case of STUDENT database if an upper limit of 300 has been placed upon the marks attribute of STUDENT database, then the user must be able to specify it and expect the DBMS to enforce it. In this case, it should not be possible to add a new student or update a record, whose marks are greater than 300.