More precisely, the relational model is concerned with three components: Data Structure,Data Integrity,Data Manipulation
We’ll be covering the following topics in this tutorial:
Relational Data Structure
Relation : A relation is a table with columns and rows.
All data and relationships are represented in a two dimensional table called a relation.
A RDBMS requires only that the user perceive the database as tables. A relation consists of number of records or row-wise information and column-wise information. In other words, it represents the relation between rows and columns of a two dimensional table.
Attribute: An attribute is a named column of a relation.
In the relational model, relations are used to hold information about the objects to be represented in the database. A relation is represented as a two-dimensional table in which the rows of the table correspond to individual records and the table columns correspond to attributes. Attributes can appear in any order and the relation will still be the same relation, and therefore.
To illustrate the concept of relation, domain, attribute and tipple, consider the following tables:
For example, the Branch relation represents the information about branch offices, with columns for attributes BranchNo (the branch number), locality, city and pincode. Similarly, the Staff relation represents the information about staff, with columns for attributes StaffNo (the staff number), Name, position, Sex, DOB (date of birth), Salary, and BranchNo (the number of the branch the staff member works at). The above table shows instances of the Branch and Staff relations. As you can see from this example, a column contain values of a single attribute; for example, the BranchNo column contains only numbers of existing branch offices.
Domain: A domain is the set of allowed values for one or more attributes.
A domain defines the kind of data represented by an attribute. More particular, a domain is the set of all possible values that an attribute may validly contain. A Domain may also be defined as “A pool of values from which actual values appearing in the column are drawn.”
For Example: In college database Domain for Roll number column consists of range of valid roll numbers of students studying in the college. If a college has 5000 students, then the domain of Roll number may be from 1001 to 6000. Every table, that has a column Roll number, must refer to this domain and can contain only those values that are permitted in domain.
Each attribute in the model should be assigned domain information that includes:
• Data Type: Basic data types are integer, decimal, or character. Most databases support variants of these plus special data types for date and time.
• Length: This is the number of digits or characters in the value. For example, a value of 5 digits or 40 characters.
• Date Format: The format for date values such as dd/mm/yy or yy/mm/dd.
• Range: The range specifies the lower and upper boundaries of the values of attribute may legally have.
• Constraints: Are special resonations on allowed values. For example, the month in a date can never exceed 12 and day of a month can never exceed 31.
• Null support: Indicates whether the attribute can have null or unknown values.
• Default value (if any): The value an attribute will have if a value is not entered.
Domains are extremely powerful features of the relational model. Every attribute in a relation is defined on a domain. Domains may be distinct for each attribute, or two or more attributes may be defined on the same domain. Above tables have shown the domains for some of the attributes of the Branch and Staff relations. Note that, at any given time, typically there will be values in a domain that do not currently appear as values in the corresponding attribute.
The domain concept is important because it allows the user to define in a central place the meaning and source of values that attributes can hold. As a result, more information is available to the system when it undertakes the execution of a relational operation, and operations that are semantically incorrect can be avoided. For example, it is not sensible to compare a locality name with a telephone number, even though the domain definitions for both these attributes are character strings.
Tuple: A tuple is a row of a relation.
The elements of a relation are the rows or tuples in the table. In the Branch relation, each row contains four values, one for each attribute. Tuples can appear in any order and the relation will still be the same relation, and therefore convey the same meaning.
Extension of a Relation: The extension of a given relation is the set of tuples appearing in that relation at any given instant of time.
The extension thus varies with time. It changes as tuples are created, destroyed, and updated. In other words, an extension is the same as view of a table.
Intention of a Relation: It is the permanent part of the relation and independent of time. It corresponds to what is specified in the relational scheme.
More precisely, the intention is the combination of two things a naming structure and a set of integrity constraints.
• The naming structure consists of the relation name plus the names of the attributes .
• The integrity constraints can be subdivided into key constraints, referential constraints, and other constraints.
Degree: The degree of a relation is the number of attributes it contains.
The Branch relation as discussed earlier, has four attributes or degree four. This means each row of the table consists of four attributes, containing four values. A relation with only one attribute would have degree one and be called a unary relation. A relation with two attributes is called binary, one with three attributes is called ternary, and after that the term n-ary is usually used. The degree of a relation is a property of the intension of the relation.
Cardinality: The cardinality of a relation is the number of tuples it contains.
The number of tuples 1S called the cardinality of the relation and this changes as tuples are added or deleted. The cardinality is a property of the extension of the relation and is determined from the particular instance of the relation at any given moment.
Relational database: A collection of appropriately structured or normalized relations with distinct relation names.
A relational database consists of relations that are appropriately structured. We refer to this appropriateness as normalization .
Alternative Terminology
The terminology for the relational model can be quite confusing. We have introduced two sets of terms. In fact, a third set of terms is sometimes used: a relation may be referred to as a file, the tuples as records, and the attributes as fields. This terminology stems from the fact that, physically, the RDBMS may store each relation in a filed
Alternative terminologies for relational model terms:
Data Integrity
A data model has two other parts: a set of integrity rules, which ensure that the data is accurate and manipulative part, defining the types of operation that are allowed on the data. In order to discuss the relational integrity rules, concept of relational keys is desired.
Relational Keys
There should not be any duplicate tuple within a relation. Therefore, we should identify one or more attributes (called relational keys) that uniquely identify each tuple in a relation.
What is Key?
A database consists of tables, which consist of records, which further consist of fields.
This below figure provides an example of a typical table consisting of STUDENT details:
Here, table containing five records and each containing four fields.
Table name = Student
A row =a record
A column = a field
A cell = a data item; a particular value in a field for a particular record.
A table in a database may be empty, containing no records or contain many millions of them. Similarly, a single record may consist of one or thousands of fields. Each field in turn contains a single data item. In order to uniquely identify each record of the table, there must be some field or combination of fields, which should have only unique values.
For example: In above table more than one student may have the same Name, Class and Marks but they must have the different Roll numbers. So, we can distinguish one record from the other with the help of Roll_number column. Here, the column Roll_number that is used to uniquely identify each record of the table is called as Key Field
Key is a set of one or more columns whose combined values are unique among all occurrences in a given table. A key is the relational means of specifying uniqueness. In patient database, a patient number could be used as a key field to uniquely identify each patient’s record in the doctors’ patient file. The patient’s name could not be a key field as there may be more than one patient with the same name. There must be at least one key field in each table. Sometimes a record may contain more than one key field. For example, the doctor’s patient’s file may contain both a patient number and a National Insurance number for each patient. Both of these are key fields.
Types of Keys
Every key which has the property of uniqueness can be distinguished as following:
• Candidate Key
• Super Key
• Primary Key
Candidate key: Candidate keys are those attributes of a relation, which have the properties of· uniqueness and irreducibility, whose explanation is given below:
Let R be a relation. By definition, the set of all attributes of R has the uniqueness property, meaning that, at any given time, no two tuples in the value of R at that time are duplicates of one another. As in the case of the STUDENT relation, for example, the subset containing just attribute Roll number has that property. These facts constitute the intuition behind the definition of candidate key.
Let K be a set of attributes of relation R. Then K is a candidate key for R .if and only if it possesses both of the following properties:
a) Uniqueness: No legal value of R ever contains two distinct tuples with the same value for K.
b) Irreducibility: No proper subset of K has the uniqueness property.
Note that every relation does have at least one candidate key. The uniqueness property of such keys is self-explanatory. Irreducibility property means tJ1atif a candidate key is a composite key (consists of more than one attribute) then no individual attribute of candidate key, which participate into it, is unique. For example, if the combination of (Name, Class) is unique, then it can be identified as the candidate key if and only if Name and Class individually are not unique. Since a null value is not guaranteed to be unique, no component of a candidate key is allowed to be null. There can be any number of candidate keys in a table.
Super Key: A super key has the uniqueness property but not necessarily the irreducibility property. A candidate key is a special case of a super key.
For example, if Roll_number is unique in relation STUDENT then, the set of attributes
(Roll_number, Name, Class) is a super key for a relation STUDENT, these set of attributes are also unique, but this combination of keys (composite key) is not having the property of irreducibility because Roll_number which is one subset of the composite key is also unique itself. Thus, this composite key is called as super key because it has the property of uniqueness but not the irreducibility.
Consider a relation of Patient in which Patient_number is unique. Then, Patient_number is a candidate key and (Patient number, Patient name) is a super key. Thus, we can say that “A superset of a candidate key is a super key.”
Primary Key: The primary key is an attribute or a set of attributes that uniquely identify a specific instance of an entity. Every entity in the data model must have a primary key whose values uniquely identify instances of the entity. Sometimes a record may contain more than one key field.
For example, the doctor’s patient’s file may contain both a patient number and a National Insurance number for each patient. Both of these are key fields. We therefore choose one of them and call it the primary key field. Primary key cannot contain any Null value because we cannot uniquely identify multiple Null values.
Properties of Primary Key
To qualify as a primary key for an entity, an attribute must have the following properties:
• Stable: The value of a primary key must not change or should not become null Through out the life of an entity. A stable primary key helps to keep the model stable.
For example: if we consider a patient record, the value for the primary key (Patientnumber) must not change with time as would happen with the age field.
• Minimal: The primary key should be composed of the minimum number of fields that ensures the occurrences are unique.
• Definitive: A value must exist for every record at creation time. Because an entity occurrence cannot be substantiated unless the primary key value also exists.
• Accessible: Anyone who wants to create, read or delete a record must be able to see the Primary key value.
Once candidate keys are identified, choose one aw’ only one, primary key for each.
Candidate keys, which are not chosen as the primary key, are known as Alternate Keys.
An example of an entity that could have several possible primary keys is Employee. Let’s assume that for each employee in an organization there are three candidate keys: Employee ID, Social Security Number, and Name. Name is the least desirable candidate. ‘While it might work for a small department where it would be unlikely that two people would have exactly the same name, it would not work for a large organization that had hundreds or thousands of employees. Moreover, there is the possibility that an employee’s name could change because of marriage. Employee ID would be a good candidate as long as each employee was assigned a unique identifier at the time of hire. Social Security would work best since every employee is required to have one before being hired.
The primary key of any table is any candidate key of that table which the database designer arbitrarily designates as “primary”. The primary key may be selected for convenience, comprehension, performance, or any other reasons.
Alternate Key
The alternate keys of any table are simply those candidate keys, which are not currently selected as the primary key.
Exactly one of those candidate keys is chosen as the primary key and the remainders, if any, are then called alternate keys. An alternate key is a function of all candidate keys minus the primary key.
The primary key is unique and often programmers assign a primary key just to obtain uniqueness. If this is the only requirement of the primary key, then there is a danger that at some point someone else may eventually change the primary key (for example, to a system assigned number) and lose the original enforcement of uniqueness on what had been the primary key earlier.
Let look a table below which holds student class enrollment:
This table has a compound primary key
Here, the designer knows that each student-class pair will be unique; i.e., each student may enroll in each class only once. Several years later a new DBA decides that it is inefficient to use two columns for the primary key where one would do. He/She adds a “row id” column and makes it the primary key by loading it with a system counter.
This is fine as far as an identity for each row. But now nothing prevents a student from Enrolling in the same class multiple times. This happened because the data model did not retain a candidate key property on the two original columns when the primary key was changed. Therefore the new DBA had no direct way of knowing (other than text notes somewhere) that these two columns must still remain unique, even though they are no longer part of the primary key.
Notice here how the model could have handled this automatically, if it had captured candidate keys in the first place and then generated alternate keys as a function of those candidates not in the primary key. The original two columns remain unique even after they are no longer primary.
So, do not confuse a candidate key specification of uniqueness with the arbitrary selection of primary key.
Composite Keys
Sometimes it requires more than one attribute to uniquely identify an entity. A primary key that is made up of more than one attribute is known as a composite key.
Table below shows an example of a composite key. Each instance of the entity Work can be uniquely identified only by a composite key composed of Employee ID and Project ill,
Example of Composite Key Artificial Keys
An artificial key is one that has no meaning to the business or organization.
Artificial keys are permitted when
I) no attribute has all the primary key properties, or
2) The primary key is large and complex.
For Example: Enrollment table from business point of view has (student, c1ass) combination as primary key. But this primary key is large and complex, so DBA decides to add row_id column as primary key. Here, row_id can also be called as Artificial Key.
Foreign Keys
Foreign keys are the attributes of a table, which refers to the primary key of some another table. Foreign Keys permit only those values, which appears in the primary key of the table to which it refers or may be null. Foreign keys are used to link together two or more different tables which have some form of relationship with each other. The foreign key is a reference to the tuple of a table from which it was taken, this tuple being called the Referenced or Target tuple. The table containing the referenced tuple will be called as Target table.
Thus, a foreign key is an attribute that completes a relationship by identifying the parent entity. Foreign keys provide a method for maintaining integrity in the data (called referential integrity) and for navigating between different instances of an entity. A foreign key must support every relationship in the model. The matter of integrity of foreign keys is referred to as Referential Integrity.
Foreign keys values should always be matched by corresponding primary key values.
For example:
Consider the below, containing two tables (EMP, DEPT)
If we try to insert information of employee with deptno 50, then this is an invalid information, because there is no deptno 50 exists in the company(as shown in table DEPT).Then this invalid information should be prevented from insertion, which would only be possible if deptno of EMP table refer the deptno of DEPT table. It means that only those values are permitted in deptno of EMP table, which appears in the deptno attribute of the DEPT table. Thus, we can say that deptno of EMP table is the foreign key which refers the primary key deptno of DEPT table. Thus, we can insert the empno 6 with any deptno from 10, 20, 30 and 40.
Null may also be permitted in the deptno of EMP table. Here, deptno of DEPT table
IS Target attribute and DEPT is the target table.
Consider another example:
Here, college has three valid classes with class code 1, 2 and 3. The class code (foreign key) of student table refers class_code of class table.
Relational Integrity Rules
There are two important integrity rules, which are constraints or restrictions that apply to all instances of the database. The two principal rules for the operational model are known as entity integrity and referential integrity. Before we define these terms, it is necessary to understand the concept of nulls.
Null or Unknown Value
Null represents a value for an attribute that is currently unknown or is not applicable for this tuple. A null can be taken to mean the logical value ‘unknown’. It can ~ean that a value is not applicable to a particular tuple, or it could merely mean that. no value has yet been supplied. Nulls are a way to deal with incomplete or exceptional data. However a null is not the same as a zero numeric value or a text string filled with spaces; zeros and spaces are values, but a null represents the absence of a value. Therefore, nulls should be treated differently from other values.
For Example: Consider the following table
There is no entry for Rajesh’s age in the table, obviously not because he does not have one, but simply because it was unknown to the organization at the point of time the ‘snapshot’ was taken. Raja might have just joined the organization and has not yet been assigned a job; therefore there is no value for the attribute ‘job’ which can be assigned to his row in the table. In time, it is to be expected that Rajesh’s age will be determined and Raja will be assigned a job. A snapshot of the database at that time would reveal values in the currently blank places in the table.
Difference between Null and Not Applicable
In order to understand above concept Let us consider the following example:
In this table the attribute ‘date-pension fund’ has been added which indicates the date when an employee joined the company pension fund. There are blank places for this as well as the original blanks under ‘age’ and ‘job’, but there could be different reasons for the blanks in the ‘date-pension fund’ column. The blank against Rajesh might indicate simply that, as for his age, the information about him is as yet incomplete. The reason for the blank against admit on the other hand, might be that he is a part time employee and, as such, ineligible to join the fund. In the first case, the information was simply unknown, in the second case the information was irrelevant to that particular employee. Where blanks occur in a relational table simply because the relevant facts are unknown, they are referred to as nulls. Thus the above table might be more properly represented as:
The problem with this is that the date values in the ‘date-pension fund’ column and the legend ‘not applicable’ in the same column are not syntactically or semantically compatible. This leads to practical difficulties. The problem can be resolved in this particular instance simply by having a ‘nonsense’ date to indicate non-applicability as in:
But not every case is so simple and it is very easy to confuse ‘missing’ with no applicable information. The important thing to note about nulls is that they are not values. It is not possible, for example, to say that one null equals another or is greater than another and so on.
Entity Integrity rule
Entity Integrity rule states that in a base relation, value of attribute of a primary key cannot be null. Here, a base relation is a relation that corresponds to an entity in the conceptual schema. By definition, a primary key is a minimal identifier that is used to identify tuples uniquely. This means that no subset of the primary key is sufficient to provide unique identification to tuples. If we allow a null for any part of a primary key, we are implying that not all the attributes are needed to distinguish between tuples, which contradicts the definition of the primary key. For example, as branch No is the primary key of the Branch relation, we should not be able to insert a tuple into the Branch relation with a null for the branch_No attribute.
In order to understand the concept of primary key and null, consider the following of STUDENT database.
Now record 3 and record 5 are not distinguishable as there Rollno is Null and it is not possible to exactly locate one of the students as he is having common name, class and marks with some other student. Record number 3 and 5 violates the rule of primary key because the Rollno is primary key, all the value of attributes are unique but their primary key entries are NULL and this violates the integrity rule 1, so these records are not allowed in RDBMS.
Referential Integrity
Referential Integrity rule states that, if a foreign key exists in a relation, either the foreign key value must match a candidate key value of some tuple in its home relation or the foreign key value must be wholly null.
Consider the following database to understand the referential integrity rule:
Here, branch_No in the Staff relation is a foreign key targeting the branch_No attribute in the· home relation, Branch. It should not be possible to create a staff record with branch number B025, unless there is already a record for branch number B025 in the Branch relation. However, we should be able to create a new staff record with a null branch number, to cater for the situation where a new member of staff has joined the company but has not yet been assigned to particular branch office.
Enterprise Constraints
Enterprise Constraints are additional rules specified by the users or database administrators of a database. It is also possible for users to specify additional constraints that the data must satisfy. For example, in case of STUDENT database if an upper limit of 300 has been placed upon the marks attribute of STUDENT database, then the user must be able to specify it and expect the DBMS to enforce it. In this case, it should not be possible to add a new student or update a record, whose marks are greater than 300.
Relational Data Manipulation
The manipulative part of the relational model consists of a set of operators known collectively as the relational algebra together with relational calculus. Relational Algebra is a procedural language that can be used to tell the DBMS how to build a new relation from one or more relations in the database and the Relational Calculus is a non-procedural language that can be used to formulate the definition of a relation in terms of one or more database relations. The relation algebra provides a formal query language. Both the relational algebra and the relational calculus are formal, non-user-friendly languages. They have been used as the basis for other, higher-level Data Manipulation Languages (DMLs) for relational databases. SQL is the informal commercial query language. It provides an interface to open, close database, find records in files, navigate through the records, add new records, and change or delete existing records.