A relation is said to be in First Normal Form (lNF) if and only if every entry of the relation (the intersection of a tuple and a column) has at most a single value. In other words “a relation is in First Normal Form if and only if all underlying domains contain atomic values or single value only.”
The objective of normalizing a table is to remove its repeating groups and ensure that all entries of the resulting table have at most a single value. By simply removing the repeating groups of the unorganized tables do not become relations automatically. Some further manipulations of the resulting table(s) may be necessary to ensure that they are indeed relations. Sometimes, during the process of designing a database it may be necessary to transform it into a relation, i.e. the intersection of a row and a column must have only one value.
For example, consider the STUDENT table shown next where one or more students may be assigned a common course. Notice that for each Course Name every “row” of the table has more than one value under the columns RoIIno, Name, System Used, Hourly Rate, and Total_ Hrs.
Table entries that have more than one value called multi-value entries. Tables with multi-value entries are called unnormalized tables.
Within an unnormalized table, we will call a repeating group an attribute or group of attributes that may have multivalue entries for single occurrences of the table identifier. The term refers to the attribute that allows us to distinguish the different rows of the unnormalized table. Using this terminology we can describe the STUDENT table shown above as an unnormalized table where attributes Rollno, Name, System Used, Hourly Used, and Total_ Hrs are repeating groups. This type of table cannot be considered a relation because there are entries with more than one value. To be able to represent this table as a relation and to implement it in a RDBMS it is necessary to normalize the table. In other words we need to put the table in first normal form.
In general, there are two basic approaches to normalize tables.
We’ll be covering the following topics in this tutorial:
First Approach: Flattening the table
The first approach known as “flattening the table” removes repeating groups by filling in the “missing” entries of each “incomplete row” of the table with copies of their corresponding non-repeating attributes. The following example illustrates .this.
In the STUDENT table, for each individual Course, under the RolIno Name, System Used, Hourly_Used, and Total_Hrs attributes, there is more than one value per entry. To normalize this table, we just fill in the remaining entries by copying the corresponding information from the non-repeating attributes. For instance, for the row that contains the course Visual Basic, we fill in the remaining “blank” entries by copying the values of the Course_Code, Course_Name and Teacher_Name columns. This row has now a single value in each of its entries. We have repeated a similar process for the students· of the remaining two courses. The normalized representation of the STUDENT table is:
In normalized STUDENT table the attribute Course_Code no longer identifies uniquely any row. Thus, a suitable primary key for this table is the composite key (Course_Code, Rollno).
Second Approach: Decomposition of the table
The second approach for normalizing a table requires that the table be decomposed into two new tables that will replace the original table. Decomposition of a relation involves separating the attributes of the relation to create the schemes· )f two newrelations. However, before decomposing the original table it is necessary to identify an attribute or a set of its attributes that can be used as table identifiers.
Rule of decomposition
• One of the two tables contains the table identifier of the original table and all the non-repeating attributes.
• The other table contains a copy of the table identifier and all the repeating attributes.
To transform these tables in to relations, it may be necessary to identify a PK for each table. The Tuples of the new relations are the projection of the original relation into their respective schemes. The following example illustrates this second approach for normalizing tables.
To normalize the STUDENT table we need to replace it by two new tables. The first table COURSE contains the table identifier and the non-repeating groups. These attributes are Course_Code (the table identifier), Course_Name, anc Teacher_Name.
The second table contains the table .identifier and· all the repeating groups. Therefore, the attributes of COURSE_STUDENT table are Course_Code, RolIno, Name, System Used,
Hourly Rate and Total_Hrs.
To transform the latter table into a relation, it is necessary to assign it a PK. These two new INF relations are shown above. Notice that for the COURSE_STUDENT table the composite attribute (Course_Code, RoIIno) is an appropriate PK. At this point the reader may ask “which of these two approaches is better to use. Actually both approaches are correct because they transform any unnormalized table into an INF relation. However the second approach is more efficient because the relations produced are less redundant. In addition as we will see in the next section, the single table obtained using the first approach win eventually broken into the same two tables obtained in the second approach.
Anomalies in 1NF Relations (Considering STUDENT table)
Redundancies in INF relations lead to a variety of data anomalies. Data anomalies are divided into three general categories: insertion, deletion and update anomalies.
They are named respectively after the relational operations of Insert, Delete, and Update\ because it is during the application of these operations that a relation may experience anomalies.
Insert Anomalies
We cannot insert the information about the student until he/she join any course. e.g. as
Shown in the above database we cannot store the information about the Rollno 110 until he join any course, similarly we are unable to store the information about the course until there is a student who enroll in to that course. e.g. We cannot store that C I course is of Visual Basic until at least one student join that course.
These anomalies occur because Course_Code, RolIno is the composite primary key and we cannot insert null in any of these two attributes for a record. So, in order to store a record we must know the Course_ Code and the RolIno of student who join the course
Update Anomalies
This relation is also susceptible to update anomalies because the course in which a student studies may appear many times in the table. It is this redundancy of information that causes the anomaly because if a teacher moves to another course, we are now faced with two problems; we both search the entire table looking for that teacher and update his or her Course_Code value or we miss one or more tuple of that STUDENT and end up with an inconsistent database. For small tables, this type of anomaly may not seem to be much of a problem, but it is easy to imagine situations where there may be thousands of Tuples that experience similar anomaly.
Let us consider, a situation in which we have to change the teacher for a particular course e.g. we have to update the teacher for Course_Code Cl then, we have to modify multiple records which is equal to the number of the students for that particular Course_Code e.g. Cl. This will cause the problem of inconsistency. Suppose we change the name of teacher three times and forget to change the name at one place then data become inconsistent.
Delete Anomalies
This relation experiences deletion anomalies whenever we delete the last tuple of a particular student. In this case, we not only delete the course information that connects that student to a particular course, but also lose other information about the system on which this student works.
Let us consider, the case where we have to delete the information of student having RolIno 109, then we also lose the information about Course_Code C4 i.e. now we are unable to tell that the name of course as well as the corresponding teacher for that course. Suppose, we have to delete the information of Java course we also lose the information about the student Kumar.
Practice Session
1. In the EMPLOYEE table shown below, identify the table identifier, all repeating and non repeating attributes. Flatten the table and state if the resulting table is a relation. If not, how can you make it a relation?
The table identifier is the attribute ID. The no repeating attributes are: ID, Last-name and Department. The repeating attributes are: Dependent Name, Dependent_DOB, Dependent- Sex and Dependent-ID.
To flatten the table ,we have to fill in all the entries of the table by copying the information of the corresponding non repeating attributes. The normalized table looks like this:
This ‘flat’ table ·is not a relation because it does not have a primary key. Notice, for instance, that ID (the table identifier) no longer identifies any of the first three rows of this table. To transform this table into a relation we need to identify a suitable primary key for the relation. The composite key (ID, Dependent-ID) seems to be a suitable primary key.
2. Normalize the table of the previous example by creating two new relations. The table is reproduced below for the convenience of the reader.
To normalize this table we need to create two new relations. The attributes of the first relation are the table identifier and all the non-repeating attributes. The attributes of the second table are the table identifier and all the repeating attributes. The schemes of these two relations are shown below. Observe that the attribute ID (of Employee) has been renamed in the Dependent relation.
Employee (ID, Last-name, Department)
Dependent (Emp-ID, Dependent-ID, Dependent Name, Delendent_DOB, Dependent_Sex)
The corresponding instances of these two relations are shown next. Notice that duplicate rows have been deleted to comply with the definition of a relation. PKs ale underlined.