The database is the location to store the data. Firstly we used excel sheet to store the data. However, it has some problem with data integrity and redundancy problem. This problem solved by E.F. Codd proposed a model in 1970. This model was straightforward. It used the concept of mathematical relation which looks like a table.
Definition: A relational database is a collection of relations which show the data in the form of row and column in a table. Relational Database Management System (RDBMS) handles the way data stored, maintained and accessed by the user. Relational database model removes all the redundancy problem off the table,and unique key concept helps to access and store unique values in a table. Relational database operations satisfy the ACID ( Atomicity , Consistency , Isolation , Durability ) properties to make the system reliable and secure. To create, alter and access the data from database Structure Query Language (SQL) is used. SQL is easy to use and simple to learn.
We’ll be covering the following topics in this tutorial:
Relational Model Terminology
In relational model data stored in cells of a table having a unique key to identify or access the data without any redundancy and duplicity. In a relational database, a row is called a tuple and a column header is known as an attribute like ”student_name,” ”Class” and the table is called a relation. Domains specify the specific values for an attribute. It specifies that attribute has aninteger value, char value, etc. Degrees of a relation represents the number of the attribute (columns) in a relation. NULL is used to represent not applied data values of the tuple in a table. NULL is a particular value of tuple.For example in Student table entry, if he /she is living with guardian than parents attribute has a NULL value or if he/she is living with parents than guardian attribute has a NULL value. NULL and zero ‘0’ both are different from each other.
Data Integrity Rules
Data Integrity rules assure that data is consistent and unique in a table. It integrates the whole data and makes it complete. To bring consistency and assure the security of data many integrity checks applied on the table. There are four data integrity checks.
Domain Integrity: It specifies that attribute must contain the data item according to defined data type, the range of an attribute during relation creation. It also checks for the Null value according to integrity check.
Entity Integrity: Rules state that primary key should be applied to a column or sets of the column to store unique values in a tuple. Each primary key column contains a value, but not contain a NULL value.
Referential Integrity: Referential integrity concern with the concept foreign key. A foreign key is used to create a connection between two tables. Parent table (Referenced table) has a primary key, and foreign key (referencing table) is used in the second table to create a connection with parent table.
1) We cannot delete any row from the primary table (Referenced table) if the same row is available in referencing table. First, delete from referencing table and then from the parent table.
2) Similarly, A row cannot be added first in referencing table if the entry was not in the reference table.
User-defined Integrity: User-defined Integrity check user defined some specific business does not consider under any other integrity categories triggers, procedures.
Keys: The key is an essential part of the table. Keys set of one attribute or more attributes required to access unique data in the table.
Suppose we have a relation Student having four column R(A, B, C, D). The column has named, but rows are not having. So to access any data value in a row, we have to consider similar attribute naming + rows cell value.
A | B | C | D |
1 | c | u | abc |
2 | c | v | xyz |
3 | b | v | mno |
If I want to access column B is c value than there is a confliction because of replicated values in column B., So we have to use column ‘A’ rows value to access corresponding data values.
Super Key: Set of one and more attributes to identify unique data and access all attributes in a relation.
Candidate key: If a proper subset of the super key is not a super-key than it is called candidate key. All candidate keys can access all tuples uniquely.
R(A,B,C,D) has functional dependency
Super Key | Candidate Key | Primary Key | |
A->BCD | Yes | Yes | Yes |
B->ACD | Yes | Yes | Yes |
AB->CD | Yes | No | No |
ABC->D | Yes | No | No |
In above example, B is a proper subset of AB.B and AB both is super key. So according to the definition of a candidate key, If a proper subset of the super key is itself a super key than it is not a candidate key.
Primary key: Candidate key and primary key both are same. A relation can contain more than one candidate key but can’t contain more than the primary key. DBA’s responsibility to choose a primary key from candidate key. The primary key attribute does not hold a null value.
Normalization
Normalization is a process of eliminating inconsistency and redundancy, eliminating column that is not dependent on the key attribute of obtaining data integrity.Normalization removes insertion, deletion and updating anomaly. Codd proposed it. Normalization categorized in five steps. Each step of normalization brings atomicity after following one by one. Mostly First Normal form, Second Normal form, and Third normal form are enough to use to make relation anomalies free.
To understand normalization first, we discuss what Partial function dependency and fully Functional Dependency is.
Partial Function dependency: A non-prime attribute is dependent on the part of the primary key is known as partial function dependency.
First Normal form: This rule state that every cell in relation contains an atomic value. In other words, a cell does not contain two values.
Student_Name | Student_Class |
Rohit, Suman | 7th |
To make it compatible with the First Normalization, we should keep only one value in the cell.
Student_Name | Student_Class |
Rohit | 7th |
Suman | 7th |
Second Normal form: Second normal rules state that a relation must be in first normal form and all non-prime attribute(an attribute that is not part of candidate key attribute)should be fully dependent on a prime attribute( attribute part of candidate key attribute).In second normal form, the relation should not have any partial function dependency.
For example: A relation R has four attribute R(A,B,C,D) having functional dependencies FD:{AB->D,B->C}.
(AB) +-> ABCD.
AB is essential to find the value of D.
AB is candidate key.
C, D non-prime attribute
D is dependent on primary key. C is dependent on the part of primary key. So, it creates partial function dependency. Now to overcome this partial dependency we have one solution. We will decompose relation R(A,B,C,D) in two relations.R1(A,B,D) and R2(B,C).Now B act as a primary key in R2 and C is fully dependent on B.
Third Normal form: Second normal form allow transitive dependency.To be in Third Normal form, a relation should not have any transitive dependency. Transitive functional dependency is a functional dependency in which a non-prime attribute is dependent on another non-prime attribute. Let us consider a simple example to understand it.
Suppose relation R (A,B,C,D) having functional dependencies FD:{AB->C,C->D}
Here AB is prime attribute
C, D non-prime attribute
We can easily access the unique value of C through AB key. However, the actual problem we are facing is that we cannot access the value of D through attribute C. Because both are a non-prime attribute. The non-prime attribute can hold a NULL value, and through NULL value we cannot fetch a unique data value. To solve this problem, we decompose R(A, B, C, D) into two tables. Relation R1 hold three attribute(A,B,D) and R2 hold two attribute(B,C). Now B convert from non-prime attribute to prime attribute.
BCNF (Boyce-Codd normal form): A relation is in BCNF if it is in third normal form. During First to third form normalization process, we remove partial and transitive functional dependencies. However, still, we deal with non-trivial functional dependency. In non-trivial function dependency prime attribute dependent on the prime attribute. To be in BCNF form, X->Y where X should be super key and Y should be candidate key.
Structure Query Language (SQL ): SQL is a language used to access the data in relational database. SQL is a complete set of commands for creating a Relational database, selecting data, modify data in the database and provide permission to users. SQL language divided into different part according to use.
DDL (Data Definition Language): Language used to create and manipulate data structure. Create, Alter, Drop, Truncate are commands of DDL.
DML (Data Manipulation Language): Used to access or manipulate the data in relation. Insert, Select, Update, and Delete all these are DML commands.
DCL (Data Control Language) : DCL used to control the access to the database. This command is used to provide and revoke the access permission from a user. Grant and revoke command is used to give and deny the permission on the database.
TCL (Transaction Control Language): This language is used to manage the changes by DML.We can rollback data after identifying a save point. Commands are committed, Save point, Rollback, and Set Transaction.
Some commands are given below :
Create: Create command is used to create a relation in the database.
Create table table_name (column1_name (data_type), column2_name(data_type));
Drop: Drop command is used to delete the structure of the table from database completely. We can also drop single or multiple columns from relation using alter command. After dropping a table, we have to recreate it.
Truncate: Truncate command is used to delete the entire data entry from the table. It does not delete the structure.
Select: Select command is used to fetch the data from the relation.We can fetch entire table as well as some rows. We can also fetch the particular row and column data.
Insert: Insert command used for input in relation. We can use this command differently.
Insert into Table_name(column1_name(data_type),column2_name(data_type)) values (column1_datavalues,column2_datavalues); Or
Insert into Table_name values (column1_datavalues,column2_datavalues);
Update: Some time we need to modify some data values in relation .For this purpose we used Update command. We can modify single or multiple records in a table. ’Where’ is used with update command to find the particular data in table.