DBMS A database management system is the software system that allows users to define, create and maintain a database and provides controlled access to the data.
A Database Management System (DBMS) is basically a collection of programs that enables users to store, modify, and extract information from a database as per the requirements. DBMS is an intermediate layer between programs and the data. Programs access the DBMS, which then accesses the data. There are different types of DBMS ranging from small systems that run on personal computers to huge systems that run on mainframes. The following are main examples of database applications:
• Computerized library systems
• Automated teller machines
• Flight reservation systems
• Computerized parts inventory systems
A database management system is a piece of software that provides services for accessing a database, while maintaining all the required features of the data. Commercially available Database management systems in the market are dbase, FoxPro, IMS and Oracle, MySQL, SQL Servers and DB2 etc.
These systems allow users to create update, and extract information from their databases.
Compared to a manual filing system, the biggest advantages to a computerized database system are speed, accuracy, and’ accessibility.
We’ll be covering the following topics in this tutorial:
Who makes this database software?
There are a lot of database software manufacturers out there and a wide range of prices, sizes, speeds and functionalities. At the lower end of the scale are personal database software products like Microsoft Access, which is designed to be used by individuals or small companies relatively little data. User friendliness and ease of use are the priority rather than speed and scalability (in other words, it works well when you have 100 records but not when you have 100,000). At the higher end are full-fledged enterprise solutions, such as Oracle Enterprise Edition. These database software products can handle millions of data entries and are fast and efficient. They have ·many optimization and performance tools and generally require a Database Administrator (DBA) to look after them. Products in this range can also be very expensive.
In the middle are products like Microsoft SQL Server, which is a logical upgrade from Microsoft Access for Windows users. There are also several very good free database software products, such as MySQL and PostgreSQL. These are lacking on the user interface side, but can certainly compete on speed and scalability.
Developments and Evolution of DBMS Concept
We have already seen that the predecessor to the DBMS was the file-based system. However, there was never a time when the database approach began and the file-based system ceased. In fact, the file-based system still exists in specific areas. It has been suggested that the DBMS has its roots in the 1960s Apollo moon-landing project, which was initiated in response to USA’s President Kennedy’s objective of landing a man on the moon by the end of that decade. At that time there was no system available that would be able to handle and manage the vast amounts of information that the project would generate. As a result, North American Aviation (NAA, now Rockwell International), the prime contractor for the project, developed software known as GUAM (Generalized Update Access Method). GUAM was based on the concept that smaller components come together as parts of larger components, and so on, until the final product is assembled. This structure, which confirms to an upside down tree, is also known as a hierarchical structure.
In the mid 1960s, IBM joined NAA to develop GUAM into what is now known as IMS (Information Management System). The reason why IBM restricted IMS to the management of hierarchies of records was to allow the use of serial storage devices, most notably magnetic tape, which was a market requirement at that time. This restriction was subsequently dropped. Although one of the earliest commercial DBMS, IMS is still main hierarchical DBMS used by most large mainframe installations.
In the mid-1960s, another significant development was the emergency of IDS (Integrated Data Store) from General Electric. This work was headed by one of the early pioneers of database systems, Charles Bachmann. This development led to a new type of database system known as the network DBMS, which had a profound effect on the information systems of that generation. The network database was developed partly to address the need to represent more complex data relationships that could be modeled with hierarchical structures, and partly to impose a database standard. To help establish such standards, the Conference on Data Systems Languages (CODASYL), comprising representatives of the US government and the world of business and commerce formed a List Processing Task Force in 1965, subsequently renamed the Data Base Task Group (DBTG) in 1967. The terms of reference for the DBTG were to define standard specifications for an environment that would allow database creation and data manipulation. A draft report was issued in 1969 and the first definitive report in 1971.
Although, the report, was not formally adopted by the American National Standards Institute (ANSI), a number of systems were subsequently developed following the DBTG proposal. These systems are now known as CODASYL or DBTG systems. The CODASYL and hierarchical approaches represented the first-generation of DBMSs.
In 1970 E.F. Codd of the IBM Research Laboratory produced his highly influential paper on the relational data model. This paper was very timely and addressed the disadvantages of the former approaches. Many experimental relational DBMS’s were implemented there after, with the first commercial products appearing in the late 1970s and early 1980s. Of particular note is the System R project at IBM’s San Jose Research Laboratory in California, which was developed during the late 1970s.This project was designed to prove the practicality of the relational model by providing an implementation of its data structures and operations, and led to two major developments:
• The development ‘of a structure query language called SQL, which has since become the standard language for relational DBMS’s.
• The production of various commercial relational DBMS products during the 1980s, for example DB2 and SQL/DS from IBM and Oracle Corporation.
Now there are several hundred relational DBMSs for both mainframe and PC environments, though many are stretching the definition of the relational model. Other examples of multi-user relational DBMSs are INGRES-II from Computer Associates, and Informix Software, Inc. Examples of PC-based relational l)BMSs are Access and FoxPro from Microsoft, Paradox from Corel Corporation, InterBase and BDE from Borland, and R:Base from R:Base Technologies. Relational DBl\1Ss are referred to as second generation DBMSs
The relational model is not without its failures, and in particular its limited modeling capabilities. There has been much research since then attempting to address this problem. In 1976, Chen presented the Entity-Relationship model, which are now a widely accepted technique for database design and the basis for the methodology.
In 1979, Codd himself attempted to address some of the failures in-his original work with an extended version of the relational model called RM/T (1979) and subsequently RM/V2 (1990).The attempts to provide a data model that represents the ‘real world’ more closely have been loosely classified as semantic data modeling.
In response to the increasing complexity of database applications, two new systems have emerged: the Object Oriented DBMS (OODBMS) and the Object-Relational DBMS (ORDBMS). This evolution represents third-generation DBMSs.
Components of the Database System Environment
There are five major components in the database system environment and their interrelationship are.
• Hardware
• Software
• Data
• Users
• Procedures
1.Hardware: The hardware is the actual computer system used for keeping and accessing the database. Conventional DBMS hardware consists of secondary storage devices, usually hard disks, on which the database physically resides, together with the associated Input-Output devices, device controllers and· so forth. Databases run on a’ range of machines, from Microcomputers to large mainframes. Other hardware issues for a DBMS includes database machines, which is hardware designed specifically to support a database system.
2. Software: The software is the actual DBMS. Between the physical database itself (i.e. the data as actually stored) and the users of the system is a layer of software, usually called the Database Management System or DBMS. All requests from users for access to the database are handled by the DBMS. One general function provided by the DBMS is thus the shielding of database users from complex hardware-level detail.
The DBMS allows the users to communicate with the database. In a sense, it is the mediator between the database and the users. The DBMS controls the access and helps to maintain the consistency of the data. Utilities are usually included as part of the DBMS. Some of the most common utilities are report writers and application development.
3. Data : It is the most important component of DBMS environment from the end users point of view. As shown in observes that data acts as a bridge between the machine components and the user components. The database contains the operational data and the meta-data, the ‘data about data’.
The database should contain all the data needed by the organization. One of the major features of databases is that the actual data are separated from the programs that use the data. A database should always be designed, built and populated for a particular audience and for a specific purpose.
4. Users : There are a number of users who can access or retrieve data on demand using the applications and interfaces provided by the DBMS. Each type of user needs different software capabilities. The users of a database system can be classified in the following groups, depending on their degrees of expertise or the mode of their interactions with the DBMS. The users can be:
• Naive Users
• Online Users
• Application Programmers
• Sophisticated Users
• Data Base Administrator (DBA)
Naive Users: Naive Users are those users who need not be aware of the presence of the database system or any other system supporting their usage. Naive users are end users of the database who work through a menu driven application program, where the type and range of response is always indicated to the user.
A user of an Automatic Teller Machine (ATM) falls in this category. The user is instructed through each step of a transaction. He or she then responds by pressing a coded key or entering a numeric value. The operations that can be performed by valve users are very limited and affect only a precise portion of the database. For example, in the case of the user of the Automatic Teller Machine, user’s action affects only one or more of his/her own accounts.
Online Users : Online users are those who may communicate with the database directly via an online terminal or indirectly via a user interface and application program. These users are aware of the presence of the database system and may have acquired a certain amount of expertise with in the limited interaction permitted with a database.
Sophisticated Users : Such users interact with the system without ,writing programs.
Instead, they form their requests in database query language. Each such query is submitted to a very processor whose function is to breakdown DML statement into instructions that the storage manager understands.
Specialized Users : Such users are those ,who write specialized database application that do not fit into the fractional data-processing framework. For example: Computer-aided design systems, knowledge base and expert system, systems that store data with complex data types (for example, graphics data and audio data).
Application Programmers : Professional programmers are those who are responsible for developing application programs or user interface. The application programs could be written using general purpose programming language or the commands available to manipulate a database.
Database Administrator: The database administrator (DBA) is the person or group in charge for implementing the database system ,within an organization. The “DBA has all the system privileges allowed by the DBMS and can assign (grant) and remove (revoke) levels of access (privileges) to and from other users. DBA is also responsible for the evaluation, selection and implementation of DBMS package.
5. Procedures: Procedures refer to the instructions and rules that govern the design and use of the database. The users of the system and the staff that manage the database require documented procedures on how to use or run the system.
These may consist of instructions on how to:
• Log on to the DBMS.
• Use a particular DBMS facility or application program.
• Start and stop the DBMS.
• Make backup copies of the database.
• Handle hardware or software failures.
Change the structure of a table, reorganize the database across multiple disks, improve performance, or archive data to secondary storage.
Advantages of DBMS
The database management system has promising potential advantages, which are explained below:
1. Controlling Redundancy: In file system, each application has its own private files, which cannot be shared between multiple applications. 1:his can often lead to considerable redundancy in the stored data, which results in wastage of storage space. By having centralized database most of this can be avoided. It is not possible that all redundancy should be eliminated. Sometimes there are sound business and technical reasons for· maintaining multiple copies of the same data. In a database system, however this redundancy can be controlled.
For example: In case of college database, there may be the number of applications like General Office, Library, Account Office, Hostel etc. Each of these applications may maintain the following information into own private file applications:
It is clear from the above file systems, that there is some common data of the student which has to be mentioned in each application, like Rollno, Name, Class, Phone_No~ Address etc. This will cause the problem of redundancy which results in wastage of storage space and difficult to maintain, but in case of centralized database, data can be shared by number of applications and the whole college can maintain its computerized data with the following database:
It is clear in the above database that Rollno, Name, Class, Father_Name, Address,
Phone_No, Date_of_birth which are stored repeatedly in file system in each application, need not be stored repeatedly in case of database, because every other application can access this information by joining of relations on the basis of common column i.e. Rollno. Suppose any user of Library system need the Name, Address of any particular student and by joining of Library and General Office relations on the basis of column Rollno he/she can easily retrieve this information.
Thus, we can say that centralized system of DBMS reduces the redundancy of data to great extent but cannot eliminate the redundancy because RollNo is still repeated in all the relations.
2. Integrity can be enforced: Integrity of data means that data in database is always accurate, such that incorrect information cannot be stored in database. In order to maintain the integrity of data, some integrity constraints are enforced on the database. A DBMS should provide capabilities for defining and enforcing the constraints.
For Example: Let us consider the case of college database and suppose that college having only BTech, MTech, MSc, BCA, BBA and BCOM classes. But if a \.,ser enters the class MCA, then this incorrect information must not be stored in database and must be prompted that this is an invalid data entry. In order to enforce this, the integrity constraint must be applied to the class attribute of the student entity. But, in case of file system tins constraint must be enforced on all the application separately (because all applications have a class field).
In case of DBMS, this integrity constraint is applied only once on the class field of the
General Office (because class field appears only once in the whole database), and all other applications will get the class information about the student from the General Office table so the integrity constraint is applied to the whole database. So, we can conclude that integrity constraint can be easily enforced in centralized DBMS system as compared to file system.
3. Inconsistency can be avoided : When the same data is duplicated and changes are made at one site, which is not propagated to the other site, it gives rise to inconsistency and the two entries regarding the same data will not agree. At such times the data is said to be inconsistent. So, if the redundancy is removed chances of having inconsistent data is also removed.
Let us again, consider the college system and suppose that in case of General_Office file
it is indicated that Roll_Number 5 lives in Amritsar but in library file it is indicated that
Roll_Number 5 lives in Jalandhar. Then, this is a state at which tIle two entries of the same object do not agree with each other (that is one is updated and other is not). At such time the database is said to be inconsistent.
An inconsistent database is capable of supplying incorrect or conflicting information. So there should be no inconsistency in database. It can be clearly shown that inconsistency can be avoided in centralized system very well as compared to file system ..
Let us consider again, the example of college system and suppose that RollNo 5 is .shifted from Amritsar to Jalandhar, then address information of Roll Number 5 must be updated, whenever Roll number and address occurs in the system. In case of file system, the information must be updated separately in each application, but if we make updation only at three places and forget to make updation at fourth application, then the whole system show the inconsistent results about Roll Number 5.
In case of DBMS, Roll number and address occurs together only single time in General_Office table. So, it needs single updation and then an other application retrieve the address information from General_Office which is updated so, all application will get the current and latest information by providing single update operation and this single update operation is propagated to the whole database or all other application automatically, this property is called as Propagation of Update.
We can say the redundancy of data greatly affect the consistency of data. If redundancy is less, it is easy to implement consistency of data. Thus, DBMS system can avoid inconsistency to great extent.
4. Data can be shared: As explained earlier, the data about Name, Class, Father __name etc. of General_Office is shared by multiple applications in centralized DBMS as compared to file system so now applications can be developed to operate against the same stored data. The applications may be developed without having to create any new stored files.
5. Standards can be enforced : Since DBMS is a central system, so standard can be enforced easily may be at Company level, Department level, National level or International level. The standardized data is very helpful during migration or interchanging of data. The file system is an independent system so standard cannot be easily enforced on multiple independent applications.
6. Restricting unauthorized access: When multiple users share a database, it is likely that some users will not be authorized to access all information in the database. For example, account office data is often considered confidential, and hence only authorized persons are allowed to access such data. In addition, some users may be permitted only to retrieve data, whereas other are allowed both to retrieve and to update. Hence, the type of access operation retrieval or update must also be controlled. Typically, users or user groups are given account numbers protected by passwords, which they can use to gain access to the database. A DBMS should provide a security and authorization subsystem, which the DBA uses to create accounts and to specify account restrictions. The DBMS should then enforce these restrictions automatically.
7. Solving Enterprise Requirement than Individual Requirement: Since many types of users with varying level of technical knowledge use a database, a DBMS should provide a variety of user interface. The overall requirements of the enterprise are more important than the individual user requirements. So, the DBA can structure the database system to provide an overall service that is “best for the enterprise”.
For example: A representation can be chosen for the data in storage that gives fast access for the most important application at the cost of poor performance in some other application. But, the file system favors the individual requirements than the enterprise requirements
8. Providing Backup and Recovery: A DBMS must provide facilities for recovering from hardware or software failures. The backup and recovery subsystem of the DBMS is responsible for recovery. For example, if the computer system fails in the middle of a complex update program, the recovery subsystem is responsible for making sure that the .database is restored to the state it was in before the program started executing.
9. Cost of developing and maintaining system is lower: It is much easier to respond to unanticipated requests when data is centralized in a database than when it is stored in a conventional file system. Although the initial cost of setting up of a database can be large, but the cost of developing and maintaining application programs to be far lower than for similar service using conventional systems. The productivity of programmers can be higher in using non-procedural languages that have been developed with DBMS than using procedural languages.
10. Data Model can be developed : The centralized system is able to represent the complex data and interfile relationships, which results better data modeling properties. The data madding properties of relational model is based on Entity and their Relationship, which is discussed in detail in chapter 4 of the book.
11. Concurrency Control : DBMS systems provide mechanisms to provide concurrent access of data to multiple users.
Disadvantages of DBMS
The disadvantages of the database approach are summarized as follows:
1. Complexity : The provision of the functionality that is expected of a good DBMS makes the DBMS an extremely complex piece of software. Database designers, developers, database administrators and end-users must understand this functionality to take full advantage of it. Failure to understand the system can lead to bad design decisions, which can have serious consequences for an organization.
2. Size : The complexity and breadth of functionality makes the DBMS an extremely large piece of software, occupying many megabytes of disk space and requiring substantial amounts of memory to run efficiently.
3. Performance: Typically, a File Based system is written for a specific application, such as invoicing. As result, performance is generally very good. However, the DBMS is written to be more general, to cater for many applications rather than just one. The effect is that some applications may not run as fast as they used to.
4. Higher impact of a failure: The centralization of resources increases the vulnerability of the system. Since all users and applications rely on the ~vailabi1ity of the DBMS, the failure of any component can bring operations to a halt.
5. Cost of DBMS: The cost of DBMS varies significantly, depending on the environment and functionality provided. There is also the recurrent annual maintenance cost.
6. Additional Hardware costs: The disk storage requirements for the DBMS and the database may necessitate the purchase of additional storage space. Furthermore, to achieve the required performance it may be necessary to purchase a larger machine, perhaps even a machine dedicated to running the DBMS. The procurement of additional hardware results in further expenditure.
7. Cost of Conversion: In some situations, the cost oftlle DBMS and extra hardware may be insignificant compared with the cost of converting existing applications to run on the new DBMS and hardware. This cost also includes the cost of training staff to use these new systems and possibly the employment of specialist staff to help with conversion and running of the system. This cost is one of the main reasons why some organizations feel tied to their current systems and cannot switch to modern database technology.
When not to Use a DBMS
In spite of the advantages of using a DBMS, there are a few situations in which such a system may involve unnecessary overhead costs, as that would not be incurred in traditional file processing.
The overhead costs of using a DBMS are due to the following:
• High initial investment in hardware, software, and training.
• Generality that a DBMS provides for defining and processing data.
• Overhead for providing security, concurrency control, recovery, and integrity functions.
Additional problems may arise, if the database designers and DBA do not properly design the database or if the database systems applications are not implemented properly.
Hence, it may be more desirable to use regular files under the following circumstances:
• The database and applications are simple, well defined and not expected to change.
• There are tight real-time requirements for some programs that may not be met because of DBMS overhead.
• Multiple user access to data is not required.
• An application may need to manipulate the data in a way not supported by the query language.
Requirement for a DBMS
The software responsible for the management data in computers i.e. DBMS (like Oracle, Foxpro, SQL Server etc.) should meet the following requirements:
Provide data definition facilities
It should support Data Definition Language (DDL) and provides user accessible catalog Known as Data Dictionary.
Provide facilities for storing, retrieving and updating data
It should support Data Manipulation Language (DML), so that required data can be inserted, updated, deleted and retrieved.
Supports multiple view of data
The end user should have the facility of flexible query language so that required information can be accessed easily.
Provides facilities for specifying Integrity constraints
It should support the constraints like Primary key, foreign key during creation of tables so that only the valid information is stored in the database. As soon as, we try to insert any incorrect information it should display the error message.
Provide security of data
It should have the facilities for controlling access to data and prevent unauthorized access and update.
Provide concurrency control mechanism
It should allow simultaneous access and update of data by multiple users
Support Transactions
It should support all the properties of transaction known as ACID properties. It means a sequence of operations to be performed as a whole. In other words all operations are performed or none.
Provide facilities for database recovery
It should bring database back to consistent state after a failure such as disk failure, faulty program etc.
Provide facilities for database maintenance
It should support maintenance operations like unload, reload, mass insertion, deletion and validation of data.
Master and transaction file
A master file stores relatively static data. It changes occasionally and stores all the details of the object. For example, in case of banking software the customer file which contain the data about the customer like customer_id, account_no, account_type, name, address, phone_number etc. is a master file, because it contain the static data and whole information about the customer.
The other file, which contains the data about the customer transactions, is called as a Transaction file. The customer transaction file contains the data about the account_no,\ transaction_)d, date, transaction_type (e.g. deposit or withdrawal), amount, balance etc. It is dynamic file and updated each time for any withdrawal and deposit on a given account number.