File processing systems was an early attempt to computerize the manual filing system that we are all familiar with. A file system is a method for storing and organizing computer files and the data they contain to make it easy to find and access them. File systems may use a storage device such as a hard disk or CD-ROM and involve maintaining the physical location of the files.
In our own home, we probably have some sort of filing system, which contains receipts, guarantees, invoices, bank statements, and such like. When we need to look something up, we go to the filing system and search through the system starting from the first entry until we find what we want. Alternatively, we may have an indexing system that helps to locate what we want more quickly. For example we may have divisions in the filing system or separate folders for different types of item that are in some way logically related.
The manual filing system works well when the number of items to be stored is small. It even works quite adequately when there are large numbers of items and we have only to store and retrieve them. However, the manual filing system breaks down when we have to cross-reference or process the information in the files. For example, a typical real estate agent’s office might have a separate file for each property for sale or rent, each potential buyer and renter, and each member of staff.
Clearly the manual system is inadequate for this’ type of work. The file based system was developed in response to the needs of industry for more efficient data access. In early processing systems, an organization’s information was stored as groups of records in separate files.
In the traditional approach, we used to store information in flat files which are maintained by the file system under the operating system’s control. Here, flat files are files containing records having no structured relationship among them. The file handling which we learn under C/C ++ is the example of file processing system. The Application programs written in C/C ++ like programming languages go through the file system to access these flat. files as shown.
We’ll be covering the following topics in this tutorial:
Characteristics of File Processing System
Here is the list of some important characteristics of file processing system:
• It is a group of files storing data of an organization.
• Each file is independent from one another.
• Each file is called a flat file.
• Each file contained and processed information for one specific function, such as accounting or inventory.
• Files are designed by using programs written in programming languages such as COBOL, C, C++.
• The physical implementation and access procedures are written into database application; therefore, physical changes resulted in intensive rework on the part of the programmer.
• As systems became more complex, file processing systems offered little flexibility, presented many limitations, and were difficult to maintain.
Limitations of the File Processing System I File-Based Approach
There are following problems associated with the File Based Approach:
1. Separated and Isolated Data: To make a decision, a user might need data from two separate files. First, the files were evaluated by analysts and programmers to determine the specific data required from each file and the relationships between the data and then applications could be written in a programming language to process and extract the needed data. Imagine the work involved if data from several files was needed.
2. Duplication of data: Often the same information is stored in more than one file. Uncontrolled duplication of data is not required for several reasons, such as:
• Duplication is wasteful. It costs time and money to enter the data more than once
• It takes up additional storage space, again with associated costs.
• Duplication can lead to loss of data integrity; in other words the data is no longer consistent. For example, consider the duplication of data between the Payroll and Personnel departments. If a member of staff moves to new house and the change of address is communicated only to Personnel and not to Payroll, the person’s pay slip will be sent to the wrong address. A more serious problem occurs if an employee is promoted with an associated increase in salary. Again, the change is notified to Personnel but the change does not filter through to Payroll. Now, the employee is receiving the wrong salary. When this error is detected, it will take time and effort to resolve. Both these examples, illustrate inconsistencies that may result from the duplication of data. As there is no automatic way for Personnel to update the data in the Payroll files, it is difficult to foresee such inconsistencies arising. Even if Payroll is notified of the changes, it is possible that the data will be entered incorrectly.
3. Data Dependence: In file processing systems, files and records were described by specific physical formats that were coded into the application program by programmers. If the format of a certain record was changed, the code in each file containing that format must be updated. Furthermore, instructions for data storage and access were written into the application’s code. Therefore, .changes in storage structure or access methods could greatly affect the processing or results of an application.
In other words, in file based approach application programs are data dependent. It means that, with the change in the physical representation (how the data is physically represented in disk) or access technique (how it is physically accessed) of data, application programs are also affected and needs modification. In other words application programs are dependent on the how the data is physically stored and accessed.
If for example, if the physical format of the master/transaction file is changed, by making he modification in the delimiter of the field or record, it necessitates that the application programs which depend on it must be modified.
Let us consider a student file, where information of students is stored in text file and each field is separated by blank space as shown below:
I Rahat 35 Thapar
Now, if the delimiter of the field changes from blank space to semicolon as shown below:
1; Rahat; 35; Thapar
Then, the application programs using this file must be modified, because now it will token the field on semicolon; but earlier it was blank space.
4. Difficulty in representing data from the user’s view: To create useful applications for the user, often data from various files must be combined. In file processing it was difficult to determine relationships between isolated data in order to meet user requirements.
5. Data Inflexibility: Program-data inter dependency and data isolation, limited the flexibility of file processing systems in providing users with ad-hoc information requests
6. Incompatible file formats: As the structure of files is embedded in the application programs, the structures are dependent on the application programming language. For example, the structure of a file generated by a COBOL program may be different from the structure of a file generated by a ‘C’ program. The direct incompatibility of such files makes them difficult to process jointly.
7. Data Security. The security of data is low in file based system because, the data is maintained in the flat file(s) is easily accessible. For Example: Consider the Banking System. The Customer Transaction file has details about the total available balance of all customers. A Customer wants information about his account balance. In a file system it is difficult to give the Customer access to only his data in the· file. Thus enforcing security constraints for the entire file or for certain data items are difficult.
8. Transactional Problems. The File based system approach does not satisfy transaction properties like Atomicity, Consistency, Isolation and Durability properties commonly known as ACID properties.
For example: Suppose, in a banking system, a transaction that transfers Rs. 1000 from account A to account B with initial values’ of A and B being Rs. 5000 and Rs. 10000 respectively. If a system crash occurred after the withdrawal of Rs. 1000 from account A, but before depositing of amount in account B, it will result an inconsistent state of the system. It means that the transactions should not execute partially but wholly. This concept is known as Atomicity of a transaction (either 0% or 100% of transaction). It is difficult to achieve this property in a file based system.
9. Concurrency problems. When multiple users access the same piece of data at same interval of time then it is called as concurrency of the system. When two or more users read the data simultaneously there is ll( problem, but when they like to update a file simultaneously, it may result in a problem.
For example: =
Let us consider a scenario where in transaction T 1 a user transfers an amout1t 1000 from
Account A to B (initial value of A is 5000 and B is 8000). In mean while, another transaction T2, tries to display the sum of account A and B is also executed. If both the transaction runs in parallel it may results inconsistency as shown below:
The above schedule results inconsistency of database and it shows Rs.12,000 as sum of accounts A and B instead of Rs .13,000. The problem occurs because second concurrently running transaction T2, reads A and B at intermediate point and computes its sum, which results inconsistent value.
10. Poor data modeling of real world. The file based system is not able to represent the complex data and interfile relationships, which results poor data modeling properties.
Case Study
In case of college database, there may be the number of applications like General Office, Library, Account Office, Hostel etc. Each of these applications may maintain the following information into own private file applications in case of file management system:
It is clear from the above file systems, that there are some common data items of the student which has to be mentioned in each application, like Roll no, Name, Class, Father_Name, Address, Phone_No, Date_of_birth etc. which are stored repeatedly in file system in each application. It will cause the problem of redundancy which results in wastage of storage space and difficulty in maintenance. So, we should look for a solution to these types of redundancies. The database approach discussed below will help us to provide the solution for above problems.