Today, multinational companies and large organizations have operations in many places within their country of origin and in other parts of the world. Each place of operation may generate a large volume of data. For example, insurance companies may have data from thousands of local and external branches, large retail chains may have data from hundreds or thousands of stores, and so on. Corporate decision-makers require access to information from all such sources.
Therefore, the success of a business depends on the effective use of the collective knowledge of the organization. This is not simple, however, because such a huge volume of data is hard to understand and use, as illustrated in the figure.
Data warehousing systems have emerged as one of the principal technological approaches to the development of newer, leaner, meaner and more profitable corporate organizations.
The original concept of a data warehouse was devised by IBM as the ‘information warehouse’ and presented as a solution for accessing data held in non-relational systems. The information warehouse was proposed to allow organizations to use their data archives to help them gain a business advantage. The concept of data warehousing was successfully presented by Bill Inmon, who has earned the title of ‘father of data warehousing’.
Data Warehousing
A data warehouse is defined as “a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.”
In this definition the data is:
• Subject-oriented as the warehouse is organized around the major subjects of the enterprise (such as customers, products, and sales) rather than major application areas (such as customer invoicing, stock control, and product sales). A data warehouse is designed to hold decision-support data rather than application-oriented data.
• Integrated because of the coming together of source data from different enterprise-wide application systems. The source data is often inconsistent, using, for example, different formats. The integrated data source must be made consistent to present a unified view of the data to the users.
• Time-variant because data in the warehouse is only accurate and valid at some point in time or over some time interval.
• Non-volatile as the data is not updated in real time but is refreshed on a regular basis from different data sources. New data is always added as a supplement to the database, rather than as a replacement. The database continually absorbs this new data, incrementally integrating it with the previous data.
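The integrated, time-variant, and non-volatile properties can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two branch sources, their field names, the two date formats, and the helper names parse_date and load do not come from any particular warehouse product.

```python
from datetime import date, datetime

# Hypothetical source records: two branch systems that format dates differently.
BRANCH_A = [{"customer": "C001", "sale_date": "2024-01-15", "amount": 120.0}]
BRANCH_B = [{"customer": "C002", "sale_date": "15/01/2024", "amount": 80.0}]


def parse_date(text: str) -> date:
    """Integrated: accept either source format and return one consistent type."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


def load(warehouse: list, source_rows: list, loaded_on: date) -> None:
    """Non-volatile and time-variant: append rows stamped with the load date.

    Existing rows are never updated or replaced; each refresh only adds rows.
    """
    for row in source_rows:
        warehouse.append({
            "customer": row["customer"],
            "sale_date": parse_date(row["sale_date"]),
            "amount": row["amount"],
            "loaded_on": loaded_on,
        })


warehouse = []
load(warehouse, BRANCH_A, date(2024, 1, 16))
load(warehouse, BRANCH_B, date(2024, 1, 16))
```

After both loads, the two sale dates are stored in the same form even though the sources disagreed on format, and each row records when it was loaded, so later refreshes can add newer rows without disturbing the history.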
Benefits of Data Warehousing
The successful implementation of a data warehouse can bring major benefits to an organization, including:
• Potential high returns on investment
Implementation of data warehousing by an organization requires a huge investment, typically from Rs 10 lakh to Rs 50 lakh. However, a study by the International Data Corporation (IDC) in 1996 reported that average three-year returns on investment (ROI) in data warehousing reached 401%.
• Competitive advantage
The huge returns on investment for those companies that have successfully implemented a data warehouse are evidence of the enormous competitive advantage that accompanies this technology. The competitive advantage is gained by allowing decision-makers access to data that can reveal previously unavailable, unknown, and untapped information on, for example, customers, trends, and demands.
• Increased productivity of corporate decision-makers
Data warehousing improves the productivity of corporate decision-makers by creating an integrated database of consistent, subject-oriented, historical data. It integrates data from multiple incompatible systems into a form that provides one consistent view of the organization. By transforming data into meaningful information, a data warehouse allows business managers to perform more substantive, accurate, and consistent analysis.
• More cost-effective decision-making
Data warehousing helps to reduce the overall cost of the product by reducing the number of channels.
• Better enterprise intelligence
It helps to provide better enterprise intelligence.
• Enhanced customer service
It is used to enhance customer service.
The need for a data warehouse is illustrated in the figure.
Problems of Data Warehousing
The problems associated with developing and managing a data warehouse are as follows:
Underestimation of resources for data loading
Sometimes we underestimate the time required to extract, clean, and load the data into the warehouse. This process may take a significant proportion of the total development time, although tools are available to reduce the time and effort spent on it.
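The extract, clean, and load stages can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the property data, the column names, and the function names are hypothetical, and the source is an in-memory CSV string rather than a real operational system.

```python
import csv
import io

# Hypothetical raw export from a source system (assumed layout).
RAW_CSV = """property_id,city,rent
P1, Glasgow ,450
P2,London,not available
P3,Aberdeen,600
"""


def extract(text):
    """Extract: read the raw export into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Clean: trim whitespace and set aside rows whose rent is not numeric."""
    cleaned, rejected = [], []
    for row in rows:
        try:
            rent = float(row["rent"])
        except ValueError:
            rejected.append(row)
            continue
        cleaned.append({
            "property_id": row["property_id"].strip(),
            "city": row["city"].strip(),
            "rent": rent,
        })
    return cleaned, rejected


def load(warehouse, rows):
    """Load: append the cleaned rows and report how many were added."""
    warehouse.extend(rows)
    return len(rows)


warehouse = []
cleaned, rejected = clean(extract(RAW_CSV))
loaded = load(warehouse, cleaned)
```

Even in this small case the cleaning step carries most of the logic, and the rejected rows still need a decision from someone, which hints at why the effort is easy to underestimate.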
Hidden problems with source systems
Sometimes hidden problems associated with the source systems feeding the data warehouse are identified only after years of going undetected. For example, when entering the details of a new property, certain fields may allow nulls, which may result in staff entering incomplete property data even when the data is available and applicable.
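One way to surface this kind of problem early is to count the missing values per field before loading. The sketch below assumes hypothetical property records and field names, and treats both None and the empty string as missing; a real source system may need other rules.

```python
from collections import Counter

# Hypothetical property records from a source system that allows nulls.
SOURCE_ROWS = [
    {"property_id": "P1", "rooms": 3, "postcode": "G12"},
    {"property_id": "P2", "rooms": None, "postcode": ""},
    {"property_id": "P3", "rooms": 2, "postcode": None},
]


def missing_counts(rows):
    """Count, per field, how many rows have no usable value."""
    counts = Counter()
    for row in rows:
        for field, value in row.items():
            if value is None or value == "":
                counts[field] += 1
    return dict(counts)


report = missing_counts(SOURCE_ROWS)
```

Here report shows one missing rooms value and two missing postcode values, which is the sort of gap that would otherwise be discovered only when an analysis fails.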
Required data not captured
In some cases, data that is very important for the purposes of the data warehouse is not captured by the source systems. For example, the date of registration of a property may not be used in the source system, but it may be very important for analysis.
Increased end-user demands
After satisfying some of the end-users' queries, requests for support from staff may increase rather than decrease. This is caused by the users' increasing awareness of the capabilities and value of the data warehouse. Another reason for increasing demands is that once a data warehouse is online, the number of users and queries often increases, together with requests for answers to more and more complex queries.
Data homogenization
Data warehousing aims to make data formats similar across the different data sources. This homogenization can result in the loss of some important value of the data.
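A small Python sketch shows how a common format can discard detail. It assumes two hypothetical sources, one recording a full timestamp and one recording only a date, and a warehouse that stores dates only.

```python
from datetime import datetime

# Hypothetical values: source A records a full timestamp, source B only a date.
source_a = "2024-03-05 14:32:10"
source_b = "2024-03-05"

# Homogenize both to the common format (a plain date).
common_a = datetime.strptime(source_a, "%Y-%m-%d %H:%M:%S").date()
common_b = datetime.strptime(source_b, "%Y-%m-%d").date()
```

Both values are now directly comparable, but the time of day that source A captured is no longer available in the warehouse.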
High demand for resources
A data warehouse holds large amounts of data and therefore demands substantial resources, such as disk space, to store it.
Data ownership
Data warehousing may change the attitude of end-users to the ownership of data. Sensitive data that is owned by one department has to be loaded into the data warehouse for decision-making purposes. Sometimes that department is reluctant because it hesitates to share the data with others.
High maintenance
Data warehouses are high-maintenance systems. Any reorganization of the business processes and the source systems may affect the data warehouse, which results in high maintenance costs.
Long-duration projects
The building of a warehouse can take up to three years, which is why some organizations are reluctant to invest in a data warehouse. Sometimes only the historical data of a particular department is captured, resulting in data marts. Data marts support only the requirements of a particular department and limit functionality to that department or area.
Complexity of integration
The most important area for the management of a data warehouse is the integration capabilities. An organization must spend a significant amount of time determining how well the various different data warehousing tools can be integrated into the overall solution that is needed. This can be a very difficult task, as there are a number of tools for every operation of the data warehouse.