
What is Data Mining? and Explain Data Mining Techniques. Compare between Data Mining and Data Warehousing.

By Dinesh Thakur

Simply storing information in a data warehouse does not, by itself, provide the benefits an organization is seeking. To realize the value of a data warehouse, it is necessary to extract the knowledge hidden within the warehouse. However, as the amount and complexity of the data in a data warehouse grows, it becomes increasingly difficult, if not impossible, for business analysts to identify trends and relationships in the data using simple query and reporting tools.

Data mining is one of the best ways to extract meaningful trends and patterns from huge amounts of data. Data mining discovers information within a data warehouse that queries and reports cannot effectively reveal.

Introduction to Data Mining

The process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make crucial business decisions is known as data mining.

Data mining is concerned with the analysis of data and the use of software techniques for finding hidden and unexpected patterns and relationships in sets of data. The focus of data mining is to find the information that is hidden and unexpected.

Data mining can provide huge paybacks for companies that have made a significant investment in data warehousing. Although data mining is still a relatively new technology, it is already used in a number of industries. The table lists examples of applications of data mining in retail/marketing, banking, insurance, and medicine.

Table: Examples of data mining applications

Data Mining Techniques

There are four main operations associated with data mining techniques:

• Predictive modeling

• Database segmentation

• Link analysis

• Deviation detection.

Techniques are specific implementations of the data mining operations. However, each operation has its own strengths and weaknesses. With this in mind, data mining tools sometimes offer a choice of techniques to implement an operation.

Predictive Modeling

Predictive modeling is similar to the human learning experience in using observations to form a model of the important characteristics of some task. The model is intended to correspond to the ‘real world’. It is developed using a supervised learning approach, which has two phases: training and testing. The training phase builds the model from a large sample of historical data called a training set, while testing involves trying out the model on new, previously unseen data to determine its accuracy and physical performance characteristics.

It is commonly used in customer retention management, credit approval, cross-selling, and direct marketing. There are two techniques associated with predictive modeling. These are:

• Classification

• Value prediction
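The two phases of supervised learning described above can be sketched in a few lines of Python. This is only a minimal illustration, not a real mining tool: the data values, the single-threshold "model", and the function names are all assumptions made for the example.

```python
# Hypothetical historical data: (years_renting, bought_property) pairs.
training_set = [(0.5, False), (1.0, False), (1.5, False),
                (2.5, True), (3.0, True), (4.0, True)]
# Hypothetical new, previously unseen data used only for testing.
test_set = [(0.8, False), (2.2, True), (3.5, True), (1.2, False)]


def accuracy(threshold, records):
    """Fraction of records where 'value > threshold' matches the known label."""
    correct = sum((value > threshold) == label for value, label in records)
    return correct / len(records)


def train(records):
    """Training phase: choose the threshold that best fits the training set."""
    candidates = sorted(value for value, _ in records)
    return max(candidates, key=lambda t: accuracy(t, records))


# Training phase: learn the model (here, a single threshold).
threshold = train(training_set)

# Testing phase: measure accuracy on data the model has not seen.
test_accuracy = accuracy(threshold, test_set)
print("learned threshold:", threshold, "test accuracy:", test_accuracy)
```

The point of keeping the test set separate is that accuracy measured on the training set alone would overstate how well the model generalizes.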

Classification

Classification is used to assign each record to one of a finite set of possible class values. There are two specializations of classification: tree induction and neural induction. An example of classification using tree induction is shown in the figure.

Tree induction

In this example, we are interested in predicting whether a customer who is currently renting property is likely to be interested in buying property. A predictive model has determined that only two variables are of interest: the length of time the customer has rented property and the age of the customer. The model predicts that those customers who have rented for more than two years and are over 25 years old are the most likely to be interested in buying property. An example of classification using neural induction is shown in the figure.
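The decision tree described above can be written as nested tests. The sketch below is an assumption about the tree's shape (rental period tested first, then age), since the figure itself is not reproduced here; the function name is hypothetical.

```python
def likely_to_buy(years_renting, age):
    """Classify a renting customer using the two variables from the example."""
    # Root node: has the customer rented for more than two years?
    if years_renting > 2:
        # Second node: is the customer over 25 years old?
        if age > 25:
            return True   # leaf: likely to be interested in buying
        return False      # leaf: long-term renter, but 25 or younger
    return False          # leaf: has rented for two years or less


for customer in [(3, 30), (3, 22), (1, 40)]:
    print(customer, likely_to_buy(*customer))
```

In a real tool the tests and thresholds would be induced from the training set rather than written by hand.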

Neural induction

A neural network contains a collection of connected nodes, with input, output, and processing at each node. Between the visible input and output layers there may be a number of hidden processing layers. Each processing unit (circle) in one layer is connected to each processing unit in the next layer by a weighted value, expressing the strength of the relationship. This approach is an attempt to copy the way the human brain works in recognizing patterns by arithmetically combining all the variables associated with a given data point.
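A minimal sketch of how such a network combines its inputs is given below. The weights, the layer sizes, and the choice of a sigmoid function at each unit are assumptions made for illustration; a real network would learn its weights from the training set.

```python
import math


def sigmoid(x):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_output(inputs, weights):
    """Compute one layer; weights[j][i] links input unit i to unit j."""
    return [sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)))
            for unit_weights in weights]


def forward(inputs, hidden_weights, output_weights):
    """Pass the inputs through one hidden layer and then the output layer."""
    hidden = layer_output(inputs, hidden_weights)
    return layer_output(hidden, output_weights)


# Hypothetical weights: 2 inputs -> 2 hidden units -> 1 output unit.
hidden_weights = [[0.5, -0.2], [0.3, 0.8]]
output_weights = [[1.0, -1.0]]

prediction = forward([1.0, 0.5], hidden_weights, output_weights)
print(prediction)
```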

Value prediction

It uses the traditional statistical techniques of linear regression and nonlinear regression. These techniques are easy to use and understand. Linear regression attempts to fit a straight line through a plot of the data, such that the line is the best representation of the average of all observations at that point in the plot. The problem with linear regression is that the technique only works well with linear data and is sensitive to those data values which do not conform to the expected norm. Although nonlinear regression avoids the main problems of linear regression, it is still not flexible enough to handle all possible shapes of the data plot. This is where the traditional statistical analysis methods and data mining methods begin to diverge. Applications of value prediction include credit card fraud detection and target mailing list identification.
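The straight-line fit and its sensitivity to unusual values can be illustrated with ordinary least squares. The data points below are invented for the example, and the function name is an assumption.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1, 2, 3, 4, 5]

# Perfectly linear data: the fitted line recovers y = 2x.
slope, intercept = fit_line(xs, [2, 4, 6, 8, 10])

# Same data with one value that does not conform to the norm.
slope_with_outlier, _ = fit_line(xs, [2, 4, 6, 8, 40])

print(slope, intercept, slope_with_outlier)
```

A single non-conforming value is enough to pull the fitted slope well away from the trend of the other points, which is the weakness the paragraph above describes.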

Database Segmentation

A segment is a group of similar records that share a number of properties. The aim of database segmentation is to partition a database into an unknown number of segments, or clusters.

This approach uses unsupervised learning to discover homogeneous sub-populations in a database to improve the accuracy of the profiles. Applications of database segmentation include customer profiling, direct marketing, and cross-selling.

Database segmentation

As shown in the figure, database segmentation identifies the clusters that correspond to legal tender and to forgeries. Note that there are two clusters of forgeries, which is attributed to at least two gangs of forgers working on falsifying the banknotes.
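One common way to form such clusters is the k-means procedure, sketched below. This is a swapped-in illustration rather than the method behind the figure: k-means needs the number of clusters and starting centroids to be supplied, whereas the text speaks of an unknown number of segments. The banknote measurements are invented.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, centroids, iterations=10):
    """Assign points to the nearest centroid, then move each centroid."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [mean_point(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical measurements of banknotes (two features per note).
notes = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1),   # e.g. legal tender
         (5.0, 5.1), (5.2, 4.9),               # e.g. one group of forgeries
         (9.0, 1.0), (9.1, 1.2)]               # e.g. another group of forgeries

centroids, clusters = kmeans(notes, [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)])
print([len(c) for c in clusters])
```

No labels are given to the procedure; the groups emerge from the data alone, which is what makes the learning unsupervised.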

Link Analysis

Link analysis aims to establish links, called associations, between individual records, or sets of records, in a database. There are three specializations of link analysis. These are:

• Associations discovery

• Sequential pattern discovery

• Similar time sequence discovery.

Associations discovery finds items that imply the presence of other items in the same event. These relationships are expressed as association rules. For example, ‘when a customer rents property for more than two years and is more than 25 years old, in 40% of cases, the customer will buy a property. This association happens in 35% of all customers who rent properties’.
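The two percentages in a rule like the one above are usually called confidence (how often the rule holds when its condition is met) and support (how often the whole rule appears among all records); these terms are not used in the text and are added here for clarity. The sketch below computes both on a small invented data set, so its numbers are not the 40% and 35% of the example.

```python
def support(itemset, records):
    """Fraction of all records that contain every item in itemset."""
    return sum(itemset <= record for record in records) / len(records)


def confidence(antecedent, consequent, records):
    """Of the records matching the antecedent, the fraction also matching the consequent."""
    matching = [r for r in records if antecedent <= r]
    if not matching:
        return 0.0
    return sum(consequent <= r for r in matching) / len(matching)


# Hypothetical customer records, each a set of observed properties.
records = [
    {"rented_over_2_years", "over_25", "bought_property"},
    {"rented_over_2_years", "over_25", "bought_property"},
    {"rented_over_2_years", "over_25"},
    {"rented_over_2_years"},
    {"over_25", "bought_property"},
]

condition = {"rented_over_2_years", "over_25"}
outcome = {"bought_property"}

rule_confidence = confidence(condition, outcome, records)
rule_support = support(condition | outcome, records)
print(rule_confidence, rule_support)
```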

Sequential pattern discovery finds patterns between events such that the presence of one set of items is followed by another set of items in a database of events over a period of time. For example, this approach can be used to understand long-term customer buying behavior.

Similar time sequence discovery is used in the discovery of links between two sets of data that are time-dependent. For example, within three months of buying property, new home owners will purchase goods such as cookers, freezers, and washing machines.
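As a rough sketch of the idea, the code below measures how many customers show one event followed, at some later point, by another. The event names and customer histories are invented, and real tools search for frequent patterns rather than checking a single pattern supplied by hand.

```python
def contains_in_order(events, pattern):
    """True if the pattern's items occur in events in the same order."""
    remaining = iter(events)
    # 'item in remaining' consumes the iterator up to the first match,
    # so each pattern item must be found after the previous one.
    return all(item in remaining for item in pattern)


def sequence_support(histories, pattern):
    """Fraction of customer histories that contain the pattern in order."""
    return sum(contains_in_order(h, pattern) for h in histories) / len(histories)


# Hypothetical time-ordered event histories, one list per customer.
histories = [
    ["view", "rent", "buy"],
    ["rent", "view", "buy"],
    ["buy", "rent"],
    ["view", "view"],
]

rent_then_buy = sequence_support(histories, ["rent", "buy"])
print(rent_then_buy)
```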

Applications of link analysis include product affinity analysis, direct marketing, and stock price movement.

Deviation Detection

Deviation detection is a relatively new technique in terms of commercially available data mining tools. However, deviation detection is often a source of true discovery because it identifies outliers, which express deviation from some previously known expectation and norm. This operation can be performed using statistics and visualization techniques.

Applications of deviation detection include fraud detection in the use of credit cards and insurance claims, quality control, and defects tracing.

visual detection of deviation
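A simple statistical version of deviation detection flags values that lie far from the mean, measured in standard deviations. The claim amounts and the threshold of 2.0 below are assumptions chosen for the example, not recommended settings.

```python
import statistics


def find_outliers(values, threshold=2.0):
    """Return values lying more than 'threshold' standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical, so nothing deviates
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Hypothetical insurance claim amounts; one is far larger than the rest.
claims = [100, 105, 98, 102, 97, 101, 99, 500]
outliers = find_outliers(claims)
print(outliers)
```

Note that a large outlier also inflates the mean and standard deviation it is measured against, so very small data sets may need a lower threshold or a more robust measure.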

Data Mining and Data Warehousing

Data mining requires a single, separate, clean, integrated, and self-consistent source of data. A data warehouse is well equipped for providing data for mining for the following reasons:

• Data mining requires high-quality, consistent input data, and a data warehouse provides it.

• It is advantageous to mine data from multiple sources to discover as many interrelationships as possible. A data warehouse contains data from a number of sources.

• The query capabilities of the data warehouse help in selecting the relevant information.

Because data mining and data warehousing are so closely integrated, many vendors are investigating a number of techniques to support this combination.



