What is Data Mining, and how is it used in Business?

Image source: https://www.jstor.org/stable/community.36641027

Data mining is used to search for valuable information in the vast amounts of data collected over time. This information may take the form of patterns or relationships that exist within the data. Businesses use data mining as an important decision-making tool.

Business Use Cases of Data Mining

The computer manufacturer Dell wanted to improve the productivity of its sales workforce. Researching the social network activity of potential leads on LinkedIn and other sites gave Dell richer information about potential customers, which allowed it to develop more personalized sales pitches. This is an example of mining customer data to choose the marketing approach for a particular customer based on that customer's individual profile. The result: the number of prospects that needed to be contacted was cut by 50%, leaving only the most promising ones, and the productivity and efficiency of the sales workforce nearly doubled.

The Mediclaim insurance provider uses predictive analytics to cut down on the number of fraudulent claims. When a claim is made, it is immediately passed through a real-time predictive modeling tool that looks for anomalies. If an anomaly is detected, the claim is redirected to a team of specialized analysts for close scrutiny before it is either rejected or accepted as a special case. The result: potential savings of thousands of dollars by avoiding payment of fraudulent claims.
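The source does not say which model the provider uses. As a minimal sketch of the flag-and-route idea only, the Python below swaps in a simple z-score rule on a single feature (the claim amount): a new claim that lies more than `threshold` standard deviations from the mean of past claims is sent to specialist review. The function names, the threshold of 3, and the claim amounts are all hypothetical; a real system would likely look at many more features than the amount alone.

```python
from statistics import mean, stdev
from typing import Sequence


def is_anomalous(amount: float, history: Sequence[float], threshold: float = 3.0) -> bool:
    """Return True if `amount` is more than `threshold` standard deviations
    away from the mean of past claim amounts (a simple z-score rule)."""
    if len(history) < 2:
        raise ValueError("need at least two past claims to estimate spread")
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # No variation in the history: any different amount is unusual
        return amount != mu
    return abs(amount - mu) / sigma > threshold


def route_claim(amount: float, history: Sequence[float]) -> str:
    """Send anomalous claims to specialist analysts, the rest to normal processing."""
    return "specialist review" if is_anomalous(amount, history) else "standard processing"


# Hypothetical past claim amounts (illustrative only)
PAST_CLAIMS = [1200, 1350, 1100, 1500, 1250, 1400, 1300, 1150]

if __name__ == "__main__":
    for new_claim in (1320, 9800):
        print(new_claim, "->", route_claim(new_claim, PAST_CLAIMS))
```

A single-feature threshold is easy to explain but crude; it only illustrates where a predictive model sits in the claim workflow.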

With data mining, a retail store may find that certain products sell more in one distribution channel than in others, that certain products are sold together, that certain products sell more in one geographical location than in others, and that certain products sell when a particular event occurs. Wal-Mart, for example, has found that beer sales increase when a hurricane is imminent, which means it should stock more than the usual supply of beer when a hurricane is expected.

Some more use cases: with data mining, a financial analyst can learn the characteristics of companies that become insolvent; human resource managers can learn the characteristics of successful prospective employees; and credit card departments can learn which potential customers are more likely to repay their debt and, when a card is swiped, whether the transaction is fraudulent or legitimate.

Data mining is the process of discovering useful patterns and trends in large data sets.

Predictive analytics is the process of extracting information from large data sets in order to make predictions and estimates about future outcomes.

What tasks can Data Mining accomplish?

Common Data Mining Tasks are:

Description
Prediction
Clustering
Classification
Association
Estimation

Description:

In description, analysts try to identify trends and relationships in current and historical data. This task is sometimes called exploratory data analysis because it describes trends and relationships without digging deeper into their causes.

Estimation:

In estimation, we approximate the value of a numeric dependent variable from a set of independent variables, typically with a statistical tool called regression analysis. Simple linear regression uses the least-squares criterion to fit the line that best approximates the relationship between two variables.
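To make the least-squares criterion concrete, here is a minimal Python sketch of simple linear regression using the standard closed-form formulas: slope $b_1 = \sum (x - \bar{x})(y - \bar{y}) / \sum (x - \bar{x})^2$ and intercept $b_0 = \bar{y} - b_1 \bar{x}$. The function names and the GPA pairs are hypothetical illustration data, not the 1000-student data set from the book.

```python
from typing import Sequence, Tuple


def fit_least_squares(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (intercept b0, slope b1) of the least-squares line y_hat = b0 + b1 * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # The slope that minimizes the sum of squared vertical residuals
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar  # the fitted line passes through (x_bar, y_bar)
    return b0, b1


def predict(b0: float, b1: float, x: float) -> float:
    """Evaluate the fitted regression line at x."""
    return b0 + b1 * x


if __name__ == "__main__":
    # Hypothetical (undergraduate GPA, graduate GPA) pairs
    undergrad = [2.8, 3.0, 3.2, 3.5, 3.7, 3.9]
    grad = [3.1, 3.2, 3.4, 3.6, 3.7, 3.9]
    b0, b1 = fit_least_squares(undergrad, grad)
    print(f"y_hat = {b0:.2f} + {b1:.2f}x")
    print(f"estimate for x = 3.4: {predict(b0, b1, 3.4):.2f}")
```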

Regression Line (Image source: Data Mining and Predictive Analytics, Page 12)

The scatter plot shows graduate GPA against undergraduate GPA for 1000 students. The blue line is the regression line that best approximates the relationship between graduate GPA and undergraduate GPA. Its equation is $\hat{y} = 1.24 + 0.67x$, which tells us that the estimated graduate GPA $\hat{y}$ is 1.24 plus 0.67 times the student’s undergraduate GPA $x$. This regression equation can be used to estimate a student’s graduate GPA from a given undergraduate GPA.
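As a quick worked check of the equation, take a hypothetical student with an undergraduate GPA of $x = 3.0$ (a value chosen for illustration, not taken from the book's data):

$$\hat{y} = 1.24 + 0.67(3.0) = 1.24 + 2.01 = 3.25$$

So the regression line estimates this student's graduate GPA at about 3.25.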

Classification:

Classification tools are among the most commonly used tools in data mining. They assign objects or actions to distinct classes. For instance, an advertiser may want to know which aspect of its promotion is most appealing to consumers: is it the price, the quality, or the reliability of a product? Maybe it is a special feature that competing products lack. Classification tools can provide such information for all the products, making it possible to spend the advertising budget most effectively.
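The article does not name a specific classification algorithm. As one illustration, the Python below sketches k-nearest neighbors (k-NN): a new record receives the majority label among the k most similar records whose class is already known. The two generic features, the "responds"/"ignores" labels (for example, whether a customer responded to a past promotion), and the data are hypothetical, and the features are assumed to already be on comparable scales.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Record = Tuple[Point, str]  # (features, known class label)


def knn_classify(train: Sequence[Record], query: Point, k: int = 3) -> str:
    """Predict a class for `query` by majority vote of its k nearest labeled records."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training records")
    # Sort labeled records by Euclidean distance to the query and keep the k closest
    nearest = sorted(train, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # most_common(1) returns [(label, count)] for the most frequent label
    return votes.most_common(1)[0][0]


# Hypothetical labeled customers: (feature_1, feature_2) -> response to a past promotion
LABELED: List[Record] = [
    ((1.0, 1.0), "responds"), ((1.5, 2.0), "responds"), ((2.0, 1.5), "responds"),
    ((6.0, 7.0), "ignores"), ((7.0, 6.5), "ignores"), ((6.5, 8.0), "ignores"),
]

if __name__ == "__main__":
    print(knn_classify(LABELED, (1.8, 1.2)))
    print(knn_classify(LABELED, (6.2, 7.1)))
```

The key point is that the classes ("responds", "ignores") are fixed before the algorithm runs; the model only decides which existing class a new record belongs to.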

Prediction:

Prediction is similar to classification and estimation, except that in prediction the results lie in the future. Examples of prediction tasks include:

Predicting the price of a stock three weeks into the future
Predicting the price of raw materials
Predicting whether a customer will continue or discontinue a subscription service
Predicting machine breakdowns

Any of the methods used for classification or estimation can also be used for prediction.

Clustering:

Clustering refers to grouping records, observations, or cases based on similar features. A cluster is a collection of records that are similar to one another and dissimilar to records in other clusters. In classification, the groups are decided in advance, e.g. good customer/bad customer. In clustering, however, there are no predefined groups. Instead, the clustering algorithm segments the entire data set into subgroups, or clusters, so that the similarity of records within each cluster is maximized.

The most common use for clustering tools is probably what economists refer to as “market segmentation.” The clustering algorithm segments customers based on characteristics such as income, wealth, geographic location, and lifestyle. Each segment is then treated with a different marketing approach suited to that particular segment. Note that the data analyst does not decide the customer segments a priori; they emerge from the patterns the clustering algorithm finds in the data.
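The article does not name a clustering algorithm. As one common example, the Python below sketches k-means (Lloyd's algorithm): assign each customer to the nearest cluster center, move each center to the mean of its members, and repeat until no assignment changes. No labels are supplied; only the number of clusters k is chosen. The (annual income in $k, spending score) customers are hypothetical, and seeding with the first k points is a simplification for deterministic output; library implementations typically use random or k-means++ seeding.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _nearest(p: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))


def k_means(points: Sequence[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means. Returns (centroids, cluster index for each point)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids: List[Point] = [tuple(p) for p in points[:k]]  # simple deterministic seeding
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest centroid
        new_labels = [_nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # no point changed cluster, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave a centroid in place if its cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


# Hypothetical customers: (annual income in $k, spending score 0-100)
CUSTOMERS = [(20, 80), (22, 85), (25, 78), (80, 20), (85, 15), (78, 25)]

if __name__ == "__main__":
    centroids, labels = k_means(CUSTOMERS, k=2)
    print("segment per customer:", labels)
    print("segment centers:", centroids)
```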

Association:

The association task finds which attributes “go together”. It is also called market basket analysis or affinity analysis. Examples of association tasks in business include:

Finding out which items in a supermarket are purchased together, and which items are never purchased together
Identifying what products certain groups of people purchase
Recommending movies, as Netflix does, based on the genres people have watched and rated in the past
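As a minimal sketch of the counting step behind market basket analysis (not a complete algorithm such as Apriori), the Python below computes two standard measures for item pairs: support, the share of baskets containing both items, and confidence of A -> B, the share of baskets containing A that also contain B. The function name, the support threshold, and the baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import List, Sequence, Set, Tuple

Rule = Tuple[str, str, float, float]  # (antecedent, consequent, support, confidence)


def pair_rules(transactions: Sequence[Set[str]], min_support: float = 0.3) -> List[Rule]:
    """Find one-item-to-one-item rules A -> B among frequent item pairs.

    support(A, B)      = baskets with A and B / all baskets
    confidence(A -> B) = baskets with A and B / baskets with A
    """
    n = len(transactions)
    if n == 0:
        return []
    item_counts = Counter(item for basket in transactions for item in basket)
    # sorted() gives each pair a canonical order so {a, b} and {b, a} count together
    pair_counts = Counter(
        pair for basket in transactions for pair in combinations(sorted(basket), 2)
    )
    rules: List[Rule] = []
    for (a, b), both in pair_counts.items():
        support = both / n
        if support < min_support:
            continue
        rules.append((a, b, support, both / item_counts[a]))
        rules.append((b, a, support, both / item_counts[b]))
    # Highest-confidence rules first; ties broken alphabetically for stable output
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


# Hypothetical supermarket baskets
BASKETS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

if __name__ == "__main__":
    for a, b, sup, conf in pair_rules(BASKETS, min_support=0.6):
        print(f"{a} -> {b}: support={sup:.2f}, confidence={conf:.2f}")
```

Rules are directional: in this toy data every basket with beer also has diapers (confidence 1.0), but only three of the four baskets with diapers have beer (confidence 0.75).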

Summary:

The premise of data mining is that there is a lot more information locked up in data, just as diamonds are locked up in diamond mines. It is up to the data analyst to unlock it using the data mining tasks described above.

We also looked at several business use cases where data mining has been used to produce measurable results, such as savings from avoiding fraudulent insurance claims and a near doubling of sales-force productivity.