When I talk to beginner data analysts, I hear a few misconceptions about data mining tools. I am writing this article to debunk them. Here is the list of six data mining fallacies that I want to clarify:
Fallacy 1: Data Mining tools are automated tools that can be deployed on the data repositories to find answers to our problems.
Reality: No data mining tool will automatically solve your problems. Rather, data mining is a process, and methodologies such as CRISP-DM (Cross-Industry Standard Process for Data Mining) integrate that process into the overall business plan of action.
Fallacy 2: Data mining tools are autonomous and require no human intervention.
Reality: Without skilled human intervention (from subject matter experts and data scientists), blind use of data mining software will only give you the wrong answer to the wrong question, applied to the wrong type of data. Further, a wrong analysis is worse than no analysis, because it often leads to wrong decisions and potentially expensive failures. Even after a model is deployed, human intervention is required to update it in the light of new information. Human analysts must continuously check the model's health and other quality parameters.
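To make the idea of a model health check concrete, here is a minimal sketch in Python. It assumes a simple classification model whose accuracy was measured at deployment and is measured again on recent data; the function names, the sample predictions, and the 0.05 tolerance are all hypothetical choices for illustration. The check only raises a flag. Deciding what to do about it remains the analyst's job.

```python
def accuracy(predictions, actuals):
    """Share of predictions that match the observed outcomes."""
    matches = sum(p == a for p, a in zip(predictions, actuals))
    return matches / len(actuals)


def needs_review(baseline_accuracy, recent_accuracy, tolerance=0.05):
    """Flag the model for human review when accuracy drops by more than
    the tolerance (0.05 is an assumed threshold, not a standard)."""
    return (baseline_accuracy - recent_accuracy) > tolerance


# Hypothetical outcomes and predictions, invented for this example.
actuals = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
predictions_at_deployment = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
predictions_recent = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]

baseline = accuracy(predictions_at_deployment, actuals)
recent = accuracy(predictions_recent, actuals)
flagged = needs_review(baseline, recent)

if flagged:
    print("Accuracy has dropped; ask an analyst to review the model.")
```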
Fallacy 3: Data mining software packages are intuitive and easy to use.
Reality: Regardless of the software you purchase for a data mining task, you cannot simply install it, sit back, and watch it solve all your data problems. Data comes in various formats, while algorithms require data in a specific format, which results in substantial data pre-processing. A data mining environment (such as R or Python) comes with hundreds of packages that you can use to carry out the analysis. Using such software effectively therefore depends on the complexity of the data and on a good understanding of the libraries needed for the specific task.
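As a rough illustration of the pre-processing involved, the sketch below uses only the Python standard library to turn raw records into numeric rows of the kind most algorithms expect. The records, the two date formats, and the segment levels are assumptions made up for this example. Real data usually needs far more handling than this.

```python
from datetime import datetime

# Hypothetical raw records, as they might arrive from different source systems.
raw_records = [
    {"signup": "2021-03-15", "revenue": "1,250.50", "segment": "retail"},
    {"signup": "15/04/2021", "revenue": "980", "segment": "Wholesale "},
]

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")  # assumed formats for this example
SEGMENTS = ("retail", "wholesale")       # assumed category levels


def parse_date(text):
    """Try each known date format in turn and fail loudly on anything else."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {text!r}")


def to_feature_row(record):
    """Convert one raw record into a list of floats."""
    signup = parse_date(record["signup"])
    revenue = float(record["revenue"].replace(",", ""))  # drop thousands separator
    segment = record["segment"].strip().lower()          # normalise the label
    one_hot = [1.0 if segment == level else 0.0 for level in SEGMENTS]
    return [float(signup.month), revenue] + one_hot


features = [to_feature_row(r) for r in raw_records]
```

Each row now holds the signup month, the revenue, and a one-hot encoding of the segment. Even in this tiny case, someone had to know which date formats occur and which segment labels mean the same thing.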
Fallacy 4: Data mining will identify the cause of a business problem.
Reality: The data mining process will help you uncover hidden patterns in the data. It is still up to the business manager to make proper use of the analysis to identify the root cause of the problem.
Fallacy 5: Data mining tools will automatically clean up the messy data.
Reality: As a preliminary phase of the data mining process, data cleaning often deals with data that has not been examined in years. Therefore, the data cleaning step may require the intervention of a subject matter expert to deal with missing values, to filter the variables of interest, and to save the subset of the big data relevant to the analysis.
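The three cleaning tasks mentioned above can be sketched with the Python standard library as follows. The column names, the sample data, and every cleaning rule (drop rows without spend, fill missing age with the median, label missing region as "unknown") are hypothetical. In practice a subject matter expert would choose these rules.

```python
import csv
import io
from statistics import median

# Hypothetical raw extract; an empty field stands for a missing value.
RAW_CSV = """customer_id,age,region,legacy_code,spend
1,34,north,X1,120.0
2,,south,X2,85.5
3,51,north,X3,
4,29,,X4,60.0
"""

# Assumed to be chosen by a subject matter expert.
VARIABLES_OF_INTEREST = ["customer_id", "age", "region", "spend"]

rows = list(csv.DictReader(io.StringIO(RAW_CSV)))

# 1. Filter the variables of interest (legacy_code is dropped).
rows = [{name: row[name] for name in VARIABLES_OF_INTEREST} for row in rows]

# 2. Deal with missing values, using assumed rules:
#    drop rows without spend, impute age with the median, label missing regions.
rows = [row for row in rows if row["spend"] != ""]
median_age = median(int(row["age"]) for row in rows if row["age"] != "")
for row in rows:
    row["age"] = int(row["age"]) if row["age"] != "" else median_age
    row["region"] = row["region"] or "unknown"
    row["spend"] = float(row["spend"])

# 3. Save the relevant subset (written to memory here; a file works the same way).
output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=VARIABLES_OF_INTEREST)
writer.writeheader()
writer.writerows(rows)
cleaned_csv = output.getvalue()
```

None of these rules is correct in general. Whether a missing age should be imputed, or the row discarded, depends on what the data means to the business.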
Fallacy 6: Data mining always provides positive results.
Reality: When you mine data for actionable knowledge, there is no guarantee of positive results. If the business problem is not properly understood, the wrong data is used for mining, or the stakeholders who understand the models are not involved in the analysis, the entire data mining activity could lead to catastrophic results.
These fallacies are explained based on my discussions with various business analysts. These views are personal.