Data Mining

Data mining is the process of extracting and analyzing new information, usually from large data sets. It is used to predict valuable information, such as future developments and behaviors, which can help companies and businesses make decisions on prices and costs. Data mining can make companies' recorded data more meaningful by identifying patterns and correlations, which can provide the companies with important information from previously useless data. It provides this information through the technique of modeling: data on multiple situations, including their known answers, is input, and the data mining software analyzes it and recognizes the features the situations share. Afterwards, related situations with unknown answers can be input to the model, and, using the features learned from the previous situations, the model can produce a predicted answer.

Techniques used in data mining include statistics, neighborhoods, and clustering. Statistics was in use long before data mining; it is the part of mathematics concerned with collecting and describing data. The neighborhood technique, also called nearest neighbor [1], predicts a value for a new record by finding comparable records whose values are already known and predicting something close to them. Clustering groups similar records together as a form of organization.
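The nearest neighbor idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the customer records, the feature choice (age, income in thousands), and the function name nearest_neighbor_predict are all hypothetical and do not come from the cited sources. The sketch finds the k known records closest to a new record and averages their known values.

```python
import math

def nearest_neighbor_predict(known, query, k=1):
    """Predict a value for `query` from the k most similar known records.

    known: list of (features, answer) pairs, where features is a tuple of numbers
    query: tuple of numbers with the same length as each features tuple
    """
    # Rank known records by straight-line (Euclidean) distance to the query
    ranked = sorted(known, key=lambda record: math.dist(record[0], query))
    nearest = ranked[:k]
    # Predict something close to the neighbors' answers: here, their average
    return sum(answer for _, answer in nearest) / len(nearest)

# Hypothetical data: (age, income in thousands) -> monthly spending
customers = [
    ((25, 30), 100.0),
    ((30, 45), 150.0),
    ((45, 80), 300.0),
    ((50, 90), 340.0),
]

one_neighbor = nearest_neighbor_predict(customers, (28, 40), k=1)
two_neighbors = nearest_neighbor_predict(customers, (28, 40), k=2)
print(one_neighbor, two_neighbors)
```

Using a larger k smooths the prediction over more comparable records, at the cost of including records that are less similar to the new one.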

Data mining is still an evolving field with a relatively new name, and the accuracy of its analysis is improving radically with the help of rapidly advancing technologies in other computing fields. It is now used very extensively and is a mainstream field in business. Despite this huge improvement and development, its models still do not predict with perfect accuracy, so there is still a chance for it to cause problems.

  1. When will the rapid rate of development in data mining come to a stop?
  2. What are the main issues/problems still found in data mining today?
  3. How accurate will data mining be 15 years from now?


http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm
http://www.thearling.com/text/dmwhite/dmwhite.htm
http://www.theregister.co.uk/2006/06/14/data_mining_future/
http://abbottanalytics.blogspot.com/2007/04/future-data-mining-trends.html
http://www.thearling.com/text/dmtechniques/dmtechniques.htm [1]

Revised at the ARC:

Data mining is the process of extracting and analyzing new information, usually from large data sets. It is used to predict valuable information, such as future developments and behaviors. This can help companies and businesses make decisions on prices and costs. Data mining can make companies' recorded data more meaningful by identifying patterns and correlations, which can provide the companies with important information from previously useless data. Data mining provides this information through the technique of modeling: data on multiple situations, including their known answers, is input, and the data mining software analyzes it and recognizes the features the situations share. Afterwards, related situations with unknown answers can be input to the model, and, using the features learned from the previous situations, the model can produce a predicted answer.
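As a rough illustration of the modeling step described above, the Python sketch below learns from situations with known answers and then predicts an answer for a new situation. The text does not name a specific model, so a simple least-squares straight line is assumed here, and the price and sales figures are invented for the example.

```python
def fit_line(xs, ys):
    """Learn a straight line (slope, intercept) from situations with known answers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Least-squares estimates of the slope and intercept
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Produce a predicted answer for a situation whose answer is unknown."""
    slope, intercept = model
    return slope * x + intercept

# Hypothetical known situations: price charged -> units sold
prices = [10.0, 12.0, 14.0, 16.0]
units_sold = [200.0, 180.0, 160.0, 140.0]

model = fit_line(prices, units_sold)
predicted_units = predict(model, 13.0)  # a price with no recorded answer
print(model, predicted_units)
```

Real data mining software works with many more features and far more flexible models, but the two phases are the same: fit on known answers, then predict unknown ones.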

Techniques used in data mining include statistics, neighborhoods, and clustering. Statistics was in use long before data mining; it is the part of mathematics that involves collecting and describing data. The neighborhood technique, also called nearest neighbor [1], predicts a value for a new record by finding comparable records whose values are already known and predicting something close to them. Clustering groups similar records together as a form of organization.
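Clustering can be illustrated with a minimal sketch as well. The text does not say which clustering method is meant, so a basic one-dimensional k-means loop is assumed here. The purchase amounts, the starting centers, and the function name k_means_1d are hypothetical.

```python
def k_means_1d(values, centers, rounds=10):
    """Group numbers around starting centers with a basic k-means loop.

    Returns (clusters, centers): a list of groups and the final center of each.
    """
    clusters = []
    for _ in range(rounds):
        clusters = [[] for _ in centers]
        for value in values:
            # Assign each record to the group whose center is closest
            closest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            clusters[closest].append(value)
        # Move each center to the mean of its group; keep it if the group is empty
        centers = [
            sum(group) / len(group) if group else centers[i]
            for i, group in enumerate(clusters)
        ]
    return clusters, centers

# Hypothetical purchase amounts with two natural groups
purchases = [5, 7, 6, 50, 55, 52]
groups, final_centers = k_means_1d(purchases, centers=[0.0, 60.0])
print(groups, final_centers)
```

Unlike nearest neighbor, this needs no known answers: the records are organized purely by how similar they are to one another.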

Data mining is still an evolving field with a relatively new name, and the accuracy of its analysis is improving radically with the help of rapidly advancing technologies in other computing fields. It is now used very extensively and is a mainstream field in business. Despite this huge improvement and development, its models still do not predict with perfect accuracy. Therefore, there is still a chance for it to cause problems.