This short article introduces data mining, explains why it is a key concept in database systems, describes the main forms of data mining and the patterns they uncover, and closes with a conclusion.
Data mining is a set of techniques for discovering patterns in collections of data (Brookshear, 2012). In database systems it is used to identify patterns in the relations of a domain and in the tables and data stored. Zaki and Meira Jr. define it simply: “Data mining is the process of discovering insightful, interesting, and novel patterns, as well as descriptive, understandable, and predictive models from large-scale data”.
To achieve its goals, data mining typically works on static data collections (copies of databases and data) called data warehouses, which, unlike regular databases, are not constantly updated to record new information. Data warehouses are simply snapshots of a single database or a collection of databases (Brookshear, 2012).
According to Han, “data mining has been treated as a popular synonym to knowledge discovery in databases, although some researchers view data mining as an essential step in knowledge discovery. In general a knowledge discovery process consists of an interactive sequence of the following steps:
Data cleaning: which handles noisy, erroneous, missing or irrelevant data;
Data integration: where multiple, heterogeneous data sources may be integrated into one;
Data selection: where data relevant to the analysis task are retrieved from the database;
Data mining: which is an essential process where intelligent methods are applied in order to extract data and patterns;
Pattern evaluation: which is to identify the truly interesting patterns representing knowledge based on some interestingness measures;
Knowledge presentation: where visualization and knowledge representation techniques are used to present the mined knowledge to the user.”
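The steps listed above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the two data sources (store_a, store_b), the function names and the 0.5 support threshold are all hypothetical, and counting item pairs stands in for whatever "intelligent method" a real system would apply in the mining step.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw data from two sources; None marks a missing value.
store_a = [
    {"id": 1, "items": ["bread", "milk"]},
    {"id": 2, "items": None},
    {"id": 3, "items": ["bread", "milk", "eggs"]},
]
store_b = [
    {"id": 4, "items": ["milk", "eggs"]},
    {"id": 5, "items": ["bread", "milk"]},
]


def clean(records):
    """Data cleaning: drop records whose item list is missing or empty."""
    return [r for r in records if r.get("items")]


def integrate(*sources):
    """Data integration: merge several sources into one collection."""
    return [record for source in sources for record in source]


def select(records):
    """Data selection: keep only the field relevant to the analysis."""
    return [sorted(set(r["items"])) for r in records]


def mine(baskets):
    """Data mining: count how often each pair of items occurs together."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(basket, 2))
    return counts


def evaluate(pair_counts, n_baskets, min_support=0.5):
    """Pattern evaluation: keep pairs whose support meets the threshold."""
    return {
        pair: count / n_baskets
        for pair, count in pair_counts.items()
        if count / n_baskets >= min_support
    }


def present(patterns):
    """Knowledge presentation: format the patterns for a reader."""
    return [f"{a} + {b}: support {s:.2f}"
            for (a, b), s in sorted(patterns.items())]


baskets = select(integrate(clean(store_a), clean(store_b)))
patterns = evaluate(mine(baskets), len(baskets))
for line in present(patterns):
    print(line)
```

Here "support" is used as a simple interestingness measure: the share of baskets in which a pair appears. Other measures could be swapped into evaluate without changing the rest of the sequence.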
FORMS OF DATA MINING
The forms of data mining described by Brookshear are defined by Han as follows:
Class description: Class description provides a concise and succinct summarization of a collection of data and distinguishes it from others.
Association: Association is the discovery of association relationships and correlations among a set of items.
Classification: Classification analyzes a set of training data (i.e., a set of objects whose class label is known) and constructs a model for each class based on the features in the data.
Prediction: This mining function predicts the possible values of some missing data or the value distribution of certain attributes in a set of objects.
Clustering: Clustering analysis identifies clusters embedded in the data, where a cluster is a collection of data objects that are “similar” to one another.
Time-series analysis: Time-series analysis examines large sets of time-series data to find certain regularities and interesting characteristics.
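Two of the forms above, association and classification, can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from Han or Brookshear: the baskets and training data are invented, rule confidence is used as one common way to quantify an association, and a nearest-centroid rule is assumed as a very simple "model for each class".

```python
from collections import defaultdict
from math import dist

# --- Association: confidence of the rule "bread -> milk" ---
# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "butter"},
]


def confidence(baskets, antecedent, consequent):
    """Fraction of baskets with `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


# --- Classification: one centroid per class, learned from labelled data ---
# Hypothetical training set: (features, class label).
training = [
    ((1.0, 1.0), "small"),
    ((2.0, 1.0), "small"),
    ((8.0, 9.0), "large"),
    ((9.0, 8.0), "large"),
]


def fit_centroids(training):
    """Build a simple model per class: the mean of its feature vectors."""
    grouped = defaultdict(list)
    for features, label in training:
        grouped[label].append(features)
    return {
        label: tuple(sum(col) / len(col) for col in zip(*rows))
        for label, rows in grouped.items()
    }


def classify(centroids, features):
    """Assign the label of the nearest class centroid."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


centroids = fit_centroids(training)
print(confidence(baskets, "bread", "milk"))
print(classify(centroids, (1.5, 2.0)))
print(classify(centroids, (7.0, 7.0)))
```

With this toy data, two of the three baskets containing bread also contain milk, and the two new objects are labelled "small" and "large" respectively. Note that math.dist requires Python 3.8 or later.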
With large amounts of data and the widespread use of computers, the internet and communications, the technologies and techniques that support data mining will remain key topics in computer science. Using data mining to improve business operations, and to extract information in ways that may help make the world a better place, has been a very important topic over the last decade and remains so today.
Han, J., 2016. Data Mining. [ONLINE] Available at: http://web.engr.illinois.edu/~hanj/pdf/ency99.pdf. [Accessed 13 February 2016].
Brookshear, J. G., 2012. Computer Science: An Overview. 11th ed. USA: Addison-Wesley.
Zaki, M. J. and Meira Jr., W., 2014. Data Mining and Analysis. 1st ed. New York: Cambridge University Press.