State-of-the-art on clustering data streams





Big Data Analytics, 1:13

Scalable, Intelligent Data Analytics and Learning

Abstract

Clustering is a key data mining task: the problem of partitioning a set of observations into clusters such that observations within a cluster are similar and observations in different clusters are dissimilar. The traditional set-up, in which a static dataset is available in its entirety for random access, does not apply when the entire dataset is not available at the start of learning, the data continue to arrive at a rapid rate, the data cannot be accessed randomly, and only one or at most a small number of passes over the data can be made to generate the clustering results. Data of this type are referred to as data streams. The data stream clustering problem requires a process capable of partitioning observations continuously under restrictions of memory and time. In the data stream clustering literature, a large number of algorithms use a two-phase scheme consisting of an online component, which processes data stream points and produces summary statistics, and an offline component, which uses the summary data to generate the clusters. An alternative class of algorithms generates the final clusters without the need for an offline phase. This paper presents a comprehensive survey of data stream clustering methods and an overview of the most well-known streaming platforms that implement clustering.
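
The two-phase scheme described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper or from any specific surveyed algorithm. It assumes that the summary statistics are a count, a linear sum, and a squared sum per micro-cluster, that a point joins the nearest micro-cluster when it lies within a fixed `radius`, and that the offline step is a weighted k-means over the micro-cluster centroids. The names `MicroCluster`, `online_phase`, `offline_phase`, and `radius` are hypothetical, and the sketch does not bound the number of micro-clusters, which a memory-constrained method would need to do.

```python
import math


class MicroCluster:
    """Summary statistics of the points absorbed so far (illustrative only)."""

    def __init__(self, point):
        self.n = 1                          # number of points absorbed
        self.ls = list(point)               # per-dimension linear sum
        self.ss = [x * x for x in point]    # per-dimension squared sum

    def centroid(self):
        return [s / self.n for s in self.ls]

    def absorb(self, point):
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x
            self.ss[i] += x * x


def online_phase(stream, radius):
    """Single pass over the stream: each point is seen once and then discarded.

    A point is merged into the nearest micro-cluster if it lies within
    `radius` of that centroid; otherwise it starts a new micro-cluster.
    """
    micro = []
    for p in stream:
        best = min(micro, key=lambda m: math.dist(m.centroid(), p), default=None)
        if best is not None and math.dist(best.centroid(), p) <= radius:
            best.absorb(p)
        else:
            micro.append(MicroCluster(p))
    return micro


def offline_phase(micro, k, iterations=20):
    """Weighted k-means over the summaries only; raw points are never revisited."""
    # Assumption: seed the centers with the k heaviest micro-clusters.
    centers = [m.centroid() for m in sorted(micro, key=lambda m: -m.n)[:k]]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for m in micro:
            j = min(range(len(centers)),
                    key=lambda c: math.dist(centers[c], m.centroid()))
            groups[j].append(m)
        for j, group in enumerate(groups):
            if group:
                total = sum(m.n for m in group)
                dim = len(centers[j])
                # Weighted mean: sum of linear sums divided by total count.
                centers[j] = [sum(m.ls[d] for m in group) / total
                              for d in range(dim)]
    return centers
```

In use, `online_phase` would consume any iterable of numeric tuples as points arrive, and `offline_phase` could be called on demand with the current summaries whenever a clustering is requested.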

Keywords: Data stream clustering, Streaming platforms, State-of-the-art



Authors: Mohammed Ghesmoune, Mustapha Lebbah, Hanene Azzag

Source: https://link.springer.com/






