Data Streams

Data mining increasingly has to cope with large amounts of data that arrive as a stream, meaning that the data are produced continuously. The tasks that are performed on batches of data, including classification, regression and pattern mining, can also be performed on data streams. Non-standard tasks include density estimation and process mining.

The main difficulty in handling data streams lies in the speed at which the data arrive and the resulting volume that has to be processed. Storing all the data and processing them later is infeasible at this scale. Techniques are therefore needed that process the data one item at a time or in small batches, continuously updating a model with new information.
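The idea of updating a model item by item, without storing the stream, can be illustrated with a minimal sketch. Here the "model" is just a running mean and variance maintained with Welford's method; the class RunningStats and the generator sensor_stream are hypothetical names introduced for illustration, and the simulated Gaussian readings are an assumption, not real data.

```python
import random
from typing import Iterator


class RunningStats:
    """Mean and variance maintained incrementally (Welford's method)."""

    def __init__(self) -> None:
        self.n = 0          # number of items seen so far
        self.mean = 0.0     # current mean
        self._m2 = 0.0      # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        """Fold one new item into the model in O(1) time and memory."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance of everything seen so far (0.0 until n > 1)."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


def sensor_stream(n_items: int, seed: int = 0) -> Iterator[float]:
    """Hypothetical stream: yields simulated readings one at a time."""
    rng = random.Random(seed)
    for _ in range(n_items):
        yield rng.gauss(10.0, 2.0)


stats = RunningStats()
for value in sensor_stream(10_000):
    stats.update(value)  # each item is seen once and then discarded

print(stats.n, stats.mean, stats.variance)
```

Memory use stays constant no matter how long the stream runs, because only three numbers are kept. The same one-pass pattern applies to richer models, although the update rule is then specific to the model.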

Other challenges are concept drift (a change over time in the underlying distribution of the data), the emergence of new trends and concepts, and the repeated or cyclic recurrence of earlier concepts.
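To make the notion of drift concrete, the following is a deliberately naive sketch that compares the mean of a recent window with the mean of a reference window and raises an alarm when they differ by more than a threshold. It is not one of the established drift detectors; the function name detect_mean_shift, the window size and the threshold are all assumptions chosen for illustration, and it only reacts to shifts in the mean of a numeric stream.

```python
from collections import deque
from typing import Deque, Iterable, List


def detect_mean_shift(
    stream: Iterable[float],
    window: int = 50,
    threshold: float = 1.0,
) -> List[int]:
    """Return the stream positions at which a mean shift is flagged."""
    reference: List[float] = []                 # items describing the current concept
    recent: Deque[float] = deque(maxlen=window)  # sliding window of the latest items
    alarms: List[int] = []

    for i, x in enumerate(stream):
        # First learn the current concept from `window` items.
        if len(reference) < window:
            reference.append(x)
            continue

        recent.append(x)
        if len(recent) < window:
            continue

        ref_mean = sum(reference) / window
        recent_mean = sum(recent) / window
        if abs(recent_mean - ref_mean) > threshold:
            alarms.append(i)
            # Restart: learn the new concept from the items that follow.
            reference = []
            recent.clear()

    return alarms


# Synthetic stream (an assumption): the value jumps from 0 to 5 at position 200.
stream = [0.0] * 200 + [5.0] * 200
print(detect_mean_shift(stream))
```

Because the reference window is relearned after each alarm, a concept that recurs later is treated as new each time; recognising recurring concepts would require storing earlier models, which this sketch does not attempt.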