An improved input parameters-insensitive trajectory clustering algorithm.

Abstract: The existing trajectory clustering algorithm (TRACLUS) is sensitive to the input parameters ε and MinLns: a small change in a parameter value can produce entirely different clustering results. To address this weakness, a shielding parameters sensitivity trajectory cluster (SPSTC) algorithm is proposed that is insensitive to the input parameters. First, the core distance and reachable distance of a line segment are defined, and the algorithm generates a cluster ordering from these two distances. Second, the reachable plot of the line segment set is constructed from the cluster ordering and the reachable distance. Third, a parameterized sequence is extracted from the reachable plot, and the final trajectory clusters are obtained from this sequence. The parameterized sequence represents the inner cluster structure of the trajectory data. Experiments on real data sets and test data sets show that the SPSTC algorithm effectively reduces sensitivity to the input parameters while also producing trajectory clusters of better quality.
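
The first two steps of the abstract (core distance, reachable distance, cluster ordering) can be illustrated with a minimal OPTICS-style sketch. This is not the SPSTC implementation: the abstract does not give the paper's definitions, so the segment distance below (Euclidean distance between segment midpoints) is a stand-in assumption, and the names seg_dist, core_distance, and cluster_ordering are hypothetical. The extraction of the parameterized sequence is not sketched because the abstract does not describe it.

```python
import heapq
import math

def seg_dist(a, b):
    """Stand-in distance (assumption): Euclidean distance between segment midpoints."""
    (ax1, ay1), (ax2, ay2) = a
    (bx1, by1), (bx2, by2) = b
    return math.hypot((ax1 + ax2) / 2 - (bx1 + bx2) / 2,
                      (ay1 + ay2) / 2 - (by1 + by2) / 2)

def core_distance(i, segs, eps, min_lns):
    """Distance from segment i to its min_lns-th nearest segment (counting itself),
    or infinity if fewer than min_lns segments lie within eps."""
    near = sorted(d for d in (seg_dist(segs[i], s) for s in segs) if d <= eps)
    return near[min_lns - 1] if len(near) >= min_lns else math.inf

def cluster_ordering(segs, eps, min_lns):
    """Return (order, reach): the visiting order of the segments and the
    reachable distance assigned to each segment index."""
    n = len(segs)
    reach = [math.inf] * n
    processed = [False] * n
    order = []
    for start in range(n):
        if processed[start]:
            continue
        heap = [(math.inf, start)]           # seeds, smallest reachable distance first
        while heap:
            _, i = heapq.heappop(heap)
            if processed[i]:
                continue                     # stale heap entry
            processed[i] = True
            order.append(i)
            cd = core_distance(i, segs, eps, min_lns)
            if cd == math.inf:
                continue                     # not a core segment: expands nothing
            for j in range(n):
                if processed[j]:
                    continue
                d = seg_dist(segs[i], segs[j])
                if d > eps:
                    continue
                new_reach = max(cd, d)       # reachable distance of j with respect to i
                if new_reach < reach[j]:
                    reach[j] = new_reach
                    heapq.heappush(heap, (new_reach, j))
    return order, reach

# Two well-separated groups of three parallel segments each (made-up data).
segments = [((0, 0), (2, 0)), ((0, 1), (2, 1)), ((0, 2), (2, 2)),
            ((100, 100), (102, 100)), ((100, 101), (102, 101)), ((100, 102), (102, 102))]
order, reach = cluster_ordering(segments, eps=5.0, min_lns=2)
```

Listing reach in the sequence given by order yields the values of the reachable plot: low runs correspond to dense groups of segments, and a jump to a large value marks the start of a new group.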

Jiashun Chen

I am a visiting scholar in the data mining lab of UOL. I received an M.S. from China University of Geosciences and a Ph.D. from Nanjing University of Aeronautics and Astronautics (NUAA) in China. My research interests are in data mining.

Ensemble Framework for Missing Feature Problem in Data Stream Classification

Hanqing Hu, Mehmed Kantardzic

A dynamic data stream requires the classification framework to adapt to changes in the stream. A common adaptation strategy is to train new models or update existing models when changes occur. However, in real-world applications, some features of the data can be missing when new models are trained, for example because of faulty devices or interruptions in data transmission. New models trained with incomplete data may perform poorly, and if the models are never updated, performance may remain low even after the data stream is restored to its full feature set. To solve this missing feature problem, we propose the Ensemble Framework for Missing Feature (EFMF). The framework trains new models using the available features and then updates them once the data stream is restored. Experimentally, we show that our framework outperforms two naïve approaches: one that waits for all features before training new models, and one that trains with incomplete data and never updates the models afterwards.
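
The idea of training on the available features and updating once the stream is restored can be sketched as follows. This is one possible reading of the abstract, not the EFMF implementation: the nearest-centroid base learner, the use of None to mark a missing value, the majority vote, and the rule that a full-feature model replaces earlier partial-feature models are all assumptions, and every class and function name is hypothetical.

```python
from collections import Counter

class CentroidModel:
    """Tiny nearest-centroid classifier restricted to a subset of feature indices."""
    def __init__(self, features):
        self.features = tuple(features)
        self.centroids = {}

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(self.features))
            for k, f in enumerate(self.features):
                acc[k] += row[f]
        self.centroids = {c: [v / counts[c] for v in acc] for c, acc in sums.items()}
        return self

    def predict_one(self, row):
        def sq_dist(c):
            return sum((row[f] - m) ** 2 for f, m in zip(self.features, self.centroids[c]))
        return min(self.centroids, key=sq_dist)

def available_features(chunk):
    """Indices of features that have no missing value (None) anywhere in the chunk."""
    return [f for f in range(len(chunk[0])) if all(row[f] is not None for row in chunk)]

class MissingFeatureEnsemble:
    def __init__(self, n_features, max_models=5):
        self.n_features = n_features
        self.max_models = max_models
        self.models = []

    def train_on_chunk(self, X, y):
        feats = available_features(X)
        new_model = CentroidModel(feats).fit(X, y)
        if len(feats) == self.n_features:
            # Stream restored: drop models that were trained on incomplete data.
            self.models = [m for m in self.models if len(m.features) == self.n_features]
        self.models.append(new_model)
        self.models = self.models[-self.max_models:]   # keep only the newest models

    def predict_one(self, row):
        # Only models whose features are all present in this row may vote.
        votes = Counter(m.predict_one(row) for m in self.models
                        if all(row[f] is not None for f in m.features))
        return votes.most_common(1)[0][0] if votes else None

ens = MissingFeatureEnsemble(n_features=2)
# Chunk with feature 1 missing: a model is still trained, on feature 0 only.
ens.train_on_chunk([[0.0, None], [0.1, None], [1.0, None], [0.9, None]], [0, 0, 1, 1])
pred_partial = ens.predict_one([0.2, None])
# Stream restored: a full-feature model replaces the partial one.
ens.train_on_chunk([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 0.9]], [0, 0, 1, 1])
pred_full = ens.predict_one([0.8, 0.9])
```

The design point the sketch tries to capture is that the classifier never stops serving predictions: it neither waits for the missing features to return nor keeps relying on a partial-feature model after they do.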

Hanqing Hu

I’m a Ph.D. student in the data mining lab. My research area is stream mining.

Sliding Reservoir Approach for Delayed Labeling in Streaming Data Classification

Hanqing Hu, Mehmed Kantardzic

Abstract

When concept drift occurs within streaming data, a streaming data classification framework needs to update the learning model to maintain its performance. In real-world applications, the labeled samples required for training a new model are often not available immediately. This delay in labels can negatively impact the performance of traditional streaming data classification frameworks. To solve this problem, we propose the Sliding Reservoir Approach for Delayed Labeling (SRADL). By combining chunk-based semi-supervised learning with a novel approach to managing labeled data, SRADL does not need to wait for the labeling process to finish before updating the learning model. Experiments with two delayed-label scenarios show that SRADL improves prediction performance over the naïve approach by as much as 7.5% in certain cases. The largest gain occurs in the real-world data experiments, under an 18-chunk labeling delay with continuous label delivery.
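
The labeled-data management described above can be pictured with a small fixed-capacity buffer that accepts labels as they arrive. This is only an illustrative sketch, not SRADL itself: the abstract does not specify how the reservoir works, so the capacity rule (oldest labeled sample drops out first), the per-sample label delivery, and the names SlidingReservoir, observe, deliver_label, and training_set are assumptions. The semi-supervised learning component is not sketched.

```python
from collections import deque

class SlidingReservoir:
    """Fixed-capacity store of labeled samples that tolerates delayed labels."""
    def __init__(self, capacity):
        self.labeled = deque(maxlen=capacity)  # oldest labeled samples drop out first
        self.pending = {}                      # sample_id -> features awaiting a label

    def observe(self, sample_id, features):
        """Record an unlabeled sample as it arrives on the stream."""
        self.pending[sample_id] = features

    def deliver_label(self, sample_id, label):
        """Attach a late-arriving label; unknown ids are ignored."""
        features = self.pending.pop(sample_id, None)
        if features is not None:
            self.labeled.append((features, label))

    def training_set(self):
        """Labeled samples currently usable for a model update."""
        return list(self.labeled)

reservoir = SlidingReservoir(capacity=3)
for sample_id in range(5):
    reservoir.observe(sample_id, [float(sample_id)])
# Labels for the first four samples arrive later; sample 4 is still unlabeled.
for sample_id in range(4):
    reservoir.deliver_label(sample_id, sample_id % 2)
current = reservoir.training_set()
```

Because training_set can be read at any time, a model update can proceed with whatever labels have been delivered so far instead of waiting for an entire chunk to be labeled.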