A Bag-of-Features Framework for Time Series Classification
Mustafa Gokce Baydogan*, George Runger*, Eugene Tuv†
* Arizona State University   † Intel Corporation
INFORMS Annual Meeting 2011, Charlotte, 11/13/2011

Outline
- Time series classification: problem definition, literature review, motivation
- A bag-of-features framework for time series classification: approach and algorithm
- Computational experiments and results
- Conclusions and future work

Time Series Classification
A supervised learning problem aimed at labeling temporally structured univariate (or multivariate) sequences of fixed (or variable) length.

[Figure: example series from the Gun Point and OSU Leaf problems]

Literature Review
The algorithms proposed for time series classification can be divided into:
- Instance-based methods, which predict a test instance based on its similarity to the training instances. They require a similarity (distance) measure.
- Feature-based methods, which predict a test instance with a model built on feature vectors extracted from a set of instances. They require feature extraction methods and a prediction model built on the extracted features.

Literature Review: Instance-Based Methods
- Nearest-neighbor classifiers with Euclidean distance are fast but cannot handle shifts in time.
- Nearest-neighbor classifiers with dynamic time warping (DTW) are known to be a strong solution for time series problems in a variety of domains [1].
- Shapelets [2,3] find local patterns in a time series that are
highly predictive of a class, based on a distance measure.

[Figure: heraldic shields, an example shapelet application]

Literature Review: Feature-Based Methods
- Global features (such as the mean or variance of the series) are a compact representation, but they are often too rigid to represent time series.
- Local patterns define the class:
  - they may shift over time,
  - the relation between certain patterns may be important,
  - the cardinality of the local feature set may vary.
- Most approaches extract features over intervals and build a prediction model:
  - knots of a piecewise linear approximation of the series used as features [4],
  - feature extraction through a genetic algorithm (finding representative features), with an SVM classifying the extracted features [5],
  - neural networks [6],
  - decision trees [7].

Motivation
- Although dynamic time warping (DTW) provides strong results, it is not suitable for real-time applications [3].
- DTW has a time complexity of O(n^2), where n is the length of the time series; the complexity reduces to O(n) using a lower bound (LB_Keogh [8]).
- That complexity is for comparing only two series. The DTW distance of a test series must be computed against every training series (or at least some of them), which increases the computation time significantly. This is a disadvantage of instance-based classifiers.
- Feature-based approaches assume that patterns exist in the same time interval across instances, but a pattern that defines a certain class may exist anywhere in time.
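The pairwise DTW comparison underlying the instance-based baselines can be sketched as follows. This is a minimal illustration of the standard DTW recurrence with an optional warping-window constraint; the function name and the `window` parameter are assumptions for this sketch, not the implementation evaluated in the experiments.

```python
import numpy as np

def dtw_distance(x, y, window=None):
    """Dynamic time warping distance between 1-D series x and y.

    `window` is an optional Sakoe-Chiba-style band half-width that
    constrains the warping path (as a learned window would in
    NNBestDTW); window=None imposes no constraint (as in NNDTWNoWin).
    """
    n, m = len(x), len(y)
    w = max(window, abs(n - m)) if window is not None else max(n, m)
    # D[i, j] = cost of the best warping path aligning x[:i] with y[:j]
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return np.sqrt(D[n, m])
```

For two series that differ only by a small time shift, this distance is zero while the Euclidean distance is not, which is exactly the shift-invariance that plain nearest-neighbor with Euclidean distance lacks. The O(n^2)-size table D is also where the quadratic cost per comparison comes from.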
Each time series is standardized: x_t <- (x_t - mean(x)) / std(x).
- DTW attempts to compensate for possible time translations between features, but with long time series, relatively short features of interest, and moderate noise, the capability of DTW is degraded.
- The discontinuity of certain patterns in time is another problem that affects the performance of DTW.

A Bag-of-Features Framework for Time Series Classification
Bag of features is also referred to as:
- bag of words in document analysis,
- bag of instances in multiple instance learning (MIL),
- bag of frames in audio and speech recognition.
The framework has three main steps:
1. local feature extraction,
2. codebook generation (similarity-based, unsupervised, or supervised),
3. classification from the codebook.

Local Feature Extraction
- A subsequence s is generated randomly. A lower bound on the subsequence length is set as a factor of the length of the time series (M).
- After partitioning the subsequence into intervals, interval features are extracted (mean, variance, slope).
- The number of intervals (d) must be the same for all subsequences (the cardinality of the feature set must be the same); since the subsequence lengths are random, random-length intervals are used.
- We set a minimum interval length so that the extracted features are valid
(e.g., the mean value of a single time point is not meaningful).
- The parameters of this step are the number of intervals used to partition a subsequence, the number of subsequences generated for each series, and the maximum number of intervals.

Codebook Generation
- After local feature extraction, a new dataset is generated in which each subsequence from each time series becomes an instance; the class label of each instance is the class of the corresponding time series.
- We use a classifier that generates a class probability estimate for each instance (subsequence). The estimate provides information on the strength of an assignment.
- The codebook consists of:
  - the frequency of predicted subsequence classes for each time series, and
  - a histogram of the class probability estimates for each class; the number of equal-probability bins used to generate it is a parameter (b).

Classification
- Global features can supplement the codebook.
- A classifier is built on the codebook and the global features.

TSBF
- Random Forest (RF) [9] is an ensemble of N decision trees.
- Each tree is constructed using a different bootstrap sample from the original data. About one-third of the cases are left out of the bootstrap sample and not used in the construction of that tree (the out-of-bag (OOB) samples).
- At each node of each tree, RF considers the best split over only a random sample of features.
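The random-subsequence feature extraction described earlier can be sketched as follows. This is a minimal illustration only: all names and default parameter values are assumptions, the intervals here are split equally rather than with random lengths, and TSBF itself is implemented in R and C.

```python
import numpy as np

rng = np.random.default_rng(0)

def interval_features(seg):
    """Mean, variance, and least-squares slope of one interval."""
    t = np.arange(len(seg))
    slope = np.polyfit(t, seg, 1)[0] if len(seg) > 1 else 0.0
    return [seg.mean(), seg.var(), slope]

def extract_subsequence_features(x, n_sub=10, d=4,
                                 min_len_frac=0.25, min_interval=2):
    """Draw random subsequences of a standardized series x and split
    each into d intervals, yielding one 3*d feature vector per
    subsequence (mean, variance, slope per interval)."""
    M = len(x)
    # Lower bound on subsequence length: a factor of M, and at least
    # d intervals of the minimum valid interval length.
    min_len = max(d * min_interval, int(min_len_frac * M))
    rows = []
    for _ in range(n_sub):
        length = rng.integers(min_len, M + 1)
        start = rng.integers(0, M - length + 1)
        sub = x[start:start + length]
        feats = []
        for part in np.array_split(sub, d):  # equal split in this sketch
            feats.extend(interval_features(part))
        rows.append(feats)
    return np.array(rows)  # shape (n_sub, 3*d), one row per subsequence
```

Each row of the returned matrix is one instance of the subsequence-level dataset, labeled with the class of its source series. Classifying these rows with a random forest and histogramming, per series, the predicted labels and the class probability estimates (with b equal-probability bins) then yields the codebook on which the final classifier is trained.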
The random selection reduces the variance of the classifier and also reduces the computational complexity.
- Random Forest is used as the classifier in our study because it:
  - is fast (a parallel implementation is even faster),
  - is inherently multiclass (unlike, e.g., SVM),
  - can handle missing values, and
  - provides class probability estimates based on the OOB samples; the estimates computed from OOB predictions are good estimates of the generalization error.
- Other classifiers that generate similar information may have problems with overfitting.

Computational Experiments and Results
- We use 20 datasets from the UCR time series database.
- Some of the parameters (the number of bins and the number of trees) are fixed based on the OOB error rates, i.e., based on the training data only, not after seeing results on the test data.
- TSBF is implemented in both R and C.
- We compare our algorithm to nearest-neighbor (NN) classifiers with DTW. Two versions are considered:
  - NNBestDTW searches for the best warping window based on the training data, then uses the learned window on the test data.
  - NNDTWNoWin has no warping window.
NNBestDTW is known to be a strong solution for time series classification.
- Because of its random nature, we replicate TSBF ten times and report the average error rate on the test data over the ten replications.
- Timings were measured on a Windows 7 system with 8 GB RAM and a dual-core CPU (i7-3620M, 2.7 GHz).

Conclusion and Future Work
We present an approach that:
- can handle local information better,
- is faster than instance-based classifiers,
- allows for the integration of local information,
- performs better than competitive methods, and
- provides insight about the problem.
Future work:
- knowledge extraction from random forests (e.g., via RF variable importance measures) to find patterns that are unique to each class,
- modification of the local feature extraction for series of variable length (the current approach works on series of the same length),
- extension to multivariate time series classification problems (straightforward).

Thanks, Questions?
This research was partially supported by ONR grant N00014-09-1-0656.
The code of TSBF and the datasets are provided at http://www.mustafabaydogan.com/research/time-series-classification.html
The paper will be available soon after submission.
References
