Welcome to the UCR Time Series Classification/Clustering Page

Eamonn Keogh, Xiaopeng Xi, Li Wei, and Chotirat (Ann) Ratanamahatana 

                   

This data resource was funded by NSF Career Award 0237918 from 2003 to 2008, and continues to be funded through NSF awards 0803410 and 0808770. Partial funding was also made available by a gift from ISCA Technologies.

This webpage has been created as a public service to the data mining/machine learning community, to encourage reproducible research for time series classification and clustering. 

Note that the data here is useful for testing classification/clustering and the accuracy of indexing techniques. However, the datasets are too small to support claims about the efficiency of indexing. For this, email Dr. Keogh requesting a free CD-ROM of larger datasets. If you want datasets to test anomaly detection algorithms, many such datasets are here. A comparison of the results below with classic machine learning algorithms is here, thanks to Tony Bagnall and to Weka for this.

 Classification:

Name | First paper | Number of classes | Size of training set | Size of testing set | Time series length | 1-NN Euclidean Distance | 1-NN Best Warping Window DTW (r) | 1-NN DTW, no Warping Window
Synthetic Control | Pham | 6 | 300 train | 300 test | 60 | 0.12 | 0.017 (6) | 0.007
Gun-Point | Ratanamahatana | 2 | 50 | 150 | 150 | 0.087 | 0.087 (0) | 0.093
CBF |  | 3 | 30 | 900 | 128 | 0.148 | 0.004 (11) | 0.003
Face (all) | Xi | 14 | 560 | 1,690 | 131 | 0.286 | 0.192 (3) | 0.192
OSU Leaf | Gandhi | 6 | 200 | 242 | 427 | 0.483 | 0.384 (7) | 0.409
Swedish Leaf | Soderkvist | 15 | 500 | 625 | 128 | 0.213 | 0.157 (2) | 0.210
50Words | Rath | 50 | 450 | 455 | 270 | 0.369 | 0.242 (6) | 0.310
Trace | Roverso | 4 | 100 | 100 | 275 | 0.24 | 0.01 (3) | 0.0
Two Patterns | Geurts | 4 | 1,000 | 4,000 | 128 | 0.09 | 0.0015 (4) | 0.0
Wafer | Olszewski | 2 | 1,000 | 6,174 | 152 | 0.005 | 0.005 (1) | 0.020
Face (four) | Ratanamahatana | 4 | 24 | 88 | 350 | 0.216 | 0.114 (2) | 0.170
Lightning-2 | Eads | 2 | 60 | 61 | 637 | 0.246 | 0.131 (6) | 0.131
Lightning-7 | Eads | 7 | 70 | 73 | 319 | 0.425 | 0.288 (5) | 0.274
ECG | Olszewski | 2 | 100 | 100 | 96 | 0.12 | 0.12 (0) | 0.23
Adiac | Jalba | 37 | 390 | 391 | 176 | 0.389 | 0.391 (3) | 0.396
Yoga | Xi | 2 | 300 | 3,000 | 426 | 0.170 | 0.155 (2) | 0.164
Fish (readme) | Lee | 7 | 175 | 175 | 463 | 0.217 | 0.160 (4) | 0.167
Plane | readme | 7 | 105 | 105 | 144 | 0.038 | 0.0 (5) | 0
Car | readme | 4 | 60 | 60 | 577 | 0.267 | 0.233 (1) | 0.267
Beef | Tony Bagnall |  | 30 | 30 | 470 | 0.467 | 0.467 | 0.5
Coffee | Tony Bagnall | 2 | 28 | 28 | 286 | 0.25 | 0.179 | 0.179
OliveOil | Tony Bagnall | 4 | 30 | 30 | 570 | 0.133 | 0.167 | 0.133
Please donate data!

How to get the datasets:

The Synthetic Control datasets are available above, and the code to reproduce the 1-NN Euclidean Distance result is available below. For the rest of the data, read on.

In order to get the password to the data, please carefully read the points below.

1)      Do not share the password or datasets with others (exception: co-authors on the current paper).

2)      If you modify the data in any way (add noise, add warping, etc.), please give the modified data back to the archive before you submit your paper (that way a diligent reviewer can test your claims while the paper is under review).

3)      Where possible, we strongly advocate testing and publishing results on all datasets (to avoid cherry-picking), unless of course you are making an explicit claim for only a certain type of data (e.g., classifying short time series). In the event you don't have space in your paper, we suggest you create an extended tech report online and point to it.

4)      If you have additional datasets, we ask that you donate them to the archive in our simple format.

5)      We strongly encourage you to make only statistically significant claims about the relative performances of algorithms/distance measures. Consider the results on the Synthetic Control dataset. It would be tempting to say that unconstrained warping beats constrained warping. But unconstrained warping gets 2 wrong and constrained warping gets 5 wrong out of 300. This is not statistically significant evidence that one is better (in fact, you can show this by doing different random shuffles of the data and getting the opposite result). In contrast, either measure is better than Euclidean distance (which gets 36 wrong) using a two-tailed, paired t-test with a p-value = 0.01. We strongly advocate reading On Comparing Classifiers by Salzberg.

6)      When you write your paper, please make reproducibility your goal. In particular, explicitly state all parameters. A good guiding principle is to ask yourself "Could a smart grad student get the exact same results as claimed in this paper with a day's effort?" If the answer is no, we believe that something is wrong. Help the imaginary grad student by rewriting your paper.

7)  Where possible, make your code available (as we have done below).

8)  If you are advocating a new distance/similarity measure, we strongly recommend you test and report the 1-NN accuracy (as above). Note that this does not preclude the addition of other tests; however, the 1-NN test has the advantage of having no parameters and allowing comparisons between methods.

9)     Please reference the datasets in your paper as Keogh, E., Xi, X., Wei, L. & Ratanamahatana, C. A. (2006). The UCR Time Series Classification/Clustering Homepage: www.cs.ucr.edu/~eamonn/time_series_data/
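As a rough illustration of point 5, the error counts quoted there can be pushed through a quick significance check. This is only a sketch: it uses an unpaired two-proportion z-test, which is a swapped-in approximation and not the paired t-test mentioned above (that test needs the per-instance predictions, which are not reproduced here). All variable names are our own.

```matlab
% Error counts on Synthetic Control, taken from point 5 above.
n_test            = 300;   % size of the test set
wrong_euclidean   = 36;    % 1-NN Euclidean distance
wrong_constrained = 5;     % 1-NN DTW with the best warping window
wrong_free        = 2;     % 1-NN DTW with no warping window

% Pooled error rate, z-score of the difference between two error rates,
% and the two-sided normal tail probability (computed with erfc).
pooled  = @(a, b) (a + b) / (2 * n_test);
z_score = @(a, b) (abs(a - b) / n_test) / ...
                  sqrt(pooled(a, b) * (1 - pooled(a, b)) * 2 / n_test);
p_value = @(a, b) erfc(z_score(a, b) / sqrt(2));

p_dtw_variants  = p_value(wrong_constrained, wrong_free);
p_euclid_vs_dtw = p_value(wrong_euclidean, wrong_constrained);

fprintf('constrained vs unconstrained DTW: p = %.3g\n', p_dtw_variants);
fprintf('Euclidean vs constrained DTW:     p = %.3g\n', p_euclid_vs_dtw);
```

Under these assumptions the first p-value comes out well above 0.05 and the second well below 0.01, which agrees with the conclusion in point 5. With counts as small as 2 and 5 the normal approximation is crude, so treat this as a sanity check only.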

After reading the above, copy the text of either A or B below, sign it with your full name, and email it to Dr. Keogh. If you are a grad student/post-doc, you must discuss this with your adviser first and CC him/her when requesting the password.

 A) I have read the points above, and agree to all of them. Please send me the password.

 B) I have read the points above, but I do not agree to all of them. In particular, I do NOT agree with... (please enumerate). Nevertheless, I want the data.  Please send me the password.

 

Code:

Here  is the code used to create the results shown in the table above (in Matlab). Note that the training step is completely separated from the testing step. In particular the classification algorithm can only "see" the training data, the training data labels, and one unlabeled test instance at a time. 

If you want to compare a new distance measure with the results above, all you need to do is to change one line of code in the Classification_Algorithm function! 

Note that this code is optimized for simplicity, not speed! Please do not report timing results using this code. Euclidean Distance can be sped up using branch and bound, and DTW can be significantly sped up using LB_Keogh.
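As a side note on the LB_Keogh speed-up just mentioned, here is a minimal sketch of the lower bound for two equal-length row vectors. The function name lb_keogh and the parameter r (band half-width, in samples) are our own choices for illustration; the bound is stated for a DTW that sums squared point differences inside a band of half-width r and takes the square root at the end.

```matlab
function lb = lb_keogh(query, candidate, r)
% Lower bound on band-constrained DTW between equal-length row vectors.
% r is the band half-width in samples (hypothetical parameter name).
n = length(query);
lb_squared = 0;
for i = 1 : n
    lo = max(1, i - r);
    hi = min(n, i + r);
    upper = max(query(lo:hi));   % upper envelope of the query at position i
    lower = min(query(lo:hi));   % lower envelope of the query at position i
    if candidate(i) > upper
        lb_squared = lb_squared + (candidate(i) - upper)^2;
    elseif candidate(i) < lower
        lb_squared = lb_squared + (candidate(i) - lower)^2;
    end
end
lb = sqrt(lb_squared);
```

In a 1-NN loop one would compute lb_keogh first and only run the full DTW when the bound is smaller than best_so_far. With r = 0 the envelope collapses onto the query and the bound equals the Euclidean distance.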

function UCR_time_series_test %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% (C) Eamonn Keogh %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
TRAIN = load('synthetic_control_TRAIN'); % Only these two lines need to be changed to test a different dataset. %
TEST  = load('synthetic_control_TEST' ); % Only these two lines need to be changed to test a different dataset. %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


TRAIN_class_labels = TRAIN(:,1);     % Pull out the class labels.
TRAIN(:,1) = [];                     % Remove class labels from training set.
TEST_class_labels = TEST(:,1);       % Pull out the class labels.
TEST(:,1) = [];                      % Remove class labels from testing set.

correct = 0; % Initialize the number we got correct

for i = 1 : length(TEST_class_labels) % Loop over every instance in the test set
    classify_this_object = TEST(i,:);
    this_objects_actual_class = TEST_class_labels(i);
    predicted_class = Classification_Algorithm(TRAIN, TRAIN_class_labels, classify_this_object);
    if predicted_class == this_objects_actual_class
        correct = correct + 1;
    end
    disp([int2str(i), ' out of ', int2str(length(TEST_class_labels)), ' done']) % Report progress
end

%%%%%%%%%%%%%%%%% Create Report %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
disp(['The dataset you tested has ', int2str(length(unique(TRAIN_class_labels))), ' classes'])
disp(['The training set is of size ', int2str(size(TRAIN,1)),', and the test set is of size ',int2str(size(TEST,1)),'.'])
disp(['The time series are of length ', int2str(size(TRAIN,2))])
disp(['The error rate was ',num2str((length(TEST_class_labels)-correct )/length(TEST_class_labels))])
%%%%%%%%%%%%%%%%% End Report %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Here is a sample classification algorithm, it is the simple (yet very competitive) one-nearest
% neighbor using the Euclidean distance.
% If you are advocating a new distance measure you just need to change the line marked "Euclidean distance"
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function predicted_class = Classification_Algorithm(TRAIN,TRAIN_class_labels,unknown_object)
best_so_far = inf;
for i = 1 : length(TRAIN_class_labels)
    compare_to_this_object = TRAIN(i,:);
    distance = sqrt(sum((compare_to_this_object - unknown_object).^2)); % Euclidean distance
    if distance < best_so_far
        predicted_class = TRAIN_class_labels(i);
        best_so_far = distance;
    end
end

>> UCR_time_series_test

1 out of 300 done
2 out of 300 done
...
298 out of 300 done
299 out of 300 done
300 out of 300 done
The dataset you tested has 6 classes
The training set is of size 300, and the test set is of size 300.
The time series are of length 60
The error rate was 0.12
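To make the "change one line" remark concrete, here is a sketch of a DTW distance with a Sakoe-Chiba warping window that could stand in for the line marked "Euclidean distance". It is not the code that produced the DTW columns in the table. The function name dtw_distance and the parameter r are hypothetical, and r is taken in samples; whether the (r) values in the table are samples or a percentage of the series length is not stated here.

```matlab
function d = dtw_distance(a, b, r)
% DTW distance between row vectors a and b inside a Sakoe-Chiba band.
% r is the band half-width in samples; pass r = inf for no warping window.
n = length(a);
m = length(b);
r = max(r, abs(n - m));        % the band must at least cover the length difference
D = inf(n + 1, m + 1);         % D(i+1,j+1): cheapest alignment of a(1:i) with b(1:j)
D(1, 1) = 0;
for i = 1 : n
    for j = max(1, i - r) : min(m, i + r)
        cost = (a(i) - b(j))^2;
        % Extend the cheapest of the diagonal, vertical and horizontal moves.
        D(i+1, j+1) = cost + min([D(i, j), D(i, j+1), D(i+1, j)]);
    end
end
d = sqrt(D(n+1, m+1));
```

With this file on the path, the marked line would become something like distance = dtw_distance(compare_to_this_object, unknown_object, r); with r set beforehand. Setting r = 0 on equal-length series reproduces the Euclidean distance, which gives a quick way to check the change against the 0.12 error rate above.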

 


Clustering:

Name | First paper | Number of classes | Size of dataset | Time series length | Euclidean Distance K-Means, Best of 10 Runs
Synthetic Control | Pham | 6 | 300+300 | 60 | 0.679
Gun-Point | Ratanamahatana | 2 | 50+150 | 150 | 0.500
CBF |  | 3 | 30+900 | 128 | 0.626
Face (all) | Xi | 14 | 560+1,690 | 131 | 0.36
OSU Leaf | Gandhi | 6 | 200+242 | 427 | 0.378
Swedish Leaf | Soderkvist | 15 | 500+625 | 128 | 0.406
50Words | Rath | 50 | 450+455 | 270 | 0.420
Trace | Roverso | 4 | 100+100 | 275 | 0.485
Two Patterns | Geurts | 4 | 1,000+4,000 | 128 | 0.322
Wafer | Olszewski | 2 | 1,000+6,174 | 152 | 0.625
Face (four) | Ratanamahatana | 4 | 24+88 | 350 | 0.669
Lightning-2 | Eads | 2 | 60+61 | 637 | 0.611
Lightning-7 | Eads | 7 | 70+73 | 319 | 0.484
ECG | Olszewski | 2 | 100+100 | 96 | 0.698
Adiac | Jalba | 37 | 390+391 | 176 | 0.384
Yoga | Xi | 2 | 300+3,000 | 426 | 0.517
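For readers who want to try something similar to the "K-Means, Best of 10 Runs" column, here is a minimal sketch of Euclidean k-means with random restarts. Two things are assumptions on our part: the page does not say how the best run was chosen (this sketch keeps the run with the lowest within-cluster sum of squared errors), and it does not say which score the column reports (so no score is computed here). The function name kmeans_best_of is hypothetical.

```matlab
function [best_assign, best_centers, best_sse] = kmeans_best_of(X, k, n_runs)
% Lloyd's k-means on the rows of X, restarted n_runs times.
% Keeps the run with the lowest within-cluster sum of squared errors.
n = size(X, 1);
best_sse = inf;
for run = 1 : n_runs
    seeds = randperm(n);
    centers = X(seeds(1:k), :);            % k distinct rows as initial centers
    assign = zeros(n, 1);
    for iter = 1 : 100                     % iteration cap, chosen arbitrarily
        dists = zeros(n, k);
        for c = 1 : k
            delta = X - repmat(centers(c, :), n, 1);
            dists(:, c) = sum(delta.^2, 2);
        end
        [min_d, new_assign] = min(dists, [], 2);
        if isequal(new_assign, assign)
            break;                         % assignments stable: converged
        end
        assign = new_assign;
        for c = 1 : k
            if any(assign == c)            % leave an empty cluster's center as is
                centers(c, :) = mean(X(assign == c, :), 1);
            end
        end
    end
    sse = sum(min_d);
    if sse < best_sse
        best_sse = sse;
        best_assign = assign;
        best_centers = centers;
    end
end
```

To mirror the "Size of dataset" column, X would be the training and test series stacked together with the class labels removed, and k the number of classes, for example kmeans_best_of([TRAIN; TEST], 6, 10) for Synthetic Control.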


Notes:

  1. Here "First paper" means first paper to use this data, but not (necessarily) using these training/test splits. In addition the data here may have been resampled, normalized or processed in other ways. 
  2. This approach, where we find the best width of the Sakoe-Chiba Band by a search over the training set, is a special case of Ratanamahatana-Keogh Band classification where the threshold is the length of the time series. See Ratanamahatana, C. A. and Keogh, E. (2004). Making Time-series Classification More Accurate Using Learned Constraints, and Ratanamahatana, C. A. and Keogh, E. (2004). Everything you know about Dynamic Time Warping is Wrong.
  3. D.T. Pham and A.B. Chan. (1998). Control Chart Pattern Recognition using a New Type of Self Organizing Neural Network.
  4. A.C. Jalba, M.H.F. Wilkinson, J.B.T.M. Roerdink, M.M. Bayer and S. Juggins. Automatic Diatom Identification using Contour Analysis by Morphological Curvature Scale Spaces.
  5. D. Eads, D. Hill, S. Davis, S. Perkins, J. Ma, R. Porter, and J. Theiler. "Genetic Algorithms and Support Vector Machines for Time Series Classification." Proc. SPIE 4787. pp. 74-85. July, 2002
  6. Robert T. Olszewski. Generalized Feature Extraction for Structural Pattern Recognition in Time-Series Data. PhD thesis, Carnegie Mellon University, Pittsburgh, PA, 2001.
  7. Geurts, P. (2002). Contributions to decision tree induction: bias/variance tradeoff and time series classification. PhD thesis, Department of Electrical Engineering, University of Liege, Belgium.
  8. Oskar J. O. Soderkvist. Computer Vision Classification of Leaves from Swedish Trees, Master thesis, Linkoping University, Sweden, 2001.
  9. Ashit Gandhi. Content-Based Image Retrieval: Plant Species Identification. Master thesis, Oregon State University, September, 2002.
  10. Rath, T. & Manmatha, R. (2003). Word Image matching using dynamic time warping. CVPR, Vol. II, pp.521-527.
  11. Roverso, D. (2000). Multivariate temporal classification by windowed wavelet decomposition and recurrent neural networks. In 3rd ANS International Topical Meeting on Nuclear Plant Instrumentation, Control and Human-Machine Interface, 2000.
  12. Xiaopeng Xi, Eamonn Keogh, Christian Shelton, Li Wei & Chotirat Ann Ratanamahatana (2006). Fast Time Series Classification Using Numerosity Reduction. ICML 2006.
  13. Cherry picking, literally meaning harvesting cherries, is used metaphorically to accuse someone of pointing at individual cases which seem to confirm his or her position, while ignoring a significant portion of related cases that may contradict it.
  14. Salzberg S. On comparing classifiers: Pitfalls to avoid and a recommended approach. Data Mining and Knowledge Discovery 1997; 1:317--327
  15. Keogh, E. &  Kasetty, S. (2002). On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration. In the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. July 23 - 26, 2002. Edmonton, Alberta, Canada. pp 102-111.
  16. D.J. Lee, R. Schoenberger, D. Shiozawa, X. Xu, and P. Zhan, Contour Matching for a Fish Recognition and Migration Monitoring System. SPIE Optics East, Two and Three-Dimensional Vision Systems for Inspection, Control, and Metrology II, vol. 5606-05, p. 37-48, Philadelphia, PA, USA, October 25-28, 2004.
  17. The fish in question are mostly close to horizontal, facing left as they swam past the camera. However, a rotation-invariant version of the distance measures does give better accuracy; see LB_Keogh Supports Exact Indexing of Shapes under Rotation Invariance with Arbitrary Representations and Distance Measures. VLDB 2006.
  18. Note for Car and Plane: The donor of these two datasets asked me not to make them available until his/her journal paper is accepted. As soon as I get word, I will put this data online.
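Note 2 describes picking the warping window by a search over the training set. A minimal sketch of one way to do that follows. It assumes leave-one-out 1-NN as the evaluation and takes window widths in samples; neither detail is specified on this page. The names find_best_window, banded_dtw and max_r are hypothetical.

```matlab
function best_r = find_best_window(TRAIN, TRAIN_class_labels, max_r)
% Leave-one-out 1-NN search over Sakoe-Chiba band half-widths 0..max_r.
% TRAIN holds one series per row, with the class labels already removed.
n = size(TRAIN, 1);
best_r = 0;
fewest_errors = inf;
for r = 0 : max_r
    errors = 0;
    for i = 1 : n
        best_so_far = inf;
        predicted = NaN;
        for j = 1 : n
            if j == i
                continue;                  % hold out instance i
            end
            d = banded_dtw(TRAIN(i,:), TRAIN(j,:), r);
            if d < best_so_far
                best_so_far = d;
                predicted = TRAIN_class_labels(j);
            end
        end
        errors = errors + (predicted ~= TRAIN_class_labels(i));
    end
    if errors < fewest_errors              % strict '<' keeps the smallest r on ties
        fewest_errors = errors;
        best_r = r;
    end
end

function d = banded_dtw(a, b, r)
% DTW between row vectors a and b restricted to |i - j| <= r.
n = length(a);
m = length(b);
r = max(r, abs(n - m));
D = inf(n + 1, m + 1);
D(1, 1) = 0;
for i = 1 : n
    for j = max(1, i - r) : min(m, i + r)
        D(i+1, j+1) = (a(i) - b(j))^2 + min([D(i, j), D(i, j+1), D(i+1, j)]);
    end
end
d = sqrt(D(n+1, m+1));
```

Only the training data and its labels are touched, in keeping with the separation of training and testing described in the Code section. The search costs on the order of max_r times the square of the training set size in DTW computations, so it is slow on the larger datasets.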

 

People that have Downloaded the Data:

  • Shiyuan Liu, Li Lv.
  • Thiago Santos Quirino, Mei-Ling Shyu University of Miami.
  • Pierre-François Marteau
  • Cosmin Bocaniala, Lancaster University.
  • Yueguo Chen,Anthony K.H. Tung, Beng Chin Ooi, National University of Singapore.
  • Vernon Rego, Vernon Rego. Purdue University.
  • Mislav Malenica, Tomislav Smuc
  • Man Hon WONG, ZHOU Mi, The Chinese University of Hong Kong.
  • Guillaume Bouchard Xerox Research Centre Europe.
  • Hoa Vo and David Joslin Seattle University.
  • Carlotta Orsenigo , University degli Studi di Milano.
  • Dr. Paolo Ciaccia
  • Xiaoqing Weng, Jiaotong University.
  • Longin Jan Latecki and Qiang Wang, Temple University.
  • Tony Bagnall
  • Michail Vlachos, IBM.
  • Bernard Hugueney
  • Sourav Mukherjee
  • Marcos M. Campos (Oracle)
  • Ludmila I. Kuncheva
  • Edward Omiecinski and Jun Li
  • Victor Eruhimov, Intel
  • Rob Jasper
  • Andre Coelho
  • Gernot Herbst
  • Vit Niennattrakul
  • Nozomi Matsuda
  • Flavio Miguel Varejao and Idilio Drago
  • HAORIANTO COKROWIJOYO TJIOE
  • Molnar Miklos
  • Steinn Gudmundsson and Thomas Runarsson
  • Niall Adams and Sai Wing Man
  • Isabel Maria Marques da Silva, Maria Eduarda da Rocha Pinto Augusto da Silva and Joaquim Fernando Pinto da Costa
  • Panagiotis Papapetrou and George Kollios
  • Huang Tan
  • Sergio Guadarrama
  • Alicia Troncoso Lora
  • Pyry Avist
  • Peng-Yi Lai
  • Yong Fu
  • Soheil Bahrampour
  • Long Yao and Meng Bo
  • Robert Moskovitch and Yuval Shahar
  • Abdellali Kelil
  • Puspadevi Kuppusamy
  • Qun Dai and Songcan Chen
  • Lisa Gralewski
  • Maria Teresinha Arns Steiner and Rosangela Villwock
  • Amir Ahmad and Galvin Brown
  • Abhijit Jayant Kulkarni
  • Xingquan (Hill) Zhu
  • Amol Deshpande and Qiang Qiu
  • Vercellis Carlo and Gianni Alberti
  • Pamela Nerina Llop
  • Tobias Scheffer
  • Jochen Fischer
  • Mao Ye and Yingying Zhu
  • Cintia Lera
  • George Runger and Rohit Das
  • Andre Rodrigo Sanches and Nina Sumiko Tomita Hirata
  • Claudio Piciarelli and Gian Luca Foresti
  • Wei T. Yue 
  • Michael Botsch and Josef A. Nossek
  • Bingyu Sun
  • Sun, Fu-Shing
  • Babak Amiri
  • Xing ChunXiao and Du Xutao, Tsinghua University
  • Elloumi Samir , Sondess Bentekaya
  • Li Shijin
  • Erik Learned-Miller, Marwan A. Mattar
  • Chiranjib Bhattacharya,Karthik K
  • Nicandro Cruz Ramirez
  • Jiankui Guo Fudan University
  • Bin Z Zhang IBM
  • Yi-Dong Shen and Zhiyong Shen
  • Georgios Evangelidis, Leonidas Karamitopoulos
  • Hendrik Purwins
  • Jignesh M. Patel and Michael Morse
  • Gert Van Dijck and Marc Van Hulle
  • Chao Hui Lee and Vincent Tseng
  • Linh Tran (Boeing)
  • Hugo Alonso Vilares Monteiro and Joaquim Fernando Pinto da Costa
  • David Minnen
  • Tsuyoshi Mikami
  • Qiang Yang and Sinno Pan
  • Paolo Tormene
  • Hui Ding and Peter Scheuermann
  • Ronaldo Cristiano Prati
  • Christian Gruber and Bernhard Sick
  • Silvia Chiappa
  • Ankur Jain
  • Maria Cristina Ferreira de Oliveira and Aretha Barbosa Alencar
  • Febri Andriani
  • Myeong-Seon Gi
  • Pengtao jia
  • Farid Seifi.
  • Clodoaldo Aparecido de Moraes Lima
  • Konstantinos Blekas
  • Inderjit Dhillon and HYUK CHO
  • Aida Valls
  • Narayanan Chatapuram Krishnan and Sethuraman Panchanathan
  • Stephan Günnemann and Thomas Seidl
  • Thirumaran Ekambaram and M. Narasimha Murty
  • Christine Preisach and Lars Schmidt-Thieme
  • Dino Isa and Rajprasad Kumar
  • Feibao Zhuo
  • Frans van den Bergh
  • Kfir Glik
  • Xiao Yu
  • Yingying Zhu
  • Rosanna Verde and Antonio Balzanella
  • Paul Baggenstoss
  • Koichi ASAKURA and Wei Fan
  • Lucia Sacchi and Iyad Batal
  • Morné Neser
  • Luca Chiaravalloti
  • Tetsuya Nakamura
  • Daniel Alejandro Garcia Lopez
  • Richard J. Povinelli
  • Emmanuel Viennet
  • Remi Gaudin, Nicolas Nicoloyannis
  • Phil Gross
  • Francois Portet
  • Qi He , Dr. Kuiyu Chang, Dr. Ee-Peng Lim
  • Fabio Antonio Pereira Reis
  • Damien Tessier
  • Cuvelier Etienne
  • Moataz El Ayadi & Mohamed S. Kamel
  • Hillol Kargupta
  • Andrey Ustyuzhanin
  • Hussam Alshraideh
  • John M. Trenkle
  • Boris Martinez Jimenez & Francisco Herrera Fernandez
  • Juan Jose Rodriguez
  • Zoltan Banko
  • Anjan Goswami & Debashis Mondal
  • Simon Kagwi Mwangi
  • Simone Fontolan and Alessandro Garghetti
  • Lin Zhang and Joe Song
  • Zhenzhi Huang and Zhenfeng He
  • Waleed Kadous
  • Hao Hu and Qiang Yang
  • Kay Robbins and Dragana Veljkovic
  • Daniel Pena, Regina Kaiser, Ana Laura Badagiani
  • Daniel Graves and Witold Pedrycz
  • Anne Denton
  • Julia Hunter and Martin Colley
  • Lucia Sacchi
  • Bernhard Seeger, Michael Grau.
  • John Maindonald
  • Balazs Torma
  • Nikolaos Chatzis
  • Daniel Smith
  • Abdul Razak, Khairuddin Omar
  • Elwin (Yong) lee
  • Alex Smola and Xinhua Zhang
  • Rudolf Kruse and Christian Moewes
  • Pekka Siirtola
  • Michael Berthold
  • David Bong and James Tan
  • Zhengzheng (Crystal) Xing and Jian Pei
  • Leticia Arco Garcia and Rafael Bello
  • Nuno Constantino Castro and Paulo Azeved
  • Ng Kam Swee
  • Antonio Irpino
  • Jong Myoung Ko
  • Jonas Richiardi
  • Dhaval Patel, Wynne Hsu and Lee Mong Lee.
  • Ville Hautamaki
  • Peter Sunehag
  • Richard Clements