Welcome to the UCR Time Series Classification/Clustering Page
Eamonn Keogh, Xiaopeng Xi, Li Wei, and Chotirat (Ann) Ratanamahatana

This data resource was funded by NSF Career Award 0237918 from 2003 to 2008, and continues to be funded through NSF awards 0803410 and 0808770. Partial funding was also made available by a gift from ISCA Technologies.
This webpage has been created as a public service to the data mining/machine learning community, to encourage reproducible research for time series classification and clustering.
Note that the data here is useful for testing classification/clustering, and the accuracy of indexing techniques. However, the datasets are too small to make claims about the efficiency of indexing. For this, email Dr. Keogh requesting a free CD-ROM of larger datasets.
If you want datasets to test anomaly detection algorithms, many such datasets are here. A comparison of the results below with classic machine learning algorithms is here, thanks to Tony Bagnall and to Weka for this.
Classification:
| Name | First paper | Number of classes | Size of training set | Size of testing set | Time series length | 1-NN Euclidean Distance (error rate) | 1-NN Best Warping Window DTW (r) (error rate) | 1-NN DTW, no Warping Window (error rate) |
|---|---|---|---|---|---|---|---|---|
| Synthetic Control | Pham | 6 | 300 train | 300 test | 60 | 0.12 | 0.017 (6) | 0.007 |
| Gun-Point | Ratanamahatana | 2 | 50 | 150 | 150 | 0.087 | 0.087 (0) | 0.093 |
| CBF | | 3 | 30 | 900 | 128 | 0.148 | 0.004 (11) | 0.003 |
| Face (all) | Xi | 14 | 560 | 1,690 | 131 | 0.286 | 0.192 (3) | 0.192 |
| OSU Leaf | Gandhi | 6 | 200 | 242 | 427 | 0.483 | 0.384 (7) | 0.409 |
| Swedish Leaf | Soderkvist | 15 | 500 | 625 | 128 | 0.213 | 0.157 (2) | 0.210 |
| 50Words | Rath | 50 | 450 | 455 | 270 | 0.369 | 0.242 (6) | 0.310 |
| Trace | Roverso | 4 | 100 | 100 | 275 | 0.24 | 0.01 (3) | 0.0 |
| Two Patterns | Geurts | 4 | 1,000 | 4,000 | 128 | 0.09 | 0.0015 (4) | 0.0 |
| Wafer | Olszewski | 2 | 1,000 | 6,174 | 152 | 0.005 | 0.005 (1) | 0.020 |
| Face (four) | Ratanamahatana | 4 | 24 | 88 | 350 | 0.216 | 0.114 (2) | 0.170 |
| Lightning-2 | Eads | 2 | 60 | 61 | 637 | 0.246 | 0.131 (6) | 0.131 |
| Lightning-7 | Eads | 7 | 70 | 73 | 319 | 0.425 | 0.288 (5) | 0.274 |
| ECG | Olszewski | 2 | 100 | 100 | 96 | 0.12 | 0.12 (0) | 0.23 |
| Adiac | Jalba | 37 | 390 | 391 | 176 | 0.389 | 0.391 (3) | 0.396 |
| Yoga | Xi | 2 | 300 | 3,000 | 426 | 0.170 | 0.155 (2) | 0.164 |
| Fish (readme) | Lee | 7 | 175 | 175 | 463 | 0.217 | 0.160 (4) | 0.167 |
| Plane | readme | 7 | 105 | 105 | 144 | 0.038 | 0.0 (5) | 0 |
| Car | readme | 4 | 60 | 60 | 577 | 0.267 | 0.233 (1) | 0.267 |
| Beef | Tony Bagnall | 5 | 30 | 30 | 470 | 0.467 | 0.467 | 0.5 |
| Coffee | Tony Bagnall | 2 | 28 | 28 | 286 | 0.25 | 0.179 | 0.179 |
| OliveOil | Tony Bagnall | 4 | 30 | 30 | 570 | 0.133 | 0.167 | 0.133 |
| Please donate data! | | | | | | | | |
How to get the datasets:
The Synthetic Control datasets are available above, and the code to reproduce the 1-NN Euclidean Distance result is available below. For the rest of the data, read on.
In order to get the password to the data, please carefully read the points below.
1) Do not share the password or datasets with others (exception: co-authors on the current paper).
2) If you modify the data in any way (add noise, add warping, etc.), please give the modified data back to the archive before you submit your paper (that way a diligent reviewer can test your claims while the paper is under review).
3) Where possible, we strongly advocate testing and publishing results on all datasets (to avoid cherry-picking), unless of course you are making an explicit claim for only a certain type of data (e.g. classifying short time series). In the event you don't have space in your paper, we suggest you create an extended tech report online and point to it.
4) If you have additional datasets, we ask that you donate them to the archive in our simple format.
5) We strongly encourage you to make only statistically significant claims about the relative performances of algorithms/distance measures. Consider the results on the Synthetic Control dataset. It would be tempting to say that unconstrained warping beats constrained warping. But unconstrained warping gets 2 wrong and constrained warping gets 5 wrong out of 300. This is not statistically significant evidence that one is better (in fact, you can show this by doing different random shuffles of the data and getting the opposite result). In contrast, either measure is better than Euclidean distance (which gets 36 wrong) using a two-tailed, paired t-test with a p-value = 0.01. We strongly advocate reading On Comparing Classifiers by Salzberg.
6) When you write your paper, please make reproducibility your goal. In particular, explicitly state all parameters. A good guiding principle is to ask yourself "Could a smart grad student get the exact same results as claimed in this paper with a day's effort?" If the answer is no, we believe that something is wrong. Help the imaginary grad student by rewriting your paper.
7) Where possible, make your code available (as we have done below).
8) If you are advocating a new distance/similarity measure, we strongly recommend you test and report the 1-NN accuracy (as above). Note that this does not preclude the addition of other tests; however, the 1-NN test has the advantage of having no parameters and allowing comparisons between methods.
9) Please reference the datasets in your paper as: Keogh, E., Xi, X., Wei, L. & Ratanamahatana, C. A. (2006). The UCR Time Series Classification/Clustering Homepage: www.cs.ucr.edu/~eamonn/time_series_data/
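The argument in point 5 can be sanity-checked with a quick calculation. The sketch below is not the paired t-test mentioned there; it swaps in a simpler unpaired two-proportion z-test, because that needs only the error counts quoted in the text (2, 5 and 36 wrong out of 300). The function name `two_proportion_p_value` is hypothetical, and the normal approximation is rough for counts this small, so treat the result as an illustration rather than a definitive analysis.

```matlab
function p = two_proportion_p_value(wrong_a, wrong_b, n)
% Two-sided p-value for the hypothesis "methods A and B have the same
% error rate", given their error counts on n test instances each.
% Hypothetical helper: unpaired two-proportion z-test, normal approximation.
pooled = (wrong_a + wrong_b) / (2 * n);        % pooled error rate
se = sqrt(pooled * (1 - pooled) * (2 / n));    % std. error of the difference
if se == 0
    p = 1;                                     % identical all-right/all-wrong results
    return
end
z = abs(wrong_a - wrong_b) / n / se;           % z statistic
p = erfc(z / sqrt(2));                         % two-sided tail probability
end
```

Under these assumptions, `two_proportion_p_value(2, 5, 300)` comes out well above 0.05, while `two_proportion_p_value(36, 5, 300)` falls far below 0.01, which is in line with the reasoning in point 5.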
After reading the above, cut out the text of either A or B below, sign it with your full name, and email it to Dr. Keogh. If you are a grad student/post-doc, you must discuss this with your adviser first and CC him/her when requesting the password.
A) I have read the points above, and agree to all of them. Please send me the password.
B) I have read the points above, but I do not agree to all of them. In particular, I do NOT agree with... (please enumerate). Nevertheless, I want the data. Please send me the password.
Code:
Here is the code used to create the results shown in the table above (in Matlab). Note that the training step is completely separated from the testing step. In particular, the classification algorithm can only "see" the training data, the training data labels, and one unlabeled test instance at a time.
If you want to compare a new distance measure with the results above, all you need to do is change one line of code in the Classification_Algorithm function!
Note that this code is optimized for simplicity, not speed! Please do not report timing results using this code. Euclidean distance can be sped up using branch and bound, and DTW can be significantly sped up using LB_Keogh.
function UCR_time_series_test
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% (C) Eamonn Keogh %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
TRAIN = load('synthetic_control_TRAIN'); % Only these two lines need to be changed to test a different dataset. %
TEST  = load('synthetic_control_TEST');  % Only these two lines need to be changed to test a different dataset. %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
TRAIN_class_labels = TRAIN(:,1);      % Pull out the class labels.
TRAIN(:,1) = [];                      % Remove class labels from training set.
TEST_class_labels = TEST(:,1);        % Pull out the class labels.
TEST(:,1) = [];                       % Remove class labels from testing set.
correct = 0;                          % Initialize the number we got correct
for i = 1 : length(TEST_class_labels) % Loop over every instance in the test set
    classify_this_object = TEST(i,:);
    this_objects_actual_class = TEST_class_labels(i);
    predicted_class = Classification_Algorithm(TRAIN,TRAIN_class_labels, classify_this_object);
    if predicted_class == this_objects_actual_class
        correct = correct + 1;
    end;
    disp([int2str(i), ' out of ', int2str(length(TEST_class_labels)), ' done']) % Report progress
end;
%%%%%%%%%%%%%%%%% Create Report %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
disp(['The dataset you tested has ', int2str(length(unique(TRAIN_class_labels))), ' classes'])
disp(['The training set is of size ', int2str(size(TRAIN,1)),', and the test set is of size ',int2str(size(TEST,1)),'.'])
disp(['The time series are of length ', int2str(size(TRAIN,2))])
disp(['The error rate was ',num2str((length(TEST_class_labels)-correct)/length(TEST_class_labels))])
%%%%%%%%%%%%%%%%% End Report %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Here is a sample classification algorithm, it is the simple (yet very competitive) one-nearest
% neighbor using the Euclidean distance.
% If you are advocating a new distance measure you just need to change the line marked "Euclidean distance"
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function predicted_class = Classification_Algorithm(TRAIN,TRAIN_class_labels,unknown_object)
best_so_far = inf;
for i = 1 : length(TRAIN_class_labels)
    compare_to_this_object = TRAIN(i,:);
    distance = sqrt(sum((compare_to_this_object - unknown_object).^2)); % Euclidean distance
    if distance < best_so_far
        predicted_class = TRAIN_class_labels(i);
        best_so_far = distance;
    end
end;
>> UCR_time_series_test
1 out of 300 done
2 out of 300 done
...
298 out of 300 done
299 out of 300 done
300 out of 300 done
The dataset you tested has 6 classes
The training set is of size 300, and the test set is of size 300.
The time series are of length 60
The error rate was 0.12
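As an illustration of the "change one line" idea, here is a minimal sketch of a DTW distance with a Sakoe-Chiba warping window that could stand in for the Euclidean distance line. It is not the code used to produce the DTW columns of the table. The function name `dtw_distance`, the squared pointwise cost, and the choice to express the half-width `r` in samples are all assumptions (the r reported in the table may use a different unit, for example a percentage of the series length).

```matlab
function d = dtw_distance(a, b, r)
% DTW distance between row vectors a and b, with the warping path
% confined to a Sakoe-Chiba band of half-width r (in samples).
% Pass r = inf for no warping window.
% Hypothetical helper: plain O(n*m) dynamic programming, not optimized.
n = length(a);
m = length(b);
r = max(r, abs(n - m));            % the band must be able to reach cell (n,m)
D = inf(n + 1, m + 1);             % cumulative costs; first row/column are borders
D(1, 1) = 0;
for i = 1 : n
    for j = max(1, i - r) : min(m, i + r)
        cost = (a(i) - b(j))^2;    % squared difference of the aligned points
        D(i + 1, j + 1) = cost + min([D(i, j + 1), D(i + 1, j), D(i, j)]);
    end
end
d = sqrt(D(n + 1, m + 1));         % for equal lengths, r = 0 gives Euclidean distance
end
```

To try it, the line marked "Euclidean distance" in Classification_Algorithm would become `distance = dtw_distance(compare_to_this_object, unknown_object, r);` for some chosen `r`. Like the listing above, this favors simplicity over speed, so it should not be used for timing results.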
Notes:
- Here "First paper" means the first paper to use this data, but not (necessarily) using these training/test splits. In addition, the data here may have been resampled, normalized or processed in other ways.
- This approach, where we find the best width of the Sakoe-Chiba Band by a search over the training set, is a special case of Ratanamahatana-Keogh Band classification where the threshold is the length of the time series. See Ratanamahatana, C. A. and Keogh, E. (2004). Making Time-series Classification More Accurate Using Learned Constraints, and see Ratanamahatana, C. A. and Keogh, E. (2004). Everything you know about Dynamic Time Warping is Wrong.
- D.T. Pham and A.B. Chan. (1998). Control Chart Pattern Recognition using a New Type of Self Organizing Neural Network.
- A.C. Jalba, M.H.F. Wilkinson, J.B.T.M. Roerdink, M.M. Bayer and S. Juggins. Automatic Diatom Identification using Contour Analysis by Morphological Curvature Scale Spaces.
- D. Eads, D. Hill, S. Davis, S. Perkins, J. Ma, R. Porter, and J. Theiler. "Genetic Algorithms and Support Vector Machines for Time Series Classification." Proc. SPIE 4787, pp. 74-85. July, 2002.
- Robert T. Olszewski. Generalized Feature Extraction for Structural Pattern Recognition in Time-Series Data. PhD thesis, Carnegie Mellon University, Pittsburgh, PA, 2001.
- Geurts, P. (2002). Contributions to decision tree induction: bias/variance tradeoff and time series classification. PhD thesis, Department of Electrical Engineering, University of Liege, Belgium.
- Oskar J. O. Soderkvist. Computer Vision Classification of Leaves from Swedish Trees. Master's thesis, Linkoping University, Sweden, 2001.
- Ashit Gandhi. Content-Based Image Retrieval: Plant Species Identification. Master's thesis, Oregon State University, September, 2002.
- Rath, T. & Manmatha, R. (2003). Word Image matching using dynamic time warping. CVPR, Vol. II, pp. 521-527.
- Roverso, D. (2000). Multivariate temporal classification by windowed wavelet decomposition and recurrent neural networks. In 3rd ANS International Topical Meeting on Nuclear Plant Instrumentation, Control and Human-Machine Interface, 2000.
- Xiaopeng Xi, Eamonn Keogh, Christian Shelton, Li Wei & Chotirat Ann Ratanamahatana (2006). Fast Time Series Classification Using Numerosity Reduction. ICML 2006.
- Cherry picking, literally meaning harvesting cherries, is used metaphorically to accuse someone of pointing at individual cases which seem to confirm his or her position, while ignoring a significant portion of related cases that may contradict it.
- Salzberg, S. On comparing classifiers: Pitfalls to avoid and a recommended approach. Data Mining and Knowledge Discovery 1997; 1:317-327.
- Keogh, E. & Kasetty, S. (2002). On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration. In the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. July 23 - 26, 2002. Edmonton, Alberta, Canada. pp. 102-111.
- D.J. Lee, R. Schoenberger, D. Shiozawa, X. Xu, and P. Zhan. Contour Matching for a Fish Recognition and Migration Monitoring System. SPIE Optics East, Two and Three-Dimensional Vision Systems for Inspection, Control, and Metrology II, vol. 5606-05, pp. 37-48, Philadelphia, PA, USA, October 25-28, 2004.
- The fish in question are mostly close to horizontal, facing left as they swam past the camera; however, a rotation-invariant version of the distance measures does give better accuracy. See LB_Keogh Supports Exact Indexing of Shapes under Rotation Invariance with Arbitrary Representations and Distance Measures. VLDB 2006.
- Note for Car and Plane: The donor of these two datasets asked me not to make them available until his/her journal paper is accepted. As soon as I get word, I will put this data online.
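The note above on finding the best Sakoe-Chiba band width "by a search over the training set" could be realized in several ways. The sketch below assumes one of them, leave-one-out 1-NN error on the training set, which the page does not confirm as the exact procedure. The function name `choose_warping_window` and the `dist_fn(a, b, r)` interface are hypothetical; any windowed distance with that signature can be plugged in.

```matlab
function [best_r, best_err] = choose_warping_window(TRAIN, TRAIN_class_labels, candidate_rs, dist_fn)
% Return the candidate window with the lowest leave-one-out 1-NN error
% on the training set. Only training data is touched, never the test set.
% dist_fn is a function handle: distance = dist_fn(series_a, series_b, r).
n = length(TRAIN_class_labels);
best_r = candidate_rs(1);
best_err = inf;
for r = candidate_rs(:)'                 % force a row so the loop visits each value
    wrong = 0;
    for i = 1 : n                        % hold out instance i
        best_so_far = inf;
        predicted = NaN;                 % stays NaN if there is no other instance
        for j = [1:i-1, i+1:n]           % compare against all remaining instances
            d = dist_fn(TRAIN(i, :), TRAIN(j, :), r);
            if d < best_so_far
                best_so_far = d;
                predicted = TRAIN_class_labels(j);
            end
        end
        wrong = wrong + (predicted ~= TRAIN_class_labels(i));
    end
    err = wrong / n;
    if err < best_err                    % ties keep the earlier candidate
        best_err = err;
        best_r = r;
    end
end
end
```

The chosen window would then be fixed before the test set is classified, which keeps the training and testing steps separated in the same way as the listing in the Code section.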
People that have Downloaded the Data:
- Shiyuan Liu, Li Lv.
- Thiago Santos Quirino, Mei-Ling Shyu University of Miami.
- Pierre-François Marteau
- Cosmin Bocaniala, Lancaster University.
- Yueguo Chen, Anthony K.H. Tung, Beng Chin Ooi, National University of Singapore.
- Vernon Rego, Vernon Rego. Purdue University.
- Mislav Malenica, Tomislav Smuc
- Man Hon WONG, ZHOU Mi, The Chinese University of Hong Kong.
- Guillaume Bouchard Xerox Research Centre Europe.
- Hoa Vo and David Joslin Seattle University.
- Carlotta Orsenigo, Università degli Studi di Milano.
- Dr. Paolo Ciaccia
- Xiaoqing Weng, Jiaotong University.
- Longin Jan Latecki and Qiang Wang, Temple University.
- Tony Bagnall
- Michail Vlachos, IBM.
- Bernard Hugueney
- Sourav Mukherjee
- Marcos M. Campos (Oracle)
- Ludmila I. Kuncheva
- Edward Omiecinski and Jun Li
- Victor Eruhimov, Intel
- Rob Jasper
- Andre Coelho
- Gernot Herbst
- Vit Niennattrakul
- Nozomi Matsuda
- Flavio Miguel Varejao and Idilio Drago
- HAORIANTO COKROWIJOYO TJIOE
- Molnar Miklos
- Steinn Gudmundsson and Thomas Runarsson
- Niall Adams and Sai Wing Man
- Isabel Maria Marques da Silva, Maria Eduarda da Rocha Pinto Augusto da Silva and Joaquim Fernando Pinto da Costa
- Panagiotis Papapetrou and George Kollios
- Huang Tan
- Sergio Guadarrama
- Alicia Troncoso Lora
- Pyry Avist
- Peng-Yi Lai
- Yong Fu
- Soheil Bahrampour
- Long Yao and Meng Bo
- Robert Moskovitch and Yuval Shahar
- Abdellali Kelil
- Puspadevi Kuppusamy
- Qun Dai and Songcan Chen
- Lisa Gralewski
- Maria Teresinha Arns Steiner and Rosangela Villwock
- Amir Ahmad and Galvin Brown
- Abhijit Jayant Kulkarni
- Xingquan (Hill) Zhu
- Amol Deshpande and Qiang Qiu
- Vercellis Carlo and Gianni Alberti
- Pamela Nerina Llop
- Tobias Scheffer
- Jochen Fischer
- Mao Ye and Yingying Zhu
- Cintia Lera
- George Runger and Rohit Das
- Omar U. Florez and Seungjin Lim
- Ruy Luiz Milidiu and Pedro Teixeira
- Mykola Galushka and Dave Patterson
- Rahul Sinha
- Minh Hoai Nguyen and Fernando de la Torre
- Eric Eaton
- NGUYEN Van Hanh
- Lucas F. Rosada
- Nicky Van Thuyne
- Skopal Tomáš and Michal Vajbar
- Carlo Piccardi and Martina Maggio
- Muhammad Aamir Khan
- Larry Deschaine
- Janosa Andras
- Andrew Starkey
- Andre Rodrigo Sanches and Nina Sumiko Tomita Hirata
- Claudio Piciarelli and Gian Luca Foresti
- Wei T. Yue
- Michael Botsch and Josef A. Nossek
- Bingyu Sun
- Sun, Fu-Shing
- Babak Amiri
- Xing ChunXiao and Du Xutao, Tsinghua University
- Elloumi Samir, Sondess Bentekaya
- Li Shijin
- Erik Learned-Miller, Marwan A. Mattar
- Chiranjib Bhattacharya, Karthik K
- Nicandro Cruz Ramirez
- Jiankui Guo Fudan University
- Bin Z Zhang IBM
- Yi-Dong Shen and Zhiyong Shen
- Georgios Evangelidis, Leonidas Karamitopoulos
- Hendrik Purwins
- Jignesh M. Patel and Michael Morse
- Gert Van Dijck and Marc Van Hulle
- Chao Hui Lee and Vincent Tseng
- Linh Tran (Boeing)
- Hugo Alonso Vilares Monteiro and Joaquim Fernando Pinto da Costa
- David Minnen
- Tsuyoshi Mikami
- Qiang Yang and Sinno Pan
- Paolo Tormene
- Hui Ding and Peter Scheuermann
- Ronaldo Cristiano Prati
- Christian Gruber and Bernhard Sick
- Silvia Chiappa
- Ankur Jain
- Maria Cristina Ferreira de Oliveira and Aretha Barbosa Alencar
- Febri Andriani
- Myeong-Seon Gi
- Pengtao jia
- Farid Seifi.
- Clodoaldo Aparecido de Moraes Lima
- Konstantinos Blekas
- Inderjit Dhillon and HYUK CHO
- Aida Valls
- Narayanan Chatapuram Krishnan and Sethuraman Panchanathan
- Stephan Günnemann and Thomas Seidl
- Thirumaran Ekambaram and M. Narasimha Murty
- Christine Preisach and Lars Schmidt-Thieme
- Dino Isa and Rajprasad Kumar
- Feibao Zhuo
- Frans van den Bergh
- Kfir Glik
- Xiao Yu
- Yingying Zhu
- Rosanna Verde and Antonio Balzanella
- Paul Baggenstoss
- Koichi ASAKURA and Wei Fan
- Lucia Sacchi and Iyad Batal
- Morné Neser
- Luca Chiaravalloti
- vikram deshmukh
- Harri M.T. Saarikoski
- Caio Nogara Andreatta and Neusa Grando
- Ying Xie
- Ulf Großekathöfer
- Christophe Genolini
- Céline Robardet
- Musa Chemisto
- Wangmeng Zuo
- Paolo Remagnino
- Soumi Ray and Tim Oates
- Pierre Gançarski and Francois Petitjean
- Joydeep Ghosh and John Tourish
- Alireza-Xaker
- Wolfgang Nejdl and Ernesto Diaz-Aviles
- Romain Tavenard and Laurent Amsaleg
- Hahn-Ming Lee, Christos Faloutsos, Hsing-Kuoh Pao, Ching-Hao (Eric) Mao
- Tetsuya Nakamura
- Daniel Alejandro Garcia Lopez
- Richard J. Povinelli
- Yi-Dong Shen and Zhiyong Shen
- Emmanuel Viennet
- Remi Gaudin, Nicolas Nicoloyannis
- Phil Gross
- Francois Portet
- Qi He, Dr. Kuiyu Chang, Dr. Ee-Peng Lim
- Fabio Antonio Pereira Reis
- Damien Tessier
- Cuvelier Etienne
- Moataz El Ayadi & Mohamed S. Kamel
- Hillol Kargupta
- Andrey Ustyuzhanin
- Hussam Alshraideh
- John M. Trenkle
- Boris Martinez Jimenez & Francisco Herrera Fernandez
- Juan Jose Rodriguez
- Zoltan Banko
- Anjan Goswami & Debashis Mondal
- Simon Kagwi Mwangi
- Simone Fontolan and Alessandro Garghetti
- Lin Zhang and Joe Song
- Zhenzhi Huang and Zhenfeng He
- Waleed Kadous
- Hao Hu and Qiang Yang
- Kay Robbins and Dragana Veljkovic
- Daniel Pena, Regina Kaiser, Ana Laura Badagiani
- Daniel Graves and Witold Pedrycz
- Anne Denton
- Julia Hunter and Martin Colley
- Lucia Sacchi
- Bernhard Seeger, Michael Grau.
- John Maindonald
- Balazs Torma
- Nikolaos Chatzis
- Daniel Smith
- Abdul Razak, Khairuddin Omar
- Elwin (Yong) lee
- Alex Smola and Xinhua Zhang
- Rudolf Kruse and Christian Moewes
- Pekka Siirtola
- Michael Berthold
- David Bong and James Tan
- Zhengzheng (Crystal) Xing and Jian Pei
- Leticia Arco Garcia and Rafael Bello
- Nuno Constantino Castro and Paulo Azeved
- Ng Kam Swee
- Antonio Irpino
- Jong Myoung Ko
- Jonas Richiardi
- Dhaval Patel, Wynne Hsu and Lee Mong Lee.
- Ville Hautamaki
- Peter Sunehag
- Richard Clements
- Hichem Frigui and Walid MISSAOUI
- KASHIMA Toru
- Tang Ke and Guojie Song
- Jinfu Fan
- Ruchira Guha
- Fan Zhou and Wu Yue
- Chen Duansheng
- Cheng wencong
- Xiaoli Li
- Fedor Zhdanov and Vladimir Vovk
- WU Quan-Yuan
- Anuradha Kodali
- Keith Noah Snavely
- Hilario Navarro Veguillas and Jesus Bouso