Welcome to the UCR Time Series Classification/Clustering Page
This data resource was funded by NSF CAREER Award 0237918 from 2003 to 2008, and continues to be funded through NSF awards 0803410 and 0808770. Partial funding was also made available by a gift from ISCA Technologies.
This webpage has been created as a public service to the data mining/machine learning community, to encourage reproducible research for time series classification and clustering.
Note that the data here is useful for testing classification/clustering, and the accuracy of indexing techniques. However, the datasets are too small to make claims about the efficiency of indexing. For this, email Dr. Keogh requesting a free CD-ROM of larger datasets.
If you want datasets to test anomaly detection algorithms, many such datasets are here. A comparison of the results below with classic machine learning algorithms is here, thanks to Tony Bagnall and to Weka for this.
| Name | First paper or data creator | Number of classes | Size of training set | Size of testing set | Time series length | 1-NN Euclidean Distance (error rate) | 1-NN Best Warping Window DTW (r) (error rate; r is the percentage of the time series length) | 1-NN DTW, no Warping Window (error rate) |
| Synthetic Control | | 6 | 300 | 300 | 60 | 0.12 | 0.017 (6) | 0.007 |
| Gun-Point | Ratanamahatana | 2 | 50 | 150 | 150 | 0.087 | 0.087 (0) | 0.093 |
| CBF | | 3 | 30 | 900 | 128 | 0.148 | 0.004 (11) | 0.003 |
| Face (all) | Xi | 14 | 560 | 1,690 | 131 | 0.286 | 0.192 (3) | 0.192 |
| OSU Leaf | Gandhi | 6 | 200 | 242 | 427 | 0.483 | 0.384 (7) | 0.409 |
| Swedish Leaf | Soderkvist | 15 | 500 | 625 | 128 | 0.213 | 0.157 (2) | 0.210 |
| 50Words | Rath | 50 | 450 | 455 | 270 | 0.369 | 0.242 (6) | 0.310 |
| Trace | Roverso | 4 | 100 | 100 | 275 | 0.24 | 0.01 (3) | 0.0 |
| Two Patterns | Geurts | 4 | 1,000 | 4,000 | 128 | 0.09 | 0.0015 (4) | 0.0 |
| Wafer | Olszewski | 2 | 1,000 | 6,174 | 152 | 0.005 | 0.005 (1) | 0.020 |
| Face (four) | Ratanamahatana | 4 | 24 | 88 | 350 | 0.216 | 0.114 (2) | 0.170 |
| Lightning-2 | | 2 | 60 | 61 | 637 | 0.246 | 0.131 (6) | 0.131 |
| Lightning-7 | Eads | 7 | 70 | 73 | 319 | 0.425 | 0.288 (5) | 0.274 |
| ECG | Olszewski | 2 | 100 | 100 | 96 | 0.12 | 0.12 (0) | 0.23 |
| Adiac | | 37 | 390 | 391 | 176 | 0.389 | 0.391 (3) | 0.396 |
| Yoga | Xi | 2 | 300 | 3000 | 426 | 0.170 | 0.155 (2) | 0.164 |
| Fish (readme) | Lee | 7 | 175 | 175 | 463 | 0.217 | 0.160(4) | 0.167 |
| Plane | readme | 7 | 105 | 105 | 144 | 0.038 | 0.0(5) | 0 |
| Car | readme | 4 | 60 | 60 | 577 | 0.267 | 0.233(1) | 0.267 |
| Beef | Tony Bagnall | 5 | 30 | 30 | 470 | 0.467 | 0.467(0) | 0.5 |
| Coffee | Tony Bagnall | 2 | 28 | 28 | 286 | 0.25 | 0.179(3) | 0.179 |
| OliveOil | Tony Bagnall | 4 | 30 | 30 | 570 | 0.133 | 0.167(1) | 0.133 |
| CinC_ECG_torso | | 4 | 40 | 1380 | 1639 | 0.103 | 0.07(1) | 0.349 |
| ChlorineConcentration | Lei Li & C. Faloutsos | 3 | 467 | 3840 | 166 | 0.35 | 0.35(0) | 0.352 |
| DiatomSizeReduction | rbg-web2.rbge.org.uk/ADIAC/ | 4 | 16 | 306 | 345 | 0.065 | 0.065(0) | 0.033 |
| ECGFiveDays | physionet.org & E. Keogh | 2 | 23 | 861 | 136 | 0.203 | 0.203(0) | 0.232 |
| FacesUCR | E. Keogh | 14 | 200 | 2050 | 131 | 0.231 | 0.088(12) | 0.0951 |
| Haptics | | 5 | 155 | 308 | 1092 | 0.63 | 0.588(2) | 0.623 |
| InlineSkate | Fabian Morchen | 7 | 100 | 550 | 1882 | 0.658 | 0.613(14) | 0.616 |
| ItalyPowerDemand | JJ van Wijk | 2 | 67 | 1029 | 24 | 0.045 | 0.045(0) | 0.05 |
| MALLAT | | 8 | 55 | 2345 | 1024 | 0.086 | 0.086(0) | 0.066 |
| MedicalImages | | 10 | 381 | 760 | 99 | 0.316 | 0.253(20) | 0.263 |
| MoteStrain | | 2 | 20 | 1252 | 84 | 0.121 | 0.134(1) | 0.165 |
| SonyAIBORobot SurfaceII | (D. Vail & M. Veloso) & E. Keogh | 2 | 27 | 953 | 65 | 0.141 | 0.141(0) | 0.169 |
| SonyAIBORobot Surface | (D. Vail & M. Veloso) & E. Keogh | 2 | 20 | 601 | 70 | 0.305 | 0.305(0) | 0.275 |
| StarLightCurves | Pavlos Protopapas | 3 | 1000 | 8236 | 1024 | 0.151 | 0.095(16) | 0.093 |
| Symbols | E. Keogh & J. Brady | 6 | 25 | 995 | 398 | 0.1 | 0.062(8) | 0.05 |
| TwoLeadECG | physionet.org & E. Keogh | 2 | 23 | 1139 | 82 | 0.253 | 0.132(5) | 0.096 |
| WordsSynonyms | E. Keogh | 25 | 267 | 638 | 270 | 0.382 | 0.252(8) | 0.351 |
| Cricket_X | M. H. Ko, G. West, S. Venkatesh, and M. Kuma. | 12 | 390 | 390 | 300 | 0.426 | 0.236(7) | 0.223 |
| Cricket_Y | | 12 | 390 | 390 | 300 | 0.356 | 0.197(17) | 0.208 |
| Cricket_Z | | 12 | 390 | 390 | 300 | 0.38 | 0.18(7) | 0.208 |
| uWaveGestureLibrary_X | Rice Efficient Computing Group, Rice University | 8 | 896 | 3582 | 315 | 0.261 | 0.227(4) | 0.273 |
| uWaveGestureLibrary_Y | | 8 | 896 | 3582 | 315 | 0.338 | 0.301(4) | 0.366 |
| uWaveGestureLibrary_Z | | 8 | 896 | 3582 | 315 | 0.35 | 0.322(6) | 0.342 |
| Non-Invasive Fetal ECG Thorax1 | Bing Hu and Eamonn Keogh | 42 | 1800 | 1965 | 750 | 0.171 | 0.185(1) | 0.209 |
| Non-Invasive Fetal ECG Thorax2 | Bing Hu and Eamonn Keogh | 42 | 1800 | 1965 | 750 | 0.120 | 0.129(1) | 0.135 |
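The two DTW columns above differ only in the warping window r. As a rough illustration of what that parameter does, here is a minimal Python sketch (the page's own code is MATLAB) of DTW restricted to a band of r percent of the series length. The function name `dtw_distance`, the rounding of r to whole samples, and the use of squared point costs with a final square root are assumptions made for this example, not the archive's reference implementation.

```python
import math


def dtw_distance(a, b, r_percent=None):
    """DTW distance between two numeric sequences.

    r_percent is the warping window as a percentage of the series length
    (assumption: rounded to the nearest whole sample). None means no window.
    """
    n, m = len(a), len(b)
    if r_percent is None:
        w = max(n, m)  # unconstrained: the band covers the whole matrix
    else:
        # The band must be at least |n - m| wide or no path can reach the corner.
        w = max(int(round(r_percent / 100.0 * max(n, m))), abs(n - m))

    inf = math.inf
    prev = [inf] * (m + 1)  # previous row of the cumulative cost matrix
    prev[0] = 0.0
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            cur[j] = cost + min(prev[j], prev[j - 1], cur[j - 1])
        prev = cur
    return math.sqrt(prev[m])


# Toy example: the same bump, shifted by one sample.
a = [0, 0, 1, 2, 1, 0, 0]
b = [0, 1, 2, 1, 0, 0, 0]
print(dtw_distance(a, b, r_percent=0))   # window of zero: plain Euclidean distance
print(dtw_distance(a, b, r_percent=15))  # window of one sample
print(dtw_distance(a, b))                # no window
```

With r = 0 the band collapses to the diagonal and the result equals the Euclidean distance, which is why rows in the table whose best window is (0) report the same error for the Euclidean and best-window columns.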
How to get the datasets:
The Synthetic Control datasets are available above, and the code to reproduce the 1-NN Euclidean Distance result is available below. For the rest of the data, read on.
To get the password for data1 and data2, please carefully read the points below.
Do not share the password or datasets with others (exception: co-authors on the current paper).
If you modify the data in any way (add noise, add warping, etc.), please give the modified data back to the archive before you submit your paper (that way a diligent reviewer can test your claims while the paper is under review).
Where possible, we strongly advocate testing and publishing results on all datasets (to avoid cherry-picking), unless of course you are making an explicit claim for only a certain type of data (e.g. classifying short time series). In the event you don't have space in your paper, we suggest you create an extended tech report online and point to it. Please see A Complexity-Invariant Distance Measure for Time Series in SDM 2011 (esp. Fig. 14) for some ideas on how to visualize the accuracy results on so many datasets.
If you have additional datasets, we ask that you donate them to the archive in our simple format.
We strongly encourage you to make only statistically significant claims about the relative performances of algorithms/distance measures. Consider the results on the Synthetic Control dataset. It would be tempting to say that unconstrained warping beats constrained warping. But unconstrained warping gets 2 wrong and constrained warping gets 5 wrong out of 300. This is not statistically significant evidence that one is better (in fact, you can show this by doing different random shuffles of the data and getting the opposite result). In contrast, either measure is better than Euclidean distance (which gets 36 wrong) using a two-tailed, paired t-test with a p-value = 0.01. We strongly advocate reading On Comparing Classifiers by Salzberg.
When you write your paper, please make reproducibility your goal. In particular, explicitly state all parameters. A good guiding principle is to ask yourself "Could a smart grad student get the exact same results as claimed in this paper with a day's effort?" If the answer is no, we believe that something is wrong. Help the imaginary grad student by rewriting your paper.
Where possible, make your code available (as we have done below).
If you are advocating a new distance/similarity measure, we strongly recommend you test and report the 1-NN accuracy (as above). Note that this does not preclude the addition of other tests; however, the 1-NN test has the advantage of having no parameters and allowing comparisons between methods.
Please reference the datasets in your paper as Keogh, E., Zhu, Q., Hu, B., Hao, Y., Xi, X., Wei, L. & Ratanamahatana, C. A. (2011). The UCR Time Series Classification/Clustering Homepage: www.cs.ucr.edu/~eamonn/time_series_data/
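To make the point about statistical significance concrete, the Synthetic Control counts quoted above (36, 5 and 2 errors out of 300) can be checked with a few lines of Python. This is only a sketch: it swaps in a pooled two-proportion z-test, which treats the two sets of predictions as independent, and is not the paired t-test mentioned above. The function name `two_proportion_p_value` is made up for this example.

```python
import math


def two_proportion_p_value(errors_a, errors_b, n):
    """Two-sided p-value for the difference between two error counts,
    each out of n test instances (pooled z-test, normal approximation).

    Assumption: the two samples are treated as independent, which ignores
    the fact that both classifiers were run on the same test set.
    """
    pooled = (errors_a + errors_b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * (2 / n))
    if se == 0:
        return 1.0  # identical, degenerate counts: no evidence of a difference
    z = abs(errors_a - errors_b) / n / se
    return math.erfc(z / math.sqrt(2))


n_test = 300
errors = {"Euclidean": 36, "DTW, best window": 5, "DTW, no window": 2}

# Error rates as reported in the table (errors / test set size).
for name, wrong in errors.items():
    print(name, round(wrong / n_test, 3))

# Constrained vs. unconstrained DTW: 5 vs. 2 errors.
print(two_proportion_p_value(5, 2, n_test))
# Euclidean vs. constrained DTW: 36 vs. 5 errors.
print(two_proportion_p_value(36, 5, n_test))
```

Under these assumptions the 5-versus-2 comparison gives a p-value well above 0.05, while the 36-versus-5 comparison falls far below 0.01, in line with the conclusion stated above.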
After reading the above, copy the text of either A or B below, sign it with your full name, and email it to Dr. Keogh. If you are a grad student/post-doc, you must discuss this with your advisor first and CC him/her when requesting the password.
A) I have read the points above, and agree to all of them. I have CCed my academic advisor. Please send me the password.
B) I have read the points above, but I do not agree to all of them. In particular, I do NOT agree with... (please enumerate). Nevertheless, I want the data. Please send me the password.
Code:
If you want to compare a new distance measure with the results above, all you need to do is change one line of code in the Classification_Algorithm function! Note that this code is optimized for simplicity, not speed! Please do not report timing results using this code. Euclidean Distance can be sped up using branch and bound, and DTW can be significantly sped up using LB_Keogh.
function UCR_time_series_test
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% (C) Eamonn Keogh %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
TRAIN = load('synthetic_control_TRAIN'); % Only these two lines need to be changed to test a different dataset.
TEST  = load('synthetic_control_TEST');  % Only these two lines need to be changed to test a different dataset.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
TRAIN_class_labels = TRAIN(:,1);      % Pull out the class labels.
TRAIN(:,1) = [];                      % Remove class labels from training set.
TEST_class_labels = TEST(:,1);        % Pull out the class labels.
TEST(:,1) = [];                       % Remove class labels from testing set.
correct = 0;                          % Initialize the number we got correct
for i = 1 : length(TEST_class_labels) % Loop over every instance in the test set
% ...
%%%%%%%%%%%%%%%%% Create Report %%%%%%%%%%%%%%%%%
% ...
if distance < best_so_far
% ...
>> UCR_time_series_test
1 out of 300 done
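For readers who do not use MATLAB, the 1-NN test described above might look roughly like this in Python. It is a sketch, not a port of the original listing: the function names `load_ucr` and `one_nn_error_rate` are invented here, and the file layout (one series per line, class label first, values separated by whitespace or commas) is an assumption based on the MATLAB code reading the label from the first column. The distance function is passed in as an argument, so trying a new measure means changing a single call.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def load_ucr(path):
    """Read (label, series) pairs from a text file.

    Assumed layout: one series per line, class label in the first column,
    fields separated by whitespace or commas.
    """
    rows = []
    with open(path) as f:
        for line in f:
            fields = line.replace(",", " ").split()
            if fields:
                label = int(float(fields[0]))
                rows.append((label, [float(v) for v in fields[1:]]))
    return rows


def one_nn_error_rate(train, test, distance=euclidean):
    """Fraction of test series whose nearest training series has a different label."""
    wrong = 0
    for actual, series in test:
        # min() keeps the first of any tied candidates, like a strict "<" comparison.
        _, predicted = min(
            ((distance(s, series), label) for label, s in train),
            key=lambda pair: pair[0],
        )
        if predicted != actual:
            wrong += 1
    return wrong / len(test)


# Toy data standing in for a real TRAIN/TEST pair.
toy_train = [(1, [0.0, 0.0, 0.0]), (2, [5.0, 5.0, 5.0])]
toy_test = [(1, [0.1, 0.0, 0.0]), (2, [4.0, 5.0, 6.0]), (1, [5.0, 5.0, 4.0])]
print(one_nn_error_rate(toy_test and toy_train, toy_test))
```

On real data the calls would be along the lines of `one_nn_error_rate(load_ucr('synthetic_control_TRAIN'), load_ucr('synthetic_control_TEST'))`, with a different function passed as `distance` to evaluate a new measure. As with the MATLAB listing, this is written for simplicity, not speed, and should not be used for timing results.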
Notes:
People that have Downloaded the Data: