SAX (Symbolic Aggregate approXimation):

SAX is the first symbolic representation for time series that allows for dimensionality reduction and indexing with a lower-bounding distance measure. In classic data mining tasks such as clustering, classification, and indexing, SAX is as good as well-known representations such as the Discrete Wavelet Transform (DWT) and the Discrete Fourier Transform (DFT), while requiring less storage space. In addition, the representation allows researchers to take advantage of the wealth of data structures and algorithms in bioinformatics or text mining, and it also provides solutions to many challenges associated with current data mining tasks. One example is motif discovery, a problem that we recently defined for time series data. There is great potential for extending and applying the discrete representation to a wide class of data mining tasks.

Download SAX.ppt: This presentation may be useful to gain some intuition into the utility of SAX.

Example

   The following time series is converted to the string "acdbbdca":

   [Figure: a time series and its SAX representation, the string "acdbbdca"]
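
The conversion from a time series to a SAX string can be sketched in a few lines of Python. This is only an illustrative sketch, not the Matlab code distributed on this page: the function names (znormalize, paa, breakpoints, sax_word) are hypothetical, the series length is assumed to be a multiple of the number of segments, and the population standard deviation is assumed for normalization. The data behind the "acdbbdca" example is not reproduced here, so a simple ramp is used instead.

```python
import bisect
import math
from statistics import NormalDist


def znormalize(series):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        # A constant series carries no shape information; map it to zeros.
        return [0.0] * n
    return [(x - mean) / std for x in series]


def paa(series, n_segments):
    """Piecewise Aggregate Approximation: the mean of each equal-length segment."""
    if len(series) % n_segments != 0:
        raise ValueError("series length must be a multiple of n_segments")
    seg = len(series) // n_segments
    return [sum(series[i * seg:(i + 1) * seg]) / seg for i in range(n_segments)]


def breakpoints(alphabet_size):
    """Cut points that split the standard normal curve into equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def sax_word(series, n_segments, alphabet_size):
    """Convert a series to a SAX string over the letters 'a', 'b', 'c', ..."""
    cuts = breakpoints(alphabet_size)
    letters = []
    for value in paa(znormalize(series), n_segments):
        # Number of cut points at or below the value selects the symbol.
        index = bisect.bisect_right(cuts, value)
        letters.append(chr(ord("a") + index))
    return "".join(letters)


print(sax_word([1, 2, 3, 4, 5, 6, 7, 8], 4, 4))  # abcd
```

A steadily rising series maps to letters in increasing order, which is the kind of intuition the figure above is meant to convey.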

Relevant Publications

Matlab Code

  • Copyright, Terms of use, and Disclaimer:
    The code is made freely available for non-commercial uses only, provided that the copyright header in each file not be removed, and suitable citation(s) (see below) be made for papers published based on the code. The code is not optimized for speed.  The authors are not responsible for any errors that might occur in the code, or any harm/loss caused by use of this code. The copyright of the code is retained by the authors. By downloading the code you agree to all the terms stated above.

  • Download:
    Update Log:

      10/28/03:      Subsequence extraction capability added to timeseries2symbol.m (so all other files were updated as well)
      11/26/03:      sax_visual.m added.
      04/26/04:      SAX.ppt added.
    sax.zip contains the following files:
    • README.txt
    • sax_demo.m: Demo
    • timeseries2symbol.m: Converts a time series to SAX string(s).
    • min_dist.m: Computes a distance between two SAX strings that is guaranteed to lower-bound the true Euclidean distance between the two original time series.
    • mindist_demo.m: Demonstrates how mindist lower bounds the true Euclidean distance.
    • sax_visual.m: A visual comparison between SAX and PAA. Shows how SAX can represent data in finer granularity while using no more space than PAA.   Added on 11/26/03.
    SAX.ppt: This presentation may be useful to gain some intuition into the utility of SAX.
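
The lower-bounding distance between two SAX strings can be sketched as follows. This is a hedged Python illustration of the idea, not the contents of min_dist.m: the function names and argument order are hypothetical, both strings are assumed to use the same alphabet of lowercase letters starting at 'a', and the bound is assumed to apply to the Euclidean distance between the z-normalized original series of length original_length.

```python
import math
from statistics import NormalDist


def breakpoints(alphabet_size):
    """Cut points that split the standard normal curve into equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def symbol_distance(a, b, cuts):
    """Smallest possible gap between the value ranges of two symbols."""
    i, j = sorted((ord(a) - ord("a"), ord(b) - ord("a")))
    if j - i <= 1:
        # Identical or adjacent symbols may be arbitrarily close.
        return 0.0
    # Gap between the upper edge of the lower region and the
    # lower edge of the higher region.
    return cuts[j - 1] - cuts[i]


def min_dist(word_a, word_b, alphabet_size, original_length):
    """Distance between two SAX words that lower-bounds the Euclidean distance."""
    if len(word_a) != len(word_b):
        raise ValueError("SAX words must have the same length")
    cuts = breakpoints(alphabet_size)
    total = sum(symbol_distance(a, b, cuts) ** 2 for a, b in zip(word_a, word_b))
    # Each symbol stands for original_length / len(word) points of the series.
    return math.sqrt(original_length / len(word_a)) * math.sqrt(total)


print(min_dist("abcd", "abcd", 4, 8))  # 0.0
print(min_dist("ab", "ba", 4, 8))      # 0.0, adjacent symbols contribute nothing
print(min_dist("aa", "dd", 4, 8) > 0)  # True
```

Because adjacent symbols contribute zero, the measure can underestimate the true distance but, under the assumptions above, should never overestimate it, which is the property an index needs to avoid false dismissals.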

Cited By:

  • Chen, J. S., Moon, Y. S. & Yeung, H. W. (2005). Palmprint Authentication Using Time Series. In proceedings of the 5th International Conference on Audio- and Video-Based Biometric Person Authentication. Hilton Rye Town, NY. July 20-22.

  • Gaber, M. M., Zaslavsky, A. & Krishnaswamy, S. (2005). Mining Data Streams: A Review. ACM SIGMOD Record, Vol. 34, No. 1. June 2005.

  • Chen, L. & Ozsu, M. T. (2005). Using Multi-Scale Histograms to Answer Pattern Existence and Shape Match Queries. In proceedings of the 17th International Conference on Scientific and Statistical Database Management (SSDBM). Santa Barbara, CA. June 27-29.

  • Morchen, F. & Ultsch, A. (2005). Optimizing Time Series Discretization for Knowledge Discovery. In proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Chicago, IL. Aug 21-24.

  • Morchen, F., Ultsch, A. & Hoos, O. (2005). Extracting Interpretable Muscle Activation Patterns with Time Series Knowledge Mining. International Journal of Knowledge-Based & Intelligent Engineering Systems.

  • Morchen, F., Ultsch, A., Thies, M., Lohken, I., Nocker, M., Stamm, C., Efthymiou, N. & Kummerer, M. (2005). MusicMiner: Visualizing Timbre Distance of Music as Topographical Maps. Tech Report. Department of Mathematics and Computer Science, University of Marburg, Germany.

  • Androulakis, I. P. (2005). New Approaches for Representing, Analyzing and Visualizing Complex Kinetic Mechanisms. In proceedings of the 15th European Symposium on Computer Aided Process Engineering. Barcelona, Spain. May 29-June 1.

  • Makio, K., Tanaka, Y. & Uehara, K. (2005). Discovery of Skills from Motion Data. Tech Report.

  • Zuo, X. & Jin, X. (2005). Accurate Symbolization of Time Series. In proceedings of the 9th Pacific-Asia Conference on Knowledge Discovery and Data Mining. Hanoi, Vietnam. May 18-20.

  • Liu, Z., Yu, J. X., Lin, X., Lu, H. & Wang, W. (2005). Where Are the Motifs in Time-Series Data. In proceedings of the 9th Pacific-Asia Conference on Knowledge Discovery and Data Mining. Hanoi, Vietnam. May 18-20.

  • Megalooikonomou, V., Wang, Q., Li, G. & Faloutsos, C. (2005). Multiresolution Symbolic Representation of Time Series. In proceedings of the 21st IEEE International Conference on Data Engineering (ICDE). Tokyo, Japan. Apr 5-9.

  • Hetland, M. L. & Satrom, P. (2005). Evolutionary Rule Mining in Time Series Databases. Machine Learning.

  • Bagnall, A. & Janacek, G. (2005). Clustering Time Series with Clipped Data. Machine Learning.

  • Jalili, S. & Alipour, M. A. (2004). Incremental Relation Exploration within Urban Traffic Flows. In proceedings of the 6th International Conference on Applied Computational Intelligence. Blankenberghe, Belgium. Sept 1-3.

  • Nukoolkit, C. & Rattanamahawichai, S. (2004). Clustering and Similarity Matching of Time Series Data with Sequence Alignment. In proceedings of the 1st Thailand Computer Science Conference. Bangkok, Thailand. Dec 16-17. pp. 18-23.

  • Goebel, V. & Plagemann, T. (2004). Tutorial: Data Stream Management Systems (DSMS) - Applications, Concepts, and Systems. In proceedings of the 2nd International Workshop on Multimedia Interactive Protocols and Systems (MIPS). Grenoble, France. Nov 16-19.

  • Silvent, A., Dojat, M. & Garbay, C. (2004). Multi-level Temporal Abstraction for Medical Scenario Construction. International Journal of Adaptive Control and Signal Processing.

  • Fu, T. C., Chung, F. L., Luk, R. & Ng, C. M. (2004). Financial Time Series Indexing Based on Low Resolution Clustering. In proceedings of the Workshop on Temporal Data Mining: Algorithms, Theory and Applications, at the 4th IEEE International Conference on Data Mining (ICDM). Brighton, UK. Nov 1.

  • Duchene, F., Garbay, C. & Rialle, V. (2004). Mining Heterogeneous Multivariate Time-Series for Learning Meaningful Patterns: Application to Home Health Telecare. Laboratory TIMC-IMAG, Faculté de médecine de Grenoble, France.

  • Gaudin, R. & Nicoloyannis, N. (2004). Apprentissage non supervisé de séries temporelles à l'aide des k-Means et d'une nouvelle méthode d'agrégation de séries.

  • Tan, Z. & Tung, A. H. (2004). Substructure Clustering on Sequential 3d Object Datasets. In proceedings of the 20th International Conference on Data Engineering (ICDE). Boston, MA. Mar 30 - Apr 2.

  • Wu, Y. & Chang, E. Y. (2004). Distance Function Design and Fusion for Sequence Data. In proceedings of the 13th International Conference on Information and Knowledge Management (CIKM). Washington DC. Nov 8-13.

  • Kitaguchi, S. (2004). Extracting Feature based on Motif from a Chronic Hepatitis Dataset. In proceedings of the 18th Annual Conference of the Japanese Society for Artificial Intelligence (JSAI). Kanazawa, Japan. June 2-4.

  • Megalooikonomou, V., Li, G., Wang, Q. & Faloutsos, C. (2004). A Dimensionality Reduction Technique for Efficient Similarity Analysis of Time Series Databases. In proceedings of the 13th ACM Conference on Information and Knowledge Management (CIKM). Washington, D.C. Nov 8-13.

  • Denton, A. (2004). Density-Based Clustering of Time Series Subsequences. In proceedings of the 3rd Workshop on Mining Temporal and Sequential Data (TDM), in conjunction with the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Seattle, WA. Aug 22.

  • Tanaka, Y. & Uehara, K. (2004). Motif Discovery Algorithm from Motion Data. In proceedings of the 18th Annual Conference of the Japanese Society for Artificial Intelligence (JSAI). Kanazawa, Japan. June 2-4.

  • Zhang, H., Ho, T. B. & Lin, M. S. (2004). A Non-Parametric Wavelet Feature Extractor for Time-Series Classification. In proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD). May 26-28. Sydney, Australia.

  • Moerchen, F. & Ultsch, A. (2004). Mining Hierarchical Temporal Patterns in Multivariate Time Series. In proceedings of the 27th German Conference on Artificial Intelligence (KI). Sept 20-24. Ulm, Germany.

  • Morchen, F. & Ultsch, A. (2004). Discovering Temporal Knowledge in Multivariate Time Series.

  • Hofmann, U., Miloucheva, I., Pfeiffenberger, T. & Strohmeier, F. (2004). Active Monitoring Toolkit for Longterm QoS Analysis in Large Scale Internet. In proceedings of the 2nd International Workshop on Inter-Domain Performance and Simulation (IPS). March 22-23. Budapest, Hungary.

  • Rombo, S. & Terracina, G. (2004). Discovering Representative Models in Large Time Series Databases. In proceedings of the 6th International Conference On Flexible Query Answering Systems (FQAS 2004). June 24-26. Lyon, France. Lecture Notes in Computer Science, Springer-Verlag.

  • Udechukwu, A., Barker, K. & Alhajj, R. (2004). An Efficient Framework for Time Series Trend Mining. In proceedings of the 6th International Conference on Enterprise Information Systems (ICEIS 2004). April 14-17. Porto, Portugal.

  • Udechukwu, A., Barker, K. & Alhajj, R. (2004). Discovering All Frequent Trends in Time Series. In proceedings of the 2004 Winter International Symposium on Information and Communication Technologies (WISICT 2004). Jan 5-8. Cancun, Mexico.

  • Bagnall, A. J. & Janacek, G. (2004). Clustering Time Series from ARMA Models with Clipped Data. Technical Report CMP-C04-01. School of Computing Science, University of East Anglia. February.

  • Celly, B. & Zordan, V. B. (2004). Animated People Textures. In proceedings of the 17th International Conference on Computer Animation and Social Agents (CASA 2004). July 7-9. Geneva, Switzerland.

  • Sripada, S. G., Reiter, E., Hunter, J. & Yu, J. (2003). Generating English Summaries of Time Series Data using the Gricean Maxims. In proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. August 24-27. Washington, DC, USA.

  • Tanaka, Y. & Uehara, K. (2003). Discover Motifs in Multi Dimensional Time-Series Using the Principal Component Analysis and the MDL Principle. In proceedings of the 3rd International Conference on Machine Learning and Data Mining in Pattern Recognition. pp.252-265.

  • Ohsaki, M., Sato, Y., Yokoi, H. & Yamaguchi, T. (2003). A Rule Discovery Support System for Sequential Medical Data In the Case Study of a Chronic Hepatitis Dataset. ECML 2003.

  • Chen, L., Ozsu, T. & Oria, V. (2003). Symbolic Representation and Retrieval of Moving Object Trajectories. Technical Report CS-2003-30. University of Waterloo.

  • Chen, L. & Ozsu, T. (2003). Multi-Scale Histograms for Answering Queries over Time Series Data. In proceedings of the 20th International Conference on Data Engineering (ICDE). Mar 30 - Apr 2. Boston, MA.

  • Silvent, A. S., Garbay, C., Carry, P. Y. & Dojat, M. (2003). Data, Information and Knowledge for Medical Scenario Construction. In proceedings of the Intelligent Data Analysis In Medicine and Pharmacology Workshop (IDAMAP 2003). October. Protaras, Cyprus.

Contact Info

Page last updated: Apr 26, 2004