A Practical Time-Series Tutorial with Matlab

PKDD 2005

 

Michalis Vlachos

IBM T.J. Watson Research Center, Hawthorne, NY 10532

Tutorial Homepage: http://www.cs.ucr.edu/~mvlachos/PKDD05

 

Slides in PDF and PowerPoint format:

 

If you find the above slides useful and/or use them in the classroom,
I would appreciate an email with your comments.

 

Abstract:

Time-series are among the most prevalent forms of data in scientific and financial fields. Examples include industrial and environmental measurements, medical monitoring, and stock market analysis. However, to search and explore the ever-increasing amount of collected data efficiently, one needs intelligent techniques for data compression/representation, data organization/pruning, and similarity characterization. This tutorial will demonstrate how these tasks can be performed in Matlab, a programming language and software environment that is readily available at many academic institutions.

 

The tutorial will consist of two parts. The first part will cover the basics of the Matlab programming language and environment. The second part will demonstrate how to use Matlab for time-series analysis and matching, covering both rudimentary and advanced methods. Influential and state-of-the-art techniques from recent data-mining and database conferences will also be explained. Topics that will be addressed include:

 

  • Time-Series representations (Fourier, Wavelet, SVD, Symbolic)
  • Distance Functions and Lower Bounding (Euclidean, Time-Warping)
  • Clustering/Classification/Visualization (NN, Dendrograms, kMeans, etc.)
  • Test Cases and Applications

 

By the end of the tutorial, attendees will have a basic understanding of the Matlab language and of how it can be applied to solve various time-series analysis and matching problems.

 

 

Who should attend?

The goal of this tutorial is to convey both basic and advanced time-series and data-mining techniques. It therefore addresses a wide audience, including:

  • Graduate and Undergraduate Students

  • Data Mining Researchers/Educators

  • Industry Developers

 


Outline of the Tutorial

 

Part A. Matlab Intro

  • Introduction
      o Matlab Environment
      o Script Execution
  • Basics
      o Data Structures (arrays and cells)
      o Array Operations
      o Loop Structures/Conditional Structures
      o Vectorization
  • Input/Output (reading/writing files)
  • Debugging
  • Visualization
      o 2D, 3D plots
      o Exporting figures (eps, jpg, etc.)
      o Advanced techniques (LaTeX, etc.)

 

Part B. Time-Series Analysis & Matching

  • Introduction and Motivation
      o Why time-series are important. Applications.
  • Representation
      o Fourier
      o Wavelets
      o APCA
      o Piecewise linear
      o Singular Value Decomposition
      o Symbolic Approximations
      o Bounding Boxes (application for multiple dimensions)
  • Distance Functions
      o Euclidean (2 vector version, multi-vector optimization)
          ▪ Parseval's Theorem
          ▪ Euclidean in Time and Frequency
      o Correlation
      o Warping
      o LCSS
  • Lower Bounding
      o Speeding up Euclidean
      o Speeding up LCSS, DTW
  • Clustering/Classification/Visualization
      o Nearest Neighbor Classifier
      o Dendrograms
      o kMeans
  • Test Cases/Applications