Photo by Amos Nachoum

The UCR Matrix Profile Page

Funded by NSF IIS 1161997 II, IIS 1510741, CNS 1544969

"(for an industrial IoT problem) Matrix Profiles perform well with almost no parameterisation needed." Anton et al ICDM 2018.
"While there will never be a mathematical silver bullet, we have discovered that the Matrix Profile, a novel algorithm developed by the Keogh research group at UC-Riverside, is a powerful tool." (full post). Andrew Van Benschoten, lead engineer at Target.
"If anybody has ever asked you to analyze time series data and to look for new insights then (the Matrix Profile) is definitely the open source tool that you'll want to add to your arsenal" Sean Law, Ameritrade. NABD 2019.
"(for) intrusion detection in industrial network traffic, distances as calculated with Matrix Profiles rises significantly during the attacks. As a result, time series-based anomaly detection methods are capable of detecting deviations and anomalies." Schotten (2019).
"The MatrixProfile technique is the state-of-the-art anomaly detection technique for continuous time series." Bart Goethals et al. (ECML-PKDD 2019).
"Based on the concept of Matrix Profile ..without relying on time series synchronization.. the Railway Technologies Laboratory of Virginia Tech has been developing an automated onboard data analysis for the maintenance track system". Ahmadian et al. JRC2019
"Matrix Profile is the state-of-the-art similarity-based outlier detection method". Christian Jensen et al. IJCAI-19
"we use the exact method based on the Matrix Profile (to assess the effectiveness of therapy)" Funkner et al Procedia 2019.
"Recently, a research group from UCR have proposed a powerful tool - the Matrix Profile (MP) as a primitive...(we use it for) fault detection" Jing Zhang et al. ICPHM 2019
"Inspecting both graphs one can see that the matrix-profile algorithm was able to identify regions where there is a change on the power level over the observed band." F Lobao 2019.
"RAMP builds upon an existing time series data analysis technique called Matrix Profile to detect anomalous distances...collected from scientific workflows in an online manner." Herath et al. IEEE Big Data 2019
"Based on obtained results for the considered data set, matrix profiles turned out to be most suitable for the task of anomaly detection" Lohfink et al. VISSEC2019
"The computation speed and exactness of the Matrix Profile make it a powerful tool and (our) results back this." Barry & Crane AICS 2019
"(examining) manufacturing batches considering raw amperage (we found that the) Matrix Profile highlights anomalies" Hillion & O'Connell of TIBCO Data Science. re:Invent 2019. [video]
"we use the exact method based on the matrix profile to search for motifs (that) can be used to monitor the patient's condition, to assess the effectiveness of therapy or to assess the physician's actions". Funkner et al. YSCCS 2019


The Matrix Profile (and the algorithms to compute it: STAMP, STAMPI, STOMP, SCRIMP, SCRIMP++ and GPU-STOMP) has the potential to revolutionize time series data mining because of its generality, versatility, simplicity and scalability. In particular, it has implications for time series motif discovery, time series joins, shapelet discovery (classification), density estimation, semantic segmentation, visualization, rule discovery, clustering, etc. (Note: for pure similarity search, we suggest you see MASS for Euclidean Distance, and the UCR Suite for DTW.)

Our overarching claim has three parts:

1) Given only the Matrix Profile, most time series data mining tasks are trivial.
2) The Matrix Profile can be computed very efficiently.
3) Algorithms that are built on top of the Matrix Profile inherit all its desirable properties. For example, we can use the Matrix Profile to find time series motifs. Because the Matrix Profile can be incrementally computed, we have the first incremental time series motif discovery algorithm. Because the Matrix Profile can be computed in an anytime fashion, we have the first anytime time series motif discovery algorithm. Because the Matrix Profile can leverage GPUs, we have..
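
To make the first claim concrete, here is a minimal Python sketch (not the UCR code, and not STAMP, STOMP or SCRIMP++). It assumes the usual z-normalized Euclidean distance and an exclusion zone of half the subsequence length to skip trivial matches. The names znorm and matrix_profile are hypothetical, and the brute-force loop is far slower than the algorithms named above. Once the profile exists, the top motif is simply its minimum and the top discord its maximum.

```python
import math

def znorm(seq):
    """Z-normalize a subsequence; a constant subsequence maps to all zeros."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in seq]

def matrix_profile(ts, m):
    """Brute-force self-join, O(n^2 * m) time. For illustration only."""
    n_sub = len(ts) - m + 1
    subs = [znorm(ts[i:i + m]) for i in range(n_sub)]
    excl = max(1, m // 2)  # assumed exclusion-zone width (skips trivial matches)
    profile = [math.inf] * n_sub  # distance to each subsequence's nearest neighbor
    index = [-1] * n_sub          # position of that nearest neighbor
    for i in range(n_sub):
        for j in range(n_sub):
            if abs(i - j) < excl:
                continue
            d = math.dist(subs[i], subs[j])
            if d < profile[i]:
                profile[i], index[i] = d, j
    return profile, index

# Toy data (made up): a sine wave with period 16 and a flattened stretch at 80..87.
ts = [math.sin(2 * math.pi * i / 16) for i in range(160)]
for i in range(80, 88):
    ts[i] = 0.0

profile, index = matrix_profile(ts, m=16)
motif = min(range(len(profile)), key=profile.__getitem__)    # best-matching pair
discord = max(range(len(profile)), key=profile.__getitem__)  # most unusual subsequence
print("motif pair:", motif, index[motif])
print("discord starts at:", discord)
```

On this toy series the discord falls on a subsequence that overlaps the flattened stretch, while the motif pair is two repeats of the sine pattern.
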
The advantages of using the Matrix Profile (over hashing, indexing, brute forcing a dimensionality reduced representation etc.) for most time series data mining tasks include:

  1. It is exact: For motif discovery, discord discovery, time series joins etc., the Matrix Profile based methods provide no false positives or false dismissals.
  2. It is simple and parameter-free: In contrast, the more general spatial access method algorithms typically require building and tuning spatial access methods and/or hash functions.
  3. It is space efficient: Matrix Profile construction algorithms require an inconsequential space overhead, just linear in the time series length with a small constant factor, allowing massive datasets to be processed in main memory.
  4. It allows anytime algorithms: While our exact algorithms are extremely scalable, for extremely large datasets we can compute the Matrix Profile in an anytime fashion, allowing ultra-fast approximate solutions.
  5. It is incrementally maintainable: Having computed the Matrix Profile for a dataset, we can incrementally update it very efficiently. In many domains this means we can effectively maintain exact joins/motifs/discords on streaming data forever.
  6. It does not require the user to set similarity/distance thresholds: For time series joins, the Matrix Profile provides full joins, eliminating the need to specify a similarity threshold, which is an unintuitive task for time series.
  7. It can leverage hardware: Matrix Profile construction is embarrassingly parallelizable, both on multicore processors and in distributed systems.
  8. It has time complexity that is constant in subsequence length: This is a very unusual and desirable property; all known time series join/motif/discord algorithms scale poorly as the subsequence length grows. In contrast, we have shown time series joins/motifs with subsequence lengths up to 100,000, at least two orders of magnitude longer than any other work we are aware of.
  9. It can be constructed in deterministic time: All join/motif/discord algorithms we are aware of can take radically different times to finish on two (even slightly) different datasets. In contrast, given only the length of the time series, we can precisely predict in advance how long it will take to compute the Matrix Profile.
  10. It can handle missing data: Even in the presence of missing data, we can provide answers which are guaranteed to have no false negatives.

Given all these features, the Matrix Profile has implications for many, perhaps most, time series data mining tasks.
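
As a rough illustration of the incremental-maintenance property (item 5), the Python sketch below updates a profile as each new point arrives. It is not STAMPI: it is a naive version that costs O(n * m) per new point. It again assumes z-normalized Euclidean distance and an exclusion zone of half the subsequence length, and the class name StreamingProfile is hypothetical. The idea is that a new point creates exactly one new subsequence, so only the distances to that subsequence need to be computed. They can improve older entries, and their minimum becomes the new entry.

```python
import math

def znorm(seq):
    """Z-normalize a subsequence; a constant subsequence maps to all zeros."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in seq]

class StreamingProfile:
    """Naive incremental matrix profile: O(n * m) work per new point."""

    def __init__(self, m):
        self.m = m
        self.excl = max(1, m // 2)  # assumed exclusion-zone width
        self.ts = []                # raw points seen so far
        self.subs = []              # z-normalized subsequences
        self.profile = []           # nearest-neighbor distances
        self.index = []             # nearest-neighbor positions

    def append(self, x):
        self.ts.append(x)
        new = len(self.ts) - self.m  # start of the newest subsequence
        if new < 0:
            return                   # fewer than m points so far
        q = znorm(self.ts[new:])
        best, best_j = math.inf, -1
        # Compare the new subsequence with every older one outside the exclusion zone.
        for j in range(new - self.excl + 1):
            d = math.dist(q, self.subs[j])
            if d < self.profile[j]:  # the newcomer is j's new nearest neighbor
                self.profile[j], self.index[j] = d, new
            if d < best:
                best, best_j = d, j
        self.subs.append(q)
        self.profile.append(best)
        self.index.append(best_j)

# Toy stream (made up): two superimposed sine waves, fed one point at a time.
ts = [math.sin(2 * math.pi * i / 16) + 0.5 * math.sin(0.37 * i) for i in range(96)]
sp = StreamingProfile(m=16)
for x in ts:
    sp.append(x)
print(len(sp.profile))  # one entry per subsequence seen so far
```

After every append, the stored profile matches what a batch recomputation over all points seen so far would give, so motifs and discords can be read off at any time.
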


The First Matrix Profile Tutorial

Part 1: PPT or PDF
Part 2: PPT or PDF

MP Tutorial

100 Time Series Data Mining Questions (with Answers!)
Just the PDF
Code and Data in a ZIP

This document contains one hundred simple time series questions, such as "Have we ever seen a pattern that looks just like this?" and "Is there any pattern that is common to these two time series?", with worked examples of the answers, and with all (simple!) code and data.

100 questions


Code by the Community: Here we list some MP implementations by others. Naturally, we take neither credit nor blame for any of this work. Python, R, Golang. Sean Law created and open-sourced a distributed and multicore Python library. Check out the Matrix Profile Foundation. There are also parallel implementations of the Matrix Profile SCRIMP++ algorithm for high performance computing clusters, based on MPI.

Code by the UCR Team:

GPU/CPU Code: This is the SCAMP source code on GitHub. The fastest matrix profile code on the planet.

Matlab Code: Version 3.0    
(Faster code is available, see 100 Questions above. But we strongly recommend you start with the code below.)

Please note this code is not STAMP or STOMP, but SCRIMP++ (which appears in Matrix Profile XI), which is as fast as STOMP but is also an anytime algorithm. The code is wrapped inside a simple Matlab GUI (that adds a lot of time overhead) to allow non-specialists to interact with their data. If you are writing a paper, please do not compare timing results against this version; it would not be fair to us (contact us for the faster, but less user-friendly, code).

First read this, then download this data (some penguin data) and download the code into your Matlab path:
At the command line, type...

>> load MP_first_test_penguin_sample
>> [matrixProfile, profileIndex, motifIndex, discordIndex] = interactiveMatrixProfileVer3_website(smooth(penguin_sample,10) ,800);

As you can see from the image below, even though the data has 109,043 datapoints, and the subsequence is high dimensional (800 datapoints), in about a second we have already found some very interesting motifs:
The first is a valley that happens during a dive.

Happy motif hunting, from the Matrix Profile team

GUI of Stamp Tool