
Photo by Amos Nachoum
The UCR Matrix Profile Page
Funded by NSF IIS 1161997 II, IIS 1510741, CNS 1544969
observations of the magnetosphere collected by the Cassini spacecraft in orbit around Saturn.. ..in this case, the best-performing method was the Matrix Profile.. Kiri L. Wagstaff et al. NASA JPL, 2020.
(for an industrial IoT problem) Matrix Profiles perform well with almost no parameterisation needed. Anton et al ICDM 2018.
While there will never be a mathematical silver bullet, we have discovered that the Matrix Profile, a novel algorithm developed by the Keogh research group at UC-Riverside, is a powerful tool. (full post). Andrew Van Benschoten, lead engineer at Target.
If anybody has ever asked you to analyze time series data and to look for new insights then (the Matrix Profile) is definitely the open source tool that you'll want to add to your arsenal. Sean Law, Ameritrade. NABD 2019.
(for) intrusion detection in industrial network traffic, distances as calculated with Matrix Profiles rises significantly during the attacks. ..as a result, time series-based anomaly detection methods are capable of detecting deviations and anomalies. Schotten (2019).
The MatrixProfile technique is the state-of-the-art anomaly detection technique for continuous time series. Bart Goethals et al. (ECML-PKDD 2019).
Based on the concept of Matrix Profile ..without relying on time series synchronization.. the Railway Technologies Laboratory of Virginia Tech has been developing an automated onboard data analysis for the maintenance track system. Ahmadian et al. JRC2019
Matrix Profile is the state-of-the-art similarity-based outlier detection method. Christian Jensen et al. IJCAI-19
we use the exact method based on the Matrix Profile (to assess the effectiveness of therapy). Funkner et al. Procedia 2019.
Recently, a research group from UCR have proposed a powerful tool - the Matrix Profile (MP) as a primitive...(we use it for) fault detection. Jing Zhang et al. ICPHM 2019
Inspecting both graphs one can see that the matrix-profile algorithm was able to identify regions where there is a change on the power level over the observed band. F Lobao 2019.
RAMP builds upon an existing time series data analysis technique called Matrix Profile to detect anomalous distances...collected from scientific workflows in an online manner. Herath et al. IEEE Big Data 2019
Based on obtained results for the considered data set, matrix profiles turned out to be most suitable for the task of anomaly detection. Lohfink et al. VISSEC2019
The computation speed and exactness of the Matrix Profile make it a powerful tool and (our) results back this. Barry & Crane AICS 2019
(examining) manufacturing batches considering raw amperage (we found that the) Matrix Profile highlights anomalies Hillion & O'Connell of TIBCO Data Science. re:Invent 2019. [video]
we use the exact method based on the matrix profile to search for motifs (that) can be used to monitor the patient's condition, to assess the effectiveness of therapy or to assess the physician's actions. Funkner et al. YSCCS 2019
(The Matrix Profile is a) similarity join to measure the similarity between two given sequences. we opt for the median of the profile array as the representative distance (3D Dancing Move Synthesis from Music). Anh et al. IEEE Robotics and Automation Letters (video)
We were amazed by the power of MP and seek to incorporate it into our framework Ye and Ageno.
..adopting the concept of (the) Matrix Profile, we conduct the first attempt to.. J. Zuo et al. Big Data 2019
The accuracies obtained ...indicate that the Matrix Profile is useful for the task at hand instead of using the CNN features directly. Dhruv Batheja
To speed up online bad PMU data detection a fast discovery strategy is introduced based on (the Matrix Profile) Zhu and Hill.
Specifically, ALDI uses the matrix profile method to quantify the similarities of daily subsequences in time series meter data, Zoltan Nagy, Energy & Buildings (2020)
Our two-fold approach first leverages the Matrix Profile technique for time series data mining.. Nichiforov 2020.
the class of matrix profile algorithms.. ..is a promising approach, as it allows simplified post-processing and analysis steps by examining the resulting matrix profile structure. A. Raoofy et al.
We only require information about the time of several critical incidents to train our methods, as previously. To this end, we employ the Matrix Profile.. Bellas et al.
a matrix-profile based algorithm applied across all trajectory data against a validation set revealed four significant motifs which we defined as motif A, B, C and D.. Fernandez Alvarez 2020.
The main building block of this (game analytics) algorithm is the matrix profile, Saadat and Sukthankar AAAI2020
News:
- Happy to hear that Sean Law has released STUMPY, a powerful and scalable Python Matrix Profile library on GitHub. Sean is completely independent from the UCR team, but has been a great advocate of the MP.
- I am delighted to announce the creation of the Matrix Profile Foundation. This is an independent volunteer organization devoted to "..facilitate community awareness and adoption, we develop and maintain multiple open-source implementations of the Matrix Profile algorithms." Note that this organization is completely independent of Dr. Keogh's Lab. The only connection between the two groups is that they see sneak peeks at Matrix Profile 'works in progress'.
The Matrix Profile (and the algorithms to compute it: STAMP, STAMPI, STOMP, SCRIMP, SCRIMP++, SWAMP and GPU-STOMP) has the potential to revolutionize time series data mining because of its generality, versatility, simplicity and scalability. In particular, it has implications for time series motif discovery, time series joins, shapelet discovery (classification), density estimation, semantic segmentation, visualization, rule discovery, clustering, etc. (Note: for pure similarity search, we suggest you see MASS for Euclidean Distance, and the UCR Suite for DTW.)
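For readers who have not met the data structure before, here is a minimal brute-force sketch of what a (self-join) Matrix Profile contains: for every subsequence, the z-normalized Euclidean distance to its nearest non-trivial neighbor, plus that neighbor's position. This is not STAMP, STOMP or SCRIMP++ and is far slower than any of them; the function names, the exclusion-zone width of half the subsequence length, and the handling of constant subsequences are assumptions made for illustration.

```python
import math

def znorm(x):
    """Return a z-normalized copy of x (zero mean, unit standard deviation)."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    if sd == 0:
        # Assumption: a constant subsequence is mapped to all zeros.
        return [0.0] * len(x)
    return [(v - mu) / sd for v in x]

def naive_matrix_profile(ts, m):
    """Brute-force self-join matrix profile in O(n^2 * m) time.

    Returns (profile, index): profile[i] is the distance from subsequence i
    to its nearest neighbor, index[i] is where that neighbor starts.
    """
    k = len(ts) - m + 1                      # number of subsequences
    subs = [znorm(ts[i:i + m]) for i in range(k)]
    excl = max(1, m // 2)                    # assumed exclusion zone width
    profile = [math.inf] * k
    index = [-1] * k
    for i in range(k):
        for j in range(k):
            if abs(i - j) < excl:            # skip trivial self-matches
                continue
            d = math.dist(subs[i], subs[j])
            if d < profile[i]:
                profile[i], index[i] = d, j
    return profile, index

# Toy series (made up): the shape [0, 1, 3, 1, 0] occurs at both ends.
toy = [0, 1, 3, 1, 0, 5, 2, 7, 4, 9, 0, 1, 3, 1, 0]
profile, index = naive_matrix_profile(toy, 4)
print(list(zip(range(len(profile)), index)))
```

The quadratic double loop is exactly the work that the algorithms named above avoid repeating; the output format (one distance and one index per subsequence) is the same.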
Our overarching claim has three parts:
1) Given only the Matrix Profile, most time series data mining tasks are trivial.
2) The Matrix Profile can be computed very efficiently.
3) Algorithms that are built on top of the Matrix Profile inherit all its desirable properties. For example, we can use the Matrix Profile to find time series motifs. Because the Matrix Profile can be incrementally computed, we have the first incremental time series motif discovery algorithm. Because the Matrix Profile can be computed in an anytime fashion, we have the first anytime time series motif discovery algorithm. Because the Matrix Profile can leverage GPUs, we have..
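To make the first claim concrete, the sketch below reads the top motif pair and the top discord straight off an already computed Matrix Profile: the motif is at the minimum value and the discord at the maximum. The helper names and the small hard-coded profile are hypothetical, chosen only to keep the example self-contained.

```python
def top_motif(profile, index):
    """Return (i, j, distance) for the closest pair of subsequences."""
    i = min(range(len(profile)), key=profile.__getitem__)
    return i, index[i], profile[i]

def top_discord(profile):
    """Return (i, distance) for the subsequence farthest from its nearest neighbor."""
    i = max(range(len(profile)), key=profile.__getitem__)
    return i, profile[i]

# Hypothetical precomputed values, not taken from any real dataset.
profile = [0.4, 2.1, 3.7, 2.9, 0.4, 1.8]
index = [4, 5, 0, 1, 0, 1]

print(top_motif(profile, index))
print(top_discord(profile))
```

Neither helper looks at the raw time series, which is the point: once the profile exists, these tasks reduce to a single pass over one array.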
The advantages of using the Matrix Profile (over hashing, indexing, brute forcing a dimensionality reduced representation etc.) for most time series data mining tasks include:
- It is exact: For motif discovery, discord discovery, time series joins etc., the Matrix Profile based methods provide no false positives or false dismissals.
- It is simple and parameter-free: In contrast, the more general spatial access method algorithms typically require building and tuning spatial access methods and/or hash functions.
- It is space efficient: Matrix Profile construction algorithms require an inconsequential space overhead, just linear in the time series length with a small constant factor, allowing massive datasets to be processed in main memory.
- It allows anytime algorithms: While our exact algorithms are extremely scalable, for extremely large datasets we can compute the Matrix Profile in an anytime fashion, allowing ultra-fast approximate solutions.
- It is incrementally maintainable: Having computed the Matrix Profile for a dataset, we can incrementally update it very efficiently. In many domains this means we can effectively maintain exact joins/motifs/discords on streaming data forever.
- It does not require the user to set similarity/distance thresholds: For time series joins, the Matrix Profile provides full joins, eliminating the need to specify a similarity threshold, which is an unintuitive task for time series.
- It can leverage hardware: Matrix Profile construction is embarrassingly parallelizable, both on multicore processors and in distributed systems.
- It has time complexity that is constant in subsequence length: This is a very unusual and desirable property; all known time series join/motif/discord algorithms scale poorly as the subsequence length grows. In contrast, we have shown time series joins/motifs with subsequence lengths up to 100,000, at least two orders of magnitude longer than any other work we are aware of.
- It can be constructed in deterministic time: All join/motif/discord algorithms we are aware of can take radically different times to finish on two (even slightly) different datasets. In contrast, given only the length of the time series, we can precisely predict in advance how long it will take to compute the Matrix Profile.
- It can handle missing data: Even in the presence of missing data, we can provide answers which are guaranteed to have no false negatives.
Given all these features, the Matrix Profile has implications for many,
perhaps most, time series data mining tasks.
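The deterministic-time property listed above means run time can be extrapolated from a single calibration run. A minimal sketch, assuming an O(n^2) algorithm such as STOMP whose cost grows with the square of the number of subsequences; the function name and the calibration figures are hypothetical, not measurements from any real machine.

```python
def predict_seconds(n, m, ref_n, ref_m, ref_seconds):
    """Extrapolate run time from one timed calibration run.

    Assumes cost is proportional to the square of the number of
    subsequences (n - m + 1), independent of the data values.
    """
    k = n - m + 1              # subsequences in the target series
    ref_k = ref_n - ref_m + 1  # subsequences in the calibration series
    return ref_seconds * (k / ref_k) ** 2

# Hypothetical calibration: a series of length 1,099 took 1.5 seconds.
print(predict_seconds(n=2099, m=100, ref_n=1099, ref_m=100, ref_seconds=1.5))
```

Doubling the number of subsequences quadruples the estimate, whatever the data looks like; algorithms that rely on pruning or early abandoning offer no such guarantee.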
Papers:
- Matrix Profile I: All Pairs Similarity Joins for Time Series: A Unifying View that Includes Motifs, Discords and Shapelets. Chin-Chia Michael Yeh, Yan Zhu, Liudmila Ulanova, Nurjahan Begum, Yifei Ding, Hoang Anh Dau, Diego Furtado Silva, Abdullah Mueen, Eamonn Keogh (2016). IEEE ICDM 2016. [pdf] [slides]
- Matrix Profile II: Exploiting a Novel Algorithm and GPUs to break the one Hundred Million Barrier for Time Series Motifs and Joins. Yan Zhu, Zachary Zimmerman, Nader Shakibay Senobari, Chin-Chia Michael Yeh, Gareth Funning, Abdullah Mueen, Philip Brisk and Eamonn Keogh (2016). IEEE ICDM 2016. [pdf] [slides] Shortlisted for best paper award.
- Matrix Profile III: The Matrix Profile allows Visualization of Salient Subsequences in Massive Time Series. Chin-Chia Michael Yeh, Helga Van Herle, Eamonn Keogh (2016). IEEE ICDM 2016. [pdf] [slides] Supporting Page.
- Matrix Profile IV: Using Weakly Labeled Time Series to Predict Outcomes. Chin-Chia Michael Yeh, Nickolas Kavantzas and Eamonn Keogh. VLDB 2017 [pdf] Munich Germany.
- Matrix Profile V: A Generic Technique to Incorporate Domain Knowledge into Motif Discovery. Hoang Anh Dau and Eamonn Keogh. [pdf] KDD'17, Halifax, Canada.
- Matrix Profile VI: Meaningful Multidimensional Motif Discovery. Chin-Chia Michael Yeh, Nickolas Kavantzas, Eamonn Keogh. [pdf] ICDM 2017.
- Matrix Profile VII: Time Series Chains: A New Primitive for Time Series Data Mining. Yan Zhu, Makoto Imamura, Daniel Nikovski, and Eamonn Keogh. [pdf] ICDM 2017. Winner best paper award. [slides]
- Matrix Profile VIII: Domain Agnostic Online Semantic Segmentation at Superhuman Performance Levels. Shaghayegh Gharghabi, Yifei Ding, Chin-Chia Michael Yeh, Kaveh Kamgar, Liudmila Ulanova, and Eamonn Keogh. [pdf] ICDM 2017.
- Matrix Profile IX: Admissible Time Series Motif Discovery with Missing Data [temp link]. Yan Zhu, Abdullah Mueen and Eamonn Keogh. TKDE 2020.
- Matrix Profile X: VALMOD - Scalable Discovery of Variable-Length Motifs in Data Series. Michele Linardi, Yan Zhu, Themis Palpanas and Eamonn Keogh. SIGMOD 2018.
- Matrix Profile XI: SCRIMP++: Time Series Motif Discovery at Interactive Speed. Yan Zhu, Chin-Chia Michael Yeh, Zachary Zimmerman, Kaveh Kamgar and Eamonn Keogh. ICDM 2018. [pdf]
- Matrix Profile XII: MPdist: A Novel Time Series Distance Measure to Allow Data Mining in More Challenging Scenarios. Shaghayegh Gharghabi, Shima Imani, Anthony Bagnall, Amirali Darvishzadeh, Eamonn Keogh. ICDM 2018. [expanded version PDF]
- Matrix Profile XIII: Time Series Snippets: A New Primitive for Time Series Data Mining. Shima Imani, Frank Madrid, Wei Ding, Scott Crouter, Eamonn Keogh. IEEE Big Knowledge 2018. [expanded version PDF]
- Matrix Profile XIV: Scaling Time Series Motif Discovery with GPUs: Breaking the Quintillion Pairwise Comparisons a Day Barrier. SoCC 2019. [pdf] [This paper had an interesting history]
- Matrix Profile XV: Time Series Consensus Motifs: A New Primitive for Finding Repeated Structure in Time Series Sets. Kaveh Kamgar, Shaghayegh Gharghabi, and Eamonn Keogh (2019). IEEE ICDM 2019. [pdf]
- Matrix Profile XVI: Time Series Semantic Motifs: A New Primitive for Finding Higher-Level Structure in Time Series. Shima Imani and Eamonn Keogh (2019). IEEE ICDM 2019. [pdf]
- Matrix Profile XVII: Indexing the Matrix Profile to Allow Arbitrary Range Queries. Yan Zhu, Chin-Chia Michael Yeh, Zachary Zimmerman, Eamonn J. Keogh. ICDE 2020.
- Matrix Profile XVIII: Time Series Mining in the Face of Fast Moving Streams using a Learned Approximate Matrix Profile. Zachary Zimmerman, Nader Shakibay Senobari, Gareth Funning, Evangelos Papalexakis, Samet Oymak, Philip Brisk, and Eamonn Keogh (2019). IEEE ICDM 2019. [pdf]
- Matrix Profile XIX: Efficient and Effective Labeling of Massive Time Series Archives. Frank Madrid, Shailendra Singh, Quentin Chesnais, Kerry Mauck and Eamonn Keogh. DSAA 2019: International Conference on Data Science and Advanced Analytics [pdf].
- Matrix Profile XX: Finding and Visualizing Time Series Motifs of All Lengths using the Matrix Profile. Frank Madrid, Shima Imani, Ryan Mercer, Zachary Zimmerman, Nader Shakibay, Eamonn Keogh. IEEE Big Knowledge 2019 [pdf]
- Matrix Profile XXI: MERLIN: Parameter-Free Discovery of Arbitrary Length Anomalies in Massive Time Series Archives. Takaaki Nakamura, Makoto Imamura, Ryan Mercer and Eamonn Keogh. ICDM 2020 [pdf]
- Matrix Profile XXII: Exact Discovery of Time Series Motifs under DTW. Sara Alaee, Ryan Mercer, Kaveh Kamgar, Eamonn Keogh. ICDM 2020 [pdf]
- Time Series Joins, Motifs, Discords and Shapelets: a Unifying View that Exploits the Matrix Profile. Chin-Chia Michael Yeh, Yan Zhu, Liudmila Ulanova, Nurjahan Begum, Yifei Ding, Hoang Anh Dau, Zachary Zimmerman, Diego Furtado Silva, Abdullah Mueen, Eamonn Keogh. [pdf] Data Mining and Knowledge Discovery.
- Exploiting a Novel Algorithm and GPUs to Break the Ten Quadrillion Pairwise Comparisons Barrier for Time Series Motifs and Joins. Yan Zhu, Zachary Zimmerman, Nader Shakibay Senobari, Chin-Chia Michael Yeh, Gareth Funning, Abdullah Mueen, Philip Brisk and Eamonn Keogh. KAIS Journal [pdf]
- SiMPle: Assessing Music Similarity Using Subsequences. Diego Silva, Chin-Chia Michael Yeh, Gustavo Batista and Eamonn Keogh (2016). ISMIR 2016. [pdf]
- Fast Similarity Matrix Profile for Music Analysis and Exploration. Diego Furtado Silva, Chin-Chia Michael Yeh, Yan Zhu, Gustavo E. A. P. A. Batista, Eamonn J. Keogh (2017). IEEE Transactions on Multimedia [pdf].
- Time Series Chains: A Novel Tool for Time Series Data Mining. Yan Zhu, Makoto Imamura, Daniel Nikovski, and Eamonn Keogh. "Best of" track in IJCAI 2018.
- Introducing time series chains: a new primitive for time series data mining. Yan Zhu, Makoto Imamura, Daniel Nikovski, Eamonn J. Keogh: Knowl. Inf. Syst. 60(2): 1135-1161 (2019)
- VALMOD: A Suite for Easy and Exact Detection of Variable Length Motifs in Data Series (2018). Michele Linardi, Yan Zhu, Themis Palpanas, Eamonn J. Keogh: Demonstration track. SIGMOD Conference 2018: 1757-1760.
- Domain Agnostic Online Semantic Segmentation for Multi-Dimensional Time Series. Shaghayegh Gharghabi, Chin-Chia Michael Yeh, Yifei Ding, Wei Ding, Paul Hibbing, Samuel LaMunion, Andrew Kaplan, Scott E. Crouter, Eamonn Keogh. Data Mining and Knowledge Discovery.
- Super‐Efficient Cross‐Correlation (SEC‐C): A Fast Matched Filtering Code Suitable for Desktop Computers. NS Senobari, GJ Funning, E Keogh, Y Zhu, CCM Yeh, Z Zimmerman. (2018) Seismological Research Letters [pdf].
- The Swiss Army Knife of Time Series Data Mining: Ten Useful Things you can do with the Matrix Profile and Ten Lines of Code. Yan Zhu, Shaghayegh Gharghabi, Diego Furtado Silva, Hoang Anh Dau, Chin-Chia Michael Yeh, Nader Shakibay Senobari, Abdulaziz Almaslukh, Kaveh Kamgar, Zachary Zimmerman, Gareth Funning, Abdullah Mueen, Eamonn Keogh. Data Mining and Knowledge Discovery [pdf].
- Using the similarity Matrix Profile to investigate foreshock behavior of the 2004 Parkfield earthquake. Shakibay Senobari, N., Funning, G., Zimmerman, Z., Zhu, Y. and Keogh, E. (2018). American Geophysical Union [pdf]
- Discovering and Labeling Power System Events in Synchrophasor Data with Matrix Profile. Jie Shi, Nanpeng Yu, Eamonn Keogh, Heng (Kevin) Chen, Koji Yamashita. 2019 IEEE Sustainable Power & Energy Conference. Selected as excellent paper.
- Matrix profile goes MAD: variable-length motif and discord discovery in data series. Michele Linardi, Yan Zhu, Themis Palpanas, Eamonn J. Keogh: Data Min. Knowl. Discov. 34(4): 1022-1071 (2020)
- An ultra-fast time series distance measure to allow data mining in more complex real-world deployments. Shaghayegh Gharghabi, Shima Imani, Anthony J. Bagnall, Amirali Darvishzadeh, Eamonn J. Keogh: Data Min. Knowl. Discov. 34(4): 1104-1135 (2020)
Tutorials:
The First Matrix Profile Tutorial
Part 1: PPT or PDF
Part 2: PPT or PDF
100 Time Series Data Mining Questions (with Answers!)
Just the PDF
Code and Data in a ZIP
This document contains one hundred simple time series questions, such as "Have we ever seen a pattern that looks just like this?" and "Is there any pattern that is common to these two time series?", with worked examples of the answers and all the (simple!) code and data.
Code by the Community: Here we list some MP implementations by others. Naturally, we take neither credit nor blame for any of this work. Implementations are available in Python, R, and Golang. Sean Law created an open-source, distributed and multicore Python library. Check out the Matrix Profile Foundation. There are also parallel implementations of the Matrix Profile SCRIMP++ algorithm for high performance computing clusters based on MPI.
Code by the UCR Team:
GPU/CPU Code: This is the SCAMP source code on GitHub. The fastest matrix profile code on the planet.
Matlab Code: Version 3.0 (Faster code is available, see 100 Questions above. But we strongly recommend you start with the below).
Please note this code is not STAMP or STOMP, but SCRIMP++ (which appears in Matrix Profile XI), which is as fast as STOMP, but also an anytime algorithm. The code is wrapped inside a simple Matlab GUI (that adds a lot of time overhead), to allow non-specialists to interact with their data. If you are writing a paper, please do not compare timing results to this version, it would not be fair to us (contact us for the faster, but less user friendly code).
First read this, then download this data (some penguin data) and download the code into your Matlab path:
At the command line, type...
>> load MP_first_test_penguin_sample
>> [matrixProfile, profileIndex, motifIndex, discordIndex] = interactiveMatrixProfileVer3_website(smooth(penguin_sample,10), 800);
As you can see from the image below, even though the data has 109,043 datapoints, and the subsequence is high dimensional (800 datapoints), in about a second we have already found some very interesting motifs: The first is a valley that happens during a dive.
Happy motif hunting, from the Matrix Profile team.
