TY - JOUR
T1 - Ranking and significance of variable-length similarity-based time series motifs
AU - Serrà, J
AU - Serra, I
AU - Corral, A
AU - Arcos, JL
PY - 2016/8/15
Y1 - 2016/8/15
N2 - The detection of very similar patterns in a time series, commonly called motifs, has received continuous and increasing attention from diverse scientific communities. In particular, recent approaches for discovering similar motifs of different lengths have been proposed. In this work, we show that such variable-length similarity-based motifs cannot be directly compared, and hence ranked, by their normalized dissimilarities. Specifically, we find that length-normalized motif dissimilarities still have intrinsic dependencies on the motif length, and that the lowest dissimilarities are particularly affected by this dependency. Moreover, we find that such dependencies are generally non-linear and vary with the data set and dissimilarity measure considered. Based on these findings, we propose a solution to rank (previously obtained) motifs of different lengths and measure their significance. This solution relies on a compact but accurate model of the dissimilarity space, using a beta distribution with three parameters that depend on the motif length in a non-linear way. We believe the incomparability of variable-length dissimilarities could have an impact beyond the field of time series, and that modeling strategies similar to the one used here could help in a broader context and in diverse application scenarios.
KW - Beta distribution
KW - Distance modeling
KW - Motif ranking
KW - Time series
UR - https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=uab_pure&SrcAuth=WosAPI&KeyUT=WOS:000374811000035&DestLinkType=FullRecord&DestApp=WOS_CPL
U2 - 10.1016/j.eswa.2016.02.026
DO - 10.1016/j.eswa.2016.02.026
M3 - Article
SN - 0957-4174
VL - 55
SP - 452
EP - 460
JO - Expert Systems with Applications
JF - Expert Systems with Applications
ER -