Massive time series correlation similarity computation
Abstract:
A system, computer program product, and method are described for computing a correlation matrix over a massive number of time series, given a specified threshold ε. Correlations greater than ε do not need to be computed. A distance tree, which organizes the time series by their correlation estimates, is constructed and used. The correlation similarity is computed with a MapReduce function that takes advantage of the distance tree. The MapReduce computation is efficient: little I/O is wasted on overlapped partitions, the load is balanced under uneven data distribution, and unnecessary computation is pruned early. The approach does not rely on dimensionality reduction or on coordinates.
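One way to see how correlation estimates can prune work without coordinates or dimensionality reduction is a pivot-based bound. The following is a minimal Python sketch, not the method described above: it assumes each series is z-normalized to zero mean and unit norm, so that Euclidean distance d and Pearson correlation r satisfy d² = 2(1 − r), and it uses a single hypothetical pivot series in place of the distance tree. The triangle inequality then brackets the correlation of a pair using only its two pivot distances. All function names are illustrative, and the skip rule follows the abstract's statement that correlations greater than ε need not be computed.

```python
import math


def znorm(series):
    """Center a series and scale it to unit Euclidean norm."""
    mean = sum(series) / len(series)
    centered = [v - mean for v in series]
    norm = math.sqrt(sum(v * v for v in centered))
    if norm == 0.0:
        raise ValueError("a constant series has no defined correlation")
    return [v / norm for v in centered]


def distance(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def correlation(a, b):
    """Pearson correlation of two z-normalized series (their dot product)."""
    return sum(x * y for x, y in zip(a, b))


def correlation_bounds(d_xp, d_yp):
    """Bracket corr(x, y) from the distances of x and y to a shared pivot p.

    Triangle inequality: |d_xp - d_yp| <= d_xy <= d_xp + d_yp.
    With d^2 = 2 * (1 - r), a larger distance means a smaller correlation.
    """
    d_lo = abs(d_xp - d_yp)
    d_hi = min(d_xp + d_yp, 2.0)  # unit vectors are at most 2 apart
    r_lo = 1.0 - d_hi ** 2 / 2.0
    r_hi = 1.0 - d_lo ** 2 / 2.0
    return r_lo, r_hi


def can_skip(d_xp, d_yp, eps):
    """Assumed skip rule: the pair's correlation is provably above eps."""
    r_lo, _ = correlation_bounds(d_xp, d_yp)
    return r_lo > eps
```

In this sketch the pivot distances are computed once per series and reused for every pair, so a pair can be settled from two stored numbers before any full-length comparison is made. A tree of pivots would apply the same bound to whole groups of series at once.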