DISTRIBUTED DATABASE JOB DATA SKEW DETECTION

    Publication No.: EP3375140A1

    Publication date: 2018-09-19

    Application No.: EP15908108.2

    Filing date: 2015-11-13

    Applicant: eBay Inc.

    IPC classification: H04L12/24

    Abstract: A system and method for identifying whether data skew is causing delays in a map phase and/or a reduce phase of a query of a distributed database. The system and method identify the values of various metrics relating to a database query. These metrics include map phase and reduce phase durations and various related metrics. The system and method gather statistics of multiple queries to determine correlation levels between the metrics and the map phase and reduce phase durations. Based on the statistics, the system and method determine whether one or both of the map and reduce phases for a query/response are taking longer than expected. If the durations are longer than expected, the system identifies the delay as caused by data skew and informs the originator of the query.
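
    The flow described in the abstract can be illustrated with a minimal sketch. The abstract does not name the metrics, the statistical model, or any thresholds, so everything below is an assumption made for illustration: a single hypothetical metric (for example, input record count) paired with a phase duration, a Pearson correlation as the "correlation level", a least-squares line as the estimate of the expected duration, and the hypothetical parameters `min_corr` and `tolerance`.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between a metric and a phase duration."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / sqrt(vx * vy)


def fit_line(xs, ys):
    """Least-squares slope and intercept of duration as a function of the metric."""
    mx, my = mean(xs), mean(ys)
    vx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / vx
    return slope, my - slope * mx


def is_skew_suspected(history, metric_value, observed_duration,
                      min_corr=0.8, tolerance=1.5):
    """Flag a phase whose duration is well above what past queries predict.

    history: (metric, duration) pairs gathered from earlier queries.
    min_corr and tolerance are illustrative thresholds, not values from the patent.
    """
    xs = [m for m, _ in history]
    ys = [d for _, d in history]
    # Only trust the prediction when the metric tracks the duration closely.
    if abs(pearson(xs, ys)) < min_corr:
        return False
    slope, intercept = fit_line(xs, ys)
    expected = slope * metric_value + intercept
    return observed_duration > tolerance * expected


# Hypothetical statistics: (input records in thousands, map phase seconds).
history = [(100, 10.0), (200, 20.0), (300, 30.0), (400, 40.0)]
if is_skew_suspected(history, metric_value=250, observed_duration=60.0):
    print("Map phase slower than expected; possible data skew. Notify the query originator.")
```

    The same check could be run once for the map phase and once for the reduce phase, each with its own history. A duration that is merely close to the prediction is not flagged; only a duration that exceeds the expected value by the chosen tolerance is attributed to skew in this sketch.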