DISTRIBUTED DATABASE JOB DATA SKEW DETECTION

    Publication No.: EP3375140A1

    Publication Date: 2018-09-19

    Application No.: EP15908108.2

    Application Date: 2015-11-13

    Applicant: eBay Inc.

    IPC Classification: H04L12/24

    Abstract: A system and method for identifying whether data skew is causing delays in a map phase and/or a reduce phase of a query of a distributed database. The system and method identify the values of various metrics relating to a database query, including the map phase and reduce phase durations and metrics related to them. The system and method gather statistics across multiple queries to determine how strongly each metric correlates with the map phase and reduce phase durations. Based on these statistics, the system and method determine whether one or both of the map and reduce phases for a query/response are taking longer than expected. If a duration is longer than expected, the system identifies the delay as caused by data skew and informs the originator of the query.
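
    The abstract does not specify which metrics are used, how correlation is computed, or what counts as "longer than expected". The following is a minimal sketch of the general idea under stated assumptions: Pearson correlation selects the metric most correlated with phase duration, a least-squares line gives the expected duration, and a fixed tolerance factor flags the query. The names `detect_skew`, `map_duration_s`, `input_gb`, `tolerance`, and `min_corr` are hypothetical and do not come from the patent.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept (xs must not be constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    slope = cov / vx
    return slope, my - slope * mx


def detect_skew(history, query, metric_names,
                duration_key="map_duration_s", tolerance=1.5, min_corr=0.7):
    """Flag a query whose phase duration exceeds what past queries predict.

    history: list of dicts, one per past query (assumed data layout).
    tolerance and min_corr are illustrative thresholds, not values from the patent.
    """
    durations = [h[duration_key] for h in history]

    # Pick the metric whose values correlate most strongly with the duration.
    best_name, best_corr = None, 0.0
    for name in metric_names:
        r = pearson([h[name] for h in history], durations)
        if abs(r) > abs(best_corr):
            best_name, best_corr = name, r

    # Without a sufficiently correlated metric there is no basis for an expectation.
    if best_name is None or abs(best_corr) < min_corr:
        return {"metric": None, "expected": None, "skew_suspected": False}

    slope, intercept = fit_line([h[best_name] for h in history], durations)
    expected = slope * query[best_name] + intercept
    return {
        "metric": best_name,
        "expected": expected,
        "skew_suspected": query[duration_key] > tolerance * expected,
    }


# Past queries: duration grows with input size; mapper count is constant.
history = [
    {"input_gb": g, "mappers": 4, "map_duration_s": 10.0 * g}
    for g in (1, 2, 3, 4, 5)
]
slow = detect_skew(history, {"input_gb": 3, "mappers": 4, "map_duration_s": 90.0},
                   ["input_gb", "mappers"])
normal = detect_skew(history, {"input_gb": 3, "mappers": 4, "map_duration_s": 32.0},
                     ["input_gb", "mappers"])
```

    The same function can be applied to the reduce phase by passing a different `duration_key`. A real system would likely use more than one metric and a statistical bound rather than a fixed multiplier.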

NEAR REAL-TIME FEATURE SIMULATION FOR ONLINE/OFFLINE POINT-IN-TIME DATA PARITY

    Publication No.: EP4446950A1

    Publication Date: 2024-10-16

    Application No.: EP24166265.9

    Application Date: 2024-03-26

    Applicant: eBay Inc.

    IPC Classification: G06N20/00

    Abstract: Near real-time feature simulation for online/offline point-in-time data parity is described. A computing device may assign, to respective events from a series of events, a series of time stamps associated with a near real-time (NRT) variable. Based on the series of time stamps, the computing device may simulate the delay latency associated with processing the respective events via an online processing environment. The computing device may provide the series of events and the simulated delay latency to a machine-learning model configured to model an outcome of the series of events using the simulated delay latency.
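
    The abstract does not say how the delay is simulated or how the time stamps are consumed. The sketch below illustrates one plausible reading under stated assumptions: each event receives an "available at" time stamp equal to its event time plus a seeded, uniformly sampled delay, and an offline feature (here a simple event count) is computed using only events that would already have been visible online at the query time. The names `Event`, `assign_nrt_timestamps`, `count_visible`, `mean_delay_s`, and `jitter_s` are hypothetical, and the uniform delay distribution is an assumption chosen for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Event:
    event_id: str
    event_time: float              # when the event occurred (seconds)
    nrt_available_at: float = 0.0  # simulated time the online system would see it


def assign_nrt_timestamps(events, mean_delay_s=5.0, jitter_s=2.0, seed=0):
    """Return copies of the events stamped with a simulated availability time.

    The delay is drawn uniformly from [mean - jitter, mean + jitter] and clamped
    at zero; the distribution is an assumption made for this sketch.
    """
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    stamped = []
    for e in events:
        delay = max(0.0, rng.uniform(mean_delay_s - jitter_s, mean_delay_s + jitter_s))
        stamped.append(Event(e.event_id, e.event_time, e.event_time + delay))
    return stamped


def count_visible(stamped, as_of):
    """Point-in-time feature: number of events the online path would have seen by as_of."""
    return sum(1 for e in stamped if e.nrt_available_at <= as_of)


def count_naive(events, as_of):
    """Offline feature that ignores processing delay (the source of train/serve mismatch)."""
    return sum(1 for e in events if e.event_time <= as_of)


raw = [Event("a", 0.0), Event("b", 10.0), Event("c", 20.0)]
stamped = assign_nrt_timestamps(raw)

# Training rows pairing each event with its simulated delay, as a model input.
rows = [(e.event_id, e.nrt_available_at - e.event_time) for e in stamped]

naive_at_12 = count_naive(raw, as_of=12.0)
simulated_at_12 = count_visible(stamped, as_of=12.0)
```

    At `as_of=12.0` the naive count includes event "b", which occurred at 10.0, while the simulated count excludes it because its delay of at least 3 seconds means it would not yet have been processed online. Training on the simulated value keeps the offline feature closer to what the online environment would serve.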