PUNCTUATION CONTROLLED MACHINE LEARNING MODEL TEMPORAL VALIDITY

    公开(公告)号:US20210182619A1

    公开(公告)日:2021-06-17

    申请号:US16713761

    申请日:2019-12-13

    IPC分类号: G06K9/62 G06F16/907 G06N20/00

    摘要: Data events of an event stream are processed in accordance with temporally valid machine learning models. A streaming node may receive data events via an event stream. Each data event may be associated with a timestamp. The streaming node may also utilize punctuation events that specify the temporal validity of available machine learning models. The streaming node performs a temporal join operation for each data event based on its timestamp and the temporal validity. If the data event's timestamp is less than or equal to the punctuation event's timestamp, the data event is provided to the temporally valid machine learning model for processing thereby. If the data event's timestamp is greater than the punctuation event's timestamp, the data event is held until a subsequent punctuation event specifying a later timestamp is received.

    Dynamic scaling for data processing streaming system

    公开(公告)号:US11095522B2

    公开(公告)日:2021-08-17

    申请号:US16547399

    申请日:2019-08-21

    IPC分类号: H04L29/06 H04L12/24

    摘要: Described herein is a system and method for dynamically scaling a stream processing system (e.g., “exactly once” data stream processing system). Various parameter(s) (e.g., user-configurable capacity, real-time load metrics, and/or performance counters) can be used to dynamically scale in and/or scale out the “exactly once” stream processing system without system restart. Delay introduced by this scaling operation can be minimized by utilizing a combination of mutable process topology (which can dynamically assign certain parts of the system to a new host machine) and controllable streaming processor movement with checkpoints and the streaming protocol controlled recovery which still enforces the “exactly once” delivery metric.

    Coded stream processing
    3.
    发明授权

    公开(公告)号:US10998919B2

    公开(公告)日:2021-05-04

    申请号:US16590461

    申请日:2019-10-02

    IPC分类号: H03M13/09 G06F9/54

    摘要: Described herein is a system and method for coded streaming data to facilitate recovery from failed or slow processor(s). A batch of processing stream data can be partitioned into a plurality of data chunks. Parity chunk(s) for the plurality of data chunks. The plurality of data chunks and the parity chunk(s) can be provided to processors for processing. Processed data of at least some (e.g., one or more) of the plurality of data chunks, and, processed data of parity chunk(s) are received. When it is determined that processed data for a pre-defined quantity of data chunks has not been received by a pre-defined period of time, the processed data for particular data chunk(s) of particular processor(s) from which processed data has not been received are determined based, at least in part, upon the received processed parity chunk(s) and the received processed data chunk(s).

    Coded stream processing
    4.
    发明授权

    公开(公告)号:US11496153B2

    公开(公告)日:2022-11-08

    申请号:US17222576

    申请日:2021-04-05

    IPC分类号: H03M13/09 G06F9/54

    摘要: Described herein is a system and method for coded streaming data to facilitate recovery from failed or slow processor(s). A batch of processing stream data can be partitioned into a plurality of data chunks. Parity chunk(s) for the plurality of data chunks. The plurality of data chunks and the parity chunk(s) can be provided to processors for processing. Processed data of at least some (e.g., one or more) of the plurality of data chunks, and, processed data of parity chunk(s) are received. When it is determined that processed data for a pre-defined quantity of data chunks has not been received by a pre-defined period of time, the processed data for particular data chunk(s) of particular processor(s) from which processed data has not been received are determined based, at least in part, upon the received processed parity chunk(s) and the received processed data chunk(s).

    Journaling of streaming anchor resource(s)

    公开(公告)号:US11226966B2

    公开(公告)日:2022-01-18

    申请号:US16590909

    申请日:2019-10-02

    摘要: Described herein is a system and method of journaling of a streaming anchor resource. An input node can store a value of a property associated with the streaming data in a persistent indexed data structure. The input node can generate an anchor that describes a particular point in time in a data stream. The anchor can include an index into the persistent indexed data structure of the stored value of the property associated with the streaming data. The generated anchor and streaming data can be provided to the downstream node. During recovery of a downstream node, the input node can utilize a received anchor to retrieve a value of a property associated with the streaming data from the persistent indexed data structure, and, provide a batch of data based upon the received anchor and the retrieved property value.

    AUTOMATED CLOUD-EDGE STREAMING WORKLOAD DISTRIBUTION AND BIDIRECTIONAL MIGRATION WITH LOSSLESS, ONCE-ONLY PROCESSING

    公开(公告)号:US20200379805A1

    公开(公告)日:2020-12-03

    申请号:US16426993

    申请日:2019-05-30

    摘要: Methods, systems, and computer program products are described herein for automated cloud-edge workload distribution and bidirectional migration with lossless, once-only data stream processing. A cloud service may provide workload and bidirectional migration management between cloud and edge to provide once-only processing of data streams before and after migration. Migrated logic nodes may begin processing data streams where processing stopped at source logic nodes before migration without data loss or repetition, for example, by migrating and using anchors in pull-based stream processing. Query logic implementing customer queries of data streams may be distributed to edge and/or cloud devices based on placement criteria. Query logic may be migrated from source to target edge and/or cloud devices based on migration criteria.

    Punctuation controlled machine learning model temporal validity

    公开(公告)号:US11625558B2

    公开(公告)日:2023-04-11

    申请号:US16713761

    申请日:2019-12-13

    IPC分类号: G06F16/907 G06K9/62 G06N20/00

    摘要: Data events of an event stream are processed in accordance with temporally valid machine learning models. A streaming node may receive data events via an event stream. Each data event may be associated with a timestamp. The streaming node may also utilize punctuation events that specify the temporal validity of available machine learning models. The streaming node performs a temporal join operation for each data event based on its timestamp and the temporal validity. If the data event's timestamp is less than or equal to the punctuation event's timestamp, the data event is provided to the temporally valid machine learning model for processing thereby. If the data event's timestamp is greater than the punctuation event's timestamp, the data event is held until a subsequent punctuation event specifying a later timestamp is received.

    Enhanced anchor protocol for event stream processing

    公开(公告)号:US11044291B2

    公开(公告)日:2021-06-22

    申请号:US16145456

    申请日:2018-09-28

    摘要: Described herein is a system and method for startup and/or recovery for stream processing. During a startup phase: start anchor request(s), each identifying a particular time, are accumulated until request(s) are pending from downstream nodes. A minimum time of the accumulated start anchor request(s) is determined. If the processing system is an input node, an anchor associated with the determined minimum time is generated. Otherwise, a start anchor request is provided to an upstream node identifying the determined minimum time. Once the anchor associated with the determined minimum time is received (or generated), the anchor is provided in response to a polled start anchor request for the determined minimum time from a downstream node. Asynchronous requests for batches of data bounded by two specific anchors are performed in accordance with information stored in an ordered collection of anchors during a recovery phase.

    Efficient out of process reshuffle of streaming data

    公开(公告)号:US11010171B2

    公开(公告)日:2021-05-18

    申请号:US16426683

    申请日:2019-05-30

    摘要: Methods, systems, apparatuses, and computer program products are provided for processing a stream of data. A maximum temporal divergence is established for data flushed to a data store from a plurality of upstream partitions. Each of a plurality of data flushers, each corresponding to an upstream partition, may obtain an item of data from a data producer. Each data flusher may determine whether flushing the data to the data store would exceed the maximum temporal divergence. Based at least on determining that flushing the data to the data store would not exceed the maximum temporal divergence, the data may be flushed to the data store for ingestion by a downstream partition and a data structure (e.g., a ledger) may be updated to indicate a time associated with the most recent item of data flushed to the data store.