-
公开(公告)号:US20210182619A1
公开(公告)日:2021-06-17
申请号:US16713761
申请日:2019-12-13
IPC分类号: G06K9/62 , G06F16/907 , G06N20/00
摘要: Data events of an event stream are processed in accordance with temporally valid machine learning models. A streaming node may receive data events via an event stream. Each data event may be associated with a timestamp. The streaming node may also utilize punctuation events that specify the temporal validity of available machine learning models. The streaming node performs a temporal join operation for each data event based on its timestamp and the temporal validity. If the data event's timestamp is less than or equal to the punctuation event's timestamp, the data event is provided to the temporally valid machine learning model for processing thereby. If the data event's timestamp is greater than the punctuation event's timestamp, the data event is held until a subsequent punctuation event specifying a later timestamp is received.
-
公开(公告)号:US11095522B2
公开(公告)日:2021-08-17
申请号:US16547399
申请日:2019-08-21
摘要: Described herein is a system and method for dynamically scaling a stream processing system (e.g., “exactly once” data stream processing system). Various parameter(s) (e.g., user-configurable capacity, real-time load metrics, and/or performance counters) can be used to dynamically scale in and/or scale out the “exactly once” stream processing system without system restart. Delay introduced by this scaling operation can be minimized by utilizing a combination of mutable process topology (which can dynamically assign certain parts of the system to a new host machine) and controllable streaming processor movement with checkpoints and the streaming protocol controlled recovery which still enforces the “exactly once” delivery metric.
-
公开(公告)号:US10998919B2
公开(公告)日:2021-05-04
申请号:US16590461
申请日:2019-10-02
摘要: Described herein is a system and method for coded streaming data to facilitate recovery from failed or slow processor(s). A batch of processing stream data can be partitioned into a plurality of data chunks. Parity chunk(s) for the plurality of data chunks. The plurality of data chunks and the parity chunk(s) can be provided to processors for processing. Processed data of at least some (e.g., one or more) of the plurality of data chunks, and, processed data of parity chunk(s) are received. When it is determined that processed data for a pre-defined quantity of data chunks has not been received by a pre-defined period of time, the processed data for particular data chunk(s) of particular processor(s) from which processed data has not been received are determined based, at least in part, upon the received processed parity chunk(s) and the received processed data chunk(s).
-
公开(公告)号:US11496153B2
公开(公告)日:2022-11-08
申请号:US17222576
申请日:2021-04-05
摘要: Described herein is a system and method for coded streaming data to facilitate recovery from failed or slow processor(s). A batch of processing stream data can be partitioned into a plurality of data chunks. Parity chunk(s) for the plurality of data chunks. The plurality of data chunks and the parity chunk(s) can be provided to processors for processing. Processed data of at least some (e.g., one or more) of the plurality of data chunks, and, processed data of parity chunk(s) are received. When it is determined that processed data for a pre-defined quantity of data chunks has not been received by a pre-defined period of time, the processed data for particular data chunk(s) of particular processor(s) from which processed data has not been received are determined based, at least in part, upon the received processed parity chunk(s) and the received processed data chunk(s).
-
公开(公告)号:US11226966B2
公开(公告)日:2022-01-18
申请号:US16590909
申请日:2019-10-02
发明人: Alexander Alperovich , Boris Shulman , Ke Liu
IPC分类号: G06F16/2455 , H04L29/06 , H04N21/24 , H04N21/845
摘要: Described herein is a system and method of journaling of a streaming anchor resource. An input node can store a value of a property associated with the streaming data in a persistent indexed data structure. The input node can generate an anchor that describes a particular point in time in a data stream. The anchor can include an index into the persistent indexed data structure of the stored value of the property associated with the streaming data. The generated anchor and streaming data can be provided to the downstream node. During recovery of a downstream node, the input node can utilize a received anchor to retrieve a value of a property associated with the streaming data from the persistent indexed data structure, and, provide a batch of data based upon the received anchor and the retrieved property value.
-
公开(公告)号:US11113197B2
公开(公告)日:2021-09-07
申请号:US16378426
申请日:2019-04-08
IPC分类号: G06Q30/02 , G06F12/0842
摘要: A method for joining an event stream with reference data includes loading a plurality of reference data snapshots from a reference data source into a cache. Punctuation events are supplied that indicate temporal validity for the plurality of reference data snapshots in the cache. A logical barrier is provided that restricts a flow of data events in the event stream to a cache lookup operation based on the punctuation events. The cache lookup operation is performed with respect to the data events in the event stream that are permitted to cross the logical barrier.
-
7.
公开(公告)号:US20200379805A1
公开(公告)日:2020-12-03
申请号:US16426993
申请日:2019-05-30
IPC分类号: G06F9/50 , G06F16/2455 , G06F16/2458
摘要: Methods, systems, and computer program products are described herein for automated cloud-edge workload distribution and bidirectional migration with lossless, once-only data stream processing. A cloud service may provide workload and bidirectional migration management between cloud and edge to provide once-only processing of data streams before and after migration. Migrated logic nodes may begin processing data streams where processing stopped at source logic nodes before migration without data loss or repetition, for example, by migrating and using anchors in pull-based stream processing. Query logic implementing customer queries of data streams may be distributed to edge and/or cloud devices based on placement criteria. Query logic may be migrated from source to target edge and/or cloud devices based on migration criteria.
-
公开(公告)号:US11625558B2
公开(公告)日:2023-04-11
申请号:US16713761
申请日:2019-12-13
IPC分类号: G06F16/907 , G06K9/62 , G06N20/00
摘要: Data events of an event stream are processed in accordance with temporally valid machine learning models. A streaming node may receive data events via an event stream. Each data event may be associated with a timestamp. The streaming node may also utilize punctuation events that specify the temporal validity of available machine learning models. The streaming node performs a temporal join operation for each data event based on its timestamp and the temporal validity. If the data event's timestamp is less than or equal to the punctuation event's timestamp, the data event is provided to the temporally valid machine learning model for processing thereby. If the data event's timestamp is greater than the punctuation event's timestamp, the data event is held until a subsequent punctuation event specifying a later timestamp is received.
-
公开(公告)号:US11044291B2
公开(公告)日:2021-06-22
申请号:US16145456
申请日:2018-09-28
IPC分类号: G06F15/16 , H04L29/06 , G06F16/2455 , G06F9/4401
摘要: Described herein is a system and method for startup and/or recovery for stream processing. During a startup phase: start anchor request(s), each identifying a particular time, are accumulated until request(s) are pending from downstream nodes. A minimum time of the accumulated start anchor request(s) is determined. If the processing system is an input node, an anchor associated with the determined minimum time is generated. Otherwise, a start anchor request is provided to an upstream node identifying the determined minimum time. Once the anchor associated with the determined minimum time is received (or generated), the anchor is provided in response to a polled start anchor request for the determined minimum time from a downstream node. Asynchronous requests for batches of data bounded by two specific anchors are performed in accordance with information stored in an ordered collection of anchors during a recovery phase.
-
公开(公告)号:US11010171B2
公开(公告)日:2021-05-18
申请号:US16426683
申请日:2019-05-30
发明人: Alexander Alperovich , Zhong Chen , Boris Shulman
IPC分类号: G06F15/173 , G06F9/38 , G06F16/27 , G06F16/25
摘要: Methods, systems, apparatuses, and computer program products are provided for processing a stream of data. A maximum temporal divergence is established for data flushed to a data store from a plurality of upstream partitions. Each of a plurality of data flushers, each corresponding to an upstream partition, may obtain an item of data from a data producer. Each data flusher may determine whether flushing the data to the data store would exceed the maximum temporal divergence. Based at least on determining that flushing the data to the data store would not exceed the maximum temporal divergence, the data may be flushed to the data store for ingestion by a downstream partition and a data structure (e.g., a ledger) may be updated to indicate a time associated with the most recent item of data flushed to the data store.
-
-
-
-
-
-
-
-
-