-
Publication No.: US11086687B2
Publication date: 2021-08-10
Application No.: US16200360
Filing date: 2018-11-26
Applicant: salesforce.com, inc.
Inventors: Elden Gregory Bishop, Jeffrey Chao
Abstract: The technology disclosed herein relates to method, system, and computer program product (computer-readable storage device) embodiments for managing resource allocation in a stream processing framework. An embodiment operates by configuring an allocation of a task sequence and machine resources to a container, and by running the task sequence, wherein the task sequence is configured to be run continuously as a plurality of units of work corresponding to the task sequence. Some embodiments further include changing the allocation responsive to a determination of an increase in data volume. A query may be taken from the task sequence and processed. Responsive to the query, a real-time result may be returned. Query processing may involve continuously applying a rule to the data stream, in real time or near real time. The rule may be set via a query language. Additionally, the data stream may be partitioned into batches for parallel processing.
-
Publication No.: US20180307571A1
Publication date: 2018-10-25
Application No.: US15954014
Filing date: 2018-04-16
Applicant: salesforce.com, inc.
Inventors: Elden Gregory Bishop, Jeffrey Chao
CPC classification: G06F11/1471, G06F11/14, G06F11/1438, G06F11/202, G06F11/2035, G06F11/2048, G06F2201/84
Abstract: The technology disclosed relates to discovering multiple previously unknown and undetected technical problems in fault tolerance and data recovery mechanisms of modern stream processing systems. In addition, it relates to providing technical solutions to these previously unknown and undetected problems. In particular, the technology disclosed relates to discovering the problem of modification of batch size of a given batch during its replay after a processing failure. This problem results in over-count when the input during replay is not a superset of the input fed at the original play. Further, the technology disclosed discovers the problem of inaccurate counter updates in replay schemes of modern stream processing systems when one or more keys disappear between a batch's first play and its replay. This problem is exacerbated when data in batches is merged or mapped with data from an external data store.
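The over-count described in this abstract can be illustrated with a small sketch. The replay scheme below is an assumption made for illustration only, not the mechanism disclosed in the patent: it corrects each counter by the difference between a batch's current play and its previous play, but visits only the keys present in the current play. The names `apply_batch`, `totals`, and `last_seen` are hypothetical.

```python
from collections import Counter

def apply_batch(totals, last_seen, batch_id, events):
    """Naive replay-aware update (assumed scheme, for illustration only).

    Each key is corrected by (count in this play - count in the previous play
    of the same batch). Only keys present in THIS play are visited.
    """
    current = Counter(events)
    previous = last_seen.get(batch_id, Counter())
    for key, count in current.items():
        totals[key] += count - previous[key]
    last_seen[batch_id] = current

totals, last_seen = Counter(), {}
apply_batch(totals, last_seen, "b1", ["a", "a", "b"])  # original play
apply_batch(totals, last_seen, "b1", ["a"])            # replay: smaller batch, key "b" is gone

# "a" is corrected from 2 to 1, but "b" is never visited during the replay,
# so its contribution from the original play stays in the total.
print(dict(totals))
```

Because the replayed input is not a superset of the original input, the key that disappeared keeps a stale count, which is one way the over-count named in the abstract can arise.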
-
Publication No.: US11288142B2
Publication date: 2022-03-29
Application No.: US16793936
Filing date: 2020-02-18
Applicant: salesforce.com, inc.
Inventors: Elden Gregory Bishop, Jeffrey Chao
Abstract: The technology disclosed relates to discovering multiple previously unknown and undetected technical problems in fault tolerance and data recovery mechanisms of modern stream processing systems. In addition, it relates to providing technical solutions to these previously unknown and undetected problems. In particular, the technology disclosed relates to discovering the problem of modification of batch size of a given batch during its replay after a processing failure. This problem results in over-count when the input during replay is not a superset of the input fed at the original play. Further, the technology disclosed discovers the problem of inaccurate counter updates in replay schemes of modern stream processing systems when one or more keys disappear between a batch's first play and its replay. This problem is exacerbated when data in batches is merged or mapped with data from an external data store.
-
Publication No.: US11216302B2
Publication date: 2022-01-04
Application No.: US16396522
Filing date: 2019-04-26
Applicant: salesforce.com, inc.
Inventors: Elden Gregory Bishop, Jeffrey Chao
Abstract: The technology disclosed provides a novel and innovative technique for compact deployment of application code to stream processing systems. In particular, the technology disclosed relates to obviating the need of accompanying application code with its dependencies during deployment (i.e., creating fat jars) by operating a stream processing system within a container defined over worker nodes of whole machines and initializing the worker nodes with precompiled dependency libraries having precompiled classes. Accordingly, the application code is deployed to the container without its dependencies, and, once deployed, the application code is linked with the locally stored precompiled dependencies at runtime. In implementations, the application code is deployed to the container running the stream processing system in between 300 milliseconds and 6 seconds. This is drastically faster than existing deployment techniques, which take between 5 and 15 minutes for deployment.
-
Publication No.: US20170083380A1
Publication date: 2017-03-23
Application No.: US14994131
Filing date: 2016-01-12
Applicant: salesforce.com, inc.
Inventors: Elden Gregory Bishop, Jeffrey Chao
CPC classification: G06F9/5083, G06F9/52
Abstract: The technology disclosed relates to managing resource allocation to task sequences in a stream processing framework. In particular, it relates to operating a computing grid that includes machine resources, with heterogeneous containers defined over whole machines and some containers including multiple machines. It also includes initially allocating multiple machines to a first container, initially allocating a first set of stateful task sequences to the first container, and running the first set of stateful task sequences as multiplexed units of work under control of a container-scheduler, where each unit of work for a first task sequence runs to completion on first machine resources in the first container, unless it overruns a time-out, before a next unit of work for a second task sequence runs multiplexed on the first machine resources. It further includes automatically modifying a number of machine resources and/or a number of task sequences assigned to a container.
-
Publication No.: US10275278B2
Publication date: 2019-04-30
Application No.: US15265817
Filing date: 2016-09-14
Applicant: salesforce.com, inc.
Inventors: Elden Gregory Bishop, Jeffrey Chao
Abstract: The technology disclosed provides a novel and innovative technique for compact deployment of application code to stream processing systems. In particular, the technology disclosed relates to obviating the need of accompanying application code with its dependencies during deployment (i.e., creating fat jars) by operating a stream processing system within a container defined over worker nodes of whole machines and initializing the worker nodes with precompiled dependency libraries having precompiled classes. Accordingly, the application code is deployed to the container without its dependencies, and, once deployed, the application code is linked with the locally stored precompiled dependencies at runtime. In implementations, the application code is deployed to the container running the stream processing system in between 300 milliseconds and 6 seconds. This is drastically faster than existing deployment techniques, which take between 5 and 15 minutes for deployment.
-
Publication No.: US09965330B2
Publication date: 2018-05-08
Application No.: US14986401
Filing date: 2015-12-31
Applicant: salesforce.com, inc.
Inventors: Elden Gregory Bishop, Jeffrey Chao
CPC classification: G06F9/505, G06F3/0613, G06F3/0631, G06F3/067, G06F9/5061
Abstract: The technology disclosed relates to maintaining throughput of a stream processing framework while increasing processing load. In particular, it relates to defining a container over at least one worker node that has a plurality of workers, with one worker utilizing a whole core within a worker node, and queuing data from one or more incoming near real-time (NRT) data streams in multiple pipelines that run in the container and have connections to at least one common resource external to the container. It further relates to concurrently executing the pipelines at a number of workers as batches, and limiting simultaneous connections to the common resource to the number of workers by providing a shared connection to a set of batches running on a same worker regardless of the pipelines to which the batches in the set belong.
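The connection-sharing idea in this abstract can be sketched as a cache of connections keyed by worker rather than by pipeline. This is a minimal single-threaded illustration under stated assumptions: the class `WorkerConnections`, its methods, and the `connect` factory are hypothetical names, and a plain `object()` stands in for a real connection to the external resource.

```python
class WorkerConnections:
    """Hands every batch on the same worker the same connection (illustrative sketch)."""

    def __init__(self, connect):
        self._connect = connect   # hypothetical factory that opens one connection
        self._by_worker = {}      # worker id -> shared connection

    def for_batch(self, worker_id, pipeline_id):
        # pipeline_id is deliberately ignored: sharing is per worker, not per pipeline.
        if worker_id not in self._by_worker:
            self._by_worker[worker_id] = self._connect()
        return self._by_worker[worker_id]

    def open_count(self):
        return len(self._by_worker)

conns = WorkerConnections(connect=object)   # object() stands in for a real connection
c1 = conns.for_batch(worker_id=0, pipeline_id="pipeline-a")
c2 = conns.for_batch(worker_id=0, pipeline_id="pipeline-b")  # same worker, other pipeline
c3 = conns.for_batch(worker_id=1, pipeline_id="pipeline-a")

print(c1 is c2, conns.open_count())
```

However many pipelines feed batches to the workers, the number of open connections in this sketch never exceeds the number of workers.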
-
Publication No.: US09842000B2
Publication date: 2017-12-12
Application No.: US14986419
Filing date: 2015-12-31
Applicant: salesforce.com, inc.
Inventors: Elden Gregory Bishop, Jeffrey Chao
CPC classification: G06F9/5038, G06F9/5072, G06F9/5088, G06F17/30516
Abstract: The technology disclosed relates to managing processing of long tail task sequences in a stream processing framework. In particular, it relates to operating a computing grid that includes a plurality of physical threads that process data from one or more near real-time (NRT) data streams for multiple task sequences, and queuing data from the NRT data streams as batches in multiple pipelines using a grid-coordinator that controls dispatch of the batches to the physical threads. The method also includes assigning a priority-level to each of the pipelines using a grid-scheduler, wherein the grid-scheduler initiates execution of a first number of batches from a first pipeline before execution of a second number of batches from a second pipeline, responsive to respective priority levels of the first and second pipelines.
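The priority-driven ordering in this abstract can be sketched as follows. This is an illustration under assumptions, not the patented grid-scheduler: the function `dispatch_order`, the per-pipeline `quota` mapping, and the convention that a larger number means a higher priority are all hypothetical.

```python
def dispatch_order(pipelines, quota):
    """Return the order in which batches are started (illustrative sketch).

    pipelines: {name: (priority, [batches])}, larger priority runs first (assumption).
    quota:     {name: number of batches to start from that pipeline}.
    """
    ranked = sorted(pipelines, key=lambda name: pipelines[name][0], reverse=True)
    order = []
    for name in ranked:
        _, batches = pipelines[name]
        order.extend((name, batch) for batch in batches[:quota[name]])
    return order

pipelines = {
    "clicks": (1, ["c1", "c2", "c3"]),
    "alerts": (5, ["a1", "a2"]),
}
order = dispatch_order(pipelines, quota={"clicks": 2, "alerts": 2})
print(order)
```

Here the two batches from the higher-priority `alerts` pipeline are started before the two batches taken from `clicks`.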
-
Publication No.: US20170075693A1
Publication date: 2017-03-16
Application No.: US14986351
Filing date: 2015-12-31
Applicant: salesforce.com, inc.
Inventors: Elden Gregory Bishop, Jeffrey Chao
CPC classification: G06F9/5088, G06F9/4881, G06F2209/483
Abstract: The technology disclosed improves existing stream processing systems by making it possible to scale resources both up and down within the infrastructure of a stream processing system. In particular, the technology disclosed relates to a dispatch system for a stream processing system that adapts its behavior according to a computational capacity of the system based on a run-time evaluation. The technical solution includes, during run-time execution of a pipeline, comparing a count of available physical threads against a set number of logically parallel threads. When a count of available physical threads equals or exceeds the number of logically parallel threads, the solution includes concurrently processing the batches at the physical threads. Further, when there are fewer available physical threads than the number of logically parallel threads, the solution includes multiplexing the batches sequentially over the available physical threads.
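The run-time comparison in this abstract can be sketched with a thread pool. This is a minimal illustration under assumptions, not the patented dispatch system: the function `dispatch` and its parameters are hypothetical, and pool workers stand in for physical threads.

```python
from concurrent.futures import ThreadPoolExecutor

def dispatch(batches, process, available_threads, logical_threads):
    """Choose a dispatch mode from the thread counts, then process the batches."""
    if available_threads < 1 or logical_threads < 1:
        raise ValueError("thread counts must be positive")
    if available_threads >= logical_threads:
        # Enough physical threads: every logically parallel thread gets its own.
        mode, workers = "concurrent", logical_threads
    else:
        # Too few physical threads: batches queue up and are multiplexed over them.
        mode, workers = "multiplexed", available_threads
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(process, batches))  # results keep input order
    return mode, results

def double(batch):
    return batch * 2

print(dispatch([1, 2, 3], double, available_threads=4, logical_threads=3))
print(dispatch([1, 2, 3], double, available_threads=2, logical_threads=3))
```

Both calls produce the same results; only the degree of parallelism changes, which is the scale-up and scale-down behavior the abstract describes.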
-
Publication No.: US20200183796A1
Publication date: 2020-06-11
Application No.: US16793936
Filing date: 2020-02-18
Applicant: salesforce.com, inc.
Inventors: Elden Gregory Bishop, Jeffrey Chao
IPC classification: G06F11/14
Abstract: The technology disclosed relates to discovering multiple previously unknown and undetected technical problems in fault tolerance and data recovery mechanisms of modern stream processing systems. In addition, it relates to providing technical solutions to these previously unknown and undetected problems. In particular, the technology disclosed relates to discovering the problem of modification of batch size of a given batch during its replay after a processing failure. This problem results in over-count when the input during replay is not a superset of the input fed at the original play. Further, the technology disclosed discovers the problem of inaccurate counter updates in replay schemes of modern stream processing systems when one or more keys disappear between a batch's first play and its replay. This problem is exacerbated when data in batches is merged or mapped with data from an external data store.