Abstract:
A method, an apparatus and an article of manufacture for processing a random-walk based vertex-proximity query on a graph. The method includes computing at least one vertex cluster and corresponding meta-information from a graph, dynamically updating the clustering and corresponding meta-information upon modification of the graph, and identifying a vertex cluster relevant to at least one query vertex and aggregating corresponding meta-information of the cluster to process the query.
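The cluster-then-aggregate idea can be illustrated with a minimal Python sketch. It assumes several things the abstract does not specify: a vertex clustering is already computed, the per-cluster meta-information is just the intra-cluster adjacency, and "proximity" means a random walk with restart from the query vertex that is confined to that vertex's cluster. `ClusteredProximityIndex` and its methods are hypothetical names, and the sketch is not the claimed method.

```python
from collections import defaultdict
from typing import Dict, Hashable


class ClusteredProximityIndex:
    """Hypothetical sketch: answer random-walk proximity queries per cluster."""

    def __init__(self, cluster_of: Dict[Hashable, int]):
        # Assumed input: a precomputed vertex -> cluster-id assignment.
        self.cluster_of = dict(cluster_of)
        # Per-cluster meta-information (assumed here): intra-cluster adjacency sets.
        self.meta = defaultdict(lambda: defaultdict(set))

    def add_edge(self, u, v):
        """Dynamic update: only an intra-cluster edge changes a cluster's meta-information."""
        cu, cv = self.cluster_of[u], self.cluster_of[v]
        if cu == cv:
            self.meta[cu][u].add(v)
            self.meta[cu][v].add(u)

    def proximity(self, q, restart=0.15, iters=50):
        """Random walk with restart from q, confined to q's cluster (power iteration)."""
        cid = self.cluster_of[q]
        adj = self.meta[cid]
        members = [x for x, c in self.cluster_of.items() if c == cid]
        score = {x: 0.0 for x in members}
        score[q] = 1.0
        for _ in range(iters):
            nxt = {x: 0.0 for x in members}
            nxt[q] += restart  # restart mass; total probability mass stays 1
            for x, s in score.items():
                nbrs = adj.get(x)
                if nbrs:
                    share = (1 - restart) * s / len(nbrs)
                    for y in nbrs:
                        nxt[y] += share
                else:
                    nxt[q] += (1 - restart) * s  # dangling vertex: walk returns to q
            score = nxt
        return score
```

Because a query touches only the query vertex's cluster, an edge insertion needs to update only one cluster's meta-information. Edges between clusters are ignored in this simplification.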
Abstract:
Embodiments of the disclosure include a method for providing stream processing with runtime adaptation. The method includes registering one or more events, wherein each of the events is associated with a stream processing application. The method also includes monitoring, by a processor, for an occurrence of the one or more events associated with the stream processing application, wherein each of the one or more events is associated with one or more runtime metrics. The method further includes receiving an event notification that includes an event identification and an event context, and executing an adaptation of the stream processing application.
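The register/monitor/notify/adapt cycle can be illustrated with a minimal Python sketch. Several details are assumptions: an event is modeled as a predicate over a dictionary of runtime metrics, the event context is a snapshot of those metrics, and the adaptation is a callback. `AdaptationManager` and `EventNotification` are hypothetical names and do not come from any particular stream processing product.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Metrics = Dict[str, float]


@dataclass
class EventNotification:
    event_id: str        # which registered event fired
    context: Metrics     # snapshot of the runtime metrics at that moment


@dataclass
class AdaptationManager:
    # event id -> (condition over runtime metrics, adaptation to execute)
    registry: Dict[str, Tuple[Callable[[Metrics], bool],
                              Callable[[EventNotification], None]]] = field(default_factory=dict)

    def register(self, event_id, condition, adapt):
        """Register an event together with the adaptation it should trigger."""
        self.registry[event_id] = (condition, adapt)

    def monitor(self, metrics: Metrics) -> List[EventNotification]:
        """Evaluate every registered event against one sample of runtime metrics."""
        fired = []
        for event_id, (condition, adapt) in self.registry.items():
            if condition(metrics):
                note = EventNotification(event_id, dict(metrics))
                adapt(note)  # execute the application's adaptation
                fired.append(note)
        return fired
```

For example, an application could register a hypothetical `"congestion"` event that fires when a `queue_len` metric exceeds a threshold, with an adaptation that adds a replica.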
Abstract:
A method to optimize performance of an operator on a computer system includes determining whether the system is busy, decreasing a software thread level within the operator if the system is busy, and increasing the software thread level within the operator if the system is not busy and a performance measure of the system at a current software thread level of the operator is greater than a performance measure of the system when the operator has a lower software thread level.
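The decision rule can be written as a single Python function. This is a minimal sketch. It assumes the busy check is already available as a boolean and that `perf` maps each thread level to a measured performance value, where higher is better. It also assumes two cases the abstract does not cover. An unmeasured lower level is treated as 0. An idle system with no measured gain keeps its current level. `next_thread_level` and its bounds are hypothetical.

```python
from typing import Dict


def next_thread_level(level: int, system_busy: bool, perf: Dict[int, float],
                      min_level: int = 1, max_level: int = 32) -> int:
    """Return the operator's next software thread level.

    perf maps a thread level to the performance measured at that level
    (higher is better, e.g. tuples per second).
    """
    if system_busy:
        # Busy system: back off by one thread, never below the minimum.
        return max(min_level, level - 1)
    current = perf.get(level)
    # Assumption: an unmeasured lower level counts as 0, so a fresh operator can grow.
    lower = perf.get(level - 1, 0.0)
    if current is not None and current > lower:
        # The current level outperforms the lower one, so try one more thread.
        return min(max_level, level + 1)
    # Idle system but no measured gain: hold (the abstract does not specify this case).
    return level
```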
Abstract:
Data analysis applications include model building components and stream processing components. To increase the utility of the data analysis application, in one embodiment, the model building component of the data analysis application is managed. Management includes, for example, resource allocation and/or configuration adaptation of the model building component.
Abstract:
Data sharing is facilitated in stream processing environments, including distributed stream processing environments. A processor of the stream processing environment obtains at least one of usage information for shared data of the stream processing environment, one or more pre-declared characteristics of the shared data, or performance information relating to the stream processing environment. Based on at least one of the usage information, the one or more pre-declared characteristics or the performance information, code is generated for managing the shared data.
Abstract:
Techniques for generating a distributed stream processing application are provided. The techniques include obtaining a declarative description of one or more data stream processing tasks from a graph of operators, wherein the declarative description expresses at least one stream processing task, generating one or more containers that encompass a combination of one or more stream processing operators, and generating one or more execution units from the declarative description of one or more data stream processing tasks, wherein the one or more execution units are deployable across one or more distributed computing nodes, and comprise a distributed data stream processing application binary.
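The step from an operator graph to containers and deployable execution units can be illustrated with a minimal Python sketch. It assumes three things. Each operator in the declarative description carries a hypothetical `container` hint that names the container it is fused into. Edges inside one container become local streams and edges between containers become network streams. Containers are placed on nodes round-robin. The actual grouping and placement policies are not specified in the abstract.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


@dataclass(frozen=True)
class Operator:
    name: str
    container: str  # hypothetical fusion hint taken from the declarative description


def build_containers(ops: List[Operator]) -> Dict[str, List[str]]:
    """Group operators that share a container hint into one container."""
    containers: Dict[str, List[str]] = defaultdict(list)
    for op in ops:
        containers[op.container].append(op.name)
    return dict(containers)


def classify_streams(ops: List[Operator], edges: List[Edge]) -> Tuple[List[Edge], List[Edge]]:
    """Split operator-graph edges into in-container and cross-container streams."""
    where = {op.name: op.container for op in ops}
    local = [e for e in edges if where[e[0]] == where[e[1]]]
    remote = [e for e in edges if where[e[0]] != where[e[1]]]
    return local, remote


def place(containers: Dict[str, List[str]], nodes: List[str]) -> Dict[str, str]:
    """Map each container (one execution unit) to a node, round-robin."""
    return {cid: nodes[i % len(nodes)] for i, cid in enumerate(sorted(containers))}
```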
Abstract:
State sharing is facilitated in stream processing environments, including distributed stream processing environments. A customized shared state implementation representing the state to be shared is automatically created based on at least one of user preferences, hints of usage, and system performance.
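Choosing a shared-state implementation from usage hints can be illustrated with a minimal Python sketch. The two implementations and the selection rule are assumptions made for illustration. A lock-based map serves general use. A copy-on-write map serves read-mostly use. The hint is a hypothetical expected read ratio compared against a hypothetical threshold. Neither the classes nor `make_shared_state` come from the source.

```python
import threading
from typing import Any, Dict


class LockedState:
    """General-purpose shared state: every read and write takes a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data: Dict[str, Any] = {}

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def put(self, key, value):
        with self._lock:
            self._data[key] = value


class CopyOnWriteState:
    """Read-mostly shared state: readers use the current snapshot without locking."""

    def __init__(self):
        self._lock = threading.Lock()  # serializes writers only
        self._snapshot: Dict[str, Any] = {}

    def get(self, key):
        return self._snapshot.get(key)

    def put(self, key, value):
        with self._lock:
            new = dict(self._snapshot)
            new[key] = value
            # Relies on atomic reference assignment, which holds in CPython.
            self._snapshot = new


def make_shared_state(read_ratio_hint: float, threshold: float = 0.9):
    """Pick an implementation from a usage hint (hypothetical selection rule)."""
    return CopyOnWriteState() if read_ratio_hint >= threshold else LockedState()
```

The same selection step could also take measured system performance or user preferences as inputs. This sketch uses only the usage hint.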
Abstract:
Techniques for scheduling a plurality of jobs sharing input are provided. The techniques include partitioning one or more input datasets into multiple subcomponents, analyzing a plurality of jobs to determine which of the plurality of jobs require scanning of one or more common subcomponents of the one or more input datasets, and scheduling a plurality of jobs that require scanning of one or more common subcomponents of the one or more input datasets, facilitating a single scanning of the one or more common subcomponents to be used as input by each of the plurality of jobs.
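The shared-scan scheduling can be illustrated with a minimal Python sketch. It assumes that each dataset is split into fixed-size blocks identified by `(dataset, start offset)` and that each job declares the set of blocks it reads. The scheduler then scans every block once and passes its records to all jobs that need it. `partition`, `shared_scan_plan`, and `run_shared_scan` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Block = Tuple[str, int]  # (dataset name, start offset)


def partition(name: str, num_records: int, block_size: int) -> List[Block]:
    """Split a dataset into fixed-size subcomponents identified by start offset."""
    return [(name, start) for start in range(0, num_records, block_size)]


def shared_scan_plan(job_inputs: Dict[str, Set[Block]]) -> Dict[Block, List[str]]:
    """For every block, list all jobs that need it, so it can be scanned once."""
    readers: Dict[Block, List[str]] = defaultdict(list)
    for job, blocks in sorted(job_inputs.items()):
        for block in blocks:
            readers[block].append(job)
    return dict(sorted(readers.items()))


def run_shared_scan(plan, read_block, consumers) -> int:
    """Scan each block once and hand its records to every job that reads it."""
    scans = 0
    for block, jobs in plan.items():
        records = read_block(block)  # the single scan of this block
        scans += 1
        for job in jobs:
            consumers[job](records)
    return scans
```

Suppose job `j1` reads the first two blocks of a three-block dataset and job `j2` reads the last two. Scheduling each job on its own takes four block scans. The shared plan takes three, because the middle block is read once and fed to both jobs.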