摘要:
Techniques for generating a distributed stream processing application are provided. The techniques include obtaining a declarative description of one or more data stream processing tasks from a graph of operators, wherein the declarative description expresses at least one stream processing task, generating one or more containers that encompass a combination of one or more stream processing operators, and generating one or more execution units from the declarative description of one or more data stream processing tasks, wherein the one or more execution units are deployable across one or more distributed computing nodes, and comprise a distributed data stream processing application binary.
摘要:
State sharing is facilitated in stream processing environments, including distributed stream processing environments. A customized shared state implementation representing the state to be shared is automatically created based on at least one of user preferences, hints of usage, and system performance.
摘要:
State sharing is facilitated in stream processing environments, including distributed stream processing environments. A customized shared state implementation representing the state to be shared is automatically created based on at least one of user preferences, hints of usage, and system performance.
摘要:
Data analysis applications include model building components and stream processing components. To increase utility of the data analysis application, in one embodiment, the model building component of the data analysis application is managed. Management includes resource allocation and/or configuration adaptation of the model building component, as examples.
摘要:
One embodiment of a method for on-demand de-marshalling of a network message includes receiving a first network message at a receiver device, wherein the first network message was sent from a sender device, and wherein the first network message comprises one or more attributes that characterize an object at the sender device, and further wherein the first network message is in an encoded format according to a network protocol, storing the first network message in a buffer of the receiver device in the encoded format, identifying, at the receiver device and prior to de-marshalling of any of the attributes, that at least one of the attributes is to be manipulated by the receiver device, de-marshalling, in response to the identifying, the identified attribute(s), and manipulating the identified attribute(s) at the receiver device.
摘要:
Techniques for generating a distributed stream processing application are provided. The techniques include obtaining a declarative description of one or more data stream processing tasks, wherein the declarative description expresses at least one stream processing task, and generating one or more execution units from the declarative description of one or more data stream processing tasks, wherein the one or more execution units are deployable across one or more distributed computing nodes, and comprise a distributed data stream processing application.
摘要:
In one embodiment, the invention is a method and apparatus for failure recovery for stream processing applications. One embodiment of a method for providing a failure recovery mechanism for a stream processing application includes receiving source code for the stream processing application, wherein the source code defines a fault tolerance policy for each of the components of the stream processing application, and wherein respective fault tolerance policies defined for at least two of the plurality of components are different, generating a sequence of instructions for converting the state(s) of the component(s) into a checkpoint file comprising a sequence of storable bits on a periodic basis, according to a frequency defined in the fault tolerance policy, initiating execution of the stream processing application, and storing the checkpoint file, during execution of the stream processing application, at a location that is accessible after failure recovery.
摘要:
Techniques for compiling a data stream processing application are provided. The techniques include receiving, by a compiler executing on a computer system, source code for a data stream processing application, wherein the source code comprises source code for a plurality of operators, each of which performs a data processing function, determining, by the compiler, one or more characteristics of operators within the data stream processing application, grouping, by the compiler, the operators into one or more execution containers based on the one or more characteristics, and compiling, by the compiler, the source code for the data stream processing application into executable code, wherein the executable code comprises a plurality of execution units, wherein each execution unit contains one or more of the operators, wherein each operator is assigned to an execution unit based on the grouping, and wherein each execution unit is to be executed in a partition.
摘要:
A method and computer program product for selecting an expression evaluation technique for domain-specific language (DSL) compilation. An application written in DSL for a programming task is provided, the application including a plurality of components configured by expressions. A technique that most quickly implements the programming task is selected from a plurality of techniques for evaluating the expressions. The DSL application is compiled in accordance with the selected expression evaluation technique to generate general-purpose programming language (GPL) code.
摘要:
An Open Database Connectivity (ODBC) proxy infrastructure to transparently route incoming queries to one or more selected query engines. The ODBC proxy receives a query from an application, and determines based on the characteristics of the query and the capabilities of the query engines which one or more query engines are to perform the query. The proxy then routes the query to the one or more query engines, which perform the query. The results are then returned to the proxy, which provides the results to the application.