-
Publication No.: US12050606B2
Publication date: 2024-07-30
Application No.: US18112958
Filing date: 2023-02-22
Inventors: Ian Schechter, Garth Dickie
IPC classification: G06F16/2455, G06F9/30, G06F9/50, G06F16/242, G06F16/2457, G06F16/25, G06F16/901
CPC classification: G06F16/2456, G06F9/3005, G06F9/5061, G06F16/244, G06F16/24573, G06F16/254, G06F16/9024
Abstract: Techniques for generating a dataflow graph include generating a first dataflow graph with a plurality of first nodes representing first computer operations in processing data, with at least one of the first computer operations being a declarative operation that specifies one or more characteristics of one or more results of processing of data, and transforming the first dataflow graph into a second dataflow graph for processing data in accordance with the first computer operations, the second dataflow graph including a plurality of second nodes representing second computer operations, with at least one of the second nodes representing one or more imperative operations that implement the logic specified by the declarative operation, where the one or more imperative operations are unrepresented by the first nodes in the first dataflow graph.
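
The transformation described in this abstract can be illustrated with a minimal sketch. It is not the patented method: the graph is simplified to a linear list of nodes, and the `Node` class, the `LOWERING` table, and every operation name are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    kind: str          # "imperative" or "declarative"
    params: tuple = ()


# Hypothetical lowering rules: each declarative requirement maps to the
# imperative steps that would satisfy it.
LOWERING = {
    "require_sorted": ["sort"],
    "require_distinct": ["sort", "drop_adjacent_duplicates"],
}


def transform(first_graph):
    """Build a second graph in which declarative nodes are replaced by imperative ones."""
    second_graph = []
    for node in first_graph:
        if node.kind == "declarative":
            for step in LOWERING[node.name]:
                second_graph.append(Node(step, "imperative", node.params))
        else:
            second_graph.append(node)
    return second_graph


# First graph: states WHAT the result must look like (distinct customer_id values).
first_graph = [
    Node("read_input", "imperative"),
    Node("require_distinct", "declarative", ("customer_id",)),
    Node("write_output", "imperative"),
]

# Second graph: states HOW to get there, using nodes absent from the first graph.
second_graph = transform(first_graph)
```

In this sketch `sort` and `drop_adjacent_duplicates` appear only in `second_graph`, which mirrors the abstract's point that the imperative operations are unrepresented in the first graph.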
-
Publication No.: US20240126748A1
Publication date: 2024-04-18
Application No.: US18345852
Filing date: 2023-06-30
Inventors: Jonah Egenolf, Marshall A. Isman, Ian Schechter
IPC classification: G06F16/242, G06F8/34, G06F8/36, G06F8/38, G06F16/21, G06F16/23, G06F16/2452, G06F16/2453, G06F16/28
CPC classification: G06F16/2423, G06F8/34, G06F8/36, G06F8/38, G06F16/211, G06F16/2365, G06F16/24524, G06F16/24526, G06F16/2453, G06F16/24544, G06F16/24545, G06F16/288, G06Q10/10
Abstract: A method includes accessing a schema that specifies relationships among datasets, computations on the datasets, or transformations of the datasets, selecting a dataset from among the datasets, and identifying, from the schema, other datasets that are related to the selected dataset. Attributes of the datasets are identified, and logical data representing the identified attributes and relationships among the attributes is generated. The logical data is provided to a development environment, which provides access to portions of the logical data representing the identified attributes. A specification that specifies at least one of the identified attributes in performing an operation is received from the development environment. Based on the specification and the relationships among the identified attributes represented by the logical data, a computer program is generated to perform the operation by accessing, from storage, at least one dataset having the at least one of the attributes specified in the specification.
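
The first steps of this abstract (finding related datasets in a schema, exposing their attributes as logical data, and working out which datasets a specification needs) can be sketched roughly as follows. The `SCHEMA` layout, the dataset names, and all function names are assumptions made for illustration, and the sketch stops short of generating an actual program.

```python
# Hypothetical schema: datasets with their attributes, plus key relationships.
SCHEMA = {
    "datasets": {
        "orders": ["order_id", "customer_id", "total"],
        "customers": ["customer_id", "name"],
        "products": ["product_id", "title"],
    },
    "relationships": [("orders", "customers", "customer_id")],
}


def related_datasets(schema, selected):
    """Return the selected dataset plus everything reachable through relationships."""
    found = {selected}
    frontier = [selected]
    while frontier:
        current = frontier.pop()
        for left, right, _key in schema["relationships"]:
            for src, dst in ((left, right), (right, left)):
                if src == current and dst not in found:
                    found.add(dst)
                    frontier.append(dst)
    return found


def logical_data(schema, selected):
    """Map each qualified attribute name to the dataset that stores it."""
    names = related_datasets(schema, selected)
    return {
        f"{dataset}.{attribute}": dataset
        for dataset in sorted(names)
        for attribute in schema["datasets"][dataset]
    }


def datasets_needed(logical, spec_attributes):
    """Given attributes named in a specification, list the datasets to access."""
    return sorted({logical[attribute] for attribute in spec_attributes})


logical = logical_data(SCHEMA, "orders")
needed = datasets_needed(logical, ["orders.total", "customers.name"])
```

Here `products` never enters the logical data because the hypothetical schema does not relate it to `orders`, so a developer picking attributes would only see those of `orders` and `customers`.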
-
Publication No.: US11720583B2
Publication date: 2023-08-08
Application No.: US17878106
Filing date: 2022-08-01
Inventors: Ian Schechter, Tim Wakeling, Ann M. Wollrath
IPC classification: G06F16/24, G06F16/2458, G06F16/13, G06F16/25, G06F16/28, G06F16/17, G06F16/901, G06F9/50
CPC classification: G06F16/2471, G06F9/5066, G06F16/13, G06F16/1734, G06F16/254, G06F16/285, G06F16/9024, G06F16/284
Abstract: In a first aspect, a method includes, at a node of a Hadoop cluster, the node storing a first portion of data in HDFS data storage, executing a first instance of a data processing engine capable of receiving data from a data source external to the Hadoop cluster, receiving a computer-executable program by the data processing engine, executing at least part of the program by the first instance of the data processing engine, receiving, by the data processing engine, a second portion of data from the external data source, storing the second portion of data other than in HDFS storage, and performing, by the data processing engine, a data processing operation identified by the program using at least the first portion of data and the second portion of data.
-
Publication No.: US10437819B2
Publication date: 2019-10-08
Application No.: US14746188
Filing date: 2015-06-22
IPC classification: G06F16/24, G06F16/242, G06F16/28, G06F16/901, G06F16/2453, G06F8/30, G06F16/2452
Abstract: Among other things, a method of generating a computer program based on an SQL query includes receiving a SQL query, including a reference to a first data set stored at a first data source, and including a reference to a second data set stored at a second data source different from the first data source, determining that the SQL query includes two or more commands, the commands including a first union-type operation, and a first aggregation operation, and determining that the SQL query describes that the first union-type operation shall be applied to at least a portion of data from the first data set, and applied to at least a portion of data from the second data set, determining that the SQL query describes that the first aggregation operation shall be applied to data resulting from the first union-type operation, and generating the computer program.
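
The ordering this abstract describes (a union-type operation over data from two different sources, followed by an aggregation over the union's result) can be sketched as below. The sketch assumes the SQL query has already been parsed into a small dictionary; that dictionary layout, the `generate_program` function, and the sample data are all hypothetical.

```python
from collections import defaultdict


def generate_program(parsed_query):
    """Return a callable that applies the union first, then the aggregation."""
    left, right = parsed_query["union"]      # dataset names at two different sources
    key = parsed_query["group_by"]
    value = parsed_query["sum"]

    def program(sources):
        # Union-type operation (UNION ALL semantics: duplicates are kept).
        unioned = list(sources[left]) + list(sources[right])
        # Aggregation applied to the data resulting from the union.
        totals = defaultdict(int)
        for row in unioned:
            totals[row[key]] += row[value]
        return dict(totals)

    return program


# Hypothetical parsed form of:
#   SELECT region, SUM(amount)
#   FROM (SELECT * FROM orders_eu UNION ALL SELECT * FROM orders_us)
#   GROUP BY region
parsed = {"union": ("orders_eu", "orders_us"), "group_by": "region", "sum": "amount"}

# Two stand-in data sources, each holding one dataset.
sources = {
    "orders_eu": [{"region": "eu", "amount": 10}, {"region": "eu", "amount": 5}],
    "orders_us": [{"region": "us", "amount": 7}, {"region": "eu", "amount": 1}],
}

program = generate_program(parsed)
result = program(sources)   # {'eu': 16, 'us': 7}
```

Returning a closure keeps the generation step separate from execution, so the same generated program could be run against different source bindings.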
-
Publication No.: US09576028B2
Publication date: 2017-02-21
Application No.: US14628643
Filing date: 2015-02-23
Inventors: Ian Schechter, Glenn John Allin
IPC classification: G06F17/30
CPC classification: G06F17/30463, G06F17/30398, G06F17/30436, G06F17/30477, G06F17/30958
Abstract: In one aspect, in general, a method of generating a dataflow graph representing a database query includes receiving a query plan from a plan generator, the query plan representing operations for executing a database query on at least one input representing a source of data, producing a dataflow graph from the query plan, wherein the dataflow graph includes at least one node that represents at least one operation represented by the query plan, and includes at least one link that represents at least one dataflow associated with the query plan, and altering one or more components of the dataflow graph based on at least one characteristic of the at least one input representing the source of data.
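
As a rough illustration of this abstract, the sketch below turns a nested query plan into nodes and links, then alters the scan components according to a characteristic of each input. The plan layout, the `source_kinds` mapping, and the component names `read_file` and `read_table` are assumptions, not details taken from the patent.

```python
def plan_to_graph(plan):
    """Walk a nested plan; emit one node per operation and one link per dataflow."""
    nodes, links = [], []

    def visit(step):
        node_id = len(nodes)
        nodes.append({"id": node_id, "op": step["op"], "table": step.get("table")})
        for child in step.get("inputs", []):
            child_id = visit(child)
            links.append((child_id, node_id))   # data flows from child to parent
        return node_id

    visit(plan)
    return nodes, links


def alter_for_inputs(nodes, source_kinds):
    """Pick a reader component for each scan node based on the kind of its input."""
    for node in nodes:
        if node["op"] == "scan":
            kind = source_kinds[node["table"]]
            node["component"] = "read_file" if kind == "file" else "read_table"
    return nodes


# Hypothetical plan for: aggregate(filter(scan(sales)))
plan = {
    "op": "aggregate",
    "inputs": [{"op": "filter", "inputs": [{"op": "scan", "table": "sales"}]}],
}

nodes, links = plan_to_graph(plan)
nodes = alter_for_inputs(nodes, {"sales": "file"})
```

The alteration step is kept separate from graph construction so that the same plan could yield different components when the input is a database table rather than a file.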
-
Publication No.: US11593380B2
Publication date: 2023-02-28
Application No.: US16862821
Filing date: 2020-04-30
Inventors: Ian Schechter, Garth Dickie
IPC classification: G06F16/2455, G06F16/2457, G06F16/242, G06F16/25, G06F16/901, G06F9/30, G06F9/50
Abstract: Techniques for generating a dataflow graph include generating a first dataflow graph with a plurality of first nodes representing first computer operations in processing data, with at least one of the first computer operations being a declarative operation that specifies one or more characteristics of one or more results of processing of data, and transforming the first dataflow graph into a second dataflow graph for processing data in accordance with the first computer operations, the second dataflow graph including a plurality of second nodes representing second computer operations, with at least one of the second nodes representing one or more imperative operations that implement the logic specified by the declarative operation, where the one or more imperative operations are unrepresented by the first nodes in the first dataflow graph.
-
Publication No.: US11593369B2
Publication date: 2023-02-28
Application No.: US15496891
Filing date: 2017-04-25
IPC classification: G06F16/00, G06F16/2453, G06F16/242, G06F16/25, G06F16/2455, G06F16/2458, G06F16/732, G06F16/2452
Abstract: One method includes receiving a database query, receiving information about a database table in data storage populated with data elements, producing a structural representation of the database table that includes a formatted data organization reflective of the database table and is absent the data elements of the database table, and providing the structural representation and the database query to a plan generator capable of producing a query plan representing operations for executing the database query on the database table. Another method includes receiving a query plan from a plan generator, the plan representing operations for executing a database query on a database table, and producing a dataflow graph from the query plan, wherein the dataflow graph includes at least one node that represents at least one operation represented by the query plan, and includes at least one link that represents at least one dataflow associated with the query plan.
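
The first method in this abstract (handing a plan generator a table's structure without its data) can be approximated with SQLite from the Python standard library. This is an analogy only: SQLite's `EXPLAIN QUERY PLAN` stands in for the plan generator, and the table names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A populated table standing in for the real database table.
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "eu", 10.0), (2, "us", 7.5)],
)

# Structural representation: same columns as orders, but no data elements.
conn.execute("CREATE TABLE orders_shape AS SELECT * FROM orders WHERE 0")

shape_columns = [row[1] for row in conn.execute("PRAGMA table_info(orders_shape)")]
row_count = conn.execute("SELECT COUNT(*) FROM orders_shape").fetchone()[0]

# Ask the planner for a plan against the empty shape table.
query = "SELECT region, SUM(amount) FROM orders_shape GROUP BY region"
plan_rows = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
plan_details = [row[3] for row in plan_rows]   # human-readable plan steps

conn.close()
```

The planner can still describe how it would execute the query because it only needs the column layout; the steps in `plan_details` could then feed a plan-to-graph conversion like the one the abstract's second method describes.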
-
Publication No.: US20210279043A1
Publication date: 2021-09-09
Application No.: US17025751
Filing date: 2020-09-18
Inventors: Jonah Egenolf, Marshall A. Isman, Ian Schechter
IPC classification: G06F8/34, G06F8/36, G06F8/38, G06F16/242, G06F16/2452
Abstract: A method includes accessing a schema that specifies relationships among datasets, computations on the datasets, or transformations of the datasets, selecting a dataset from among the datasets, and identifying, from the schema, other datasets that are related to the selected dataset. Attributes of the datasets are identified, and logical data representing the identified attributes and relationships among the attributes is generated. The logical data is provided to a development environment, which provides access to portions of the logical data representing the identified attributes. A specification that specifies at least one of the identified attributes in performing an operation is received from the development environment. Based on the specification and the relationships among the identified attributes represented by the logical data, a computer program is generated to perform the operation by accessing, from storage, at least one dataset having the at least one of the attributes specified in the specification.
-
Publication No.: US20210232579A1
Publication date: 2021-07-29
Application No.: US16862821
Filing date: 2020-04-30
Inventors: Ian Schechter, Garth Dickie
IPC classification: G06F16/2455, G06F16/2457, G06F16/901, G06F16/25, G06F16/242, G06F9/30, G06F9/50
Abstract: Techniques for generating a dataflow graph include generating a first dataflow graph with a plurality of first nodes representing first computer operations in processing data, with at least one of the first computer operations being a declarative operation that specifies one or more characteristics of one or more results of processing of data, and transforming the first dataflow graph into a second dataflow graph for processing data in accordance with the first computer operations, the second dataflow graph including a plurality of second nodes representing second computer operations, with at least one of the second nodes representing one or more imperative operations that implement the logic specified by the declarative operation, where the one or more imperative operations are unrepresented by the first nodes in the first dataflow graph.
-
Publication No.: US09607073B2
Publication date: 2017-03-28
Application No.: US14255579
Filing date: 2014-04-17
Inventors: Ian Schechter, Tim Wakeling, Ann M. Wollrath
CPC classification: G06F17/30545, G06F9/5066, G06F17/30091, G06F17/30144, G06F17/30563, G06F17/30595, G06F17/30598, G06F17/30958
Abstract: In a first aspect, a method includes, at a node of a Hadoop cluster, the node storing a first portion of data in HDFS data storage, executing a first instance of a data processing engine capable of receiving data from a data source external to the Hadoop cluster, receiving a computer-executable program by the data processing engine, executing at least part of the program by the first instance of the data processing engine, receiving, by the data processing engine, a second portion of data from the external data source, storing the second portion of data other than in HDFS storage, and performing, by the data processing engine, a data processing operation identified by the program using at least the first portion of data and the second portion of data.