-
Publication No.: US10983967B2
Publication date: 2021-04-20
Application No.: US15478177
Filing date: 2017-04-03
Applicant: Amazon Technologies, Inc.
Inventor: Dimitris Tsirogiannis, Nathan A. Binkert, Stavros Harizopoulos, Mehul A. Shah, Benjamin A. Sowell, Bryan D. Kaplan, Kevin R. Meyer
Abstract: A data transformation system includes a schema inference module and an export module. The schema inference module is configured to dynamically create a cumulative schema for objects retrieved from a first data source. Each of the retrieved objects includes (i) data and (ii) metadata describing the data. Dynamically creating the cumulative schema includes, for each object of the retrieved objects, (i) inferring a schema from the object and (ii) selectively updating the cumulative schema to describe the object according to the inferred schema. The export module is configured to output the data of the retrieved objects to a data destination system according to the cumulative schema.
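Illustration: the abstract describes inferring a schema per object and selectively updating a cumulative schema. The following is a minimal sketch of that idea, not the patented implementation. The flat "path to type names" schema representation and the names `infer_schema` and `update_cumulative` are assumptions made for this example.

```python
from typing import Any, Dict, Set

# Assumed representation: field path -> set of observed type names.
Schema = Dict[str, Set[str]]


def infer_schema(obj: Any, prefix: str = "") -> Schema:
    """Infer a flat schema from one JSON-like object (hypothetical helper)."""
    schema: Schema = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            for p, types in infer_schema(value, path).items():
                schema.setdefault(p, set()).update(types)
    elif isinstance(obj, list):
        # Array elements share one path, marked with "[]".
        for item in obj:
            for p, types in infer_schema(item, f"{prefix}[]").items():
                schema.setdefault(p, set()).update(types)
    else:
        schema[prefix] = {type(obj).__name__}
    return schema


def update_cumulative(cumulative: Schema, inferred: Schema) -> bool:
    """Merge `inferred` into `cumulative` in place.

    Returns True only when the cumulative schema actually changed, which is
    one way to read "selectively updating" in the abstract.
    """
    changed = False
    for path, types in inferred.items():
        known = cumulative.setdefault(path, set())
        if not types <= known:
            known |= types
            changed = True
    return changed


cumulative: Schema = {}
for record in [{"id": 1, "tags": ["a"]}, {"id": "x", "user": {"name": "n"}}]:
    update_cumulative(cumulative, infer_schema(record))
```

After the loop, `cumulative` describes every field and type seen so far, so an export step could derive destination columns from it.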
-
Publication No.: US20170206256A1
Publication date: 2017-07-20
Application No.: US15478177
Filing date: 2017-04-03
Applicant: Amazon Technologies, Inc.
Inventor: Dimitris Tsirogiannis, Nathan A. Binkert, Stavros Harizopoulos, Mehul A. Shah, Benjamin A. Sowell, Bryan D. Kaplan, Kevin R. Meyer
IPC: G06F17/30
CPC classification number: G06F16/211, G06F16/22, G06F16/235, G06F16/254, G06F16/86
Abstract: A data transformation system includes a schema inference module and an export module. The schema inference module is configured to dynamically create a cumulative schema for objects retrieved from a first data source. Each of the retrieved objects includes (i) data and (ii) metadata describing the data. Dynamically creating the cumulative schema includes, for each object of the retrieved objects, (i) inferring a schema from the object and (ii) selectively updating the cumulative schema to describe the object according to the inferred schema. The export module is configured to output the data of the retrieved objects to a data destination system according to the cumulative schema.
-
Publication No.: US11567972B1
Publication date: 2023-01-31
Application No.: US15199486
Filing date: 2016-06-30
Applicant: Amazon Technologies, Inc.
Inventor: Anurag Windlass Gupta, Andrew Edward Caldwell, Stavros Harizopoulos, Michail Petropoulos, Ramakrishna Kotla, John Benjamin Tobler
Abstract: A tree-based format may be implemented for data stored in a data store. A table may be maintained across one or multiple storage nodes in storage slabs. Storage slabs may be mapped to different nodes of a tree. Each node of the tree may be assigned a different range of distribution scheme values which identify what portions of the table are stored in the storage slab. Storage slabs mapped to child nodes in the tree may be assigned portions of the range of distribution scheme values assigned to a parent. Storage nodes may be added or removed for storing the table. Storage slabs may be moved from one storage node to another in order to accommodate the addition or removal of storage nodes.
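Illustration: the abstract describes storage slabs mapped to tree nodes, where each child covers a portion of its parent's range of distribution scheme values. The following is a minimal sketch under stated assumptions, not the patented design: values are integers, ranges are half-open `[lo, hi)`, splits are even, and `SlabNode`, `split`, and `find` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SlabNode:
    """One storage slab covering distribution values in [lo, hi)."""
    lo: int
    hi: int
    storage_node: str  # which storage node currently holds this slab
    children: List["SlabNode"] = field(default_factory=list)

    def split(self, parts: int) -> None:
        """Give each child an equal portion of this node's range."""
        width = self.hi - self.lo
        bounds = [self.lo + width * i // parts for i in range(parts + 1)]
        self.children = [
            SlabNode(a, b, self.storage_node)
            for a, b in zip(bounds, bounds[1:])
        ]

    def find(self, value: int) -> "SlabNode":
        """Return the deepest slab whose range contains `value`."""
        if not (self.lo <= value < self.hi):
            raise KeyError(value)
        for child in self.children:
            if child.lo <= value < child.hi:
                return child.find(value)
        return self


root = SlabNode(0, 100, "node-a")
root.split(4)
# Adding a storage node: move one slab to it without touching the others.
root.children[3].storage_node = "node-b"
```

Because each slab carries its own range, moving a slab to another storage node only changes that slab's `storage_node`; lookups through `find` keep working unchanged.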
-
Publication No.: US11500931B1
Publication date: 2022-11-15
Application No.: US15996224
Filing date: 2018-06-01
Applicant: Amazon Technologies, Inc.
Inventor: Panagiotis Parchas, Christos Faloutsos, Anurag Windlass Gupta, Stavros Harizopoulos, Michail Petropoulos
IPC: G06F16/90, G06F16/901, G06F16/2455, G06F16/2453
Abstract: Using a graph representation of join history may be performed to distribute database data. Join history may be collected, captured, or tracked which describes the history of join operations between columns of different tables in a database. A graph representation of the join history may be generated. The graph representation may indicate a likelihood of different joins that may be performed between the columns of the tables of a database. An evaluation of the join history may be performed to identify columns for tables in the database to distribute the data of the tables amongst multiple storage locations according to the identified columns.
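Illustration: the abstract describes building a graph from join history and evaluating it to pick the columns by which tables are distributed. The following is a minimal sketch, not the patented method. It assumes that observed join counts stand in for join likelihood and that each table is distributed by its most frequently joined column; `build_join_graph` and `pick_distribution_columns` are hypothetical names.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Column = Tuple[str, str]  # (table, column)


def build_join_graph(history: Iterable[Tuple[Column, Column]]) -> Counter:
    """Undirected graph: each edge is a pair of joined columns, weighted by
    how often that join appears in the history."""
    graph: Counter = Counter()
    for left, right in history:
        graph[frozenset((left, right))] += 1
    return graph


def pick_distribution_columns(graph: Counter) -> Dict[str, str]:
    """For each table, choose the column with the highest total edge weight."""
    weight: Dict[str, Counter] = defaultdict(Counter)
    for edge, count in graph.items():
        for table, column in edge:
            weight[table][column] += count
    return {table: cols.most_common(1)[0][0] for table, cols in weight.items()}


history = [
    (("orders", "customer_id"), ("customers", "id")),
    (("orders", "customer_id"), ("customers", "id")),
    (("orders", "product_id"), ("products", "id")),
]
keys = pick_distribution_columns(build_join_graph(history))
```

Here `orders` is joined on `customer_id` more often than on `product_id`, so distributing `orders` and `customers` by those columns would co-locate the rows that are joined most often.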
-
Publication No.: US10318346B1
Publication date: 2019-06-11
Application No.: US15274813
Filing date: 2016-09-23
Applicant: Amazon Technologies, Inc.
Inventor: Stavros Harizopoulos, Michail Petropoulos, Andrea Olgiati
Abstract: Data stores may implement prioritized scheduling of data store access requests. When new access requests are received, the new access requests may be scheduled for prioritized execution on processing resources. Access requests that are currently being executed with prioritized execution may be reprioritized to make additional capacity for prioritized execution of the new access requests. Prioritized execution may be automatically enabled or disabled for a data store based on monitoring of performance metrics for executing access requests.
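Illustration: the abstract describes scheduling new requests for prioritized execution, reprioritizing running requests to free capacity, and enabling or disabling the feature from performance metrics. The following is a minimal sketch of one possible policy, not the patented one. The fixed slot count (assumed to be at least 1), the oldest-first demotion rule, the average-latency threshold, and the name `PriorityScheduler` are all assumptions.

```python
from collections import deque
from typing import Deque, List


class PriorityScheduler:
    """Toy scheduler: new requests get priority, older ones are demoted."""

    def __init__(self, slots: int, latency_threshold_ms: float) -> None:
        self.slots = slots  # capacity for prioritized execution, >= 1
        self.latency_threshold_ms = latency_threshold_ms
        self.enabled = False
        self.prioritized: Deque[str] = deque()
        self.normal: List[str] = []

    def observe_latency(self, avg_latency_ms: float) -> None:
        """Enable prioritization only while the monitored latency is high."""
        self.enabled = avg_latency_ms > self.latency_threshold_ms
        if not self.enabled:
            # Disabling returns every request to normal execution.
            self.normal.extend(self.prioritized)
            self.prioritized.clear()

    def submit(self, request_id: str) -> None:
        if not self.enabled:
            self.normal.append(request_id)
            return
        if len(self.prioritized) >= self.slots:
            # Reprioritize the oldest running request to make room.
            self.normal.append(self.prioritized.popleft())
        self.prioritized.append(request_id)


scheduler = PriorityScheduler(slots=2, latency_threshold_ms=50.0)
scheduler.submit("a")            # prioritization is off: runs normally
scheduler.observe_latency(80.0)  # latency above threshold: turn it on
for request in ("b", "c", "d"):
    scheduler.submit(request)    # "d" arrives with both slots full, so "b" is demoted
```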
-