-
Publication No.: US20210319023A1
Publication date: 2021-10-14
Application No.: US16917489
Filing date: 2020-06-30
Applicant: Microsoft Technology Licensing, LLC
Inventor: Bailu DING , Vivek Ravindranath NARASAYYA , Surajit CHAUDHURI
IPC: G06F16/2453 , G06F16/2455
Abstract: The present disclosure relates to systems, methods, and computer-readable media for optimizing and implementing operator trees based on a received query. For example, systems disclosed herein may generate an operator tree based on a received query. The systems described herein may systematically analyze the impact of bitvector filters in optimizing a join order of the operator tree to generate an optimized operator tree. The systems described herein may further implement the bitvector-aware operator tree by providing the optimized operator tree to an execution engine for further processing.
-
Publication No.: US20230367771A1
Publication date: 2023-11-16
Application No.: US17740660
Filing date: 2022-05-10
Applicant: Microsoft Technology Licensing, LLC
Inventor: Tarique Ashraf SIDDIQUI , Saehan JO , Wentao WU , Chi WANG , Vivek Ravindranath NARASAYYA , Surajit CHAUDHURI
IPC: G06F16/2453 , G06F16/22 , G06F16/248 , G06F16/21 , G06F11/34
CPC classification number: G06F16/24549 , G06F11/3409 , G06F16/21 , G06F16/221 , G06F16/24539 , G06F16/248
Abstract: The present disclosure relates to methods and systems for compressing workloads for use with index tuning. The methods and systems receive a workload with a plurality of queries. The methods and systems represent each query using query features and a utility. The methods and systems select a query for a query subset based on a benefit of the query determined using the query features and the utility. The methods and systems update the features and the utility of the remaining queries in the workload and select another query to add to the query subset based on an updated benefit determined using the updated features and utilities. The methods and systems continue selecting queries until the query subset reaches a received query subset size. The methods and systems use the query subset in index tuning to provide one or more index recommendations.
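The select-then-update loop described in this abstract can be sketched as a greedy procedure. This is a minimal illustration only, not the claimed method: the function name `compress_workload`, the use of feature sets, and the specific update rule (drop features already covered by the chosen query and scale each remaining utility by its uncovered fraction) are all assumptions made for the example.

```python
from typing import Dict, List, Set


def compress_workload(features: Dict[str, Set[str]],
                      utility: Dict[str, float],
                      k: int) -> List[str]:
    """Greedily pick up to k queries (hypothetical benefit = current utility)."""
    # Work on copies so the caller's data is left untouched.
    feats = {q: set(f) for q, f in features.items()}
    util = dict(utility)
    selected: List[str] = []

    while feats and len(selected) < k:
        # Pick the query with the highest current benefit.
        # Sorting first makes ties resolve to the smallest query name.
        best = max(sorted(feats), key=lambda q: util[q])
        covered = feats.pop(best)
        selected.append(best)

        # Assumed update rule: features already represented by the chosen
        # query no longer count, and utility shrinks proportionally.
        for q, f in feats.items():
            if f:
                overlap = len(f & covered)
                util[q] *= 1 - overlap / len(f)
                f -= covered

    return selected


workload_features = {"q1": {"a", "b"}, "q2": {"a", "b"}, "q3": {"c"}}
workload_utility = {"q1": 10.0, "q2": 9.0, "q3": 5.0}
subset = compress_workload(workload_features, workload_utility, k=2)
```

In this toy workload, `q2` shares all of its features with `q1`, so its utility drops to zero once `q1` is chosen and the second slot goes to `q3`.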
-
Publication No.: US20210406744A1
Publication date: 2021-12-30
Application No.: US16917857
Filing date: 2020-06-30
Applicant: Microsoft Technology Licensing, LLC
Inventor: Anshuman DUTT , Chi WANG , Vivek Ravindranath NARASAYYA , Surajit CHAUDHURI
Abstract: A model generator constructs a model for estimating selectivity of database operations by determining a number of training examples necessary for the model to achieve a target accuracy and by generating approximate selectivity labels for the training examples. The model generator may train the model on an initial number of training examples using cross-validation. The model generator may determine whether the model satisfies the target accuracy and iteratively and geometrically increase the number of training examples based on an optimized geometric step size (which may minimize model construction time) until the model achieves the target accuracy based on a defined confidence level. The model generator may generate labels using a subset of tuples from an intermediate query expression. The model generator may iteratively increase a size of the subset of tuples used until a relative error of the generated labels is below a target threshold.
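The iterative, geometric growth of the training set described above can be sketched as follows. This is a simplified illustration under stated assumptions: `evaluate` is a hypothetical callback standing in for "train on n examples and return cross-validated accuracy", the step size is a fixed parameter rather than the optimized one the abstract mentions, and the confidence-level check is omitted.

```python
from typing import Callable, Tuple


def grow_until_accurate(evaluate: Callable[[int], float],
                        target: float,
                        initial: int = 100,
                        step: float = 2.0,
                        max_examples: int = 1_000_000) -> Tuple[int, float]:
    """Multiply the training-set size by `step` until `target` is reached."""
    n = initial
    score = evaluate(n)
    while score < target and n < max_examples:
        # Geometric increase, always growing by at least one example
        # and never exceeding the cap.
        n = min(max_examples, max(n + 1, int(n * step)))
        score = evaluate(n)
    return n, score


# Toy stand-in for a learning curve: accuracy improves as n grows.
def toy_accuracy(n: int) -> float:
    return 1 - 10 / n


examples_needed, achieved = grow_until_accurate(toy_accuracy, target=0.98)
```

With the toy curve, the loop evaluates 100, 200, 400, and 800 examples and stops at 800, the first size whose accuracy meets the target.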
-
Publication No.: US20240184798A1
Publication date: 2024-06-06
Application No.: US18075365
Filing date: 2022-12-05
Applicant: Microsoft Technology Licensing, LLC
Inventor: Kris K. GANJAM , Yeye HE , Vivek Ravindranath NARASAYYA , Surajit CHAUDHURI
IPC: G06F16/25
CPC classification number: G06F16/258
Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a set of example values is received. A repository of transformation tools is searched to identify a new transformation tool as relevant to a data transformation associated with the received set of example values. The repository includes annotations associated with the new transformation tool. The new transformation tool is used to generate a transformation program that produces transformed output values. Additional annotations are generated for the new transformation tool based on the transformed output values.
-
Publication No.: US20230385261A1
Publication date: 2023-11-30
Application No.: US17897930
Filing date: 2022-08-29
Applicant: Microsoft Technology Licensing, LLC
IPC: G06F16/22 , G06F16/2453 , G06F11/34
CPC classification number: G06F16/2272 , G06F16/24542 , G06F11/3419
Abstract: A method of training an index filter for an index tuning system includes receiving a plurality of different workloads and a plurality of different databases, each database including different tables and each workload including a plurality of queries; generating labeled training data by making optimizer calls to a query optimizer using query and index configuration pairs from the plurality of databases and the plurality of workloads; training an index filter model to identify signals in the labeled training data, the signals being indicative of a potential performance improvement associated with using an index configuration for a given query; training the index filter model to learn rules over the signals for identifying spurious indexes; and storing the index filter model in a memory.
-
Publication No.: US20230315702A1
Publication date: 2023-10-05
Application No.: US17832274
Filing date: 2022-06-03
Applicant: Microsoft Technology Licensing, LLC
Inventor: Wentao WU , Chi WANG , Tarique Ashraf SIDDIQUI , Vivek Ravindranath NARASAYYA , Surajit CHAUDHURI
IPC: G06F16/21 , G06F16/2453 , G06F16/22
CPC classification number: G06F16/217 , G06F16/2453 , G06F16/2246
Abstract: The present disclosure relates to systems, methods, and computer-readable media for determining optimal index configurations for processing workloads in a database management system. For instance, an index configuration system can efficiently determine a subset of indexes for processing a workload utilizing one or more reinforcement learning models. For example, in various implementations, the index configuration system utilizes a Markov decision process and/or a Monte Carlo tree search model to determine an optimal subset of indexes for processing a workload in a manner that effectively utilizes computing device resources while also avoiding significant interference with customer workloads.
-
Publication No.: US20240028607A1
Publication date: 2024-01-25
Application No.: US18374490
Filing date: 2023-09-28
Applicant: Microsoft Technology Licensing, LLC
Inventor: Yeye HE , Kris K. GANJAM , Vivek Ravindranath NARASAYYA , Surajit CHAUDHURI
IPC: G06F16/25 , G06F16/245 , G06F16/21
CPC classification number: G06F16/258 , G06F16/245 , G06F16/211 , G06N5/025
Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a set of example values is received, including example input values that indicate data values to be transformed and example output values that indicate a desired form in which to transform data. Based on the set of example values, a data transformation function that is relevant to the set of example values is identified. The data transformation function is used to generate a transformation program to transform the example input values to the desired form in which to transform data. A suggestion of the transformation program can be provided to a user device, wherein selection of the transformation program suggestion results in a data transformation.
-
Publication No.: US20230315701A1
Publication date: 2023-10-05
Application No.: US18331169
Filing date: 2023-06-07
Applicant: Microsoft Technology Licensing, LLC
Inventor: Meiyalagan BALASUBRAMANIAN , Lengning LIU , Aditya KUPPA , Kirk Hartmann FREIHEIT , Kalen WONG , Paula Budig GREVE , Patrick Clinton LITTLE , Lucas PRITZ , Yue WANG , Vivek Ravindranath NARASAYYA , Katchaguy AREEKIJSEREE , Yehe HE , Surajit CHAUDHURI , Gaurav Ghosh
IPC: G06F16/215 , G06F16/2455
CPC classification number: G06F16/215 , G06F16/24556
Abstract: Solutions for data unification include: receiving a data record, the data record comprising a plurality of data fields; selecting, from among the plurality of data fields, a subset of the data fields, the subset of the data fields being fewer in number than the plurality of data fields, wherein selecting the subset of the data fields comprises: applying a first rule to select at least a first one of the data fields within the data record for inclusion in the subset of the data fields; using content of the subset of the data fields, generating a stable identifier (stableID) for the data record; and inserting the stableID into a primary key data field of the data record.
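One way to picture the stableID step is to hash the content of the selected fields and write the result into the primary key field. The sketch below is an assumption-laden illustration, not the method of the application: the function name `add_stable_id`, the choice of SHA-256, the whitespace and case normalization, and the default key field name `id` are all hypothetical, and the field-selection rule is represented simply by the caller passing in the chosen field names.

```python
import hashlib
from typing import Dict, Iterable


def add_stable_id(record: Dict[str, str],
                  key_fields: Iterable[str],
                  pk_field: str = "id") -> Dict[str, str]:
    """Return a copy of `record` with a content-derived ID in `pk_field`."""
    # Sort field names so the ID does not depend on the order they are given.
    # Light normalization (assumed) keeps trivial formatting changes from
    # altering the ID.
    parts = [f"{name}={record[name].strip().lower()}"
             for name in sorted(key_fields)]
    # Join with a separator unlikely to occur in the data.
    payload = "\x1f".join(parts).encode("utf-8")
    out = dict(record)
    out[pk_field] = hashlib.sha256(payload).hexdigest()
    return out


first = add_stable_id(
    {"email": "Ada@Example.com", "name": "Ada", "last_seen": "2023-01-01"},
    key_fields=["email", "name"],
)
second = add_stable_id(
    {"email": "ada@example.com ", "name": "Ada", "last_seen": "2024-06-30"},
    key_fields=["name", "email"],
)
```

Because only the `email` and `name` fields feed the hash, the two records above receive the same ID even though `last_seen` differs.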
-
Publication No.: US20230259407A1
Publication date: 2023-08-17
Application No.: US17674173
Filing date: 2022-02-17
Applicant: Microsoft Technology Licensing, LLC
Inventor: Willis LANG , Justin Grant MOELLER , Ajay KALHAN , Monika COLIC , Aleksandar CUKANOVIC , Nikola PUZOVIC , Marko STOJANOVIC , Jiaqi LIU , Arnd Christian KÖNIG , Yi SHAN , Vivek Ravindranath NARASAYYA
IPC: G06F9/50
CPC classification number: G06F9/5083 , G06F9/5077 , G06F9/5016 , G06F9/5044 , G06F9/5055
Abstract: Methods, systems, and computer program products are provided for a compute cluster comprising placement and load balancing (PLB) logic that receives data (e.g., state metadata) relating to a service (e.g., database service) executing on the compute cluster, from a resource manager executing on the compute cluster, via a first API associated with the resource manager. The PLB logic receives second data from the service via a second API and determines whether a PLB action is indicated based on one of the second data or a combination of the first data and the second data. When a PLB action is indicated, the PLB logic sends a command to the resource manager to execute the PLB action. The PLB logic also receives queries from clients external to the compute cluster and may spawn a child PLB logic to offload PLB operations, respond to queries, or perform software validation in the child.
-
Publication No.: US20220414099A1
Publication date: 2022-12-29
Application No.: US17361016
Filing date: 2021-06-28
Applicant: Microsoft Technology Licensing, LLC
IPC: G06F16/2453 , G06F11/34 , G06N20/00
Abstract: The present disclosure relates to systems, methods, and computer-readable media for optimizing selection of a cached execution plan to use in processing a parametric query. For example, systems described herein involve training a plan selection model that makes use of machine learning to identify an execution plan from a set of pre-selected execution plans based on predicted cost of executing a query instance in accordance with the selected execution plan (e.g., relative to predicted costs of executing the query instance using other pre-selected execution plans). This application describes features related to lowering costs associated with selecting the execution plan in a way that will continue to be more accurate over time based on training and refining the plan selection model.
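The selection step described above, choosing among pre-selected plans by predicted cost, reduces to an argmin once a cost predictor exists. The sketch below is illustrative only: `select_cached_plan`, the plan names, and the hand-written `toy_cost` function are hypothetical stand-ins, and a real system would use a trained model in place of `toy_cost`.

```python
from typing import Callable, Sequence, Tuple


def select_cached_plan(params: Tuple[float, ...],
                       plans: Sequence[str],
                       predict_cost: Callable[[str, Tuple[float, ...]], float]) -> str:
    """Return the cached plan with the lowest predicted cost for `params`."""
    return min(plans, key=lambda plan: predict_cost(plan, params))


# Toy stand-in for a learned cost model; params[0] is a predicate selectivity.
def toy_cost(plan: str, params: Tuple[float, ...]) -> float:
    selectivity = params[0]
    if plan == "index_seek":
        return 1 + 1000 * selectivity   # cheap only for selective predicates
    return 100.0                        # table scan: flat cost


cached_plans = ["index_seek", "table_scan"]
plan_for_selective = select_cached_plan((0.01,), cached_plans, toy_cost)
plan_for_broad = select_cached_plan((0.5,), cached_plans, toy_cost)
```

Under the toy model, the highly selective instance is routed to the index seek and the broad one to the table scan.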