Identifying joins of tables of a database

    Publication No.: US11960484B2

    Publication date: 2024-04-16

    Application No.: US17500508

    Application date: 2021-10-13

    Applicant: ThoughtSpot, Inc.

    Abstract: Identifying table joins includes obtaining respective casting similarities between pairs of columns of a first table and a second table. Each pair of columns includes a first column of the first table and a second column of the second table. Ones of the pairs of columns not satisfying a casting similarity condition are discarded to obtain first join candidates. Respective string similarities for the first join candidates are obtained. Ones of the first join candidates not satisfying a string similarity condition are discarded to obtain second join candidates. Final join candidates are obtained using the respective casting similarities and the respective string similarities of the second join candidates. A selected join candidate of the final join candidates is received from a user.
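The two-stage filtering described in this abstract can be sketched roughly as follows. The abstract does not define how either similarity is computed, so everything here is an assumption made for illustration: `casting_similarity` scores how compatible two column types are using a hypothetical lookup table, `string_similarity` compares column names with `difflib.SequenceMatcher`, and the thresholds and the final ranking by product of the two scores are arbitrary choices.

```python
from difflib import SequenceMatcher
from itertools import product

# Hypothetical scores for casting between different types (not from the patent).
CAST_SCORES = {
    frozenset(("int", "float")): 0.8,
    frozenset(("date", "str")): 0.5,
}


def casting_similarity(type_a, type_b):
    """Assumed measure: 1.0 for identical types, a table lookup otherwise."""
    if type_a == type_b:
        return 1.0
    return CAST_SCORES.get(frozenset((type_a, type_b)), 0.0)


def string_similarity(name_a, name_b):
    """Assumed measure: similarity ratio of the lower-cased column names."""
    return SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()


def join_candidates(first, second, cast_min=0.5, str_min=0.6):
    """first, second: dicts mapping column name -> type name."""
    # Stage 1: score every column pair and discard pairs that cast poorly.
    first_candidates = []
    for (col_a, type_a), (col_b, type_b) in product(first.items(), second.items()):
        cast = casting_similarity(type_a, type_b)
        if cast >= cast_min:
            first_candidates.append((col_a, col_b, cast))

    # Stage 2: score the survivors by name and discard dissimilar names.
    second_candidates = []
    for col_a, col_b, cast in first_candidates:
        text = string_similarity(col_a, col_b)
        if text >= str_min:
            second_candidates.append((col_a, col_b, cast, text))

    # Final candidates: ranked by a combination of both scores.
    return sorted(second_candidates, key=lambda c: c[2] * c[3], reverse=True)


orders = {"order_id": "int", "customer_id": "int", "note": "str"}
customers = {"customer_id": "int", "name": "str"}
candidates = join_candidates(orders, customers)
```

In this sketch the ranked list would be shown to a user, who picks one of the final candidates; that selection step is not modeled.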

    Phrase Indexing
    Type: Invention publication
    Status: Under examination (published)

    Publication No.: US20240070176A1

    Publication date: 2024-02-29

    Application No.: US18501101

    Application date: 2023-11-03

    Applicant: ThoughtSpot, Inc.

    Abstract: Intent-resolution using a phrase index may include obtaining data expressing a usage intent, the data expressing the usage intent including an unresolved data portion, identifying a phrase fragment based on the data expressing the usage intent and a defined phrase pattern, the phrase fragment including the unresolved data portion, identifying, by a processor, an indexed phrase by searching a phrase index based on the phrase fragment, wherein the indexed phrase at least partially matches the phrase fragment in accordance with the defined phrase pattern, in response to identifying the indexed phrase, obtaining a resolved request representing the data expressing the usage intent in accordance with the indexed phrase, generating a data query in accordance with the resolved request and a defined structured query language, obtaining results data responsive to execution of the data query by a database that implements the defined structured query language, and outputting the results data.

    Just-In-Time Injection In A Distributed Database

    Publication No.: US20230401210A1

    Publication date: 2023-12-14

    Application No.: US18455774

    Application date: 2023-08-25

    Applicant: ThoughtSpot, Inc.

    Abstract: A request for database results is received from a query coordinator at a database instance of a distributed database. The request includes a query execution instruction of a query plan and an indication of override instructions corresponding to the query execution instruction. The override instructions are such that they do not modify the query plan. The database instance includes the override instructions in a set of high-level language query instructions. The database instance performs just-in-time compiling of the set of high-level language query instructions to obtain a machine language query for performing the query execution instruction of the query plan. The database instance executes the machine language query to obtain the database results. The database instance then transmits the database results to the query coordinator.

    Distributed pseudo-random subset generation

    Publication No.: US11836136B2

    Publication date: 2023-12-05

    Application No.: US18075665

    Application date: 2022-12-06

    Applicant: ThoughtSpot, Inc.

    Abstract: Distributed pseudo-random subset generation includes obtaining a data-query indicating a first table having a first column including unique values, a second table having a second column including unique values, a join clause joining the first table and the second table on the first column and the second column, and a limit value, pseudo-random filtering the first table to obtain left intermediate data and left filtering criteria, pseudo-random filtering the second table to obtain right intermediate data and right filtering criteria, obtaining intermediate results data by full outer joining the left intermediate data and the right intermediate data, obtaining results data by filtering the intermediate results data using most-restrictive filtering criteria among the left filtering criteria and the right filtering criteria, and outputting the results data, wherein outputting the results data includes limiting the cardinality of rows of the results data to be at most the limit value.
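One plausible reading of these steps, offered only as a sketch, treats the pseudo-random filter as "keep the rows whose key hashes are smallest" and the filtering criterion as the largest hash that survived. The abstract does not say how the filtering or its criteria are implemented, so `key_hash`, `pseudo_random_filter`, and the threshold representation below are assumptions. Tables are modeled as dictionaries from a unique join key to a row value.

```python
import hashlib


def key_hash(key):
    """Deterministic pseudo-random number derived from a join key (assumed scheme)."""
    digest = hashlib.sha256(repr(key).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def pseudo_random_filter(table, limit):
    """Keep the `limit` rows with the smallest key hashes.

    Returns the kept rows and a threshold: every row of `table` whose key
    hash is <= threshold is guaranteed to be among the kept rows.
    """
    ranked = sorted(table, key=key_hash)
    kept = ranked[:limit]
    if len(ranked) <= limit:
        threshold = float("inf")  # nothing was dropped, so no restriction
    else:
        threshold = key_hash(kept[-1]) if kept else -1
    return {k: table[k] for k in kept}, threshold


def sample_join(left, right, limit):
    left_kept, left_thr = pseudo_random_filter(left, limit)
    right_kept, right_thr = pseudo_random_filter(right, limit)

    # Full outer join of the two intermediate data sets on the key.
    joined = {
        k: (left_kept.get(k), right_kept.get(k))
        for k in left_kept.keys() | right_kept.keys()
    }

    # Apply the most restrictive of the two criteria, then the limit.
    threshold = min(left_thr, right_thr)
    rows = [(k, l, r) for k, (l, r) in joined.items() if key_hash(k) <= threshold]
    rows.sort(key=lambda row: key_hash(row[0]))
    return rows[:limit]


left = {i: f"L{i}" for i in range(100)}
right = {i: f"R{i}" for i in range(50, 150)}
result = sample_join(left, right, limit=10)
```

Under these assumptions, filtering by the smaller threshold matters because only keys at or below it are known to be complete on both sides; a key above it might be missing from one intermediate set merely because it was filtered out, not because it is absent from the table.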

    Approximate unique count
    Type: Invention grant

    Publication No.: US11748264B1

    Publication date: 2023-09-05

    Application No.: US17962904

    Application date: 2022-10-10

    Applicant: ThoughtSpot, Inc.

    Abstract: Obtaining an approximate unique count for a column of a table of a database includes generating, for a value from an unevaluated row, a hash value in a defined range of hash values, determining a cardinality of leading zeros in the hash value, identifying a bucket with respect to the hash value from a plurality of buckets corresponding to the defined range of hash values, wherein the buckets from the plurality of buckets correspond with respective non-overlapping portions of the defined range of hash values, such that the hash value is in the portion of the defined range of hash values corresponding to the bucket, and appending to an unsorted sparse representation a bucket identifier for the bucket and the cardinality of the leading zeros, and, in response to a determination that unevaluated rows are unavailable in the table, determining the approximate unique count using the unsorted sparse representation.
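The per-row steps in this abstract resemble a HyperLogLog-style estimator, and the sketch below follows the standard HyperLogLog formulas rather than anything stated in the patent. The bucket count, the 64-bit hash taken from SHA-256, counting leading zeros in the bits that remain after the bucket prefix, and the small-range correction are all assumptions made for illustration.

```python
import hashlib
import math

P = 10            # bits used for the bucket identifier (assumed)
M = 1 << P        # number of buckets
HASH_BITS = 64    # width of the defined hash range (assumed)
REST_BITS = HASH_BITS - P


def hash64(value):
    """64-bit hash of a column value, taken from SHA-256 for determinism."""
    return int.from_bytes(hashlib.sha256(repr(value).encode()).digest()[:8], "big")


def observe(sparse, value):
    """Append (bucket id, leading-zero count) for one row's value."""
    h = hash64(value)
    bucket = h >> REST_BITS                 # top P bits pick a slice of the hash range
    rest = h & ((1 << REST_BITS) - 1)       # remaining bits
    leading_zeros = REST_BITS - rest.bit_length()
    sparse.append((bucket, leading_zeros))  # unsorted, duplicates allowed


def estimate(sparse):
    """Approximate unique count once no unevaluated rows remain."""
    registers = [0] * M
    for bucket, zeros in sparse:
        registers[bucket] = max(registers[bucket], zeros + 1)
    alpha = 0.7213 / (1 + 1.079 / M)        # standard HyperLogLog constant for M >= 128
    raw = alpha * M * M / sum(2.0 ** -r for r in registers)
    empty = registers.count(0)
    if raw <= 2.5 * M and empty:
        return M * math.log(M / empty)      # linear counting for small cardinalities
    return raw


sparse = []
for value in range(20000):
    observe(sparse, value)
    observe(sparse, value)                  # repeated values do not change the estimate
approx = estimate(sparse)
```

With 1,024 buckets the expected relative error of this kind of estimator is roughly 3 percent, so `approx` should land near 20,000 without being exact.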

    Compacted Table Data Files Validation
    Type: Invention publication

    Publication No.: US20230252016A1

    Publication date: 2023-08-10

    Application No.: US18301453

    Application date: 2023-04-17

    Applicant: ThoughtSpot, Inc.

    IPC classification: G06F16/23 G06F16/22

    Abstract: A first replay log is replayed to generate a first replay result. Replaying the first replay log includes replacing, in the first replay result, a first value of a first field included in a first command in the first replay log with a first hash value responsive to a determination that the first field is not utilized as a condition in at least one command included in the first replay log. A second replay log is replayed to generate a second replay result. The first replay result and the second replay result are compared to verify that the first replay log and the second replay log are equivalent.
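A toy version of this comparison might look like the following. The command format (`insert` with a `row`, `delete` with a `where` clause), the `short_hash` helper, and the idea that the second log is a compacted form of the first are all hypothetical; the abstract specifies none of them. The sketch keeps raw values only for fields that some command uses as a condition, hashes everything else as it is replayed, and hashes the remaining raw fields at the end so that two results are directly comparable.

```python
import hashlib


def short_hash(value):
    """Fixed-size stand-in for a field value (assumed hashing scheme)."""
    return hashlib.sha256(repr(value).encode()).hexdigest()[:16]


def condition_fields(log):
    """Fields that appear in the `where` clause of any command in the log."""
    return {field for cmd in log for field in cmd.get("where", {})}


def replay(log):
    used = condition_fields(log)
    rows = []
    for cmd in log:
        if cmd["op"] == "insert":
            # Fields never used as a condition are stored as hashes right away.
            rows.append({f: (v if f in used else short_hash(v))
                         for f, v in cmd["row"].items()})
        elif cmd["op"] == "delete":
            # Keep a row unless it matches every condition of the delete.
            rows = [r for r in rows
                    if any(r.get(f) != v for f, v in cmd["where"].items())]
    # Hash the fields that had to stay raw, so results of different logs compare.
    normalized = [{f: (short_hash(v) if f in used else v) for f, v in r.items()}
                  for r in rows]
    return sorted(normalized, key=lambda r: sorted(r.items()))


original = [
    {"op": "insert", "row": {"id": 1, "name": "a"}},
    {"op": "insert", "row": {"id": 2, "name": "b"}},
    {"op": "delete", "where": {"id": 1}},
]
compacted = [{"op": "insert", "row": {"id": 2, "name": "b"}}]
altered = [{"op": "insert", "row": {"id": 2, "name": "c"}}]

logs_match = replay(original) == replay(compacted)
logs_differ = replay(original) != replay(altered)
```

Hashing values that are never inspected by a condition keeps the replay result small while still letting equal logs produce equal results.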

    Aggregation operations in a distributed database

    Publication No.: US11720570B2

    Publication date: 2023-08-08

    Application No.: US17214247

    Application date: 2021-03-26

    Applicant: ThoughtSpot, Inc.

    Abstract: Querying a distributed database including a table sharded into shards distributed to database instances includes receiving a data-query that includes an aggregation clause on a first column and a grouping clause on a second column, obtaining results data, and outputting the results data. Obtaining the results data includes receiving, by a query coordinator, intermediate results data; and combining, by the query coordinator, the intermediate results data to obtain the results data. Receiving the intermediate results data includes receiving, from a first database instance, first aggregation values indicating, on a per-group basis in accordance with the grouping clause, a respective aggregation value of distinct values of the first column in accordance with the aggregation clause, and receiving, from a second database instance, second aggregation values indicating, on a per-group basis in accordance with the grouping clause, a respective aggregation value of distinct values of the first column in accordance with the aggregation clause.
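As a rough illustration of the coordinator pattern in this abstract, the sketch below computes a per-group count of distinct values on each instance and sums the partial results. It relies on an assumption the abstract does not state: rows are sharded by the aggregated column, so a given value lives on exactly one instance and per-instance distinct counts can simply be added. The function names and the in-memory "instances" are hypothetical.

```python
from collections import defaultdict


def shard_rows(rows, n_instances):
    """Distribute (group, value) rows by value, so equal values share a shard."""
    shards = [[] for _ in range(n_instances)]
    for group, value in rows:
        shards[hash(value) % n_instances].append((group, value))
    return shards


def local_count_distinct(shard):
    """Work done on one database instance: distinct values per group."""
    seen = defaultdict(set)
    for group, value in shard:
        seen[group].add(value)
    return {group: len(values) for group, values in seen.items()}


def combine(partials):
    """Work done on the query coordinator: add the per-group partial counts."""
    totals = defaultdict(int)
    for partial in partials:
        for group, count in partial.items():
            totals[group] += count
    return dict(totals)


rows = [("a", 1), ("a", 1), ("a", 2), ("b", 2), ("b", 3), ("b", 3), ("a", 5)]
partials = [local_count_distinct(shard) for shard in shard_rows(rows, 3)]
result = combine(partials)
```

If values could repeat across instances, adding counts would overcount, and the instances would need to send the distinct values themselves (or a mergeable summary) instead of plain counts.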

    Identifying Joins Of Tables Of A Database

    Publication No.: US20230112250A1

    Publication date: 2023-04-13

    Application No.: US17500508

    Application date: 2021-10-13

    Applicant: ThoughtSpot, Inc.

    IPC classification: G06F16/2453 G06F16/22

    Abstract: Identifying table joins includes obtaining respective casting similarities between pairs of columns of a first table and a second table. Each pair of columns includes a first column of the first table and a second column of the second table. Ones of the pairs of columns not satisfying a casting similarity condition are discarded to obtain first join candidates. Respective string similarities for the first join candidates are obtained. Ones of the first join candidates not satisfying a string similarity condition are discarded to obtain second join candidates. Final join candidates are obtained using the respective casting similarities and the respective string similarities of the second join candidates. A selected join candidate of the final join candidates is received from a user.

    Object Scriptability
    Type: Invention application

    Publication No.: US20230101890A1

    Publication date: 2023-03-30

    Application No.: US18062134

    Application date: 2022-12-06

    Applicant: ThoughtSpot, Inc.

    Abstract: Object scriptability includes receiving a high-level language script describing at least one data-analysis object, including a node representing the data-analysis object in a graph-based data structure including a plurality of nodes, where each node from the plurality of nodes represents a respective data-analysis object in a data analysis system, where each node from the plurality of nodes is connected to at least one other node from the plurality of nodes by an edge, and where the edge represents a relationship between the respective objects in the data analysis system.