STORING SEMI-STRUCTURED DATA
    1.
    Published patent application

    Publication No.: US20240220538A1

    Publication date: 2024-07-04

    Application No.: US18604990

    Filing date: 2024-03-14

    Applicant: Google LLC

    Inventor: Martin Probst

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for storing semi-structured data. One of the methods includes maintaining a plurality of schemas; receiving a first semi-structured data item; determining that the first semi-structured data item does not match any of the schemas in the plurality of schemas; and in response to determining that the first semi-structured data item does not match any of the schemas in the plurality of schemas: generating a new schema, encoding the first semi-structured data item in the first data format to generate the first new encoded data item in accordance with the new schema, storing the first new encoded data item in the data item repository, and associating the first new encoded data item with the new schema.
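
    The match-or-create flow in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: the schema definition (sorted field names with value types), the JSON value-array encoding, and the names schema_of and SchemaStore are all assumptions made for illustration.

```python
import json


def schema_of(item):
    """Hypothetical schema: sorted (field name, value type name) pairs."""
    return tuple(sorted((k, type(v).__name__) for k, v in item.items()))


class SchemaStore:
    """Toy repository that keeps one schema per distinct item shape."""

    def __init__(self):
        self.schemas = []      # schema id is the position in this list
        self.repository = []   # (schema id, encoded item) pairs

    def store(self, item):
        schema = schema_of(item)
        if schema in self.schemas:
            schema_id = self.schemas.index(schema)
        else:
            # No maintained schema matches, so generate a new one.
            self.schemas.append(schema)
            schema_id = len(self.schemas) - 1
        # Encode only the values, in schema order; field names live in the schema.
        encoded = json.dumps([item[name] for name, _ in schema])
        self.repository.append((schema_id, encoded))
        return schema_id

    def load(self, index):
        schema_id, encoded = self.repository[index]
        names = [name for name, _ in self.schemas[schema_id]]
        return dict(zip(names, json.loads(encoded)))


store = SchemaStore()
first = store.store({"name": "a", "age": 3})
second = store.store({"name": "b", "age": 5})      # reuses the first schema
third = store.store({"name": "c", "tags": ["x"]})  # forces a new schema
```

    Under these assumptions, items that share a shape share one schema, so field names are stored once rather than with every item.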

    Multidimensional machine learning data and user interface segment tagging engine apparatuses, methods and systems

    Publication No.: US11676070B1

    Publication date: 2023-06-13

    Application No.: US17111477

    Filing date: 2020-12-03

    Applicant: Momentum NA, Inc.

    Abstract: The Multidimensional Machine Learning Data and User Interface Segment Tagging Engine Apparatuses, Methods and Systems (“MLUI”) transforms ambient condition data, sales data, user interface selections, cognitive intelligence question input inputs via MLUI components into project projections, campaigns, user interface visualizations, cognitive intelligence question output outputs. A cognitive intelligence (CI) datapoint identifier cache datastructure is generated, the CI datapoint identifier cache datastructure configured to comprise a category identifier and an entity segment identifier. A CI datapoint value cache datastructure is generated, the CI datapoint value cache datastructure configured to comprise a set of module datastructures, each module datastructure corresponding to a module identifier associated with the category identifier, each module datastructure comprising a set of metric datastructures, each metric datastructure corresponding to a set of calculated metrics. The generated CI datapoint identifier cache datastructure and the generated CI datapoint value cache datastructure are stored as a key-value pair.

    Distinct value estimation for query planning

    Publication No.: US11663213B2

    Publication date: 2023-05-30

    Application No.: US17105014

    Filing date: 2020-11-25

    Applicant: Cloudera, Inc.

    Abstract: The problem of distinct value estimation has many applications, but is particularly important in the field of database technology where such information is utilized by query planners to generate and optimize query plans. Introduced is a novel technique for estimating the number of distinct values in a given dataset without scanning all of the values in the dataset. In an example embodiment, the introduced technique includes 1) gathering multiple intermediate probabilistic estimates based on varying samples of the dataset, 2) plotting the multiple intermediate probabilistic estimates against indications of sample size, 3) fitting a function to the plotted data points, and 4) determining an overall distinct value estimate by extrapolating the objective function to an estimated or known total number of values in the dataset.
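
    The four steps listed in this abstract can be sketched in Python as follows. Two simplifications are assumptions of this sketch, not details from the patent: the intermediate estimate is an exact distinct count of each sample (a real system would more likely use a probabilistic sketch), and the fitted function is a power law d = a * n^b chosen only because it can be fitted with ordinary least squares in log-log space. The names fit_power_law and estimate_distinct are hypothetical.

```python
import math
import random


def fit_power_law(points):
    """Least-squares fit of log(d) = log(a) + b * log(n); returns (a, b)."""
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(d) for _, d in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def estimate_distinct(values, sample_sizes, total_count, seed=0):
    """Extrapolate a distinct-value count from samples of varying size."""
    rng = random.Random(seed)
    points = []
    for n in sample_sizes:
        sample = rng.sample(values, n)            # step 1: sample the data
        points.append((n, len(set(sample))))      # step 2: (size, estimate) point
    a, b = fit_power_law(points)                  # step 3: fit a function
    estimate = a * total_count ** b               # step 4: extrapolate to the total
    return min(estimate, total_count)             # cannot exceed the row count


# A column in which every value is unique: the fit extrapolates to the row count.
unique_estimate = estimate_distinct(list(range(10000)), [100, 200, 400, 800], 10000)
# A column holding a single repeated value: the fit stays flat at one.
constant_estimate = estimate_distinct([7] * 1000, [50, 100, 200], 1000)
```

    The sample sizes must differ from one another, otherwise the log-log fit has no slope to estimate.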

    Selecting a normalized form for conversion of a query expression

    Publication No.: US11609911B2

    Publication date: 2023-03-21

    Application No.: US16720481

    Filing date: 2019-12-19

    Inventor: Jason Arnold

    IPC classification: G06F16/2453 G06F16/835

    Abstract: A method for execution by a query processing module includes determining a query expression indicating a query for execution. An operator tree is generated based on a nested ordering of a plurality of operators indicated by the query expression. Conjunctive normal form (CNF) conversion cost data is generated based on the operator tree, and disjunctive normal form (DNF) conversion cost data is also generated based on the operator tree. Conversion selection data is generated based on the CNF conversion cost data and the DNF conversion cost data. The conversion selection data indicates a selection to perform either a CNF conversion or a DNF conversion. A normalized query expression is generated by performing either the CNF conversion or the DNF conversion upon the query expression based on the conversion selection data. Execution of the query is facilitated in accordance with the normalized query expression.
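
    One way to picture the cost comparison is the Python sketch below. The cost measure is an assumption of this sketch: it counts the clauses (for CNF) or terms (for DNF) that naive distribution over the operator tree would produce, and it assumes the tree contains only AND, OR, and leaf predicates. The abstract does not say how the patent computes its cost data, and the function names are hypothetical.

```python
from math import prod


def cnf_clauses(node):
    """Clause count after distributing OR over AND (leaf predicates count 1)."""
    if isinstance(node, str):
        return 1
    op, children = node
    counts = [cnf_clauses(child) for child in children]
    return sum(counts) if op == "AND" else prod(counts)


def dnf_terms(node):
    """Term count after distributing AND over OR (leaf predicates count 1)."""
    if isinstance(node, str):
        return 1
    op, children = node
    counts = [dnf_terms(child) for child in children]
    return sum(counts) if op == "OR" else prod(counts)


def select_conversion(tree):
    """Return (chosen form, CNF cost, DNF cost), preferring the cheaper form."""
    cnf_cost, dnf_cost = cnf_clauses(tree), dnf_terms(tree)
    return ("CNF" if cnf_cost <= dnf_cost else "DNF"), cnf_cost, dnf_cost


# (a AND b) OR (c AND d) OR (e AND f): already three DNF terms, but eight CNF clauses.
tree = ("OR", [("AND", ["a", "b"]), ("AND", ["c", "d"]), ("AND", ["e", "f"])])
selection = select_conversion(tree)  # ("DNF", 8, 3)
```

    The example shows why the choice matters: converting the same expression to the wrong normal form can grow it multiplicatively.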

    Evaluating XML full text search
    7.
    Granted patent

    Publication No.: US11481439B2

    Publication date: 2022-10-25

    Application No.: US17129085

    Filing date: 2020-12-21

    IPC classification: G06F16/835

    Abstract: Techniques are described to improve query evaluation in computer systems. In an embodiment, a system receives a full text query for evaluation against a collection of hierarchically marked data object sets. The query specifies token(s) and context(s) which indicate hierarchical location(s) to match within a queried hierarchical data structure. To evaluate the query, the system determines a) data object set(s) that contain the query specified token(s) using token list(s), and/or b) data object set(s) that contain the query specified context(s) using label list(s).
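
    The token-list and label-list lookup can be sketched in Python as two inverted indexes. This is a simplified illustration under stated assumptions: documents are nested dictionaries standing in for XML, a context is a single element label rather than a full path, and the names build_lists and candidates are hypothetical. The intersection yields candidate documents only; confirming that the token actually occurs inside the context would need a further check.

```python
from collections import defaultdict


def build_lists(docs):
    """Build token and label inverted lists over nested-dict documents."""
    token_list = defaultdict(set)   # token -> ids of documents containing it
    label_list = defaultdict(set)   # element label -> ids of documents containing it

    def walk(doc_id, node):
        for label, child in node.items():
            label_list[label].add(doc_id)
            if isinstance(child, dict):
                walk(doc_id, child)
            else:
                for token in str(child).lower().split():
                    token_list[token].add(doc_id)

    for doc_id, doc in docs.items():
        walk(doc_id, doc)
    return token_list, label_list


def candidates(token_list, label_list, token, context):
    """Documents that contain both the token and the context label."""
    return token_list.get(token, set()) & label_list.get(context, set())


docs = {
    1: {"book": {"title": "Query planning", "author": "Lee"}},
    2: {"article": {"title": "XML search"}},
    3: {"book": {"title": "XML basics"}},
}
token_list, label_list = build_lists(docs)
hits = candidates(token_list, label_list, "xml", "book")  # {3}
```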

    Rules-Based Targeted Content Message Serving Systems and Methods

    Publication No.: US20220156796A1

    Publication date: 2022-05-19

    Application No.: US17665840

    Filing date: 2022-02-07

    Abstract: A method of serving targeted content messages for display in a website accessed in a browser program of a networked computer communicatively connected to a network at a network address for communications, delivers uniquely targeted content messages displayed in websites viewed in web browsers. The method includes placing a script device in a website file, processing the website file, together with the script device by a particular web browser on download of the website file, including by determining the network address of the networked computer, determining an identifier of the website file, and sending an artifact representing the network address and the identifier over the network to a server computer. The method also includes detecting the network address and the identifier by the server computer, querying a database for a database article related to the network address and the identifier, constructing a script program stored in memory of the server computer for the particular browser and website file, and constructing an ad device stored in memory of the web browser device from the script program. The method further includes calling the server computer by the ad device by communication of an identifier representing an action of the web browser device, receiving the identifier by the server computer, querying the database for a select message artifact related to the script program, the identifier, the website file, and the web browser, and responding by the server computer to the web browser with the select message artifact. A message represented by the select message artifact is displayed in the website then viewed in a browser window of the web browser. Messages can be prioritized and are uniquely targeted in content, based on real-time activities of the web browser.

    SEGMENT CONTENT OPTIMIZATION DELIVERY SYSTEM AND METHOD

    Publication No.: US20220156795A1

    Publication date: 2022-05-19

    Application No.: US17529814

    Filing date: 2021-11-18

    IPC classification: G06Q30/02 H04W4/21 G06F16/835

    Abstract: A method for identifying segments of a population of user devices communicating on a communications network. The segments correspond to user devices of the population exhibiting comparable behavioral patterns detectable by the communications network. A plurality of marketing systems are accessible on the communications network, and each of the plurality of marketing systems include respective use data corresponding to respective ones of the population for the marketing system. The method includes retrieving by a processor the respective use data for the population, from the plurality of marketing systems, determining by the processor if the respective use data exceeds a threshold for particular behavioral pattern of interest, for the respective use data, determining by the processor a unique identifier for each user device of the use data, grouping by the processor in a database, the respective use data in relation to the unique identifier, for each user device of the use data that exceeds the threshold, and mapping by the processor in the database, the behavioral pattern of the respective use data for each user device of the use data that exceeds the threshold. Behavioral patterns are determined for the respective segment, and related to the user devices of the segment. Content for delivery to the segment is sequenced, and placeholder in the sequence is stored in relation to each user device of the segment, to ensure that each next sequential content is delivered to the respective user device.

    Structured record retrieval
    10.
    Granted patent

    Publication No.: US11294874B2

    Publication date: 2022-04-05

    Application No.: US16521934

    Filing date: 2019-07-25

    Inventor: Taro Ikai

    Abstract: An approach to structured record retrieval permits transmission and storage of records in a native concise format, without requiring that the records be interpreted and stored in a tabular form. Such storage of the records in a tabular form might double the space required, and more generally, requires substantially more space in applications in which there are many optional elements. In some embodiments, each message is parsed according to a specification of the message structure (e.g., according to a “grammar” for the message), and during parsing field values in predefined positions in the structure are extracted and added to an index structure that associates record identifiers with the (position, value) pairs.
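
    The indexing idea in this abstract can be sketched in Python as follows. The sketch assumes a hypothetical pipe-delimited message format in place of a real grammar, and the names INDEXED_POSITIONS and RecordStore are illustrative only. Records are kept in their raw form, and only the values at predefined positions are added to an index keyed by (position, value).

```python
from collections import defaultdict

INDEXED_POSITIONS = {0, 2}   # hypothetical: field positions chosen for indexing


class RecordStore:
    """Keeps raw messages and indexes selected (position, value) pairs."""

    def __init__(self):
        self.records = {}                # record id -> raw message, stored as received
        self.index = defaultdict(set)    # (position, value) -> record ids

    def add(self, record_id, message):
        self.records[record_id] = message
        # "Parsing" here is a plain split; a real grammar would be richer.
        for position, value in enumerate(message.split("|")):
            if position in INDEXED_POSITIONS and value:
                self.index[(position, value)].add(record_id)

    def find(self, position, value):
        ids = self.index.get((position, value), set())
        return [self.records[record_id] for record_id in sorted(ids)]


records = RecordStore()
records.add(1, "ORDER|alice|EUR")
records.add(2, "ORDER|bob|")          # optional third field left empty
records.add(3, "REFUND|alice|EUR")
euro_records = records.find(2, "EUR")
```

    Empty optional fields add nothing to the index, which is where the space saving over a tabular layout would come from in this toy version.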