Materialized table refresh using multiple processing pipelines

    Publication No.: US12216654B2

    Publication Date: 2025-02-04

    Application No.: US18362898

    Filing Date: 2023-07-31

    Applicant: Snowflake Inc.

    Abstract: A system for a materialized table (MT) refresh using multiple processing pipelines includes at least one hardware processor coupled to memory storing instructions. The instructions cause the at least one hardware processor to perform operations including determining dependencies among a plurality of intermediate MTs generated from a source MT. The source MT uses a table definition with a query on one or more base tables and a lag duration value. A graph snapshot of dependencies among the plurality of intermediate MTs is generated. Processing pipelines are configured. Each of the processing pipelines corresponds to a subset of the plurality of intermediate MTs indicated by the graph snapshot. Responsive to detecting an instruction for a refresh operation on the source MT, refreshes on corresponding intermediate MTs of the plurality of intermediate MTs in each processing pipeline of the processing pipelines are performed to complete the refresh operation on the source MT.
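
As a rough illustration of the flow in the abstract above, the following sketch snapshots a dependency graph of intermediate MTs, groups them into pipelines, and refreshes each pipeline in dependency order. The abstract does not say how MTs are assigned to pipelines; grouping by connected component is an assumption made here, and the names `build_pipelines`, `refresh_source`, and `refresh_one` are hypothetical.

```python
from graphlib import TopologicalSorter


def build_pipelines(deps):
    """Group intermediate MTs into pipelines, each in refresh order.

    deps maps an MT name to the set of MT names it reads from.
    Assumption: one pipeline per connected group of MTs.
    """
    # Freeze the graph so later changes to deps do not affect this refresh.
    snapshot = {mt: frozenset(parents) for mt, parents in deps.items()}

    # Undirected adjacency, used only to find connected groups.
    neighbors = {mt: set() for mt in snapshot}
    for mt, parents in snapshot.items():
        for parent in parents:
            neighbors.setdefault(parent, set()).add(mt)
            neighbors[mt].add(parent)

    seen, pipelines = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        # Within a pipeline, every MT comes after the MTs it depends on.
        sub = {mt: snapshot.get(mt, frozenset()) for mt in sorted(component)}
        pipelines.append(list(TopologicalSorter(sub).static_order()))
    return pipelines


def refresh_source(deps, refresh_one):
    """Run refresh_one on every intermediate MT, pipeline by pipeline."""
    for pipeline in build_pipelines(deps):
        for mt in pipeline:
            refresh_one(mt)
```

The pipelines are run one after another here for simplicity; since they share no MTs under this grouping, they could in principle be dispatched concurrently.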

    METHODS AND SYSTEMS FOR FILE GENERATION AND STORAGE

    Publication No.: US20250028681A1

    Publication Date: 2025-01-23

    Application No.: US18400265

    Filing Date: 2023-12-29

    Applicant: Inkit, Inc.

    Inventor: Michael McCarthy

    Abstract: The present disclosure generally relates to methods, systems, apparatuses, and non-transitory computer readable media for managing file generation and retention. These systems may be used across a wide range of businesses to significantly automate the generation and storage of files. By utilizing nested file templates as described herein, systems of the present disclosure may increase the ease with which a file template may be generated, stored, retrieved, and retained. Moreover, by using the file retention storage system described below, systems of the present disclosure may simplify the process of ensuring files are retained for an appropriate amount of time and further assuring that files are deleted when they no longer need to be retained.

    System and method for entity disambiguation for customer relationship management

    Publication No.: US12169508B2

    Publication Date: 2024-12-17

    Application No.: US17481866

    Filing Date: 2021-09-22

    Abstract: Systems and methods for disambiguating company profiles are disclosed. The system builds a database of candidate companies with timed attributes. The system further ingests timed company metadata using a PostgreSQL database. The system disambiguates location and geocoding by matching text patterns and cross-referencing one or more identified location components against one or more geocode databases. It also classifies and disambiguates a company name component from a company name associated with the company using a conditional random field (CRF) model. Further, the system disambiguates employee attributes using a Latent Dirichlet Allocation (LDA) topic model algorithm and trains a tree model for pre-selection of candidate companies for the company. The system trains a similarity model for comparison of the candidate companies and, in response to a determination that two given candidate companies are the same, merges the company profiles associated with the given candidate companies.

    End-to-end topology stitching and representation

    Publication No.: US12131164B2

    Publication Date: 2024-10-29

    Application No.: US17486888

    Filing Date: 2021-09-27

    CPC classification number: G06F9/44505 G06F7/14

    Abstract: End-to-end topology stitching and representation is described. An example includes instructions for receiving, at a server, a set of configuration data for an infrastructure stack, the set of configuration data including configuration data for each of a plurality of domains of the infrastructure stack; parsing the received set of configuration data; stitching together an end-to-end topology for the plurality of domains of the infrastructure stack based at least in part on the parsed set of configuration data; and generating a representation of the end-to-end topology of the infrastructure stack.
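
The parse, stitch, and represent steps in the abstract above might look roughly like the following sketch. The abstract gives no configuration format, so the per-domain layout used here (a list of components, each with an `id` and an optional `uses` list) is purely an assumption, and `stitch_topology` and `render` are hypothetical names.

```python
def stitch_topology(config_by_domain):
    """Merge per-domain component lists into one node map and edge list.

    Assumed input shape: {domain: [{"id": str, "uses": [str, ...]}, ...]}.
    """
    # Parse: index every component by id and remember its domain.
    nodes = {}
    for domain, components in config_by_domain.items():
        for comp in components:
            nodes[comp["id"]] = {
                "domain": domain,
                "uses": list(comp.get("uses", [])),
            }

    # Stitch: link components across domains through their "uses" references.
    edges = []
    for comp_id, info in nodes.items():
        for target in info["uses"]:
            if target in nodes:  # skip references no domain reported
                edges.append((comp_id, target))
    return nodes, edges


def render(nodes, edges):
    """Produce a simple text representation, one line per edge."""
    return [
        f"{nodes[a]['domain']}:{a} -> {nodes[b]['domain']}:{b}"
        for a, b in edges
    ]
```

For a compute domain containing `vm1` that uses `vol1`, and a storage domain containing `vol1`, `render` would yield a line linking `compute:vm1` to `storage:vol1`.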

    METHOD AND SYSTEM FOR GENERATING MACHINE LEARNING TRAINING DATA STREAMS USING UNSTRUCTURED DATA

    Publication No.: US20240330751A1

    Publication Date: 2024-10-03

    Application No.: US18193801

    Filing Date: 2023-03-31

    CPC classification number: G06N20/00 G06F7/14 G06F16/38

    Abstract: Techniques described herein relate to a method for managing training data. The method includes making a determination that a first stream request is associated with unstructured data; in response to the determination: obtaining a manifest associated with the unstructured data based on the first stream request, wherein the manifest comprises metadata; loading the unstructured data into a cache using the manifest; merging the metadata with the unstructured data to generate training data, wherein the training data comprises a plurality of training data examples; generating augmented training data using the training data and the stream specification; generating a mini-batch sequence using the augmented training data and the stream specification; creating a mini-batch sequence queue and a stream endpoint; and streaming the mini-batch sequence using the mini-batch sequence queue and the stream endpoint, wherein the mini-batch sequence is used by a training environment to train a machine learning model.
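
The merge, augment, batch, and stream steps listed in the abstract above can be sketched as follows. This is a toy illustration under assumptions, not the claimed method: the manifest and stream specification layouts, the repeat-each-example augmentation, and all function names are hypothetical, and a plain in-process `queue.Queue` read by a generator stands in for the mini-batch sequence queue and stream endpoint.

```python
import queue


def build_training_data(manifest, cache):
    """Merge each manifest entry's metadata with its cached payload."""
    return [
        {"payload": cache[entry["key"]], **entry["metadata"]}
        for entry in manifest
    ]


def augment(examples, spec):
    """Toy augmentation (assumption): emit spec['copies'] copies per example."""
    return [dict(ex, copy=i) for ex in examples for i in range(spec["copies"])]


def to_minibatches(examples, spec):
    """Split examples into consecutive batches of spec['batch_size']."""
    size = spec["batch_size"]
    return [examples[i:i + size] for i in range(0, len(examples), size)]


def stream(batches):
    """Push batches through a queue and yield them to the consumer."""
    q = queue.Queue()
    for batch in batches:
        q.put(batch)
    q.put(None)  # sentinel marking the end of the sequence
    while True:
        batch = q.get()
        if batch is None:
            return
        yield batch
```

A training loop would iterate over `stream(...)` and feed each mini-batch to the model. In a real deployment the producer and consumer would typically run in separate threads or processes.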
