Determining visual overlap of images by using box embeddings

    Publication No.: US11836965B2

    Publication Date: 2023-12-05

    Application No.: US17398443

    Application Date: 2021-08-10

    Applicant: Niantic, Inc.

    CPC classification number: G06V10/751 G06F18/214 G06N3/088 G06V10/421 G06V10/50

    Abstract: An image matching system for determining visual overlaps between images by using box embeddings is described herein. The system receives two images depicting a 3D surface with different camera poses. The system inputs the images (or a crop of each image) into a machine learning model that outputs a box encoding for the first image and a box encoding for the second image. A box encoding includes parameters defining a box in an embedding space. Then the system determines an asymmetric overlap factor that measures asymmetric surface overlaps between the first image and the second image based on the box encodings. The asymmetric overlap factor includes an enclosure factor indicating how much surface from the first image is visible in the second image and a concentration factor indicating how much surface from the second image is visible in the first image.
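The asymmetric overlap factor described above can be sketched concretely. In this minimal, hypothetical illustration (function names, the axis-aligned box parameterisation, and the volume normalisation are assumptions, not the patent's implementation), each box encoding is a min/max corner pair in the embedding space; the enclosure factor is the intersection volume over the first box's volume, and the concentration factor is the intersection volume over the second box's volume:

```python
import numpy as np

def box_volume(lo, hi):
    """Volume of an axis-aligned box given its min and max corners."""
    return float(np.prod(np.maximum(hi - lo, 0.0)))

def asymmetric_overlap(box_a, box_b):
    """Return (enclosure, concentration) factors for two box encodings.

    Each box is a (lo, hi) pair of equal-length arrays.
      enclosure     = vol(A ∩ B) / vol(A)
      concentration = vol(A ∩ B) / vol(B)
    """
    (lo_a, hi_a), (lo_b, hi_b) = box_a, box_b
    inter = box_volume(np.maximum(lo_a, lo_b), np.minimum(hi_a, hi_b))
    return inter / box_volume(lo_a, hi_a), inter / box_volume(lo_b, hi_b)

# Box A is half-contained in the larger box B along one axis:
a = (np.array([0.0, 0.0]), np.array([1.0, 1.0]))
b = (np.array([0.5, 0.0]), np.array([2.0, 1.0]))
enc, con = asymmetric_overlap(a, b)
# enc = 0.5 (half of A lies inside B); con = 0.5 / 1.5 ≈ 0.333
```

The asymmetry is the point: the two factors differ whenever one image sees a superset of the other's surface.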

    Generating realistic counterfactuals with residual generative adversarial nets

    Publication No.: US11836633B2

    Publication Date: 2023-12-05

    Application No.: US17469339

    Application Date: 2021-09-08

    Applicant: Vettery, Inc.

    CPC classification number: G06N3/088 G06N3/045

    Abstract: Techniques for generating counterfactuals in connection with machine learning models. The techniques include applying a trained machine learning model to an input to obtain a first outcome; determining whether the first outcome has a value in a set of one or more target values; when it is determined that the first outcome does not have a value in the set of one or more target values, generating a counterfactual input at least in part by applying a trained neural network model to the input to obtain a corresponding output, the corresponding output indicating changes to be made to one or more values of one or more attributes of the input to obtain the counterfactual input, and generating feedback based on the counterfactual input.
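The control flow in the abstract can be sketched as a short loop. The classifier, the residual generator, and the attribute names below are illustrative stand-ins, not the patented networks; the sketch only shows the decision structure (apply model, test against target values, generate residual changes, return feedback):

```python
import numpy as np

def classify(x):
    """Stand-in for the trained ML model: outcome 1 if the score crosses a threshold."""
    return int(x.sum() > 1.0)

def residual_generator(x):
    """Stand-in for the trained residual network: proposes per-attribute
    changes (residuals) that nudge the input toward the target outcome."""
    return np.maximum(1.1 - x.sum(), 0.0) / len(x) * np.ones_like(x)

def counterfactual_feedback(x, target_values={1}):
    """Apply the model; if the outcome misses the target set, generate a
    counterfactual input and feedback describing the attribute changes."""
    if classify(x) in target_values:
        return x, None  # already in the target set; no counterfactual needed
    delta = residual_generator(x)          # changes to attribute values
    x_cf = x + delta                       # counterfactual input
    feedback = {f"attr_{i}": float(d) for i, d in enumerate(delta) if d != 0}
    return x_cf, feedback

x = np.array([0.2, 0.3])
x_cf, fb = counterfactual_feedback(x)
# classify(x) == 0, so a counterfactual with classify(x_cf) == 1 is produced
```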

    Data Pruning Tool and Related Aspects
    Invention Publication

    Publication No.: US20230385601A1

    Publication Date: 2023-11-30

    Application No.: US18031406

    Application Date: 2020-10-12

    CPC classification number: G06N3/04 G06F11/3696 G06N3/088

    Abstract: A method and related aspects are disclosed for determining one or more regions of interest in a multi-dimensional data set comprising a plurality of parameter sets, each parameter set comprising a parameter set identifier, a plurality of dimensions of selection conditions for assessing a configurable physical entity, and an indication of an assessed characteristic of the configurable physical entity. The method comprises at least mapping, using a self-organising map, SOM, model which uses competitive group learning, the multi-dimensional data set onto an edge-connected surface mesh of neurons; identifying at least one cluster of neurons on the surface mesh based on a category of the assessed characteristic; identifying a set of ranges of boundary values for the selection conditions for each cluster, each range of boundary values comprising a maximum and a minimum weight value of the weights representing that selection condition of the neurons in that cluster; and determining one or more regions of interest which associate the boundary values of the selection conditions of each cluster with one or more test case identifiers for the test cases represented by the neurons in that cluster. The method may be implemented in some embodiments as a data pruning tool.
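The per-cluster boundary ranges in the method above amount to taking, for each selection-condition dimension, the minimum and maximum neuron weight within the cluster. A minimal sketch under assumed data shapes (the SOM training and cluster identification are omitted; `weights` and `cluster_ids` are hypothetical inputs):

```python
import numpy as np

def cluster_boundary_ranges(weights, cluster_ids):
    """weights: (n_neurons, n_conditions) SOM weight vectors.
    cluster_ids: (n_neurons,) cluster label per neuron.
    Returns {cluster: (min_per_condition, max_per_condition)} — the range
    of boundary values for each selection condition in each cluster."""
    ranges = {}
    for c in np.unique(cluster_ids):
        w = weights[cluster_ids == c]
        ranges[c] = (w.min(axis=0), w.max(axis=0))
    return ranges

# Three neurons, two selection conditions, two clusters:
weights = np.array([[0.1, 2.0],
                    [0.3, 1.5],
                    [0.9, 0.2]])
cluster_ids = np.array([0, 0, 1])
r = cluster_boundary_ranges(weights, cluster_ids)
# cluster 0 spans [0.1, 0.3] on condition 0 and [1.5, 2.0] on condition 1
```

A region of interest would then pair each cluster's ranges with the identifiers of the parameter sets mapped to that cluster's neurons.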

    ONE MODEL UNIFYING STREAMING AND NON-STREAMING SPEECH RECOGNITION

    Publication No.: US20230368779A1

    Publication Date: 2023-11-16

    Application No.: US18357225

    Application Date: 2023-07-24

    Applicant: Google LLC

    Abstract: A transformer-transducer model for unifying streaming and non-streaming speech recognition includes an audio encoder, a label encoder, and a joint network. The audio encoder receives a sequence of acoustic frames, and generates, at each of a plurality of time steps, a higher order feature representation for a corresponding acoustic frame. The label encoder receives a sequence of non-blank symbols output by a final softmax layer, and generates, at each of the plurality of time steps, a dense representation. The joint network receives the higher order feature representation and the dense representation at each of the plurality of time steps, and generates a probability distribution over possible speech recognition hypotheses. The audio encoder of the model further includes a neural network having an initial stack of transformer layers trained with zero look-ahead audio context, and a final stack of transformer layers trained with a variable look-ahead audio context.
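The joint network's role, combining the audio encoder's higher-order feature with the label encoder's dense representation into a distribution over output symbols, can be sketched as below. The dimensions and the project-combine-project form are assumptions in the spirit of standard transducer models, not Google's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
D_AUDIO, D_LABEL, D_JOINT, VOCAB = 8, 6, 16, 10  # illustrative sizes

# Hypothetical joint-network parameters (random for the sketch).
W_a = rng.normal(size=(D_AUDIO, D_JOINT))
W_l = rng.normal(size=(D_LABEL, D_JOINT))
W_out = rng.normal(size=(D_JOINT, VOCAB))

def joint_network(audio_feat, label_repr):
    """Combine one time step's encoder outputs and return a probability
    distribution over possible output symbols (blank + vocabulary)."""
    h = np.tanh(audio_feat @ W_a + label_repr @ W_l)   # joint representation
    logits = h @ W_out
    exp = np.exp(logits - logits.max())                # stable softmax
    return exp / exp.sum()

p = joint_network(rng.normal(size=D_AUDIO), rng.normal(size=D_LABEL))
# p is a valid distribution: non-negative entries summing to 1
```

In the patented model the same joint network serves both modes; only the audio encoder's look-ahead context (zero for streaming, variable for non-streaming) differs between the two stacks of transformer layers.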
