METHOD AND APPARATUS FOR CLASSIFYING IMAGES USING AN ARTIFICIAL INTELLIGENCE MODEL

    Publication (Announcement) No.: US20220309774A1

    Publication (Announcement) Date: 2022-09-29

    Application No.: US17701209

    Application Date: 2022-03-22

    Abstract: An apparatus for performing image processing may include at least one processor configured to: input an image to a vision transformer comprising a plurality of encoders that include at least one fixed encoder and a plurality of adaptive encoders; process the image via the at least one fixed encoder to obtain image representations; determine one or more layers of the plurality of adaptive encoders to drop, by inputting the image representations to a policy network configured to determine layer dropout actions for the plurality of adaptive encoders; and obtain a class of the input image using the remaining layers of the plurality of adaptive encoders other than the dropped one or more layers.
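
    A minimal PyTorch sketch of the mechanism this abstract describes: the fixed encoders always run, a policy network maps their pooled output to a keep/drop decision for each adaptive layer, and only the kept layers are applied before classification. All names and hyperparameters (PolicyNetwork, AdaptiveViTClassifier, embed_dim, and so on) are illustrative assumptions, and the dropout decision is applied batch-wise here rather than per image for brevity.

    import torch
    import torch.nn as nn

    class PolicyNetwork(nn.Module):
        """Maps pooled image representations to keep/drop decisions per adaptive layer."""
        def __init__(self, embed_dim: int, num_adaptive: int):
            super().__init__()
            self.head = nn.Linear(embed_dim, num_adaptive)

        def forward(self, image_repr: torch.Tensor) -> torch.Tensor:
            # image_repr: (batch, embed_dim); returns one boolean keep-flag per adaptive layer
            return torch.sigmoid(self.head(image_repr)) > 0.5

    class AdaptiveViTClassifier(nn.Module):
        def __init__(self, embed_dim=192, num_heads=3, num_fixed=2, num_adaptive=10, num_classes=1000):
            super().__init__()
            def make_layer():
                return nn.TransformerEncoderLayer(embed_dim, num_heads, batch_first=True)
            self.fixed = nn.ModuleList([make_layer() for _ in range(num_fixed)])        # always executed
            self.adaptive = nn.ModuleList([make_layer() for _ in range(num_adaptive)])  # dropout candidates
            self.policy = PolicyNetwork(embed_dim, num_adaptive)
            self.classifier = nn.Linear(embed_dim, num_classes)

        def forward(self, tokens: torch.Tensor) -> torch.Tensor:
            # tokens: (batch, seq_len, embed_dim) patch embeddings of the input image
            x = tokens
            for layer in self.fixed:
                x = layer(x)                           # image representations from the fixed encoders
            keep = self.policy(x.mean(dim=1))          # layer dropout actions
            for i, layer in enumerate(self.adaptive):
                if bool(keep[:, i].any()):             # batch-wise decision (simplification)
                    x = layer(x)                       # only the remaining (non-dropped) layers run
            return self.classifier(x.mean(dim=1))      # class logits for the input image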

    System and method for supervised contrastive learning for multi-modal tasks

    Publication (Announcement) No.: US12183062B2

    Publication (Announcement) Date: 2024-12-31

    Application No.: US17589535

    Application Date: 2022-01-31

    Abstract: A method includes obtaining a batch of training data including multiple paired image-text pairs and multiple unpaired image-text pairs, where each paired image-text pair and each unpaired image-text pair includes an image and a text. The method also includes training a machine learning model using the training data based on an optimization of a combination of losses. The losses include, for each paired image-text pair, (i) a first multi-modal representation loss based on the paired image-text pair and (ii) a second multi-modal representation loss based on two or more unpaired image-text pairs, selected from among the multiple unpaired image-text pairs, wherein each of the two or more unpaired image-text pairs includes either the image or the text of the paired image-text pair.
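
    A minimal sketch of how such a combined loss could be assembled, assuming a symmetric InfoNCE loss as a stand-in for the unspecified "multi-modal representation loss", assuming image_encoder and text_encoder return fixed-size embeddings for pre-processed image and text tensors, and assuming the unpaired pairs that reuse the image or the text of each paired pair have been grouped beforehand. All function names are hypothetical.

    import torch
    import torch.nn.functional as F

    def info_nce(img_emb: torch.Tensor, txt_emb: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
        # Symmetric InfoNCE over a batch of (image, text) embedding rows.
        img_emb = F.normalize(img_emb, dim=-1)
        txt_emb = F.normalize(txt_emb, dim=-1)
        logits = img_emb @ txt_emb.t() / tau                         # (B, B) similarity matrix
        targets = torch.arange(img_emb.size(0), device=img_emb.device)
        return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

    def combined_loss(paired, unpaired_groups, image_encoder, text_encoder):
        # paired: list of (image, text) tensors that truly match.
        # unpaired_groups: one list per paired pair, each holding two or more unpaired
        # (image, text) pairs that include either the image or the text of that paired pair.
        imgs = torch.stack([p[0] for p in paired])
        txts = torch.stack([p[1] for p in paired])
        loss_paired = info_nce(image_encoder(imgs), text_encoder(txts))      # first loss (i)

        loss_unpaired = 0.0
        for group in unpaired_groups:                                        # second loss (ii)
            g_imgs = torch.stack([u[0] for u in group])
            g_txts = torch.stack([u[1] for u in group])
            loss_unpaired = loss_unpaired + info_nce(image_encoder(g_imgs), text_encoder(g_txts))
        return loss_paired + loss_unpaired / max(len(unpaired_groups), 1)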

    SMALL AND FAST TRANSFORMER MODEL FOR MULTI-MODAL OR OTHER TASKS

    Publication (Announcement) No.: US20230177338A1

    Publication (Announcement) Date: 2023-06-08

    Application No.: US18073383

    Application Date: 2022-12-01

    CPC classification number: G06N3/082 G06V10/82 G06V10/772

    Abstract: A method includes obtaining, using a first electronic device, a weight matrix associated with a trained transformer model. The method also includes factorizing the weight matrix into a dictionary weight matrix and an intermediate matrix. The method further includes pruning the intermediate matrix to generate a sparse intermediate matrix. The method also includes fine-tuning the sparse intermediate matrix based on a training dataset to generate a fine-tuned sparse intermediate matrix. The method further includes determining an index matrix and a coefficient matrix based on the fine-tuned sparse intermediate matrix. In addition, the method includes deploying the dictionary weight matrix, the index matrix, and the coefficient matrix to a second electronic device without deploying the weight matrix to the second electronic device. A number of parameters in the dictionary weight matrix, the index matrix, and the coefficient matrix is smaller than a number of parameters in the weight matrix.
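
    A minimal sketch of this compression pipeline, using truncated SVD as one possible way to factorize W into a dictionary matrix and an intermediate matrix (the abstract does not fix the factorization method) and per-column magnitude pruning to obtain the index and coefficient matrices; the fine-tuning of the sparse intermediate matrix is noted but omitted. Shapes and function names are assumptions.

    import torch

    def factorize_and_compress(W: torch.Tensor, rank: int, keep_per_column: int):
        # W: (out_features, in_features) weight matrix from a trained transformer model.
        # Returns the dictionary matrix D, an index matrix, and a coefficient matrix such
        # that W is approximated by D @ S, where each column of S keeps only
        # `keep_per_column` non-zero coefficients.
        U, sigma, Vh = torch.linalg.svd(W, full_matrices=False)
        D = U[:, :rank] * sigma[:rank]                           # (out, rank) dictionary weight matrix
        S = Vh[:rank, :]                                         # (rank, in) intermediate matrix

        # Prune S: keep the largest-magnitude entries in each column (sparse intermediate matrix).
        idx = S.abs().topk(keep_per_column, dim=0).indices       # (keep, in) index matrix
        coeff = torch.gather(S, 0, idx)                          # (keep, in) coefficient matrix
        # A full pipeline would fine-tune the sparse S on a training dataset before this step.
        return D, idx, coeff

    def reconstruct(D, idx, coeff, in_features):
        # On the second device, rebuild the approximated weights from the deployed parts only.
        S_sparse = torch.zeros(D.shape[1], in_features)
        S_sparse.scatter_(0, idx, coeff)
        return D @ S_sparse                                      # approximation of the original W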

    Structured Pruning of Vision Transformer

    Publication (Announcement) No.: US20230073835A1

    Publication (Announcement) Date: 2023-03-09

    Application No.: US17900126

    Application Date: 2022-08-31

    Abstract: In one embodiment, a method includes accessing a batch B of a plurality of images, wherein each image in the batch is part of a training set of images used to train a vision transformer comprising a plurality of attention heads. The method further includes determining, for each attention head A, a similarity between (1) the output of the attention head evaluated using each image in the batch and (2) the output of each attention head evaluated using each image in the batch. The method further includes determining, based on the determined similarities, an importance score for each attention head; and pruning, based on the importance scores, one or more attention heads from the vision transformer.
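
    A minimal sketch of one plausible similarity-based importance score, assuming cosine similarity between flattened head outputs over the batch and treating heads whose outputs closely mirror the other heads as redundant; the exact score definition is an assumption of this sketch, not taken from the patent.

    import torch
    import torch.nn.functional as F

    def head_importance(head_outputs: torch.Tensor) -> torch.Tensor:
        # head_outputs: (num_heads, batch, dim) -- each head's output for every image in batch B.
        H = F.normalize(head_outputs.flatten(start_dim=1), dim=-1)    # (num_heads, batch * dim)
        sim = H @ H.t()                                               # pairwise head-to-head similarities
        off_diag = sim - torch.eye(sim.size(0))
        # A head that is highly similar to the others adds little that is new, so its
        # importance is the negative of its mean similarity to the remaining heads.
        return -off_diag.sum(dim=1) / (sim.size(0) - 1)

    def select_heads_to_prune(head_outputs: torch.Tensor, num_to_prune: int):
        order = head_importance(head_outputs).argsort()               # ascending importance
        return order[:num_to_prune].tolist()                          # indices of heads to prune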
