-
Publication No.: US20220309774A1
Publication Date: 2022-09-29
Application No.: US17701209
Filing Date: 2022-03-22
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Burak Uzkent, Vasili Ramanishka, Yilin Shen, Hongxia Jin
IPC: G06V10/82, G06V10/764
Abstract: An apparatus for performing image processing may include at least one processor configured to: input an image to a vision transformer comprising a plurality of encoders that correspond to at least one fixed encoder and a plurality of adaptive encoders; process the image via the at least one fixed encoder to obtain image representations; determine one or more layers of the plurality of adaptive encoders to drop, by inputting the image representations to a policy network configured to determine layer dropout actions for the plurality of adaptive encoders; and obtain a class of the input image using remaining layers of the plurality of adaptive encoders other than the dropped one or more layers.
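As an illustration of the flow this abstract describes (not the patented implementation), the following NumPy sketch runs fixed encoder layers, lets a small policy network emit keep/drop decisions for the adaptive layers, and classifies using only the kept layers. All names, dimensions, and the linear stand-ins for encoder blocks are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder_layer(x, w):
    # Stand-in for a transformer encoder block: linear map plus residual
    return np.maximum(x + x @ w, 0.0)

def policy_network(rep, w_policy, threshold=0.5):
    # Map pooled image representations to one keep/drop decision per adaptive layer
    logits = rep.mean(axis=0) @ w_policy
    probs = 1.0 / (1.0 + np.exp(-logits))  # sigmoid
    return probs > threshold               # boolean keep mask (True = keep)

d, n_fixed, n_adaptive, n_classes = 8, 2, 4, 10
tokens = rng.normal(size=(16, d))  # 16 image patches, d-dim each
fixed = [rng.normal(scale=0.1, size=(d, d)) for _ in range(n_fixed)]
adaptive = [rng.normal(scale=0.1, size=(d, d)) for _ in range(n_adaptive)]
w_policy = rng.normal(size=(d, n_adaptive))
w_head = rng.normal(size=(d, n_classes))

# 1) the fixed encoders produce the image representations
x = tokens
for w in fixed:
    x = encoder_layer(x, w)

# 2) the policy network decides which adaptive layers to drop
keep = policy_network(x, w_policy)

# 3) only the remaining (non-dropped) adaptive layers are executed
for w, k in zip(adaptive, keep):
    if k:
        x = encoder_layer(x, w)

# 4) a classification head yields the predicted class
pred = int(np.argmax(x.mean(axis=0) @ w_head))
```

The per-image keep mask is what makes the depth of the network input-dependent: easy images can skip more adaptive layers than hard ones.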
-
Publication No.: US20240080423A1
Publication Date: 2024-03-07
Application No.: US18057126
Filing Date: 2022-11-18
Applicant: Samsung Electronics Co., Ltd.
Inventor: Wenbo Li, Zhipeng Mo, Yi Wei, Burak Uzkent, Qian Lou, Yilin Shen, Hongxia Jin
IPC: H04N9/64
CPC classification number: H04N9/64
Abstract: A method includes obtaining raw image data, where the raw image data includes data values each having most significant bits and least significant bits. The method also includes providing the raw image data to a trained machine learning model and generating processed image data using the trained machine learning model. The method further includes presenting an image based on the processed image data. The trained machine learning model is trained to modulate a feature map associated with the most significant bits of the data values of the raw image data based on the least significant bits of the data values of the raw image data in order to generate a fusion of the most significant bits and the least significant bits of the data values of the raw image data.
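To make the MSB/LSB split and modulation concrete, here is a minimal NumPy sketch. The bit split is standard; the FiLM-style scale-and-shift modulation is only one plausible reading of "modulate a feature map ... based on the least significant bits", and the weights `scale_w`/`shift_w` are hypothetical stand-ins for what a trained model would learn.

```python
import numpy as np

def split_bits(raw, msb_bits=4, total_bits=8):
    # Split each data value into its most and least significant bits
    lsb_bits = total_bits - msb_bits
    msb = raw >> lsb_bits                # most significant bits
    lsb = raw & ((1 << lsb_bits) - 1)    # least significant bits
    return msb, lsb

def modulate(msb_feat, lsb, scale_w, shift_w):
    # Hypothetical modulation: the LSBs predict a per-pixel scale and shift
    # that are applied to the MSB feature map, fusing the two bit ranges
    scale = 1.0 + lsb.astype(float) * scale_w
    shift = lsb.astype(float) * shift_w
    return msb_feat * scale + shift

raw = np.array([[200, 37], [129, 255]], dtype=np.uint16)  # toy raw image data
msb, lsb = split_bits(raw)
fused = modulate(msb.astype(float), lsb, scale_w=0.01, shift_w=0.05)
```

In the patent the modulation weights come from a trained machine learning model rather than fixed constants; the sketch only shows where the LSB information enters the MSB feature path.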
-
Publication No.: US20230245435A1
Publication Date: 2023-08-03
Application No.: US17589535
Filing Date: 2022-01-31
Applicant: Samsung Electronics Co., Ltd.
Inventor: Changsheng Zhao, Burak Uzkent, Yilin Shen, Hongxia Jin
IPC: G06V10/80, G06V10/778, G06V10/774, G06F40/279
CPC classification number: G06V10/811, G06V10/778, G06V10/774, G06F40/279
Abstract: A method includes obtaining a batch of training data including multiple paired image-text pairs and multiple unpaired image-text pairs, where each paired image-text pair and each unpaired image-text pair includes an image and a text. The method also includes training a machine learning model using the training data based on an optimization of a combination of losses. The losses include, for each paired image-text pair, (i) a first multi-modal representation loss based on the paired image-text pair and (ii) a second multi-modal representation loss based on two or more unpaired image-text pairs, selected from among the multiple unpaired image-text pairs, wherein each of the two or more unpaired image-text pairs includes either the image or the text of the paired image-text pair.
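The two-loss structure can be sketched as follows. The cosine-distance losses below are illustrative stand-ins (the actual multi-modal representation losses are not specified in the abstract); the key point the sketch preserves is that each unpaired pair reuses either the image or the text of the anchor paired pair.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def paired_loss(img_emb, txt_emb):
    # First loss: computed on the matched image-text pair itself
    return 1.0 - cosine(img_emb, txt_emb)

def unpaired_loss(unpaired):
    # Second loss: averaged over unpaired pairs built from the anchor pair
    losses = [1.0 - cosine(img, txt) for img, txt in unpaired]
    return sum(losses) / len(losses)

rng = np.random.default_rng(1)
img, txt = rng.normal(size=8), rng.normal(size=8)        # the paired pair
other_txt, other_img = rng.normal(size=8), rng.normal(size=8)

# each unpaired pair shares either the image or the text of the paired pair
unpaired = [(img, other_txt), (other_img, txt)]
total = paired_loss(img, txt) + unpaired_loss(unpaired)
```

Training would minimize a combination of these losses over the whole batch; the sketch evaluates them for a single anchor pair.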
-
Publication No.: US12183062B2
Publication Date: 2024-12-31
Application No.: US17589535
Filing Date: 2022-01-31
Applicant: Samsung Electronics Co., Ltd.
Inventor: Changsheng Zhao, Burak Uzkent, Yilin Shen, Hongxia Jin
IPC: G06V10/80, G06F40/279, G06V10/774, G06V10/778
Abstract: A method includes obtaining a batch of training data including multiple paired image-text pairs and multiple unpaired image-text pairs, where each paired image-text pair and each unpaired image-text pair includes an image and a text. The method also includes training a machine learning model using the training data based on an optimization of a combination of losses. The losses include, for each paired image-text pair, (i) a first multi-modal representation loss based on the paired image-text pair and (ii) a second multi-modal representation loss based on two or more unpaired image-text pairs, selected from among the multiple unpaired image-text pairs, wherein each of the two or more unpaired image-text pairs includes either the image or the text of the paired image-text pair.
-
Publication No.: US20230177338A1
Publication Date: 2023-06-08
Application No.: US18073383
Filing Date: 2022-12-01
Applicant: Samsung Electronics Co., Ltd.
Inventor: Qian Lou, Yen-Chang Hsu, Burak Uzkent, Ting Hua, Yilin Shen, Hongxia Jin
IPC: G06N3/082, G06V10/82, G06V10/772
CPC classification number: G06N3/082, G06V10/82, G06V10/772
Abstract: A method includes obtaining, using a first electronic device, a weight matrix associated with a trained transformer model. The method also includes factorizing the weight matrix into a dictionary weight matrix and an intermediate matrix. The method further includes pruning the intermediate matrix to generate a sparse intermediate matrix. The method also includes fine-tuning the sparse intermediate matrix based on a training dataset to generate a fine-tuned sparse intermediate matrix. The method further includes determining an index matrix and a coefficient matrix based on the fine-tuned sparse intermediate matrix. In addition, the method includes deploying the dictionary weight matrix, the index matrix, and the coefficient matrix to a second electronic device without deploying the weight matrix to the second electronic device. A number of parameters in the dictionary weight matrix, the index matrix, and the coefficient matrix is smaller than a number of parameters in the weight matrix.
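The factorize-prune-encode pipeline can be sketched with NumPy. Truncated SVD is used here as one possible factorization into a dictionary matrix and an intermediate matrix (the patent does not commit to SVD), the fine-tuning step is elided, and the index/coefficient encoding is a simple coordinate format; all shapes and the rank `r` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))   # trained weight matrix on the first device

# 1) factorize W ~= D @ M (here via truncated SVD, one possible choice)
U, s, Vt = np.linalg.svd(W, full_matrices=False)
r = 8
D = U[:, :r] * s[:r]            # dictionary weight matrix (64 x r)
M = Vt[:r]                      # intermediate matrix (r x 32)

# 2) prune: zero out the smaller half of the intermediate matrix's entries
threshold = np.quantile(np.abs(M), 0.5)
M_sparse = np.where(np.abs(M) >= threshold, M, 0.0)

# (fine-tuning M_sparse on a training dataset would happen here)

# 3) encode the sparse matrix as an index matrix and a coefficient matrix
idx = np.nonzero(M_sparse)      # index matrix: positions of nonzero entries
coef = M_sparse[idx]            # coefficient matrix: the nonzero values

# 4) the second device reconstructs the layer without ever receiving W
M_rec = np.zeros_like(M)
M_rec[idx] = coef
W_approx = D @ M_rec            # approximation of W used on-device
```

The deployed parameter count (D plus indices plus coefficients) stays below the size of W, which is the compression claim of the abstract.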
-
Publication No.: US20230073835A1
Publication Date: 2023-03-09
Application No.: US17900126
Filing Date: 2022-08-31
Applicant: Samsung Electronics Co., Ltd.
Inventor: Miao Yin, Burak Uzkent, Yilin Shen, Hongxia Jin
IPC: G06V10/70, G06V10/774, G06V10/776, G06V10/74
Abstract: In one embodiment, a method includes accessing a batch B of a plurality of images, wherein each image in the batch is part of a training set of images used to train a vision transformer comprising a plurality of attention heads. The method further includes determining, for each attention head A, a similarity between (1) the output of the attention head evaluated using each image in the batch and (2) the output of each attention head evaluated using each image in the batch. The method further includes determining, based on the determined similarities, an importance score for each attention head; and pruning, based on the importance scores, one or more attention heads from the vision transformer.
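A minimal sketch of similarity-based head pruning follows. The head outputs are random stand-ins (head 3 is deliberately made nearly identical to head 0), and the specific importance score, one minus average pairwise similarity, is an assumption: the abstract only says importance is derived from the similarities.

```python
import numpy as np

rng = np.random.default_rng(0)
n_heads, batch, dim = 4, 16, 8

# stand-in attention-head outputs over a batch B of images: (head, image, feature)
outputs = rng.normal(size=(n_heads, batch, dim))
outputs[3] = outputs[0] + 0.01 * rng.normal(size=(batch, dim))  # head 3 mimics head 0

# flatten each head's batch of outputs and normalize for cosine similarity
flat = outputs.reshape(n_heads, -1)
flat = flat / np.linalg.norm(flat, axis=1, keepdims=True)
sim = flat @ flat.T  # pairwise similarity between every pair of heads

# a head whose output is similar to the other heads' outputs is redundant;
# score importance as one minus its average similarity to the *other* heads
redundancy = (sim.sum(axis=1) - 1.0) / (n_heads - 1)
importance = 1.0 - redundancy

# prune the least important (most redundant) head
pruned = int(np.argmin(importance))
kept = [h for h in range(n_heads) if h != pruned]
```

Because heads 0 and 3 produce nearly identical outputs, one of them receives the lowest importance score and is pruned, which is the redundancy the method exploits.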
-