Patent search: ap:("SAMSUNG ELECTRONICS CO., LTD.") AND inv:"Zhiming HU"

1.

Invention publication
METHOD OF PROCESSING MULTIMODAL RETRIEVAL TASKS, AND AN APPARATUS FOR THE SAME (Pending, published)

Publication No.: US20230237089A1

Publication date: 2023-07-27

Application No.: US18099711

Filing date: 2023-01-20

Applicant: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Zhiming HU , Lan Xiao , Mele Kemertas , Caleb Ryan Phillips , Igbal Ismail Mohomed , Afsaneh Fazly

IPC: G06F16/538 , G06F16/2455

CPC classification number: G06F16/538 , G06F16/2455

Abstract: A method for multimodal content retrieval may include: receiving a search query corresponding to a request for content; aggregating word features extracted from the search query based on a first set of learned weights; aggregating region features extracted from each of a plurality of images, based on a second set of learned weights, independently of the word features; computing a similarity score between the aggregated word features and the aggregated region features for each of the plurality of images; selecting candidate images from the plurality of images based on the similarity scores between each of the plurality of images and the search query; and selecting at least one final image from the candidate images as a response to the search query, based on attended similarity scores of the candidate images with respect to the search query.
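The two-stage retrieval in this abstract (a cheap, independent word/region comparison to shortlist candidates, then a finer attended score to pick the final image) can be illustrated with a minimal sketch. The softmax-normalized weight logits, cosine similarity, and word-to-region attention used for the "attended similarity" are all assumptions for illustration; the abstract does not specify these forms, and the helper names are hypothetical.

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of logits
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0

def aggregate(features, weight_logits):
    # weighted sum of feature vectors; the learned weights are assumed to be softmax-normalized
    w = softmax(weight_logits)
    dim = len(features[0])
    return [sum(w[i] * features[i][d] for i in range(len(features))) for d in range(dim)]

def attended_similarity(words, regions):
    # assumed form: each word attends over the regions, then is compared with its attended context
    total = 0.0
    for wv in words:
        attn = softmax([dot(wv, r) for r in regions])
        ctx = [sum(a * r[d] for a, r in zip(attn, regions)) for d in range(len(wv))]
        total += cosine(wv, ctx)
    return total / len(words)

def retrieve(words, word_logits, images, k=2):
    # images: list of (region_features, region_weight_logits); returns the index of the final image
    q = aggregate(words, word_logits)  # query side is aggregated independently of any image
    coarse = [cosine(q, aggregate(regs, logits)) for regs, logits in images]
    candidates = sorted(range(len(images)), key=lambda i: coarse[i], reverse=True)[:k]
    return max(candidates, key=lambda i: attended_similarity(words, images[i][0]))
```

Because the coarse stage never mixes word and region features, image-side aggregates could be precomputed offline; only the k shortlisted images pay for the more expensive attended comparison.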

2.

Invention publication
METHOD OF PROCESSING MULTIMODAL TASKS, AND AN APPARATUS FOR THE SAME (Pending, published)

Publication No.: US20230259779A1

Publication date: 2023-08-17

Application No.: US17981024

Filing date: 2022-11-04

Applicant: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Ning YE , Zhiming HU

IPC: G06N3/08 , G06N3/04 , G06N3/063 , G06F16/738

CPC classification number: G06N3/084 , G06N3/0445 , G06N3/063 , G06F16/738

Abstract: An electronic device may obtain a query from a user input; obtain a sequence of frames of one or more input videos; select frames from the sequence of frames of the one or more input videos, via a sampler neural network configured to extract features from the sequence of frames that are input to the sampler neural network, determine temporal dependencies between the extracted features, and determine an action of selecting or skipping for each of the sequence of frames; and identify a video that matches the query via a multimodal neural network configured to receive the selected frames and the query, and output the video that matches the query, among the one or more input videos, wherein the sampler neural network and the multimodal neural network are jointly trained based on an aggregated loss that combines an accuracy loss that represents an accuracy of determining the video that matches the query, and an efficiency loss that reflects a proportion of frames being passed to the multimodal neural network.
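The aggregated loss described above trades matching accuracy against the share of frames forwarded to the multimodal network. A minimal sketch of how such a loss could be evaluated is below; the cross-entropy accuracy term, the fixed 0.5 select threshold, and the weighting factor `lam` are assumptions, and the sampler's feature extraction and temporal modeling are not shown. The sketch only computes the loss; it does not show how a non-differentiable select/skip decision would be trained.

```python
import math

def sample_actions(select_probs, threshold=0.5):
    # hypothetical sampler output: per-frame probability of "select"; 1 = select, 0 = skip
    return [1 if p >= threshold else 0 for p in select_probs]

def accuracy_loss(prob_of_matching_video):
    # assumed cross-entropy on the probability assigned to the video that matches the query
    return -math.log(max(prob_of_matching_video, 1e-12))

def efficiency_loss(actions):
    # proportion of frames passed to the multimodal network
    return sum(actions) / len(actions)

def aggregated_loss(prob_of_matching_video, actions, lam=0.5):
    # combined objective; lam (hypothetical) sets how strongly frame usage is penalized
    return accuracy_loss(prob_of_matching_video) + lam * efficiency_loss(actions)
```

For example, passing two of four frames with a perfect match probability gives 0 + 0.5 * 0.5 = 0.25; raising `lam` pushes the jointly trained sampler toward skipping more frames.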

3.

Invention application
IMAGE AND VIDEO CLASSIFICATION (In force)

Publication No.: US20250086936A1

Publication date: 2025-03-13

Application No.: US18670139

Filing date: 2024-05-21

Applicant: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Salar HOSSEINI KHORASGANI , Zhiming HU , Iqbal Ismail MOHAMED , Weiming REN

IPC: G06V10/764 , G06T7/11

Abstract: Provided are system, method, and device for determining a classification of an image. According to embodiments, the method may include: determining, by a patch sampler model, selection probabilities of a first plurality of patches included in an image; selecting, by the patch sampler model, a second plurality of patches from among the first plurality of patches of the image based on the selection probabilities; and determining a classification of the image by processing the second plurality of patches through an encoder; wherein the patch sampler model may be trained based on a sampling loss which indicates a difference between the selection probabilities and attention scores of the image obtained via the encoder.
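The patch-sampling step and its training signal can be sketched as follows: keep the highest-probability patches, and penalize the sampler when its probabilities diverge from the encoder's attention scores. Top-k selection and mean squared error are assumptions (the abstract says only "based on the selection probabilities" and "a difference"), and the encoder that produces the classification is not shown.

```python
def select_patches(select_probs, k):
    # indices of the k most probable patches, returned in their original spatial order
    top = sorted(range(len(select_probs)), key=lambda i: select_probs[i], reverse=True)[:k]
    return sorted(top)

def sampling_loss(select_probs, attention_scores):
    # assumed form: mean squared difference between sampler probabilities and encoder attention
    pairs = list(zip(select_probs, attention_scores))
    return sum((p - a) ** 2 for p, a in pairs) / len(pairs)
```

Using the encoder's attention as the target lets the sampler learn which patches the encoder would have relied on anyway, so dropping the rest should cost little accuracy.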

4.

Invention application
SYSTEM, METHOD, AND COMPUTER PROGRAM FOR MULTIMODAL VIDEO RETRIEVAL (In force)

Publication No.: US20240370491A1

Publication date: 2024-11-07

Application No.: US18428626

Filing date: 2024-01-31

Applicant: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Zhiming HU , Ning YE , Iqbal Ismail MOHOMED , Salar HOSSEINI KHORASGANI

IPC: G06F16/74 , G06F16/71 , G06F16/738 , G06N3/0499 , G06V10/776 , G06V10/82 , G06V10/94 , G06V20/40

Abstract: Provided are system, method, and device for performing multimodal video retrieval. According to embodiments, the method may include: obtaining a first plurality of frames of a video; selecting a second plurality of frames from among the first plurality of frames using a frame selection module, wherein a number of the second plurality of frames may be less than a number of the first plurality of frames; determining a representation of the video based on the selected second plurality of frames using a neural network model; and storing the representation of the video in a memory.
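The indexing flow in this abstract (select a subset of frames, build one video representation from them, store it) can be illustrated under loud assumptions. Here a simple change-detection rule stands in for the frame selection module (keep a frame only if it is farther than `threshold` from the last kept frame), mean pooling stands in for the neural network model, and a plain dict stands in for memory. None of these choices come from the patent, and the function names are hypothetical.

```python
import math

def select_distinct_frames(frames, threshold):
    # stand-in selection module: keep the first frame, then any frame that differs
    # from the last kept frame by more than `threshold` (Euclidean distance)
    kept = [frames[0]]
    for f in frames[1:]:
        if math.dist(f, kept[-1]) > threshold:
            kept.append(f)
    return kept

def video_representation(frames):
    # stand-in for the neural network model: mean-pool the selected frame features
    n = len(frames)
    return [sum(f[d] for f in frames) / n for d in range(len(frames[0]))]

def index_video(store, video_id, frames, threshold=0.5):
    # select, represent, and store; returns how many frames were actually used
    selected = select_distinct_frames(frames, threshold)
    store[video_id] = video_representation(selected)
    return len(selected)
```

With near-duplicate frames, fewer frames reach the model than were captured, which matches the abstract's condition that the second plurality is smaller than the first; a real selection module would need to guarantee this even for rapidly changing video.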

5.

Invention application
METHOD OF LIVE VIDEO EVENT DETECTION BASED ON NATURAL LANGUAGE QUERIES, AND AN APPARATUS FOR THE SAME (In force)

Publication No.: US20220138489A1

Publication date: 2022-05-05

Application No.: US17402877

Filing date: 2021-08-16

Applicant: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Ning YE , Zhiming HU , Caleb Ryan PHILLIPS , Iqbal Ismail MOHOMED

IPC: G06K9/62 , G06K9/00 , G06F16/732 , G06N20/00

Abstract: A method of real-time video event detection includes: obtaining, based on a natural language query, a query vector; performing multimodal feature extraction on a video stream to obtain a video vector; obtaining a similarity score by comparing the query vector to the video vector; comparing the similarity score to a predetermined threshold; and activating, based on the similarity score being above the predetermined threshold, an action trigger. The multimodal feature extraction is performed using a plurality of overlapping windows that include sequential frames of the video stream.
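The overlapping-window scheme and threshold trigger can be sketched as below. Averaging per-frame feature vectors stands in for the multimodal feature extraction, and cosine similarity is assumed as the comparison; the window size, stride, and threshold values are illustrative, not from the patent. A stride smaller than the window size is what makes consecutive windows overlap, so an event that straddles one window boundary is still seen whole by a neighboring window.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def overlapping_windows(frames, size, stride):
    # yields (start_index, window); stride < size makes consecutive windows share frames
    for start in range(0, len(frames) - size + 1, stride):
        yield start, frames[start:start + size]

def detect_events(query_vec, frames, size=4, stride=2, threshold=0.8):
    # returns the start index of every window whose score exceeds the threshold
    triggered = []
    for start, window in overlapping_windows(frames, size, stride):
        # stand-in for multimodal feature extraction: mean of the window's frame vectors
        video_vec = [sum(f[d] for f in window) / len(window) for d in range(len(query_vec))]
        if cosine(query_vec, video_vec) > threshold:
            triggered.append(start)  # the action trigger would be activated here
    return triggered
```

For a stream of four frames resembling `[0, 1]` followed by four resembling `[1, 0]` and the query `[1, 0]`, only the window starting at frame 4 exceeds 0.8; the mixed window at frame 2 scores about 0.71.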
