-
Publication number: US20230237089A1
Publication date: 2023-07-27
Application number: US18099711
Filing date: 2023-01-20
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Zhiming HU , Lan Xiao , Mele Kemertas , Caleb Ryan Phillips , Iqbal Ismail Mohomed , Afsaneh Fazly
IPC: G06F16/538 , G06F16/2455
CPC classification number: G06F16/538 , G06F16/2455
Abstract: A method for multimodal content retrieval may include: receiving a search query corresponding to a request for content; aggregating word features extracted from the search query based on a first set of learned weights; aggregating region features extracted from each of a plurality of images, based on a second set of learned weights, independently of the word features; computing a similarity score between the aggregated word features and the aggregated region features for each of the plurality of images; selecting candidate images from the plurality of images based on the similarity scores between each of the plurality of images and the search query; and selecting at least one final image from the candidate images as a response to the search query, based on attended similarity scores of the candidate images with respect to the search query.
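Illustrative sketch (not part of the publication): the two-stage flow in the abstract can be approximated as a coarse ranking on independently aggregated features, followed by a re-ranking of the top candidates with a word-to-region attended score. The function names, the softmax weighting, the cosine similarity, and the particular attention formula are all assumptions made for illustration; the abstract does not specify them.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate(features, weights):
    """Weighted sum of feature vectors; weights stand in for learned weights."""
    probs = softmax(weights)
    dim = len(features[0])
    return [sum(p * f[d] for p, f in zip(probs, features)) for d in range(dim)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def attended_similarity(words, regions):
    """Each word attends over the regions; word-level scores are averaged."""
    scores = []
    for w in words:
        sims = [cosine(w, r) for r in regions]
        attn = softmax(sims)
        scores.append(sum(a * s for a, s in zip(attn, sims)))
    return sum(scores) / len(scores)

def retrieve(words, word_weights, images, k=2):
    """images maps a name to (region_features, region_weights)."""
    # Stage 1: query and images are aggregated independently, then compared.
    query_vec = aggregate(words, word_weights)
    coarse = {
        name: cosine(query_vec, aggregate(regions, region_weights))
        for name, (regions, region_weights) in images.items()
    }
    candidates = sorted(coarse, key=coarse.get, reverse=True)[:k]
    # Stage 2: only the candidates get the costlier attended score.
    return max(candidates, key=lambda n: attended_similarity(words, images[n][0]))
```

Because the image-side aggregation does not depend on the query, the stage-1 image vectors could be computed once and cached; only the candidate subset needs the query-dependent attended score.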
-
Publication number: US20230259779A1
Publication date: 2023-08-17
Application number: US17981024
Filing date: 2022-11-04
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Ning YE , Zhiming HU
IPC: G06N3/08 , G06N3/04 , G06N3/063 , G06F16/738
CPC classification number: G06N3/084 , G06N3/0445 , G06N3/063 , G06F16/738
Abstract: An electronic device may obtain a query from a user input; obtain a sequence of frames of one or more input videos; select frames from the sequence of frames of the one or more input videos, via a sampler neural network configured to extract features from the sequence of frames that are input to the sampler neural network, determine temporal dependencies between the extracted features, and determine an action of selecting or skipping for each of the sequence of frames; and identify a video that matches the query via a multimodal neural network configured to receive the selected frames and the query, and output the video that matches the query, among the one or more input videos, wherein the sampler neural network and the multimodal neural network are jointly trained based on an aggregated loss that combines an accuracy loss that represents an accuracy of determining the video that matches the query, and an efficiency loss that reflects a proportion of frames being passed to the multimodal neural network.
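Illustrative sketch (not part of the publication): the aggregated training loss described in the abstract combines an accuracy term with an efficiency term that grows with the proportion of frames passed on to the multimodal network. The linear combination and the `efficiency_weight` parameter below are assumptions; the abstract only states that the two losses are combined.

```python
def aggregated_loss(accuracy_loss, keep_decisions, efficiency_weight=0.5):
    """Combine an accuracy loss with a penalty on the share of frames kept.

    keep_decisions holds one entry per frame: 1 for "select", 0 for "skip".
    """
    proportion_kept = sum(keep_decisions) / len(keep_decisions)
    return accuracy_loss + efficiency_weight * proportion_kept
```

With this form, a sampler that keeps every frame pays the full efficiency penalty, so joint training pushes it to skip frames whenever doing so does not hurt retrieval accuracy.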
-
Publication number: US20250086936A1
Publication date: 2025-03-13
Application number: US18670139
Filing date: 2024-05-21
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Salar HOSSEINI KHORASGANI , Zhiming HU , Iqbal Ismail MOHOMED , Weiming REN
IPC: G06V10/764 , G06T7/11
Abstract: Provided are system, method, and device for determining a classification of an image. According to embodiments, the method may include: determining, by a patch sampler model, selection probabilities of a first plurality of patches included in an image; selecting, by the patch sampler model, a second plurality of patches from among the first plurality of patches of the image based on the selection probabilities; and determining a classification of the image by processing the second plurality of patches through an encoder; wherein the patch sampler model may be trained based on a sampling loss which indicates a difference between the selection probabilities and attention scores of the image obtained via the encoder.
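Illustrative sketch (not part of the publication): the patch sampler in the abstract assigns a selection probability to each patch, keeps a subset, and is trained so that those probabilities track the encoder's attention scores. Top-k selection and the mean squared difference used below are assumptions; the abstract says only that selection is "based on" the probabilities and that the loss "indicates a difference".

```python
def select_patches(selection_probs, k):
    """Return the indices of the k most probable patches, in image order."""
    ranked = sorted(range(len(selection_probs)),
                    key=lambda i: selection_probs[i], reverse=True)
    return sorted(ranked[:k])

def sampling_loss(selection_probs, attention_scores):
    """Mean squared difference between sampler output and encoder attention."""
    n = len(selection_probs)
    return sum((p - a) ** 2
               for p, a in zip(selection_probs, attention_scores)) / n
```

Only the patches returned by `select_patches` would be passed to the encoder, which is where the compute saving comes from.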
-
Publication number: US20240370491A1
Publication date: 2024-11-07
Application number: US18428626
Filing date: 2024-01-31
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Zhiming HU , Ning YE , Iqbal Ismail MOHOMED , Salar HOSSEINI KHORASGANI
IPC: G06F16/74 , G06F16/71 , G06F16/738 , G06N3/0499 , G06V10/776 , G06V10/82 , G06V10/94 , G06V20/40
Abstract: Provided are system, method, and device for performing multimodal video retrieval. According to embodiments, the method may include: obtaining a first plurality of frames of a video; selecting a second plurality of frames from among the first plurality of frames using a frame selection module, wherein a number of the second plurality of frames may be less than a number of the first plurality of frames; determining a representation of the video based on the selected second plurality of frames using a neural network model; and storing the representation of the video in a memory.
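Illustrative sketch (not part of the publication): the indexing flow in the abstract reduces a video to fewer frames, derives one representation from them, and stores it. In the sketch, uniform striding stands in for the frame selection module and mean pooling stands in for the neural network model; both are placeholders chosen for illustration, as are the function names and the dictionary used as the memory.

```python
def select_frames(frames, max_frames):
    """Placeholder selector: keep at most max_frames, evenly spaced."""
    if len(frames) <= max_frames:
        return list(frames)
    step = len(frames) / max_frames
    return [frames[int(i * step)] for i in range(max_frames)]

def video_representation(frame_features):
    """Placeholder model: mean-pool per-frame feature vectors."""
    n = len(frame_features)
    return [sum(col) / n for col in zip(*frame_features)]

def index_video(index, video_id, frame_features, max_frames):
    """Store the representation of the selected frames under video_id."""
    selected = select_frames(frame_features, max_frames)
    index[video_id] = video_representation(selected)
    return index[video_id]
```

Storing the representation ahead of time means a later query only has to be compared against the stored vectors rather than against raw frames.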
-
Publication number: US20220138489A1
Publication date: 2022-05-05
Application number: US17402877
Filing date: 2021-08-16
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Ning YE , Zhiming HU , Caleb Ryan PHILLIPS , Iqbal Ismail MOHOMED
IPC: G06K9/62 , G06K9/00 , G06F16/732 , G06N20/00
Abstract: A method of real-time video event detection includes: obtaining, based on a natural language query, a query vector; performing multimodal feature extraction on a video stream to obtain a video vector; obtaining a similarity score by comparing the query vector to the video vector; comparing the similarity score to a predetermined threshold; and activating, based on the similarity score being above the predetermined threshold, an action trigger. The multimodal feature extraction is performed using a plurality of overlapping windows that include sequential frames of the video stream.
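Illustrative sketch (not part of the publication): the detection loop in the abstract slides overlapping windows over the stream, turns each window into a video vector, and fires a trigger when its similarity to the query vector exceeds a threshold. Mean pooling as the feature extractor, cosine similarity, and all names below are assumptions made for illustration.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def overlapping_windows(frames, size, stride):
    """Yield (start, window); windows overlap whenever stride < size."""
    for start in range(0, len(frames) - size + 1, stride):
        yield start, frames[start:start + size]

def detect_events(frame_features, query_vec, size, stride, threshold):
    """Return the start index of every window that would activate the trigger."""
    triggered = []
    for start, window in overlapping_windows(frame_features, size, stride):
        # Placeholder for multimodal feature extraction: mean-pool the window.
        video_vec = [sum(col) / len(window) for col in zip(*window)]
        if cosine(query_vec, video_vec) > threshold:
            triggered.append(start)
    return triggered
```

Overlapping the windows means an event that straddles a window boundary still falls wholly inside at least one window, at the cost of processing each frame more than once.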