-
Publication (Announcement) No.: US20220300740A1
Publication (Announcement) Date: 2022-09-22
Application No.: US17387889
Application Date: 2021-07-28
Applicant: Samsung Electronics Co., Ltd.
Inventor: Saurabh Sahu, Palash Goyal
Abstract: A method includes obtaining, using at least one processor, audio/video content. The method also includes processing, using the at least one processor, the audio/video content with a trained attention-based machine learning model to classify the audio/video content. Processing the audio/video content includes, using the trained attention-based machine learning model, generating a global representation of the audio/video content based on the audio/video content, generating a local representation of the audio/video content based on different portions of the audio/video content, and combining the global representation of the audio/video content and the local representation of the audio/video content to generate an output representation of the audio/video content. The audio/video content is classified based on the output representation.
-
Publication (Announcement) No.: US11989939B2
Publication (Announcement) Date: 2024-05-21
Application No.: US17387889
Application Date: 2021-07-28
Applicant: Samsung Electronics Co., Ltd.
Inventor: Saurabh Sahu, Palash Goyal
IPC: G06V20/40, G06F18/214
CPC classification number: G06V20/41, G06F18/214
Abstract: A method includes obtaining, using at least one processor, audio/video content. The method also includes processing, using the at least one processor, the audio/video content with a trained attention-based machine learning model to classify the audio/video content. Processing the audio/video content includes, using the trained attention-based machine learning model, generating a global representation of the audio/video content based on the audio/video content, generating a local representation of the audio/video content based on different portions of the audio/video content, and combining the global representation of the audio/video content and the local representation of the audio/video content to generate an output representation of the audio/video content. The audio/video content is classified based on the output representation.
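The abstract above describes combining a global representation with a local representation before classification. A minimal NumPy sketch of that idea follows. It is an illustration under stated assumptions, not the patented implementation: the function names (`attention_pool`, `classify`), the single learned `query` vector, the equal-length segmentation, the mean over segment summaries, and the concatenation used to combine the two representations are all hypothetical choices that the abstract does not specify.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(features, query):
    """Attention-weighted summary of per-frame features.

    features: (T, D) array, one row per frame (assumed input layout).
    query:    (D,) array, a hypothetical learned attention query.
    Returns a (D,) summary vector.
    """
    scores = features @ query / np.sqrt(features.shape[1])  # (T,)
    weights = softmax(scores)                               # (T,), sums to 1
    return weights @ features                               # (D,)

def classify(features, query, w_out, num_segments=4):
    """Combine global and local summaries, then classify.

    Assumption: "different portions" are contiguous time segments, and the
    two representations are combined by concatenation.
    """
    # Global representation: attend over the whole clip.
    global_rep = attention_pool(features, query)
    # Local representation: attend within each segment, then average.
    segments = np.array_split(features, num_segments, axis=0)
    local_rep = np.mean([attention_pool(s, query) for s in segments], axis=0)
    # Output representation: concatenation of the two, shape (2 * D,).
    output_rep = np.concatenate([global_rep, local_rep])
    return softmax(output_rep @ w_out)                      # class probabilities

# Usage with random stand-in data: 16 frames, 8-dim features, 3 classes.
rng = np.random.default_rng(0)
frames = rng.normal(size=(16, 8))
probs = classify(frames, rng.normal(size=8), rng.normal(size=(16, 3)))
```

In a trained model the `query` and `w_out` parameters would be learned; here they are random placeholders so the sketch runs end to end.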
-
Publication (Announcement) No.: US20220245424A1
Publication (Announcement) Date: 2022-08-04
Application No.: US17368683
Application Date: 2021-07-06
Applicant: Samsung Electronics Co., Ltd.
Inventor: Palash Goyal, Saurabh Sahu, Shalini Ghosh, Hyun Chul Lee
Abstract: A method includes accessing video data that includes at least two different modalities. The method also includes using a convolutional neural network layer to incorporate temporal coherence into a machine learning model architecture configured to process the video data. The method further includes learning dependency among the at least two different modalities in an attention space of the machine learning model architecture. In addition, the method includes predicting one or more correlations among the at least two different modalities.
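The steps in this abstract (temporal convolution, cross-modal dependency in an attention space, correlation prediction) can be sketched roughly as below. This is a hedged illustration, not the method claimed in the patent: the function names (`temporal_conv`, `cross_modal_attention`, `correlation_score`), the fixed smoothing kernel, the assumption that both modalities are already projected to a shared feature dimension, and the cosine-similarity correlation score are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def temporal_conv(x, kernel):
    """1D convolution along time, applied to every feature channel.

    x:      (T, D) per-frame features.
    kernel: (K,) weights with odd K (assumption), shared across channels.
    Edge padding keeps the output at (T, D).
    """
    k = len(kernel)
    pad = k // 2
    padded = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    return np.stack([kernel @ padded[t:t + k] for t in range(x.shape[0])])

def cross_modal_attention(audio, video):
    """Each audio frame attends over all video frames.

    audio: (Ta, D), video: (Tv, D), same D assumed.
    Returns the attended video features (Ta, D) and the weights (Ta, Tv).
    """
    scores = audio @ video.T / np.sqrt(audio.shape[1])
    weights = softmax(scores, axis=-1)
    return weights @ video, weights

def correlation_score(audio, attended_video):
    """Cosine similarity of mean-pooled features, a stand-in correlation."""
    a, v = audio.mean(axis=0), attended_video.mean(axis=0)
    return float(a @ v / (np.linalg.norm(a) * np.linalg.norm(v)))

# Usage with random stand-in data: 10 audio frames, 12 video frames, D = 6.
rng = np.random.default_rng(0)
kernel = np.array([0.25, 0.5, 0.25])  # hypothetical smoothing kernel
audio = temporal_conv(rng.normal(size=(10, 6)), kernel)
video = temporal_conv(rng.normal(size=(12, 6)), kernel)
attended, weights = cross_modal_attention(audio, video)
score = correlation_score(audio, attended)
```

In a real model the convolution kernel and any projection matrices would be learned, and the attention weights themselves could serve as the learned cross-modal dependency.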
-