-
Publication number: US20210264134A1
Publication date: 2021-08-26
Application number: US17079111
Application date: 2020-10-23
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Caleb Ryan PHILLIPS
Abstract: A method may include receiving, by a virtual assistant of a user device, an input from a user, the virtual assistant being based on software. The method may include obtaining, by the virtual assistant of the user device and via a sensor of the user device, audio information or video information of the user. The method may include determining, by the virtual assistant of the user device, an identity of the user based on the audio information or the video information of the user and a set of facial embeddings and speech embeddings that is correlated with the user, the set of facial embeddings and speech embeddings being generated using a facial embedding model, a speech embedding model, and a sound source localization model. The method may include performing, by the virtual assistant of the user device, an action based on the input and the identity of the user.
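
The identity-determination step described in this abstract can be illustrated with a minimal sketch. It assumes that an embedding has already been computed from the captured audio or video, and that each user is enrolled with a set of facial and speech embeddings. The function names, the enrollment layout, the cosine-similarity matching, and the 0.7 threshold are hypothetical choices made for illustration; the abstract does not specify them, and the facial embedding, speech embedding, and sound source localization models are not shown.

```python
import math
from typing import Dict, List, Optional

Embedding = List[float]
# Hypothetical enrollment layout: user -> modality ("face" or "speech") -> embeddings
Enrollment = Dict[str, Dict[str, List[Embedding]]]


def cosine_similarity(a: Embedding, b: Embedding) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_user(embedding: Embedding, modality: str,
                  enrolled: Enrollment, threshold: float = 0.7) -> Optional[str]:
    """Return the enrolled user whose stored embedding best matches, or None.

    The threshold is an assumed value, not one given in the abstract.
    """
    best_user, best_score = None, threshold
    for user, sets in enrolled.items():
        for reference in sets.get(modality, []):
            score = cosine_similarity(embedding, reference)
            if score >= best_score:
                best_user, best_score = user, score
    return best_user


def perform_action(user_input: str, user: Optional[str]) -> str:
    """Toy action: the response depends on both the input and the identity."""
    if user is None:
        return "Unrecognized user; cannot handle '" + user_input + "'"
    return "Handling '" + user_input + "' for " + user


enrolled = {
    "alice": {"face": [[1.0, 0.0, 0.0]], "speech": [[0.0, 1.0, 0.0]]},
    "bob": {"face": [[0.0, 0.0, 1.0]], "speech": [[0.0, 0.0, 1.0]]},
}
user = identify_user([0.9, 0.1, 0.0], "face", enrolled)
print(perform_action("play my playlist", user))
```

Matching against either modality reflects the abstract's "audio information or video information"; a real system might instead fuse both scores.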
-
Publication number: US20220138489A1
Publication date: 2022-05-05
Application number: US17402877
Application date: 2021-08-16
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Ning YE, Zhiming HU, Caleb Ryan PHILLIPS, Iqbal Ismail MOHOMED
IPC: G06K9/62, G06K9/00, G06F16/732, G06N20/00
Abstract: A method of real-time video event detection includes: obtaining, based on a natural language query, a query vector; performing multimodal feature extraction on a video stream to obtain a video vector; obtaining a similarity score by comparing the query vector to the video vector; comparing the similarity score to a predetermined threshold; and activating, based on the similarity score being above the predetermined threshold, an action trigger. The multimodal feature extraction is performed using a plurality of overlapping windows that include sequential frames of the video stream.
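
The pipeline in this abstract can be sketched roughly as follows. The sketch assumes per-frame feature vectors and a query vector are already available, and it substitutes mean pooling for the multimodal feature extraction and cosine similarity for the comparison; neither choice is stated in the abstract. The function names, window size, stride, and threshold are hypothetical.

```python
import math
from typing import Iterator, List, Sequence

Vector = List[float]


def overlapping_windows(frames: Sequence[Vector], size: int,
                        stride: int) -> Iterator[Sequence[Vector]]:
    """Yield windows of `size` sequential frames; stride < size makes them overlap."""
    for start in range(0, len(frames) - size + 1, stride):
        yield frames[start:start + size]


def mean_pool(window: Sequence[Vector]) -> Vector:
    """Stand-in for multimodal feature extraction: average the frame vectors."""
    return [sum(column) / len(window) for column in zip(*window)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def detect_events(query_vector: Vector, frame_features: Sequence[Vector],
                  size: int = 4, stride: int = 2,
                  threshold: float = 0.8) -> List[int]:
    """Return the start frame of every window whose score exceeds the threshold."""
    triggered = []
    windows = overlapping_windows(frame_features, size, stride)
    for index, window in enumerate(windows):
        video_vector = mean_pool(window)
        score = cosine_similarity(query_vector, video_vector)
        if score > threshold:  # the action trigger would be activated here
            triggered.append(index * stride)
    return triggered


# Four frames of one scene followed by four frames matching the query.
frames = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
print(detect_events([0.0, 1.0], frames))
```

With a stride smaller than the window size, an event that straddles a window boundary still falls wholly inside some later window, which is one plausible reason for using overlapping windows.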
-