Patent search ap:("SAMSUNG ELECTRONICS CO., LTD.") AND inv:"Caleb Ryan PHILLIPS"

1.

Patent application
SYSTEM AND METHOD FOR PERSONALIZATION IN INTELLIGENT MULTI-MODAL PERSONAL ASSISTANTS [In force]

Publication No.: US20210264134A1

Publication date: 2021-08-26

Application No.: US17079111

Filing date: 2020-10-23

Applicant: SAMSUNG ELECTRONICS CO., LTD.

Inventor: Caleb Ryan PHILLIPS

IPC: G06K9/00, G10L17/22, G06N20/00

Abstract: A method may include receiving, by a virtual assistant of a user device, an input from a user, the virtual assistant being based on software. The method may include obtaining, by the virtual assistant of the user device and via a sensor of the user device, audio information or video information of the user. The method may include determining, by the virtual assistant of the user device, an identity of the user based on the audio information or the video information of the user and a set of facial embeddings and speech embeddings that is correlated with the user, the set of facial embeddings and speech embeddings being generated using a facial embedding model, a speech embedding model, and a sound source localization model. The method may include performing, by the virtual assistant of the user device, an action based on the input and the identity of the user.
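The identity-determination step in this abstract can be illustrated with a minimal sketch. It assumes embeddings are plain float vectors compared by cosine similarity and that each enrolled user has stored facial and speech embeddings. The names `identify_user`, `enrolled`, and `min_score` are hypothetical, and the facial embedding, speech embedding, and sound source localization models named in the abstract are not modeled here.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_user(
    observed: Vector,
    modality: str,
    enrolled: Dict[str, Dict[str, List[Vector]]],
    min_score: float = 0.8,
) -> Optional[str]:
    """Return the enrolled user whose stored embedding best matches `observed`.

    `modality` is "face" or "speech" (hypothetical keys). Returns None when
    no stored embedding reaches `min_score` (an assumed acceptance threshold).
    """
    best_user: Optional[str] = None
    best_score = min_score
    for user, embeddings in enrolled.items():
        for reference in embeddings.get(modality, []):
            score = cosine_similarity(observed, reference)
            if score >= best_score:
                best_user, best_score = user, score
    return best_user


# Toy enrollment data, purely illustrative.
enrolled = {
    "alice": {"face": [[1.0, 0.0]], "speech": [[0.0, 1.0]]},
    "bob": {"face": [[0.0, 1.0]], "speech": [[1.0, 0.0]]},
}
```

In this sketch the assistant would condition its action on the returned identity and fall back to a non-personalized response when the result is `None`.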

2.

Patent application
METHOD OF LIVE VIDEO EVENT DETECTION BASED ON NATURAL LANGUAGE QUERIES, AND AN APPARATUS FOR THE SAME [In force]

Publication No.: US20220138489A1

Publication date: 2022-05-05

Application No.: US17402877

Filing date: 2021-08-16

Applicant: SAMSUNG ELECTRONICS CO., LTD.

Inventor: Ning YE, Zhiming HU, Caleb Ryan PHILLIPS, Iqbal Ismail MOHOMED

IPC: G06K9/62, G06K9/00, G06F16/732, G06N20/00

Abstract: A method of real-time video event detection includes: obtaining, based on a natural language query, a query vector; performing multimodal feature extraction on a video stream to obtain a video vector; obtaining a similarity score by comparing the query vector to the video vector; comparing the similarity score to a predetermined threshold; and activating, based on the similarity score being above the predetermined threshold, an action trigger. The multimodal feature extraction is performed using a plurality of overlapping windows that include sequential frames of the video stream.
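The pipeline in this abstract (overlapping windows, a video vector per window, a similarity score, a threshold test) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: per-frame features are assumed to be float vectors already, mean pooling stands in for multimodal feature extraction, and cosine similarity stands in for the unspecified comparison. All function names are hypothetical.

```python
import math
from typing import Iterator, List, Sequence

Vector = Sequence[float]


def sliding_windows(frames: Sequence[Vector], size: int, stride: int) -> Iterator[Sequence[Vector]]:
    """Yield windows of sequential frames; stride < size makes them overlap."""
    for start in range(0, len(frames) - size + 1, stride):
        yield frames[start:start + size]


def mean_pool(window: Sequence[Vector]) -> List[float]:
    """Assumed stand-in for feature extraction: average the frame vectors."""
    count = len(window)
    return [sum(column) / count for column in zip(*window)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def detect_events(
    query_vector: Vector,
    frames: Sequence[Vector],
    size: int,
    stride: int,
    threshold: float,
) -> List[int]:
    """Return indices of windows whose similarity score exceeds the threshold.

    Each returned index marks a point where an action trigger would fire.
    """
    triggered = []
    for index, window in enumerate(sliding_windows(frames, size, stride)):
        score = cosine_similarity(query_vector, mean_pool(window))
        if score > threshold:
            triggered.append(index)
    return triggered


# Toy data: four 2-D frame features and a query aligned with the later frames.
frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
hits = detect_events([0.0, 1.0], frames, size=2, stride=1, threshold=0.9)
```

With a window size of 2 and a stride of 1, consecutive windows share one frame, so an event that straddles a window boundary still falls wholly inside some window.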
