-
1.
Publication Number: WO2019133698A1
Publication Date: 2019-07-04
Application Number: PCT/US2018/067654
Application Date: 2018-12-27
Applicant: DMAI, INC.
Inventor: SHUKLA, Nishant
CPC classification number: G10L15/24 , G06K9/00664 , G10L15/22 , G10L25/63 , G10L2015/225 , G10L2015/226
Abstract: The present teaching relates to a method, system, medium, and implementations for enabling communication with a user. Information representing the surroundings of a user engaged in an on-going dialogue is received via the communication platform; the information includes a current response from the user in the on-going dialogue, is acquired from the current scene in which the user is present, and captures characteristics of the user and the scene. Relevant features are extracted from the information. A state of the user is estimated based on the relevant features, and a dialogue context surrounding the current scene is determined based on the same features. Feedback directed to the current response of the user is generated based on the state of the user and the dialogue context.
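As a concrete reading of this pipeline, the sketch below wires the four stages together (feature extraction, state estimation, context determination, feedback generation). All names and the toy rules inside are hypothetical illustrations, not the patented implementation.

```python
# A minimal sketch of the described flow; every name and rule below is a
# hypothetical illustration, not the patented implementation.
from dataclasses import dataclass

@dataclass
class SceneInput:
    user_response: str      # current response in the on-going dialogue
    audio_features: dict    # e.g. pitch/energy summaries from the scene audio
    visual_features: dict   # e.g. facial-expression scores from the scene video

def extract_relevant_features(scene: SceneInput) -> dict:
    # Combine the multimodal cues into a single feature dictionary.
    return {**scene.audio_features, **scene.visual_features}

def estimate_user_state(features: dict) -> str:
    # Toy rule: a high "smile" score counts as an engaged state.
    return "engaged" if features.get("smile", 0.0) > 0.5 else "neutral"

def determine_dialogue_context(features: dict) -> str:
    # Placeholder context rule; a real system would model the scene itself.
    return "tutoring" if features.get("desk_detected") else "casual"

def generate_feedback(state: str, context: str, response: str) -> str:
    # Feedback conditions on both the estimated state and the context.
    if state == "engaged":
        return f"Great, let's build on '{response}'."
    return f"Let me put that another way ({context})."

scene = SceneInput("the earth orbits the sun", {"pitch": 0.4},
                   {"smile": 0.8, "desk_detected": True})
features = extract_relevant_features(scene)
print(generate_feedback(estimate_user_state(features),
                        determine_dialogue_context(features),
                        scene.user_response))
```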
-
2.
Publication Number: WO2019161241A1
Publication Date: 2019-08-22
Application Number: PCT/US2019/018270
Application Date: 2019-02-15
Applicant: DMAI, INC.
Inventor: SHUKLA, Nishant
Abstract: The present teaching relates to a method, system, medium, and implementations for identifying an object of interest. Image data acquired by a camera with respect to a scene are received. One or more users present at the scene are detected from the image data over a period of time. Three-dimensional (3D) gazing rays of the users during that period are estimated. One or more intersections of these 3D gazing rays are identified and used to determine at least one object of interest of the users.
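As a geometric illustration of the gaze-ray intersection idea, the following sketch computes the closest point between two 3D rays; near-coincident closest points across users would mark a candidate object of interest. This is hypothetical helper code using a standard closest-point formula, not the patent's implementation.

```python
# Hypothetical illustration: where two users' gaze rays pass closest to each
# other is a candidate location for a shared object of interest.
import numpy as np

def closest_point_between_rays(o1, d1, o2, d2):
    """Midpoint of the shortest segment between two 3D lines o + t*d."""
    d1, d2 = d1 / np.linalg.norm(d1), d2 / np.linalg.norm(d2)
    n = np.cross(d1, d2)
    denom = np.dot(n, n)
    if denom < 1e-9:                      # rays are (nearly) parallel
        return None
    t1 = np.dot(np.cross(o2 - o1, d2), n) / denom
    t2 = np.dot(np.cross(o2 - o1, d1), n) / denom
    p1, p2 = o1 + t1 * d1, o2 + t2 * d2
    return (p1 + p2) / 2.0

# Two gaze rays converging on the same point in space.
p = closest_point_between_rays(
    np.array([0.0, 0.0, 0.0]), np.array([1.0, 1.0, 0.0]),
    np.array([2.0, 0.0, 0.0]), np.array([-1.0, 1.0, 0.0]))
print(p)   # -> approximately [1. 1. 0.]
```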
-
3.
Publication Number: WO2019161237A1
Publication Date: 2019-08-22
Application Number: PCT/US2019/018264
Application Date: 2019-02-15
Applicant: DMAI, INC.
Inventor: SHUKLA, Nishant; DHARNE, Ashwin
IPC: G06K9/00 , G06K9/20 , H04N21/84 , H04N21/845
Abstract: The present teaching relates to a method, system, medium, and implementations for determining the type of a scene. Image data acquired by a camera with respect to a scene are received, and one or more objects present in the scene are detected from them. The detected objects are recognized based on object recognition models, and the spatial relationships among them are determined from the image data. The recognized objects and their spatial relationships are then used to infer the type of the scene in accordance with at least one scene context-free grammar model.
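One way to picture the scene context-free grammar inference is as matching detected (object, relation, object) triples against per-scene production rules. The sketch below is a toy, flattened version of that idea; the rule set and function names are invented for illustration and are not the patent's grammar formalism.

```python
# Toy illustration: recognized objects plus spatial relations are matched
# against simple production rules to infer a scene type.
SCENE_RULES = {
    # scene type -> required (object, relation, object) triples
    "office":  [("monitor", "on", "desk"), ("chair", "near", "desk")],
    "kitchen": [("pot", "on", "stove")],
}

def infer_scene_type(relations):
    """relations: set of (object, relation, object) triples from detection."""
    for scene, required in SCENE_RULES.items():
        if all(triple in relations for triple in required):
            return scene
    return "unknown"

detected = {("monitor", "on", "desk"), ("chair", "near", "desk"),
            ("lamp", "on", "desk")}
print(infer_scene_type(detected))   # -> "office"
```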
-
4.
Publication Number: WO2019161198A1
Publication Date: 2019-08-22
Application Number: PCT/US2019/018215
Application Date: 2019-02-15
Applicant: DMAI, INC.
Inventor: SHUKLA, Nishant; DHARNE, Ashwin
Abstract: The present teaching relates to a method, system, medium, and implementations for speech recognition. An audio signal is received that represents the speech of a user engaged in a dialogue, along with a visual signal that captures the user uttering the speech. A first speech recognition result is obtained by performing audio-based speech recognition on the audio signal. Based on the visual signal, the user's lip movement is detected, and a second speech recognition result is obtained by performing lip-reading-based speech recognition. The two results are then integrated to generate an integrated speech recognition result.
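A plausible (though not patent-specified) integration rule is confidence-weighted selection, with the audio hypothesis discounted as acoustic noise rises. The sketch below illustrates that idea; the function name, parameters, and numbers are hypothetical.

```python
# Hedged sketch of one possible integration strategy: prefer the lip-reading
# hypothesis as acoustic noise increases. Not the patent's specific rule.
def integrate_recognition(audio_hyp, audio_conf, lip_hyp, lip_conf,
                          noise_level):
    """Pick between the audio and lip-reading hypotheses."""
    # Discount the audio confidence by the estimated acoustic noise level.
    effective_audio = audio_conf * (1.0 - noise_level)
    return audio_hyp if effective_audio >= lip_conf else lip_hyp

print(integrate_recognition("turn on the light", 0.9,
                            "turn on the lamp", 0.6, noise_level=0.5))
# -> "turn on the lamp": audio confidence drops to 0.45 after discounting
```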
-
5.
Publication Number: WO2019133689A1
Publication Date: 2019-07-04
Application Number: PCT/US2018/067641
Application Date: 2018-12-27
Applicant: DMAI, INC.
Inventor: SHUKLA, Nishant
Abstract: The present teaching relates to a method, system, medium, and implementation for activating an animatronic device. Information is obtained about a user for whom an animatronic device is to be configured to carry out a dialogue. The animatronic device includes a head portion and a body portion, where the head portion is configured based on one of a plurality of selectable head portions. One or more preferences of the user are identified from the obtained information and used to select a first head portion from the plurality of selectable head portions. The head portion of the animatronic device is then configured based on this first selected head portion for carrying out the dialogue.
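The selection step can be pictured as scoring each selectable head portion against the user's identified preferences. The following sketch uses an invented tag-overlap score, purely for illustration; the catalogue and names are hypothetical.

```python
# Hypothetical sketch of preference-driven head selection; the patent
# describes the idea, not this particular scoring rule.
HEAD_PORTIONS = [
    {"id": "duck",  "tags": {"animal", "cartoon"}},
    {"id": "robot", "tags": {"machine", "futuristic"}},
    {"id": "bear",  "tags": {"animal", "plush"}},
]

def select_head(preferences: set) -> str:
    """Pick the selectable head whose tags overlap most with the preferences."""
    best = max(HEAD_PORTIONS, key=lambda h: len(h["tags"] & preferences))
    return best["id"]

print(select_head({"animal", "plush"}))   # -> "bear"
```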
-
6.
Publication Number: WO2019161229A1
Publication Date: 2019-08-22
Application Number: PCT/US2019/018253
Application Date: 2019-02-15
Applicant: DMAI, INC.
Inventor: SHUKLA, Nishant
Abstract: The present teaching relates to a method, system, medium, and implementations for understanding a three-dimensional (3D) scene. Image data acquired by a camera at different time instances with respect to the 3D scene are received, where the 3D scene includes a user or one or more objects. The face of the user is detected and tracked across the time instances. For some of the time instances, a 2D user profile representing the region of the image data occupied by the user is generated from the corresponding detected face, and the corresponding 3D space in the scene is estimated based on calibration parameters associated with the camera. The estimated 3D space occupied by the user is used to dynamically update a 3D space-occupancy record of the scene.
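A much-simplified version of the 2D-profile-to-3D-space step: with a pinhole camera model and an assumed average face size, the height of a detected face bounding box yields a depth estimate that can be written into an occupancy record. The constant and function names below are assumptions, not the patent's calibration procedure.

```python
# Simplified back-projection sketch (hypothetical; a real system would use
# full camera calibration rather than a single assumed face size).
AVG_FACE_HEIGHT_M = 0.22      # assumed real-world face height in meters

def face_depth_from_bbox(bbox_height_px, focal_px):
    """Depth (m) at which a face of known size projects to bbox_height_px."""
    return focal_px * AVG_FACE_HEIGHT_M / bbox_height_px

def update_occupancy(record, user_id, bbox, focal_px):
    """Store the user's estimated depth and image region, keyed by user id."""
    x, y, w, h = bbox
    z = face_depth_from_bbox(h, focal_px)
    record[user_id] = {"depth_m": z, "bbox": bbox}
    return record

occupancy = {}
print(update_occupancy(occupancy, "user-1", (310, 120, 80, 110), focal_px=900))
# a 110 px tall face with f = 900 px -> depth of 1.8 m
```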
-
7.
Publication Number: WO2019161196A2
Publication Date: 2019-08-22
Application Number: PCT/US2019/018212
Application Date: 2019-02-15
Applicant: DMAI, INC.
Inventor: SHUKLA, Nishant; DHARNE, Ashwin
IPC: G10L15/25
Abstract: The present teaching relates to a method, system, medium, and implementations for detecting the source of speech sound in a dialogue. A visual signal acquired from a dialogue scene is first received, where the visual signal captures a person present in the scene. The person's lips are detected from the visual signal and tracked to determine whether lip movement is observed. If lip movement is detected, a first candidate source of sound is generated corresponding to the area in the dialogue scene where the lip movement occurred.
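A rough sketch of the lip-movement test: if the mouth-opening measurement varies enough over a window of frames, the tracked lip region is emitted as a candidate sound source. The threshold and names below are hypothetical.

```python
# Hypothetical lip-movement test: enough variation in mouth opening over a
# window of frames marks the lip region as a candidate source of sound.
def lip_is_moving(mouth_openings, threshold=3.0):
    """mouth_openings: per-frame vertical lip distances in pixels."""
    return max(mouth_openings) - min(mouth_openings) > threshold

def candidate_sound_source(lip_region, mouth_openings):
    # Return the scene area around the moving lips, or None if static.
    return lip_region if lip_is_moving(mouth_openings) else None

print(candidate_sound_source((400, 260, 60, 30), [12, 15, 9, 16, 11]))
# -> (400, 260, 60, 30): a 7 px opening range exceeds the 3 px threshold
```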
-
8.
Publication Number: WO2019133684A1
Publication Date: 2019-07-04
Application Number: PCT/US2018/067634
Application Date: 2018-12-27
Applicant: DMAI, INC.
Inventor: RAJAB, Nawar; SHUKLA, Nishant
Abstract: The present teaching relates to a method, system, and medium for cross-network communications. Information related to an application running on a user device is first received; it includes a state of the application and sensor data obtained with respect to a user interacting with the application on the device. A request is sent to an application server for an instruction on a state transition of the application. A lightweight model (LWM) for an object involved in the state transition is received and personalized, based on at least one of the sensor data and one or more preferences of the user, to generate a personalized model (PM) for the object, which is then sent to the user device.
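Schematically, the server-side step overlays user-specific fields on the generic LWM to produce the PM. The dictionary structures and the personalization rule below are invented for illustration.

```python
# Hypothetical sketch of the LWM -> PM personalization step; the field names
# and the overlay rule are assumptions, not the patented scheme.
def personalize(lwm: dict, sensor_data: dict, preferences: dict) -> dict:
    """Produce a personalized model (PM) by overlaying user-specific fields."""
    pm = dict(lwm)                                  # start from the generic LWM
    pm["voice"] = preferences.get("voice", lwm.get("voice", "neutral"))
    pm["pace"] = "slow" if sensor_data.get("confused", False) else "normal"
    return pm

lwm = {"object": "tutor-avatar", "voice": "neutral", "pace": "normal"}
pm = personalize(lwm, sensor_data={"confused": True},
                 preferences={"voice": "friendly"})
print(pm)   # -> the personalized model sent back to the user device
```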
-
9.
Publication Number: WO2019161196A3
Publication Date: 2019-08-22
Application Number: PCT/US2019/018212
Application Date: 2019-02-15
Applicant: DMAI, INC.
Inventor: SHUKLA, Nishant; DHARNE, Ashwin
Abstract: The present teaching relates to a method, system, medium, and implementations for detecting the source of speech sound in a dialogue. A visual signal acquired from a dialogue scene is first received, where the visual signal captures a person present in the scene. The person's lips are detected from the visual signal and tracked to determine whether lip movement is observed. If lip movement is detected, a first candidate source of sound is generated corresponding to the area in the dialogue scene where the lip movement occurred.
-
10.
Publication Number: WO2019161193A3
Publication Date: 2019-08-22
Application Number: PCT/US2019/018209
Application Date: 2019-02-15
Applicant: DMAI, INC.
Inventor: SHUKLA, Nishant
IPC: G06F17/30 , G10L15/26 , G10L15/30 , G10L15/183 , G10L15/197
Abstract: The present teaching relates to a method, system, medium, and implementations for speech recognition in a spoken language. Upon receiving a speech signal representing an utterance of a speaker in one of a plurality of spoken languages, speech recognition is performed on the signal using a plurality of speech recognition models corresponding to those languages, generating a plurality of text strings, each representing a speech recognition result in a corresponding spoken language. For each text string, a likelihood that the utterance is in the corresponding spoken language is computed. The spoken language of the utterance is then determined based on the likelihoods computed for the text strings.
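The described loop, condensed: run every per-language recognizer, score each resulting text string for how likely the utterance is in that language, and pick the argmax. The recognizers and scores below are stand-ins, not real models.

```python
# Condensed sketch of the language-selection loop; the recognizers and
# likelihood scores here are hypothetical stand-ins.
def detect_spoken_language(speech, recognizers, language_scorers):
    """recognizers / language_scorers: per-language callables."""
    best_lang, best_score = None, float("-inf")
    for lang, recognize in recognizers.items():
        text = recognize(speech)               # per-language recognition result
        score = language_scorers[lang](text)   # likelihood the text is genuine
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang

recognizers = {"en": lambda s: "hello there", "fr": lambda s: "allo zer"}
scorers = {"en": lambda t: 0.92, "fr": lambda t: 0.15}
print(detect_spoken_language(b"...", recognizers, scorers))   # -> "en"
```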