-
Publication No.: US10008209B1
Publication date: 2018-06-26
Application No.: US15273830
Filing date: 2016-09-23
Inventors: Yao Qian, Jidong Tao, David Suendermann-Oeft, Keelan Evanini, Alexei V. Ivanov, Vikram Ramanarayanan
Abstract: Systems and methods are provided for voice authentication of a candidate speaker. Training data sets are accessed, where each training data set comprises data associated with a training speech sample of a speaker and a plurality of speaker metrics, where the plurality of speaker metrics include a native language of the speaker. The training data sets are used to train a neural network, where the data associated with each training speech sample is a training input to the neural network, and each of the plurality of speaker metrics is a training output to the neural network. Data associated with a speech sample is provided to the neural network to generate a vector that contains values for the plurality of speaker metrics, and the values contained in the vector are compared to values contained in a reference vector associated with a known person to determine whether the candidate speaker is the known person.
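The final comparison step in this abstract (candidate vector versus reference vector) can be illustrated with a minimal sketch. The abstract does not say how the vectors are compared; cosine similarity and the 0.8 threshold below are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector carries no direction to compare
    return dot / (norm_a * norm_b)


def is_known_person(candidate_vec, reference_vec, threshold=0.8):
    """Accept the candidate when the vectors are similar enough.

    The threshold is an illustrative value, not taken from the patent.
    """
    return cosine_similarity(candidate_vec, reference_vec) >= threshold
```

In practice the threshold would be tuned on held-out data to balance false accepts against false rejects.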
-
Publication No.: US11861317B1
Publication date: 2024-01-02
Application No.: US17245045
Filing date: 2021-04-30
IPC classification: G06F40/35, G06F40/205, G10L25/30
CPC classification: G06F40/35, G06F40/205, G10L25/30
Abstract: Human-machine dialog is characterized by receiving data comprising a recording of an individual interacting with a dialog application simulating a conversation. Thereafter, the received data is parsed using automated speech recognition to result in text comprising a plurality of words. Features are extracted from the parsed data and then input to an ensemble of different machine learning models, each trained to generate a score characterizing a plurality of different dialog constructs. Thereafter, scores generated by the machine learning models for each of the dialog constructs are fused. A performance score is then generated based on the fused scores which characterizes a conversational proficiency of the individual interacting with the dialog application. Data can then be provided which includes or otherwise characterizes the generated score. Related apparatus, systems, techniques and articles are also described.
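The fusion step described in this abstract can be sketched as follows. The abstract does not specify the fusion rule; a weighted average per dialog construct, followed by a plain mean across constructs, is assumed here purely for illustration. The construct names in the test data are hypothetical.

```python
def fuse_scores(model_scores, weights=None):
    """Fuse per-model scores for each dialog construct.

    model_scores maps a construct name to a list of scores, one per model.
    Weighted averaging is an assumed fusion rule, not the patent's.
    """
    fused = {}
    for construct, scores in model_scores.items():
        w = weights if weights is not None else [1.0] * len(scores)
        if len(w) != len(scores):
            raise ValueError("need exactly one weight per model")
        fused[construct] = sum(s * wi for s, wi in zip(scores, w)) / sum(w)
    return fused


def performance_score(fused):
    """Overall proficiency as the mean of the fused construct scores."""
    if not fused:
        raise ValueError("no fused scores to aggregate")
    return sum(fused.values()) / len(fused)
```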
-
Publication No.: US11222627B1
Publication date: 2022-01-11
Application No.: US16197704
Filing date: 2018-11-21
Inventors: Yao Qian, Rutuja Ubale, Vikram Ramanarayanan, Patrick Lange, David Suendermann-Oeft, Keelan Evanini, Eugene Tsuprun
Abstract: Systems and methods for conducting a simulated conversation with a language learner include determining a first dialog state of the simulated conversation. First audio data corresponding to simulated speech based on the dialog state is transmitted. Second audio data corresponding to a variable-length utterance spoken in response to the simulated speech is received. A fixed-dimension vector is generated based on the variable-length utterance. A semantic label is predicted for the variable-length utterance based on the fixed-dimension vector. A second dialog state of the simulated conversation is determined based on the semantic label, and third audio data corresponding to simulated speech is transmitted based on the second dialog state.
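The pipeline in this abstract (variable-length utterance, fixed-dimension vector, semantic label, next dialog state) can be sketched with simple stand-ins. Mean pooling and a nearest-centroid classifier are assumptions used in place of whatever encoder and classifier the patent actually employs, and the state and label names are hypothetical.

```python
def mean_pool(frames):
    """Collapse a variable-length list of equal-sized frame vectors
    into one fixed-size vector by averaging each dimension."""
    if not frames:
        raise ValueError("utterance has no frames")
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]


def predict_label(vector, centroids):
    """Nearest-centroid stand-in for the semantic label predictor."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(vector, centroids[label]))


# Hypothetical transition table: (current state, label) -> next state.
TRANSITIONS = {
    ("ask_order", "order_coffee"): "confirm_order",
    ("ask_order", "unclear"): "ask_order",
}


def next_state(state, label):
    """Look up the next dialog state; stay put if no rule matches."""
    return TRANSITIONS.get((state, label), state)
```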
-
Publication No.: US10592733B1
Publication date: 2020-03-17
Application No.: US15600206
Filing date: 2017-05-19
Inventors: Vikram Ramanarayanan, David Suendermann-Oeft, Patrick Lange, Alexei V. Ivanov, Keelan Evanini, Yao Qian, Eugene Tsuprun, Hillary R. Molloy
Abstract: Systems and methods are provided for a spoken dialog system. Output is provided from a spoken dialog system that determines audio responses to a person based on recognized speech content from the person during a conversation between the person and the spoken dialog system. Video data associated with the person interacting with the spoken dialog system is received. A video engagement metric is derived from the video data, where the video engagement metric indicates a level of the person's engagement with the spoken dialog system.
-
Publication No.: US10283142B1
Publication date: 2019-05-07
Application No.: US15215649
Filing date: 2016-07-21
Inventors: Zhou Yu, Vikram Ramanarayanan, David Suendermann-Oeft, Xinhao Wang, Klaus Zechner, Lei Chen, Jidong Tao, Yao Qian
Abstract: Systems and methods are provided for a processor-implemented method of analyzing the quality of sound acquired via a microphone. An input metric is extracted from a sound recording at each of a plurality of time intervals. The input metric is provided at each of the time intervals to a neural network that includes a memory component, where the neural network provides an output metric at each of the time intervals, and where the output metric at a particular time interval is based on the input metric at a plurality of time intervals other than the particular time interval, using the memory component of the neural network. The output metric is aggregated from each of the time intervals to generate a score indicative of the quality of the sound acquired via the microphone.
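The idea of per-interval outputs that depend on earlier inputs, followed by aggregation into one score, can be illustrated with a toy recurrence. This is not a trained neural network: the single carried state below is a deliberately simple stand-in for the memory component, and the decay value and mean aggregation are assumptions.

```python
def recurrent_outputs(inputs, decay=0.5):
    """Produce one output per time interval from a running state.

    The state carries information from earlier intervals, so each
    output depends on more than the current input. The decay value
    is illustrative only.
    """
    state = 0.0
    outputs = []
    for x in inputs:
        state = decay * state + (1.0 - decay) * x
        outputs.append(state)
    return outputs


def quality_score(inputs, decay=0.5):
    """Aggregate the per-interval outputs into one score (mean, assumed)."""
    if not inputs:
        raise ValueError("need at least one time interval")
    outputs = recurrent_outputs(inputs, decay)
    return sum(outputs) / len(outputs)
```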
-
Publication No.: US11556754B1
Publication date: 2023-01-17
Application No.: US15452809
Filing date: 2017-03-08
Inventors: Vikram Ramanarayanan, Saad Khan
Abstract: Systems and methods for computer-implemented evaluation of a performance are provided. In a first aspect, a computer-implemented method of evaluating an interaction generates a first temporal record of first behavior features exhibited by a first entity during an interaction between the first entity and a second entity. A second temporal record is generated including second behavior features exhibited by the second entity during the interaction with the first entity. A determination is made that a first feature of the first temporal record is associated with a second feature of the second temporal record. The length of time that passes between the first feature and the second feature is evaluated, and a determination is made that the length of time satisfies a temporal condition. A co-occurrence record associated with the first feature and the second feature is generated and included in a co-occurrence record data structure.
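The co-occurrence logic in this abstract can be sketched as a pairwise scan over two timestamped records. The abstract leaves the association rule and the temporal condition open; here association is a hypothetical set of feature-name pairs, and the condition is assumed to be "the second feature follows the first within `max_gap` seconds".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    feature: str   # behavior feature name, e.g. "smile"
    time: float    # seconds from the start of the interaction


def co_occurrences(first_record, second_record, associated, max_gap):
    """Return (first_feature, second_feature, gap) tuples.

    A pair is kept when its feature names are in `associated` and the
    second event follows the first by at most `max_gap` seconds. Both
    rules are illustrative assumptions.
    """
    records = []
    for a in first_record:
        for b in second_record:
            if (a.feature, b.feature) not in associated:
                continue
            gap = b.time - a.time
            if 0 <= gap <= max_gap:
                records.append((a.feature, b.feature, gap))
    return records
```

The returned list plays the role of the co-occurrence record data structure; a real system would likely index it by feature pair for later scoring.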
-
Publication No.: US11238844B1
Publication date: 2022-02-01
Application No.: US16255220
Filing date: 2019-01-23
Abstract: Systems and methods for identifying a person's native language and/or non-native language based on code-switched text and/or speech are presented. The systems may be trained using various methods. For example, a language identification system may be trained using one or more code-switched corpora. Text and/or speech features may be extracted from the corpora and used, in combination with a per-word language identity of the text and/or speech, to train at least one machine learner. Code-switched text and/or speech may be received and processed by extracting text and/or speech features. These features may be fed into the at least one machine learner to identify the person's native language.
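As a rough illustration of turning per-word language identities into features for a machine learner, the sketch below computes the share of words in each language and the rate of language switches. These particular features are assumptions chosen for illustration; the abstract does not list the features the system extracts.

```python
def code_switch_features(word_langs):
    """Build simple features from per-word language tags.

    word_langs is a list such as ["en", "en", "es", "en"]. Returns the
    fraction of words per language and the fraction of adjacent word
    pairs where the language changes.
    """
    if not word_langs:
        raise ValueError("need at least one tagged word")
    counts = {}
    for lang in word_langs:
        counts[lang] = counts.get(lang, 0) + 1
    switches = sum(1 for a, b in zip(word_langs, word_langs[1:]) if a != b)
    features = {f"frac_{lang}": n / len(word_langs) for lang, n in counts.items()}
    # max(..., 1) avoids dividing by zero for a one-word input
    features["switch_rate"] = switches / max(len(word_langs) - 1, 1)
    return features
```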
-
Publication No.: US11132913B1
Publication date: 2021-09-28
Application No.: US15133775
Filing date: 2016-04-20
Inventors: Vikram Ramanarayanan, Mark Katz, Eric Steinhauer, Ravindran Ramaswamy, David Suendermann-Oeft
Abstract: Systems and methods are provided for acquiring physical-world data indicative of interactions of a subject with an avatar for evaluation. An interactive avatar is provided for interaction with the subject. Speech from the subject to the avatar is captured, and automatic speech recognition is performed to determine content of the subject speech. Motion data from the subject interacting with the avatar is captured. A next action of the interactive avatar is determined based on the content of the subject speech or the motion data. The next action of the avatar is implemented, and a score for the subject is determined based on the content of the subject speech and the motion data.
-
Publication No.: US10607504B1
Publication date: 2020-03-31
Application No.: US15272903
Filing date: 2016-09-22
Inventors: Vikram Ramanarayanan, David Suendermann-Oeft, Patrick Lange, Alexei V. Ivanov, Keelan Evanini, Yao Qian, Zhou Yu
Abstract: Systems and methods are provided for implementing an educational dialog system. An initial task model is accessed that identifies a plurality of dialog states associated with a task, a language model configured to identify a response meaning associated with a received response, and a language understanding model configured to select a next dialog state based on the identified response meaning. The task is provided to a plurality of persons for training. The task model is updated by revising the language model and the language understanding model based on responses received to prompts of the provided task, and the updated task is provided to a student for development of speaking capabilities.
-
Publication No.: US10176365B1
Publication date: 2019-01-08
Application No.: US15133640
Filing date: 2016-04-20
Abstract: Computer-implemented systems and methods for evaluating a performance are provided. Motion of a user in a performance is detected using a motion capture device. Data collected by the motion capture device is processed with a processing system to identify occurrences of first and second types of actions by the user. The data collected by the motion capture device is processed with the processing system to determine values indicative of amounts of time between the occurrences. A non-verbal feature of the performance is determined based on the identified occurrences and the values. A score for the performance is generated using the processing system by applying a computer scoring model to the non-verbal feature.
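The timing step in this abstract can be sketched as follows: find each occurrence of a first action type, measure the time until the next occurrence of a second type, summarize the gaps as one non-verbal feature, and score it. The mean-gap feature, the linear scoring model, its coefficients, and the action names are all assumptions for illustration.

```python
def gaps_between(events, first_type, second_type):
    """For each first_type event, time until the next second_type event.

    events is a list of (time_seconds, action_type) sorted by time.
    """
    gaps = []
    for i, (t, kind) in enumerate(events):
        if kind != first_type:
            continue
        for t2, kind2 in events[i + 1:]:
            if kind2 == second_type:
                gaps.append(t2 - t)
                break
    return gaps


def nonverbal_feature(gaps):
    """Mean gap as a single non-verbal feature (an assumed summary)."""
    return sum(gaps) / len(gaps) if gaps else 0.0


def score(feature, weight=-0.5, bias=5.0):
    """Toy linear scoring model; coefficients are illustrative only."""
    return bias + weight * feature
```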