Abstract:
A method to present a summary of a transcription may include obtaining, at a first device, audio directed to the first device from a second device during a communication session between the two devices. The method may include sending the audio from the first device to a transcription system and obtaining, at the first device and during the communication session, a transcription that the transcription system generates from the audio. The method may also include obtaining, at the first device, a summary of the transcription during the communication session, and presenting both the summary and the transcription simultaneously on a display during the communication session.
Abstract:
The present disclosure relates to a system, a method, and a product for an artificial intelligence based virtual agent trainer. The system includes a processor in communication with a memory storing instructions. When executed, the instructions cause the processor to obtain input data and generate a preliminary set of utterances based on the input data, process the preliminary set of utterances to generate a set of utterance training data, generate a set of conversations based on the set of utterance training data, simulate the set of conversations on a virtual agent to obtain a conversation result, verify an intent and a response based on the conversation result, verify a use case flow and flow hops based on the conversation result, and generate recommendation information and a maturity report based on the verification results.
Abstract:
Systems and methods are provided for monitoring communications channels and automatically providing selective notifications, through a network, that messages containing useful information, transmitted in the form of voice content, have been received. Keywords are compared with textual data transcribed from voice messages received on a channel. Upon identifying a correlation between the textual data and the keywords, a notification is automatically generated that indicates receipt of a given message, the existence of the correlation with the keywords, and an identity of the channel, so that client terminals can receive the message and also receive subsequent or related messages.
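The comparison step described above can be illustrated with a minimal Python sketch. It assumes that "correlation" means a whole-word, case-insensitive keyword match, which the abstract does not specify; the `Notification` class and the function names are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Notification:
    """Hypothetical notification record: channel identity plus what matched."""
    channel: str
    matched_keywords: List[str]
    transcript: str


def match_keywords(transcript: str, keywords: List[str]) -> List[str]:
    # Assumption: a "correlation" is a whole-word, case-insensitive match.
    words = set(re.findall(r"[a-z0-9']+", transcript.lower()))
    return [k for k in keywords if k.lower() in words]


def notify_if_relevant(channel: str, transcript: str,
                       keywords: List[str]) -> Optional[Notification]:
    """Return a notification only when the transcribed text matches a keyword."""
    matched = match_keywords(transcript, keywords)
    if not matched:
        return None
    return Notification(channel, matched, transcript)
```

A caller would pass each transcribed voice message through `notify_if_relevant` and forward any non-`None` result to subscribed client terminals.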
Abstract:
Aspects relate to computer-implemented methods, systems, and processes to automatically generate audio-based display indicia of media content. These include receiving, by a processor, a plurality of media content categories including at least one feature; receiving a plurality of categorized speech recognition algorithms, each associated with a respective one or more of the media content categories; determining a media content category of a current media content based on at least one feature of the current media content; selecting one speech recognition algorithm from the plurality based on the determined category; and applying the selected speech recognition algorithm to the current media content.
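The category-based selection can be sketched in Python as a lookup from category to recognizer. Everything named here is an assumption for illustration: the `genre` feature, the category names, and the placeholder recognizers, which return fixed strings rather than running real speech recognition.

```python
from typing import Callable, Dict

Recognizer = Callable[[bytes], str]


# Placeholder recognizers (hypothetical); a real system would call
# category-specific speech recognition models here.
def recognize_news(audio: bytes) -> str:
    return "[news-model transcript]"


def recognize_sports(audio: bytes) -> str:
    return "[sports-model transcript]"


def recognize_general(audio: bytes) -> str:
    return "[general-model transcript]"


RECOGNIZERS: Dict[str, Recognizer] = {
    "news": recognize_news,
    "sports": recognize_sports,
    "general": recognize_general,
}


def determine_category(features: dict) -> str:
    # Assumption: the media content carries a 'genre' feature.
    return features.get("genre", "general")


def select_recognizer(features: dict,
                      recognizers: Dict[str, Recognizer]) -> Recognizer:
    """Pick the recognizer for the content's category, else the general one."""
    return recognizers.get(determine_category(features), recognizers["general"])


def caption(audio: bytes, features: dict) -> str:
    """Apply the selected recognizer to the current media content."""
    return select_recognizer(features, RECOGNIZERS)(audio)
```

Falling back to a general recognizer for unknown categories is a design choice of this sketch, not something the abstract states.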
Abstract:
Provided are a terminal and a server of a speaker-adaptation speech-recognition system and a method for operating the system. The terminal includes: a speech recorder, which transmits speech data of a speaker to a speech-recognition server; a statistical variable accumulator, which receives from the speech-recognition server a statistical variable including acoustic statistical information about the speaker's speech, obtained when the server recognizes the transmitted speech data, and accumulates the received statistical variable; a conversion parameter generator, which generates a conversion parameter for the speaker's speech from the accumulated statistical variable and transmits it to the speech-recognition server; and a result-displaying user interface, which receives and displays the result data that the speech-recognition server produces when it recognizes the speaker's speech data using the transmitted conversion parameter.
Abstract:
Systems and methods of rendering a textual animation are provided. The methods include receiving an audio sample of an audio signal that is being rendered by a media rendering source. The methods also include receiving one or more descriptors for the audio signal based on at least one of a semantic vector, an audio vector, and an emotion vector. Based on the one or more descriptors, a client device may render the textual transcriptions of vocal elements of the audio signal in an animated manner. The client device may further render the textual transcriptions of the vocal elements substantially in synchrony with the audio signal being rendered by the media rendering source. In addition, the client device may receive an identification of a song corresponding to the audio sample, and may render lyrics of the song in an animated manner.
Abstract:
A voice-activated signal generator is a device to produce output signals responsive to spoken commands. The device accepts only predetermined commands and responsively generates specific output signals such as a pulse, a series of pulses, a voltage level, or a periodic waveform. The device is suitable for triggering an oscilloscope, or controlling a circuit under test, or activating another instrument. The invention also enables safely controlling a hazardous system such as a high voltage system, hands-free and with precise timing determined by the user. Also disclosed are fast, compact, robust algorithms for analyzing spoken commands, and particularly for detecting voiced and unvoiced sound, and for identifying commands by comparing the order of sound intervals in the spoken command to templates that represent the predetermined commands. The device may have one output or multiple outputs in parallel, all controlled by voice commands with precision output timing.
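The template comparison described above can be sketched in Python as follows. The sketch assumes that an upstream detector has already labeled each audio frame as voiced (`V`), unvoiced (`U`), or silent (`S`); the template strings, the `min_frames` threshold, and all function names are hypothetical and do not come from the disclosure.

```python
from itertools import groupby
from typing import Dict, Iterable, Optional

# Hypothetical templates: each command is the expected order of voiced (V)
# and unvoiced (U) sound intervals. Illustrative values only.
TEMPLATES: Dict[str, str] = {
    "stop": "UVU",
    "go": "V",
    "reset": "VUVU",
}


def interval_order(frame_labels: Iterable[str], min_frames: int = 2) -> str:
    """Collapse per-frame labels into the order of sound intervals.

    Silence and runs shorter than min_frames are dropped as noise, then
    neighbouring intervals with the same label are merged.
    """
    runs = [(label, len(list(group))) for label, group in groupby(frame_labels)]
    kept = [label for label, n in runs if label != "S" and n >= min_frames]
    return "".join(label for label, _ in groupby(kept))


def identify_command(frame_labels: Iterable[str],
                     templates: Dict[str, str] = TEMPLATES) -> Optional[str]:
    """Return the command whose template equals the observed interval order."""
    order = interval_order(frame_labels)
    for command, pattern in templates.items():
        if pattern == order:
            return command
    return None  # not a predetermined command: generate no output signal
```

Because only the order of intervals is compared, not their durations, the match is insensitive to speaking rate, at the cost of treating any two commands with the same interval order as identical.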
Abstract:
The cognitive load of a user in human-machine interaction is estimated by identifying an expression of cognitive load within a user expression captured by a dialogue system and using a user model to estimate a level of the cognitive load based on that expression.
Abstract:
An augmented reality (AR) device, such as AR glasses, may include a microphone array. The sensitivity of the microphone array can be directed to a target by beamforming, which includes combining the audio of each microphone of the array in a particular way based on a location of the target. The present disclosure describes systems and methods to determine the location of the target based on a gaze of a user and beamform the audio accordingly. This eye-tracked beamforming (i.e., foveated beamforming) can be used by AR applications to enhance sounds from a gaze direction and to suppress sounds from other directions. Additionally, the gaze information can be used to help visualize the results of an AR application, such as speech-to-text.
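One common way to combine microphone audio based on a target location is delay-and-sum beamforming, sketched below in Python. The abstract does not say which beamforming method is used, so this is an assumed stand-in. The sketch further assumes a linear array along the x axis, a far-field source, integer sample delays, and a gaze azimuth measured from broadside (positive toward +x) that is used directly as the steering angle.

```python
import math
from typing import List, Sequence

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def steering_delays(mic_x: Sequence[float], azimuth_rad: float,
                    sample_rate: float) -> List[int]:
    """Per-microphone delays (in samples) that align a plane wave arriving
    from azimuth_rad on a linear array with microphones at positions mic_x."""
    # A source toward +x reaches microphones with larger x earlier.
    arrival = [-x * math.sin(azimuth_rad) / SPEED_OF_SOUND for x in mic_x]
    latest = max(arrival)
    # Delay every channel so that it lines up with the latest arrival.
    return [round((latest - t) * sample_rate) for t in arrival]


def delay_and_sum(channels: Sequence[Sequence[float]],
                  delays: Sequence[int]) -> List[float]:
    """Shift each channel by its delay and average the aligned channels."""
    n = len(channels[0])
    out = [0.0] * n
    for signal, d in zip(channels, delays):
        for i in range(d, n):
            out[i] += signal[i - d]
    return [v / len(channels) for v in out]


def beamform_toward_gaze(channels: Sequence[Sequence[float]],
                         mic_x: Sequence[float], gaze_azimuth_rad: float,
                         sample_rate: float) -> List[float]:
    """Steer the array toward the direction the user is looking."""
    delays = steering_delays(mic_x, gaze_azimuth_rad, sample_rate)
    return delay_and_sum(channels, delays)
```

Sound from the gaze direction adds coherently after alignment, while sound from other directions is misaligned and partially averaged out.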
Abstract:
A system and method for concurrent multi-path processing of audio signals for automatic speech recognition is presented. Audio information defining a set of audio signals may be obtained (502). The audio signals may convey mixed audio content produced by multiple audio sources. A set of source-specific audio signals may be determined by demixing the mixed audio content produced by the multiple audio sources. Determining the set of source-specific audio signals may comprise providing the set of audio signals to both a first signal processing path and a second signal processing path (504). The first signal processing path may determine a value of a demixing parameter for demixing the mixed audio content (506). The second signal processing path may apply the value of the demixing parameter to the individual audio signals of the set of audio signals (508) to generate the individual source-specific audio signals (510).
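The division of labor between the two paths can be illustrated with a simplified Python sketch for two sources and two microphones. It runs the paths one after the other rather than concurrently, and the first path simply inverts a known 2x2 mixing matrix, standing in for whatever estimate the actual first path would compute from the audio. Both function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def estimate_demixing_matrix(mixing: Sequence[Sequence[float]]) -> Matrix:
    """Path 1 (stand-in): determine the demixing parameter.

    Assumption: the 2x2 mixing matrix is known, so the demixing matrix is
    its inverse. A real system would estimate this from the mixed audio.
    """
    (a, b), (c, d) = mixing
    det = a * d - b * c
    if det == 0:
        raise ValueError("mixing matrix is singular; sources cannot be separated")
    return [[d / det, -b / det], [-c / det, a / det]]


def apply_demixing(demix: Matrix,
                   mixed_frames: Sequence[Sequence[float]]) -> Matrix:
    """Path 2: apply the demixing parameter to each frame of mixed samples,
    yielding one value per source for that frame."""
    return [[sum(w * x for w, x in zip(row, frame)) for row in demix]
            for frame in mixed_frames]
```

Keeping estimation and application in separate functions mirrors the idea that the second path can keep applying the most recent parameter value while the first path works on the next one.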