Abstract:
The present teaching relates to methods, systems, media, and implementations for an automated dialogue companion. Multimodal input data associated with a user engaged in a dialogue on a certain topic in a dialogue scene are first received and used to extract features representing a state of the user and relevant information associated with the dialogue scene. A current state of the dialogue, characterizing the context of the dialogue, is generated based on the state of the user and the relevant information associated with the dialogue scene. A response communication for the user is determined based on a dialogue tree corresponding to the dialogue on the certain topic, the current state of the dialogue, and utilities learned from historic dialogue data and the current state of the dialogue.
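A minimal sketch of the response-selection step this abstract describes, in Python. The names (DialogueState, select_response) and the keying of the utility table are illustrative assumptions, not the patent's actual implementation:

from dataclasses import dataclass

@dataclass
class DialogueState:
    user_features: dict    # e.g. {"emotion": "confused"} from multimodal input
    scene_features: dict   # relevant information extracted from the scene
    topic: str = "math_tutoring"

def select_response(state, tree_node, utilities):
    """Pick the child of the current dialogue-tree node whose response has
    the highest utility learned from historic dialogue data, given the
    current state of the dialogue."""
    def score(child):
        key = (state.topic, state.user_features.get("emotion"), child["response"])
        return utilities.get(key, 0.0)   # unseen combinations score neutral
    return max(tree_node["children"], key=score)

tree_node = {"children": [{"response": "Let's try an easier example."},
                          {"response": "Great, let's move on."}]}
utilities = {("math_tutoring", "confused", "Let's try an easier example."): 0.9}
state = DialogueState(user_features={"emotion": "confused"}, scene_features={})
print(select_response(state, tree_node, utilities)["response"])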
Abstract:
A computer-implemented method to populate an electronic record may include generating first transcript data of first audio of a first speaker during a conversation between the first speaker and a second speaker. The method may also include generating second transcript data of second audio of the second speaker during the conversation, and identifying one or more words from the first transcript data as being a value for a record field based on the one or more words corresponding to the record field and on the one or more words being from the first transcript data and not from the second transcript data. The method may further include providing the identified words to an electronic record database as a value for the record field of a user record of the first speaker.
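A rough sketch of the field-population logic, using a date-of-birth field as the example. The regex, field name, and record dictionary are assumptions for illustration only:

import re

DOB_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")

def extract_field_value(first_transcript, second_transcript):
    """Return a date found in the first speaker's transcript, provided the
    same words do not also appear in the second speaker's transcript."""
    for match in DOB_PATTERN.findall(first_transcript):
        if match not in second_transcript:
            return match
    return None

record = {}
first = "My date of birth is 03/14/1985 and I live in Springfield."
second = "Thanks. Can you confirm your date of birth for me?"
value = extract_field_value(first, second)
if value is not None:
    record["date_of_birth"] = value   # stored on the first speaker's record
print(record)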
Abstract:
Provided is a system in which users performing a coordinated process are localized in a complex environment based upon audio input. Audio commands are detected and executed based on vocalizations of system users. Available commands are limited by user status, location, process type, and process progress. Command execution is limited by the presence and locations of system users, non-users, or extraneous equipment.
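An illustrative gate for the command restrictions described above; the status values, zones, and process steps are hypothetical stand-ins:

ALLOWED = {
    # (user_status, zone, process_type, process_step) -> permitted commands
    ("scrubbed", "sterile_field", "surgery", "incision"): {"lights", "table_tilt"},
    ("circulating", "periphery", "surgery", "incision"): {"music", "page_staff"},
}

def may_execute(command, user, scene):
    key = (user["status"], user["zone"], scene["process"], scene["step"])
    if command not in ALLOWED.get(key, set()):
        return False                      # command unavailable to this user here
    if scene.get("non_users_present"):    # non-users or extraneous equipment
        return False                      # block execution entirely
    return True

user = {"status": "scrubbed", "zone": "sterile_field"}
scene = {"process": "surgery", "step": "incision", "non_users_present": False}
print(may_execute("lights", user, scene))   # True
print(may_execute("music", user, scene))    # False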
Abstract:
A method for configuring an automated, speech driven self-help system based on prior interactions between a plurality of customers and a plurality of agents includes: recognizing, by a processor, speech in the prior interactions between customers and agents to generate recognized text; detecting, by the processor, a plurality of phrases in the recognized text; clustering, by the processor, the plurality of phrases into a plurality of clusters; generating, by the processor, a plurality of grammars describing corresponding ones of the clusters; outputting, by the processor, the plurality of grammars; and invoking configuration of the automated self-help system based on the plurality of grammars.
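A condensed sketch of the configuration pipeline this abstract enumerates, starting from already-recognized phrases (the ASR and phrase-detection steps are omitted). scikit-learn's TF-IDF plus KMeans stands in for whatever clustering the patent actually uses, and the grammar output format is an assumption:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

phrases = ["reset my password", "forgot my password", "change password",
           "check my balance", "account balance please", "what is my balance"]

vectors = TfidfVectorizer().fit_transform(phrases)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

# One grammar per cluster: a simple alternation over member phrases,
# which a speech-driven self-help system could load as a rule.
grammars = {}
for phrase, label in zip(phrases, labels):
    grammars.setdefault(label, []).append(phrase)
for label, members in grammars.items():
    print(f"$cluster_{label} = ( " + " | ".join(members) + " );")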
Abstract:
According to some aspects, a method of processing user input received from a user is provided. The method comprises generating a plurality of segmentation hypotheses from content of the user input based, at least in part, on a set of parameters, querying a domain-specific database using each of the plurality of segmentation hypotheses to obtain at least one result, and modifying at least one of the set of parameters based, at least in part, on the at least one result.
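A toy illustration of the hypothesize-query-adjust loop. The "database" is a dict of known entities and the single parameter is a maximum segment length; both are assumptions for the sketch:

KNOWN = {"new york", "pizza", "new", "york pizza"}  # stand-in domain database

def segmentations(tokens, max_len):
    """Enumerate ways to split tokens into segments no longer than max_len."""
    if not tokens:
        return [[]]
    results = []
    for i in range(1, min(max_len, len(tokens)) + 1):
        head = " ".join(tokens[:i])
        results += [[head] + rest for rest in segmentations(tokens[i:], max_len)]
    return results

params = {"max_len": 1}
tokens = "new york pizza".split()
hits = [h for h in segmentations(tokens, params["max_len"])
        if all(seg in KNOWN for seg in h)]
if not hits:                 # poor results -> widen the segmentation window
    params["max_len"] += 1
    hits = [h for h in segmentations(tokens, params["max_len"])
            if all(seg in KNOWN for seg in h)]
print(hits)                  # [['new', 'york pizza'], ['new york', 'pizza']]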
Abstract:
A system and method for providing a voice assistant including: receiving, at a first device, a first audio input from a user requesting a first action; performing automatic speech recognition on the first audio input; obtaining a context of the user; performing natural language understanding based on the speech recognition of the first audio input; and taking the first action based on the context of the user and the natural language understanding.
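A skeletal version of that pipeline; every component here is a stub with assumed names, not the actual system:

def recognize(audio):                    # automatic speech recognition (stub)
    return "turn on the lights"

def get_context(user_id):                # e.g. location, time, device state
    return {"room": "kitchen"}

def understand(text):                    # natural language understanding (stub)
    return {"intent": "lights_on"} if "lights" in text else {"intent": "unknown"}

def handle_request(audio, user_id):
    text = recognize(audio)
    context = get_context(user_id)
    intent = understand(text)
    if intent["intent"] == "lights_on":
        # the context disambiguates *which* lights the user means
        return f"turning on the {context['room']} lights"
    return "sorry, I didn't catch that"

print(handle_request(b"...", user_id="u1"))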
Abstract:
Systems and methods for providing a user adaptive natural language interface are disclosed. The disclosed embodiments may receive and analyze user input to derive current user behavior data, including data indicative of characteristics of the user input. The user input is classified based on prior user behavior data previously logged during one or more previous user-system interactions and the current user behavior data to generate a classification of the user input. Machine learning algorithms can be employed to classify the user input. User adaptive utterances are selected based on the user input and the classification of the user input. The user-system interaction is logged for use as prior user behavior data in future user-system interactions. A response to the user input is generated, including synthesizing output speech from the user adaptive utterances selected. Example applications of the disclosed systems and methods provide user adaptive navigation directions in navigation systems.
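A compact sketch of the adaptation loop described above, with made-up behavior features (word count, hesitation count) and a trivial rule-based classifier standing in for the machine-learning step:

log = []                                   # prior user behavior data

def classify(current, history):
    """Label the user from current plus logged behavior: terse, fluent
    inputs suggest an expert; long, hesitant inputs suggest a novice."""
    samples = history + [current]
    avg_words = sum(s["word_count"] for s in samples) / len(samples)
    avg_hes = sum(s["hesitations"] for s in samples) / len(samples)
    return "expert" if avg_words < 6 and avg_hes < 1 else "novice"

UTTERANCES = {                             # user-adaptive navigation prompts
    "expert": "Left on 5th, then I-90 E.",
    "novice": "In 200 feet, turn left onto 5th Avenue, then follow the "
              "signs for Interstate 90 East.",
}

current = {"word_count": 4, "hesitations": 0, "text": "route to airport"}
label = classify(current, log)
log.append(current)                        # logged for future interactions
print(UTTERANCES[label])                   # would be synthesized to speech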
Abstract:
According to one embodiment, a speech interaction apparatus for performing an interaction with a user based on a scenario includes a speech recognition unit, a determination unit, a selection unit and an execution unit. The speech recognition unit recognizes a speech of the user and generates a recognition result text. The determination unit determines whether or not the speech includes an interrogative intention based on the recognition result text. The selection unit selects, when the speech includes the interrogative intention, a term of inquiry from a response sentence in the interaction in accordance with timing of the speech, the term of inquiry being a subject of the interrogative intention. The execution unit executes an explanation scenario including an explanation of the term of inquiry.
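One possible reading of the selection step, in sketch form: the word the apparatus was saying when the user interrupted with a question becomes the term of inquiry. The interrogative word list, per-word timings, and explanation table are all assumptions:

INTERROGATIVES = {"what", "what's", "huh", "sorry", "pardon"}
EXPLANATIONS = {"itinerary": "An itinerary is the planned route of a trip."}

def is_question(recognized_text):
    return recognized_text.split()[0].lower() in INTERROGATIVES

def term_at(response_words, word_timings, speech_time):
    """Return the response word being spoken at the moment the user spoke,
    given per-word (start, end) timings in seconds."""
    for word, (start, end) in zip(response_words, word_timings):
        if start <= speech_time <= end:
            return word
    return response_words[-1]

response = "your itinerary is confirmed".split()
timings = [(0.0, 0.4), (0.4, 1.1), (1.1, 1.3), (1.3, 2.0)]
user_speech, t = "what's that?", 0.9       # user asks mid-response

if is_question(user_speech):
    term = term_at(response, timings, t)   # -> "itinerary"
    print(EXPLANATIONS.get(term, f"Let me explain '{term}'."))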
Abstract:
The present invention enables voice interaction to continue at suitable timing without requiring high processing capacity and regardless of deviations in the flow of conversation. The data structure used for this purpose comprises, as one set, at least: speech content (Speak) spoken to a user; response content (Return) with which a spoken dialogue is established in reply to the speech content; and attribute information (Entity) indicating attributes of the speech content.
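A direct transcription of the described Speak/Return/Entity set into Python; the field types and example values are assumptions:

from dataclasses import dataclass

@dataclass
class DialogueSet:
    speak: str          # Speak: content spoken to the user
    return_: dict       # Return: responses that establish the spoken dialogue
    entity: dict        # Entity: attributes of the speech content

unit = DialogueSet(
    speak="Would you like coffee or tea?",
    return_={"coffee": "One coffee, coming up.",
             "tea": "One tea, coming up."},
    entity={"topic": "drink_order", "expects": ["coffee", "tea"]},
)

# Matching a user reply against Return keeps the exchange moving even when
# the reply deviates from the expected flow: unmatched input simply
# re-uses the set's Speak content as a re-prompt.
reply = "tea please"
for key, follow_up in unit.return_.items():
    if key in reply:
        print(follow_up)
        break
else:
    print(unit.speak)    # re-prompt at a suitable timing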
Abstract:
According to one embodiment, an interaction apparatus includes a storage, a first extractor, a retriever, a generator, a second extractor, and a register. The storage stores a problem and at least one solution for solving the problem. The first extractor extracts a target problem, which is an expression regarded as the problem, from a first speech of a user. The retriever retrieves a target solution for the target problem from the storage. The generator generates a first speech-prompting sentence prompting the user to make a speech including the target solution if the storage stores no target solution or if the user rejects the target solution. The second extractor extracts the target solution from a second speech, which is a response of the user relating to the first speech-prompting sentence. The register registers, on the storage, the target problem and the target solution.
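A skeleton of that problem/solution loop; the extraction functions are trivial pattern matches standing in for the real extractors, and the example phrases are invented:

storage = {"printer won't print": ["check the ink cartridge"]}

def extract_problem(speech):               # first extractor (stub)
    return speech.removeprefix("my ").strip(".") if "won't" in speech else None

def extract_solution(speech):              # second extractor (stub)
    return speech.removeprefix("i fixed it by ").strip(".")

first_speech = "my scanner won't scan."
problem = extract_problem(first_speech)
solutions = storage.get(problem, [])       # retriever: look up stored solutions

if not solutions:                          # no stored solution -> prompt user
    print(f"How did you solve '{problem}'?")
    second_speech = "i fixed it by updating the driver."
    solution = extract_solution(second_speech)
    storage.setdefault(problem, []).append(solution)   # register the pair

print(storage)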