Abstract:
A computer system comprises an input configured to receive voice input from a user, the voice input having speech intervals separated by non-speech intervals; an ASR system configured to identify individual words in the voice input during speech intervals of the voice input, and store the identified words in memory; a speech overload detection module configured to detect a speech overload condition at a time during a speech interval of the voice input; and a notification module configured to output to the user, in response to said detection, a notification of the speech overload condition.
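The abstract does not say what counts as a speech overload condition. As a minimal sketch, assume it means that more than a fixed number of words have been identified within a single speech interval. The class name `OverloadDetector`, the `max_words` threshold, and the notification text are all hypothetical and not taken from the source.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OverloadDetector:
    """Hypothetical detector: flags overload when one speech interval
    accumulates more than `max_words` words (an assumed criterion)."""
    max_words: int = 5
    words: List[str] = field(default_factory=list)  # words stored for the current interval
    notified: bool = False  # ensures one notification per speech interval

    def on_word(self, word: str) -> Optional[str]:
        """Called for each word the ASR identifies during a speech interval."""
        self.words.append(word)
        if not self.notified and len(self.words) > self.max_words:
            self.notified = True
            return "speech overload: please pause"
        return None

    def on_non_speech(self) -> None:
        """Called when a non-speech interval begins; resets per-interval state."""
        self.words.clear()
        self.notified = False


def run_demo() -> List[Optional[str]]:
    """Feed six words into a detector that tolerates three per interval."""
    detector = OverloadDetector(max_words=3)
    return [detector.on_word(w) for w in "play some jazz and then call".split()]
```

In this sketch the notification is raised while the user is still speaking, which matches the abstract's requirement that detection happen during a speech interval and not after it ends.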
Abstract:
Systems and processes are disclosed for operating a digital assistant for media search and playback. In an exemplary process, an audio input containing a media search request can be received. A primary user intent corresponding to the media search request can be determined and one or more secondary user intents based on one or more previous user intents can be determined. A primary set of media items corresponding to the primary user intent can be displayed and one or more secondary sets of media items corresponding to the one or more secondary user intents can be displayed.
Abstract:
Systems and processes are disclosed for handling a multi-part voice command for a virtual assistant. Speech input can be received from a user that includes multiple actionable commands within a single utterance. A text string can be generated from the speech input using a speech transcription process. The text string can be parsed into multiple candidate substrings based on domain keywords, imperative verbs, predetermined substring lengths, or the like. For each candidate substring, a probability can be determined indicating whether the candidate substring corresponds to an actionable command. Such probabilities can be determined based on semantic coherence, similarity to user request templates, querying services to determine manageability, or the like. If the probabilities exceed a threshold, the user intent of each substring can be determined, processes associated with the user intents can be executed, and an acknowledgment can be provided to the user.
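The steps in this abstract (splitting the text string into candidate substrings, scoring each one, and acting only if the scores exceed a threshold) can be sketched as follows. This is an illustration under stated assumptions, not the disclosed method: the verb list, connector list, and request templates are invented for the example, and a string similarity ratio from Python's `difflib` stands in for the probability that a substring is an actionable command.

```python
from difflib import SequenceMatcher
from typing import List

# Assumed vocabularies for illustration only.
IMPERATIVE_VERBS = {"play", "call", "send", "set", "open"}
CONNECTORS = {"and", "then"}
TEMPLATES = ["play music", "call contact", "set a timer", "send a message"]


def split_candidates(text: str) -> List[str]:
    """Start a new candidate substring at each imperative verb."""
    candidates: List[str] = []
    current: List[str] = []
    for token in text.lower().split():
        if token in CONNECTORS:
            continue  # drop joining words between commands
        if token in IMPERATIVE_VERBS and current:
            candidates.append(" ".join(current))
            current = []
        current.append(token)
    if current:
        candidates.append(" ".join(current))
    return candidates


def command_probability(candidate: str) -> float:
    """Similarity to the closest request template, used as a stand-in score."""
    return max(SequenceMatcher(None, candidate, t).ratio() for t in TEMPLATES)


def actionable_commands(text: str, threshold: float = 0.5) -> List[str]:
    """Return the candidates only if every score exceeds the threshold."""
    candidates = split_candidates(text)
    scores = [command_probability(c) for c in candidates]
    if candidates and all(score > threshold for score in scores):
        return candidates
    return []
```

For example, `split_candidates("play some jazz and call mom")` yields `["play some jazz", "call mom"]`. A real system would pass each returned substring to intent determination and then acknowledge the result to the user.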
Abstract:
The invention relates to a method for acquiring at least two pieces of information to be acquired, comprising information content to be linked, using a speech dialogue device. A speech output is produced by the speech dialogue device between each acquisition of information. Each piece of information is acquired by acquiring natural vocal speech input data and by extracting the respective information from the speech input data using a speech recognition algorithm. When a repetition condition has been satisfied, a natural speech summary output is generated by the speech dialogue device and output as speech output which comprises a natural vocal reproduction of at least one previously acquired piece of information or a part of said piece of information or a piece of information derived from said piece of information.
Abstract:
Systems and methods for responding to an audio query are presented. More particularly, vocalization nuances of a vocalized search query (audio query) are identified and utilized in responding to the audio query. In addition to converting the audio query to a textual representation, vocalization nuances of the audio query are identified. Search results are identified according to the textual representation of the audio query and in light of the vocalization nuances. A search results presentation is prepared in response to the audio query, where the search results presentation is based on the identified search results and also based on the vocalization nuances. The search results presentation is returned in response to the audio query.
Abstract:
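A speech recognition client system, a speech recognition server system, and a speech recognition method for processing online speech recognition are disclosed. The speech recognition client system, which displays speech recognition results for a sound signal input from a start time to an end time of speech recognition, includes a communication unit that transmits, at every unit time, the unit sound signal input during each predetermined unit time from the start time to the end time to the speech recognition server system and receives intermediate speech recognition results from the speech recognition server system, and a display unit that displays the received intermediate speech recognition results between the start time and the end time.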
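The client-side flow described in this abstract (sending one unit sound signal per unit time and showing each intermediate result as it arrives, before the end time) can be sketched as below. The names `unit_signals`, `FakeRecognitionServer`, and `recognize_streaming` are hypothetical, and the server is a local stand-in that only counts samples; no real recognition or network transport is modeled.

```python
from typing import Iterator, List


def unit_signals(samples: List[int], unit_size: int) -> Iterator[List[int]]:
    """Cut the input signal into unit sound signals of `unit_size` samples."""
    for start in range(0, len(samples), unit_size):
        yield samples[start:start + unit_size]


class FakeRecognitionServer:
    """Stand-in server that returns an intermediate result after each unit."""

    def __init__(self) -> None:
        self.received = 0  # total samples received so far

    def receive(self, unit: List[int]) -> str:
        self.received += len(unit)
        return f"partial result after {self.received} samples"


def recognize_streaming(samples: List[int], unit_size: int) -> List[str]:
    """Send each unit as soon as it is available and record what is displayed."""
    server = FakeRecognitionServer()
    displayed: List[str] = []
    for unit in unit_signals(samples, unit_size):
        intermediate = server.receive(unit)  # role of the communication unit
        displayed.append(intermediate)       # role of the display unit
    return displayed
```

With ten samples and a unit size of four, the client sends three units and displays three intermediate results, the last one covering all ten samples.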
Abstract:
Methods, systems and articles for receiving, by a telecommunication device, audio input through a unified audio interface are disclosed herein. The telecommunication device is further configured to perform at least one of a dictation action, an incoming message processing action, a navigation action, a content lookup action, or a contact lookup action while continuously or substantially continuously receiving voice commands from a user. In some aspects, the telecommunication device may continuously receive and process voice commands while operating in a driving mode, which may be initiated by the telecommunication device.