Abstract:
The invention relates to a method for creating a confirmation mechanism in automatic dictation systems by using speech synthesis (Text-to-Speech, TTS) and a segmentation feature in addition to speech recognition (SR). The invention also relates to a system equipped with at least one speech recognition (SR) module (23) converting the words of the user (21) to text by automatically recognising them, a microphone (22) providing input to this module, at least one monitor (24), belonging to at least one device on which the dictation system operates, on which the text can be displayed and edited, at least one speech synthesis (TTS) module converting the text created as a result of automatic speech recognition to audio output (25), a segmentation module (27) dividing the output created through the speech synthesis into its parts when necessary, and a headphone (26) transmitting these outputs to the user (21).
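A minimal sketch of this confirmation flow follows; it is not taken from the patent itself. The recognized text is split into short segments and each segment is synthesized and played back to the user. All function and parameter names (segment_text, confirm_dictation, synthesize, play) are hypothetical stand-ins for the modules (25, 26, 27) named above.

```python
# Hypothetical sketch of TTS-based confirmation with segmentation.
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    text: str
    audio: bytes  # synthesized audio for this segment


def segment_text(text: str, max_words: int = 8) -> List[str]:
    """Split recognized text into short chunks for piecewise confirmation."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def confirm_dictation(recognized_text: str, synthesize, play) -> None:
    """Play each synthesized segment back to the dictating user."""
    for chunk in segment_text(recognized_text):
        audio = synthesize(chunk)               # TTS module: text -> audio output
        play(Segment(text=chunk, audio=audio))  # headphone: audio -> user


if __name__ == "__main__":
    # Stub TTS and playback so the sketch runs without audio hardware.
    confirm_dictation(
        "the quick brown fox jumps over the lazy dog and keeps running",
        synthesize=lambda t: t.encode("utf-8"),
        play=lambda seg: print(f"playing back: {seg.text!r}"),
    )
```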
Abstract:
A method for receiving processed information at a remote device is described. The method includes transmitting from the remote device a verbal request to a first information provider and receiving a digital message from the first information provider in response to the transmitted verbal request. The digital message includes a symbolic representation indicator associated with a symbolic representation of the verbal request and data used to control an application. The method also includes transmitting, using the application, the symbolic representation indicator to a second information provider for generating results to be displayed on the remote device.
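A rough illustration of the described exchange is sketched below; the message structure and the field and function names (DigitalMessage, symbolic_representation_indicator, handle_verbal_request) are assumptions, not taken from the abstract.

```python
# Hypothetical sketch: verbal request -> first provider -> digital message,
# then the indicator is forwarded to a second provider for displayable results.
from dataclasses import dataclass


@dataclass
class DigitalMessage:
    symbolic_representation_indicator: str  # refers to a symbolic representation of the request
    app_control_data: dict                  # data used to control the client application


def handle_verbal_request(audio: bytes, first_provider, second_provider) -> str:
    message: DigitalMessage = first_provider(audio)
    # The application forwards the indicator, not the raw audio,
    # to the second information provider that generates the results.
    return second_provider(message.symbolic_representation_indicator)


if __name__ == "__main__":
    first = lambda audio: DigitalMessage("req-42", {"action": "search"})
    second = lambda indicator: f"results for {indicator}"
    print(handle_verbal_request(b"\x00\x01", first, second))
```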
Abstract:
A correction device (12) for correcting text passages in recognized text information (RTI), which recognized text information (RTI) is recognized by a speech recognition device from speech information and is therefore associated with the speech information, comprises a reception unit for receiving the speech information, the associated recognized text information (RTI), link information which marks, at each text passage of the associated recognized text information (RTI), the part of the speech information at which the text passage was recognized by the speech recognition device, and confidence level information (CLI) which represents, at each text passage of the recognized text information (RTI), a correctness of the recognition of said text passage. The device further comprises a synchronous playback unit for performing a synchronous playback mode in which, during an acoustic playback of the speech information, the text passage of the recognized text information (RTI) associated with the speech information just played back and marked by the link information is marked synchronously, and an indication unit for indicating the confidence level information (CLI) of a text passage of the text information during the synchronous playback.
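One way such a synchronous playback mode could look in code is sketched below, assuming a simple passage structure that carries the link information and the confidence level information; Passage, play_span and highlight are hypothetical names, not from the abstract.

```python
# Hypothetical sketch of synchronous playback with confidence indication.
from dataclasses import dataclass
from typing import List


@dataclass
class Passage:
    text: str
    start_ms: int       # link information: where this passage begins in the audio
    end_ms: int         # link information: where it ends
    confidence: float   # confidence level information (CLI), 0.0 .. 1.0


def synchronous_playback(passages: List[Passage], play_span, highlight) -> None:
    for p in passages:
        play_span(p.start_ms, p.end_ms)  # acoustic playback of this part of the speech
        highlight(p.text, p.confidence)  # mark the passage and indicate its CLI


if __name__ == "__main__":
    demo = [Passage("hello", 0, 400, 0.97), Passage("wold", 400, 800, 0.42)]
    synchronous_playback(
        demo,
        play_span=lambda a, b: print(f"audio {a}-{b} ms"),
        highlight=lambda t, c: print(f"  -> {t} (confidence {c:.0%})"),
    )
```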
Abstract:
The method and apparatus for displaying speech recognition results include a recognition filter (102) that receives a recognition result list (106) which includes a plurality of speech recognized terms (122), wherein each speech recognized term (122) has a term-specific recognition confidence value (124). The recognition filter (102) generates a modified recognition result list (108) that includes one or more speech recognized terms having term-specific recognition confidence values. The method and apparatus also include a display generator (104) that receives the modified recognition result list (108) and generates a graphical recognition result list (110). The graphical recognition result list (110) includes the speech recognized terms (126) and a non-alphanumeric symbol as a graphical representation (274) of each term-specific recognition confidence value (128).
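A simplified sketch of the two stages follows, assuming a plain confidence threshold and asterisks as the non-alphanumeric symbol; both are illustrative assumptions, not specified in the abstract.

```python
# Hypothetical sketch: filter by term-specific confidence, then render.
from typing import List, Tuple

RecognizedTerm = Tuple[str, float]  # (term, term-specific confidence 0..1)


def recognition_filter(results: List[RecognizedTerm], threshold: float = 0.3) -> List[RecognizedTerm]:
    """Produce the modified recognition result list."""
    return [(term, conf) for term, conf in results if conf >= threshold]


def display_generator(results: List[RecognizedTerm]) -> List[str]:
    """Produce the graphical recognition result list: term plus a symbol bar."""
    return [f"{term:10s} {'*' * round(conf * 5)}" for term, conf in results]


if __name__ == "__main__":
    raw = [("Boston", 0.92), ("Austin", 0.55), ("Houston", 0.12)]
    for line in display_generator(recognition_filter(raw)):
        print(line)
```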
Abstract:
Disclosed is a data selection mechanism for identifying a single data item from a plurality of data items, each data item having an associated plurality of related descriptors each having an associated descriptor value. The data selection mechanism comprises a pattern matching mechanism for identifying candidate matching descriptor values that correspond to user-generated input, and a filter mechanism for providing a filtered data set comprising the single data item. The pattern matching mechanism is operable to apply one or more pattern recognition models to first user-generated input to generate one or more hypothesised descriptor values for each of the one or more pattern recognition models. The filter mechanism is operable to: i) create a data filter from the hypothesised descriptor values produced by the one or more pattern recognition models to apply to the plurality of data items to produce a filtered data set of candidate data items; and ii) select one or more subsequent pattern recognition models for applying to further user-generated input.
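A schematic sketch of this selection loop is given below; the data structures, descriptor names and model behaviour are all assumed for illustration only.

```python
# Hypothetical sketch: pattern recognition models propose descriptor values,
# the filter narrows the data items, and a further model handles the next input.
from typing import Callable, Dict, List, Tuple

DataItem = Dict[str, str]           # descriptor name -> descriptor value
Model = Callable[[str], List[str]]  # user-generated input -> hypothesised descriptor values


def filter_items(items: List[DataItem], descriptor: str, hypotheses: List[str]) -> List[DataItem]:
    """Keep only the candidate data items whose descriptor matches a hypothesised value."""
    return [item for item in items if item.get(descriptor) in hypotheses]


def select_item(items: List[DataItem], inputs: List[str],
                plan: List[Tuple[str, Model]]) -> List[DataItem]:
    """Apply one (descriptor, model) pair per piece of user input until one item remains."""
    for user_input, (descriptor, model) in zip(inputs, plan):
        items = filter_items(items, descriptor, model(user_input))
        if len(items) <= 1:
            break
    return items


if __name__ == "__main__":
    contacts = [{"city": "Berlin", "name": "Ada"}, {"city": "Bern", "name": "Bob"}]
    city_model: Model = lambda text: [c for c in ("Berlin", "Bern") if c.lower().startswith(text[:3])]
    name_model: Model = lambda text: [n for n in ("Ada", "Bob") if n.lower().startswith(text[:1])]
    print(select_item(contacts, ["berlin", "ada"], [("city", city_model), ("name", name_model)]))
```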
Abstract:
The invention relates to a method for the voice-guided control of machine tools (1) by an operator who articulates control words, wherein the articulated control words are captured by an acoustic transducer (11) and converted by a speech processing unit into control commands for the machine tool (1), wherein a control word is assigned to at least one unit of the machine tool (1), and wherein the at least one unit is activated by articulating the control word in conjunction with an execution code.
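A minimal sketch of the described activation rule follows; the control words, machine commands and execution code are hypothetical examples, not taken from the specification.

```python
# Hypothetical sketch: a unit is activated only when its control word is
# articulated together with the execution code.
from typing import Optional

CONTROL_WORDS = {"spindle": "M03", "coolant": "M08"}  # control word -> machine command (assumed codes)
EXECUTION_CODE = "execute"                            # spoken execution code (assumed word)


def to_command(utterance: str) -> Optional[str]:
    """Convert an articulated utterance into a control command for the machine tool."""
    words = utterance.lower().split()
    for control_word, command in CONTROL_WORDS.items():
        if control_word in words and EXECUTION_CODE in words:
            return command
    return None


if __name__ == "__main__":
    print(to_command("spindle on"))           # None: execution code missing
    print(to_command("spindle on execute"))   # M03
```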
Abstract:
A portion of speech from a near-end user is captured. A near-end user terminal conducts a communication session, over a network, between the near-end user and one or more far-end users, the session including a message sent to the one or more far-end users. A vetting mechanism is provided via a touchscreen user interface of the near-end user terminal, to allow the near-end user to vet an estimated transcription of the portion of speech before it is sent to the one or more far-end users in the message. According to the vetting mechanism: (i) a first gesture performed by the near-end user through the touchscreen user interface accepts the estimated transcription to be included in a predetermined role in the sent message, whilst (ii) one or more second gestures performed by the near-end user through the touchscreen user interface each reject the estimated transcription from being sent in the message.
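An illustrative sketch of the vetting logic follows, using hypothetical gesture names (swipe_right, swipe_left, flick_down) for the first and second gestures.

```python
# Hypothetical sketch: one gesture accepts the transcription into the message,
# other gestures reject it before sending.
from typing import Optional


def vet_transcription(gesture: str, transcription: str) -> Optional[str]:
    """Return the text to place in the outgoing message, or None if rejected."""
    if gesture == "swipe_right":                  # first gesture: accept
        return transcription
    if gesture in ("swipe_left", "flick_down"):   # second gestures: reject
        return None
    raise ValueError(f"unrecognised gesture: {gesture}")


if __name__ == "__main__":
    draft = vet_transcription("swipe_right", "see you at 5")
    print(f"sending: {draft}" if draft is not None else "transcription discarded")
```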