摘要:
The present invention provides a method and system to improve speech recognition using an existing audio realization of a spoken text and a true textual representation of the spoken text. The audio realization and the true textual representation can be aligned to reveal time stamps. A speech recognition can be performed on the audio realization to provide a hypothesis textual representation for the audio realization. The aligned true textual representation can be compared with the hypothesis textual representation. Single word pairs from the true and the hypothesis textual representations can be selected where the representations are different. Similarly, single word pairs can be selected from each representation where the representations are identical. A word or pronunciation database can be updated using the selected single word pairs together with the corresponding aligned audio realization.
摘要:
Disclosed are a method and apparatus for processing a continuous audio stream containing human speech in order to locate a particular speech-based transaction in the audio stream, applying both known speaker recognition and speech recognition techniques. Only the utterances of a particular predetermined speaker are transcribed thus providing an index and a summary of the underlying dialogue(s). In a first scenario, an incoming audio stream, e.g. a speech call from outside, is scanned in order to detect audio segments of the predetermined speaker. These audio segments are then indexed and only the indexed segments are transcribed into spoken or written language. In a second scenario, two or more speakers located in one room are using a multi-user speech recognition system (SRS). For each user there exists a different speaker model and optionally a different dictionary or vocabulary of words already known or trained by the speech or voice recognition system.
摘要:
A digitized speech signal (600) is input to an F0 (fundamental frequency) processor that computes (610) a continuous F0 data from the speech signal. By the criterion voicing state transition (voiced/unvoiced transitions) the speech signal is presegmented (620) into segments. For each segment (630) it is evaluated (640) whether F0 is defined or not defined i.e. whether F0 is ON or OFF. In case of F0=OFF a candidate segment boundary is assumed as described above and, starting from that boundary, prosodic features are computed (650). The feature values are input into a classification tree and each candidate segment is classified thereby revealing, as a result, the existence or non-existence of a semantic or syntactic speech unit.
摘要:
A method and apparatus for creating links between a representation, (e.g. text data,) and a realization, (e.g. corresponding audio data,) is provided. According to the invention the realization is structured by combining a time-stamped version of the representation generated from the realization with structural information from the representation. Thereby so called hyper links between representation and realization are created. These hyper links are used for performing search operations in realization data equivalent to those which are possible in representation data, enabling an improved access to the realization (e.g. via audio databases).
摘要:
A method and apparatus for creating links between a representation, (e.g. text data,) and a realization, (e.g. corresponding audio data,) is provided. According to the invention the realization is structured by combining a time-stamped version of the representation generated from the realization with structural information from the representation. Thereby so called hyper links between representation and realization are created. These hyper links are used for performing search operations in realization data equivalent to those which are possible in representation data, enabling an improved access to the realization (e.g. via audio databases).
摘要:
A characteristic identifier for digital data is generated. Thereby, the information contained in a digital data set is reduced such that the resulting identifier is made comparable to another identifier made in the same manner. The generated identifiers are used for detecting identical digital data or to determine inexact copies of digital data. In one embodiment of the invention, the digital data is a digital audio signal and the characteristic identifier is called an audio signature. The comparison of identical audio data according to the invention can be carried out without a person actually listening to the audio data. The present invention can be used to establish automated processes to find potential unauthorized copies of audio data, e.g., music recordings, and therefore enables a better enforcement of copyrights in the audio industry.
摘要:
A method and apparatus for creating links between a representation, (e.g. text data,) and a realization, (e.g. corresponding audio data,) is provided. According to the invention the realization is structured by combining a time-stamped version of the representation generated from the realization with structural information from the representation. Thereby so called hyper links between representation and realization are created. These hyper links are used for performing search operations in realization data equivalent to those which are possible in representation data, enabling an improved access to the realization (e.g. via audio databases).
摘要:
The present invention provides a method and system for computerized synchronization of an audio stream having a first synchronized textual representation usable for subtitling of the audio stream in the original language, with a second synchronized textual representation which can be used as an alternative subtitle including a transcription of the original language into another language. Time synchronous links can be built between the audio stream and, for instance, the textual representations of the words spoken in the audio stream. More particularly, the second representation can inherit the synchronization between the audio stream and the first representation using structure association information determined between the first and the second representation.
摘要:
The present invention relates to method and system for providing online information in a networked user environment in which an end-user runs an application program and transmits data to an online server while running the application program. It is proposed to provide a request-button at the end-user application program dedicated to requesting information, and in particular help-information. When a help request is received at the communication server, a communication channel is promptly established between end-user and an agent. Information about the user activities sent in one or more transaction parts of an end-user intended business process and performed in the current application program session is read from the storage in the application server and is provided to the terminal of said agent in the help center. Advantageously, the same communication channel as used for performing the transactions is used for voice transmission for providing help or other information to the end-user.
摘要:
A system and method of generating a link between a note of a digital score and a realization of the score are provided. To do so, a digital score is processed to generate an onset curve. The onset curve is then filtered to generate a first series of first time intervals, which each have a significant number of onsets. A realization of the digital score is also processed to generate a second series of second time intervals, which each have a significant dynamic change of the realization. The first and the second series of time intervals are then correlated to produce the link.