Abstract:
A digital audio synthesizer comprising: an input memory for receiving a sequence of digital data representative of the amplitude spectrum of an audio signal over consecutive, overlapping time windows; a calculator (120) arranged to receive as input a set of sketch digital data for a current window, comprising extrapolated amplitude data at the start of the window and zero values for the remainder of the window, and to establish in response a digital representation of the complex discrete Fourier transform of this set; a composer (130) arranged to combine the amplitude-spectrum input associated with the current window under consideration with the digital representation determined by the calculator, and to call the calculator (120) with the resulting data so as to establish a digital representation of the corresponding inverse complex discrete Fourier transform, which yields a set of estimated digital data relating to the current window under consideration; and an adder (140) for selectively accumulating the estimated digital data corresponding to the same time instant. The composer (130) is arranged to compute a set of auxiliary digital data (Xi(n)) by taking the current set of estimated digital data (z(n)) divided by a window function over each time window; the adder (140) is arranged to add the current set of estimated digital data, multiplied by the window function (H), to the previous value of the accumulation; and an extrapolator (110) is arranged to compute the set of sketch digital data for a current window from the set of auxiliary digital data of the previous window, multiplied selectively by the square of the window function.
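The pipeline above (extrapolator → DFT of the sketch → impose the input magnitude → inverse DFT → windowed overlap-add) can be sketched as one per-window step. This is a minimal illustration, not the patented implementation: the function name, the Hann window, and the way the extrapolator shifts the previous auxiliary data by the hop size are all assumptions.

```python
import numpy as np

def synthesize_frame(magnitude, prev_aux, window, hop):
    """One hypothetical iteration of the abstract's pipeline."""
    n = len(window)
    # Extrapolator (110): sketch = previous window's auxiliary data, weighted
    # by the squared window and shifted by the hop; the rest is zero-filled.
    sketch = np.zeros(n)
    sketch[: n - hop] = (prev_aux * window ** 2)[hop:]
    # Calculator (120): complex DFT of the sketch.
    spectrum = np.fft.rfft(sketch)
    # Composer (130): keep the sketch's phase, impose the input amplitude
    # spectrum, then call the calculator again for the inverse DFT.
    phase = np.angle(spectrum)
    z = np.fft.irfft(magnitude * np.exp(1j * phase), n)
    # Auxiliary data Xi(n): estimated data divided by the window function
    # (clamped to avoid division by zero at the window edges).
    aux = z / np.maximum(window, 1e-12)
    # Adder (140) contribution: estimated data multiplied by the window.
    return z * window, aux
```

In a full loop, the first return value would be overlap-added into the output buffer and the second fed to the extrapolator for the next window.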
Abstract:
A feature transform for speech recognition is described. An input speech utterance is processed to produce a sequence of representative speech vectors. A time-synchronous speech recognition pass is performed using a decoding search to determine a recognition output corresponding to the speech input. The decoding search includes, for each speech vector after a first threshold number of speech vectors, estimating a feature transform based on the preceding speech vectors in the utterance and partial decoding results of the decoding search. The current speech vector is then adjusted based on the current feature transform, and the adjusted speech vector is used in a current frame of the decoding search.
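The per-frame loop can be sketched as follows. This is only an illustration of the control flow: the "transform" here is a simple online mean/variance normalization estimated from the preceding vectors, standing in for whatever estimator (e.g. a CMLLR-style transform) the recognizer actually uses; the threshold value and function names are assumptions.

```python
import numpy as np

def decode_utterance(speech_vectors, threshold=10):
    """Hypothetical sketch: after `threshold` frames, estimate a transform
    from the preceding vectors and apply it to the current vector before it
    enters the current frame of the decoding search."""
    adjusted = []
    for t, x in enumerate(speech_vectors):
        if t >= threshold:
            history = np.asarray(speech_vectors[:t])
            mean = history.mean(axis=0)          # estimate from preceding
            std = history.std(axis=0) + 1e-8     # vectors in the utterance
            x = (x - mean) / std                 # adjust the current vector
        adjusted.append(x)                       # use in the current frame
    return adjusted
```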
Abstract:
An apparatus for processing an audio signal and a method thereof are disclosed. The present invention includes receiving, by an audio processing apparatus, coding identification information indicating whether to apply a first coding scheme or a second coding scheme to a current frame; when the coding identification information indicates that the second coding scheme is applied to the current frame, receiving window type information indicating a particular window for the current frame, from among a plurality of windows; identifying that a current window is a stop_start window based on the window type information, wherein the stop_start window follows one of a long_start window, a short window and a window of the first coding scheme for a previous frame, wherein the stop_start window is followed by one of a long_stop window, a short window and a window of the first coding scheme for a following frame, wherein the stop_start window includes a gentle-gentle stop_start window, a gentle-steep stop_start window, a steep-gentle stop_start window and a steep-steep stop_start window; when the first coding scheme is applied to the previous frame, applying one of the gentle-gentle stop_start window and the gentle-steep stop_start window to the current frame; and, when the first coding scheme is applied to the following frame, applying one of the gentle-gentle stop_start window and the steep-gentle stop_start window to the current frame, wherein: the gentle-gentle stop_start window comprises an ascending line with a first slope and a descending line with the first slope, the gentle-steep stop_start window comprises an ascending line with the first slope and a descending line with a second slope, the steep-gentle stop_start window comprises an ascending line with the second slope and a descending line with the first slope, the steep-steep stop_start window comprises an ascending line with the second slope and a descending line with the second slope, and the first slope is gentler than the second slope.
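The selection rule reduces to a small decision: the ascending slope is gentle when the previous frame uses the first coding scheme, and the descending slope is gentle when the following frame uses it. A minimal sketch (function and label names are assumptions mirroring the abstract's terminology):

```python
def select_stop_start_window(prev_scheme: str, next_scheme: str) -> str:
    """Hypothetical sketch of the window selection in the abstract.
    `prev_scheme`/`next_scheme` are 'first' or 'second', naming which
    coding scheme applies to the previous/following frame."""
    ascending = "gentle" if prev_scheme == "first" else "steep"
    descending = "gentle" if next_scheme == "first" else "steep"
    return f"{ascending}-{descending} stop_start window"
```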
Abstract:
An apparatus for processing an audio signal and a method thereof are disclosed. The present invention includes receiving, by an audio processing apparatus, coding identification information indicating whether to apply a first coding scheme or a second coding scheme to a current frame; when the coding identification information indicates that the second coding scheme is applied to the current frame, receiving window type information indicating a particular window for the current frame, from among a plurality of windows; identifying that a current window is a short window based on the window type information, wherein the short window has one fixed shape which comprises a plurality of short parts overlapped together; and applying the short window of the fixed shape to the current frame, wherein the short window follows one of a long_start window, a stop_start window and a window of the first coding scheme for a previous frame, and wherein the short window is followed by one of a long_stop window, the stop_start window and the window of the first coding scheme for a following frame.
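The fixed shape made of overlapped short parts can be illustrated by overlap-adding identical short sub-windows. This is only a sketch: the sine half-window shape, the part count, length, and hop are illustrative assumptions, not values from the specification.

```python
import numpy as np

def fixed_short_window(num_parts=8, part_len=256, hop=128):
    """Hypothetical construction of one fixed-shape window that 'comprises
    a plurality of short parts overlapped together' (per the abstract)."""
    # One short part: a sine window, common in transform audio coding.
    part = np.sin(np.pi * (np.arange(part_len) + 0.5) / part_len)
    total = hop * (num_parts - 1) + part_len
    window = np.zeros(total)
    for k in range(num_parts):
        # Overlap-add each identical short part at its hop offset.
        window[k * hop : k * hop + part_len] += part
    return window
```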
Abstract:
Methods and devices for encoding and decoding are provided. A source signal value is encoded by a quantization index determined using a partition into quantization cells. Decoding of the quantization index takes place by sampling a reconstruction probability distribution, thereby obtaining a reconstructed signal value, such that the reconstructed signal value lies in the same quantization cell as the source signal value. In one embodiment, encoding and decoding are such that their succession preserves the source signal distribution. In another embodiment, the partition and the reconstruction probability distribution are determined in such manner that the quantization error is minimized subject to a constraint on the relative entropy between the source signal and the reconstructed signal.
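The key property (the reconstructed value lies in the same quantization cell as the source value) can be sketched with a uniform scalar quantizer whose decoder samples a reconstruction distribution supported on the cell. The uniform cell partition and uniform sampling are illustrative assumptions; with a uniform source they also make encode-then-decode preserve the source distribution, matching the first embodiment.

```python
import numpy as np

def encode(x, step=1.0):
    """Quantization index: which uniform cell of width `step` contains x."""
    return int(np.floor(x / step))

def decode(index, step=1.0, rng=None):
    """Decode by sampling a reconstruction probability distribution whose
    support is exactly the quantization cell, so the reconstructed value
    always falls in the same cell as the source value."""
    rng = np.random.default_rng() if rng is None else rng
    return (index + rng.random()) * step
```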
Abstract:
Systems and methods for providing object-oriented audio are described. Audio objects can be created by associating sound sources with attributes of those sound sources, such as location, velocity, directivity, and the like. Audio objects can be used in place of or in addition to channels to distribute sound, for example, by streaming the audio objects over a network to a client device. The objects can define their locations in space with associated two or three dimensional coordinates. The objects can be adaptively streamed to the client device based on available network or client device resources. A renderer on the client device can use the attributes of the objects to determine how to render the objects. The renderer can further adapt the playback of the objects based on information about a rendering environment of the client device. Various examples of audio object creation techniques are also described.
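An audio object of the kind described (a sound source paired with attributes such as location, velocity, and directivity, plus adaptive streaming under constrained resources) can be sketched as a small data structure. All field and function names here are illustrative assumptions, not the described system's API.

```python
from dataclasses import dataclass

@dataclass
class AudioObject:
    """Hypothetical audio object: a sound source plus rendering attributes."""
    source_id: str
    position: tuple               # 2D or 3D coordinates in space
    velocity: tuple = (0.0, 0.0, 0.0)
    directivity: str = "omni"
    priority: int = 0             # could drive adaptive streaming decisions

def adapt_stream(objects, max_objects):
    """Sketch of adaptive streaming: when client or network resources are
    limited, keep only the highest-priority objects."""
    return sorted(objects, key=lambda o: o.priority, reverse=True)[:max_objects]
```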
Abstract:
A method and an apparatus for processing an audio signal are disclosed. In the present invention, an audio signal is encoded and decoded on the basis of the sound-source motion, reverberation characteristics, and semantic objects included in the audio signal, thereby enabling more faithful reproduction of audio and efficient search and editing of the same.
Abstract:
At least one exemplary embodiment is directed to a method and/or a device for voice operated control. The method can include measuring an ambient sound received from at least one Ambient Sound Microphone, measuring an internal sound received from at least one Ear Canal Microphone, detecting a spoken voice from a wearer of the earpiece based on an analysis of the ambient sound and the internal sound, and controlling at least one voice operation of the earpiece if the presence of spoken voice is detected. The analysis can be a non-difference comparison such as a correlation analysis, a cross-correlation analysis, and a coherence analysis.
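One such non-difference comparison can be sketched as a normalized cross-correlation between the two microphone signals: the wearer's own voice reaches both microphones coherently, so a high correlation peak suggests spoken voice. The threshold value and function name are illustrative assumptions.

```python
import numpy as np

def detect_wearer_voice(ambient, internal, threshold=0.7):
    """Hypothetical sketch of cross-correlation-based voice detection
    between the Ambient Sound Microphone and Ear Canal Microphone."""
    # Normalize each signal to zero mean, unit variance.
    a = (ambient - ambient.mean()) / (ambient.std() + 1e-12)
    b = (internal - internal.mean()) / (internal.std() + 1e-12)
    # Peak of the normalized cross-correlation over all lags.
    corr = np.correlate(a, b, mode="full") / len(a)
    return float(np.max(np.abs(corr))) >= threshold
```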
Abstract:
The disclosure relates to systems, methods and apparatus to convert speech to text and vice versa. One apparatus comprises a vocoder, a speech to text conversion engine, a text to speech conversion engine, and a user interface. The vocoder is operable to convert speech signals into packets and convert packets into speech signals. The speech to text conversion engine is operable to convert speech to text. The text to speech conversion engine is operable to convert text to speech. The user interface is operable to receive a user selection of a mode from among a plurality of modes, wherein a first mode enables the speech to text conversion engine, a second mode enables the text to speech conversion engine, and a third mode enables the speech to text conversion engine and the text to speech conversion engine.
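The three-mode selection can be sketched as a simple dispatch. The enum and function names are illustrative assumptions; only the mode semantics come from the abstract.

```python
from enum import Enum

class Mode(Enum):
    SPEECH_TO_TEXT = 1   # first mode: enables the speech-to-text engine
    TEXT_TO_SPEECH = 2   # second mode: enables the text-to-speech engine
    BOTH = 3             # third mode: enables both conversion engines

def enabled_engines(mode):
    """Hypothetical sketch: which conversion engines the user interface
    enables for the selected mode."""
    stt = mode in (Mode.SPEECH_TO_TEXT, Mode.BOTH)
    tts = mode in (Mode.TEXT_TO_SPEECH, Mode.BOTH)
    return {"speech_to_text": stt, "text_to_speech": tts}
```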
Abstract:
Provided are systems, methods and techniques for processing frame-based data. A frame of data, an indication that a transient occurs within the frame, and a location of the transient within the frame are obtained. Based on the indication of the transient, a block size is set for the frame, thereby effectively defining a plurality of equal-sized blocks within the frame. In addition, different window functions are selected for different ones of the plurality of equal-sized blocks based on the location of the transient, and the frame of data is processed by applying the selected window functions.
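The selection step can be sketched as: split the frame into equal-sized blocks and assign a different window function to the block containing the transient. The two window labels and the default block size are illustrative assumptions.

```python
def select_block_windows(frame_len, transient_pos, block_size=256):
    """Hypothetical sketch: return one window label per equal-sized block,
    choosing a 'transient' window for the block where the transient falls
    and a 'steady' window for the other blocks."""
    assert frame_len % block_size == 0, "frame must divide into equal blocks"
    transient_block = transient_pos // block_size
    return ["transient" if i == transient_block else "steady"
            for i in range(frame_len // block_size)]
```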