Patent search ipc:"G10L19/02" Page 1

1.

Invention application
AUDIO DECODER, AUDIO ENCODER, METHOD FOR DECODING, METHOD FOR ENCODING AND BITSTREAM, USING SCENE CONFIGURATION PACKET A CELL INFORMATION DEFINES AN ASSOCIATION BETWEEN THE ONE OR MORE CELLS AND RESPECTIVE ONE OR MORE DATA STRUCTURES

Status: Under examination, published

Publication (announcement) No.: WO2023083921A1

Publication (announcement) date: 2023-05-19

Application No.: PCT/EP2022/081373

Application date: 2022-11-09

Applicant: FRAUNHOFER-GESELLSCHAFT ZUR FÖRDERUNG DER ANGEWANDTEN FORSCHUNG E.V.

Inventor: DISCH, Sascha , SCHWÄR, Simon , HASSAN, Kahleel Porter

IPC: G10L19/02 , G06F3/16 , G10L19/16 , G10L19/008 , H04S7/00

Abstract: Embodiments according to the invention are related to an audio decoder, for providing a decoded audio representation on the basis of an encoded audio representation, wherein the audio decoder is configured to spatially render one or more audio signals; wherein the audio decoder is configured to receive a plurality of packets of different packet types, the packets comprising one or more scene configuration packets providing a renderer configuration information defining a usage of scene objects and/or a usage of scene characteristics, the packets comprising one or more scene update packets defining an update of scene metadata for the rendering, the packets comprising one or more scene payload packets comprising definitions of one or more of the scene objects and/or definitions of one or more of the scene characteristics; wherein the audio decoder is configured to select definitions of one or more scene objects and/or definitions of one or more scene characteristics, which are included in the scene payload packets, for the rendering in dependence on the renderer configuration information; and wherein the audio decoder is configured to update one or more scene metadata in dependence on a content of the one or more scene update packets. Further embodiments are related to encoders, methods and bitstreams. Further embodiments are related to decoders, encoders, methods and bitstreams with scene update packets with update conditions, with scene configuration packets providing a renderer configuration information defining a temporal evolution of a rendering scenario and with a timestamp information and/or with subscene cell information, wherein the cell information defines an association between the one or more cells and respective one or more data structures.
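The three packet types described in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class names, the fields, and the rule that a configuration packet arrives before the payload it filters are hypothetical and are not taken from the application or from any bitstream syntax.

```python
from dataclasses import dataclass, field


@dataclass
class SceneConfigPacket:
    used_object_ids: set          # hypothetical: ids of scene objects the renderer should use


@dataclass
class ScenePayloadPacket:
    definitions: dict             # hypothetical: object id -> object definition


@dataclass
class SceneUpdatePacket:
    object_id: str
    metadata: dict                # hypothetical: metadata fields to overwrite


@dataclass
class SceneState:
    used_object_ids: set = field(default_factory=set)
    definitions: dict = field(default_factory=dict)
    metadata: dict = field(default_factory=dict)


def handle_packet(state, packet):
    """Dispatch one packet by type and update the decoder-side scene state."""
    if isinstance(packet, SceneConfigPacket):
        # The configuration decides which objects are used for rendering.
        state.used_object_ids = set(packet.used_object_ids)
    elif isinstance(packet, ScenePayloadPacket):
        # Keep only the definitions that the configuration selected.
        for object_id, definition in packet.definitions.items():
            if object_id in state.used_object_ids:
                state.definitions[object_id] = definition
    elif isinstance(packet, SceneUpdatePacket):
        # Merge the update into the metadata stored for that object.
        state.metadata.setdefault(packet.object_id, {}).update(packet.metadata)
    else:
        raise TypeError(f"unknown packet type: {type(packet).__name__}")


state = SceneState()
for packet in [
    SceneConfigPacket({"piano"}),
    ScenePayloadPacket({"piano": "point source", "crowd": "ambience"}),
    SceneUpdatePacket("piano", {"gain_db": -3.0}),
]:
    handle_packet(state, packet)
```

In this sketch the "crowd" definition is dropped because the configuration packet did not list it, while the update packet only touches metadata and leaves the stored definitions alone.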

2.

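Invention application
AUDIO SIGNAL RECONSTRUCTION

Status: Under examination, published

Publication (announcement) No.: WO2023069805A1

Publication (announcement) date: 2023-04-27

Application No.: PCT/US2022/076172

Application date: 2022-09-09

Applicant: QUALCOMM INCORPORATED

Inventor: SKORDILIS, Zisis Iason , DEWASURENDRA, Duminda , RAJENDRAN, Vivek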

IPC: G10L25/30 , G10L25/18 , G10L19/02 , G10L21/02

Abstract: A method includes receiving audio data that includes magnitude spectrum data descriptive of an audio signal. The method also includes providing the audio data as input to a neural network to generate an initial phase estimate for one or more samples of the audio signal. The method further includes determining, using a phase estimation algorithm, target phase data for the one or more samples of the audio signal based on the initial phase estimate and a magnitude spectrum of the one or more samples of the audio signal indicated by the magnitude spectrum data. The method also includes reconstructing the audio signal based on a target phase of the one or more samples of the audio signal indicated by the target phase data and based on the magnitude spectrum.

3.

Invention application
MULTI-ENCODER END-TO-END AUTOMATIC SPEECH RECOGNITION (ASR) FOR JOINT MODELING OF MULTIPLE INPUT DEVICES

Status: Under examination, published

Publication (announcement) No.: WO2022271746A1

Publication (announcement) date: 2022-12-29

Application No.: PCT/US2022/034407

Application date: 2022-06-21

Applicant: NUANCE COMMUNICATIONS, INC.

Inventor: WENINGER, Felix , GAUDESI, Marco , LEIBOLD, Ralf , ZHAN, Puming

IPC: G10L15/34 , G10L15/26 , G10L15/20 , G10L15/22 , G10L15/04 , G10L19/02 , G10L21/0208 , G10L25/24

Abstract: An end-to-end automatic speech recognition (ASR) system includes: a first encoder configured for close-talk input captured by a close-talk input mechanism; a second encoder configured for far-talk input captured by a far-talk input mechanism; and an encoder selection layer configured to select at least one of the first and second encoders for use in producing ASR output. The selection is made based on at least one of a short-time Fourier transform (STFT), Mel-frequency cepstral coefficients (MFCC) and a filter bank derived from at least one of the close-talk input and the far-talk input. If signals from both the close-talk input mechanism and the far-talk input mechanism are present for a speech segment, the encoder selection layer dynamically selects between the close-talk encoder and the far-talk encoder to select the encoder that better recognizes the speech segment. An encoder-decoder model is used to produce ASR output.

4.

Invention application
APPARATUS AND METHOD FOR REMOVING UNDESIRED AUDITORY ROUGHNESS

Status: Under examination, published

Publication (announcement) No.: WO2022268347A1

Publication (announcement) date: 2022-12-29

Application No.: PCT/EP2021/075816

Application date: 2021-09-20

Applicant: FRAUNHOFER-GESELLSCHAFT ZUR FÖRDERUNG DER ANGEWANDTEN FORSCHUNG E.V. , FRIEDRICH-ALEXANDER-UNIVERSITAET ERLANGEN-NUERNBERG

Inventor: DISCH, Sascha , VAN DE PAR, Steven , NIEDERMEIER, Andreas , EDLER, Bernd

IPC: G10L19/26 , G10L19/005 , G10L19/02 , G10L21/003 , G10L21/0316

Abstract: An apparatus (100) for processing an audio input signal to obtain an audio output signal according to an embodiment is provided. The apparatus (100) comprises a signal analyser (110) configured for determining information on an auditory roughness of one or more spectral bands of the audio input signal. Moreover, the apparatus (100) comprises a signal processor (120) configured for processing the audio input signal depending on the information on the auditory roughness of the one or more spectral bands.

5.

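Invention application
ENCODING AND DECODING METHOD, APPARATUS, DEVICE, STORAGE MEDIUM, AND COMPUTER PROGRAM

Status: Under examination, published

Publication (announcement) No.: WO2022258036A1

Publication (announcement) date: 2022-12-15

Application No.: PCT/CN2022/098016

Application date: 2022-06-10

Applicant: 华为技术有限公司

Inventor: 李佳蔚 , 夏丙寅 , 王喆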

IPC: G10L19/02 , G06N3/04 , G06N3/08

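Abstract: Embodiments of the present application disclose an encoding and decoding method, apparatus, device, storage medium, and computer program, belonging to the field of encoding and decoding technology. In the embodiments, a first whitened spectrum of media data is shaped to obtain a second whitened spectrum, and encoding is then performed based on the second whitened spectrum. The spectral amplitude of the second whitened spectrum within a target frequency band is greater than or equal to the spectral amplitude of the first whitened spectrum within that band. By raising the spectral amplitude of the first whitened spectrum within the target frequency band, the scheme reduces the difference in statistical average energy between spectral lines at different frequencies in the resulting second whitened spectrum. As a result, more spectral lines of the second whitened spectrum can be retained when it is processed by an encoding neural network model; that is, the scheme can encode more spectral lines, thereby preserving more spectral features and improving coding quality.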
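The shaping step can be sketched as follows. The abstract only states that amplitudes in the target band must not decrease, so the constant gain, the bin-index band limits, and the function name below are assumptions for illustration and are not the application's actual shaping rule.

```python
def shape_whitened_spectrum(first_spectrum, band_start, band_end, gain):
    """Scale the bins in [band_start, band_end) by a gain of at least 1.

    Hypothetical shaping rule: a single constant gain applied to the
    target band, with all other bins passed through unchanged.
    """
    if gain < 1.0:
        raise ValueError("gain must be >= 1 so amplitudes in the band never decrease")
    if not 0 <= band_start <= band_end <= len(first_spectrum):
        raise ValueError("band must lie inside the spectrum")
    return [
        value * gain if band_start <= k < band_end else value
        for k, value in enumerate(first_spectrum)
    ]


# Made-up whitened spectrum whose upper bins are much weaker than the lower ones.
first = [1.0, -0.8, 0.5, -0.1, 0.05, -0.02]
second = shape_whitened_spectrum(first, band_start=3, band_end=6, gain=4.0)
```

After shaping, the weak upper bins sit closer in magnitude to the lower bins, which is the effect the abstract attributes to the second whitened spectrum.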

6.

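Invention application
AUDIO DIRECTIVITY CODING

Status: Under examination, published

Publication (announcement) No.: WO2022248632A1

Publication (announcement) date: 2022-12-01

Application No.: PCT/EP2022/064343

Application date: 2022-05-25

Applicant: FRAUNHOFER-GESELLSCHAFT ZUR FÖRDERUNG DER ANGEWANDTEN FORSCHUNG E.V.

Inventor: HERRE, Jürgen , GHIDO, Florin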

IPC: G10L19/008 , G10L19/04 , G10L19/02

Abstract: The application discloses techniques for compressively encoding and decoding an audio signal representing a directivity pattern, the audio values having different values according to different discrete positions defined on a unit sphere. The audio signal values are encoded in a bitstream as prediction residual values. The prediction residual values are used in sequences to obtain predicted audio signal values by moving on positions defined on parallel lines, parallel to an equator of the sphere, the parallel lines defined from a first pole toward a second pole of the sphere. The predicted values are obtained based on an initial prediction sequence, on adjacent discrete positions preceding a given position or interpolated versions of the audio values of a previously predicted adjacent parallel line.
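Coding values as prediction residuals along one parallel line can be sketched as below. This is a deliberately simplified illustration that assumes integer values and a predictor that only uses the immediately preceding position on the same parallel; the interpolation from a previously decoded adjacent parallel mentioned in the abstract is not modelled, and the function names are hypothetical.

```python
def encode_parallel(values, initial_prediction=0):
    """Turn the values along one parallel line into prediction residuals."""
    residuals = []
    prediction = initial_prediction
    for value in values:
        residuals.append(value - prediction)
        prediction = value  # the next position is predicted from this one
    return residuals


def decode_parallel(residuals, initial_prediction=0):
    """Rebuild the values by adding each residual to the running prediction."""
    values = []
    prediction = initial_prediction
    for residual in residuals:
        value = prediction + residual
        values.append(value)
        prediction = value
    return values


# Made-up integer directivity values sampled along one parallel line.
parallel = [10, 12, 13, 13, 11]
residuals = encode_parallel(parallel, initial_prediction=10)
decoded = decode_parallel(residuals, initial_prediction=10)
```

Because neighbouring positions on a smooth directivity pattern tend to have similar values, the residuals stay small, which is what makes them cheaper to store than the raw values.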

7.

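Invention application
AUDIO PROCESSING AND VIDEO PROCESSING METHODS, APPARATUS, DEVICE, AND STORAGE MEDIUM

Status: Under examination, published

Publication (announcement) No.: WO2022227037A1

Publication (announcement) date: 2022-11-03

Application No.: PCT/CN2021/091615

Application date: 2021-04-30

Applicant: 深圳市大疆创新科技有限公司

Inventor: 席迎来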

IPC: G10L19/02 , G10L21/06

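Abstract: The present application provides audio processing and video processing methods, an apparatus, a device, and a storage medium. The audio processing method includes: acquiring audio to be processed; identifying beat points of the audio; and, according to the energy amplitudes of the beat points, selecting at least one rhythm point from the plurality of beat points of the audio and outputting it, where the energy amplitude of the rhythm point is greater than the energy amplitudes of the other beat points among the plurality of beat points. The scheme of the embodiments can automatically mark rhythm points in audio. Moreover, the method places no restriction on the audio to be processed, so a user can specify any file that contains audio and use the marked rhythm points for subsequent processing, which makes the scheme highly flexible.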
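The selection of rhythm points from beat points can be sketched as a top-N filter on energy. The `BeatPoint` type, the `count` parameter, and the sample data are assumptions for illustration; the abstract does not say how many rhythm points are chosen or how the energy is measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BeatPoint:
    time_s: float   # position of the beat in the audio, in seconds
    energy: float   # energy amplitude measured at the beat


def select_rhythm_points(beats, count):
    """Return the `count` highest-energy beats, in chronological order."""
    if count < 1:
        raise ValueError("count must be at least 1")
    strongest = sorted(beats, key=lambda b: b.energy, reverse=True)[:count]
    return sorted(strongest, key=lambda b: b.time_s)


# Made-up beat points; detection of the beats themselves is out of scope here.
beats = [
    BeatPoint(0.5, 0.2),
    BeatPoint(1.0, 0.9),
    BeatPoint(1.5, 0.4),
    BeatPoint(2.0, 0.7),
]
rhythm_points = select_rhythm_points(beats, count=2)
```

When several beats share the same energy, this sketch keeps whichever ones the stable sort encounters first, so a selected rhythm point may tie with an unselected beat rather than strictly exceed it.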

8.

Invention application
COMBINING SPATIAL AUDIO STREAMS

Status: Under examination, published

Publication (announcement) No.: WO2022200666A1

Publication (announcement) date: 2022-09-29

Application No.: PCT/FI2021/050199

Application date: 2021-03-22

Applicant: NOKIA TECHNOLOGIES OY

Inventor: LAITINEN, Mikko-Ville , VASILACHE, Adriana , PIHLAJAKUJA, Tapani , LAAKSONEN, Lasse, Juhani , RÄMÖ, Anssi Sakari

IPC: G10L19/032 , G10L19/008 , G10L19/02 , H04S7/00 , G10L19/022

Abstract: There is inter alia disclosed an apparatus for spatial audio encoding configured to determine an audio scene separation metric between an input audio signal and a further input audio signal, and to use the audio scene separation metric for quantizing at least one spatial audio parameter of the input audio signal.

9.

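Invention application
SPEECH SYNTHESIS METHOD, APPARATUS, DEVICE, AND STORAGE MEDIUM

Status: Under examination, published

Publication (announcement) No.: WO2022141678A1

Publication (announcement) date: 2022-07-07

Application No.: PCT/CN2021/072428

Application date: 2021-01-18

Applicant: 科大讯飞股份有限公司

Inventor: 陈梦楠 , 江源 , 高丽 , 祖漪清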

IPC: G10L13/02 , G10L17/02 , G10L17/04 , G10L19/00 , G10L19/02

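Abstract: A speech synthesis method, apparatus, device, and storage medium. The speech synthesis method includes: acquiring an original text, a phoneme sequence corresponding to the original text, and speaker features of the speech to be synthesized (S100); performing feature fusion on the original text and the phoneme sequence to obtain a fused feature (S110); performing encoding and decoding based on the fused feature and the speaker features to obtain an acoustic spectrum (S120); and performing speech synthesis based on the acoustic spectrum to obtain synthesized speech (S130). Fusing the original text with the phoneme sequence enriches the input information and makes it possible to exploit pronunciation information specific to different languages, so the synthesized speech is more natural and matches the pronunciation characteristics of the corresponding language; that is, the quality of the synthesized speech is higher.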

10.

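Invention application
VIBRATION FREQUENCY DESIGN METHOD BASED ON MUSIC FREQUENCY

Status: Under examination, published

Publication (announcement) No.: WO2022134213A1

Publication (announcement) date: 2022-06-30

Application No.: PCT/CN2021/070421

Application date: 2021-01-06

Applicant: 瑞声声学科技（深圳）有限公司 , 瑞声光电科技（常州）有限公司

Inventor: 张燕昕 , 郑亚军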

IPC: G10L21/06 , G10L19/02 , G10L19/032

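Abstract: The present invention provides a vibration frequency design method based on music frequency, including the following steps. S1: presetting a group of quantization modules, comprising a personalized-input quantization module, a music-feature quantization module, and a vibration-effect quantization module. S2: receiving personalized parameters input by the user and obtaining specific quantized values of the personalized input through the personalized-input quantization module. S3: extracting music features from a music signal and obtaining specific quantized values of the music features through the music-feature quantization module. S4: performing a quantization calculation according to a formula to obtain a relative vibration-effect frequency value. S5: mapping the relative vibration-effect frequency value through the vibration-effect quantization module to obtain an absolute vibration-effect frequency value. S6: playing the vibration on a motor based on the absolute vibration-effect frequency value. The vibration frequency design method of the invention achieves a perfect conversion from auditory music frequency to tactile vibration frequency and provides designers or users with an efficient and rich haptic experience.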
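The abstract does not give the formula used in S4, so the sketch below covers only the mapping in S5, and even that under stated assumptions: the relative value is taken to lie in [0, 1], the mapping is taken to be linear, and the 80 Hz to 300 Hz motor range is a hypothetical placeholder rather than a figure from the application.

```python
def map_relative_to_absolute(relative, min_hz=80.0, max_hz=300.0):
    """Map a relative vibration frequency in [0, 1] onto a motor range in Hz.

    Hypothetical linear mapping; values outside [0, 1] are clamped so the
    result always stays inside the assumed motor range.
    """
    if min_hz > max_hz:
        raise ValueError("min_hz must not exceed max_hz")
    clamped = min(1.0, max(0.0, relative))
    return min_hz + clamped * (max_hz - min_hz)


# Example relative values as they might come out of step S4 (made up).
absolute_values = [map_relative_to_absolute(r) for r in (0.0, 0.5, 1.0, 2.0)]
```

Clamping is a design choice of this sketch: it keeps an out-of-range relative value from driving the motor outside the assumed frequency limits.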
