-
Publication No.: WO2023083921A1
Publication date: 2023-05-19
Application No.: PCT/EP2022/081373
Application date: 2022-11-09
Inventor: DISCH, Sascha , SCHWÄR, Simon , HASSAN, Kahleel Porter
IPC: G10L19/02 , G06F3/16 , G10L19/16 , G10L19/008 , H04S7/00
Abstract: Embodiments according to the invention are related to an audio decoder, for providing a decoded audio representation on the basis of an encoded audio representation, wherein the audio decoder is configured to spatially render one or more audio signals; wherein the audio decoder is configured to receive a plurality of packets of different packet types, the packets comprising one or more scene configuration packets providing a renderer configuration information defining a usage of scene objects and/or a usage of scene characteristics, the packets comprising one or more scene update packets defining an update of scene metadata for the rendering, the packets comprising one or more scene payload packets comprising definitions of one or more of the scene objects and/or definitions of one or more of the scene characteristics; wherein the audio decoder is configured to select definitions of one or more scene objects and/or definitions of one or more scene characteristics, which are included in the scene payload packets, for the rendering in dependence on the renderer configuration information; and wherein the audio decoder is configured to update one or more scene metadata in dependence on a content of the one or more scene update packets. Further embodiments are related to encoders, methods and bitstreams. Further embodiments are related to decoders, encoders, methods and bitstreams with scene update packets with update conditions, with scene configuration packets providing a renderer configuration information defining a temporal evolution of a rendering scenario and with a timestamp information and/or with subscene cell information, wherein the cell information defines an association between the one or more cells and respective one or more data structures.
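Illustrative sketch (not part of the publication): the packet handling described in this abstract could be modelled roughly as below. All class names, field names and the dictionary layout are hypothetical, and the sketch assumes that a scene configuration packet arrives before the payload packets it applies to. It only shows the selection and update logic, not any real bitstream syntax or rendering.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Union


@dataclass
class SceneConfigPacket:
    """Hypothetical config packet: names the scene objects the renderer uses."""
    used_object_ids: Set[str]


@dataclass
class ScenePayloadPacket:
    """Hypothetical payload packet: object id -> definition (metadata dict)."""
    definitions: Dict[str, Dict[str, float]]


@dataclass
class SceneUpdatePacket:
    """Hypothetical update packet: new metadata values for one scene object."""
    object_id: str
    metadata: Dict[str, float]


Packet = Union[SceneConfigPacket, ScenePayloadPacket, SceneUpdatePacket]


@dataclass
class SceneState:
    used_object_ids: Set[str] = field(default_factory=set)
    objects: Dict[str, Dict[str, float]] = field(default_factory=dict)


def process_packets(packets: List[Packet]) -> SceneState:
    """Dispatch packets by type and build the scene state used for rendering."""
    state = SceneState()
    for packet in packets:
        if isinstance(packet, SceneConfigPacket):
            # Renderer configuration: which scene objects are in use.
            state.used_object_ids = set(packet.used_object_ids)
        elif isinstance(packet, ScenePayloadPacket):
            # Keep only the definitions selected by the configuration.
            for object_id, definition in packet.definitions.items():
                if object_id in state.used_object_ids:
                    state.objects[object_id] = dict(definition)
        elif isinstance(packet, SceneUpdatePacket):
            # Update metadata of an object that is part of the scene.
            if packet.object_id in state.objects:
                state.objects[packet.object_id].update(packet.metadata)
        else:
            raise TypeError(f"unknown packet type: {type(packet).__name__}")
    return state


stream: List[Packet] = [
    SceneConfigPacket(used_object_ids={"source_1"}),
    ScenePayloadPacket(definitions={
        "source_1": {"gain": 1.0, "azimuth": 0.0},
        "source_2": {"gain": 1.0, "azimuth": 90.0},
    }),
    SceneUpdatePacket(object_id="source_1", metadata={"gain": 0.5}),
]
scene = process_packets(stream)
```

In this toy stream, `source_2` is defined in the payload but never enters the scene because the configuration does not list it, while the update packet changes only the gain of `source_1`.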
-
Publication No.: WO2023069805A1
Publication date: 2023-04-27
Application No.: PCT/US2022/076172
Application date: 2022-09-09
Applicant: QUALCOMM INCORPORATED
Inventor: SKORDILIS, Zisis Iason , DEWASURENDRA, Duminda , RAJENDRAN, Vivek
Abstract: A method includes receiving audio data that includes magnitude spectrum data descriptive of an audio signal. The method also includes providing the audio data as input to a neural network to generate an initial phase estimate for one or more samples of the audio signal. The method further includes determining, using a phase estimation algorithm, target phase data for the one or more samples of the audio signal based on the initial phase estimate and a magnitude spectrum of the one or more samples of the audio signal indicated by the magnitude spectrum data. The method also includes reconstructing the audio signal based on a target phase of the one or more samples of the audio signal indicated by the target phase data and based on the magnitude spectrum.
-
Publication No.: WO2022271746A1
Publication date: 2022-12-29
Application No.: PCT/US2022/034407
Application date: 2022-06-21
Applicant: NUANCE COMMUNICATIONS, INC.
Inventor: WENINGER, Felix , GAUDESI, Marco , LEIBOLD, Ralf , ZHAN, Puming
IPC: G10L15/34 , G10L15/26 , G10L15/20 , G10L15/22 , G10L15/04 , G10L19/02 , G10L21/0208 , G10L25/24
Abstract: An end-to-end automatic speech recognition (ASR) system includes: a first encoder configured for close-talk input captured by a close-talk input mechanism; a second encoder configured for far-talk input captured by a far-talk input mechanism; and an encoder selection layer configured to select at least one of the first and second encoders for use in producing ASR output. The selection is made based on at least one of short-time Fourier transform (STFT), Mel-frequency Cepstral Coefficient (MFCC) and filter bank derived from at least one of the close-talk input and far-talk input. If signals from both the close-talk input mechanism and the far-talk input mechanism are present for a speech segment, the encoder selection layer dynamically selects between the close-talk encoder and the far-talk encoder to select the encoder that better recognizes the speech segment. An encoder-decoder model is used to produce ASR output.
-
Publication No.: WO2022268347A1
Publication date: 2022-12-29
Application No.: PCT/EP2021/075816
Application date: 2021-09-20
Applicant: FRAUNHOFER-GESELLSCHAFT ZUR FÖRDERUNG DER ANGEWANDTEN FORSCHUNG E.V. , FRIEDRICH-ALEXANDER-UNIVERSITAET ERLANGEN-NUERNBERG
Inventor: DISCH, Sascha , VAN DE PAR, Steven , NIEDERMEIER, Andreas , EDLER, Bernd
IPC: G10L19/26 , G10L19/005 , G10L19/02 , G10L21/003 , G10L21/0316
Abstract: An apparatus (100) for processing an audio input signal to obtain an audio output signal is provided according to an embodiment. The apparatus (100) comprises a signal analyser (110) configured for determining information on an auditory roughness of one or more spectral bands of the audio input signal. Moreover, the apparatus (100) comprises a signal processor (120) configured for processing the audio input signal depending on the information on the auditory roughness of the one or more spectral bands.
-
Publication No.: WO2022258036A1
Publication date: 2022-12-15
Application No.: PCT/CN2022/098016
Application date: 2022-06-10
Applicant: 华为技术有限公司
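Abstract: Embodiments of this application disclose an encoding and decoding method, apparatus, device, storage medium and computer program, belonging to the field of encoding and decoding technology. In the embodiments, a first whitened spectrum of media data is shaped to obtain a second whitened spectrum, and encoding is then performed on the basis of the second whitened spectrum. The spectral amplitude values of the second whitened spectrum within a target frequency band are greater than or equal to the spectral amplitude values of the first whitened spectrum within the target frequency band. By raising the spectral amplitude values of the first whitened spectrum within the target frequency band, the scheme reduces the differences in statistical average energy between spectral lines at different frequencies in the resulting second whitened spectrum. As a result, when an encoding neural network model processes the second whitened spectrum, more of its spectral lines can be retained; that is, the scheme can encode more spectral lines, thereby preserving more spectral features and improving encoding quality.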
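Illustrative sketch (not part of the publication): one simple way to obtain a second whitened spectrum whose amplitudes in a target band are greater than or equal to those of the first is to scale the coefficients in that band by a gain of at least 1. The function name, the band given as a half-open index range and the fixed `gain` parameter are assumptions made for illustration; the abstract does not state how the shaping is actually computed.

```python
from typing import List, Sequence


def shape_whitened_spectrum(
    first_spectrum: Sequence[float],
    band_start: int,
    band_end: int,
    gain: float = 2.0,
) -> List[float]:
    """Scale coefficients in [band_start, band_end) by `gain` (assumed >= 1).

    Coefficients outside the target band are copied unchanged, so every
    amplitude of the result is >= the corresponding input amplitude.
    """
    if gain < 1.0:
        raise ValueError("gain must be >= 1 so amplitudes never decrease")
    if not 0 <= band_start <= band_end <= len(first_spectrum):
        raise ValueError("target band must lie inside the spectrum")

    second_spectrum = list(first_spectrum)
    for k in range(band_start, band_end):
        second_spectrum[k] = first_spectrum[k] * gain
    return second_spectrum


# Toy whitened spectrum whose upper half is much weaker than its lower half.
first = [1.0, -0.5, 0.1, -0.05]
second = shape_whitened_spectrum(first, band_start=2, band_end=4, gain=4.0)
```

After shaping, the weak upper-band lines are closer in magnitude to the lower-band lines, which is the effect the abstract attributes to the shaping step.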
-
Publication No.: WO2022248632A1
Publication date: 2022-12-01
Application No.: PCT/EP2022/064343
Application date: 2022-05-25
Inventor: HERRE, Jürgen , GHIDO, Florin
IPC: G10L19/008 , G10L19/04 , G10L19/02
Abstract: The application discloses techniques for compressively encoding and decoding an audio signal representing a directivity pattern, the audio signal having different values at different discrete positions defined on a unit sphere. The audio signal values are encoded in a bitstream as prediction residual values. The prediction residual values are used in sequences to obtain predicted audio signal values by moving along positions defined on parallel lines, parallel to an equator of the sphere, the parallel lines being defined from a first pole toward a second pole of the sphere. The predicted values are obtained based on an initial prediction sequence, on adjacent discrete positions preceding a given position, or on interpolated versions of the audio values of a previously predicted adjacent parallel line.
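Illustrative sketch (not part of the publication): the two prediction sources named in this abstract can be illustrated with a toy decoder. The first function predicts each position on a parallel line from the preceding decoded position; the second predicts a line from a circularly interpolated copy of the previously decoded adjacent line. The function names, the linear interpolation and the use of floating-point residuals are assumptions; the abstract does not specify the actual predictor or residual coding.

```python
import math
from typing import List, Sequence


def decode_along_parallel(
    residuals: Sequence[float], start_prediction: float
) -> List[float]:
    """Decode one parallel line: value = prediction + residual, where the
    prediction for each position is the previously decoded value."""
    values: List[float] = []
    prediction = start_prediction
    for residual in residuals:
        value = prediction + residual
        values.append(value)
        prediction = value  # the next position is predicted from this one
    return values


def decode_from_previous_parallel(
    residuals: Sequence[float], previous_line: Sequence[float]
) -> List[float]:
    """Decode a line using the previous line, linearly interpolated around
    the circle to the new number of positions, as the prediction."""
    n_new = len(residuals)
    n_prev = len(previous_line)
    if n_prev == 0:
        raise ValueError("previous_line must not be empty")

    values: List[float] = []
    for j, residual in enumerate(residuals):
        position = j * n_prev / n_new          # fractional index on old line
        lower = int(math.floor(position)) % n_prev
        upper = (lower + 1) % n_prev           # wrap around the parallel
        weight = position - math.floor(position)
        prediction = (
            (1.0 - weight) * previous_line[lower]
            + weight * previous_line[upper]
        )
        values.append(prediction + residual)
    return values


first_line = decode_along_parallel([1.0, -2.0, 3.0], start_prediction=10.0)
second_line = decode_from_previous_parallel([0.0, 0.0, 0.0, 0.0], [0.0, 4.0])
```

With all-zero residuals the second call returns the interpolated prediction itself, which makes it easy to inspect what the decoder would predict before any correction is applied.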
-
Publication No.: WO2022227037A1
Publication date: 2022-11-03
Application No.: PCT/CN2021/091615
Application date: 2021-04-30
Applicant: 深圳市大疆创新科技有限公司
Inventor: 席迎来
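Abstract: This application provides audio processing and video processing methods, apparatuses, devices and storage media. The audio processing method includes: obtaining audio to be processed; identifying beat points of the audio; and, according to the energy amplitudes of the beat points, selecting at least one rhythm point from the plurality of beat points of the audio and outputting it, wherein the energy amplitude of the rhythm point is greater than the energy amplitudes of the other beat points among the plurality of beat points. The scheme of the embodiments can automatically mark rhythm points in audio. Moreover, the method places no restriction on the audio to be processed, so a user can specify any file containing audio and use the marked rhythm points for subsequent processing, which makes the method highly flexible.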
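Illustrative sketch (not part of the publication): the selection step in this abstract can be pictured as keeping the beat points with the highest energy amplitudes. The `Beat` type, the `count` parameter and the top-N rule are assumptions for illustration; beat detection itself is taken as given, and the abstract does not say how many rhythm points are kept or how ties are handled.

```python
from typing import List, NamedTuple


class Beat(NamedTuple):
    time_s: float   # beat position in seconds (hypothetical field)
    energy: float   # energy amplitude measured at the beat (hypothetical field)


def select_rhythm_points(beats: List[Beat], count: int) -> List[Beat]:
    """Return the `count` beats with the highest energy, in time order."""
    if count < 1:
        raise ValueError("count must be at least 1")
    strongest = sorted(beats, key=lambda b: b.energy, reverse=True)[:count]
    return sorted(strongest, key=lambda b: b.time_s)


beats = [Beat(0.5, 0.2), Beat(1.0, 0.9), Beat(1.5, 0.4), Beat(2.0, 0.7)]
rhythm_points = select_rhythm_points(beats, count=2)
```

When no two beats share the same energy, every selected rhythm point has a higher energy amplitude than every beat that was not selected, which matches the condition stated in the abstract.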
-
Publication No.: WO2022200666A1
Publication date: 2022-09-29
Application No.: PCT/FI2021/050199
Application date: 2021-03-22
Applicant: NOKIA TECHNOLOGIES OY
Inventor: LAITINEN, Mikko-Ville , VASILACHE, Adriana , PIHLAJAKUJA, Tapani , LAAKSONEN, Lasse, Juhani , RÄMÖ, Anssi Sakari
IPC: G10L19/032 , G10L19/008 , G10L19/02 , H04S7/00 , G10L19/022
Abstract: There is inter alia disclosed an apparatus for spatial audio encoding configured to determine an audio scene separation metric between an input audio signal and a further input audio signal, and to use the audio scene separation metric for quantizing at least one spatial audio parameter of the input audio signal.
-
Publication No.: WO2022141678A1
Publication date: 2022-07-07
Application No.: PCT/CN2021/072428
Application date: 2021-01-18
Applicant: 科大讯飞股份有限公司
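Abstract: A speech synthesis method, apparatus, device and storage medium. The speech synthesis method includes: obtaining an original text, a phoneme sequence corresponding to the original text, and speaker features of the speech to be synthesized (S100); performing feature fusion on the original text and the phoneme sequence to obtain fused features (S110); performing encoding and decoding on the basis of the fused features and the speaker features to obtain an acoustic spectrum (S120); and performing speech synthesis on the basis of the acoustic spectrum to obtain synthesized speech (S130). Fusing the original text with the phoneme sequence enriches the input information and makes it possible to exploit pronunciation information specific to different languages, so the synthesized speech is more natural and matches the pronunciation characteristics of the corresponding language; that is, the quality of the synthesized speech is higher.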
-
Publication No.: WO2022134213A1
Publication date: 2022-06-30
Application No.: PCT/CN2021/070421
Application date: 2021-01-06
Applicant: 瑞声声学科技(深圳)有限公司 , 瑞声光电科技(常州)有限公司
IPC: G10L21/06 , G10L19/02 , G10L19/032
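Abstract: The invention provides a method for designing vibration frequency based on music frequency, comprising the following steps: S1: presetting a group of quantization modules, including a personalized input quantization module, a music feature quantization module and a vibration effect quantization module; S2: receiving the user's personalized parameter input and obtaining specific quantized values of the personalized input through the personalized input quantization module; S3: extracting music features of a music signal and obtaining specific quantized values of the music features through the music feature quantization module; S4: performing a quantization calculation according to a formula to obtain a relative vibration-effect frequency value; S5: mapping the relative vibration-effect frequency value through the vibration effect quantization module to obtain an absolute vibration-effect frequency value; S6: playing the vibration with a motor on the basis of the absolute vibration-effect frequency value. The vibration frequency design method of the invention achieves a perfect conversion from auditory music frequency to tactile vibration frequency and provides designers or users with an efficient and rich haptic experience.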