-
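Publication No.: WO2023083252A1
Publication date: 2023-05-19
Application No.: PCT/CN2022/131094
Filing date: 2022-11-10
Applicant: 北京字跳网络技术有限公司
Abstract: The present disclosure relates to a timbre selection method and apparatus, an electronic device, a readable storage medium, and a program product. The method analyzes the spectral features of a speech to be matched to obtain its timbre feature, and then determines a target sample audio from at least one sample audio according to the similarity between the timbre feature of the speech to be matched and the timbre feature of each sample audio, wherein the timbre of the target sample audio matches the timbre of the speech to be matched.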
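Note: the similarity-based selection described in this abstract can be sketched as follows. This is only an illustrative sketch: the abstract does not say which similarity measure or feature representation is used, so cosine similarity, fixed-length feature vectors, and the names `cosine_similarity` and `select_target_sample` are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def select_target_sample(query_feature, sample_features):
    """Return the name of the sample whose timbre feature best matches the query.

    sample_features maps a sample name to its (assumed precomputed) timbre
    feature vector.
    """
    if not sample_features:
        raise ValueError("at least one sample audio is required")
    return max(
        sample_features,
        key=lambda name: cosine_similarity(query_feature, sample_features[name]),
    )
```

For example, with the hypothetical samples `{"warm": [0.9, 0.1, 0.0], "bright": [0.1, 0.2, 0.9]}`, a query feature of `[0.8, 0.2, 0.1]` selects `"warm"`.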
-
Publication No.: WO2023041583A1
Publication date: 2023-03-23
Application No.: PCT/EP2022/075532
Filing date: 2022-09-14
Inventors: CHAKRABARTY, Soumitro; KEMPAPURA SRINIVASA, Shashi Kumar; KÜCH, Fabian; KROOS, Christian; THIERGART, Oliver
Abstract: An apparatus for estimating sub-band-specific direction information for two or more sub-bands of a plurality of sub-bands according to an embodiment is provided. The apparatus comprises a feature extractor (110) for obtaining a plurality of feature samples for a plurality of frequency bands of two or more audio signals. Moreover, the apparatus comprises a direction estimator (120) configured to receive the plurality of feature samples as input values and to output a plurality of output samples, wherein the output samples indicate, for each sub-band of the two or more sub-bands, the sub-band-specific direction information for said sub-band. Each of the plurality of sub-bands is equal to one of the plurality of frequency bands or comprises at least one frequency band or a portion of a frequency band of the plurality of frequency bands.
-
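Publication No.: WO2023279691A1
Publication date: 2023-01-12
Application No.: PCT/CN2022/071089
Filing date: 2022-01-10
Applicant: 上海商汤智能科技有限公司
Abstract: A speech classification method, a model training method and apparatus (400), a device (700), a medium (800), and a program. The training method comprises: acquiring speech data of at least one category, the speech data of a same category forming one speech data set (S11); extracting a speech feature of each item of speech data in the speech data set (S12); and training a sub-classification model of a speech classification model using the speech features in the speech data set, wherein the speech classification model comprises at least one sub-classification model and the sub-classification models correspond one-to-one to the speech data sets (S13). By sorting speech data by category into corresponding speech data sets and training the corresponding sub-classification model with the speech features, a speech classification model that recognizes speech data of the required categories is obtained. Training with only the speech data of a new category is sufficient for the speech classification model to classify that new category.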
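Note: the one-sub-model-per-category structure in this abstract can be sketched as below. The abstract does not specify the type of sub-classification model, so the nearest-centroid model here is a stand-in, and the class names `CentroidSubModel` and `SpeechClassifier` are hypothetical. The sketch shows only the property the abstract highlights: adding a category trains one new sub-model and leaves the existing ones untouched.

```python
class CentroidSubModel:
    """Stand-in sub-classification model: the mean feature vector of one category."""

    def fit(self, features):
        count = len(features)
        dim = len(features[0])
        self.centroid = [sum(f[i] for f in features) / count for i in range(dim)]
        return self

    def score(self, feature):
        # Higher is better: negative squared distance to the category centroid.
        return -sum((x - c) ** 2 for x, c in zip(feature, self.centroid))


class SpeechClassifier:
    """Speech classification model holding one sub-model per speech data set."""

    def __init__(self):
        self.sub_models = {}

    def add_class(self, label, features):
        # Only the sub-model of this category is trained; others are not revisited.
        self.sub_models[label] = CentroidSubModel().fit(features)

    def predict(self, feature):
        return max(self.sub_models, key=lambda label: self.sub_models[label].score(feature))
```

A new category is added with a single `add_class` call on its own data set, after which `predict` can return that category alongside the earlier ones.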
-
Publication No.: WO2022271089A1
Publication date: 2022-12-29
Application No.: PCT/SG2022/050306
Filing date: 2022-05-11
Applicant: LEMON INC.
Inventors: LIN, Kexin; LI, Yunzhu
IPC classification: H04N21/234, G10L21/10, H04N21/44, G06F3/14, G06F3/16, G06V20/40, G06F16/44, G06F16/64, G06T11/00, G06T2207/10016, G06T7/70, G10L21/14, G10L25/18
Abstract: Systems and methods for rendering motion-audio visualizations to a display are described. More specifically, video data and audio data are obtained. A position of a target object in each of one or more video frames of the video data is determined. Additionally, a frequency spectrum of the audio data is determined for a predetermined time period. Audio visualizations for the predetermined time period are determined based on the frequency spectrum. A rendered video is generated by applying the audio visualizations at the position of the target object in the one or more video frames for the predetermined time period.
-
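Publication No.: WO2022257454A1
Publication date: 2022-12-15
Application No.: PCT/CN2022/071430
Filing date: 2022-01-11
Applicant: 平安科技(深圳)有限公司
Abstract: The present application is applicable to the technical field of speech synthesis and provides a speech synthesis method and apparatus, a terminal, and a storage medium. The method comprises: acquiring text information; inputting the text information into a trained spectrum generation model for processing to obtain a mel spectrogram corresponding to the text information, wherein the spectrum generation model is a non-autoregressive model that requires no distillation and comprises an encoder, a length prediction network, and a decoder, the training process and the actual use process of the decoder being inverse operations of each other; and generating, based on the mel spectrogram, speech information corresponding to the text information. In this solution, because the generation model is a non-autoregressive model that requires no distillation, the rate at which the spectrum generation model generates mel spectrograms is increased, which in turn increases the speed of speech synthesis. In addition, the mel spectrogram corresponding to the text information can be extracted accurately and quickly based on the spectrum generation model, so that the speech generated from the mel spectrogram is of high quality.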
-
Publication No.: WO2022243828A1
Publication date: 2022-11-24
Application No.: PCT/IB2022/054502
Filing date: 2022-05-13
Applicant: FRIDMAN-MINTZ, Boris
Inventor: FRIDMAN-MINTZ, Boris
IPC classification: G10L15/187, G10L21/0232, G10L21/0264, G10L15/20, G10L25/78, G10L13/02, G10L15/08, G10L2025/783, G10L2025/935, G10L25/18, G10L25/93
Abstract: Within each harmonic spectrum of a sequence of spectra derived from analysis of a waveform representing human speech, two or more fundamental or harmonic components are identified whose frequencies are separated by integer multiples of a fundamental acoustic frequency. The highest harmonic frequency that is also greater than 410 Hz is a primary cap frequency, which is used to select a primary phonetic note that corresponds to a subset of phonetic chords from a set of phonetic chords for which acoustic spectral data is available. The spectral data can also include frequencies for primary band, secondary band (or secondary note), basal band, or reduced basal band acoustic components, which can be used to select a phonetic chord from the subset of phonetic chords corresponding to the selected primary note.
-
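Publication No.: WO2022234636A1
Publication date: 2022-11-10
Application No.: PCT/JP2021/017459
Filing date: 2021-05-07
Applicant: 日本電気株式会社
Abstract: Provided are a signal processing device and the like capable of appropriately detecting the occurrence of an event. A signal processing device according to one aspect of the present disclosure comprises: conversion means for converting an input signal into a predetermined signal that is a signal in the time-frequency domain; target signal estimation means for estimating a peak of the time-frequency intensity of the predetermined signal as the intensity of a target signal, which is a signal related to the occurrence of an event; band estimation means for estimating, as a noise band that is the frequency band of a noise signal, a band including at least a bandwidth that extends from the frequency related to the peak to a predetermined frequency and does not include a frequency related to a peak different from said peak; noise signal estimation means for estimating the intensity of the noise signal based on the time-frequency intensity in the noise band; and determination means for determining whether an event has occurred based on the ratio between the intensity of the target signal and the intensity of the noise signal.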
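Note: the peak-to-noise decision in this abstract can be sketched for a single spectrum frame as follows. This is a simplified reading, not the claimed method: the input is assumed to be a list of magnitudes for one time frame, the noise band is taken only on the upper side of the peak, and the parameters `band_width`, `guard`, and `threshold` as well as the function names are assumptions introduced for illustration.

```python
def local_peaks(spectrum):
    """Indices of interior bins that are local maxima."""
    return [
        i
        for i in range(1, len(spectrum) - 1)
        if spectrum[i] > spectrum[i - 1] and spectrum[i] >= spectrum[i + 1]
    ]


def detect_event(spectrum, band_width=8, guard=1, threshold=4.0):
    """Decide whether an event is present from one magnitude spectrum frame."""
    # Target signal intensity: the strongest bin of the frame.
    peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
    target = spectrum[peak_bin]

    # Noise band: bins above the peak (after a small guard), up to band_width
    # bins away, cut short before any other peak so that peak is excluded.
    other_peaks = set(local_peaks(spectrum)) - {peak_bin}
    noise = []
    for i in range(peak_bin + guard + 1, min(len(spectrum), peak_bin + band_width + 1)):
        if i in other_peaks:
            break
        noise.append(spectrum[i])
    if not noise:
        return False  # no usable noise band, so no decision is made

    noise_level = sum(noise) / len(noise)
    if noise_level == 0:
        return target > 0
    # Event is declared when the target-to-noise intensity ratio is high enough.
    return target / noise_level >= threshold
```

A frame with one strong bin over a flat floor is reported as an event, while a flat frame is not.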
-
Publication No.: WO2022168102A1
Publication date: 2022-08-11
Application No.: PCT/IL2022/050158
Filing date: 2022-02-08
IPC classification: G10L15/04, G10L15/06, G10L15/22, G10L21/0364, G10L25/18, G10L15/16, G10L21/007
Abstract: A system and method of speech modification may include: receiving a recorded speech, comprising one or more phonemes uttered by a speaker; segmenting the recorded speech into one or more phoneme segments (PS), each representing an uttered phoneme; selecting a phoneme segment (PSk) of the one or more phoneme segments (PS); extracting a portion of the recorded speech, said portion corresponding to a first timeframe (T̃) that comprises the selected phoneme segment; receiving a representation (P̃*) of a phoneme of interest P*; and applying a machine learning (ML) model on (a) the extracted portion of the recorded speech and (b) the representation (P̃*) of the phoneme of interest P*, to generate a modified version of the extracted portion of recorded speech, wherein the phoneme of interest (P*) substitutes the selected phoneme segment (PSk).
-
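Publication No.: WO2022154341A1
Publication date: 2022-07-21
Application No.: PCT/KR2021/095116
Filing date: 2021-12-02
Applicant: 한양대학교산학협력단
Abstract: The present invention provides a method for operating a speech synthesis system, the method comprising: receiving a first text and a first speech for the first text, and a second text and a second speech for the second text; generating a speech synthesis model trained by applying the first and second texts and the first and second speeches to curriculum learning; and, when a target text for speech output is input, outputting a target synthesized speech corresponding to the target text based on the speech synthesis model, wherein generating the speech synthesis model comprises: generating a combined text by combining the first and second texts and a combined speech by combining the first and second speeches; and adding the combined text and the combined speech to the speech synthesis model if the error rate when training on the combined text and the combined speech is lower than a set reference rate.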
-
Publication No.: WO2022087303A1
Publication date: 2022-04-28
Application No.: PCT/US2021/056098
Filing date: 2021-10-21
Inventors: CHANG, Simyung; PARK, Hyunsin; PARK, Hyoungwoo; CHO, Janghoon; YUN, Sungrack; HWANG, Kyu Woong
Abstract: A computer-implemented method of operating an artificial neural network for processing data having a frequency dimension includes receiving an audio input. The audio input may be separated into one or more subgroups along the frequency dimension. A normalization may be performed on each subgroup. The normalization for a first subgroup is performed independently of the normalization for a second subgroup. An output, such as a keyword detection indication, is generated based on the normalized subgroups.
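Note: the per-subgroup normalization in this abstract can be sketched as below. The abstract does not state which statistics are used, so zero-mean, unit-variance normalization over each frequency subgroup is an assumption, as are the function name `subband_normalize`, the `[frequency][time]` input layout, and the `eps` constant.

```python
import math


def subband_normalize(features, num_groups, eps=1e-5):
    """Normalize each frequency subgroup independently.

    features is a list of rows, one row of time samples per frequency bin.
    The bins are split into num_groups contiguous subgroups, and each subgroup
    is normalized using only its own mean and variance.
    """
    num_bins = len(features)
    if num_groups <= 0 or num_bins % num_groups != 0:
        raise ValueError("number of frequency bins must be divisible by num_groups")
    group_size = num_bins // num_groups

    normalized = []
    for g in range(num_groups):
        rows = features[g * group_size:(g + 1) * group_size]
        values = [v for row in rows for v in row]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        scale = 1.0 / math.sqrt(var + eps)
        normalized.extend([[(v - mean) * scale for v in row] for row in rows])
    return normalized
```

Because each subgroup uses its own statistics, a low-energy band and a high-energy band are brought to a comparable scale before later layers process them.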