Patent search ipc:"G10L13/033" Page 1

1.

Invention application
SYSTEM AND METHOD FOR DATA AUGMENTATION AND SPEECH PROCESSING IN DYNAMIC ACOUSTIC ENVIRONMENTS [pending, published]

Publication (announcement) No.: WO2022178151A1

Publication (announcement) date: 2022-08-25

Application No.: PCT/US2022/016832

Application date: 2022-02-17

Applicant: NUANCE COMMUNICATIONS, INC.

Inventor: NAYLOR, Patrick A.; SHARMA, Dushyant; JOST, Uwe Helmut; GANONG III, William F.

IPC: G06N99/00, G10K11/00, G10L13/033, G10L13/047, G10L13/10

Abstract: A method, computer program product, and computing system for receiving one or more inputs indicative of at least one of: a relative location of a speaker and a microphone array, and a relative orientation of the speaker and the microphone array. One or more reference signals may be received. A speech processing system may be trained using the one or more inputs and the one or more reference signals.

2.

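Invention application
SOUND GENERATION METHOD USING MACHINE LEARNING MODEL, METHOD FOR TRAINING MACHINE LEARNING MODEL, SOUND GENERATION DEVICE, TRAINING DEVICE, SOUND GENERATION PROGRAM, AND TRAINING PROGRAM [pending, published]

Publication (announcement) No.: WO2022172576A1

Publication (announcement) date: 2022-08-18

Application No.: PCT/JP2021/045962

Application date: 2021-12-14

Applicant: ヤマハ株式会社

Inventor: 才野　慶二郎; 大道　竜之介; ジョルディ　ボナダ; メルレイン　ブラアウ

IPC: G10G1/04, G10H1/00, G10L13/033, G10L13/10

Abstract: A reception unit receives an input of a first feature sequence in which a musical feature varies over time. Using a trained model, a generation unit processes the first feature sequence and generates a sound data sequence corresponding to a second feature sequence in which the feature varies at a second fineness. The trained model is a machine learning model that has learned the input-output relationship between an input feature sequence in which the feature varies over time at a first fineness and a reference sound data sequence corresponding to an output feature sequence in which the feature varies over time at a second fineness higher than the first fineness.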

3.

Invention application
SYNTHESIZED SPEECH GENERATION [pending, published]

Publication (announcement) No.: WO2022159256A1

Publication (announcement) date: 2022-07-28

Application No.: PCT/US2021/072800

Application date: 2021-12-08

Applicant: QUALCOMM INCORPORATED

Inventor: BYUN, Kyungguen; MOON, Sunkuk; ZHANG, Shuhua; MONTAZERI, Vahid; KIM, Lae-Hoon; VISSER, Erik

IPC: G10L13/033, G10L21/007, G10L21/013

Abstract: A device for speech generation includes one or more processors configured to receive one or more control parameters indicating target speech characteristics. The one or more processors are also configured to process, using a multi-encoder, an input representation of speech based on the one or more control parameters to generate encoded data corresponding to an audio signal that represents a version of the speech based on the target speech characteristics.

4.

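Invention application
INFORMATION PROCESSING METHOD, INFORMATION PROCESSING SYSTEM, AND PROGRAM [pending, published]

Publication (announcement) No.: WO2022074754A1

Publication (announcement) date: 2022-04-14

Application No.: PCT/JP2020/037966

Application date: 2020-10-07

Applicant: ヤマハ株式会社

Inventor: 大道　竜之介; 才野　慶二郎; 清水　正宏

IPC: G10G1/04, G10L13/00, G10L13/033

Abstract: First time-series data, representing a time series of features of a sound produced by pronouncing a symbol string in a first pronunciation style, is edited in response to a first instruction from a user, and each time the first time-series data is edited, first history data corresponding to the edited first time-series data is saved as a new version of the data. Second time-series data, representing a time series of features of a sound produced by pronouncing the symbol string in a second pronunciation style different from the first pronunciation style, is edited in response to a second instruction from the user, and each time the second time-series data is edited, second history data corresponding to the edited second time-series data is saved as a new version of the data. The method then acquires either the first time-series data corresponding to the first history data that the user's instruction selects from among the multiple saved versions of the first history data, or the second time-series data corresponding to the second history data that the user's instruction selects from among the multiple saved versions of the second history data.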

5.

Invention application
SYNTHESIZED DATA AUGMENTATION USING VOICE CONVERSION AND SPEECH RECOGNITION MODELS [pending, published]

Publication (announcement) No.: WO2022046526A1

Publication (announcement) date: 2022-03-03

Application No.: PCT/US2021/046781

Application date: 2021-08-19

Applicant: GOOGLE LLC

Inventor: BIADSY, Fadi; JIANG, Liyang; MORENO MENGIBAR, Pedro J.; ROSENBERG, Andrew

IPC: G10L21/057, G10L13/033, G10L15/07, G10L25/66

Abstract: A method (380) for training a speech conversion model (300) includes obtaining a plurality of transcriptions (302) in a set of spoken training utterances (305) and obtaining a plurality of unspoken training text utterances. Each spoken training utterance is spoken by a target speaker (104) associated with atypical speech and includes a corresponding transcription paired with a corresponding non-synthetic speech representation (304). The method also includes adapting, using the set of spoken training utterances, a TTS model (210) to synthesize speech in a voice of the target speaker and that captures the atypical speech. For each unspoken training text utterance, the method also includes generating, as output from the adapted TTS model, a synthetic speech representation (306) that includes the voice of the target speaker and that captures the atypical speech. The method also includes training the speech conversion model based on the synthetic speech representations.

6.

Invention application
SELECTING A PRIMARY SOURCE OF TEXT TO SPEECH BASED ON POSTURE [pending, published]

Publication (announcement) No.: WO2021260469A1

Publication (announcement) date: 2021-12-30

Application No.: PCT/IB2021/055065

Application date: 2021-06-09

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION; IBM UNITED KINGDOM LIMITED; IBM (CHINA) INVESTMENT COMPANY LIMITED

Inventor: ZHANG, Da Wei; CHEN, Ke; SUN, Yu Ping; JIA, Hou Ping; MO, Xiaoguang

IPC: G10L13/00, G10L13/02, G10L13/08, G06F3/012, G10L13/033, G10L13/047, H04S7/303

Abstract: A computer converts each of multiple content sources from textual content to speech, each comprising a separate audio selection. The computer applies, to each audio selection, one or more speech attributes to specify audio attributes that select a respective position for that audio selection from among multiple positions within a multidimensional sound space and that audibly distinguish one or more characteristics of that audio selection from the other audio selections, wherein the respective position reflects the rank of the audio selection as ordered by interest to a user. The computer outputs a simultaneous stream of the multiple audio selections to an audio output device for stereo playback to the user at the multiple positions within the multidimensional sound space, with the multiple positions reflecting the content sources ordered by interest.

7.

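Invention application
SONG SYNTHESIS METHOD AND APPARATUS, READABLE MEDIUM, AND ELECTRONIC DEVICE [pending, published]

Publication (announcement) No.: WO2021218324A1

Publication (announcement) date: 2021-11-04

Application No.: PCT/CN2021/077986

Application date: 2021-02-25

Applicant: 北京字节跳动网络技术有限公司

Inventor: 顾宇; 殷翔

IPC: G10L13/033, G10L25/18, G10L25/24, G10L25/30, G10L13/08

Abstract: A song synthesis method and apparatus, electronic device, computer-readable medium, and computer program, wherein the method comprises: obtaining duration feature information of a target song according to song information of the target song (101); inputting the duration feature information and the song information into a preset song synthesis model to obtain acoustic feature information corresponding to the target song, wherein the preset song synthesis model is an attention-based sequence-to-sequence model (102); and synthesizing the acoustic feature information with a vocoder to obtain singing audio of the target song (103). Because the attention-based sequence-to-sequence model uses an end-to-end architecture, it can extract richer acoustic feature information and has good temporal modeling capability, so that the synthesized singing audio is pronounced more clearly, goes off-key less often, and covers a wider vocal range. This improves the naturalness and fluency of the synthesized singing audio, bringing it closer to a real human performance and giving the user a good listening experience.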

8.

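Invention application
SONG SYNTHESIS METHOD, APPARATUS, DEVICE, AND STORAGE MEDIUM [pending, published]

Publication (announcement) No.: WO2021218138A1

Publication (announcement) date: 2021-11-04

Application No.: PCT/CN2020/131663

Application date: 2020-11-26

Applicant: 平安科技（深圳）有限公司

Inventor: 朱清影; 韩宝强

IPC: G10L13/033, G10L13/08, G10L13/10, G10L25/18

Abstract: A song synthesis method, comprising: obtaining lyric recitation audio and musical score information (101); performing duration labeling on the lyric recitation audio using a preset speech recognition model and the lyric pinyin text to obtain recitation durations (102); analyzing initial acoustic parameters from the lyric recitation audio using a preset vocoder (103); extracting singing durations from the lyric pinyin text according to a preset initial-consonant speed-change dictionary, rhythm information, and beat information (104); performing speed-change processing on the initial acoustic parameters according to a preset speed-change algorithm, the recitation durations, and the singing durations (105); performing formant enhancement on the speed-changed spectral envelope to obtain an enhanced spectral envelope (106); performing correction based on pitch information, the singing durations, and the speed-changed fundamental frequency to obtain a corrected fundamental frequency (107); and performing song synthesis on the processed acoustic parameters using the preset vocoder (108). The application also relates to blockchain: the synthesized song is stored in a blockchain.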

9.

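Invention application
CONTROL DEVICE, CONTROL SYSTEM, INFORMATION PROCESSING DEVICE, AND PROGRAM [pending, published]

Publication (announcement) No.: WO2021176780A1

Publication (announcement) date: 2021-09-10

Application No.: PCT/JP2020/044091

Application date: 2020-11-26

Applicant: 株式会社東海理化電機製作所

Inventor: 山本　恒行; 砂川　尚貴; 岩田　健児

IPC: A63H3/33, A63H11/00, G10L13/00, G10L13/033, G10L13/047, G10L13/08, G10L13/10, G01C21/36, G06F3/16

Abstract: [Problem] To provide a control device, control system, information processing device, and program capable of improving the entertainment value of an agent. [Solution] A control device comprising a control unit that controls an agent that presents information, wherein the control unit determines a characteristic of the agent based on an article attached to the agent.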

10.

Invention application
SYNTHESIZED SPEECH AUDIO DATA GENERATED ON BEHALF OF HUMAN PARTICIPANT IN CONVERSATION [pending, published]

Publication (announcement) No.: WO2021162675A1

Publication (announcement) date: 2021-08-19

Application No.: PCT/US2020/017562

Application date: 2020-02-10

Applicant: GOOGLE LLC

Inventor: BOWERS, Mark; ALLEN, Brian F.; ZADA, Nida; SEGUIN, Julie Anne

IPC: G10L13/033, G10L15/26

Abstract: Generating synthesized speech audio data on behalf of a given user in a conversation. The synthesized speech audio data includes synthesized speech that incorporates textual segment(s). The textual segment(s) can include recognized text that results from processing spoken input, of the given user, using a speech recognition model and/or can include a selection of a rendered suggestion that conveys the textual segment(s). Some implementations dynamically determine one or more prosodic properties for use in speech synthesis of the textual segment, and generate the synthesized speech with the one or more determined prosodic properties. The prosodic properties can be determined based on the textual segment(s) used in speech synthesis, textual segment(s) corresponding to recent spoken input of additional participant(s), attribute(s) of relationship(s) between the given user and additional participant(s) in the conversation, and/or feature(s) of a current location for the conversation.
