Patent search ipc:G10L25/03 Page 1

1.

Invention application
MANAGEMENT OF PROFESSIONALLY GENERATED AND USER-GENERATED AUDIO CONTENT (pending, published)

Publication No.: WO2023018889A1

Publication date: 2023-02-16

Application No.: PCT/US2022/040089

Filing date: 2022-08-11

Applicant: DOLBY LABORATORIES LICENSING CORPORATION

Inventors: YANG, Shaofan; LI, Kai

IPC: G10L25/51, G10L25/03

Abstract: A system for managing user-generated content (UGC) and professionally generated content (PGC) is disclosed. The system is programmed to receive digital audio data having two channels from a social media platform. The system is programmed to extract spatial features that capture differences in the two channels from the digital audio data. The system is programmed to also extract temporal features, spectral features, and background features from the digital audio data. The system is programmed to then use the extracted features to determine whether to process the digital audio data as UGC or PGC before playback.
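
As a rough illustration of what "spatial features that capture differences in the two channels" can mean, the sketch below computes three common inter-channel measures over a stereo frame: normalized correlation, level difference in dB, and the side-to-mid energy ratio. The function name `spatial_features` and the choice of measures are assumptions for illustration only; the abstract does not disclose which spatial features the system actually extracts.

```python
import math
from typing import Dict, Sequence


def spatial_features(left: Sequence[float], right: Sequence[float]) -> Dict[str, float]:
    """Hypothetical inter-channel features for one stereo frame (not the patented set)."""
    if len(left) != len(right) or not left:
        raise ValueError("channels must be non-empty and of equal length")
    eps = 1e-12  # avoids division by zero and log of zero on silent frames
    e_left = sum(a * a for a in left)
    e_right = sum(b * b for b in right)
    cross = sum(a * b for a, b in zip(left, right))
    # Mid (L + R) and side (L - R) energies: side is zero when both channels are identical.
    e_mid = sum(((a + b) / 2) ** 2 for a, b in zip(left, right))
    e_side = sum(((a - b) / 2) ** 2 for a, b in zip(left, right))
    return {
        "correlation": cross / math.sqrt(e_left * e_right + eps),
        "level_diff_db": 10 * math.log10((e_left + eps) / (e_right + eps)),
        "side_to_mid": e_side / (e_mid + eps),
    }
```

A dual-mono recording yields correlation near 1, a level difference near 0 dB, and a side-to-mid ratio of 0, whereas a wide stereo mix moves all three away from those values.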

2.

Invention application
VOICE MODIFICATION (pending, published)

Publication No.: WO2023288265A1

Publication date: 2023-01-19

Application No.: PCT/US2022/073721

Filing date: 2022-07-14

Applicant: SRI INTERNATIONAL

Inventors: LUBIN, Jeffrey; SPENCE, Clay

IPC: G10L21/007, G10L13/02, G10L17/14, G10L25/03

Abstract: A computing system that receives an audio waveform representing speech from an individual and produces as output a modified version of the audio waveform that maintains the speaker's speech characteristics as well as prosody for specific utterances (e.g., voice timbre, intonation, timing, intensity). The system uses a bottleneck-based autoencoder with speech spectrograms as input and output. To produce the output audio waveform, the system uses a reconstruction-error-based loss function together with two additional loss functions. The second loss function is a speaker "real vs. fake" discriminator that penalizes output that does not sound like the speaker. The third loss function is a speech intelligibility scorer that penalizes output speech that is difficult for the target population to understand. The resulting modified audio waveform is enhanced speech output that delivers speech in a target accent without sacrificing the speaker's personality.
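
The abstract names three loss terms but does not say how they are combined. Assuming a weighted sum, a common choice when an autoencoder is trained with auxiliary critics, the objective could be written as below. The L1 reconstruction norm, the weights \(\lambda_{\mathrm{spk}}\) and \(\lambda_{\mathrm{int}}\), and all symbols are hypothetical: \(S\) is the input spectrogram, \(\hat{S}_\theta\) the autoencoder output, \(\mathcal{L}_{\mathrm{spk}}\) the penalty from the speaker real/fake discriminator, and \(\mathcal{L}_{\mathrm{int}}\) the penalty from the intelligibility scorer.

```latex
\mathcal{L}(\theta) =
  \underbrace{\lVert \hat{S}_\theta - S \rVert_1}_{\text{reconstruction}}
  + \lambda_{\mathrm{spk}}\,\mathcal{L}_{\mathrm{spk}}\!\left(\hat{S}_\theta\right)
  + \lambda_{\mathrm{int}}\,\mathcal{L}_{\mathrm{int}}\!\left(\hat{S}_\theta\right)
```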

3.

Invention application
RENDERING VIRTUAL ARTICLES OF CLOTHING BASED ON AUDIO CHARACTERISTICS (pending, published)

Publication No.: WO2022271086A1

Publication date: 2022-12-29

Application No.: PCT/SG2022/050294

Filing date: 2022-05-10

Applicant: LEMON INC.

Inventors: LI, Yunzhu; CHENG, Haiying; CHEN, Sun

IPC: G06T19/20, G06T15/00, G06V20/40, G06T13/00, G06T13/205, G06T15/04, G06T17/20, G06T2210/16, G06T2210/62, G06V20/20, G06V40/10, G10L25/03, G10L25/48

Abstract: Systems and methods for generating a virtual article of clothing at a display are described. Some examples may include: obtaining video data and audio data, analyzing the video data to determine one or more body joints of a target object appearing in the video data. A mesh based on the determined one or more body joints may be generated. The audio data may be analyzed to determine audio characteristics associated with the audio data. Texture rendering information associated with a virtual article of clothing may be determined based on the audio characteristics. A rendered video may be generated by rendering the virtual article of clothing to the generated mesh using the texture rendering information.

4.

Invention application
AUTOMATED CLASSIFICATION AND INDEXING OF EVENTS USING MACHINE LEARNING (pending, published)

Publication No.: WO2022250987A1

Publication date: 2022-12-01

Application No.: PCT/US2022/029291

Filing date: 2022-05-13

Applicants: GETAC TECHNOLOGY CORPORATION; WHP WORKFLOW SOLUTIONS, INC.

Inventors: GUZIK, Thomas; ADEEL, Muhammad

IPC: G06V20/40, G06V10/14, H04N5/77, G10L15/26, G10L25/03, G01P15/00

Abstract: Described herein are techniques that may be used to automatically identify and index events within a media content file. Such techniques may comprise: receiving media content from at least one recording device; receiving sensor data determined to correspond to the media content; determining a context associated with the at least one recording device based on the sensor data; identifying at least one event based on one or more data patterns detected within the sensor data and on the contextual data; generating an index corresponding to the identified event; and storing an indication of the generated index in association with the media content.
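
A minimal sketch of the last three steps, under loud assumptions: a fixed acceleration-magnitude threshold stands in for the machine-learned pattern detection, the context is passed in as an already-determined string, and the `IndexEntry` schema and `index_events` function are hypothetical names, not taken from the application.

```python
import math
from dataclasses import dataclass
from itertools import groupby
from typing import List, Sequence, Tuple


@dataclass
class IndexEntry:
    """One index record to be stored alongside the media file (hypothetical schema)."""
    context: str
    label: str
    start_s: float
    end_s: float


def index_events(accel: Sequence[Tuple[float, float, float]], sample_rate_hz: float,
                 context: str, threshold_g: float = 2.0,
                 label: str = "high_acceleration") -> List[IndexEntry]:
    """Index runs of samples whose acceleration magnitude exceeds threshold_g.

    accel holds (x, y, z) readings in g, sample-aligned with the media timeline.
    """
    magnitudes = [math.sqrt(x * x + y * y + z * z) for x, y, z in accel]
    entries = []
    # groupby splits the timeline into consecutive runs that are all above or all below.
    for above, run in groupby(enumerate(magnitudes), key=lambda p: p[1] > threshold_g):
        run = list(run)
        if above:
            start, end = run[0][0], run[-1][0] + 1  # half-open sample range
            entries.append(IndexEntry(context, label,
                                      start / sample_rate_hz, end / sample_rate_hz))
    return entries
```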

5.

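Invention application
SPEECH-BASED INTELLIGENT INTERVIEW EVALUATION METHOD, APPARATUS, DEVICE, AND STORAGE MEDIUM (pending, published)

Publication No.: WO2022179048A1

Publication date: 2022-09-01

Application No.: PCT/CN2021/109701

Filing date: 2021-07-30

Applicant: 深圳壹账通智能科技有限公司

Inventor: 赵沁

IPC: G10L15/26, G10L17/00, G10L25/03, G10L25/51, G10L25/87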

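Abstract: The present application relates to the field of artificial intelligence and provides a speech-based intelligent interview evaluation method, apparatus, device, and storage medium for improving the efficiency of remote interview evaluation. The method includes: performing endpoint detection on a to-be-processed speech signal of a remote interviewee to obtain valid speech segments, and dividing the valid speech segments into calibration speech segments and to-be-detected speech segments; extracting calibration speech features from the calibration speech segments and detection speech features from the to-be-detected speech segments; calculating calibration feature values of the calibration speech features and detection feature values of the detection speech features; and comparing the detection feature values against the calibration feature values to obtain an interviewee condition analysis result, and generating an evaluation report of that result. The present application further relates to blockchain technology: the to-be-processed speech signal of the remote interviewee may be stored in a blockchain.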
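
The segment-and-compare idea can be sketched with short-time frame energy as the only feature. Energy-threshold endpoint detection is a simple stand-in here; the abstract does not say which endpoint detector or which speech features are used. Treating the first `n_calibration` segments as the calibration set and reporting a relative deviation are likewise assumptions, and both function names are hypothetical.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def detect_segments(frame_energies: Sequence[float], threshold: float) -> List[Tuple[int, int]]:
    """Energy-based endpoint detection: return [start, end) frame ranges above threshold."""
    segments, start = [], None
    for i, energy in enumerate(frame_energies):
        if energy >= threshold and start is None:
            start = i  # speech onset
        elif energy < threshold and start is not None:
            segments.append((start, i))  # speech offset
            start = None
    if start is not None:  # speech continues to the end of the signal
        segments.append((start, len(frame_energies)))
    return segments


def deviation_from_calibration(frame_energies: Sequence[float],
                               segments: Sequence[Tuple[int, int]],
                               n_calibration: int) -> float:
    """Relative change of the detection segments' mean energy versus the calibration baseline."""
    calib = [frame_energies[i] for s, e in segments[:n_calibration] for i in range(s, e)]
    detect = [frame_energies[i] for s, e in segments[n_calibration:] for i in range(s, e)]
    if not calib or not detect:
        raise ValueError("need at least one calibration and one detection segment")
    baseline = mean(calib)
    return (mean(detect) - baseline) / baseline
```

For frame energies `[0, 5, 5, 0, 0, 6, 6, 0, 10, 10, 0]` and a threshold of 1, three segments are found; with one calibration segment, the remaining speech is 60% louder than the baseline.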

6.

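Invention application
SPEECH ANALYSIS METHOD AND SPEECH RECORDING APPARATUS THEREFOR (pending, published)

Publication No.: WO2022166220A1

Publication date: 2022-08-11

Application No.: PCT/CN2021/120416

Filing date: 2021-09-24

Applicant: 深圳壹秘科技有限公司

Inventors: 陈文明; 陈新磊; 张洁; 张世明

IPC: G10L17/04, G10L17/02, G10L25/51, G10L25/27, G10L25/03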

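Abstract: A speech analysis method and a speech recording device therefor. The method includes: acquiring first speech data, where the first speech data includes first speech information and a labeled sound source corresponding to the first speech information; if no verification model corresponding to the labeled sound source is stored, adapting a pre-stored base verification model to the first speech information and saving the adapted model parameter set as the verification model corresponding to the labeled sound source; if a verification model corresponding to the labeled sound source is stored, using that verification model to determine whether the first speech information corresponds to the labeled sound source, and optimizing the verification model; and when the verification accuracy of the verification model is determined to exceed a preset threshold, using the verification model to determine the sound source corresponding to second speech information contained in second speech data. Because the verification model is continuously optimized, the method is more flexible to use and achieves higher accuracy.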
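
The control flow of the method (adapt when no model exists, verify and refine when one does, and identify only once accuracy passes a threshold) can be sketched as follows. Everything model-specific is an assumption: each "verification model" is reduced to a running-mean speaker embedding compared by cosine similarity, "adaptation" is averaging the base model with the first sample, and `SourceVerifier` with its thresholds is a hypothetical name.

```python
import math
from typing import Dict, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class SourceVerifier:
    """Toy per-source verification models; each model is a mean speaker embedding."""

    def __init__(self, base_model: Sequence[float], match_threshold: float = 0.8,
                 accuracy_threshold: float = 0.9):
        self.base_model = list(base_model)
        self.match_threshold = match_threshold
        self.accuracy_threshold = accuracy_threshold
        self.models: Dict[str, dict] = {}  # label -> centroid, n, checks, correct

    def observe(self, embedding: Sequence[float], label: str) -> None:
        """Handle one labeled utterance (the 'first speech data')."""
        model = self.models.get(label)
        if model is None:
            # No model yet: adapt the base model to this utterance and store the result.
            centroid = [(b + x) / 2 for b, x in zip(self.base_model, embedding)]
            self.models[label] = {"centroid": centroid, "n": 2, "checks": 0, "correct": 0}
            return
        # Model exists: verify the utterance against it and track verification accuracy.
        model["checks"] += 1
        if cosine(model["centroid"], embedding) >= self.match_threshold:
            model["correct"] += 1
        # "Optimize" the model by folding the labeled utterance into the running mean.
        model["n"] += 1
        model["centroid"] = [c + (x - c) / model["n"]
                             for c, x in zip(model["centroid"], embedding)]

    def accuracy(self, label: str) -> float:
        model = self.models[label]
        return model["correct"] / model["checks"] if model["checks"] else 0.0

    def identify(self, embedding: Sequence[float]) -> Optional[str]:
        """Attribute unlabeled speech (the 'second speech data') to a trusted source."""
        trusted = [lbl for lbl in self.models if self.accuracy(lbl) > self.accuracy_threshold]
        if not trusted:
            return None
        return max(trusted, key=lambda lbl: cosine(self.models[lbl]["centroid"], embedding))
```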

7.

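Invention application
METHOD, APPARATUS, TERMINAL, AND STORAGE MEDIUM FOR EXTRACTING SPEECH FEATURES (pending, published)

Publication No.: WO2022141868A1

Publication date: 2022-07-07

Application No.: PCT/CN2021/084166

Filing date: 2021-03-30

Applicant: 平安科技（深圳）有限公司

Inventors: 张之勇; 王健宗; 程宁

IPC: G10L15/16, G10L25/03, G10L25/30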

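Abstract: The present application is applicable to the field of computer technology and provides a method, apparatus, terminal, and storage medium for extracting speech features. The method includes: acquiring to-be-processed speech data; and inputting the speech data into a trained speech feature extraction model to obtain target speech features corresponding to the speech data. The speech feature extraction model is obtained through self-supervised learning: using as the target the sample speech features corresponding to the original speech data in each sample speech data pair, the model is trained on the differences between the original speech data and the augmented speech data in that pair. The model can extract effective, information-rich, and accurately expressive target speech features, so that when these features are applied to intelligent speech task scenarios, the processing results are more accurate.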

8.

Invention application
METHOD AND APPARATUS FOR RENDERING AN AUDIO SIGNAL OF A PLURALITY OF VOICE SIGNALS (pending, published)

Publication No.: WO2022078905A1

Publication date: 2022-04-21

Application No.: PCT/EP2021/077898

Filing date: 2021-10-08

Applicant: INTERDIGITAL CE PATENT HOLDINGS, SAS

Inventors: MORIN, Thomas; THIEBAUD, Sylvain

IPC: H04M3/56, G10L21/0364, H04S7/00, G10L25/03

Abstract: According to embodiments, similarity values of the voice signals may be obtained, wherein a similarity value may indicate a level of similarity between two voice signals. According to embodiments, the audio signal may be rendered by spatializing the voice signals based on the similarity values: the more similar two voice signals are, the greater the distance between them in the spatialized audio signal.
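
One way to honor "more similar voices end up farther apart" for a small conference is to score every assignment of talkers to candidate azimuths by the similarity-weighted sum of pairwise distances and keep the best. This exhaustive search and the `place_voices` name are illustrative assumptions, not the method claimed in the application, and the factorial cost limits it to a handful of talkers.

```python
from itertools import permutations
from typing import List, Sequence


def place_voices(similarity: Sequence[Sequence[float]],
                 azimuths: Sequence[float]) -> List[float]:
    """Assign each voice an azimuth (degrees) so that similar pairs sit far apart.

    similarity[i][j] is a symmetric similarity score for voices i and j.
    """
    n = len(similarity)
    if len(azimuths) < n:
        raise ValueError("need at least one candidate azimuth per voice")
    best, best_score = None, float("-inf")
    for perm in permutations(azimuths, n):
        # Reward placing highly similar voices at a large angular separation.
        score = sum(similarity[i][j] * abs(perm[i] - perm[j])
                    for i in range(n) for j in range(i + 1, n))
        if score > best_score:
            best, best_score = perm, score
    return list(best)
```

With voices 0 and 1 nearly indistinguishable (similarity 0.9) and voice 2 distinct (0.1), candidate positions of -60, 0, and 60 degrees put voices 0 and 1 at opposite extremes and voice 2 in the center.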

9.

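Invention application
VOICE CONVERSION METHOD, APPARATUS, COMPUTER DEVICE, AND COMPUTER-READABLE STORAGE MEDIUM (pending, published)

Publication No.: WO2021120145A1

Publication date: 2021-06-24

Application No.: PCT/CN2019/126865

Filing date: 2019-12-20

Applicant: 深圳市优必选科技股份有限公司

Inventors: 刘洋; 李柏; 丁万; 黄东延; 熊友军

IPC: G10L13/033, G10L25/03, G10L25/27, G10L25/30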

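Abstract: A voice conversion method, apparatus, computer device, and computer-readable storage medium. The method includes: acquiring speech to be converted and an original conversion model, the original conversion model being in an online format (202); converting the format of the original conversion model to obtain a target conversion model in an offline format (204); performing feature extraction on the speech to be converted to obtain features to be converted (206); inputting the features to be converted into the target conversion model to obtain target features output by the target conversion model (208); and obtaining target speech from the target features output by the target conversion model, where the target speech has the same speech content as the speech to be converted but a different voice (210). The method not only performs high-quality voice conversion offline but also runs fast, enabling real-time voice conversion.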

10.

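Invention application
ARRIVAL REMINDER METHOD, APPARATUS, TERMINAL, AND STORAGE MEDIUM (pending, published)

Publication No.: WO2021115232A1

Publication date: 2021-06-17

Application No.: PCT/CN2020/134351

Filing date: 2020-12-07

Applicant: OPPO广东移动通信有限公司

Inventor: 刘文龙

IPC: G08B21/24, G10L25/51, G10L25/03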

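Abstract: An arrival reminder method, apparatus, terminal, and storage medium, belonging to the field of artificial intelligence. The method includes: when the user is on a means of transportation, collecting ambient sound through a microphone (201); performing time-frequency domain feature extraction on the audio data corresponding to the ambient sound to obtain a time-frequency feature matrix (202); inputting the time-frequency feature matrix into a sound recognition model to obtain a target alarm-bell recognition result output by the model (203); when the target alarm-bell sound is recognized in the ambient sound, updating the number of stations traveled (204); and when the number of stations traveled reaches the target number of stations, issuing an arrival reminder (205). Because the ambient sound is collected in real time, the station count is updated each time the target alarm-bell sound is recognized, and the reminder is issued when the count reaches the target, with the terminal extracting time-frequency features from the ambient sound and feeding the resulting feature matrix into the sound recognition model, the accuracy and effectiveness of arrival reminders are improved.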
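
The counting logic of steps 204 and 205 can be sketched independently of the recognition model. The sketch assumes the model emits one boolean per analysis frame and that a single chime may span several frames, so it counts only rising edges; this debouncing choice and the `ArrivalReminder` class are assumptions, not details from the application.

```python
class ArrivalReminder:
    """Counts recognized station chimes and signals arrival at the target stop."""

    def __init__(self, target_stops: int):
        if target_stops < 1:
            raise ValueError("target_stops must be at least 1")
        self.target_stops = target_stops
        self.stops_travelled = 0
        self.notified = False
        self._previous = False  # last recognition result, used for edge detection

    def on_recognition(self, chime_detected: bool) -> bool:
        """Feed one recognition result; return True exactly once, on arrival."""
        # Count only the rising edge so a chime spanning several frames counts once.
        rising_edge = chime_detected and not self._previous
        self._previous = chime_detected
        if self.notified or not rising_edge:
            return False
        self.stops_travelled += 1  # one more station passed (step 204)
        if self.stops_travelled >= self.target_stops:
            self.notified = True  # issue the arrival reminder (step 205)
            return True
        return False
```

For example, with `target_stops=2`, the result sequence False, True, True, False, True returns True only for the fifth result, because the second True belongs to the same chime as the first.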
