CROSS-SPEAKER STYLE TRANSFER SPEECH SYNTHESIS

    Publication No.: US20230081659A1

    Publication Date: 2023-03-16

    Application No.: US17799031

    Filing Date: 2021-02-01

    Abstract: This disclosure provides methods and apparatuses for training an acoustic model which implements cross-speaker style transfer and comprises at least a style encoder. Training data may be obtained, comprising a text, a speaker ID, a style ID and acoustic features corresponding to a reference audio. A reference embedding vector may be generated, through the style encoder, based on the acoustic features. Adversarial training may be performed on the reference embedding vector with at least the style ID and the speaker ID, so as to remove speaker information and retain style information. A style embedding vector may be generated, through the style encoder, based at least on the reference embedding vector on which the adversarial training has been performed. Predicted acoustic features may be generated based at least on a state sequence corresponding to the text, a speaker embedding vector corresponding to the speaker ID, and the style embedding vector.
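
    A minimal sketch of the adversarial style-encoder idea described in the abstract, assuming a PyTorch setting: a gradient-reversal layer feeds a speaker classifier so that speaker information is pushed out of the reference embedding, while an ordinary style classifier keeps style information. The GRU encoder, layer sizes and classifier heads are illustrative assumptions, not the patent's architecture.

import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses (and scales) gradients on the way back."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None


class StyleEncoder(nn.Module):
    def __init__(self, n_mels=80, emb_dim=128, n_speakers=10, n_styles=4):
        super().__init__()
        self.rnn = nn.GRU(n_mels, emb_dim, batch_first=True)
        # Adversarial speaker classifier: trained to predict the speaker, but the
        # reversed gradient drives the embedding to hide speaker identity.
        self.speaker_clf = nn.Linear(emb_dim, n_speakers)
        # Style classifier: ordinary gradient, so style information is retained.
        self.style_clf = nn.Linear(emb_dim, n_styles)

    def forward(self, mels, lam=1.0):
        _, h = self.rnn(mels)                  # h: (1, batch, emb_dim)
        ref_emb = h.squeeze(0)                 # reference embedding vector
        spk_logits = self.speaker_clf(GradReverse.apply(ref_emb, lam))
        sty_logits = self.style_clf(ref_emb)
        return ref_emb, spk_logits, sty_logits


if __name__ == "__main__":
    enc = StyleEncoder()
    mels = torch.randn(2, 120, 80)             # batch of reference mel-spectrograms
    ref_emb, spk_logits, sty_logits = enc(mels)
    print(ref_emb.shape, spk_logits.shape, sty_logits.shape)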

    Voice synthesis method, apparatus, device and storage medium

    Publication No.: US11600259B2

    Publication Date: 2023-03-07

    Application No.: US16565784

    Filing Date: 2019-09-10

    Inventor: Jie Yang

    Abstract: Provided are a voice synthesis method, an apparatus, a device, and a storage medium, involving: obtaining text information and determining characters in the text information and a text content of each of the characters; performing character recognition on the text content of each of the characters to determine character attribute information of each of the characters; obtaining speakers in one-to-one correspondence with the characters according to the character attribute information of each of the characters, where the speakers are pre-stored pronunciation objects having the character attribute information; and generating multi-character synthesized voices according to the text information and the speakers corresponding to the characters of the text information. This improves the pronunciation diversity of different characters in the synthesized voices, makes it easier for an audience to distinguish between different characters in the synthesized voices, and thereby improves the user experience.
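
    A hypothetical Python sketch of the character-to-speaker mapping step: character attribute information (here just gender and age group, which are assumptions) selects a pre-stored speaker, and each utterance is paired with its speaker. The synthesis backend itself is left out, since the abstract does not name one.

from dataclasses import dataclass


@dataclass
class CharacterAttributes:
    name: str
    gender: str
    age_group: str


# Pre-stored pronunciation objects (speakers), keyed by attribute combination.
SPEAKER_BANK = {
    ("female", "child"): "speaker_f_child",
    ("female", "adult"): "speaker_f_adult",
    ("male", "adult"): "speaker_m_adult",
    ("male", "elder"): "speaker_m_elder",
}


def pick_speaker(attrs: CharacterAttributes) -> str:
    """Return the pre-stored speaker matching the character's attributes."""
    return SPEAKER_BANK.get((attrs.gender, attrs.age_group), "speaker_default")


def synthesize_multi_character(lines: list[tuple[CharacterAttributes, str]]) -> list[tuple[str, str]]:
    """Map each (character, utterance) pair to (speaker, utterance);
    a real system would hand each pair to a TTS backend here."""
    return [(pick_speaker(attrs), text) for attrs, text in lines]


if __name__ == "__main__":
    script = [
        (CharacterAttributes("Alice", "female", "child"), "Where are we going?"),
        (CharacterAttributes("Grandpa", "male", "elder"), "To the old mill."),
    ]
    for speaker, text in synthesize_multi_character(script):
        print(f"{speaker}: {text}")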

    Audio file processing method, electronic device, and storage medium

    Publication No.: US11538456B2

    Publication Date: 2022-12-27

    Application No.: US16844283

    Filing Date: 2020-04-09

    Inventor: Chunjiang Lai

    Abstract: An audio file processing method is provided for an electronic device. The method includes extracting at least one audio segment from a first audio file, recognizing, from the at least one audio segment, at least one to-be-replaced audio segment representing a target role, and determining time frame information of each to-be-replaced audio segment in the first audio file. The method also includes obtaining to-be-dubbed audio data for each to-be-replaced audio segment and replacing data in the to-be-replaced audio segment with the to-be-dubbed audio data according to the time frame information, to obtain a second audio file. The at least one to-be-replaced audio segment is divided from the at least one audio segment based on the structure of, and the word count in, the sentence corresponding to each to-be-replaced audio segment.
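
    An illustrative sketch of the time-frame replacement step, not the patented implementation: segments attributed to the target role are cut out of the first audio file by their time frames, and the to-be-dubbed audio is spliced in at the same positions. Sample-index time frames and plain float sample lists are simplifying assumptions.

from dataclasses import dataclass


@dataclass
class TimeFrame:
    start: int   # start sample index in the first audio file
    end: int     # end sample index (exclusive)


def replace_segments(samples: list[float],
                     frames: list[TimeFrame],
                     dubbed: list[list[float]]) -> list[float]:
    """Build the second audio file by replacing each to-be-replaced segment,
    identified by its time frame, with the corresponding dubbed audio data."""
    out: list[float] = []
    cursor = 0
    for frame, dub in sorted(zip(frames, dubbed), key=lambda p: p[0].start):
        out.extend(samples[cursor:frame.start])   # keep audio before the segment
        out.extend(dub)                           # splice in the dubbed audio
        cursor = frame.end
    out.extend(samples[cursor:])                  # keep the remainder
    return out


if __name__ == "__main__":
    original = [0.0] * 10
    frames = [TimeFrame(2, 4), TimeFrame(7, 9)]
    dubbed = [[1.0, 1.0, 1.0], [2.0]]
    print(replace_segments(original, frames, dubbed))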

    System Providing Expressive and Emotive Text-to-Speech

    Publication No.: US20220392430A1

    Publication Date: 2022-12-08

    Application No.: US17880190

    Filing Date: 2022-08-03

    Abstract: A text-to-speech system includes a text and labels module that receives a text input and provides a text analysis and a label with a phonetic description of the text. A label buffer receives the label from the text and labels module. A parameter generation module accesses the label from the label buffer and generates a speech generation parameter. A parameter buffer receives the parameter from the parameter generation module. An audio generation module receives the text input, the label, and/or the parameter and generates a plurality of audio samples. A scheduler monitors and schedules the text and labels module, the parameter generation module, and/or the audio generation module. The parameter generation module is further configured to initialize a voice identifier with a Voice Style Sheet (VSS) parameter, receive an input indicating a modification to the VSS parameter, and modify the VSS parameter according to the modification.
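
    A minimal sketch, assuming the pipeline in the abstract can be modeled as three stages joined by in-memory buffers; the stage internals, the scheduler's behavior, and the Voice Style Sheet fields used here (rate, pitch, emotion) are illustrative assumptions rather than the patent's definitions.

from queue import Queue

label_buffer: Queue = Queue()
parameter_buffer: Queue = Queue()

# Voice Style Sheet (VSS) parameters initialised for a voice identifier and
# modified on request, as the abstract describes.
vss = {"voice_id": "narrator_1", "rate": 1.0, "pitch": 0.0, "emotion": "neutral"}


def text_and_labels_stage(text: str) -> None:
    """Analyse the text and emit a label with a phonetic description (placeholder)."""
    label = {"text": text, "phonemes": list(text.lower())}
    label_buffer.put(label)


def parameter_generation_stage() -> None:
    """Consume a label and emit speech-generation parameters shaped by the VSS."""
    label = label_buffer.get()
    params = {"label": label, "rate": vss["rate"], "pitch": vss["pitch"]}
    parameter_buffer.put(params)


def audio_generation_stage() -> list[float]:
    """Consume parameters and produce audio samples (silence as a stand-in)."""
    params = parameter_buffer.get()
    return [0.0] * (len(params["label"]["phonemes"]) * 100)


def scheduler(text: str) -> list[float]:
    """Run the stages in order; a real scheduler would also monitor buffer levels."""
    text_and_labels_stage(text)
    parameter_generation_stage()
    return audio_generation_stage()


if __name__ == "__main__":
    vss["rate"] = 1.2                        # modify a VSS parameter on request
    print(len(scheduler("Hello there")))     # number of generated samples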

    Method of embodying online media service having multiple voice systems

    Publication No.: US11521593B2

    Publication Date: 2022-12-06

    Application No.: US17076121

    Filing Date: 2020-10-21

    Applicant: Jong Yup Lee

    Inventor: Jong Yup Lee

    Abstract: A method of embodying an online media service having a multiple voice system includes: a first operation of collecting preset online articles and content from a specific media site and displaying the online articles and content on a screen of a personal terminal; a second operation of inputting a voice of a subscriber or setting a voice of a specific person from among voices pre-stored in a database; a third operation of recognizing and classifying the online articles and content; a fourth operation of converting the classified online articles and content into speech; and a fifth operation of outputting the online articles and content using the voice of the subscriber or the specific person set in the second operation.
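
    A hypothetical end-to-end sketch of the five operations in the abstract; the article collection, classification rule, and speech conversion are stubbed out, since the abstract does not specify them, and all function and site names are assumptions.

from dataclasses import dataclass


@dataclass
class Article:
    title: str
    body: str
    category: str = "uncategorized"


def collect_articles(site: str) -> list[Article]:
    """Operation 1: collect preset online articles/content from a media site (stub)."""
    return [Article("Local news", f"Placeholder body fetched from {site}")]


def set_voice(subscriber_voice: str | None, preset_voice: str = "announcer") -> str:
    """Operation 2: use the subscriber's voice if provided, else a pre-stored voice."""
    return subscriber_voice or preset_voice


def classify(articles: list[Article]) -> list[Article]:
    """Operation 3: recognize and classify the collected content (keyword stub)."""
    for a in articles:
        a.category = "news" if "news" in a.title.lower() else "general"
    return articles


def convert_and_output(article: Article, voice: str) -> str:
    """Operations 4-5: convert the classified article to speech and output it
    in the selected voice (represented here as a descriptive string)."""
    return f"[{voice}] reads '{article.title}' ({article.category})"


if __name__ == "__main__":
    voice = set_voice(None)
    for art in classify(collect_articles("example-media-site")):
        print(convert_and_output(art, voice))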

    Systems and methods for generating a volume-based response for multiple voice-operated user devices

    Publication No.: US11481187B2

    Publication Date: 2022-10-25

    Application No.: US16738815

    Filing Date: 2020-01-09

    Applicant: Rovi Guides, Inc.

    Abstract: Systems and methods are provided herein for responding to a voice command at a volume level based on the volume level of the voice command. For example, a media guidance application may detect, through a first voice-operated user device of a plurality of voice-operated user devices, a voice command spoken by a user. The media guidance application may determine a first volume level of the voice command. Based on the first volume level of the voice command, the media guidance application may determine that a second voice-operated user device of the plurality of voice-operated user devices is closer to the user than any of the other voice-operated user devices. The media guidance application may generate an audible response, through the second voice-operated user device, at a second volume level that is set based on the first volume level of the voice command.
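
    An illustrative sketch of the volume-matching idea: the device that detected the command at the highest volume is treated as closest to the user, and the response volume is derived from the command's first volume level. The device names, decibel values, and the fixed-offset rule are assumptions, not the patented method.

from dataclasses import dataclass


@dataclass
class VoiceDevice:
    name: str
    heard_db: float  # volume at which this device detected the command


def choose_responder(devices: list[VoiceDevice]) -> VoiceDevice:
    """Pick the device that detected the command at the highest volume,
    i.e. the one presumed closest to the user."""
    return max(devices, key=lambda d: d.heard_db)


def response_volume(command_db: float, offset_db: float = 3.0) -> float:
    """Set the response volume relative to the first volume level of the command."""
    return command_db + offset_db


if __name__ == "__main__":
    devices = [
        VoiceDevice("kitchen_speaker", heard_db=52.0),
        VoiceDevice("living_room_speaker", heard_db=61.5),
        VoiceDevice("bedroom_speaker", heard_db=40.2),
    ]
    responder = choose_responder(devices)
    print(f"{responder.name} responds at {response_volume(61.5):.1f} dB")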