-
Publication No.: US20230134970A1
Publication date: 2023-05-04
Application No.: US17977360
Filing date: 2022-10-31
Applicant: Apple Inc.
Inventors: Ramya RASIPURAM, William BECKMAN, Ladan GOLIPOUR, David A. WINARSKY, Cheng-Chieh YEH, Weicheng ZHANG
IPC classification: G10L13/10, G06F40/30, G06F40/284, G10L13/033
Abstract: Systems and processes for generating audio books from text are provided. An example process includes, at an electronic device having one or more processors and memory: receiving a text including at least a first subset and a second subset, wherein at least a portion of the first subset overlaps with at least a portion of the second subset; determining, based on the text, a prosody for a speech output, wherein the prosody is representative of a genre; determining a semantic meaning of the text; and generating, based on the prosody and the semantic meaning, the speech output of the text.
-
Publication No.: US20230081659A1
Publication date: 2023-03-16
Application No.: US17799031
Filing date: 2021-02-01
Inventors: Shifeng Pan, Lei He, Chunling Ma
IPC classification: G10L13/047, G10L13/08, G10L13/033
Abstract: This disclosure provides methods and apparatuses for training an acoustic model which is for implementing cross-speaker style transfer and comprises at least a style encoder. Training data may be obtained, which comprises a text, a speaker ID, a style ID and acoustic features corresponding to a reference audio. A reference embedding vector may be generated, through the style encoder, based on the acoustic features. Adversarial training may be performed on the reference embedding vector with at least the style ID and the speaker ID, to remove speaker information and retain style information. A style embedding vector may be generated, through the style encoder, based at least on the reference embedding vector that has undergone the adversarial training. Predicted acoustic features may be generated based at least on a state sequence corresponding to the text, a speaker embedding vector corresponding to the speaker ID, and the style embedding vector.
-
Publication No.: US11600259B2
Publication date: 2023-03-07
Application No.: US16565784
Filing date: 2019-09-10
Inventors: Jie Yang
IPC classification: G10L13/027, G10L13/033, G10L13/08
Abstract: Provided are a voice synthesis method, an apparatus, a device, and a storage medium, involving obtaining text information and determining characters in the text information and a text content of each of the characters; performing a character recognition on the text content of each of the characters, to determine character attribute information of each of the characters; obtaining speakers in one-to-one correspondence with the characters according to the character attribute information of each of the characters, where the speakers are pre-stored pronunciation objects having the character attribute information; and generating multi-character synthesized voices according to the text information and the speakers corresponding to the characters of the text information. These improve the pronunciation diversity of different characters in the synthesized voices, improve an audience's discrimination between different characters in the synthesized voices, and thereby improve the user experience.
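
The speaker-selection step in the abstract above could be sketched roughly as follows. This is only an illustration and not the patented method: the attribute keys (`gender`, `age`), the function name `assign_speakers`, and the greedy "most matching attributes" scoring are all assumptions introduced here.

```python
def assign_speakers(characters, speakers):
    """Map each character to a distinct pre-stored speaker.

    characters: dict of character name -> attribute dict (hypothetical keys).
    speakers:   dict of speaker id -> attribute dict (hypothetical keys).
    Assumption: the best speaker is the unused one sharing the most
    attribute values with the character; ties go to the first candidate.
    """
    available = dict(speakers)  # speakers not yet assigned
    assignment = {}
    for name, attrs in characters.items():
        if not available:
            raise ValueError("more characters than pre-stored speakers")
        best = max(
            available,
            key=lambda sid: sum(available[sid].get(k) == v for k, v in attrs.items()),
        )
        assignment[name] = best
        del available[best]  # keep the mapping one-to-one
    return assignment


characters = {
    "Alice": {"gender": "female", "age": "child"},
    "Bob": {"gender": "male", "age": "adult"},
}
speakers = {
    "s1": {"gender": "male", "age": "adult"},
    "s2": {"gender": "female", "age": "child"},
    "s3": {"gender": "female", "age": "adult"},
}
result = assign_speakers(characters, speakers)
```

A greedy pass is the simplest way to keep the mapping one-to-one; an optimal assignment over all characters would need a matching algorithm instead.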
-
Publication No.: US20230053029A1
Publication date: 2023-02-16
Application No.: US17392570
Filing date: 2021-08-03
IPC classification: B60K35/00, G06K9/00, H04N5/76, H04N9/31, H04N7/18, G10L13/033, B60Q1/24, B60R25/01, B60R25/30, B60R25/10
Abstract: A system includes a camera aimed externally to a vehicle, a window of the vehicle, a projector positioned to project on the window, and a computer communicatively coupled to the camera and the projector. The computer is programmed to, upon receiving data from the camera indicating a first person outside the vehicle, instruct the projector to project an image on the window depicting a second person inside the vehicle.
-
Publication No.: US11546689B2
Publication date: 2023-01-03
Application No.: US17061708
Filing date: 2020-10-02
Inventors: Ranjani Rangarajan, Leah Busch
IPC classification: H04R1/40, B60R16/023, H04R1/32, H04R3/00, G10L25/18, G10L15/30, G10L19/02, G10L13/033, G10L21/0308
Abstract: The disclosure describes systems and methods for processing audio signals in a vehicle to perform sound source separation. The sound source separation is performed using transfer functions and involves separation of the speech of multiple occupants. The separated speech can be used to isolate and correctly respond to a command to control vehicle systems.
-
Publication No.: US11538456B2
Publication date: 2022-12-27
Application No.: US16844283
Filing date: 2020-04-09
Inventors: Chunjiang Lai
IPC classification: G10L13/033, G10L13/04, H04N21/81, H04N21/845
Abstract: An audio file processing method is provided for an electronic device. The method includes extracting at least one audio segment from a first audio file, recognizing at least one to-be-replaced audio segment representing a target role from the at least one audio segment, and determining time frame information of each to-be-replaced audio segment in the first audio file. The method also includes obtaining to-be-dubbed audio data for each to-be-replaced audio segment, and replacing data in the to-be-replaced audio segment with the to-be-dubbed audio data according to the time frame information, to obtain a second audio file. The at least one to-be-replaced audio segment is divided from the at least one audio segment based on a structure and a word count in a sentence corresponding to each to-be-replaced audio segment.
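
The replacement step described in the abstract above, splicing dubbed audio into the first file according to time frame information, might look roughly like this. It is a minimal sketch under stated assumptions: audio is modelled as a plain list of samples, time frame information as sample indices, and the names `Segment` and `replace_segments` are hypothetical. Segments are assumed not to overlap.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: int  # index of the first sample to replace (inclusive)
    end: int    # index one past the last sample to replace (exclusive)


def replace_segments(samples, replacements):
    """Return a second sample list with each dubbed clip spliced in.

    replacements: list of (Segment, dubbed_samples) pairs, assumed
    non-overlapping. The input list is left unmodified.
    """
    out = list(samples)
    # Work from the last segment to the first so that a dubbed clip of a
    # different length does not shift the indices of earlier segments.
    for seg, dubbed in sorted(replacements, key=lambda r: r[0].start, reverse=True):
        if not 0 <= seg.start <= seg.end <= len(out):
            raise ValueError(f"segment out of range: {seg}")
        out[seg.start:seg.end] = dubbed
    return out


first_audio = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
second_audio = replace_segments(
    first_audio,
    [(Segment(2, 4), [20, 21, 22]), (Segment(6, 8), [60])],
)
```

Replacing from the end backwards is a design choice that lets dubbed clips be longer or shorter than the originals without recomputing offsets.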
-
Publication No.: US20220392430A1
Publication date: 2022-12-08
Application No.: US17880190
Filing date: 2022-08-03
Applicant: D&M Holdings, Inc.
Inventors: Robert M. Kilgore, Maria Astrinaki
IPC classification: G10L13/10, G06F3/01, G06F3/04883, G10L13/033
Abstract: A speech to text system includes a text and labels module receiving a text input and providing a text analysis and a label with a phonetic description of the text. A label buffer receives the label from the text and labels module. A parameter generation module accesses the label from the label buffer and generates a speech generation parameter. A parameter buffer receives the parameter from the parameter generation module. An audio generation module receives the text input, the label, and/or the parameter and generates a plurality of audio samples. A scheduler monitors and schedules the text and labels module, the parameter generation module, and/or the audio generation module. The parameter generation module is further configured to initialize a voice identifier with a Voice Style Sheet (VSS) parameter, receive an input indicating a modification to the VSS parameter, and modify the VSS parameter according to the modification.
-
Publication No.: US11521593B2
Publication date: 2022-12-06
Application No.: US17076121
Filing date: 2020-10-21
Applicant: Jong Yup Lee
Inventors: Jong Yup Lee
IPC classification: G10L13/08, G10L13/00, G10L13/033, G06F16/45, G06F16/438, G10L15/183, G06Q30/06, G10L13/047, G10L15/22, G06F16/9535
Abstract: A method of embodying an online media service having a multiple voice system includes a first operation of collecting preset online articles and content from a specific media site and displaying the online articles and content on a screen of a personal terminal, a second operation of inputting a voice of a subscriber or setting a voice of a specific person among voices that are pre-stored in a database, a third operation of recognizing and classifying the online articles and content, a fourth operation of converting the classified online articles and content into speech, and a fifth operation of outputting the online articles and content using the voice of the subscriber or the specific person, which is set in the second operation.
-
Publication No.: US20220351731A1
Publication date: 2022-11-03
Application No.: US17867161
Filing date: 2022-07-18
Applicant: Google LLC
IPC classification: G10L15/22, G10L17/22, H04L67/104, G10L15/26, G10L13/00, G06F16/332, G10L15/18, G10L13/033, G10L15/30, G10L13/08, G06F21/62
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for handing off a user conversation between computer-implemented agents. One of the methods includes receiving, by a computer-implemented agent specific to a user device, a digital representation of speech encoding an utterance, determining, by the computer-implemented agent, that the utterance specifies a requirement to establish a communication with another computer-implemented agent, and establishing, by the computer-implemented agent, a communication between the other computer-implemented agent and the user device.
-
Publication No.: US11481187B2
Publication date: 2022-10-25
Application No.: US16738815
Filing date: 2020-01-09
Applicant: Rovi Guides, Inc.
Inventors: Michael McCarty, Glen E. Roe
IPC classification: G06F3/16, H03G3/32, G10L15/22, H03G3/30, G10L21/034, G10L25/21, G10L13/033, H04L12/28, G10L21/10
Abstract: Systems and methods are provided herein for responding to a voice command at a volume level based on a volume level of the voice command. For example, a media guidance application may detect, through a first voice-operated user device of a plurality of voice-operated user devices, a voice command spoken by a user. The media guidance application may determine a first volume level of the voice command. Based on the volume level of the voice command, the media guidance application may determine that a second voice-operated user device of the plurality of voice-operated user devices is closer to the user than any of the other voice-operated user devices. The media guidance application may generate an audible response, through the second voice-operated user device, at a second volume level that is set based on the first volume level of the voice command.
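
The device selection and response-volume logic in the abstract above could be approximated as follows. This is a hedged sketch rather than the patented method: it assumes each device reports the command's level in decibels, that the loudest reading indicates the closest device, and that the response volume is the command volume clamped to a range. The function names and the 30 to 80 dB bounds are hypothetical.

```python
def pick_closest_device(levels_db):
    """Return the id of the device that heard the command loudest.

    levels_db: dict of device id -> measured command level in dB.
    Assumption: a louder reading means the user is closer to that device.
    """
    if not levels_db:
        raise ValueError("no device reported a volume level")
    return max(levels_db, key=levels_db.get)


def response_volume(command_db, min_db=30.0, max_db=80.0):
    """Set the response volume from the command volume, clamped to a range.

    The bounds are illustrative defaults, not values from the source.
    """
    return min(max(command_db, min_db), max_db)


levels = {"kitchen": 48.0, "living_room": 62.5, "bedroom": 40.0}
device = pick_closest_device(levels)
volume = response_volume(levels[device])
```

Clamping keeps a whispered or shouted command from producing an inaudible or excessively loud response.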