-
Publication No.: US12125498B2
Publication Date: 2024-10-22
Application No.: US17570557
Application Date: 2022-01-07
Inventors: Seungbeom Ryu, Sungjae Park, Hyuk Oh, Myeungyong Choi, Junkwon Choi
CPC Classification: G10L25/84, G06N3/045, G10L15/02, G10L15/16, G10L15/22, H04R1/08, H04R3/00, G10L2015/223, H04R2420/07
Abstract: According to various embodiments, an electronic device may include: a microphone; an audio connector; a wireless communication circuit; a processor operatively connected to the microphone, the audio connector, and the wireless communication circuit; and a memory operatively connected to the processor, wherein the memory may store instructions that, when executed, cause the processor to: receive a first audio signal through the microphone, the audio connector, or the wireless communication circuit, extract audio feature information from the first audio signal, and recognize a speech section in a second audio signal, received after the first audio signal through the microphone, the audio connector, or the wireless communication circuit, using the audio feature information.
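For illustration only, the following minimal Python sketch captures the idea in the abstract: summarize a first audio signal into feature statistics, then use those statistics to recognize speech sections in a second signal. The frame-energy features, thresholding rule, and 16 kHz framing are stand-ins for the unspecified "audio feature information", not the patented method.

```python
# Hypothetical sketch: calibrate speech-section detection on a second signal
# using feature statistics extracted from a first signal. Frame energy stands
# in for the abstract's unspecified "audio feature information".
import numpy as np

def frame_energies(signal: np.ndarray, frame_len: int = 400, hop: int = 160) -> np.ndarray:
    """Short-time log energy per frame (assumes 16 kHz mono input)."""
    n_frames = max(0, 1 + (len(signal) - frame_len) // hop)
    frames = np.stack([signal[i * hop: i * hop + frame_len] for i in range(n_frames)])
    return np.log(np.sum(frames ** 2, axis=1) + 1e-10)

def extract_feature_info(first_signal: np.ndarray) -> dict:
    """Summarize the first audio signal (e.g., its ambient energy level)."""
    e = frame_energies(first_signal)
    return {"noise_mean": float(e.mean()), "noise_std": float(e.std())}

def recognize_speech_sections(second_signal: np.ndarray, feature_info: dict,
                              k: float = 3.0) -> np.ndarray:
    """Mark frames of the second signal whose energy exceeds the calibrated floor."""
    e = frame_energies(second_signal)
    threshold = feature_info["noise_mean"] + k * feature_info["noise_std"]
    return e > threshold  # boolean mask of speech frames

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = 0.01 * rng.standard_normal(16000)                      # first signal: noise only
    speech = np.concatenate([noise, 0.3 * rng.standard_normal(8000), noise])
    info = extract_feature_info(noise)
    print(recognize_speech_sections(speech, info).astype(int))
```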
-
Publication No.: US12125496B1
Publication Date: 2024-10-22
Application No.: US18644959
Application Date: 2024-04-24
Applicant: Sanas.ai Inc.
Inventors: Shawn Zhang, Lukas Pfeifenberger, Jason Wu, Piotr Dura, David Braude, Bajibabu Bollepalli, Alvaro Escudero, Gokce Keskin, Ankita Jha, Maxim Serebryakov
CPC Classification: G10L21/0232, G10L15/02, G10L15/063, G10L25/30, G10L15/16, G10L15/22
Abstract: The disclosed technology relates to methods, voice enhancement systems, and non-transitory computer readable media for real-time voice enhancement. In some examples, input audio data including foreground speech content, non-content elements, and speech characteristics is fragmented into input speech frames. The input speech frames are converted to low-dimensional representations of the input speech frames. One or more of the fragmentation or the conversion is based on an application of a first trained neural network to the input audio data. The low-dimensional representations of the input speech frames omit one or more of the non-content elements. A second trained neural network is applied to the low-dimensional representations of the input speech frames to generate target speech frames. The target speech frames are combined to generate output audio data. The output audio data further includes one or more portions of the foreground speech content and one or more of the speech characteristics.
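As a loose sketch of the two-network pipeline described above (not the claimed models), the Python fragment below frames the audio, maps the frames to low-dimensional representations with one network, and generates target frames with a second network before recombining them. The frame length, latent size, and untrained linear networks are assumptions.

```python
# Illustrative only: a two-stage pipeline in which one network maps input speech
# frames to low-dimensional representations and a second network maps those
# representations to target frames, which are then concatenated into output audio.
import torch
import torch.nn as nn

FRAME = 320   # assumed frame length in samples
LATENT = 64   # assumed low-dimensional representation size

encoder = nn.Sequential(nn.Linear(FRAME, 128), nn.ReLU(), nn.Linear(128, LATENT))
generator = nn.Sequential(nn.Linear(LATENT, 128), nn.ReLU(), nn.Linear(128, FRAME))

def enhance(audio: torch.Tensor) -> torch.Tensor:
    """audio: 1-D waveform tensor; returns output audio rebuilt from target frames."""
    n_frames = audio.numel() // FRAME
    frames = audio[: n_frames * FRAME].reshape(n_frames, FRAME)   # fragmentation
    with torch.no_grad():
        latents = encoder(frames)          # low-dimensional representations
        targets = generator(latents)       # target speech frames
    return targets.reshape(-1)             # combine frames into output audio data

if __name__ == "__main__":
    print(enhance(torch.randn(16000)).shape)   # torch.Size([16000])
```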
-
Publication No.: US12124998B2
Publication Date: 2024-10-22
Application No.: US18476712
Application Date: 2023-09-28
Applicant: Asana, Inc.
Inventor: Steve B. Morin
CPC Classification: G06Q10/103, G06F3/0486, G06F40/40, G06Q10/06, G06T7/33, G10L15/005
Abstract: Systems and methods to generate records within a collaboration environment are described herein. Exemplary implementations may perform one or more of: manage environment state information maintaining a collaboration environment; obtain input information defining digital assets representing sets of content input via a user interface; generate content information characterizing the sets of content represented in the digital assets; generate individual records based on the content information; and/or other operations.
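A hypothetical Python sketch of the described flow, with invented class and field names: digital assets carrying user-input content are characterized into content information, and individual records are generated from that information.

```python
# Loose sketch only; the class names, fields, and characterization rule are
# assumptions, not the patent's terms.
from dataclasses import dataclass
from typing import List

@dataclass
class DigitalAsset:
    asset_id: str
    content: str          # set of content input via a user interface

@dataclass
class Record:
    record_id: str
    title: str
    body: str

def characterize(asset: DigitalAsset) -> dict:
    """Generate content information characterizing the asset's content."""
    first_line = asset.content.strip().splitlines()[0] if asset.content.strip() else ""
    return {"asset_id": asset.asset_id, "title": first_line, "body": asset.content}

def generate_records(assets: List[DigitalAsset]) -> List[Record]:
    """Generate individual records based on the content information."""
    return [Record(record_id=f"rec-{i}", title=info["title"], body=info["body"])
            for i, info in enumerate(map(characterize, assets))]

if __name__ == "__main__":
    print(generate_records([DigitalAsset("a1", "Draft launch plan\nOwner: TBD")]))
```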
-
Publication No.: US12119008B2
Publication Date: 2024-10-15
Application No.: US17655441
Application Date: 2022-03-18
Inventors: Samuel Thomas, Vishal Sunder, Hong-Kwang Kuo, Jatin Ganhotra, Brian E. D. Kingsbury, Eric Fosler-Lussier
IPC Classification: G10L19/00, G06F40/126, G06N3/045, G10L15/00
CPC Classification: G10L19/00, G06F40/126, G06N3/045, G10L15/00
Abstract: Systems, computer-implemented methods, and computer program products to facilitate end-to-end integration of dialogue history for spoken language understanding are provided. According to an embodiment, a system can comprise a processor that executes components stored in memory. The computer executable components comprise a conversation component that encodes speech-based content of an utterance and text-based content of the utterance into a uniform representation.
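The following is an assumption-laden sketch, not the disclosed architecture: separate speech and text encoders project an utterance's audio frames and transcript tokens into a shared fixed-size space, yielding a single "uniform representation".

```python
# Minimal illustration of encoding speech-based and text-based content of one
# utterance into a shared vector; dimensions and pooling are assumptions.
import torch
import torch.nn as nn

DIM = 256  # assumed size of the uniform representation

speech_proj = nn.Linear(80, DIM)        # 80-dim filterbank frames (assumption)
token_embed = nn.Embedding(10000, DIM)  # vocabulary size is an assumption

def encode_utterance(speech_frames: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
    """Encode speech-based and text-based content of one utterance into one vector."""
    speech_vec = speech_proj(speech_frames).mean(dim=0)   # (T, 80) -> (DIM,)
    text_vec = token_embed(token_ids).mean(dim=0)         # (N,)    -> (DIM,)
    return 0.5 * (speech_vec + text_vec)                  # uniform representation

if __name__ == "__main__":
    rep = encode_utterance(torch.randn(120, 80), torch.randint(0, 10000, (12,)))
    print(rep.shape)  # torch.Size([256])
```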
-
Publication No.: US12118981B2
Publication Date: 2024-10-15
Application No.: US17475897
Application Date: 2021-09-15
Applicant: GOOGLE LLC
CPC Classification: G10L13/086, G10L15/22, G10L2015/223, G10L2015/225
Abstract: Implementations relate to determining multilingual content to render at an interface in response to a user-submitted query. Those implementations further relate to determining a first language response and a second language response to a query that is submitted to an automated assistant. Some of those implementations relate to determining multilingual content that includes a response to the query in both the first and second languages. Other implementations relate to determining multilingual content that includes a query suggestion in the first language and a query suggestion in a second language. Some of those implementations relate to pre-fetching results for the query suggestions prior to rendering the multilingual content.
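A rough Python sketch under stated assumptions: given a query and a language pair, assemble multilingual content containing a response and a query suggestion in each language, pre-fetching suggestion results before rendering. The helper functions are hypothetical stand-ins, not a real assistant API.

```python
# Hypothetical flow only: answer_query, suggest, and fetch_results are placeholders.
from typing import Dict

def answer_query(query: str, lang: str) -> str:
    return f"[{lang}] answer to: {query}"            # placeholder assistant response

def suggest(query: str, lang: str) -> str:
    return f"[{lang}] related to: {query}"           # placeholder query suggestion

def fetch_results(suggestion: str) -> str:
    return f"prefetched results for '{suggestion}'"  # placeholder search call

def build_multilingual_content(query: str, first_lang: str, second_lang: str) -> Dict:
    langs = (first_lang, second_lang)
    suggestions = {lang: suggest(query, lang) for lang in langs}
    return {
        "responses": {lang: answer_query(query, lang) for lang in langs},
        "suggestions": suggestions,
        # pre-fetch results for the query suggestions prior to rendering the content
        "prefetched": {lang: fetch_results(s) for lang, s in suggestions.items()},
    }

if __name__ == "__main__":
    print(build_multilingual_content("weather tomorrow", "en", "es"))
```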
-
Publication No.: US12118978B2
Publication Date: 2024-10-15
Application No.: US18387211
Application Date: 2023-11-06
Applicant: ROVI GUIDES, INC.
IPC Classification: G10L13/06, G10L13/00, G10L13/02, G10L13/033, G10L13/08, G10L15/00, G10L15/10, G10L15/16, G10L15/18, G10L15/22, G10L15/26, G10L25/63
CPC Classification: G10L13/0335, G10L25/63, G10L13/00, G10L13/02, G10L13/06, G10L13/08, G10L15/00, G10L15/10, G10L15/16, G10L15/18, G10L15/22, G10L15/26
Abstract: The system provides a synthesized speech response to a voice input, based on the prosodic character of the voice input. The system receives the voice input and calculates at least one prosodic metric of the voice input. The at least one prosodic metric can be associated with a word, a phrase, a grouping thereof, or the entire voice input. The system also determines a response to the voice input, which may include the sequence of words that form the response. The system generates the synthesized speech response by determining prosodic characteristics based on the response and on the prosodic character of the voice input. The system outputs the synthesized speech response, which provides a more natural answer, a more relevant answer, or both, to the call of the voice input. The prosodic character of the voice input and/or response may include pitch, note, duration, prominence, timbre, rate, and rhythm, for example.
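A sketch of the idea only, with simplified metrics: compute coarse prosodic measurements of the voice input (a voiced-frame rate and a zero-crossing pitch proxy), then map them to prosodic settings for the synthesized response. Both the metrics and the mapping are assumptions, not the system's actual prosody model.

```python
# Simplified prosody sketch: crude activity and pitch proxies drive the prosodic
# characteristics chosen for the synthesized speech response.
import numpy as np

def prosodic_metrics(signal: np.ndarray, sr: int = 16000) -> dict:
    frame = sr // 50                                   # 20 ms frames
    n = len(signal) // frame
    frames = signal[: n * frame].reshape(n, frame)
    energy = (frames ** 2).mean(axis=1)
    voiced = energy > 2.0 * np.median(energy)          # crude activity proxy
    zc = (np.diff(np.sign(frames), axis=1) != 0).sum(axis=1) * (sr / (2 * frame))
    return {
        "rate": float(voiced.mean()),                  # proportion of voiced frames
        "pitch_proxy_hz": float(zc[voiced].mean()) if voiced.any() else 0.0,
    }

def response_prosody(input_metrics: dict) -> dict:
    """Map input prosody to synthesis controls (speaking rate, pitch target)."""
    return {
        "speaking_rate": 0.8 + 0.4 * input_metrics["rate"],
        "pitch_target_hz": max(90.0, 0.9 * input_metrics["pitch_proxy_hz"]),
    }

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    silence = 0.005 * rng.standard_normal(8000)
    tone = 0.2 * np.sin(2 * np.pi * 180 * np.arange(8000) / 16000)
    voice_input = np.concatenate([silence, tone, silence])
    m = prosodic_metrics(voice_input)
    print(m, response_prosody(m))
```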
-
Publication No.: US12118976B1
Publication Date: 2024-10-15
Application No.: US18622365
Application Date: 2024-03-29
Inventors: Boyu Chen, Peike Li, Yao Yao, Yijun Wang
IPC Classification: G10L13/027, G10L15/00
CPC Classification: G10L13/027
Abstract: The method involves configuring a pretrained text-to-music AI model that includes a neural network implementing a diffusion model. The process includes receiving audio sample data corresponding to a specific audio concept, generating a concept identifier token based on the audio sample data, adapting a loss function of the diffusion model based on the concept identifier token, selecting pivotal parameters in weight matrices in a self-attention layer of the neural network of the AI model based on the audio sample data, and further training the pivotal parameters of the AI model to optimize the AI model for the specific audio concept.
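A highly simplified, hypothetical Python sketch of the listed steps: introduce a learnable concept-identifier token, select "pivotal" entries of a self-attention weight matrix (here by magnitude, a selection rule assumed purely for illustration), and fine-tune only those entries by masking gradients. The toy linear layer and reconstruction-style loss stand in for the diffusion model and its adapted loss.

```python
# Toy sketch: concept token + pivotal-parameter fine-tuning via a gradient mask.
import torch
import torch.nn as nn

torch.manual_seed(0)
attn_q = nn.Linear(64, 64, bias=False)                 # stand-in self-attention W_q
concept_token = nn.Parameter(torch.randn(64) * 0.01)   # concept identifier embedding

# Select pivotal parameters: here, the top 1% of |W| entries (assumed selection rule).
with torch.no_grad():
    k = int(0.01 * attn_q.weight.numel())
    thresh = attn_q.weight.abs().flatten().topk(k).values.min()
    pivotal_mask = (attn_q.weight.abs() >= thresh).float()

opt = torch.optim.Adam([attn_q.weight, concept_token], lr=1e-3)
audio_feats = torch.randn(32, 64)                      # assumed features of the audio samples

for step in range(100):
    opt.zero_grad()
    pred = attn_q(audio_feats + concept_token)         # condition on the concept token
    loss = ((pred - audio_feats) ** 2).mean()          # toy stand-in for the adapted loss
    loss.backward()
    attn_q.weight.grad *= pivotal_mask                 # update only pivotal entries
    opt.step()
print(float(loss))
```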
-
Publication No.: US12106747B2
Publication Date: 2024-10-01
Application No.: US18095804
Application Date: 2023-01-11
IPC Classification: G10L15/00, G06F40/263, H04H20/59, H04H20/86, H04H60/58, H04N21/233, H04N21/2362
CPC Classification: G10L15/005, G06F40/263, H04H20/59, H04H20/86, H04H60/58, H04N21/233, H04N21/2362
Abstract: A device may be configured to parse a syntax element specifying the number of available languages within a presentation associated with an audio stream. A device may be configured to parse one or more syntax elements identifying each of the available languages and parse an accessibility syntax element for each language within the presentation.
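Since the abstract does not give the actual bitstream syntax, the sketch below assumes a made-up layout purely for illustration: one count byte for the number of available languages, followed by a three-character language identifier and one accessibility byte per language.

```python
# Hypothetical syntax layout (not the real, codec-specific one): parse the language
# count, then a language identifier and an accessibility element for each language.
import struct
from typing import List, Tuple

def parse_presentation_languages(buf: bytes) -> List[Tuple[str, int]]:
    (num_languages,) = struct.unpack_from("B", buf, 0)       # number of available languages
    offset, out = 1, []
    for _ in range(num_languages):
        code = buf[offset: offset + 3].decode("ascii")        # language identifier
        (accessibility,) = struct.unpack_from("B", buf, offset + 3)
        out.append((code, accessibility))
        offset += 4
    return out

if __name__ == "__main__":
    payload = bytes([2]) + b"eng" + bytes([0]) + b"spa" + bytes([1])
    print(parse_presentation_languages(payload))   # [('eng', 0), ('spa', 1)]
```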
-
Publication No.: US12100385B2
Publication Date: 2024-09-24
Application No.: US17237258
Application Date: 2021-04-22
Inventor: David Peace Hung
CPC Classification: G10L15/005, G06F40/58, G10L15/26, G10L15/32
Abstract: Systems are provided for multilingual speech data processing. A language identification module is configured to analyze spoken language utterances in an audio stream and to detect at least one language corresponding to the spoken language utterances. The language identification module detects that a first language corresponds to a first portion of the audio stream. A first transcription of the first portion of the audio stream in the first language is generated and stored in a cache. A second transcription of a second portion of the audio stream in the first language is also generated and stored. When the second portion of the audio stream corresponds to a second language, a third transcription is generated in the second language using a second speech recognition engine configured to transcribe spoken language utterances in the second language. Then, the second transcription is replaced with the third transcription in the cache and in any displayed instances.
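A simplified Python sketch with hypothetical engine stubs: successive portions of the stream are transcribed in an assumed language and cached; when the language identification step indicates a portion was actually in a second language, the cached and displayed transcription for that portion is replaced.

```python
# Cache-replacement sketch; transcribe() is a placeholder for per-language engines.
from typing import Dict

def transcribe(portion: str, lang: str) -> str:
    return f"<{lang} transcript of {portion}>"      # stand-in for a speech recognition engine

cache: Dict[int, str] = {}
display: Dict[int, str] = {}

def process_portion(index: int, portion: str, detected_lang: str, assumed_lang: str) -> None:
    text = transcribe(portion, assumed_lang)
    cache[index] = display[index] = text
    if detected_lang != assumed_lang:               # portion turns out to be in another language
        corrected = transcribe(portion, detected_lang)
        cache[index] = display[index] = corrected   # replace cached and displayed instances

if __name__ == "__main__":
    process_portion(0, "portion-1", detected_lang="en", assumed_lang="en")
    process_portion(1, "portion-2", detected_lang="fr", assumed_lang="en")
    print(cache)
```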
-
Publication No.: US12087298B2
Publication Date: 2024-09-10
Application No.: US17979078
Application Date: 2022-11-02
Inventors: Donghyeon Lee, Seonghan Ryu, Yubin Seo, Eunji Lee, Sungja Choi, Jiyeon Hong, Sechun Kang, Yongjin Cho, Seungchul Lee
Abstract: Disclosed is an electronic device. The electronic device may execute an application for transmitting and receiving at least one of text data or voice data with another electronic device using the communication module in response to the occurrence of at least one event; based on receiving at least one of text data or voice data from the other electronic device, identify, using a digital assistant, that a confirmation is necessary based on the at least one of text data or voice data being generated based on a characteristic of an utterance; generate a notification to request confirmation using the digital assistant when confirmation is necessary; and output the notification using the application.
A method for identifying that a confirmation is necessary may include analyzing the voice data or text data received from the other electronic device using a rule-based or AI algorithm. When whether a confirmation is necessary is determined using the AI algorithm, the method may use a machine learning, neural network, or deep learning algorithm.
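A minimal rule-based sketch of such a check follows; the trigger keywords and the assistant-origin condition are assumptions, and an ML classifier could replace the keyword rule as described above.

```python
# Hypothetical rule-based check for whether an incoming assistant-generated message
# needs a confirmation notification.
CONFIRMATION_TRIGGERS = ("transfer", "payment", "schedule", "delete", "purchase")

def confirmation_necessary(message_text: str, from_digital_assistant: bool) -> bool:
    """Rule-based decision; a trained classifier could stand in for the keyword rule."""
    if not from_digital_assistant:
        return False
    text = message_text.lower()
    return any(trigger in text for trigger in CONFIRMATION_TRIGGERS)

if __name__ == "__main__":
    print(confirmation_necessary("I will schedule the payment for Friday.", True))   # True
    print(confirmation_necessary("See you at lunch!", True))                          # False
```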
-