Patent search: ipc:G10L13/04 (page 1)

1.

Granted invention patent
Multiple wake words for systems with multiple smart assistants (Status: in force)

Publication (announcement) No.: US12131523B2

Publication (announcement) date: 2024-10-29

Application No.: US17182951

Filing date: 2021-02-23

Applicant: Meta Platforms, Inc.

Inventors: Xiaohu Liu, Baiyang Liu, Rajen Subba

IPC classification: G06V10/82, G06F3/01, G06F3/16, G06F7/14, G06F9/451, G06F16/176, G06F16/22, G06F16/23, G06F16/242, G06F16/2455, G06F16/2457, G06F16/248, G06F16/33, G06F16/332, G06F16/338, G06F16/903, G06F16/9032, G06F16/9038, G06F16/904, G06F16/951, G06F16/9535, G06F18/2411, G06F40/205, G06F40/295, G06F40/30, G06F40/40, G06N3/006, G06N3/08, G06N7/01, G06N20/00, G06Q50/00, G06V10/764, G06V20/10, G06V40/20, G10L15/02, G10L15/06, G10L15/07, G10L15/16, G10L15/18, G10L15/183, G10L15/187, G10L15/22, G10L15/26, G10L17/06, G10L17/22, H04L5/02, H04L12/28, H04L41/00, H04L41/22, H04L43/0882, H04L43/0894, H04L51/02, H04L51/18, H04L51/216, H04L51/52, H04L67/306, H04L67/50, H04L67/5651, H04L67/75, H04W12/08, G10L13/00, G10L13/04, H04L51/046, H04L67/10, H04L67/53

CPC classification: G06V10/82, G06F3/011, G06F3/013, G06F3/017, G06F3/167, G06F7/14, G06F9/453, G06F16/176, G06F16/2255, G06F16/2365, G06F16/243, G06F16/24552, G06F16/24575, G06F16/24578, G06F16/248, G06F16/3323, G06F16/3329, G06F16/3344, G06F16/338, G06F16/90332, G06F16/90335, G06F16/9038, G06F16/904, G06F16/951, G06F16/9535, G06F18/2411, G06F40/205, G06F40/295, G06F40/30, G06F40/40, G06N3/006, G06N3/08, G06N7/01, G06N20/00, G06Q50/01, G06V10/764, G06V20/10, G06V40/28, G10L15/02, G10L15/063, G10L15/07, G10L15/16, G10L15/1815, G10L15/1822, G10L15/183, G10L15/187, G10L15/22, G10L15/26, G10L17/06, G10L17/22, H04L5/02, H04L12/2816, H04L41/20, H04L41/22, H04L43/0882, H04L43/0894, H04L51/02, H04L51/18, H04L51/216, H04L51/52, H04L67/306, H04L67/535, H04L67/5651, H04L67/75, H04W12/08, G06F2216/13, G10L13/00, G10L13/04, G10L2015/223, G10L2015/225, H04L51/046, H04L67/10, H04L67/53

Abstract: In one embodiment, a method includes by a client system associated with a user, receiving, at the client system, a user input from the user, parsing, by the client system, the first user input to identify a request to execute a function to be performed by an assistant system of several assistant systems associated with the client system, determining whether the user is authorized to access the assistant system by comparing a voiceprint of the user to several voiceprints stored on the client system, sending, from the client system to the assistant system in response to determining the user is authorized to access the assistant system, a request to set an assistant xbot of the assistant system into a listening mode, and receiving, at the client system from the assistant system, an indication that the assistant xbot is in listening mode.
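The voiceprint check in the abstract above could be sketched roughly as follows. This is an illustration only: the abstract does not say how voiceprints are represented or compared, so the fixed-length vectors, the cosine-similarity measure, the `threshold` value, and the names `cosine_similarity` and `is_authorized` are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_authorized(voiceprint, stored_voiceprints, threshold=0.9):
    """Return True if the voiceprint is close enough to any stored one.

    The vector representation and the threshold are assumptions made for
    this sketch; the patent abstract does not specify them.
    """
    return any(
        cosine_similarity(voiceprint, stored) >= threshold
        for stored in stored_voiceprints
    )


# Hypothetical voiceprints stored on the client system.
stored = [[1.0, 0.0], [0.0, 1.0]]
print(is_authorized([0.99, 0.05], stored))  # close to the first stored voiceprint
print(is_authorized([0.7, 0.7], stored))    # not close to either stored voiceprint
```

Only when such a check passes would the client send the request that puts the assistant into listening mode.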

2.

Published invention application
PSEUDOTELEPATHY HEADSET (Status: under examination, published)

Publication (announcement) No.: US20240347036A1

Publication (announcement) date: 2024-10-17

Application No.: US18638155

Filing date: 2024-04-17

Applicant: University of Utah Research Foundation

Inventors: Nicholas S. WITHAM, Juan Pablo BOTERO TORRES, Colleen CHEMERKA, Tanner KRONE, Rami SHORTI, Thomas ODELL

IPC classification: G10L13/04, G06F3/01

CPC classification: G10L13/04, G06F3/012

Abstract: A system for enabling conversion of speech pantomimes of a user into synthesized speech includes a headset connected to an artificial intelligence network hosted on a computing device. The headset can include an array of distance measurement devices distributed adjacent facial regions of the user associated with speech. The system can further include a microphone and a speaker. Sensor data captured by the distance measuring devices and audio data captured by the microphone are used to train the artificial intelligence network to correlate speech pantomimes of the user with phonemes. The system can output synthesized speech generated from the phonemes through the speaker.

3.

Published invention application
SYSTEMS AND METHODS FOR AUTOMATIC GENERATION OF LISTENING TEST AUDIO FROM AUDIOSCRIPT (Status: under examination, published)

Publication (announcement) No.: US20240312450A1

Publication (announcement) date: 2024-09-19

Application No.: US18606270

Filing date: 2024-03-15

Applicant: NANJING CONCEPTIVE ARTS DIGITAL TECHNOLOGY CO., LTD.

Inventors: Xiaoyi ZHANG

IPC classification: G10L13/08, G06F18/241, G10L13/04, G10L13/07

CPC classification: G10L13/08, G06F18/241, G10L13/04, G10L13/07

Abstract: A system, computer-readable storage medium, and computer-implemented method for automatically generating listening test audio from audioscript. The audioscript is managed by sections, and each section is parsed and section-configuration generated accordingly. All the section-configurations are composed to a configuration of the audioscript, or said, a configuration. The configuration can be transmitted to a client device such that the configuration is viewable and the parameter values in the configuration can be set/reviewed through a graphical user interface. Responsive to receiving the fulfilled configuration, each section-configuration in the configuration can be applied to the corresponding section to generate the audio and/or audio-generation-script of that section. The complete audio and/or audio-generation-script of the audioscript is generated by concatenating all the audio and/or audio-generation-script of each section.

4.

Published invention application
SYNCHRONIZATION METHOD AND APPARATUS FOR AUDIO AND TEXT, DEVICE, AND MEDIUM (Status: under examination, published)

Publication (announcement) No.: US20240169972A1

Publication (announcement) date: 2024-05-23

Application No.: US18283433

Filing date: 2022-02-15

Applicant: Beijing Bytedance Network Technology Co., Ltd.

Inventors: Jiaxin XIONG, Hong FENG, Hao ZENG, Tongxin ZHANG

IPC classification: G10L13/04, G10L21/055

CPC classification: G10L13/04, G10L21/055

Abstract: Provided are a synchronization method and apparatus for audio and text, a device, and a medium. The method includes: determining a plurality of first text segments for audio conversion and a second text for reading display, in which the plurality of first text segments and the second text are from an initial text; converting the plurality of first text segments into audio segments, to obtain a first mapping relationship between the first text segments and the audio segments; performing matching on the first text segments and the second text, to obtain a second mapping relationship between the first text segments and second text segments in the second text; determining the second text segment synchronized with each of the audio segments based on the first mapping relationship and the second mapping relationship.
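The last step of the abstract above amounts to composing the two mapping relationships. A minimal sketch, assuming each mapping is a plain dictionary keyed by the index of a first text segment; the function name `sync_audio_to_display` and the sample data are hypothetical and not taken from the patent.

```python
def sync_audio_to_display(first_to_audio, first_to_second):
    """Compose the two mappings: audio segment -> second (display) text segment."""
    audio_to_second = {}
    for first_id, audio_id in first_to_audio.items():
        # A first text segment may have no match in the display text
        # (for example, text that is read aloud but not displayed).
        if first_id in first_to_second:
            audio_to_second[audio_id] = first_to_second[first_id]
    return audio_to_second


# Hypothetical data: three first text segments converted to audio,
# two of which match a segment of the display text.
first_to_audio = {0: "a0.wav", 1: "a1.wav", 2: "a2.wav"}
first_to_second = {0: "Chapter 1", 2: "It was a dark night."}

result = sync_audio_to_display(first_to_audio, first_to_second)
print(result)
```

Keying both mappings on the first text segment makes the composition a single pass; audio segments without a display counterpart are simply left out of the result.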

5.

Published invention application
METHOD AND ELECTRONIC DEVICE FOR TRANSLATING SPEECH SIGNAL (Status: under examination, published)

Publication (announcement) No.: US20240161731A1

Publication (announcement) date: 2024-05-16

Application No.: US18415166

Filing date: 2024-01-17

Applicant: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Ji-sang YU, Sang-ha Kim, Jong-youb Ryu, Yoon-jung Choi, Eun-kyoung Kim, Jae-won Lee

IPC classification: G10L15/00, G06F40/58, G10L13/04

CPC classification: G10L15/005, G06F40/58, G10L13/04, G10L15/26

Abstract: A method and an electronic device for translating a speech signal between a first language and a second language with minimized translation delay by translating fewer than all words of the speech signal according to a level of understanding of the second language by a user that receives the translation.

6.

Granted invention patent
Systems and methods to alter voice interactions (Status: in force)

Publication (announcement) No.: US11984112B2

Publication (announcement) date: 2024-05-14

Application No.: US17244659

Filing date: 2021-04-29

Applicant: Rovi Guides, Inc.

Inventors: Ankur Anil Aher, Jeffry Copps Robert Jose, Reda Harb

IPC classification: G10L15/22, G10L13/027, G10L13/04

CPC classification: G10L13/027, G10L13/04

Abstract: Systems and methods are disclosed for providing voice interactions based on user context. Data is received that causes a voice interaction to be generated for output at a user device. In response, current user contextual data of the user device is retrieved. A user availability level for consuming the voice interaction is determined based on the current user contextual data. The voice interaction is altered based on the user availability level. Content of the voice interaction may be altered to be suitable for consumption. The altered voice interaction is outputted at the user device.

7.

Granted invention patent
Systems and methods for communicating notifications and textual data associated with applications (Status: in force)

Publication (announcement) No.: US11954403B1

Publication (announcement) date: 2024-04-09

Application No.: US18162735

Filing date: 2023-02-01

Applicant: Google Technology Holdings LLC

Inventors: Long Peng, Hui Dai, Xin Guan

IPC classification: G06F3/0484, G06F3/0488, G06F3/16, G10L13/04

CPC classification: G06F3/165, G06F3/0484, G06F3/0488, G10L13/04

Abstract: Embodiments are provided for communicating notifications and other textual data associated with applications installed on an electronic device. According to certain aspects, a user can interface with an input device to send (218) a wake up trigger to the electronic device. The electronic device retrieves (222) application notifications and converts (288) the application notifications to audio data. The electronic device also sends (230) the audio data to an audio output device for annunciation (232). The user may also use the input device to send (242) a request to the electronic device to activate the display screen. The electronic device identifies (248) an application corresponding to an annunciated notification, and activates (254) the display screen and initiates the application.

8.

Published invention application
SYNTHESIS OF SPEECH FROM TEXT IN A VOICE OF A TARGET SPEAKER USING NEURAL NETWORKS (Status: under examination, published)

Publication (announcement) No.: US20240112667A1

Publication (announcement) date: 2024-04-04

Application No.: US18525475

Filing date: 2023-11-30

Applicant: Google LLC

Inventors: Ye Jia, Zhifeng Chen, Yonghui Wu, Jonathan Shen, Ruoming Pang, Ron J. Weiss, Ignacio Lopez Moreno, Fei Ren, Yu Zhang, Quan Wang, Patrick An Phu Nguyen

IPC classification: G10L13/04, G10L17/04, G10L19/00

CPC classification: G10L13/04, G10L17/04, G10L19/00, G06N3/08

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech synthesis. The methods, systems, and apparatus include actions of obtaining an audio representation of speech of a target speaker, obtaining input text for which speech is to be synthesized in a voice of the target speaker, generating a speaker vector by providing the audio representation to a speaker encoder engine that is trained to distinguish speakers from one another, generating an audio representation of the input text spoken in the voice of the target speaker by providing the input text and the speaker vector to a spectrogram generation engine that is trained using voices of reference speakers to generate audio representations, and providing the audio representation of the input text spoken in the voice of the target speaker for output.

9.

Granted invention patent
Speech transcription from facial skin movements (Status: in force)

Publication (announcement) No.: US11922946B2

Publication (announcement) date: 2024-03-05

Application No.: US18181787

Filing date: 2023-03-10

Applicant: Q (CUE) LTD.

Inventors: Aviad Maizels, Avi Barliya, Yonatan Wexler

IPC classification: G10L15/25, G02B27/00, G06F40/58, G06V10/141, G06V10/60, G06V10/82, G06V40/16, G10L13/02, G10L13/027, G10L13/04, G10L15/16, G10L15/26, G06F3/01

CPC classification: G10L15/26, G02B27/0093, G06F40/58, G06V10/141, G06V10/60, G06V10/82, G06V40/171, G06V40/174, G06V40/176, G10L13/02, G10L13/027, G10L13/04, G10L15/16, G10L15/25, G06F3/015

Abstract: Systems and methods are disclosed for determining textual transcription from minute facial skin movements. In one implementation, a system may include at least one coherent light source, at least one sensor configured to receive light reflections from the at least one coherent light source; and a processor configured to control the at least one coherent light source to illuminate a region of a face of a user. The processor may receive from the at least one sensor, reflection signals indicative of coherent light reflected from the face in a time interval. The reflection signals may be analyzed to determine minute facial skin movements in the time interval. Then, based on the determined minute facial skin movements in the time interval, the processor may determine a sequence of words associated with the minute facial skin movements, and output a textual transcription corresponding with the determined sequence of words.

10.

Published invention application
APPARATUS AND METHOD FOR MANAGING USER INPUTS IN VIDEO GAMES (Status: under examination, published)

Publication (announcement) No.: US20240066396A1

Publication (announcement) date: 2024-02-29

Application No.: US18500338

Filing date: 2023-11-02

Applicant: STEELSERIES ApS

Inventors: Michael Aronzon

IPC classification: A63F13/355, A63F13/215, A63F13/25, A63F13/335, A63F13/35, A63F13/40, A63F13/87, G06F3/0489, G06F9/451, G06F40/263, G06F40/58, G10L13/00, G10L13/04

CPC classification: A63F13/355, A63F13/215, A63F13/25, A63F13/335, A63F13/35, A63F13/40, A63F13/87, G06F3/0489, G06F9/454, G06F40/263, G06F40/58, G10L13/00, G10L13/04, A63F13/54

Abstract: A system that incorporates teachings of the present disclosure may include, for example, a computing device having a controller to obtain a user input that was inputted into a first accessory operably coupled with the computing device where the first accessory provides a user interface for user interaction with a video game, determine a language of an intended recipient of the user input based on an identity of the intended recipient, access a multi-lingual library comprising a plurality of words associated with the video game, match the user input to one or more words of the plurality of words of the multi-lingual library to generate a translated message in the determined language of the intended recipient, and provide the translated message to a second accessory for presentation to the intended recipient in real-time. Additional embodiments are disclosed.
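The library lookup described in the abstract above might look something like the sketch below. It assumes the multi-lingual library is a dictionary from a normalized game phrase to its translations, and that the recipient's language is looked up from the recipient's identity. `LIBRARY`, `RECIPIENT_LANGUAGE`, `translate_for_recipient`, and all sample entries are hypothetical, not taken from the patent.

```python
# Hypothetical multi-lingual library: game phrase -> translations by language code.
LIBRARY = {
    "need backup": {"es": "necesito refuerzos", "de": "brauche Verstärkung"},
    "good game": {"es": "buen juego", "de": "gutes Spiel"},
}

# Hypothetical recipient profiles: recipient identity -> language code.
RECIPIENT_LANGUAGE = {"player_es": "es", "player_de": "de"}


def translate_for_recipient(user_input, recipient_id):
    """Match the input against the library and return it in the recipient's language.

    Falls back to the original input when the recipient, the phrase, or the
    translation is unknown (a design choice of this sketch, not of the patent).
    """
    language = RECIPIENT_LANGUAGE.get(recipient_id)
    entry = LIBRARY.get(user_input.strip().lower())
    if language is None or entry is None or language not in entry:
        return user_input
    return entry[language]


print(translate_for_recipient(" Good Game ", "player_de"))
print(translate_for_recipient("hello", "player_es"))
```

A fixed phrase library keeps the lookup fast enough for real-time chat, at the cost of covering only phrases that were entered in advance.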
