-
Publication No.: US12131523B2
Publication date: 2024-10-29
Application No.: US17182951
Filing date: 2021-02-23
Applicant: Meta Platforms, Inc.
Inventors: Xiaohu Liu, Baiyang Liu, Rajen Subba
IPC classification: G06V10/82, G06F3/01, G06F3/16, G06F7/14, G06F9/451, G06F16/176, G06F16/22, G06F16/23, G06F16/242, G06F16/2455, G06F16/2457, G06F16/248, G06F16/33, G06F16/332, G06F16/338, G06F16/903, G06F16/9032, G06F16/9038, G06F16/904, G06F16/951, G06F16/9535, G06F18/2411, G06F40/205, G06F40/295, G06F40/30, G06F40/40, G06N3/006, G06N3/08, G06N7/01, G06N20/00, G06Q50/00, G06V10/764, G06V20/10, G06V40/20, G10L15/02, G10L15/06, G10L15/07, G10L15/16, G10L15/18, G10L15/183, G10L15/187, G10L15/22, G10L15/26, G10L17/06, G10L17/22, H04L5/02, H04L12/28, H04L41/00, H04L41/22, H04L43/0882, H04L43/0894, H04L51/02, H04L51/18, H04L51/216, H04L51/52, H04L67/306, H04L67/50, H04L67/5651, H04L67/75, H04W12/08, G10L13/00, G10L13/04, H04L51/046, H04L67/10, H04L67/53
CPC classification: G06V10/82, G06F3/011, G06F3/013, G06F3/017, G06F3/167, G06F7/14, G06F9/453, G06F16/176, G06F16/2255, G06F16/2365, G06F16/243, G06F16/24552, G06F16/24575, G06F16/24578, G06F16/248, G06F16/3323, G06F16/3329, G06F16/3344, G06F16/338, G06F16/90332, G06F16/90335, G06F16/9038, G06F16/904, G06F16/951, G06F16/9535, G06F18/2411, G06F40/205, G06F40/295, G06F40/30, G06F40/40, G06N3/006, G06N3/08, G06N7/01, G06N20/00, G06Q50/01, G06V10/764, G06V20/10, G06V40/28, G10L15/02, G10L15/063, G10L15/07, G10L15/16, G10L15/1815, G10L15/1822, G10L15/183, G10L15/187, G10L15/22, G10L15/26, G10L17/06, G10L17/22, H04L5/02, H04L12/2816, H04L41/20, H04L41/22, H04L43/0882, H04L43/0894, H04L51/02, H04L51/18, H04L51/216, H04L51/52, H04L67/306, H04L67/535, H04L67/5651, H04L67/75, H04W12/08, G06F2216/13, G10L13/00, G10L13/04, G10L2015/223, G10L2015/225, H04L51/046, H04L67/10, H04L67/53
Abstract: In one embodiment, a method includes, by a client system associated with a user: receiving, at the client system, a user input from the user; parsing, by the client system, the user input to identify a request to execute a function to be performed by an assistant system of several assistant systems associated with the client system; determining whether the user is authorized to access the assistant system by comparing a voiceprint of the user to several voiceprints stored on the client system; sending, from the client system to the assistant system in response to determining the user is authorized to access the assistant system, a request to set an assistant xbot of the assistant system into a listening mode; and receiving, at the client system from the assistant system, an indication that the assistant xbot is in listening mode.
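The authorization step in this abstract can be illustrated with a minimal Python sketch. It assumes voiceprints are fixed-length feature vectors compared by cosine similarity against a threshold. The abstract does not specify a representation or a matching rule, so `cosine_similarity`, `is_authorized`, `request_listening_mode`, the `send` callback, and the 0.8 threshold are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_authorized(voiceprint, stored_voiceprints, threshold=0.8):
    """True if the voiceprint is close enough to any stored voiceprint.

    The threshold is an assumed value, not taken from the patent.
    """
    return any(
        cosine_similarity(voiceprint, stored) >= threshold
        for stored in stored_voiceprints
    )


def request_listening_mode(voiceprint, stored_voiceprints, send):
    """Ask the assistant to enter listening mode only for an authorized user.

    `send` is a hypothetical callback that delivers a message to the
    assistant system and returns its reply; None is returned if the
    user is not authorized.
    """
    if not is_authorized(voiceprint, stored_voiceprints):
        return None
    return send({"action": "set_listening_mode"})
```

Keeping the comparison on the client, as the abstract describes, means the request is sent only after the local check has passed.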
-
Publication No.: US12125487B2
Publication date: 2024-10-22
Application No.: US17450551
Filing date: 2021-10-11
Applicant: SoundHound, Inc.
Inventors: Kiersten L. Bradley, Ethan Coeytaux, Ziming Yin
IPC classification: G10L15/26, G06F40/134, G06F40/166, G06F40/284, G10L15/02, G10L15/06, G10L15/07
CPC classification: G10L15/26, G06F40/134, G06F40/166, G06F40/284, G10L15/02, G10L15/063, G10L15/07, G10L2015/0631
Abstract: Methods and systems for enabling an efficient review of meeting content via a metadata-enriched, speaker-attributed and multiuser-editable transcript are disclosed. By incorporating speaker diarization and other metadata, the system can provide a structured and effective way to review and/or edit the transcript by one or more editors. One type of metadata can be image or video data to represent the meeting content. Furthermore, the present subject matter utilizes a multimodal diarization model to identify and label different speakers. The system can synchronize various sources of data, e.g., audio channel data, voice feature vectors, acoustic beamforming, image identification, and extrinsic data, to implement speaker diarization.
-
Publication No.: US12112129B2
Publication date: 2024-10-08
Application No.: US17527167
Filing date: 2021-11-16
Applicant: Fujitsu Limited
IPC classification: G10L15/16, G06F18/214, G06F40/169, G06F40/226, G06N3/04, G10L15/06, G10L15/07, G10L15/18, G06F40/279, G06F40/295, G10L15/183
CPC classification: G06F40/226, G06F18/214, G06F40/169, G06N3/04, G10L15/063, G10L15/075, G10L15/16, G10L15/18, G06F40/279, G06F40/295, G10L2015/0635, G10L15/1822, G10L15/183
Abstract: A method of training a neural network as a natural language processing (NLP) model comprises: inputting annotated training data to first architecture portions of the neural network, the first architecture portions being executed respectively in a plurality of distributed client computing devices in communication with a server computing device, the training data being derived from text data private to the client computing device in which the first architecture portion is executed, the server computing device having no access to any of the private text data; deriving from the training data, using the first architecture portions, weight matrices of numeric weights which are decoupled from the private text data; concatenating the weight matrices, in a second architecture portion of the neural network executed in the server computing device, to obtain a single concatenated weight matrix; and training, on the second architecture portion, the NLP model using the concatenated weight matrix.
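The server-side concatenation step in this abstract can be sketched in a few lines of Python. This is an illustration only: it assumes each client contributes a weight matrix with the same number of columns and that the server stacks them row-wise. The abstract does not state the concatenation axis, and `concatenate_weight_matrices` is a hypothetical name.

```python
def concatenate_weight_matrices(matrices):
    """Stack client weight matrices row-wise into one matrix.

    Each matrix is a list of rows (lists of floats). Row-wise stacking
    and the equal-width requirement are assumptions for illustration.
    """
    if not matrices:
        return []
    width = len(matrices[0][0])
    combined = []
    for matrix in matrices:
        for row in matrix:
            if len(row) != width:
                raise ValueError("all weight matrices must have the same number of columns")
            combined.append(list(row))  # copy so client data is not aliased
    return combined


# Two hypothetical clients, each contributing a 2x3 weight matrix.
client_a = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
client_b = [[0.7, 0.8, 0.9], [1.0, 1.1, 1.2]]
server_matrix = concatenate_weight_matrices([client_a, client_b])
```

Only the numeric weights reach the server in this sketch, which mirrors the abstract's point that the server never sees the private text.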
-
Publication No.: US20240331702A1
Publication date: 2024-10-03
Application No.: US18743562
Filing date: 2024-06-14
Inventors: Kiersten L. BRADLEY, Ethan COEYTAUX, Ziming YIN
IPC classification: G10L15/26, G06F40/134, G06F40/166, G06F40/284, G10L15/02, G10L15/06, G10L15/07
CPC classification: G10L15/26, G06F40/134, G06F40/166, G06F40/284, G10L15/02, G10L15/063, G10L15/07, G10L2015/0631
Abstract: Methods and systems for enabling an efficient review of meeting content via a metadata-enriched, speaker-attributed transcript are disclosed. By incorporating speaker diarization and other metadata, the system can provide a structured and effective way to review and/or edit the transcript. One type of metadata can be image or video data to represent the meeting content. Furthermore, the present subject matter utilizes a multimodal diarization model to identify and label different speakers. The system can synchronize various sources of data, e.g., audio channel data, voice feature vectors, acoustic beamforming, image identification, and extrinsic data, to implement speaker diarization.
-
Publication No.: US12020696B2
Publication date: 2024-06-25
Application No.: US16659260
Filing date: 2019-10-21
Applicant: SoundHound, Inc.
Inventors: Karl Stahl
IPC classification: G10L15/00, G06F16/242, G06F40/253, G10L15/07, G10L15/19, G10L15/22, G10L15/30
CPC classification: G10L15/19, G06F16/243, G06F40/253, G10L15/07, G10L15/22, G10L15/30, G10L2015/223
Abstract: [Object] Technology is provided to enable a mobile terminal to function as a digital assistant even when the mobile terminal is in a state where it cannot communicate with a server apparatus.
[Solution] When a user terminal 200 receives a query A from a user, user terminal 200 sends query A to a server 100. Server 100 interprets the meaning of query A using a grammar A. Server 100 obtains a response to query A based on the meaning of query A and sends the response to user terminal 200. Server 100 further sends grammar A to user terminal 200. That is, server 100 sends to user terminal 200 a grammar used to interpret the query received from user terminal 200.
-
Publication No.: US11997344B2
Publication date: 2024-05-28
Application No.: US17509401
Filing date: 2021-10-25
Applicant: Rovi Guides, Inc.
IPC classification: G06F40/40, G10L13/027, G10L13/033, G10L15/07, G10L15/19, G10L25/63, H04N21/43, H04N21/81
CPC classification: H04N21/43072, G10L13/027, G10L15/07, G10L15/19, G10L25/63, H04N21/8106
Abstract: Systems and methods are described herein for generating alternate audio for a media stream. The media system receives media that is requested by the user. The media comprises a video and audio. The audio includes words spoken in a first language. The media system stores the received media in a buffer as it is received. The media system separates the audio from the buffered media and determines an emotional state expressed by spoken words of the first language. The media system translates the words spoken in the first language into words spoken in a second language. Using the translated words of the second language, the media system synthesizes speech having the emotional state previously determined. The media system then retrieves the video of the received media from the buffer and synchronizes the synthesized speech with the video to generate the media content in a second language.
-
Publication No.: US20240161732A1
Publication date: 2024-05-16
Application No.: US18418246
Filing date: 2024-01-20
Applicant: Google LLC
Inventors: Zhifeng Chen, Bo Li, Eugene Weinstein, Yonghui Wu, Pedro J. Moreno Mengibar, Ron J. Weiss, Khe Chai Sim, Tara N. Sainath, Patrick An Phu Nguyen
CPC classification: G10L15/005, G10L15/07, G10L15/16, G10L2015/0631
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-readable media, for speech recognition using multi-dialect and multilingual models. In some implementations, audio data indicating audio characteristics of an utterance is received. Input features determined based on the audio data are provided to a speech recognition model that has been trained to output scores indicating the likelihood of linguistic units for each of multiple different languages or dialects. The speech recognition model can be one that has been trained using cluster adaptive training. Output that the speech recognition model generated in response to receiving the input features determined based on the audio data is received. A transcription of the utterance generated based on the output of the speech recognition model is provided.
-
Publication No.: US11983551B2
Publication date: 2024-05-14
Application No.: US18207053
Filing date: 2023-06-07
Applicant: Apple Inc.
IPC classification: G06F9/451, G06F3/0481, G06F3/0484, G10L15/06, G10L15/07, G10L15/22, G10L17/00
CPC classification: G06F9/451, G06F3/0481, G06F3/0484, G10L15/063, G10L15/07, G10L15/22, G10L17/00, G10L2015/0638, G10L2015/223, G10L2015/225
Abstract: Examples of multi-user configuration are disclosed. An example method includes, at an electronic device: receiving a request; and in response to the request: if the voice input does not match a voice profile associated with an account associated with the electronic device: causing output of first information based on the request using a first account associated with the electronic device; if a setting of the electronic device has a first state, causing update of account data of the first account based on the request; and if the setting has a second state, forgoing causing update of the account data; and if the voice input matches a voice profile associated with an account associated with the electronic device: causing output of the first information using the account associated with the matching voice profile; and causing update of account data of the account based on the request.
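The branching described in this abstract can be sketched as a small Python function. This is a simplified illustration, not the patented method: voice matching is reduced to comparing an opaque profile identifier, the "first account" is treated as a default account, the setting is modeled as a boolean, and the names `Account`, `handle_request`, and `update_default_enabled` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    name: str
    voice_profile: str  # stand-in for a real voice profile (assumption)
    history: list = field(default_factory=list)  # stand-in for "account data"


def handle_request(request, voice_id, accounts, default_account, update_default_enabled):
    """Return the name of the account used to answer, updating data per the rules.

    - Voice matches a profile: answer with that account and update its data.
    - No match: answer with the default account; update its data only when
      the setting is in its first state (update_default_enabled is True).
    """
    match = next((a for a in accounts if a.voice_profile == voice_id), None)
    if match is not None:
        match.history.append(request)
        return match.name
    if update_default_enabled:
        default_account.history.append(request)
    return default_account.name


# Usage with two hypothetical accounts sharing one device.
alice = Account("alice", "vp-alice")
bob = Account("bob", "vp-bob")
used_known = handle_request("play jazz", "vp-bob", [alice, bob], alice, False)
used_guest = handle_request("play rock", "vp-unknown", [alice, bob], alice, False)
```

With the setting off, an unrecognized speaker still gets an answer from the default account, but that account's data is left untouched.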
-
Publication No.: US20240152705A1
Publication date: 2024-05-09
Application No.: US18414224
Filing date: 2024-01-16
Applicant: Embodied, Inc.
Inventors: Stefan A. Scherer, Mario E. Munich, Paolo Pirjanian, Kevin D. Saunders, Wilson Harron, Marissa Kohan
IPC classification: G06F40/35, G10L13/027, G10L15/07, G10L15/22, G10L15/26
CPC classification: G06F40/35, G10L13/027, G10L15/07, G10L15/22, G10L15/26, G10L2015/223
Abstract: Systems and methods for managing conversations between a robot computing device and a user are disclosed. Exemplary implementations may: initiate a first-time user experience sequence with the user; teach the user the robot computing device's capabilities and/or characteristics; initiate, utilizing a dialog manager, a conversation with the user; receive one or more command files from the user via one or more microphones; and generate conversation response files and communicate the generated conversation response files to the dialog manager in response to the one or more received user global command files to initiate an initial conversation exchange.
-
Publication No.: US11934769B2
Publication date: 2024-03-19
Application No.: US18300120
Filing date: 2023-04-13
Applicant: Suki AI, Inc.
Inventors: Nithyanand Kota, Yashas Rao, Hao Ran Raymond Lin, Maneesh Dewan, Arunan Rabindran, Jatin Chhugani, Sudheer Tumu
CPC classification: G06F40/166, G10L15/07, G10L15/083, G10L15/22, G10L15/26, G16H15/00, G10L2015/088
Abstract: Systems and methods to briefly deviate from and resume back to amending a section of a note are disclosed. Exemplary implementations may: obtain audio information representing sound captured by an audio section of a client computing platform, such sound including speech from a user associated with the client computing platform; effectuate presentation of a graphical user interface that includes sections of the note; analyze the audio information to determine which individual ones of the spoken inputs are the primary spoken input or the deviant spoken input; determine, based on the analysis, the section of the note to which the deviant spoken input is related; alternately amend, based on the determination, sections of the note by deviating from one section to another section and returning back to the one section for continued population; and effectuate, via the user interface, presentation of the alternating amendments to the sections of the note.