-
Publication No.: US20240363122A1
Publication date: 2024-10-31
Application No.: US18765108
Filing date: 2024-07-05
Applicant: GOOGLE LLC
Inventor(s): Rajeev Rikhye, Quan Wang, Yanzhang He, Qiao Liang, Ian C. McGraw
IPC classification: G10L17/24, G10L15/26, G10L17/06, G10L21/028
CPC classification: G10L17/24, G10L15/26, G10L17/06, G10L21/028
Abstract: Techniques disclosed herein are directed towards streaming keyphrase detection which can be customized to detect one or more particular keyphrases, without requiring retraining of any model(s) for those particular keyphrase(s). Many implementations include processing audio data using a speaker separation model to generate separated audio data which isolates an utterance spoken by a human speaker from one or more additional sounds not spoken by the human speaker, and processing the separated audio data using a text independent speaker identification model to determine whether a verified and/or registered user spoke a spoken utterance captured in the audio data. Various implementations include processing the audio data and/or the separated audio data using an automatic speech recognition model to generate a text representation of the utterance. Additionally or alternatively, the text representation of the utterance can be processed to determine whether at least a portion of the text representation of the utterance captures a particular keyphrase. When the system determines the registered and/or verified user spoke the utterance and the system determines the text representation of the utterance captures the particular keyphrase, the system can cause a computing device to perform one or more actions corresponding to the particular keyphrase.
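The final gating step described in this abstract (act only when both the speaker check and the keyphrase check pass) might be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the `speaker_score` input, the `threshold` value, and the `actions` mapping are all assumptions, and the separation, speaker identification, and speech recognition models are assumed to have already produced the score and transcript.

```python
import re


def normalize(text):
    """Lowercase the transcript and drop punctuation so phrases compare cleanly."""
    cleaned = re.sub(r"[^a-z0-9' ]+", " ", text.lower())
    return " ".join(cleaned.split())


def action_for_utterance(transcript, speaker_score, actions, threshold=0.75):
    """Return the action for the first keyphrase found, or None.

    transcript    -- text produced by a speech recognizer (assumed input)
    speaker_score -- similarity to the enrolled user, 0..1 (assumed input)
    actions       -- hypothetical mapping of keyphrase -> action name
    """
    # Gate 1: the utterance must come from the registered/verified user.
    if speaker_score < threshold:
        return None
    # Gate 2: the transcript must contain a configured keyphrase.
    # Padding with spaces makes this a whole-word match.
    padded = f" {normalize(transcript)} "
    for phrase, action in actions.items():
        if f" {normalize(phrase)} " in padded:
            return action
    return None


actions = {"turn off the lights": "lights_off", "stop": "stop_playback"}
print(action_for_utterance("Please turn off the lights.", 0.9, actions))
```

Because the keyphrases live in a plain mapping, new phrases can be added without touching any model, which mirrors the "no retraining" point made in the abstract.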
-
Publication No.: US12132983B2
Publication date: 2024-10-29
Application No.: US17859808
Filing date: 2022-07-07
IPC classification: H04N23/60, G02B27/01, G02C7/10, G02C7/16, G06F3/01, G06F3/044, G06F3/16, G06F40/205, G06V10/25, G06V10/75, G06V20/50, G10L17/06, G10L17/22, H04N23/56, H04N23/62, H04N23/63, H04N23/66, H04N23/695, H04N23/90
CPC classification: H04N23/64, G02B27/0172, G02C7/101, G02C7/16, G06F3/013, G06F3/016, G06F3/017, G06F3/044, G06F3/167, G06F40/205, G06V10/25, G06V10/759, G06V20/50, G10L17/06, G10L17/22, H04N23/56, H04N23/62, H04N23/63, H04N23/66, H04N23/695, H04N23/90, G02B2027/0138, G02B2027/0141, G02B2027/0178
Abstract: A wearable device for use in immersive reality applications is provided. The wearable device includes eyepieces to provide a forward-image to a user, a first forward-looking camera mounted on the frame and having a field of view, a processor configured to identify a region of interest within the forward-image, and an interface device to indicate to the user that a field of view of the first forward-looking camera is misaligned with the region of interest. Methods of use of the device, a memory storing instructions and a processor to execute the instructions to cause the device to perform the methods of use, are also provided.
-
Publication No.: US12131740B2
Publication date: 2024-10-29
Application No.: US18331920
Filing date: 2023-06-08
CPC classification: G10L17/04, G10L17/02, G10L17/06, H04M3/42221
Abstract: Methods and systems are disclosed herein for improving the quality of audio for use in a biometric. A biometric system may use machine learning to determine whether audio or a portion of the audio should be used as a biometric for a user. A sample of the user's voice may be used to generate a voice signature of the user. Portions of the audio that do not meet a similarity threshold when compared with the voice signature may be removed from the audio. Additionally or alternatively, interfering noises may be detected and removed from the audio to improve the quality of a voice biometric generated from the audio.
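The similarity-threshold step in this abstract might look like the sketch below. It assumes each audio portion has already been turned into an embedding vector by some model, and it uses cosine similarity as the comparison; the abstract does not name a metric, so that choice, the `threshold` value, and all function names are illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def keep_matching_portions(portions, signature, threshold=0.7):
    """Keep only the portions whose embedding is close enough to the signature.

    portions  -- list of (portion_id, embedding) pairs (assumed input format)
    signature -- embedding built from the user's voice sample (assumed input)
    """
    return [
        portion_id
        for portion_id, embedding in portions
        if cosine_similarity(embedding, signature) >= threshold
    ]


signature = [1.0, 0.0]
portions = [("p0", [0.9, 0.1]), ("p1", [0.0, 1.0]), ("p2", [1.0, 0.2])]
print(keep_matching_portions(portions, signature))
```

Here `p1` points in a different direction from the signature and is dropped, while `p0` and `p2` are kept for building the voice biometric.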
-
Publication No.: US20240355335A1
Publication date: 2024-10-24
Application No.: US18685019
Filing date: 2022-11-08
Inventor(s): Xianliang Wang, Hongbin Suo
CPC classification: G10L17/06, G10L15/04, G10L15/063, G10L17/04, G10L2015/0631
Abstract: The present disclosure relates to an audio signal processing method and apparatus, a device and a storage medium. The present disclosure performs segmentation processing on an audio signal to obtain multiple audio segments, performs clustering processing on the multiple audio segments according to feature information of each audio segment in the multiple audio segments to obtain one or more first sets, determines a first cluster center of each first set according to the feature information of the audio segments included in each first set, and performs clustering processing on the multiple audio segments according to the first cluster center of each first set to obtain one or more second sets, where audio segments in the same second set correspond to the same role label. In this way, the accuracy of unsupervised role separation based on single-channel speech is improved.
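The two-pass clustering in this abstract can be sketched as below. The abstract does not say which clustering algorithm or distance is used, so this example makes assumptions: the first pass is a simple greedy grouping by cosine similarity, each cluster center is the mean of its members, and the second pass reassigns every segment to its most similar center. All names and the `threshold` value are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def first_pass(features, threshold=0.8):
    """Greedy grouping: join the first set whose current mean is similar enough."""
    sets = []  # each set is a list of segment indices
    for i, feature in enumerate(features):
        for members in sets:
            center = mean_vector([features[j] for j in members])
            if cosine(feature, center) >= threshold:
                members.append(i)
                break
        else:
            sets.append([i])  # no set was close enough, start a new one
    return sets


def second_pass(features, first_sets):
    """Reassign every segment to the most similar first-pass cluster center."""
    centers = [mean_vector([features[j] for j in s]) for s in first_sets]
    labels = []
    for feature in features:
        sims = [cosine(feature, c) for c in centers]
        labels.append(sims.index(max(sims)))
    return labels


def assign_roles(features, threshold=0.8):
    """Return one role label (a cluster index) per audio segment."""
    return second_pass(features, first_pass(features, threshold))


segments = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.05]]
print(assign_roles(segments))
```

The second pass matters because the greedy first pass compares each segment against centers that are still changing; reassigning against the final centers gives every segment the same reference points.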
-
Publication No.: US12120262B2
Publication date: 2024-10-15
Application No.: US17954422
Filing date: 2022-09-28
Inventor(s): George Albero, Youshika C. Scott, Brian H. Corr, Thomas G. Frost, Scott Nielsen, Charlene L. Ramsue
CPC classification: H04M3/2281, G10L17/04, G10L17/06, G10L25/90, H04M3/42042, H04M3/5175, H04M3/5183, H04M2201/41
Abstract: Aspects of the disclosure relate to voiceprint tracking and anomaly detection. A computing platform may detect voice information from a call management system. The computing platform may establish voiceprints for employees and clients of an enterprise. The computing platform may detect a call between an employee and a caller attempting to access a client account. The computing platform may identify a first voiceprint corresponding to the employee and a second voiceprint corresponding to the caller. The computing platform may compare the second voiceprint to a known voiceprint corresponding to the client. Based on the comparison of the second voiceprint to the known voiceprint, the computing platform may determine that the second voiceprint does not match the known voiceprint. The computing platform may identify that the second voiceprint corresponds to another employee of the enterprise, and may send a security notification indicating potential unauthorized account access to an enterprise computing device.
-
Publication No.: US12114039B2
Publication date: 2024-10-08
Application No.: US17437281
Filing date: 2021-07-27
IPC classification: H04N21/4415, G10L17/06, G10L17/22, H04N21/422, H04N21/45, H04N21/472
CPC classification: H04N21/4415, G10L17/06, G10L17/22, H04N21/42203, H04N21/4532, H04N21/47202
Abstract: An electronic apparatus is provided. The electronic apparatus includes: a display; and a processor configured to: control the display to display a content based on one mode of a plurality of display modes, receive a user voice in real time while the content is being displayed, identify user's age information corresponding to the received user voice, identify whether or not the one mode is a kids mode when the identified user's age information is less than a threshold value, and change the one mode to the kids mode when it is identified that the one mode is not the kids mode.
-
Publication No.: US20240314427A1
Publication date: 2024-09-19
Application No.: US18634869
Filing date: 2024-04-12
IPC classification: H04N23/60, G06T7/73, G10L17/06, G10L17/18, G10L25/57, H04N5/268, H04N23/611, H04R1/40, H04R3/00
CPC classification: H04N23/64, G06T7/73, G10L17/06, G10L17/18, G10L25/57, H04N5/268, H04N23/611, H04R1/406, H04R3/005, G06T2207/10016, G06T2207/20084, G06T2207/30201
Abstract: Described are multiple cameras in a conference room, each pointed in a different direction. A primary camera includes a microphone array to perform sound source localization (SSL). The SSL is used in combination with a video image to identify the speaker from among multiple individuals that appear in the video image. Pose information of the speaker is developed. Pose information of each individual identified in each other camera is developed. The speaker pose information is compared to the pose information of the individuals from the other cameras. The best match for each other camera is selected as the speaker in that camera. The speaker views of each camera are compared to determine the speaker view with the most frontal view of the speaker. That camera is selected to provide the video for provision to the far end.
-
Publication No.: US12063214B2
Publication date: 2024-08-13
Application No.: US16799867
Filing date: 2020-02-25
Applicant: VMware LLC
Inventor(s): Rohit Pradeep Shetty
CPC classification: H04L63/0861, G10L17/06, G10L17/24, G10L25/84, H04L63/0838, H04L63/0846, H04L63/0853
Abstract: Disclosed are various approaches for authenticating a user through a voice assistant device and creating an association between the device and a user account. The request is associated with a network or federated service. The user can use a client device, such as a smartphone, to initiate an authentication flow. A passphrase provided to the client device can be captured by the client device and a voice assistant device. Audio captured by the client device and voice assistant device can be sent to an assistant connection service. The passphrase and an audio signature calculated from the audio can be validated. An association between the user account and the voice assistant device can then be created.
-
Publication No.: US12051397B2
Publication date: 2024-07-30
Application No.: US17176697
Filing date: 2021-02-16
Inventor(s): Shaomin Xiong, Toshiki Hirano, Pritam Das, Ramy Ayad, Rajeev Nagabhirava
IPC classification: G10K11/175, G06F21/32, G06F21/62, G06V20/40, G06V20/52, G06V40/16, G10L17/06, G10L21/028, G10L25/57, H04N5/76, H04N5/77, H04N9/802, H04N9/804, H04N9/82
CPC classification: G10K11/1754, G06F21/32, G06V20/40, G06V40/172, G10L17/06, G10L21/028, G10L25/57, H04N5/76, G06V20/44
Abstract: Systems and methods for audio privacy in network video surveillance systems are described. A video camera may include an image sensor and a microphone to generate a video stream. Responsive to detecting a human speaking condition in the video stream, the audio data may be selectively modified to mask a human voice component of the audio data for storing and/or displaying the surveillance video stream.
-
Publication No.: US20240249727A1
Publication date: 2024-07-25
Application No.: US18605696
Filing date: 2024-03-14
Applicant: Prevail Legal, Inc.
Inventor(s): Robert FEIGENBAUM, Random BARES
IPC classification: G10L17/06, G10L15/26, H04L65/1069
CPC classification: G10L17/06, G10L15/26, H04L65/1069
Abstract: A system and method that overcomes technological hurdles related to litigation-related management is disclosed. The technological hurdles were overcome with industry-transformative innovations in in-person, hybrid, and remote legal proceedings; court reporting; testimony management; trial preparation; and utilization of video evidence, to name several. These innovations resulted in many advantages, such as cloud-based testimony management, scalable digital transformation, dramatic savings in litigation costs, and fast turn-around on certified transcripts, to name several.