-
Publication number: US20240321264A1
Publication date: 2024-09-26
Application number: US18679981
Application date: 2024-05-31
Applicant: Amazon Technologies, Inc.
Inventor: Jing Liu , Feng-Ju Chang , Athanasios Mouchtaris , Martin Radfar , Maurizio Omologo , Siegfried Kunzmann
CPC classification number: G10L15/08 , G10L15/005 , G10L15/02 , G10L2015/088
Abstract: Techniques for performing automatic speech recognition (ASR) are described. In some embodiments, an ASR component integrates contextual information from user profile data into audio encoding data to predict a token(s) corresponding to a spoken input. The user profile data may include personalized words, such as, contact names, device names, etc. The ASR component determines word embedding data using the personalized words. The ASR component is configured to apply attention to audio frames that are relevant to the personalized words based on processing the audio encoding data and the word embedding data.
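The attention step in this abstract can be illustrated with a minimal sketch. It assumes scaled dot-product attention between one audio encoding frame and a small set of personalized-word embeddings, followed by an additive combination. The abstract does not specify the scoring function or how the result is combined, so the function names, the scaling, and the additive step are all hypothetical choices made for illustration.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bias_frame(
    audio_frame: List[float],
    word_embeddings: List[List[float]],
) -> Tuple[List[float], List[float]]:
    """Attend from one audio frame over personalized-word embeddings.

    Assumption: scaled dot-product scoring and additive combination.
    Returns the biased frame and the attention weights per word.
    """
    dim = len(audio_frame)
    # Score each personalized word against the audio frame.
    scores = [
        sum(a * w for a, w in zip(audio_frame, emb)) / math.sqrt(dim)
        for emb in word_embeddings
    ]
    weights = softmax(scores)
    # Weighted sum of word embeddings gives a context vector.
    context = [
        sum(w * emb[i] for w, emb in zip(weights, word_embeddings))
        for i in range(dim)
    ]
    # Add the context to the frame (an illustrative choice).
    biased = [a + c for a, c in zip(audio_frame, context)]
    return biased, weights


# Hypothetical 2-dimensional embeddings for two contact names.
frame = [1.0, 0.0]
names = [[1.0, 0.0], [0.0, 1.0]]
biased_frame, attention = bias_frame(frame, names)
```

In this toy input the first name receives the larger attention weight because its embedding points in the same direction as the audio frame.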
-
Publication number: US20240314094A1
Publication date: 2024-09-19
Application number: US18670389
Application date: 2024-05-21
Applicant: Google LLC
Inventor: Fredrik BERGENLID , Vladyslav LYSYCHKIN , Denis BURAKOV , Behshad BEHZADI , Andrea Terwisscha VAN SCHELTINGA , Quentin Lascombes DE LAROUSSILHE , Mikhail GOLIKOV , Koa METTER , Ibrahim BADR , Zaheed SABUR
IPC: H04L51/10 , G06F16/44 , G10L15/00 , G10L15/16 , G10L15/22 , G10L25/63 , H04N7/15 , H04N21/439 , H04N21/4788
CPC classification number: H04L51/10 , G06F16/44 , G10L15/22 , H04N7/15 , H04N21/4394 , H04N21/4788 , G10L15/005 , G10L15/16 , G10L2015/223 , G10L25/63
Abstract: Implementations relate to providing information items for display during a communication session. In some implementations, a computer-implemented method includes receiving, during a communication session between a first computing device and a second computing device, first media content from the communication session. The method further includes determining a first information item for display in the communication session based at least in part on the first media content. The method further includes sending a first command to at least one of the first computing device and the second computing device to display the first information item.
-
Publication number: US12087285B2
Publication date: 2024-09-10
Application number: US17881999
Application date: 2022-08-05
Inventor: Jingsheng Yang , Kojung Chen , Li Zhao , Xiao Han , Yin Shi
CPC classification number: G10L15/083 , G10L15/005 , G10L25/57 , G10L25/72
Abstract: A method and apparatus for generating an interaction record, and a device and a medium are provided. The method includes: firstly, from a multimedia data stream, collecting behavior data, represented by the multimedia data stream, of a user, wherein the behavior data includes voice information and/or operation information; and then, on the basis of the behavior data, generating interaction record data corresponding to the behavior data. According to the technical solution, by means of collecting voice information and/or operation information from a multimedia data stream, and generating interaction record data on the basis of the voice information and the operation information, an interacting user can determine interaction information by using the interaction record data, and the interaction efficiency of the interacting user is improved, thereby also improving the user experience.
-
Publication number: US12074720B2
Publication date: 2024-08-27
Application number: US17732826
Application date: 2022-04-29
Applicant: Zoom Video Communications, Inc.
Inventor: Awni Yusuf Hannun , Sebastian Stüker
CPC classification number: H04L12/1818 , G06F40/58 , G10L15/005
Abstract: In some aspects, a computing device may access audio information comprising an audio stream from a client device. The computing device may provide an audio segment from the audio stream to a language identification process of the computing device comprising a machine learning model that is trained to identify a language of a plurality of languages within recorded speech. The computing device may identify an identified-language of the plurality of languages for the speech based at least in part on the audio segment. The computing device may provide the identified-language to the client device. Numerous other aspects are described.
-
Publication number: US20240265924A1
Publication date: 2024-08-08
Application number: US18573846
Application date: 2021-06-29
Applicant: Shujie LIU , Jinyu LI , Long ZHOU , Xie SUN , Microsoft Technology Licensing, LLC
Inventor: Jinyu LI , Long ZHOU , Xie SUN , Shujie LIU
CPC classification number: G10L15/32 , G10L15/005 , G10L15/063 , G10L15/30 , G10L2015/0635
Abstract: Embodiments are provided for building a configurable multilingual model. A computing system obtains a plurality of language-specific automatic speech recognition modules and a universal automatic speech recognition module trained on a multi-language training dataset comprising training data corresponding to each of the plurality of different languages. The computing system then compiles the universal automatic speech recognition module with the plurality of language-specific automatic speech recognition modules to generate a configurable multilingual model that is configured to selectively and dynamically utilize a sub-set of the plurality of language-specific automatic speech recognition modules with the universal automatic speech recognition module to process audio content in response to user input identifying one or more target languages associated with the audio content.
-
Publication number: US20240221722A1
Publication date: 2024-07-04
Application number: US18403008
Application date: 2024-01-03
Applicant: Solos Technology Limited
Inventor: Wai Kuen CHEUNG , Kwok Wah LAW , Chi Sum YU , Kwun Lam TAI , Kenneth FAN
CPC classification number: G10L15/005 , G02B27/0172 , G02B27/0176 , G02C11/10 , G10L15/063 , G10L15/26 , G02B2027/0178
Abstract: An eyewear device having an eyewear frame including a first unidirectional audio input device configured to receive a communication datum from an individual in communication with a user, a second unidirectional audio input device configured to receive user speech from the user, a user input device configured to control whether the eyewear frame is in a first mode or a second mode, and at least one audio output device located on at least one temple of the eyewear frame, and a computing device configured to: receive a user input from the user input device; in the first mode, receive the communication datum and modify the communication datum to generate a modified communication datum; and in the second mode, receive the user speech and transmit the user speech to a remote device.
-
Publication number: US20240221379A1
Publication date: 2024-07-04
Application number: US18090843
Application date: 2022-12-29
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventor: Yonit HOFFMAN , Mordechai KADOSH , Zvi FIGOV , Eliyahu STRUGO , Mattan SERRY , Michael BEN-HAYM
CPC classification number: G06V20/41 , G06V10/70 , G06V20/46 , G06V20/49 , G06V20/62 , G06V30/245 , G10L15/005 , G10L15/26 , G10L25/48 , G11B27/102
Abstract: Disclosed is a method for automatically detecting an introduction/opening song within a multimedia file. The method includes designating sequential blocks of time in the multimedia file as scene(s) and detecting certain feature(s) associated with each scene. The extracted scene feature(s) may be analyzed and used to assign a probability to each scene that the scene is part of the introduction/opening song. The probabilities may be used to classify each scene as either correlating to or not correlating to, the introduction/opening song. The temporal location of the opening song may be saved as index data associated with the multimedia file.
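The final steps of this abstract (per-scene probabilities, classification, and saving the temporal location) can be sketched as follows. This is a minimal illustration, not the disclosed method: it assumes the per-scene probabilities already exist, uses a fixed threshold to classify scenes, and reports the longest contiguous run of positive scenes. The `Scene` type, the `locate_intro` name, and the 0.5 threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Scene:
    start: float              # scene start time in seconds
    end: float                # scene end time in seconds
    intro_probability: float  # assumed output of an upstream feature model


def locate_intro(
    scenes: List[Scene], threshold: float = 0.5
) -> Optional[Tuple[float, float]]:
    """Return (start, end) of the longest contiguous run of intro scenes.

    A scene counts as part of the intro when its probability meets the
    threshold. Returns None when no scene qualifies.
    """
    best: Optional[Tuple[float, float]] = None
    run_start: Optional[float] = None
    run_end: Optional[float] = None

    def keep_longer(candidate: Tuple[float, float]) -> None:
        nonlocal best
        if best is None or candidate[1] - candidate[0] > best[1] - best[0]:
            best = candidate

    for scene in scenes:
        if scene.intro_probability >= threshold:
            if run_start is None:
                run_start = scene.start
            run_end = scene.end
        elif run_start is not None:
            # The run just ended; record it if it is the longest so far.
            keep_longer((run_start, run_end))
            run_start = None
    if run_start is not None:
        keep_longer((run_start, run_end))
    return best


# Hypothetical scenes with made-up probabilities.
scenes = [
    Scene(0, 10, 0.1),
    Scene(10, 40, 0.9),
    Scene(40, 70, 0.8),
    Scene(70, 100, 0.2),
]
intro_span = locate_intro(scenes)  # could be stored as index data
```

The returned span is the kind of value that could be stored as index data alongside the multimedia file.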
-
Publication number: US20240220741A1
Publication date: 2024-07-04
Application number: US18608703
Application date: 2024-03-18
Applicant: GOOGLE LLC
Inventor: Michael Greenberg , Bertrand Damiba , Olivia Grace , Fei Wu , Shane Brennan
CPC classification number: G06F40/58 , G06F40/51 , G10L15/005 , G10L15/22
Abstract: The systems and methods described herein can generate a voice-based interface to increase the accuracy of translations. The voice-based interface can result in fewer input audio signals being transmitted between devices of a network. Reducing the number of redundant translation requests that are sent between the devices of a network can save bandwidth and other computational resources by processing fewer input audio signals.
-
Publication number: US20240187521A1
Publication date: 2024-06-06
Application number: US18442576
Application date: 2024-02-15
Applicant: GOOGLE LLC
Inventor: Eyal Segalis , Daniel Walevski , Yaniv Leviathan , Yossi Matias
IPC: H04M3/493 , G06F40/20 , G06F40/205 , G06F40/56 , G06N20/00 , G06Q10/06 , G06Q10/10 , G10L13/08 , G10L15/00 , G10L15/18 , G10L15/22 , G10L25/63 , H04M3/42 , H04M3/58 , H04M3/60
CPC classification number: H04M3/4936 , G06F40/20 , G06F40/205 , G06F40/56 , G06N20/00 , G06Q10/06 , G06Q10/10 , G10L15/005 , G10L15/1807 , G10L15/222 , H04M3/42042 , H04M3/42093 , H04M3/493 , H04M3/58 , H04M3/60 , G10L13/08 , G10L15/1815 , G10L2015/227 , G10L25/63 , H04M2242/18
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, relating to synthetic call status updates. In some implementations, a method includes determining, by a task manager module, that a triggering event has occurred to provide a current status of a user call request. The method may then determine, by the task manager module, the current status of the user call request. A representation of the current status of the user call request is generated. Then, the generated representation of the current status of the user call request is provided to the user.
-
Publication number: US12002451B1
Publication date: 2024-06-04
Application number: US17484457
Application date: 2021-09-24
Applicant: Amazon Technologies, Inc.
Inventor: Jing Liu , Feng-Ju Chang , Athanasios Mouchtaris , Martin Radfar , Maurizio Omologo , Siegfried Kunzmann
CPC classification number: G10L15/08 , G10L15/005 , G10L15/02 , G10L2015/088
Abstract: Techniques for performing automatic speech recognition (ASR) are described. In some embodiments, an ASR component integrates contextual information from user profile data into audio encoding data to predict a token(s) corresponding to a spoken input. The user profile data may include personalized words, such as, contact names, device names, etc. The ASR component determines word embedding data using the personalized words. The ASR component is configured to apply attention to audio frames that are relevant to the personalized words based on processing the audio encoding data and the word embedding data.