Patent search cpc:"G10L15/005" Page 1

1.

Published invention application
AUTOMATIC SPEECH RECOGNITION [Under examination, published]

Publication (announcement) number: US20240321264A1

Publication (announcement) date: 2024-09-26

Application number: US18679981

Application date: 2024-05-31

Applicant: Amazon Technologies, Inc.

Inventor: Jing Liu, Feng-Ju Chang, Athanasios Mouchtaris, Martin Radfar, Maurizio Omologo, Siegfried Kunzmann

IPC: G10L15/08, G10L15/00, G10L15/02

CPC classification number: G10L15/08, G10L15/005, G10L15/02, G10L2015/088

Abstract: Techniques for performing automatic speech recognition (ASR) are described. In some embodiments, an ASR component integrates contextual information from user profile data into audio encoding data to predict one or more tokens corresponding to a spoken input. The user profile data may include personalized words, such as contact names and device names. The ASR component determines word embedding data using the personalized words. The ASR component is configured to apply attention to audio frames that are relevant to the personalized words based on processing the audio encoding data and the word embedding data.
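Illustrative note: the attention step named in this abstract can be pictured with a minimal sketch. This is not the patented method. It assumes plain scaled dot-product attention in which each audio frame encoding is a query and the personalized-word embeddings serve as both keys and values. The function name `contextual_attention` and the residual addition are hypothetical choices made only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def contextual_attention(audio_frames, word_embeddings):
    """Attend from each audio frame over personalized-word embeddings.

    Assumption: frames and embeddings share one dimensionality.
    Returns (biased_frames, weights), where weights[t][k] is how strongly
    frame t attends to personalized word k.
    """
    scale = math.sqrt(len(word_embeddings[0]))
    biased_frames, all_weights = [], []
    for frame in audio_frames:
        scores = [dot(frame, word) / scale for word in word_embeddings]
        weights = softmax(scores)
        # Weighted sum of word embeddings: the context vector for this frame.
        context = [
            sum(wt * word[i] for wt, word in zip(weights, word_embeddings))
            for i in range(len(frame))
        ]
        # Hypothetical fusion step: add the context to the frame encoding.
        biased_frames.append([f + c for f, c in zip(frame, context)])
        all_weights.append(weights)
    return biased_frames, all_weights


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0]]   # toy audio encodings
    words = [[4.0, 0.0], [0.0, 4.0]]    # toy embeddings, e.g. two contact names
    _, attn = contextual_attention(frames, words)
    print(attn)
```

Frames whose encoding aligns with a personalized word receive a larger attention weight for that word, which is the intuition behind biasing recognition toward contact or device names.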

2.

Published invention application
ASSISTANCE DURING AUDIO AND VIDEO CALLS [Under examination, published]

Publication (announcement) number: US20240314094A1

Publication (announcement) date: 2024-09-19

Application number: US18670389

Application date: 2024-05-21

Applicant: Google LLC

Inventor: Fredrik BERGENLID, Vladyslav LYSYCHKIN, Denis BURAKOV, Behshad BEHZADI, Andrea Terwisscha VAN SCHELTINGA, Quentin Lascombes DE LAROUSSILHE, Mikhail GOLIKOV, Koa METTER, Ibrahim BADR, Zaheed SABUR

IPC: H04L51/10, G06F16/44, G10L15/00, G10L15/16, G10L15/22, G10L25/63, H04N7/15, H04N21/439, H04N21/4788

CPC classification number: H04L51/10, G06F16/44, G10L15/22, H04N7/15, H04N21/4394, H04N21/4788, G10L15/005, G10L15/16, G10L2015/223, G10L25/63

Abstract: Implementations relate to providing information items for display during a communication session. In some implementations, a computer-implemented method includes receiving, during a communication session between a first computing device and a second computing device, first media content from the communication session. The method further includes determining a first information item for display in the communication session based at least in part on the first media content. The method further includes sending a first command to at least one of the first computing device and the second computing device to display the first information item.

3.

Granted invention patent
Method and apparatus for generating interaction record, and device and medium [In force]

Publication (announcement) number: US12087285B2

Publication (announcement) date: 2024-09-10

Application number: US17881999

Application date: 2022-08-05

Applicant: BEIJING BYTEDANCE NETWORK TECHNOLOGY CO., LTD.

Inventor: Jingsheng Yang, Kojung Chen, Li Zhao, Xiao Han, Yin Shi

IPC: G10L15/08, G10L15/00, G10L25/57, G10L25/72

CPC classification number: G10L15/083, G10L15/005, G10L25/57, G10L25/72

Abstract: A method and apparatus for generating an interaction record, and a device and a medium are provided. The method includes: firstly, from a multimedia data stream, collecting behavior data, represented by the multimedia data stream, of a user, wherein the behavior data includes voice information and/or operation information; and then, on the basis of the behavior data, generating interaction record data corresponding to the behavior data. According to the technical solution, by means of collecting voice information and/or operation information from a multimedia data stream, and generating interaction record data on the basis of the voice information and the operation information, an interacting user can determine interaction information by using the interaction record data, and the interaction efficiency of the interacting user is improved, thereby also improving the user experience.

4.

Granted invention patent
Automated language identification during virtual conferences [In force]

Publication (announcement) number: US12074720B2

Publication (announcement) date: 2024-08-27

Application number: US17732826

Application date: 2022-04-29

Applicant: Zoom Video Communications, Inc.

Inventor: Awni Yusuf Hannun, Sebastian Stüker

IPC: G06F15/16, G06F40/58, G10L15/00, H04L12/18

CPC classification number: H04L12/1818, G06F40/58, G10L15/005

Abstract: In some aspects, a computing device may access audio information comprising an audio stream from a client device. The computing device may provide an audio segment from the audio stream to a language identification process of the computing device comprising a machine learning model that is trained to identify a language of a plurality of languages within recorded speech. The computing device may identify an identified-language of the plurality of languages for the speech based at least in part on the audio segment. The computing device may provide the identified-language to the client device. Numerous other aspects are described.

5.

Published invention application
CANONICAL TRAINING FOR HIGHLY CONFIGURABLE MULTILINGUAL SPEECH [Under examination, published]

Publication (announcement) number: US20240265924A1

Publication (announcement) date: 2024-08-08

Application number: US18573846

Application date: 2021-06-29

Applicant: Shujie LIU, Jinyu LI, Long ZHOU, Xie SUN, Microsoft Technology Licensing, LLC

Inventor: Jinyu LI, Long ZHOU, Xie SUN, Shujie LIU

IPC: G10L15/32, G10L15/00, G10L15/06, G10L15/30

CPC classification number: G10L15/32, G10L15/005, G10L15/063, G10L15/30, G10L2015/0635

Abstract: Embodiments are provided for building a configurable multilingual model. A computing system obtains a plurality of language-specific automatic speech recognition modules and a universal automatic speech recognition module trained on a multi-language training dataset comprising training data corresponding to each of the plurality of different languages. The computing system then compiles the universal automatic speech recognition module with the plurality of language-specific automatic speech recognition modules to generate a configurable multilingual model that is configured to selectively and dynamically utilize a sub-set of the plurality of language-specific automatic speech recognition modules with the universal automatic speech recognition module to process audio content in response to user input identifying one or more target languages associated with the audio content.
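Illustrative note: the selection behaviour described in this abstract can be sketched as a toy. This is not the patented architecture. The sketch assumes each module is a plain callable returning a list of per-token scores, and that a language-specific module contributes by element-wise addition to the universal module's scores. The class name `ConfigurableMultilingualModel`, the method `configure`, and the additive combination are all hypothetical.

```python
class ConfigurableMultilingualModel:
    """Toy model: one universal module plus selectable language modules."""

    def __init__(self, universal, language_modules):
        # universal: callable(features) -> list of scores
        # language_modules: dict mapping language code -> callable
        self.universal = universal
        self.language_modules = dict(language_modules)

    def configure(self, target_languages):
        """Return a processing function that uses only the chosen languages."""
        unknown = [lang for lang in target_languages
                   if lang not in self.language_modules]
        if unknown:
            raise KeyError(f"no module for languages: {unknown}")
        selected = [self.language_modules[lang] for lang in target_languages]

        def process(features):
            scores = list(self.universal(features))
            for module in selected:
                # Assumed combination rule: add each module's scores.
                scores = [s + d for s, d in zip(scores, module(features))]
            return scores

        return process


if __name__ == "__main__":
    model = ConfigurableMultilingualModel(
        universal=lambda x: [sum(x), 0.0],
        language_modules={
            "en": lambda x: [1.0, 0.0],
            "de": lambda x: [0.0, 2.0],
        },
    )
    english_only = model.configure(["en"])
    print(english_only([1.0, 2.0]))
```

The point of the design is that modules for languages the user did not select are never invoked, so the configured model does less work than one that always runs every language-specific module.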

6.

Published invention application
EYEWEAR DEVICE AND METHOD OF USE [Under examination, published]

Publication (announcement) number: US20240221722A1

Publication (announcement) date: 2024-07-04

Application number: US18403008

Application date: 2024-01-03

Applicant: Solos Technology Limited

Inventor: Wai Kuen CHEUNG, Kwok Wah LAW, Chi Sum YU, Kwun Lam TAI, Kenneth FAN

IPC: G10L15/00, G02B27/01, G02C11/00, G10L15/06, G10L15/26

CPC classification number: G10L15/005, G02B27/0172, G02B27/0176, G02C11/10, G10L15/063, G10L15/26, G02B2027/0178

Abstract: An eyewear device having an eyewear frame including a first unidirectional audio input device configured to receive a communication datum from an individual in communication with a user; a second unidirectional audio input device configured to receive user speech from the user; a user input device configured to control whether the eyewear frame is in a first mode or a second mode; at least one audio output device located on at least one temple of the eyewear frame; and a computing device configured to: receive a user input from the user input device; in the first mode, receive the communication datum and modify the communication datum to generate a modified communication datum; and in the second mode, receive the user speech and transmit the user speech to a remote device.

7.

Published invention application
COMBINING VISUAL AND AUDIO INSIGHTS TO DETECT OPENING SCENES IN MULTIMEDIA FILES [Under examination, published]

Publication (announcement) number: US20240221379A1

Publication (announcement) date: 2024-07-04

Application number: US18090843

Application date: 2022-12-29

Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC

Inventor: Yonit HOFFMAN, Mordechai KADOSH, Zvi FIGOV, Eliyahu STRUGO, Mattan SERRY, Michael BEN-HAYM

IPC: G06V20/40, G06V10/70, G06V20/62, G06V30/244, G10L15/00, G10L15/26, G10L25/48, G11B27/10

CPC classification number: G06V20/41, G06V10/70, G06V20/46, G06V20/49, G06V20/62, G06V30/245, G10L15/005, G10L15/26, G10L25/48, G11B27/102

Abstract: Disclosed is a method for automatically detecting an introduction/opening song within a multimedia file. The method includes designating sequential blocks of time in the multimedia file as scene(s) and detecting certain feature(s) associated with each scene. The extracted scene feature(s) may be analyzed and used to assign a probability to each scene that the scene is part of the introduction/opening song. The probabilities may be used to classify each scene as either correlating or not correlating to the introduction/opening song. The temporal location of the opening song may be saved as index data associated with the multimedia file.
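Illustrative note: the last two steps of this abstract (classify scenes from probabilities, then record the temporal location) can be sketched as follows. This is not the patented method. It assumes the per-scene probabilities are already available, uses a fixed threshold, and takes the longest contiguous run of positive scenes as the opening. The names `Scene` and `locate_opening` and the 0.5 threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    start: float  # seconds from the beginning of the file
    end: float    # seconds from the beginning of the file
    prob: float   # assumed probability that the scene is part of the opening


def locate_opening(scenes, threshold=0.5):
    """Return (start, end) of the longest run of scenes at or above threshold.

    Scenes are assumed to be sorted by time and contiguous.
    Returns None when no scene reaches the threshold.
    """
    best = None
    run_start = run_end = None

    def keep_longer(candidate, current):
        if current is None:
            return candidate
        longer = candidate[1] - candidate[0] > current[1] - current[0]
        return candidate if longer else current

    for scene in scenes:
        if scene.prob >= threshold:
            if run_start is None:
                run_start = scene.start
            run_end = scene.end
        elif run_start is not None:
            best = keep_longer((run_start, run_end), best)
            run_start = run_end = None

    if run_start is not None:
        best = keep_longer((run_start, run_end), best)
    return best


if __name__ == "__main__":
    scenes = [
        Scene(0, 10, 0.1),
        Scene(10, 40, 0.9),
        Scene(40, 70, 0.8),
        Scene(70, 100, 0.2),
        Scene(100, 105, 0.7),
    ]
    # The returned interval could be stored as index data for the file.
    print(locate_opening(scenes))
```

Choosing the longest run rather than the first positive scene is one way to tolerate an isolated false positive. Other policies are equally plausible.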

8.

Published invention application
VOICE-BASED INTERFACE FOR TRANSLATING UTTERANCES BETWEEN USERS [Under examination, published]

Publication (announcement) number: US20240220741A1

Publication (announcement) date: 2024-07-04

Application number: US18608703

Application date: 2024-03-18

Applicant: GOOGLE LLC

Inventor: Michael Greenberg, Bertrand Damiba, Olivia Grace, Fei Wu, Shane Brennan

IPC: G06F40/58, G06F40/51, G10L15/00, G10L15/22

CPC classification number: G06F40/58, G06F40/51, G10L15/005, G10L15/22

Abstract: The systems and methods described herein can generate a voice-based interface to increase the accuracy of translations. The voice-based interface can result in fewer input audio signals being transmitted between devices of a network. Reducing the number of redundant translation requests that are sent between the devices of a network can save bandwidth and other computational resources by processing fewer input audio signals.

9.

Published invention application
AUTOMATED CALL REQUESTS WITH STATUS UPDATES [Under examination, published]

Publication (announcement) number: US20240187521A1

Publication (announcement) date: 2024-06-06

Application number: US18442576

Application date: 2024-02-15

Applicant: GOOGLE LLC

Inventor: Eyal Segalis, Daniel Walevski, Yaniv Leviathan, Yossi Matias

IPC: H04M3/493, G06F40/20, G06F40/205, G06F40/56, G06N20/00, G06Q10/06, G06Q10/10, G10L13/08, G10L15/00, G10L15/18, G10L15/22, G10L25/63, H04M3/42, H04M3/58, H04M3/60

CPC classification number: H04M3/4936, G06F40/20, G06F40/205, G06F40/56, G06N20/00, G06Q10/06, G06Q10/10, G10L15/005, G10L15/1807, G10L15/222, H04M3/42042, H04M3/42093, H04M3/493, H04M3/58, H04M3/60, G10L13/08, G10L15/1815, G10L2015/227, G10L25/63, H04M2242/18

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, relating to synthetic call status updates. In some implementations, a method includes determining, by a task manager module, that a triggering event has occurred to provide a current status of a user call request. The method may then determine, by the task manager module, the current status of the user call request. A representation of the current status of the user call request is generated. Then, the generated representation of the current status of the user call request is provided to the user.

10.

Granted invention patent
Automatic speech recognition [In force]

Publication (announcement) number: US12002451B1

Publication (announcement) date: 2024-06-04

Application number: US17484457

Application date: 2021-09-24

Applicant: Amazon Technologies, Inc.

Inventor: Jing Liu, Feng-Ju Chang, Athanasios Mouchtaris, Martin Radfar, Maurizio Omologo, Siegfried Kunzmann

IPC: G10L15/08, G10L15/00, G10L15/02

CPC classification number: G10L15/08, G10L15/005, G10L15/02, G10L2015/088

Abstract: Techniques for performing automatic speech recognition (ASR) are described. In some embodiments, an ASR component integrates contextual information from user profile data into audio encoding data to predict one or more tokens corresponding to a spoken input. The user profile data may include personalized words, such as contact names and device names. The ASR component determines word embedding data using the personalized words. The ASR component is configured to apply attention to audio frames that are relevant to the personalized words based on processing the audio encoding data and the word embedding data.
