Patent search ap:("Google LLC") AND inv:"Hank Liao" Page 1

1.

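Invention application
WORD-LEVEL END-TO-END NEURAL SPEAKER DIARIZATION WITH AUXNET [In force]

Publication (announcement) number: US20250118292A1

Publication (announcement) date: 2025-04-10

Application number: US18891045

Application date: 2024-09-20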

Applicant: Google LLC

Inventor: Yiling Huang , Weiran Wang , Quan Wang , Guanlong Zhao , Hank Liao , Han Lu

IPC: G10L15/06 , G10L15/07

Abstract: A method includes obtaining labeled training data including a plurality of spoken terms spoken during a conversation. For each respective spoken term, the method includes generating a corresponding sequence of intermediate audio encodings from a corresponding sequence of acoustic frames, generating a corresponding sequence of final audio encodings from the corresponding sequence of intermediate audio encodings, generating a corresponding speech recognition result, and generating a respective speaker token representing a predicted identity of a speaker for each corresponding speech recognition result. The method also includes training the joint speech recognition and speaker diarization model jointly based on a first loss derived from the generated speech recognition results and the corresponding transcriptions and a second loss derived from the generated speaker tokens and the corresponding speaker labels.
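The joint training objective in this abstract can be illustrated with a small sketch. It assumes, beyond what the abstract states, that both losses are mean cross-entropies and that they are combined as a weighted sum. The names `joint_loss` and `speaker_weight` and the toy probability vectors are hypothetical and chosen only for illustration.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[float], target: int) -> float:
    """Negative log-probability assigned to the target index."""
    return -math.log(probs[target])


def mean_cross_entropy(all_probs: Sequence[Sequence[float]],
                       targets: Sequence[int]) -> float:
    """Average cross-entropy over a sequence of predictions."""
    if not targets or len(all_probs) != len(targets):
        raise ValueError("predictions and targets must be non-empty and aligned")
    total = sum(cross_entropy(p, t) for p, t in zip(all_probs, targets))
    return total / len(targets)


def joint_loss(word_probs, word_targets,
               speaker_probs, speaker_targets,
               speaker_weight: float = 1.0) -> float:
    """First loss (recognition) plus weighted second loss (speaker tokens).

    The weighted-sum combination is an assumption, not taken from the abstract.
    """
    asr_loss = mean_cross_entropy(word_probs, word_targets)
    speaker_loss = mean_cross_entropy(speaker_probs, speaker_targets)
    return asr_loss + speaker_weight * speaker_loss
```

With a single weighted sum, one optimization step updates the shared model from both the transcription targets and the speaker labels.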

2.

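Invention application
Rescoring Automatic Speech Recognition Hypotheses Using Audio-Visual Matching [In force]

Publication (announcement) number: US20220392439A1

Publication (announcement) date: 2022-12-08

Application number: US17755972

Application date: 2019-11-18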

Applicant: Google LLC

Inventor: Olivier Siohan , Takaki Makino , Richard Rose , Otavio Braga , Hank Liao , Basillo Garcia Castillo

IPC: G10L15/08 , G10L13/02 , G10L15/25 , G06V20/40 , G06V40/16 , G10L15/06 , G06V10/774 , G10L15/22 , G10L15/30 , G10L25/57

Abstract: A method (400) includes receiving audio data (112) corresponding to an utterance (101) spoken by a user (10), receiving video data (114) representing motion of lips of the user while the user was speaking the utterance, and obtaining multiple candidate transcriptions (135) for the utterance based on the audio data. For each candidate transcription of the multiple candidate transcriptions, the method also includes generating a synthesized speech representation (145) of the corresponding candidate transcription and determining an agreement score (155) indicating a likelihood that the synthesized speech representation matches the motion of the lips of the user while the user speaks the utterance. The method also includes selecting one of the multiple candidate transcriptions for the utterance as a speech recognition output (175) based on the agreement scores determined for the multiple candidate transcriptions for the utterance.
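The selection step in this abstract can be sketched as a rescoring loop. The sketch assumes that the candidate with the highest agreement score is chosen; the abstract says only that selection is "based on" the scores. The `synthesize` and `agreement_score` callables are hypothetical stand-ins for the speech synthesizer and the audio-visual matching model.

```python
from typing import Any, Callable, Sequence


def select_transcription(candidates: Sequence[str],
                         lip_video: Any,
                         synthesize: Callable[[str], Any],
                         agreement_score: Callable[[Any, Any], float]) -> str:
    """Return the candidate whose synthesized speech best matches the lip motion."""
    if not candidates:
        raise ValueError("at least one candidate transcription is required")
    best_candidate = candidates[0]
    best_score = float("-inf")
    for candidate in candidates:
        # Synthesize speech for the candidate, then score it against the video.
        speech = synthesize(candidate)
        score = agreement_score(speech, lip_video)
        if score > best_score:
            best_candidate, best_score = candidate, score
    return best_candidate
```

Ties resolve to the earlier candidate, so an upstream ranking by the audio-only recognizer is preserved when the visual evidence does not discriminate.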

3.

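Invention grant
Privacy-aware meeting room transcription from audio-visual stream [In force]

Publication (announcement) number: US12242648B2

Publication (announcement) date: 2025-03-04

Application number: US18535214

Application date: 2023-12-11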

Applicant: Google LLC

Inventor: Oliver Siohan , Takaki Makino , Richard Rose , Otavio Braga , Hank Liao , Basilio Garcia Castillo

IPC: G06F21/62 , G10L17/02 , H04L12/18

Abstract: A method for a privacy-aware transcription includes receiving audio-visual signal including audio data and image data for a speech environment and a privacy request from a participant in the speech environment where the privacy request indicates a privacy condition of the participant. The method further includes segmenting the audio data into a plurality of segments. For each segment, the method includes determining an identity of a speaker of a corresponding segment of the audio data based on the image data and determining whether the identity of the speaker of the corresponding segment includes the participant associated with the privacy condition. When the identity of the speaker of the corresponding segment includes the participant, the method includes applying the privacy condition to the corresponding segment. The method also includes processing the plurality of segments of the audio data to determine a transcript for the audio data.
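The per-segment flow in this abstract can be sketched as follows. The sketch assumes that speaker identities have already been determined from the image data and that each segment already carries recognized text. It also assumes two hypothetical privacy conditions, `"omit"` and `"anonymize"`; the abstract does not enumerate the conditions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    speaker: str  # identity assumed to come from the image data
    text: str     # hypothetical recognized text for this audio segment


def transcribe_with_privacy(segments: List[Segment],
                            privacy_conditions: Dict[str, str]) -> List[str]:
    """Build a transcript, applying each participant's privacy condition."""
    transcript = []
    for segment in segments:
        condition = privacy_conditions.get(segment.speaker)
        if condition == "omit":
            # Hypothetical condition: leave the segment out of the transcript.
            continue
        # Hypothetical condition: keep the words but hide the identity.
        label = "Anonymous" if condition == "anonymize" else segment.speaker
        transcript.append(f"{label}: {segment.text}")
    return transcript
```

Participants with no entry in `privacy_conditions` are transcribed normally under their own identity.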

4.

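Invention grant
Privacy-aware meeting room transcription from audio-visual stream [In force]

Publication (announcement) number: US12118123B2

Publication (announcement) date: 2024-10-15

Application number: US17755892

Application date: 2019-11-18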

Applicant: Google LLC

Inventor: Oliver Siohan , Takaki Makino , Richard Rose , Otavio Braga , Hank Liao , Basilio Garcia Castillo

IPC: G06F21/62 , G10L17/02 , H04L12/18

CPC classification number: G06F21/6254 , G10L17/02 , H04L12/1831

Abstract: A method for a privacy-aware transcription includes receiving audio-visual signal including audio data and image data for a speech environment and a privacy request from a participant in the speech environment where the privacy request indicates a privacy condition of the participant. The method further includes segmenting the audio data into a plurality of segments. For each segment, the method includes determining an identity of a speaker of a corresponding segment of the audio data based on the image data and determining whether the identity of the speaker of the corresponding segment includes the participant associated with the privacy condition. When the identity of the speaker of the corresponding segment includes the participant, the method includes applying the privacy condition to the corresponding segment. The method also includes processing the plurality of segments of the audio data to determine a transcript for the audio data.

5.

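Invention publication
PRIVACY-AWARE MEETING ROOM TRANSCRIPTION FROM AUDIO-VISUAL STREAM [Under examination, published]

Publication (announcement) number: US20240104247A1

Publication (announcement) date: 2024-03-28

Application number: US18535214

Application date: 2023-12-11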

Applicant: Google LLC

Inventor: Oliver Siohan , Takaki Makino , Richard Rose , Otavio Braga , Hank Liao , Basilio Garcia Castillo

IPC: G06F21/62 , G10L17/02 , H04L12/18

CPC classification number: G06F21/6254 , G10L17/02 , H04L12/1831

Abstract: A method for a privacy-aware transcription includes receiving audio-visual signal including audio data and image data for a speech environment and a privacy request from a participant in the speech environment where the privacy request indicates a privacy condition of the participant. The method further includes segmenting the audio data into a plurality of segments. For each segment, the method includes determining an identity of a speaker of a corresponding segment of the audio data based on the image data and determining whether the identity of the speaker of the corresponding segment includes the participant associated with the privacy condition. When the identity of the speaker of the corresponding segment includes the participant, the method includes applying the privacy condition to the corresponding segment. The method also includes processing the plurality of segments of the audio data to determine a transcript for the audio data.

6.

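Invention application
Privacy-Aware Meeting Room Transcription from Audio-Visual Stream [In force]

Publication (announcement) number: US20220382907A1

Publication (announcement) date: 2022-12-01

Application number: US17755892

Application date: 2019-11-18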

Applicant: Google LLC

Inventor: Oliver Siohan , Takaki Makino , Richard Rose , Otavio Braga , Hank Liao , Basilio Castillo

IPC: G06F21/62 , G10L17/02 , H04L12/18

Abstract: A method for a privacy-aware transcription includes receiving audio-visual signal including audio data and image data for a speech environment and a privacy request from a participant in the speech environment where the privacy request indicates a privacy condition of the participant. The method further includes segmenting the audio data into a plurality of segments. For each segment, the method includes determining an identity of a speaker of a corresponding segment of the audio data based on the image data and determining whether the identity of the speaker of the corresponding segment includes the participant associated with the privacy condition. When the identity of the speaker of the corresponding segment includes the participant, the method includes applying the privacy condition to the corresponding segment. The method also includes processing the plurality of segments of the audio data to determine a transcript for the audio data.

7.

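Invention application
ACOUSTIC-TO-WORD NEURAL NETWORK SPEECH RECOGNIZER [Under examination, published]

Publication (announcement) number: US20180174576A1

Publication (announcement) date: 2018-06-21

Application number: US15834254

Application date: 2017-12-07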

Applicant: Google LLC

Inventor: Hagen Soltau , Hasim Sak , Hank Liao

IPC: G10L15/16 , G06N3/04 , G06N3/08 , G10L15/02 , G10L15/22 , G10L15/14 , G10L21/10 , G10L15/06

CPC classification number: G10L15/16 , G06N3/0445 , G06N3/084 , G10L15/02 , G10L15/063 , G10L15/14 , G10L15/22 , G10L21/10

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for large vocabulary continuous speech recognition. One method includes receiving audio data representing an utterance of a speaker. Acoustic features of the audio data are provided to a recurrent neural network trained using connectionist temporal classification to estimate likelihoods of occurrence of whole words based on acoustic feature input. Output of the recurrent neural network generated in response to the acoustic features is received. The output indicates a likelihood of occurrence for each of multiple different words in a vocabulary. A transcription for the utterance is generated based on the output of the recurrent neural network. The transcription is provided as output of the automated speech recognition system.
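One common way to turn per-frame word likelihoods from a CTC-trained network into a transcription is greedy (best-path) decoding: take the most likely label per frame, collapse consecutive repeats, and drop the blank label. The sketch below assumes this decoding strategy, which the abstract does not specify, and uses a hypothetical three-entry vocabulary with the blank at index 0.

```python
from typing import List, Sequence


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]],
                      vocab: List[str],
                      blank: int = 0) -> str:
    """Best-path CTC decoding over a whole-word vocabulary."""
    words = []
    previous = None
    for probs in frame_probs:
        # Most likely label (word or blank) for this acoustic frame.
        index = max(range(len(probs)), key=probs.__getitem__)
        # Emit a word only when the label changes and is not the blank.
        if index != previous and index != blank:
            words.append(vocab[index])
        previous = index
    return " ".join(words)


vocab = ["<blank>", "hello", "world"]
frames = [
    [0.1, 0.8, 0.1],    # hello
    [0.1, 0.8, 0.1],    # hello (repeat, collapsed)
    [0.9, 0.05, 0.05],  # blank
    [0.2, 0.1, 0.7],    # world
]
print(ctc_greedy_decode(frames, vocab))  # prints "hello world"
```

Because the output units are whole words, the decoded label sequence is already the transcription and needs no pronunciation lexicon in this sketch.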
