Patent search ap:("Google LLC") AND inv:"Aleksandar Kracun" Page 2

11.

Granted invention patent
Speaker diarization (In force)

Publication (announcement) No.: US10978070B2

Publication (announcement) date: 2021-04-13

Application No.: US16552244

Application date: 2019-08-27

Applicant: Google LLC

Inventor: Aleksandar Kracun, Richard Cameron Rose

IPC: G10L17/00, G10L15/22, G10L15/08, H04M3/56

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speaker diarization are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to an utterance. The actions further include determining that the audio data includes an utterance of a predefined hotword spoken by a first speaker. The actions further include identifying a first portion of the audio data that includes speech from the first speaker. The actions further include identifying a second portion of the audio data that includes speech from a second, different speaker. The actions further include transmitting the first portion of the audio data that includes speech from the first speaker and suppressing transmission of the second portion of the audio data that includes speech from the second, different speaker.
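
The steps in this abstract can be illustrated with a minimal sketch. It assumes diarization and hotword detection have already run and produced labeled portions; the `Segment` type, its fields, and `portions_to_transmit` are hypothetical names for illustration and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str               # hypothetical diarization label for this portion
    text: str                  # stand-in for the audio samples of this portion
    has_hotword: bool = False  # assumed output of a hotword detector


def portions_to_transmit(segments):
    """Keep only the portions spoken by the speaker who uttered the hotword."""
    hotword_speaker = next((s.speaker for s in segments if s.has_hotword), None)
    if hotword_speaker is None:
        return []  # no hotword detected, so nothing is transmitted
    # Portions from any other speaker are suppressed by omission.
    return [s for s in segments if s.speaker == hotword_speaker]


segments = [
    Segment("A", "ok computer", has_hotword=True),
    Segment("A", "what's the weather"),
    Segment("B", "turn that down"),
    Segment("A", "in Paris"),
]
kept = portions_to_transmit(segments)
```

Here speaker B's portion is dropped while all of speaker A's portions are kept in order.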

12.

Granted invention patent
Speaker diarization (In force)

Publication (announcement) No.: US10403288B2

Publication (announcement) date: 2019-09-03

Application No.: US15785751

Application date: 2017-10-17

Applicant: Google LLC

Inventor: Aleksandar Kracun, Richard Cameron Rose

IPC: G10L17/00, G10L15/22, G10L15/08

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speaker diarization are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to an utterance. The actions further include determining that the audio data includes an utterance of a predefined hotword spoken by a first speaker. The actions further include identifying a first portion of the audio data that includes speech from the first speaker. The actions further include identifying a second portion of the audio data that includes speech from a second, different speaker. The actions further include transmitting the first portion of the audio data that includes speech from the first speaker and suppressing transmission of the second portion of the audio data that includes speech from the second, different speaker.

13.

Invention application
SPEAKER DIARIZATION (In force)

Publication (announcement) No.: US20240371365A1

Publication (announcement) date: 2024-11-07

Application No.: US18772267

Application date: 2024-07-15

Applicant: Google LLC

Inventor: Aleksandar Kracun, Richard Cameron Rose

IPC: G10L15/08, G10L15/22, G10L17/00, H04M3/56

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speaker diarization are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to an utterance. The actions further include determining that the audio data includes an utterance of a predefined hotword spoken by a first speaker. The actions further include identifying a first portion of the audio data that includes speech from the first speaker. The actions further include identifying a second portion of the audio data that includes speech from a second, different speaker. The actions further include transmitting the first portion of the audio data that includes speech from the first speaker and suppressing transmission of the second portion of the audio data that includes speech from the second, different speaker.

14.

Granted invention patent
Speaker diarization (In force)

Publication (announcement) No.: US12051405B2

Publication (announcement) date: 2024-07-30

Application No.: US18309900

Application date: 2023-05-01

Applicant: Google LLC

Inventor: Aleksandar Kracun, Richard Cameron Rose

IPC: G10L17/00, G10L15/08, G10L15/22, H04M3/56

CPC classification number: G10L15/08, G10L15/22, G10L2015/088, G10L2015/223, G10L2015/228, G10L17/00, H04M3/568, H04M2250/74

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speaker diarization are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to an utterance. The actions further include determining that the audio data includes an utterance of a predefined hotword spoken by a first speaker. The actions further include identifying a first portion of the audio data that includes speech from the first speaker. The actions further include identifying a second portion of the audio data that includes speech from a second, different speaker. The actions further include transmitting the first portion of the audio data that includes speech from the first speaker and suppressing transmission of the second portion of the audio data that includes speech from the second, different speaker.

15.

Invention publication
END-TO-END SPEECH CONVERSION (Under examination, published)

Publication (announcement) No.: US20230230572A1

Publication (announcement) date: 2023-07-20

Application No.: US18188524

Application date: 2023-03-23

Applicant: Google LLC

Inventor: Fadi Biadsy, Ron J. Weiss, Aleksandar Kracun, Pedro J. Moreno Mengibar

IPC: G10L13/02, G06N3/08, G10L21/10, G10L25/30, H04L51/02

CPC classification number: G10L13/02, G06N3/08, G10L21/10, G10L25/30, H04L51/02

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for end to end speech conversion are disclosed. In one aspect, a method includes the actions of receiving first audio data of a first utterance of one or more first terms spoken by a user. The actions further include providing the first audio data as an input to a model that is configured to receive first given audio data in a first voice and output second given audio data in a synthesized voice without performing speech recognition on the first given audio data. The actions further include receiving second audio data of a second utterance of the one or more first terms spoken in the synthesized voice. The actions further include providing, for output, the second audio data of the second utterance of the one or more first terms spoken in the synthesized voice.

16.

Invention application
Freeze Words (In force)

Publication (announcement) No.: US20220180862A1

Publication (announcement) date: 2022-06-09

Application No.: US17115742

Application date: 2020-12-08

Applicant: Google LLC

Inventor: Matthew Sharifi, Aleksandar Kracun

IPC: G10L15/16, G10L15/05

Abstract: A method for detecting freeze words includes receiving audio data that corresponds to an utterance spoken by a user and captured by a user device associated with the user. The method also includes processing, using a speech recognizer, the audio data to determine that the utterance includes a query for a digital assistant to perform an operation. The speech recognizer is configured to trigger endpointing of the utterance after a predetermined duration of non-speech in the audio data. Before the predetermined duration of non-speech, the method includes detecting a freeze word in the audio data. In response to detecting the freeze word in the audio data, the method also includes triggering a hard microphone closing event at the user device. The hard microphone closing event prevents the user device from capturing any audio subsequent to the freeze word.
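
The interplay between silence-based endpointing and a freeze word can be sketched as follows. This is a toy model under stated assumptions, not the patented implementation: audio is represented as a list of frames where a string is a recognized word and `None` is a non-speech frame, and `capture`, `freeze_word`, and `max_silence_frames` are hypothetical names.

```python
def capture(frames, freeze_word, max_silence_frames):
    """Return (captured_frames, reason) for a stream of frames.

    A frame is a recognized word (str) or None for non-speech.
    """
    captured = []
    silence = 0
    for frame in frames:
        captured.append(frame)
        if frame is None:
            silence += 1
            # Normal endpointing: enough consecutive non-speech frames.
            if silence >= max_silence_frames:
                return captured, "endpoint"
        else:
            silence = 0
            # Freeze word seen before the silence timeout: close the
            # microphone immediately so nothing after it is captured.
            if frame == freeze_word:
                return captured, "hard_close"
    return captured, "stream_ended"


frames = ["set", "timer", None, "freeze", "ignore", "me"]
result = capture(frames, freeze_word="freeze", max_silence_frames=3)
```

In this example the words after "freeze" never reach the captured list, which mirrors the hard microphone closing event described in the abstract.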

17.

Invention application
Adapting Hotword Recognition Based On Personalized Negatives (In force)

Publication (announcement) No.: US20220165277A1

Publication (announcement) date: 2022-05-26

Application No.: US16953510

Application date: 2020-11-20

Applicant: Google LLC

Inventor: Aleksandar Kracun, Matthew Sharifi

IPC: G10L17/24, G10L17/06, G10L15/197, G10L15/22

Abstract: A method for adapting hotword recognition includes receiving audio data characterizing a hotword event detected by a first stage hotword detector in streaming audio captured by a user device. The method also includes processing, using a second stage hotword detector, the audio data to determine whether a hotword is detected by the second stage hotword detector in a first segment of the audio data. When the hotword is not detected by the second stage hotword detector, the method includes classifying the first segment of the audio data as containing a negative hotword that caused a false detection of the hotword event in the streaming audio by the first stage hotword detector. Based on the first segment of the audio data classified as containing the negative hotword, the method includes updating the first stage hotword detector to prevent triggering the hotword event in subsequent audio data that contains the negative hotword.
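
The two-stage flow in this abstract can be sketched with toy detectors. Everything here is an assumption made for illustration: audio segments are represented as strings, the first stage is a deliberately coarse prefix match, the second stage is an exact match, and the names `FirstStageDetector`, `second_stage_detects`, and `handle_event` are hypothetical.

```python
class FirstStageDetector:
    """Coarse detector that can be updated with personalized negatives."""

    def __init__(self, hotword):
        self.hotword = hotword
        self.negatives = set()

    def triggers(self, segment):
        if segment in self.negatives:
            return False  # learned negative: do not fire again
        # Toy heuristic: fire on anything sharing the first three letters.
        return segment[:3] == self.hotword[:3]

    def add_negative(self, segment):
        self.negatives.add(segment)


def second_stage_detects(segment, hotword):
    """Stricter check, modeled here as an exact match."""
    return segment == hotword


def handle_event(first_stage, segment, hotword):
    if not first_stage.triggers(segment):
        return "ignored"
    if second_stage_detects(segment, hotword):
        return "accepted"
    # False trigger: classify the segment as a negative hotword and
    # update the first stage so it stops firing on it.
    first_stage.add_negative(segment)
    return "rejected_and_learned"


detector = FirstStageDetector("computer")
outcomes = [
    handle_event(detector, "compost", "computer"),
    handle_event(detector, "compost", "computer"),
    handle_event(detector, "computer", "computer"),
]
```

The second occurrence of the same false trigger is ignored by the first stage, while the real hotword is still accepted.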

18.

Granted invention patent
Hotword-aware speech synthesis (In force)

Publication (announcement) No.: US11308934B2

Publication (announcement) date: 2022-04-19

Application No.: US16609326

Application date: 2018-06-25

Applicant: Google LLC

Inventor: Matthew Sharifi, Aleksandar Kracun

IPC: G10L13/027, G06K9/62, G10L13/08, G10L17/24, G10L25/87

Abstract: A method includes receiving text input data for conversion into synthesized speech and determining, using a hotword-aware model trained to detect a presence of a hotword assigned to a user device, whether a pronunciation of the text input data includes the hotword. The hotword is configured to initiate a wake-up process on the user device for processing the hotword and/or one or more other terms following the hotword in the audio input data. When the pronunciation of the text input data includes the hotword, the method also includes generating an audio output signal from the text input data and providing the audio output signal to an audio output device to output the audio output signal. The audio output signal, when captured by an audio capture device of the user device, is configured to prevent initiation of the wake-up process on the user device.

19.

Invention application
CONTEXTUAL HOTWORDS (In force)

Publication (announcement) No.: US20210043210A1

Publication (announcement) date: 2021-02-11

Application No.: US17068681

Application date: 2020-10-12

Applicant: Google LLC

Inventor: Christopher Thaddeus Hughes, Ignacio Lopez Moreno, Aleksandar Kracun

IPC: G10L15/22, G10L15/08, G10L15/02, G10L15/20

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for contextual hotwords are disclosed. In one aspect, a method, during a boot process of a computing device, includes the actions of determining, by a computing device, a context associated with the computing device. The actions further include, based on the context associated with the computing device, determining a hotword. The actions further include, after determining the hotword, receiving audio data that corresponds to an utterance. The actions further include determining that the audio data includes the hotword. The actions further include, in response to determining that the audio data includes the hotword, performing an operation associated with the hotword.
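
The context-to-hotword mapping described in this abstract can be sketched as a simple lookup. The contexts, hotwords, and operation names below are invented for illustration, and the utterance is modeled as an already transcribed string rather than audio data.

```python
# Hypothetical table: device context -> {hotword: operation}.
CONTEXT_HOTWORDS = {
    "music_playing": {"next": "skip_track", "pause": "pause_playback"},
    "alarm_ringing": {"snooze": "snooze_alarm", "stop": "dismiss_alarm"},
}


def active_hotwords(context):
    """Determine which hotwords are active for the given context."""
    return CONTEXT_HOTWORDS.get(context, {})


def handle_utterance(context, transcript):
    """Return the operation for the first active hotword found, else None."""
    hotwords = active_hotwords(context)
    for word in transcript.lower().split():
        if word in hotwords:
            return hotwords[word]
    return None


operation = handle_utterance("alarm_ringing", "please snooze")
```

The same word produces no operation in a context where it is not an active hotword, for example "snooze" while music is playing.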

20.

Granted invention patent
End-to-end speech conversion (In force)

Publication (announcement) No.: US12300216B2

Publication (announcement) date: 2025-05-13

Application No.: US17310732

Application date: 2019-11-26

Applicant: Google LLC

Inventor: Fadi Biadsy, Ron J. Weiss, Aleksandar Kracun, Pedro J. Moreno Mengibar

IPC: G10L13/02, G06N3/08, G10L13/027, G10L13/08, G10L15/06, G10L21/003, G10L21/04, G10L21/10, G10L25/18, G10L25/30, H04L51/02

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for end to end speech conversion are disclosed. In one aspect, a method includes the actions of receiving first audio data of a first utterance of one or more first terms spoken by a user. The actions further include providing the first audio data as an input to a model that is configured to receive first given audio data in a first voice and output second given audio data in a synthesized voice without performing speech recognition on the first given audio data. The actions further include receiving second audio data of a second utterance of the one or more first terms spoken in the synthesized voice. The actions further include providing, for output, the second audio data of the second utterance of the one or more first terms spoken in the synthesized voice.
