Patent search ap:("Microsoft Technology Licensing Page LLC") AND inv:"Kazuhito KOISHIDA"

1.

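Invention application
MULTI-USER INTELLIGENT ASSISTANCE [Under examination, published]

Publication (announcement) No.: US20180233142A1

Publication (announcement) date: 2018-08-16

Application No.: US15657822

Application date: 2017-07-24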

Applicant: Microsoft Technology Licensing, LLC

Inventor: Kazuhito KOISHIDA , Alexander A. POPOV , Uros BATRICEVIC , Steven Nabil BATHICHE

IPC: G10L15/22 , G10L15/32 , G10L15/08 , G10L25/51 , H04L29/06

Abstract: An intelligent assistant records speech spoken by a first user and determines a self-selection score for the first user. The intelligent assistant sends the self-selection score to another intelligent assistant, and receives a remote-selection score for the first user from the other intelligent assistant. The intelligent assistant compares the self-selection score to the remote-selection score. If the self-selection score is greater than the remote-selection score, the intelligent assistant responds to the first user and blocks subsequent responses to all other users until a disengagement metric of the first user exceeds a blocking threshold. If the self-selection score is less than the remote-selection score, the intelligent assistant does not respond to the first user.
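The arbitration described in this abstract can be sketched as follows. This is a minimal illustration only, not the patented implementation: the class name `Assistant`, the methods `arbitrate` and `update_disengagement`, and the numeric scores are all hypothetical, and the abstract does not say how the scores or the disengagement metric are computed or how ties are handled (this sketch declines to respond on a tie).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Assistant:
    """Hypothetical model of one assistant device; all names are illustrative."""

    blocking_threshold: float
    engaged_user: Optional[str] = None

    def arbitrate(self, user: str, self_score: float, remote_score: float) -> bool:
        """Return True if this assistant should respond to `user`."""
        # While engaged with one user, responses to all other users are blocked.
        if self.engaged_user is not None and user != self.engaged_user:
            return False
        # Higher self-selection score: respond and start blocking other users.
        if self_score > remote_score:
            self.engaged_user = user
            return True
        # Lower score: the other assistant responds instead.
        # Ties are not covered by the abstract; this sketch declines on a tie.
        return False

    def update_disengagement(self, user: str, metric: float) -> None:
        """Lift the block once the engaged user's metric exceeds the threshold."""
        if user == self.engaged_user and metric > self.blocking_threshold:
            self.engaged_user = None


if __name__ == "__main__":
    assistant = Assistant(blocking_threshold=0.8)
    print(assistant.arbitrate("alice", self_score=0.9, remote_score=0.4))
    print(assistant.arbitrate("bob", self_score=0.95, remote_score=0.1))
```

In the usage above, the first call wins the comparison and engages the first user, so the second call for a different user is blocked until `update_disengagement` is called with a metric above the threshold.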

2.

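Invention application
MULTI-USER INTELLIGENT ASSISTANCE [Granted]

Publication (announcement) No.: US20220012470A1

Publication (announcement) date: 2022-01-13

Application No.: US17449054

Application date: 2021-09-27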

Applicant: Microsoft Technology Licensing, LLC

Inventor: Kazuhito KOISHIDA , Alexander A. POPOV , Uros BATRICEVIC , Steven Nabil BATHICHE

IPC: G06K9/00 , A61B5/0205 , A61B5/0507 , A61B5/117 , A61B5/11 , A61B5/00 , G01S5/18 , G01S5/28 , G01S13/72 , G06F1/324 , G06F1/3206 , G06F1/3231 , G06F3/01 , G06F3/03 , G06F3/0482 , G06F3/0484 , G06F3/16 , G06F21/32 , G06F21/35 , G06F40/211 , G06F40/35 , G06K9/62 , G06K9/72 , G06N5/02 , G06N5/04 , G06N20/00 , G06T7/246 , G06T7/292 , G06T7/60 , G06T7/70 , G06T7/73 , G07C9/28 , G08B13/14 , G10L15/02 , G10L15/06 , G10L15/08 , G10L15/18 , G10L15/19 , G10L15/22 , G10L15/24 , G10L15/26 , G10L15/28 , G10L15/32 , G10L17/04 , G10L17/08 , G10L25/51 , H04L12/58 , H04L29/06 , H04L29/08 , H04N5/232 , H04N5/33 , H04N7/18 , H04N21/231 , H04N21/422 , H04N21/442 , H04R1/40 , H04R3/00 , H04W4/029 , H04W4/33

Abstract: An intelligent assistant records speech spoken by a first user and determines a self-selection score for the first user. The intelligent assistant sends the self-selection score to another intelligent assistant, and receives a remote-selection score for the first user from the other intelligent assistant. The intelligent assistant compares the self-selection score to the remote-selection score. If the self-selection score is greater than the remote-selection score, the intelligent assistant responds to the first user and blocks subsequent responses to all other users until a disengagement metric of the first user exceeds a blocking threshold. If the self-selection score is less than the remote-selection score, the intelligent assistant does not respond to the first user.

3.

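Invention application
SPEECH PARSING WITH INTELLIGENT ASSISTANT [Under examination, published]

Publication (announcement) No.: US20180293221A1

Publication (announcement) date: 2018-10-11

Application No.: US16005470

Application date: 2018-06-11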

Applicant: Microsoft Technology Licensing, LLC

Inventor: Erich-Soren FINKELSTEIN , Han Yee Mimi FUNG , Aleksandar UZELAC , Oz SOLOMON , Keith Coleman HEROLD , Vivek PRADEEP , Zongyi LIU , Kazuhito KOISHIDA , Haithem ALBADAWI , Steven Nabil BATHICHE , Christopher Lance NUESMEYER , Michelle Lynn HOLTMANN , Christopher Brian QUIRK , Pablo Luis SALA

IPC: G06F17/27 , G10L17/00 , G10L15/22 , G06F15/18

Abstract: A method to execute computer-actionable directives conveyed in human speech comprises: receiving audio data recording speech from one or more speakers; converting the audio data into a linguistic representation of the recorded speech; detecting a target corresponding to the linguistic representation; committing to the data structure language data associated with the detected target and based on the linguistic representation; parsing the data structure to identify one or more of the computer-actionable directives; and submitting the one or more of the computer-actionable directives to the computer for processing.

4.

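Invention application
AUDIO-VISUAL SPEECH ENHANCEMENT [Granted]

Publication (announcement) No.: US20210134312A1

Publication (announcement) date: 2021-05-06

Application No.: US16783021

Application date: 2020-02-05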

Applicant: Microsoft Technology Licensing, LLC

Inventor: Kazuhito KOISHIDA , Michael IUZZOLINO

IPC: G10L21/02 , G06K9/00 , G10L15/25 , G10L25/18 , G10L15/22

Abstract: Example speech enhancement systems include a spatio-temporal residual network configured to receive video data containing a target speaker and extract visual features from the video data, an autoencoder configured to receive input of an audio spectrogram and extract audio features from the audio spectrogram, and a squeeze-excitation fusion block configured to receive input of visual features from a layer of the spatio-temporal residual network and input of audio features from a layer of the autoencoder, and to provide an output to the decoder of the autoencoder. The decoder is configured to output a mask configured based upon the fusion of audio features and visual features by the squeeze-excitation fusion block, and the instructions are executable to apply the mask to the audio spectrogram to generate an enhanced magnitude spectrogram, and to reconstruct an enhanced waveform from the enhanced magnitude spectrogram.

5.

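Invention application
DETERMINING SPEAKER CHANGES IN AUDIO INPUT [Under examination, published]

Publication (announcement) No.: US20180233140A1

Publication (announcement) date: 2018-08-16

Application No.: US15646871

Application date: 2017-07-11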

Applicant: Microsoft Technology Licensing, LLC

Inventor: Kazuhito KOISHIDA , Uros BATRICEVIC

IPC: G10L15/22 , G10L15/02 , G10L15/06 , G10L15/18

Abstract: Intelligent assistant systems, methods and computing devices are disclosed for identifying a speaker change. A method comprises receiving audio input comprising a speech fragment. A first voice model is trained with a first sub-fragment from the speech fragment. A second voice model is trained with a second sub-fragment from the speech fragment. The first sub-fragment is analyzed with the second voice model to yield a first confidence value. The second sub-fragment is analyzed with the first voice model to yield a second confidence value. Based at least on the first and second confidence values, the method determines if a speaker of the first sub-fragment is the speaker of the second sub-fragment.
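The cross-scoring idea in this abstract can be illustrated with a toy example. Everything here is an assumption made for illustration: the abstract does not specify the voice model, the features, or how the two confidence values are combined. This sketch substitutes a diagonal Gaussian over hypothetical feature frames as the "voice model", and treats the two sub-fragments as the same speaker when the lower of the two confidence values reaches an arbitrary threshold.

```python
import math
from statistics import fmean, pvariance


def train_voice_model(frames):
    """Toy stand-in for a voice model: per-dimension mean and variance."""
    dims = list(zip(*frames))
    means = [fmean(d) for d in dims]
    # Floor the variance so near-constant features do not divide by zero.
    variances = [max(pvariance(d), 1e-3) for d in dims]
    return means, variances


def confidence(model, frames):
    """Average per-frame score in [0, 1]; higher means a better fit to the model."""
    means, variances = model
    scores = []
    for frame in frames:
        d2 = sum((x - m) ** 2 / v for x, m, v in zip(frame, means, variances))
        scores.append(math.exp(-0.5 * d2 / len(frame)))
    return fmean(scores)


def same_speaker(first, second, threshold=0.25):
    """Cross-score the sub-fragments; the threshold is an arbitrary assumption."""
    model_first = train_voice_model(first)
    model_second = train_voice_model(second)
    # First sub-fragment analyzed with the second model, and vice versa.
    first_confidence = confidence(model_second, first)
    second_confidence = confidence(model_first, second)
    return min(first_confidence, second_confidence) >= threshold


if __name__ == "__main__":
    first = [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9]]
    similar = [[1.0, 2.1], [1.05, 1.95], [0.95, 2.0]]
    distant = [[5.0, -3.0], [5.1, -2.9], [4.9, -3.1]]
    print(same_speaker(first, similar), same_speaker(first, distant))
```

Taking the minimum of the two confidence values is one conservative way to combine them; the abstract only states that the decision is based at least on both values.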
