Patent search: ap:("OTO Systems Inc.") AND inv:"Nicolas Lucien Perony", results page 1

1.

Granted invention patent
Speaker separation based on real-time latent speaker state characterization (Status: in force)

Publication number: US11790921B2

Publication date: 2023-10-17

Application number: US17169843

Filing date: 2021-02-08

Applicant: OTO Systems Inc.

Inventors: Valentin Alain Jean Perret, Nándor Kedves, Nicolas Lucien Perony

IPC classification: G10L17/06, G10L17/02, G10L17/04, G10L17/18, G06N3/04, G06N3/08, G06N3/049, G10L21/0272, G06N3/045

CPC classification: G10L17/06, G06N3/045, G06N3/049, G06N3/08, G10L17/02, G10L17/04, G10L17/18, G10L21/0272

Abstract: Systems, methods, and non-transitory computer-readable media can obtain a stream of audio waveform data that represents speech involving a plurality of speakers. As the stream of audio waveform data is obtained, a plurality of audio chunks can be determined. An audio chunk can be associated with one or more identity embeddings. The stream of audio waveform data can be segmented into a plurality of segments based on the plurality of audio chunks and respective identity embeddings associated with the plurality of audio chunks. A segment can be associated with a speaker included in the plurality of speakers. Information describing the plurality of segments associated with the stream of audio waveform data can be provided.
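The abstract describes a streaming pipeline (chunk the incoming audio, embed each chunk, group chunks into speaker segments) without saying how chunks are sized, embedded, or grouped. The sketch below is a minimal illustration under stated assumptions, not the patented method: it assumes fixed-length chunks, a caller-supplied embedding function (the hypothetical `embed_chunk`), and a simple online cosine-similarity clustering with a hypothetical `threshold`. Consecutive chunks assigned to the same speaker are merged into one segment.

```python
import math
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence

Embedding = List[float]


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    speaker: int  # index of the speaker cluster this segment belongs to


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def segment_stream(
    samples: Iterable[float],
    sample_rate: int,
    chunk_size: int,
    embed_chunk: Callable[[List[float]], Embedding],  # hypothetical embedder
    threshold: float = 0.75,  # hypothetical similarity cut-off
) -> List[Segment]:
    centroids: List[Embedding] = []  # running mean embedding per speaker
    counts: List[int] = []
    segments: List[Segment] = []
    buffer: List[float] = []
    offset = 0  # index of the first sample of the current chunk

    def process(chunk: List[float]) -> None:
        nonlocal offset
        emb = embed_chunk(chunk)
        # Assign the chunk to the most similar known speaker above threshold.
        best, best_sim = -1, threshold
        for i, c in enumerate(centroids):
            sim = cosine(emb, c)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best == -1:
            # No speaker is similar enough: open a new speaker cluster.
            centroids.append(list(emb))
            counts.append(1)
            best = len(centroids) - 1
        else:
            # Update the running mean of the matched speaker's embedding.
            n = counts[best] + 1
            centroids[best] = [c + (e - c) / n for c, e in zip(centroids[best], emb)]
            counts[best] = n
        start = offset / sample_rate
        end = (offset + len(chunk)) / sample_rate
        if segments and segments[-1].speaker == best:
            segments[-1].end = end  # same speaker continues: extend segment
        else:
            segments.append(Segment(start, end, best))
        offset += len(chunk)

    # Consume the stream incrementally, emitting a chunk whenever one is full.
    for s in samples:
        buffer.append(s)
        if len(buffer) == chunk_size:
            process(buffer)
            buffer = []
    if buffer:  # trailing partial chunk
        process(buffer)
    return segments
```

With a toy embedder such as `lambda ch: [sum(ch) / len(ch), 0.0]`, a stream of `[1.0]*4 + [-1.0]*4 + [1.0]*4` at `sample_rate=4` and `chunk_size=2` yields three segments that alternate between speakers 0, 1, and 0. A real system would replace the toy embedder with a learned speaker-embedding model.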

2.

Published invention application
SPEAKER SEPARATION BASED ON REAL-TIME LATENT SPEAKER STATE CHARACTERIZATION (Status: pending, published)

Publication number: US20240153509A1

Publication date: 2024-05-09

Application number: US18368459

Filing date: 2023-09-14

Applicant: OTO Systems Inc.

Inventors: Valentin Alain Jean Perret, Nándor Kedves, Nicolas Lucien Perony

IPC classification: G10L17/06, G06N3/045, G06N3/049, G06N3/08, G10L17/02, G10L17/04, G10L17/18, G10L21/0272

CPC classification: G10L17/06, G06N3/045, G06N3/049, G06N3/08, G10L17/02, G10L17/04, G10L17/18, G10L21/0272

Abstract: Systems, methods, and non-transitory computer-readable media can obtain a stream of audio waveform data that represents speech involving a plurality of speakers. As the stream of audio waveform data is obtained, a plurality of audio chunks can be determined. An audio chunk can be associated with one or more identity embeddings. The stream of audio waveform data can be segmented into a plurality of segments based on the plurality of audio chunks and respective identity embeddings associated with the plurality of audio chunks. A segment can be associated with a speaker included in the plurality of speakers. Information describing the plurality of segments associated with the stream of audio waveform data can be provided.

3.

Granted invention patent
Sample-efficient representation learning for real-time latent speaker state characterization (Status: in force)

Publication number: US11646037B2

Publication date: 2023-05-09

Application number: US17115382

Filing date: 2020-12-08

Applicant: OTO Systems Inc.

Inventors: Valentin Alain Jean Perret, Nicolas Lucien Perony, Nándor Kedves

IPC classification: G10L17/18, G10L17/02, G06N3/04, G06N3/08, G06N3/049, G06N3/045, G06N3/048, G10L17/08

CPC classification: G10L17/18, G06N3/045, G06N3/048, G06N3/049, G06N3/08, G10L17/02, G10L17/08

Abstract: Systems, methods, and non-transitory computer-readable media can provide audio waveform data that corresponds to a voice sample to a temporal convolutional network for evaluation. The temporal convolutional network can pre-process the audio waveform data and can output an identity embedding associated with the audio waveform data. The identity embedding associated with the voice sample can be obtained from the temporal convolutional network. Information describing a speaker associated with the voice sample can be determined based at least in part on the identity embedding.
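The abstract names a temporal convolutional network (TCN) that maps a raw waveform to an identity embedding, but it does not disclose the architecture. The sketch below only illustrates the general TCN idea under loudly stated assumptions: a small stack of causal dilated 1-D convolutions with untrained random weights and ReLU, a hypothetical peak-normalization step standing in for "pre-processing", and time-averaged, L2-normalized outputs as the embedding. `TinyTCN` and `identify` are hypothetical names; training, residual connections, and anything specific to the patented model are omitted.

```python
import math
import random
from typing import Dict, List

Signal = List[List[float]]  # [channel][time]


def causal_dilated_conv(x: Signal, weights: List[List[List[float]]], dilation: int) -> Signal:
    """out[o][t] = ReLU(sum_i sum_k weights[o][i][k] * x[i][t - k*dilation])."""
    T = len(x[0])
    out = []
    for w_o in weights:
        row = []
        for t in range(T):
            acc = 0.0
            for x_i, w_oi in zip(x, w_o):
                for k, w in enumerate(w_oi):
                    idx = t - k * dilation
                    if idx >= 0:  # causal: only current and past samples
                        acc += w * x_i[idx]
            row.append(max(acc, 0.0))  # ReLU
        out.append(row)
    return out


class TinyTCN:
    """Hypothetical, untrained TCN; receptive field = 1 + (kernel-1)*sum(dilations)."""

    def __init__(self, channels: int = 8, kernel: int = 2, dilations=(1, 2, 4, 8), seed: int = 0):
        rng = random.Random(seed)
        self.layers = []
        in_ch = 1
        for d in dilations:
            scale = 1 / math.sqrt(in_ch * kernel)
            w = [[[rng.gauss(0, scale) for _ in range(kernel)] for _ in range(in_ch)]
                 for _ in range(channels)]
            self.layers.append((w, d))
            in_ch = channels

    def embed(self, waveform: List[float]) -> List[float]:
        if not waveform:
            raise ValueError("empty waveform")
        # Hypothetical pre-processing: scale so the peak amplitude is 1.
        peak = max(abs(s) for s in waveform) or 1.0
        h: Signal = [[s / peak for s in waveform]]
        for w, d in self.layers:
            h = causal_dilated_conv(h, w, d)
        pooled = [sum(ch) / len(ch) for ch in h]  # average over time
        norm = math.sqrt(sum(v * v for v in pooled)) or 1.0
        return [v / norm for v in pooled]  # unit-length identity embedding


def identify(embedding: List[float], enrolled: Dict[str, List[float]]) -> str:
    """Return the enrolled speaker whose embedding has the highest dot product."""
    return max(enrolled, key=lambda name: sum(a * b for a, b in zip(embedding, enrolled[name])))
```

For example, `TinyTCN().embed([math.sin(0.1 * t) for t in range(64)])` returns an 8-dimensional vector, and `identify` picks the closest entry in a dictionary of enrolled embeddings. Because of the peak normalization, scaling the input waveform by a constant does not change the embedding in this sketch.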

4.

Published invention application
SAMPLE-EFFICIENT REPRESENTATION LEARNING FOR REAL-TIME LATENT SPEAKER STATE CHARACTERISATION (Status: pending, published)

Publication number: US20230352031A1

Publication date: 2023-11-02

Application number: US18129789

Filing date: 2023-03-31

Applicant: OTO Systems Inc.

Inventors: Valentin Alain Jean Perret, Nicolas Lucien Perony, Nándor Kedves

IPC classification: G10L17/18, G10L17/02, G06N3/049, G06N3/08, G06N3/045, G06N3/048

CPC classification: G10L17/18, G10L17/02, G06N3/049, G06N3/08, G06N3/045, G06N3/048, G10L17/08

Abstract: Systems, methods, and non-transitory computer-readable media can provide audio waveform data that corresponds to a voice sample to a temporal convolutional network for evaluation. The temporal convolutional network can pre-process the audio waveform data and can output an identity embedding associated with the audio waveform data. The identity embedding associated with the voice sample can be obtained from the temporal convolutional network. Information describing a speaker associated with the voice sample can be determined based at least in part on the identity embedding.