- Patent title: Speaker separation based on real-time latent speaker state characterization
- Application number: US17169843
- Filing date: 2021-02-08
- Publication number: US11790921B2
- Publication date: 2023-10-17
- Inventors: Valentin Alain Jean Perret, Nándor Kedves, Nicolas Lucien Perony
- Applicant: OTO Systems Inc.
- Applicant address: US NY New York
- Assignee: OTO Systems Inc.
- Current assignee: OTO Systems Inc.
- Current assignee address: US NY New York
- Agency: Schwegman Lundberg & Woessner, P.A.
- Main classification: G10L17/06
- IPC classification: G10L17/06; G10L17/02; G10L17/04; G10L17/18; G06N3/04; G06N3/08; G06N3/049; G10L21/0272; G06N3/045
Abstract:
Systems, methods, and non-transitory computer-readable media can obtain a stream of audio waveform data that represents speech involving a plurality of speakers. As the stream of audio waveform data is obtained, a plurality of audio chunks can be determined. An audio chunk can be associated with one or more identity embeddings. The stream of audio waveform data can be segmented into a plurality of segments based on the plurality of audio chunks and respective identity embeddings associated with the plurality of audio chunks. A segment can be associated with a speaker included in the plurality of speakers. Information describing the plurality of segments associated with the stream of audio waveform data can be provided.
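
The flow described in the abstract (chunk the incoming stream, attach an identity embedding to each chunk, then merge chunks into per-speaker segments) can be illustrated with a minimal Python sketch. This is not the patented method: `toy_embedding` is a hypothetical stand-in for a learned identity embedding, the nearest-centroid assignment with a fixed `threshold` is an assumed clustering rule, and all names and parameter values are chosen purely for illustration.

```python
from dataclasses import dataclass
from math import dist
from typing import Iterable, Iterator, List, Tuple

Embedding = Tuple[float, float]


@dataclass
class Segment:
    start: int    # index of the first sample (inclusive)
    end: int      # index past the last sample (exclusive)
    speaker: int  # anonymous speaker label


def iter_chunks(stream: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Group samples into fixed-size chunks as the stream is consumed."""
    chunk: List[float] = []
    for sample in stream:
        chunk.append(sample)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # trailing partial chunk
        yield chunk


def toy_embedding(chunk: List[float]) -> Embedding:
    """Hypothetical embedding: mean absolute amplitude and zero-crossing rate."""
    energy = sum(abs(s) for s in chunk) / len(chunk)
    crossings = sum(1 for a, b in zip(chunk, chunk[1:]) if (a < 0) != (b < 0))
    return (energy, crossings / len(chunk))


def segment_stream(stream: Iterable[float], chunk_size: int = 4,
                   threshold: float = 0.3) -> List[Segment]:
    """Assign each chunk to a speaker and merge adjacent same-speaker chunks."""
    centroids: List[Embedding] = []  # one reference embedding per speaker
    segments: List[Segment] = []
    position = 0
    for chunk in iter_chunks(stream, chunk_size):
        emb = toy_embedding(chunk)
        # Find the closest known speaker (None if no speaker has been seen yet).
        best = min(range(len(centroids)),
                   key=lambda i: dist(emb, centroids[i]), default=None)
        if best is None or dist(emb, centroids[best]) > threshold:
            # Too far from every known speaker: register a new one.
            # Centroids stay fixed at the first embedding seen, for simplicity.
            centroids.append(emb)
            best = len(centroids) - 1
        end = position + len(chunk)
        if segments and segments[-1].speaker == best:
            segments[-1].end = end  # extend the current segment
        else:
            segments.append(Segment(position, end, best))
        position = end
    return segments
```

Calling `segment_stream([0.1] * 8 + [0.9] * 8 + [0.1] * 4)` on this synthetic stream yields three segments, with the first and last attributed to the same speaker label. A real system would replace `toy_embedding` with a trained speaker model and would likely update speaker representations over time.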