-
1.
Publication No.: US11790921B2
Publication Date: 2023-10-17
Application No.: US17169843
Application Date: 2021-02-08
Applicant: OTO Systems Inc.
IPC Classification: G10L17/06, G10L17/02, G10L17/04, G10L17/18, G06N3/04, G06N3/08, G06N3/049, G10L21/0272, G06N3/045
CPC Classification: G10L17/06, G06N3/045, G06N3/049, G06N3/08, G10L17/02, G10L17/04, G10L17/18, G10L21/0272
Abstract: Systems, methods, and non-transitory computer-readable media can obtain a stream of audio waveform data that represents speech involving a plurality of speakers. As the stream of audio waveform data is obtained, a plurality of audio chunks can be determined. An audio chunk can be associated with one or more identity embeddings. The stream of audio waveform data can be segmented into a plurality of segments based on the plurality of audio chunks and respective identity embeddings associated with the plurality of audio chunks. A segment can be associated with a speaker included in the plurality of speakers. Information describing the plurality of segments associated with the stream of audio waveform data can be provided.
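The abstract says that "as the stream of audio waveform data is obtained, a plurality of audio chunks can be determined." The Python sketch below illustrates that step only. It assumes fixed-length, overlapping chunks cut from a growing sample buffer. The names `stream_chunks`, `chunk_len`, and `hop` are hypothetical. The abstract does not say how chunks are sized or overlapped.

```python
from typing import Iterable, Iterator, List


def stream_chunks(
    blocks: Iterable[List[float]],
    chunk_len: int,
    hop: int,
) -> Iterator[List[float]]:
    """Yield fixed-length, possibly overlapping chunks while audio is still arriving.

    `blocks` simulates a live stream: each item is a list of samples delivered
    by a capture device. chunk_len (samples per chunk) and hop (samples between
    chunk starts) are hypothetical parameters, not taken from the patent.
    """
    if chunk_len <= 0 or hop <= 0 or hop > chunk_len:
        raise ValueError("require 0 < hop <= chunk_len")
    buffer: List[float] = []
    for block in blocks:
        buffer.extend(block)
        # Emit every complete chunk currently available in the buffer.
        while len(buffer) >= chunk_len:
            yield buffer[:chunk_len]  # slicing copies, so later deletes are safe
            # Advance by `hop` samples; the remaining samples overlap the next chunk.
            del buffer[:hop]
```

Because chunks are emitted as soon as enough samples arrive, downstream embedding can start before the stream ends. The trade-off is that the final, shorter-than-`chunk_len` tail is held back in this sketch.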
-
2.
Publication No.: US20240153509A1
Publication Date: 2024-05-09
Application No.: US18368459
Application Date: 2023-09-14
Applicant: OTO Systems Inc.
IPC Classification: G10L17/06, G06N3/045, G06N3/049, G06N3/08, G10L17/02, G10L17/04, G10L17/18, G10L21/0272
CPC Classification: G10L17/06, G06N3/045, G06N3/049, G06N3/08, G10L17/02, G10L17/04, G10L17/18, G10L21/0272
Abstract: Systems, methods, and non-transitory computer-readable media can obtain a stream of audio waveform data that represents speech involving a plurality of speakers. As the stream of audio waveform data is obtained, a plurality of audio chunks can be determined. An audio chunk can be associated with one or more identity embeddings. The stream of audio waveform data can be segmented into a plurality of segments based on the plurality of audio chunks and respective identity embeddings associated with the plurality of audio chunks. A segment can be associated with a speaker included in the plurality of speakers. Information describing the plurality of segments associated with the stream of audio waveform data can be provided.
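The abstract does not say how the per-chunk identity embeddings become speaker segments. This Python sketch therefore uses a simple, assumed technique in its place:

- greedy online clustering by cosine similarity;
- merging of consecutive chunks that receive the same speaker label.

`Segment`, `segment_stream`, and the 0.7 threshold are hypothetical. The sketch also assumes one embedding per chunk, although the abstract allows "one or more".

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    speaker: int  # anonymous speaker index: 0, 1, ...


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def segment_stream(
    chunks: List[Tuple[float, float, Sequence[float]]],
    threshold: float = 0.7,
) -> List[Segment]:
    """Label each (start, end, embedding) chunk with a speaker and merge runs.

    Assumed stand-in, not the patented method: a chunk joins the most similar
    known speaker if cosine similarity >= threshold, else it starts a new one.
    """
    centroids: List[List[float]] = []
    counts: List[int] = []
    segments: List[Segment] = []
    for start, end, emb in chunks:
        best, best_sim = -1, -1.0
        for idx, centroid in enumerate(centroids):
            sim = cosine(emb, centroid)
            if sim > best_sim:
                best, best_sim = idx, sim
        if best < 0 or best_sim < threshold:
            # No sufficiently similar speaker yet: open a new cluster.
            centroids.append(list(emb))
            counts.append(1)
            best = len(centroids) - 1
        else:
            # Update the running mean embedding of the matched speaker.
            n = counts[best] + 1
            centroids[best] = [c + (e - c) / n for c, e in zip(centroids[best], emb)]
            counts[best] = n
        # Extend the previous segment when the speaker is unchanged.
        if segments and segments[-1].speaker == best:
            segments[-1].end = max(segments[-1].end, end)
        else:
            segments.append(Segment(start, end, best))
    return segments
```

With overlapping chunks, adjacent segments may overlap in time. Overlapped speech, where a chunk carries several identity embeddings, is not handled here.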
-
3.
Publication No.: US11646037B2
Publication Date: 2023-05-09
Application No.: US17115382
Application Date: 2020-12-08
Applicant: OTO Systems Inc.
Abstract: Systems, methods, and non-transitory computer-readable media can provide audio waveform data that corresponds to a voice sample to a temporal convolutional network for evaluation. The temporal convolutional network can pre-process the audio waveform data and can output an identity embedding associated with the audio waveform data. The identity embedding associated with the voice sample can be obtained from the temporal convolutional network. Information describing a speaker associated with the voice sample can be determined based at least in part on the identity embedding.
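A temporal convolutional network (TCN) is commonly built from stacked dilated causal 1-D convolutions with residual connections. The pure-Python sketch below shows that structure turning a raw waveform into a fixed-size, L2-normalized identity embedding.

Everything specific here is an assumption, not the patent's design:

- the pre-emphasis filter standing in for "pre-process";
- the layer sizes and dilation schedule;
- mean pooling over time;
- the untrained random weights.

`TinyTCN` and `CausalConv1d` are hypothetical names.

```python
import math
import random
from typing import List

Signal = List[List[float]]  # indexed [channel][time]


def pre_emphasis(wave: List[float], coeff: float = 0.97) -> List[float]:
    """Assumed pre-processing step: y[t] = x[t] - coeff * x[t - 1]."""
    return [wave[0]] + [wave[t] - coeff * wave[t - 1] for t in range(1, len(wave))]


class CausalConv1d:
    """Dilated causal 1-D convolution: output at time t uses only inputs at times <= t."""

    def __init__(self, in_ch: int, out_ch: int, kernel: int, dilation: int,
                 rng: random.Random) -> None:
        scale = 1.0 / math.sqrt(in_ch * kernel)
        # Untrained random weights with shape [out_ch][in_ch][kernel].
        self.w = [[[rng.gauss(0.0, scale) for _ in range(kernel)]
                   for _ in range(in_ch)] for _ in range(out_ch)]
        self.dilation = dilation

    def __call__(self, x: Signal) -> Signal:
        steps = len(x[0])
        out: Signal = []
        for w_o in self.w:
            row = []
            for t in range(steps):
                acc = 0.0
                for x_i, w_oi in zip(x, w_o):
                    for k, w in enumerate(w_oi):
                        src = t - k * self.dilation  # look back only (causal)
                        if src >= 0:  # implicit zero padding on the left
                            acc += w * x_i[src]
                row.append(acc)
            out.append(row)
        return out


def relu(x: Signal) -> Signal:
    return [[max(0.0, v) for v in row] for row in x]


class TinyTCN:
    """Hypothetical toy TCN: input conv, residual dilated blocks, projection, pooling."""

    def __init__(self, channels: int = 8, emb_dim: int = 4, kernel: int = 3,
                 levels: int = 3, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.inp = CausalConv1d(1, channels, 1, 1, rng)
        # Dilations 1, 2, 4, ...: the receptive field grows exponentially with depth.
        self.blocks = [CausalConv1d(channels, channels, kernel, 2 ** lvl, rng)
                       for lvl in range(levels)]
        self.proj = CausalConv1d(channels, emb_dim, 1, 1, rng)

    def embed(self, wave: List[float]) -> List[float]:
        h = self.inp([pre_emphasis(wave)])
        for conv in self.blocks:
            y = relu(conv(h))
            h = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(h, y)]  # residual add
        z = self.proj(h)
        pooled = [sum(row) / len(row) for row in z]  # average over time -> fixed size
        norm = math.sqrt(sum(v * v for v in pooled)) or 1.0
        return [v / norm for v in pooled]
```

A real system would learn these weights so that embeddings of the same speaker lie close together. With random weights, the output only demonstrates the shape of the computation: variable-length audio in, fixed-length unit vector out.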
-
4.
Publication No.: US20230352031A1
Publication Date: 2023-11-02
Application No.: US18129789
Application Date: 2023-03-31
Applicant: OTO Systems Inc.
Abstract: Systems, methods, and non-transitory computer-readable media can provide audio waveform data that corresponds to a voice sample to a temporal convolutional network for evaluation. The temporal convolutional network can pre-process the audio waveform data and can output an identity embedding associated with the audio waveform data. The identity embedding associated with the voice sample can be obtained from the temporal convolutional network. Information describing a speaker associated with the voice sample can be determined based at least in part on the identity embedding.
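The abstract's last step determines information about the speaker from the identity embedding. One common way to do that is to compare the embedding with embeddings enrolled for known speakers, using cosine similarity. The abstract does not confirm this approach; it is assumed here. The Python sketch below shows it. `identify_speaker`, the enrollment dictionary, and the 0.75 threshold are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_speaker(
    embedding: Sequence[float],
    enrolled: Dict[str, Sequence[float]],
    threshold: float = 0.75,
) -> Tuple[Optional[str], float]:
    """Return (best_name, score), or (None, score) if no enrolled voice is close enough."""
    best_name: Optional[str] = None
    best_score = -1.0
    for name, ref in enrolled.items():
        score = cosine_similarity(embedding, ref)
        if score > best_score:
            best_name, best_score = name, score
    # Reject weak matches so unknown voices are not forced onto an enrolled speaker.
    if best_name is None or best_score < threshold:
        return None, best_score
    return best_name, best_score
```

The threshold controls the balance between false accepts and false rejects. In practice it would be tuned on held-out data rather than fixed at 0.75.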