SPEAKER EMBEDDING CONVERSION FOR BACKWARD AND CROSS-CHANNEL COMPATABILITY
Abstract:
Embodiments include a computer executing voice biometric machine-learning for speaker recognition. The machine-learning architecture includes embedding extractors that extract embeddings for enrollment or for verifying inbound speakers, and embedding convertors that convert enrollment voiceprints from a first type of embedding to a second type of embedding. The embedding convertor maps the feature vector space of the first type of embedding to the feature vector space of the second type of embedding. The embedding convertor takes as input enrollment embeddings of the first type of embedding and generates as output converted enrolled embeddings that are aggregated into a converted enrolled voiceprint of the second type of embedding. To verify an inbound speaker, a second embedding extractor generates an inbound voiceprint of the second type of embedding, and scoring layers determine a similarity between the inbound voiceprint and the converted enrolled voiceprint, both of which are the second type of embedding.
Information query
Patent Agency Ranking
0/0