摘要:
Systems, methods, devices, and other techniques for training and using a speaker verification neural network. A computing device may receive data that characterizes a first utterance. The computing device provides the data that characterizes the utterance to a speaker verification neural network. Subsequently, the computing device obtains, from the speaker verification neural network, a speaker representation that indicates speaking characteristics of a speaker of the first utterance. The computing device determines whether the first utterance is classified as an utterance of a registered user of the computing device. In response to determining that the first utterance is classified as an utterance of the registered user of the computing device, the device may perform an action for the registered user of the computing device.
摘要:
A method of speaker authentication comprises: receiving a speech signal; dividing the speech signal into segments; and, following each segment, obtaining an authentication score based on said segment and previously received segments, wherein the authentication score represents a probability that the speech signal comes from a specific registered speaker. In response to an authentication request, an authentication result is output based on the authentication score.
摘要:
A method, performed in an electronic device, for connecting to a target device is disclosed. The method includes capturing an image including a face of a target person associated with the target device and recognizing an indication of the target person. The indication of the target person may be a pointing object, a speech command, and/or any suitable input command. The face of the target person in the image is detected based on the indication and at least one facial feature of the face in the image is extracted. Based on the at least one facial feature, the electronic device is connected to the target device.
摘要:
A system and method of creating textual content from audio streams is present. The system can include a computing device configured to receive audio streams containing speech and identify the different speakers in the speech. The system breaks apart an audio stream into separate audio streams using speaker diarization and each audio stream is sent separately to a speech-to-text transcriber. Each audio stream includes only the speech of a single speaker, which is more easily converted into text by the speech-to-text transcriber. The text streams can be assembled into a transcript of the speech portions of the audio stream. A web page of the transcript can be published. High frequency words in the transcript can be tagged in the metadata of the web page to assist search engines and increase the value of the web page.
摘要:
A speaker recognition system for authenticating a mobile device user includes an enrollment and learning software module, a voice biometric authentication software module, and a secure software application. Upon request by a user of the mobile device, the enrollment and learning software module displays text prompts to the user, receives speech utterances from the user, and produces a voice biometric print. The enrollment and training software module determines when a voice biometric print has met at least a quality threshold before storing it on the mobile device. The secure software application prompts a user requiring authentication to repeat an utterance based at least on an attribute of a selected voice biometric print, receives a corresponding utterance, requests the voice biometric authentication software module to verify the identity of the second user using the utterance, and, if the user is authenticated, imports the voice biometric print.
摘要:
A method, client application and user terminal, the method comprising: providing a packet-based communication system for conducting voice or video calls over a packet-based network; and providing an instance of a client application enabling a first user terminal to access the packet-based communication system. The client application is configured so as when executed on the first terminal to receive an input from multiple different audio and/or video input transducers of the first terminal, to analyse those inputs in relation to one another, and based on that analysis to select at least one audio and/or video input transducer and/or output transducer of the first terminal for use in conducting a voice or video call with a remote user terminal via the packet-based communication system.