Abstract:
An electronic apparatus according to an embodiment of the present disclosure includes a processor configured to, by executing one or more instructions stored in a memory, control a receiver to receive a voice signal, determine whether the received voice signal includes voice signals of a plurality of different speakers, and, based on the received voice signal including voice signals of a plurality of different speakers, detect feature information from the voice signal of each speaker, determine a relationship between the utterances of the different speakers based on the detected feature information, determine a response method based on the determined relationship between the utterances, and control the electronic apparatus to perform an operation according to the determined response method.
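The abstract above describes relating two speakers' utterances and choosing a response method from that relation. A minimal sketch of that decision, using an assumed word-overlap heuristic as the relation test (the patent does not specify one):

```python
# Illustrative sketch only: the relation test and response labels are
# assumptions, not the disclosed method.
STOPWORDS = {"the", "a", "to", "some", "please"}

def utterances_related(a: str, b: str) -> bool:
    # Toy relation test: shared content words suggest the two speakers'
    # utterances refer to the same request.
    return bool(set(a.lower().split()) & set(b.lower().split()) - STOPWORDS)

def choose_response(a: str, b: str) -> str:
    # Related utterances are handled as one combined request;
    # unrelated utterances are answered per speaker.
    return "combined" if utterances_related(a, b) else "per-speaker"

assert choose_response("play some jazz", "louder jazz please") == "combined"
assert choose_response("play some jazz", "what's the weather") == "per-speaker"
```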
Abstract:
An electronic apparatus is provided. The electronic apparatus includes a communicator comprising communication circuitry configured to communicate with a voice recognition server; and a processor configured to control the communicator to establish a session with the voice recognition server, based on a voice input start signal being received from a first external apparatus, to maintain the established session based on the voice input start signal being received from a second external apparatus in a state where the session is established, and to process voice recognition on audio data received from the second external apparatus using the maintained session.
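The session rule in this abstract (establish on the first device's start signal, maintain and reuse for a second device) can be sketched as a small state machine. Class and method names here are assumptions for illustration, not the patent's API:

```python
# Hedged sketch: establish a session with the recognition server on the first
# voice-input start signal, and reuse the same session when another device
# signals while the session is still open.
class VoiceSessionManager:
    def __init__(self):
        self.session = None

    def on_voice_input_start(self, device_id: str) -> str:
        if self.session is None:
            # No open session: establish one (server handshake elided).
            self.session = f"session-for-{device_id}"
        # An existing session is maintained, not replaced.
        return self.session

    def process_audio(self, device_id: str, audio: bytes) -> str:
        session = self.on_voice_input_start(device_id)
        return f"recognized via {session}"   # recognition uses the open session

mgr = VoiceSessionManager()
first = mgr.on_voice_input_start("tv-remote")
second = mgr.on_voice_input_start("smart-speaker")
assert first == second == "session-for-tv-remote"  # second device reuses it
```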
Abstract:
Systems, methods, devices, and other techniques for training and using a speaker verification neural network. A computing device may receive data that characterizes a first utterance. The computing device provides the data that characterizes the utterance to a speaker verification neural network. Subsequently, the computing device obtains, from the speaker verification neural network, a speaker representation that indicates speaking characteristics of a speaker of the first utterance. The computing device determines whether the first utterance is classified as an utterance of a registered user of the computing device. In response to determining that the first utterance is classified as an utterance of the registered user of the computing device, the device may perform an action for the registered user of the computing device.
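The verification step above, obtaining a speaker representation and classifying the utterance against a registered user, can be sketched as follows. The "network" is reduced to a fixed random projection and the threshold is an assumption; this only illustrates the embed-then-compare pattern:

```python
import numpy as np

# Minimal sketch: a trained speaker verification network is stood in for by a
# fixed linear projection. Dimensions and the threshold are assumptions.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))  # stand-in for trained network weights

def speaker_representation(features: np.ndarray) -> np.ndarray:
    """Map utterance features (8-dim) to a unit-norm speaker embedding."""
    emb = features @ W
    return emb / np.linalg.norm(emb)

def is_registered(utterance: np.ndarray, enrolled: np.ndarray,
                  threshold: float = 0.7) -> bool:
    """Classify as the registered user if the cosine similarity between the
    utterance embedding and the enrolled embedding exceeds the threshold."""
    return float(speaker_representation(utterance) @ enrolled) >= threshold

enrolled_features = rng.standard_normal(8)
enrolled = speaker_representation(enrolled_features)
# A lightly perturbed re-utterance of the enrollment should still verify.
assert is_registered(enrolled_features + 0.01 * rng.standard_normal(8), enrolled)
```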
Abstract:
Disclosed herein are embodiments of systems and methods for zero-knowledge multiparty secure sharing of voiceprints. In an embodiment, an illustrative computer may receive, through a remote server, a plurality of encrypted voiceprints. When the computer receives an incoming call, the computer may generate a plaintext i-vector of the incoming call. Using the plaintext i-vector and the encrypted voiceprints, the computer may generate one or more encrypted comparison models. The remote server may decrypt the encrypted comparison models to generate similarity scores between the plaintext i-vector and the plurality of encrypted voiceprints.
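The message flow in this abstract (server encrypts voiceprints, client builds encrypted comparison models from its plaintext i-vector, server decrypts them into scores) can be sketched as below. A simple additive mask stands in for real homomorphic encryption; this shows only who computes what and provides none of the actual security guarantees:

```python
import numpy as np

# Toy sketch of the protocol flow only -- NOT real cryptography.
# All function names and the masking scheme are illustrative assumptions.
rng = np.random.default_rng(1)
DIM = 4

def encrypt(voiceprint, mask):
    # Server-side: mask a voiceprint i-vector before sharing it.
    return voiceprint + mask

def comparison_model(plaintext_ivector, encrypted_voiceprint):
    # Client-side: combine the incoming call's plaintext i-vector with an
    # encrypted voiceprint; the result is still masked by the server's secret.
    return encrypted_voiceprint - plaintext_ivector

def decrypt_score(model, mask):
    # Server-side: remove the mask and score by negative Euclidean distance.
    return -float(np.linalg.norm(model - mask))

mask = rng.standard_normal(DIM)                      # server's secret
voiceprints = [rng.standard_normal(DIM) for _ in range(3)]
encrypted = [encrypt(v, mask) for v in voiceprints]  # shared with the client

incoming = voiceprints[2] + 0.01 * rng.standard_normal(DIM)  # noisy same caller
scores = [decrypt_score(comparison_model(incoming, e), mask) for e in encrypted]
assert max(range(3), key=lambda i: scores[i]) == 2   # best match: voiceprint 2
```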
Abstract:
A method of speaker authentication comprises: receiving a speech signal; dividing the speech signal into segments; and, following each segment, obtaining an authentication score based on said segment and previously received segments, wherein the authentication score represents a probability that the speech signal comes from a specific registered speaker. In response to an authentication request, an authentication result is output based on the authentication score.
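The per-segment scoring above can be sketched as a running score that is refined after each segment. The combination rule (summing per-segment log-likelihood ratios and mapping through a sigmoid to a probability) is an assumed model for illustration, not taken from the abstract:

```python
import math

# Hedged sketch: each segment contributes a log-likelihood ratio (LLR) for
# "registered speaker" vs "impostor"; the authentication score after each
# segment is the sigmoid of the accumulated LLR over all segments so far.
def authentication_scores(segment_llrs):
    total = 0.0
    for llr in segment_llrs:
        total += llr                           # this segment + previous ones
        yield 1.0 / (1.0 + math.exp(-total))   # probability-like score

scores = list(authentication_scores([0.8, 1.1, -0.2, 0.9]))
# A score is available after every segment; on an authentication request the
# most recent score is used to produce the result.
assert all(0.0 < s < 1.0 for s in scores)
assert scores[-1] > 0.9   # total LLR = 2.6
```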
Abstract:
The present disclosure relates to methods for facilitating a transaction. The methods involve interaction with a humanoid robot and include using the humanoid robot to receive a session initiation instruction from a user. During the session, one or more articles are identified for purchase, each article being a product or service. A checkout sequence is then initiated to purchase the one or more articles. Before, during, or after performance of the above steps, the user is authenticated via interaction with the humanoid robot. [FIG. 1]
Abstract:
Systems and methods of providing text related to utterances, and gathering voice data in response to the text, are provided herein. In various implementations, an identification token that identifies a first file for a voice data collection campaign, and a second file for a session script, may be received from a natural language processing training device. The first file and the second file may be used to configure the mobile application to display a sequence of screens, each of the sequence of screens containing text of at least one utterance specified in the voice data collection campaign. Voice data may be received from the natural language processing training device in response to user interaction with the text of the at least one utterance. The voice data and the text may be stored in a transcription library.
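The flow in this abstract, a campaign file listing utterances, a session script ordering the screens, and recordings stored with their prompt text, can be sketched as below. All file shapes and names are assumptions for illustration:

```python
# Hedged sketch: the campaign file supplies utterance text, the session script
# gives the display order, and each recording lands in a transcription library
# alongside its prompt. These structures are illustrative, not the patent's.
campaign = {"utterances": {"u1": "turn on the lights",
                           "u2": "what time is it"}}
session_script = ["u2", "u1"]          # order of the sequence of screens

transcription_library = []

def record_screen(utterance_id: str, voice_data: bytes) -> str:
    # Store the received voice data together with the displayed text.
    text = campaign["utterances"][utterance_id]
    transcription_library.append({"text": text, "audio": voice_data})
    return text

screens = [campaign["utterances"][u] for u in session_script]
assert screens == ["what time is it", "turn on the lights"]
record_screen("u1", b"\x00\x01")
assert transcription_library[0]["text"] == "turn on the lights"
```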
Abstract:
In some implementations, user input is received while a form that includes text entry fields is being accessed. In one aspect, a process may include mapping user input to fields of a form and populating the fields of the form with the appropriate information. This process may allow a user to fill out a form using speech input, by generating a transcription of input speech, determining a field that best corresponds to each portion of the speech, and populating each field with the appropriate information. In some examples, the processes described herein may reduce the load on user input components, may reduce overall power consumption and may reduce a cognitive burden on the user.
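The mapping step above, determining which field each portion of the transcription corresponds to and populating it, can be sketched with a simple keyword heuristic. The field names and the parsing rule are illustrative assumptions, not the disclosed matching method:

```python
import re

# Hedged sketch: split a transcription into "<field hint> <value>" portions
# and populate the matching form field with the spoken value.
FIELDS = {"name": None, "email": None, "phone": None}

def fill_form(transcription: str) -> dict:
    fields = dict(FIELDS)
    # Each portion runs from a field keyword up to the next keyword or the end.
    pattern = r"(name|email|phone)\s+(.*?)(?=\s+(?:name|email|phone)\b|$)"
    for field, value in re.findall(pattern, transcription, flags=re.I):
        fields[field.lower()] = value.strip()
    return fields

form = fill_form("name John Smith phone 555 0100")
assert form["name"] == "John Smith"
assert form["phone"] == "555 0100"
assert form["email"] is None
```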