AUTOMATIC GENERATION AND/OR USE OF TEXT-DEPENDENT SPEAKER VERIFICATION FEATURES

    Publication No.: EP4407438A2

    Publication Date: 2024-07-31

    Application No.: EP24182951.4

    Application Date: 2020-12-15

    Applicant: Google LLC

    IPC Classification: G06F3/16

    Abstract: Implementations relate to automatic generation of speaker features for each of one or more particular text-dependent speaker verifications (TD-SVs) for a user. Implementations can generate speaker features for a particular TD-SV using instances of audio data that each capture a corresponding spoken utterance of the user during normal non-enrollment interactions with an automated assistant via one or more respective assistant devices. For example, a portion of an instance of audio data can be used in response to: (a) determining that recognized term(s) for the spoken utterance captured by the portion correspond to the particular TD-SV; and (b) determining that an authentication measure, for the user and for the spoken utterance, satisfies a threshold. Implementations additionally or alternatively relate to utilization of speaker features, for each of one or more particular TD-SVs for a user, in determining whether to authenticate a spoken utterance for the user.
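
    Illustrative sketch (not part of the publication): the two conditions in the abstract, (a) the recognized terms correspond to the particular TD-SV and (b) the authentication measure satisfies a threshold, can be pictured as a simple gate in front of a feature store. The names `TdSvProfile` and `maybe_add_features`, the substring match, the 0.8 threshold, and the precomputed `embedding` argument are all assumptions made for illustration; the abstract does not specify them.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TdSvProfile:
    """Hypothetical store of speaker features for one text-dependent phrase."""
    phrase: str
    features: list[list[float]] = field(default_factory=list)


def maybe_add_features(
    profile: TdSvProfile,
    recognized_text: str,
    auth_measure: float,
    embedding: list[float],
    threshold: float = 0.8,  # assumed value, not taken from the abstract
) -> bool:
    """Keep the embedding only when both conditions (a) and (b) hold."""
    # (a) the recognized terms correspond to this TD-SV phrase
    matches_phrase = profile.phrase.lower() in recognized_text.lower()
    # (b) the authentication measure for the user satisfies the threshold
    authenticated = auth_measure >= threshold
    if matches_phrase and authenticated:
        profile.features.append(embedding)
        return True
    return False
```

    In this sketch an utterance that fails either condition leaves the stored features unchanged, so only utterances already attributed to the user with sufficient confidence contribute to the TD-SV.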

    SMART HOME APPLIANCES, OPERATING METHOD OF THEREOF, AND VOICE RECOGNITION SYSTEM USING THE SMART HOME APPLIANCES

    Publication No.: EP4387174A3

    Publication Date: 2024-07-17

    Application No.: EP24168136.0

    Application Date: 2014-11-04

    Abstract: Provided is a smart home appliance. The smart home appliance includes: a voice input unit collecting a voice; a voice recognition unit recognizing text corresponding to the voice collected through the voice input unit; a capturing unit collecting an image for detecting a user's visage; a memory unit mapping the text recognized by the voice recognition unit to a setting function, storing the mapped information, and storing keyword information that a user may input to start a voice recognition service; a control unit determining whether to perform a voice recognition service on the basis of at least one of image information collected by the capturing unit and voice information collected by the voice input unit; a region recognition unit determining a user's region on the basis of information on the voice collected through the voice input unit; and an output unit outputting region-customized information on the basis of information on the region determined by the region recognition unit and information on the setting function, wherein the control unit comprises a face detection unit recognizing that a user is in a staring state for voice input when image information on the user's visage is collected for more than a set time through the capturing unit, and wherein the control unit determines that a voice recognition service standby state is entered when the keyword information is recognized in a voice collected through the voice input unit and the face detection unit recognizes that the user is in the staring state.
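
    Illustrative sketch (not part of the publication): the standby condition in the abstract combines two signals, a keyword heard in the voice input and a staring state detected from the image input. The function names, the substring keyword match, and the 2-second default are assumptions for illustration only.

```python
def is_staring(face_visible_seconds: float, set_time_seconds: float = 2.0) -> bool:
    """Staring state: the user's face has been captured for more than the set time."""
    return face_visible_seconds > set_time_seconds


def enters_standby(
    recognized_text: str,
    keyword: str,
    face_visible_seconds: float,
    set_time_seconds: float = 2.0,  # assumed value; the abstract gives no number
) -> bool:
    """Standby is entered only when the keyword is heard AND the user is staring."""
    keyword_heard = keyword.lower() in recognized_text.lower()
    return keyword_heard and is_staring(face_visible_seconds, set_time_seconds)
```

    Requiring both signals is what distinguishes this condition from a keyword-only wake word: in the sketch, speech containing the keyword is ignored unless the face has also been in view longer than the set time.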

    TRANSIENT PERSONALIZATION MODE FOR GUEST USERS OF AN AUTOMATED ASSISTANT

    Publication No.: EP4443426A2

    Publication Date: 2024-10-09

    Application No.: EP24195747.1

    Application Date: 2020-12-14

    Applicant: Google LLC

    IPC Classification: G10L15/22

    Abstract: Implementations set forth herein relate to an automated assistant that can operate in a transient personalization mode, and/or assist a separate automated assistant with providing output according to a transient personalization mode. The transient personalization mode can allow a guest user of an assistant-enabled device to receive personalized responses from the assistant-enabled device, despite not being signed into the assistant-enabled device. A host automated assistant of the assistant-enabled device can securely communicate with a guest user's automated assistant through a backend process. In this way, input queries from the guest user to the host automated assistant can be personalized according to the guest automated assistant, without the guest user directly engaging with their own personal device.

    ELECTRONIC DEVICE AND METHOD FOR CONTROLLING THE SAME

    Publication No.: EP4418263A2

    Publication Date: 2024-08-21

    Application No.: EP24187314.0

    Application Date: 2019-10-23

    IPC Classification: G10L15/183

    Abstract: An electronic device is provided. The electronic device includes a communicator, a memory configured to include at least one instruction, and a processor configured to execute the at least one instruction, wherein the processor is configured to: receive a first audio signal of a user speech through the communicator from a first external device; control the communicator to transmit a control signal, to a second external device, for receiving a second audio signal of the user speech from the second external device located in a movement direction of a user, based on a movement of the user being detected by using the received first audio signal; receive the second audio signal through the communicator from the second external device; align the received first audio signal and the received second audio signal so that a first time at which the first audio signal is received and a second time at which the second audio signal is received correspond to each other; match the received first audio signal and the received second audio signal by comparing the aligned first audio signal and the aligned second audio signal; and obtain a speech recognition result based on the matching.
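
    Illustrative sketch (not part of the publication): one simple way to picture the align-then-compare steps is to trim both signals to the time window they share, so that sample i in each refers to the same instant, and then score the overlap. The `AudioSignal` type, the receive-time bookkeeping, and the use of cosine similarity as the comparison are all assumptions; the abstract does not say how alignment or matching is computed.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioSignal:
    """Hypothetical container: receive time of the first sample plus raw samples."""
    start_time: float  # seconds
    sample_rate: int   # samples per second
    samples: list


def align(first: AudioSignal, second: AudioSignal):
    """Trim both signals to their overlapping time window."""
    if first.sample_rate != second.sample_rate:
        raise ValueError("sketch assumes equal sample rates")
    rate = first.sample_rate
    start = max(first.start_time, second.start_time)
    end = min(
        first.start_time + len(first.samples) / rate,
        second.start_time + len(second.samples) / rate,
    )
    count = max(0, round((end - start) * rate))
    offset_first = round((start - first.start_time) * rate)
    offset_second = round((start - second.start_time) * rate)
    return (
        first.samples[offset_first:offset_first + count],
        second.samples[offset_second:offset_second + count],
    )


def similarity(a, b) -> float:
    """Cosine similarity of two aligned sample lists (assumed comparison metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

    A high similarity over the overlap would suggest that both devices captured the same stretch of speech, which is the precondition for stitching the two recordings into one input for recognition.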

    CONTEXTUAL DENORMALIZATION FOR AUTOMATIC SPEECH RECOGNITION

    Publication No.: EP4375871A2

    Publication Date: 2024-05-29

    Application No.: EP24170370.1

    Application Date: 2019-09-03

    Applicant: GOOGLE LLC

    IPC Classification: G06F40/35

    Abstract: A method (600) includes receiving a speech input (104) from a user and obtaining context metadata (110) associated with the speech input. The method also includes generating a raw speech recognition result (312) corresponding to the speech input and selecting a list of one or more denormalizers (352) to apply to the generated raw speech recognition result based on the context metadata associated with the speech input. The generated raw speech recognition result includes normalized text. The method also includes denormalizing the generated raw speech recognition result into denormalized text (322) by applying the list of the one or more denormalizers in sequence to the generated raw speech recognition result.
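
    Illustrative sketch (not part of the publication): the method amounts to choosing an ordered list of text transforms from the context metadata and applying them one after another to the normalized recognition result. The three denormalizers, the `application` metadata key, and the lookup table below are hypothetical examples; the abstract does not name specific denormalizers or metadata fields.

```python
DIGITS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def spelled_digits_to_numerals(text: str) -> str:
    """Replace spelled-out single digits with numerals."""
    return " ".join(DIGITS.get(word, word) for word in text.split())


def capitalize_first(text: str) -> str:
    """Upper-case the first character, leaving the rest untouched."""
    return text[:1].upper() + text[1:]


def add_final_period(text: str) -> str:
    """Append a period unless one is already present."""
    return text if text.endswith(".") else text + "."


# Hypothetical mapping from context metadata to an ordered denormalizer list.
DENORMALIZERS_BY_APPLICATION = {
    "dictation": [spelled_digits_to_numerals, capitalize_first, add_final_period],
    "search": [spelled_digits_to_numerals],
}


def select_denormalizers(context: dict) -> list:
    """Pick the denormalizer list based on the context metadata."""
    return DENORMALIZERS_BY_APPLICATION.get(context.get("application"), [])


def denormalize(raw_result: str, context: dict) -> str:
    """Apply the selected denormalizers in sequence to the raw result."""
    text = raw_result
    for denormalizer in select_denormalizers(context):
        text = denormalizer(text)
    return text
```

    With this table, the same raw result "set a timer for five minutes" would be rendered with a capital letter and a final period in a dictation context but left as a bare query string in a search context.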