-
Publication No.: US20240194206A1
Publication date: 2024-06-13
Application No.: US18438225
Application date: 2024-02-09
Inventors: Oleksandra SOKOL, Dmytro PROGONOV, Heorhii NAUMENKO, Kostiantyn VOLOBUIEV, Vasyl KUZNETSOV, Viacheslav DERKACH
Abstract: An electronic device includes a microphone, and at least one processor configured to, based on receiving voice data through the microphone, input the voice data into a non-semantic feature extractor model and acquire a non-semantic feature included in the voice data using the non-semantic feature extractor model, input the non-semantic feature into a synthetic voice classifier model and classify the voice data as a synthetic voice or a user voice using the synthetic voice classifier model, and provide a result of the classification, and the synthetic voice classifier model is a model that is transfer-learned based on the non-semantic feature extractor model.
-
Publication No.: US12008996B2
Publication date: 2024-06-11
Application No.: US17717326
Application date: 2022-04-11
CPC classification: G10L17/00, G10L17/24, G10L17/26, H04M1/271, H04M3/533, H04M2203/6027, H04M2250/74
Abstract: A system, method and computer-readable storage device are disclosed for signing a voicemail and confirming an identity of the speaker. A method includes receiving a request to verify a speaker associated with a communication to a recipient, receiving first data from the speaker in connection with the communication, accessing second data associated with the speaker to verify the speaker, determining whether a match exists between the first data and the second data to yield a determination, retrieving a communication address of the recipient, generating a notification for the recipient, wherein the notification reports on the determination, and transmitting the notification to the recipient at the communication address.
-
Publication No.: US20240187735A1
Publication date: 2024-06-06
Application No.: US18488444
Application date: 2023-10-17
Inventors: Yan LONG, Kevin FU, Kevin BUTLER, Sara RAMPAZZI, Pirouz NAGHAVI
IPC classification: H04N23/68, G10L17/04, G10L17/26, H04N23/611
CPC classification: H04N23/687, G10L17/04, G10L17/26, H04N23/611, H04N23/6812, H04N23/689
Abstract: Rolling shutter and movable lens structures widely found in smartphone cameras modulate structure-borne sounds onto camera images, creating a point-of-view optical-acoustic side channel for acoustic eavesdropping. The movement of smartphone camera hardware leaks acoustic information because images unwittingly modulate ambient sound as imperceptible distortions. Experiments have found that the side channel is further amplified by intrinsic behaviors of complementary metal-oxide-semiconductor (CMOS) rolling shutters and movable lenses such as in optical image stabilization (OIS) and auto focus (AF). This disclosure characterizes the limits of acoustic information leakage caused by structure-borne sound that perturbs the point-of-view of smartphone cameras. In contrast with traditional optical-acoustic eavesdropping on vibrating objects, this side channel requires no line of sight and no object within the camera's field of view.
-
Publication No.: US11944437B2
Publication date: 2024-04-02
Application No.: US17965536
Application date: 2022-10-13
CPC classification: A61B5/165, A61B5/0022, A61B5/4803, A61B5/6887, G10L15/1822, G10L17/26, G16H40/67, A61B2503/12, G10L25/63, G10L25/90
Abstract: According to some aspects, disclosed methods and systems may include having a user input one or more speech commands into an input device of a user device. The user device may communicate with one or more components or devices at a local office or headend. The local office or the user device may transcribe the speech commands into language transcriptions. The local office or the user device may determine a mood for the user based on whether any of the speech commands may have been repeated. The local office or the user device may determine, based on the mood of the user, which content asset or content service to make available to the user device.
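Illustration (not part of the patent record): the flow in the abstract above, from transcribed commands to a repetition check to a mood to a content choice, might be sketched as follows. The mood labels, the repetition threshold, the content names, and every function name here are hypothetical; the abstract does not specify any of them.

```python
from collections import Counter
from typing import List

# Hypothetical mapping from inferred mood to a content asset or service.
CONTENT_BY_MOOD = {
    "frustrated": "help_tutorial",
    "neutral": "standard_recommendations",
}


def normalize(transcription: str) -> str:
    """Lower-case and collapse whitespace so repeated commands compare equal."""
    return " ".join(transcription.lower().split())


def infer_mood(transcriptions: List[str], repeat_threshold: int = 2) -> str:
    """Return "frustrated" if any command occurs at least repeat_threshold times.

    The threshold and the two mood labels are assumptions made for this sketch.
    """
    counts = Counter(normalize(t) for t in transcriptions)
    most_repeated = max(counts.values(), default=0)
    return "frustrated" if most_repeated >= repeat_threshold else "neutral"


def select_content(transcriptions: List[str]) -> str:
    """Pick the content to make available based on the inferred mood."""
    return CONTENT_BY_MOOD[infer_mood(transcriptions)]
```

For example, `select_content(["Play news", "play  news", "pause"])` treats the first two commands as a repetition and selects the hypothetical `help_tutorial` asset.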
-
Publication No.: US20240086856A1
Publication date: 2024-03-14
Application No.: US15978622
Application date: 2018-05-14
Inventors: John C. Brenner, Donna Marie Greene, Carter C. Hansen, Tambra Nichols, Elizabeth Long Swindler, Keith Meade Sykes
CPC classification: G06Q10/1093, G06F3/011, G10L17/26, G10L25/63, G06F2203/011
Abstract: A computer-implemented system and method are provided for adaptively controlling communication activity of a communication system. The method stores user information comprising contact preferences, a goal, and account information. The system may receive external information by a trigger engine which applies a trigger rule to determine a degree of relatedness between the goal and at least one of external information and the account information. When the degree of relatedness exceeds a threshold, then the contact preferences may be used to format a communication related to the external information based on the contact preferences. The system may then send the communication to the user by the communication entity according to the contact preferences, and, based on a feedback or lack of feedback from the user related to the communication, adjust at least one of the contact preferences, the trigger rule, or a weighting factor of the trigger rule.
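Illustration (not part of the patent record): a minimal sketch of the threshold-and-feedback loop described in the abstract above. The keyword-overlap relatedness measure, the default threshold, the adjustment step, and all names are assumptions chosen for illustration; the abstract does not say how relatedness is computed or how the weighting factor is adjusted.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass
class TriggerRule:
    """Hypothetical trigger rule that scores external information against a goal."""
    goal_keywords: FrozenSet[str]
    threshold: float = 0.5
    weight: float = 1.0  # weighting factor, adjusted from user feedback

    def relatedness(self, external_info: str) -> float:
        """Weighted fraction of goal keywords that appear in the external info."""
        if not self.goal_keywords:
            return 0.0
        words = set(external_info.lower().split())
        overlap = len(words & self.goal_keywords) / len(self.goal_keywords)
        return self.weight * overlap

    def fires(self, external_info: str) -> bool:
        """True when the degree of relatedness exceeds the threshold."""
        return self.relatedness(external_info) > self.threshold


def adjust_weight(rule: TriggerRule, got_feedback: bool, step: float = 0.1) -> None:
    """Raise the weight after feedback; lower it (not below zero) after none."""
    if got_feedback:
        rule.weight += step
    else:
        rule.weight = max(0.0, rule.weight - step)
```

In this sketch a rule built with `TriggerRule(frozenset({"mortgage", "rate"}))` fires on "mortgage rate drops today", and repeated lack of feedback lowers the weight until the same message no longer clears the threshold.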
-
Publication No.: US11922953B2
Publication date: 2024-03-05
Application No.: US17414194
Application date: 2018-12-18
Inventors: Hideo Omura
Abstract: A voice analyzer analyzes whether a voice signal input into a voice input unit includes a specific characteristic component. A voice recognizer recognizes a voice represented by the voice signal input into the voice input unit. A response instruction unit instructs a response to a response operation unit that operates in response to the voice recognized by the voice recognizer. A controller controls the voice recognizer not to execute voice recognition processing by the voice recognizer or controls the response instruction unit not to instruct the response operation unit about an instruction content by the voice recognized by the voice recognizer, when the voice analyzer analyzes that the voice signal includes the specific characteristic component.
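Illustration (not part of the patent record): the controller's gating behavior in the abstract above can be sketched as a function that skips recognition whenever the analyzer reports the characteristic component. This shows only the first of the two alternatives the abstract names (suppressing recognition rather than suppressing the response instruction). The function name and the callable parameters are hypothetical stand-ins for the voice analyzer, voice recognizer, and response instruction unit.

```python
from typing import Callable, Optional, Sequence

Signal = Sequence[float]


def handle_voice(
    signal: Signal,
    has_characteristic: Callable[[Signal], bool],  # stand-in for the voice analyzer
    recognize: Callable[[Signal], str],            # stand-in for the voice recognizer
    respond: Callable[[str], None],                # stand-in for the response instruction unit
) -> Optional[str]:
    """Run recognition and respond only if the signal lacks the characteristic component.

    Returns the recognized text, or None when recognition was suppressed.
    """
    if has_characteristic(signal):
        # Controller path: do not execute voice recognition at all.
        return None
    text = recognize(signal)
    respond(text)
    return text
```

A caller would supply its own analyzer and recognizer; what the "specific characteristic component" is, and how it is detected, is left open here as it is in the abstract.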
-
Publication No.: US11922356B1
Publication date: 2024-03-05
Application No.: US16667366
Application date: 2019-10-29
Applicant: Snap Inc.
IPC classification: G06Q10/0639, G06V40/16, G10L25/63, H04N21/4402, G10L15/22, G10L17/26, H04N7/15, H04N21/442, H04N21/4788
CPC classification: G06Q10/06395, G06Q10/06393, G06V40/174, H04N21/440218, G10L2015/227, G10L17/26, G10L25/63, H04N7/15, H04N21/44218, H04N21/4788
Abstract: Methods and systems for videoconferencing include generating work quality metrics based on emotion recognition of an individual such as a call center agent. The work quality metrics allow for workforce optimization. One example method includes the steps of receiving a video including a sequence of images, detecting an individual in one or more of the images, locating feature reference points of the individual, aligning a virtual face mesh to the individual in one or more of the images based at least in part on the feature reference points, dynamically determining over the sequence of images at least one deformation of the virtual face mesh, determining that the at least one deformation refers to at least one facial emotion selected from a plurality of reference facial emotions, and generating quality metrics including at least one work quality parameter associated with the individual based on the at least one facial emotion.
-
Publication No.: US20240029743A1
Publication date: 2024-01-25
Application No.: US18206231
Application date: 2023-06-06
Inventors: Stanislaw Ignacy Pasko, Pawel Zelazko, Cagdas Bak, Eli Joshua Fidler, Michal Kowalczuk, Andrew Oberlin, Ariya Rastrow
IPC classification: G10L17/26, G10L15/183, G10L15/34, G10L15/22
CPC classification: G10L17/26, G10L15/183, G10L15/34, G10L15/22
Abstract: Some speech processing systems may handle some commands on-device rather than sending the audio data to a second device or system for processing. The first device may have limited speech processing capabilities sufficient for handling common language and/or commands, while the second device (e.g., an edge device and/or a remote system) may call on additional language models, entity libraries, skill components, etc. to perform additional tasks. An intermediate data generator may facilitate dividing speech processing operations between devices by generating a stream of data that includes a first-pass ASR output (e.g., a word or sub-word lattice) and other characteristics of the audio data such as whisper detection, speaker identification, media signatures, etc. The second device can perform the additional processing using the data stream; e.g., without using the audio data. Thus, privacy may be enhanced by processing the audio data locally without sending it to other devices/systems.
-
Publication No.: US11875777B2
Publication date: 2024-01-16
Application No.: US17698601
Application date: 2022-03-18
Applicant: YAMAHA CORPORATION
Inventors: Ryunosuke Daido
IPC classification: G10L21/00, G10L13/00, G10L13/06, G10L17/26, G10L25/48, G10L13/033, G10L13/047, G10L25/18
CPC classification: G10L13/0335, G10L13/047, G10L25/18
Abstract: An information processing device includes a memory storing instructions, and a processor configured to implement the stored instructions to execute a plurality of tasks. The tasks include: a first generating task that generates a series of fluctuations of a target sound based on first control data of the target sound to be synthesized, using a first model trained to have an ability to estimate a series of fluctuations of the target sound based on first control data of the target sound, and a second generating task that generates a series of features of the target sound based on second control data of the target sound and the generated series of fluctuations of the target sound, using a second model trained to estimate a series of features of the target sound based on second control data of the target sound and a series of fluctuations of the target sound.
-
Publication No.: US20230410222A1
Publication date: 2023-12-21
Application No.: US18240209
Application date: 2023-08-30
Applicant: NEC Corporation
Inventors: Masahiro Tani, Kazufumi KOJIMA
IPC classification: G06Q50/00, G06F16/9536, G10L17/26, G06V10/75, G06F18/22
CPC classification: G06Q50/01, G06F16/9536, G10L17/26, G06V10/758, G06F18/22
Abstract: An information processing apparatus (2000) determines whether a content (30-1) of a relevant account (20-1) associated with a target account (10-1) and a content (30-2) of a relevant account (20-2) associated with a target account (10-2) are similar. When the content (30-1) and the content (30-2) are similar, the information processing apparatus (2000) executes predetermined processing related to the target account (10-1) and the target account (10-2).
-