PHONEME-BASED TEXT TRANSCRIPTION SEARCHING
    Invention Publication

    Publication No.: US20230386472A1

    Publication Date: 2023-11-30

    Application No.: US17804508

    Filing Date: 2022-05-27

    Inventor: Yuchen LI

    Abstract: A computer-implemented method is disclosed. A search query of a text transcription is received. The search query includes a word or words having a specified spelling. A sequence of search phonemes corresponding to the specified spelling is generated. A sequence of transcript phonemes is generated from the text transcription. A search alignment in which the sequence of search phonemes is aligned to a transcript phoneme fragment is generated. Based at least on the search alignment having a quality score exceeding a quality score threshold, the transcript phoneme fragment and an associated portion of the text transcription are determined to result from an utterance of the specified spelling in an audio session corresponding to the text transcription. A search result indicating that the transcript phoneme fragment and the associated portion of the text transcription resulted from the utterance is output.
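The windowed alignment the abstract describes can be sketched in Python. This is purely illustrative: `difflib`'s `ratio()` stands in for the patent's unspecified quality score, ARPAbet-style phoneme strings are assumed, and all names are hypothetical.

```python
from difflib import SequenceMatcher

def find_phoneme_matches(search_phonemes, transcript_phonemes, threshold=0.8):
    """Slide the search phoneme sequence over the transcript phonemes and
    keep each window (fragment) whose alignment score clears the threshold."""
    n = len(search_phonemes)
    matches = []
    for start in range(len(transcript_phonemes) - n + 1):
        fragment = transcript_phonemes[start:start + n]
        # SequenceMatcher.ratio() is a stand-in for the quality score
        score = SequenceMatcher(None, search_phonemes, fragment).ratio()
        if score > threshold:
            matches.append((start, fragment, score))
    return matches
```

A real implementation would also score fragments slightly shorter or longer than the query, to absorb phoneme insertions and deletions in the transcript.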

    UNIFIED SPEECH REPRESENTATION LEARNING
    Invention Publication

    Publication No.: US20230368782A1

    Publication Date: 2023-11-16

    Application No.: US18217888

    Filing Date: 2023-07-03

    Abstract: Systems and methods are provided for training a machine learning model to learn speech representations. Labeled speech data, or both labeled and unlabeled data sets, are applied to a feature extractor of the machine learning model to generate latent speech representations. The latent speech representations are applied to a quantizer to generate quantized latent speech representations and to a transformer context network to generate contextual representations. Each contextual representation is aligned with a phoneme label to generate phonetically aware contextual representations, and the quantized latent representations are aligned with phoneme labels to generate phonetically aware latent speech representations. During these alignments, a randomly chosen subset of the contextual representations is replaced with quantized latent speech representations, and the phonetically aware latent speech representations are aligned to the contextual representations using supervised learning.
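The random-replacement step can be illustrated with a small NumPy sketch; shapes, names, and the replacement probability are assumptions, and the real representations would come from the feature extractor, quantizer, and transformer context network rather than toy arrays.

```python
import numpy as np

def mix_in_quantized(contextual, quantized, swap_prob=0.15, seed=0):
    """Randomly replace a subset of the contextual representations (rows,
    one per time step) with the quantized latent representations."""
    rng = np.random.default_rng(seed)
    mask = rng.random(contextual.shape[0]) < swap_prob  # True => replace row
    mixed = np.where(mask[:, None], quantized, contextual)
    return mixed, mask
```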

    Entity resolution using acoustic data

    Publication No.: US11817090B1

    Publication Date: 2023-11-14

    Application No.: US16712394

    Filing Date: 2019-12-12

    Abstract: A phonetic search system may pass phonetic information from an automatic speech recognition (ASR) system to a natural language understanding (NLU) system for the latter to leverage when performing entity resolution in the presence of ambiguous interpretations. The ASR system may include an acoustic model and a language model. The acoustic model can process audio data to generate hypotheses that can be mapped to acoustic data; i.e., one or more acoustic units such as phonemes. The language model can process the acoustic units to generate text data representing possible transcriptions of the audio data. ASR/NLU systems may have difficulty interpreting speech when confronted with, for example, homographs, which are words that are spelled the same, but have different meanings. When uncertainty in the final transcription is high, the system can leverage the acoustic data to improve the accuracy of entity resolution.
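A toy sketch of phonetically informed entity resolution for homographs follows; `difflib`'s `ratio()` stands in for whatever acoustic comparison the system actually performs, and the candidate schema is hypothetical.

```python
from difflib import SequenceMatcher

def resolve_entity(acoustic_phonemes, candidates):
    """Among ambiguous interpretations (e.g. homographs), pick the entity
    whose stored pronunciation best matches the acoustic phonemes that the
    ASR system passed along.

    candidates: mapping of entity id -> phoneme list (hypothetical schema).
    """
    return max(
        candidates,
        key=lambda eid: SequenceMatcher(None, acoustic_phonemes,
                                        candidates[eid]).ratio(),
    )
```

For example, the homograph "bass" resolves differently depending on whether the utterance sounded like the fish or the instrument.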

    Music cover identification with lyrics for search, compliance, and licensing

    Publication No.: US11816151B2

    Publication Date: 2023-11-14

    Application No.: US16875927

    Filing Date: 2020-05-15

    Inventor: Erling Wold

    Abstract: Embodiments identify an unidentified media content item as a cover of a known media content item using lyrical content. In an example, a processing device receives an unidentified media content item and determines lyrical content associated with it. The processing device then determines a lyrical similarity between that lyrical content and additional lyrical content associated with a known media content item from a plurality of known media content items. The processing device then identifies the unidentified media content item as a cover of the known media content item based at least in part on the lyrical similarity, resulting in an identified cover media content item.
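The lyrical-similarity comparison might be sketched as below. The patent does not specify a measure, so Jaccard similarity over word sets is used purely for illustration, and all names and the threshold are hypothetical.

```python
def lyric_similarity(lyrics_a, lyrics_b):
    """Jaccard similarity over lowercase word sets (a stand-in for the
    unspecified lyrical-similarity measure)."""
    a, b = set(lyrics_a.lower().split()), set(lyrics_b.lower().split())
    return len(a & b) / len(a | b) if (a or b) else 0.0

def identify_cover(unknown_lyrics, known_items, threshold=0.6):
    """Return the id of the known item whose lyrics are most similar to the
    unknown item's lyrics, or None if nothing clears the threshold."""
    best_id = max(known_items,
                  key=lambda k: lyric_similarity(unknown_lyrics, known_items[k]))
    if lyric_similarity(unknown_lyrics, known_items[best_id]) >= threshold:
        return best_id
    return None
```

A production system would likely normalize the lyrics (punctuation, repeated choruses) and weight rare words more heavily than a plain set intersection does.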

    End-to-End Streaming Keyword Spotting
    Invention Publication

    Publication No.: US20230298576A1

    Publication Date: 2023-09-21

    Application No.: US18322207

    Filing Date: 2023-05-23

    Applicant: Google LLC

    Abstract: A method for training hotword detection includes receiving a training input audio sequence including a sequence of input frames that define a hotword that initiates a wake-up process on a device. The method also includes feeding the training input audio sequence into an encoder and a decoder of a memorized neural network. Each of the encoder and the decoder of the memorized neural network include sequentially-stacked single value decomposition filter (SVDF) layers. The method further includes generating a logit at each of the encoder and the decoder based on the training input audio sequence. For each of the encoder and the decoder, the method includes smoothing each respective logit generated from the training input audio sequence, determining a max pooling loss from a probability distribution based on each respective logit, and optimizing the encoder and the decoder based on all max pooling losses associated with the training input audio sequence.
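As a rough sketch of one SVDF stage's inference path, assuming each layer projects input frames to scalars with a feature filter and then filters those scalars over a finite memory (the smoothing, max pooling loss, and encoder/decoder stacking from the claim are omitted; names hypothetical):

```python
import numpy as np

def svdf_layer(frames, feature_filter, time_filter):
    """Rank-1 SVDF stage: project each input frame to a scalar with the
    feature filter, then filter the scalar sequence over a memory whose
    size is the length of the time filter."""
    projected = frames @ feature_filter              # one scalar per frame
    # reversing the kernel turns np.convolve into a sliding correlation
    return np.convolve(projected, time_filter[::-1], mode="valid")
```

The rank-1 factorization is what makes the layer cheap enough to run in a streaming, always-on wake-up path.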

    Scalable Model Specialization Framework for Speech Model Personalization

    Publication No.: US20230298574A1

    Publication Date: 2023-09-21

    Application No.: US18184630

    Filing Date: 2023-03-15

    Applicant: Google LLC

    CPC classification number: G10L15/16 G10L15/063 G10L15/02 G10L2015/025

    Abstract: A method for speech conversion includes obtaining a speech conversion model configured to convert input utterances of human speech directly into corresponding output utterances of synthesized speech. The method further includes receiving a speech conversion request including input audio data corresponding to an utterance spoken by a target speaker associated with atypical speech and a speaker identifier uniquely identifying the target speaker. The method includes activating, using the speaker identifier, a particular sub-model for biasing the speech conversion model to recognize a type of the atypical speech associated with the target speaker identified by the speaker identifier. The method includes converting, using the speech conversion model biased by the activated particular sub-model, the input audio data corresponding to the utterance spoken by the target speaker associated with atypical speech into output audio data corresponding to a synthesized canonical fluent speech representation of the utterance spoken by the target speaker.
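The speaker-keyed sub-model activation might look like the following dispatch sketch; the interfaces are entirely hypothetical, and the real sub-models would be learned biasing parameters for the conversion network rather than plain values.

```python
class BiasedSpeechConverter:
    """Toy dispatcher: activate a per-speaker sub-model by speaker id and
    use it to bias the base conversion model (interfaces hypothetical)."""

    def __init__(self, base_convert, sub_models):
        self.base_convert = base_convert  # callable: (audio, bias) -> audio
        self.sub_models = sub_models      # speaker id -> biasing parameters

    def convert(self, audio, speaker_id):
        bias = self.sub_models.get(speaker_id)
        if bias is None:
            raise KeyError(f"no sub-model registered for speaker {speaker_id!r}")
        return self.base_convert(audio, bias)
```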
