-
Publication No.: US11736769B2
Publication date: 2023-08-22
Application No.: US17228438
Filing date: 2021-04-12
Applicant: SoundHound, Inc.
Inventor: Thor S. Khov, Terry Kong
IPC: H04N21/454, H04N21/44, H04N21/466, G06V20/40, G06N3/045
CPC classification number: H04N21/4542, G06N3/045, G06V20/46, H04N21/44008, H04N21/4665
Abstract: Various approaches relate to user-defined filtering, in media playing devices, of undesirable content represented in stored and real-time content from content providers. For example, video, image, and/or audio data can be analyzed to identify and classify content included in the data using various classification models and object and text recognition approaches. Thereafter, the identification and classification can be used to control presentation of and/or access to the content and/or portions of the content. For example, based on the classification, portions of the content can be modified (e.g., replaced, removed, degraded, etc.) using one or more techniques (e.g., media replacement, media removal, media degradation, etc.) and then presented.
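The classify-then-modify flow described in this abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `Segment` type, the `apply_user_filter` function, the label strings, and the idea that a classifier has already attached category labels to each time span.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A time span of media with labels from a hypothetical classifier."""
    start: float          # seconds
    end: float            # seconds
    labels: frozenset     # category names, e.g. frozenset({"violence"})


def apply_user_filter(segments, blocked, action="remove"):
    """Return a playlist of (treatment, start, end) tuples.

    Segments whose labels intersect the user-defined `blocked` set are
    dropped ("remove") or marked for a stand-in treatment ("replace" or
    "degrade"); all other segments pass through as "original".
    """
    if action not in {"remove", "replace", "degrade"}:
        raise ValueError(f"unknown action: {action}")
    playlist = []
    for seg in segments:
        if not (seg.labels & blocked):
            playlist.append(("original", seg.start, seg.end))
        elif action == "remove":
            continue  # skip the flagged span entirely
        else:
            playlist.append((action, seg.start, seg.end))
    return playlist
```

A real player would act on the treatment tag, for example by blurring video or muting audio for a "degrade" span; this sketch only records the decision.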
-
Publication No.: US11295732B2
Publication date: 2022-04-05
Application No.: US16529730
Filing date: 2019-08-01
Applicant: SoundHound, Inc.
Inventor: Steffen Holm, Terry Kong, Kiran Garaga Lokeswarappa
IPC: G10L15/197, G10L15/02, G10L15/22, G10L15/16, G10L15/18
Abstract: To improve the accuracy of automatic speech recognition (ASR), an utterance is transcribed using a plurality of language models, such as an N-gram language model and a neural language model. The language models are trained separately. Each outputs a probability score or other figure of merit for a partial transcription hypothesis. Model scores are interpolated to determine a hybrid score. While recognizing an utterance, interpolation weights are chosen or updated dynamically, in the specific context of processing. The weights are based on dynamic variables associated with the utterance, the partial transcription hypothesis, or other aspects of context.
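The score interpolation described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented method: linear interpolation in probability space is one common choice, and the weight rule in `dynamic_weight` (trusting the N-gram model less as the partial hypothesis grows) along with its `base`, `decay`, and `floor` parameters is hypothetical.

```python
import math


def interpolate_log_prob(ngram_logp, neural_logp, weight):
    """Linearly interpolate two probabilities and return the log of the result.

    `weight` is the share given to the N-gram model; the neural model
    receives the remainder.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    p = weight * math.exp(ngram_logp) + (1.0 - weight) * math.exp(neural_logp)
    return math.log(p)


def dynamic_weight(hypothesis_len, base=0.7, decay=0.1, floor=0.2):
    """Hypothetical context rule: lower the N-gram weight for longer hypotheses."""
    return max(floor, base - decay * hypothesis_len)


def hybrid_score(ngram_logps, neural_logps):
    """Sum per-word hybrid log-probabilities over a partial hypothesis.

    The weight is recomputed at each position from the number of words
    already in the hypothesis.
    """
    total = 0.0
    for position, (a, b) in enumerate(zip(ngram_logps, neural_logps)):
        total += interpolate_log_prob(a, b, dynamic_weight(position))
    return total
```

In a decoder, `hybrid_score` would be evaluated for each competing partial hypothesis, and the weight rule could depend on any context variable rather than length alone.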
-
Publication No.: US12126868B2
Publication date: 2024-10-22
Application No.: US18348249
Filing date: 2023-07-06
Applicant: SoundHound, Inc.
Inventor: Thor S. Khov, Terry Kong
IPC: H04N21/454, G06N3/045, G06V20/40, H04N21/44, H04N21/466
CPC classification number: H04N21/4542, G06N3/045, G06V20/46, H04N21/44008, H04N21/4665
Abstract: Various approaches relate to user-defined filtering, in media playing devices, of undesirable content represented in stored and real-time content from content providers. For example, video, image, and/or audio data can be analyzed to identify and classify content included in the data using various classification models and object and text recognition approaches. Thereafter, the identification and classification can be used to control presentation of and/or access to the content and/or portions of the content. For example, based on the classification, portions of the content can be modified (e.g., replaced, removed, degraded, etc.) using one or more techniques (e.g., media replacement, media removal, media degradation, etc.) and then presented.
-
Publication No.: US20210035569A1
Publication date: 2021-02-04
Application No.: US16529730
Filing date: 2019-08-01
Applicant: SoundHound, Inc.
Inventor: Steffen Holm, Terry Kong, Kiran Garaga Lokeswarappa
IPC: G10L15/197, G10L15/02, G10L15/18, G10L15/22, G10L15/16
Abstract: To improve the accuracy of automatic speech recognition (ASR), an utterance is transcribed using a plurality of language models, such as an N-gram language model and a neural language model. The language models are trained separately. Each outputs a probability score or other figure of merit for a partial transcription hypothesis. Model scores are interpolated to determine a hybrid score. While recognizing an utterance, interpolation weights are chosen or updated dynamically, in the specific context of processing. The weights are based on dynamic variables associated with the utterance, the partial transcription hypothesis, or other aspects of context.
-
Publication No.: US10796107B2
Publication date: 2020-10-06
Application No.: US16232984
Filing date: 2018-12-26
Applicant: SoundHound, Inc.
Inventor: Terry Kong
IPC: G06F40/216, G06F40/58, G06K9/62, G06F40/295
Abstract: A method of training word embeddings is provided. The method includes determining anchors, each comprising a first word in a first domain and a second word in a second domain; training word embeddings for the first and second domains; and training a transform that maps word embedding vectors in the first domain to word embedding vectors in the second domain. The training minimizes a loss function that includes an anchor loss for each anchor. Each anchor loss is based on the distance between the embedding vector of the anchor's second word and the transform of the embedding vector of the anchor's first word, and it is zero when that distance is less than a specific tolerance.
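The tolerance-gated anchor loss described in this abstract can be sketched as below. The abstract only states that the loss is based on a distance and is zero inside the tolerance; the hinge form `max(0, distance - tolerance)`, the Euclidean distance, the linear (matrix) transform, and the function names are all assumptions chosen for illustration.

```python
import math


def apply_transform(matrix, vec):
    """Multiply a row-major matrix by a vector (an assumed linear transform)."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def anchor_loss(transform, first_vec, second_vec, tolerance):
    """Loss for one anchor.

    Zero when the transformed first-domain vector lies within `tolerance`
    of the second-domain vector; otherwise it grows with the excess distance.
    """
    mapped = apply_transform(transform, first_vec)
    distance = math.dist(mapped, second_vec)  # Euclidean distance
    return max(0.0, distance - tolerance)


def total_anchor_loss(transform, anchors, tolerance):
    """Sum the anchor loss over (first_vec, second_vec) pairs."""
    return sum(anchor_loss(transform, a, b, tolerance) for a, b in anchors)
```

With the identity transform, an anchor pair at distance 5 and a tolerance of 1 contributes a loss of 4, while a pair at distance 0.05 with a tolerance of 0.1 contributes nothing. A training loop would add this term to the embedding losses and minimize the sum with respect to both the embeddings and the transform.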