-
Publication number: US20230368796A1
Publication date: 2023-11-16
Application number: US18324440
Filing date: 2023-05-26
Applicant: Amazon Technologies, Inc.
Inventor: Beiye Liu, Wael Hamza, Liwei Cai, Konstantine Arkoudas, Chengwei Su, Subendhu Rongali
CPC classification number: G10L15/26, G10L15/1822
Abstract: Techniques for performing spoken language understanding (SLU) processing are described. An SLU component may include an audio encoder configured to perform an audio-to-text processing task and an audio-to-NLU processing task. The SLU component may also include a joint decoder configured to perform the audio-to-text processing task, the audio-to-NLU processing task and a text-to-NLU processing task. Input audio data, representing a spoken input, is processed by the audio encoder and the joint decoder to determine NLU data corresponding to the spoken input.
-
Publication number: US20240428797A1
Publication date: 2024-12-26
Application number: US18823198
Filing date: 2024-09-03
Applicant: Amazon Technologies, Inc.
Inventor: Beiye Liu, Wael Hamza, Liwei Cai, Konstantine Arkoudas, Chengwei Su, Subendhu Rongali
Abstract: Techniques for performing spoken language understanding (SLU) processing are described. An SLU component may include an audio encoder configured to perform an audio-to-text processing task and an audio-to-NLU processing task. The SLU component may also include a joint decoder configured to perform the audio-to-text processing task, the audio-to-NLU processing task and a text-to-NLU processing task. Input audio data, representing a spoken input, is processed by the audio encoder and the joint decoder to determine NLU data corresponding to the spoken input.
-
Publication number: US11335346B1
Publication date: 2022-05-17
Application number: US16215061
Filing date: 2018-12-10
Applicant: Amazon Technologies, Inc.
Inventor: Chengwei Su, Spyridon Matsoukas, Sankaranarayanan Ananthakrishnan, Shirin Saleem, Chungnam Chan, Yugang Li, Mallory McManamon, Rahul Gupta, Luca Soldaini
IPC: G10L15/26, G06K9/62, G06N20/10, G06N7/00, G06F40/295
Abstract: Techniques for processing a user input are described. Text data representing a user input is processed with respect to at least one finite state transducer (FST) to generate at least one FST hypothesis. Context information may be required to traverse one or more paths of the at least one FST. The text data is also processed using at least one statistical model (e.g., to perform intent classification, named entity recognition, and/or domain classification processing) to generate at least one statistical model hypothesis. The at least one FST hypothesis and the at least one statistical model hypothesis are input to a reranker that determines a most likely interpretation of the user input.
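The flow in this abstract can be illustrated with a toy example. The sketch below is not the patented implementation: the arcs, labels, context key (`media_active`), and scores are all hypothetical, the statistical model is represented only by a precomputed hypothesis, and the reranker is reduced to picking the highest score, whereas the patent describes a reranker without specifying it at this level.

```python
# Toy deterministic FST. All arcs, labels, and context keys are hypothetical.
# (state, input_token) -> (next_state, output_label, required_context or None)
ARCS = {
    (0, "play"): (1, "intent=Play", None),
    (1, "jazz"): (2, "genre=jazz", None),
    # This arc can only be traversed when the context says media is active.
    (1, "it"): (2, "target=CurrentItem", "media_active"),
}
FINAL_STATES = {2}


def fst_hypothesis(tokens, context):
    """Traverse the FST; return the output labels, or None if no path accepts."""
    state, outputs = 0, []
    for token in tokens:
        arc = ARCS.get((state, token))
        if arc is None:
            return None  # no arc for this token
        next_state, label, required = arc
        if required is not None and required not in context:
            return None  # context needed to traverse this arc is missing
        outputs.append(label)
        state = next_state
    return outputs if state in FINAL_STATES else None


def rerank(hypotheses):
    """Stand-in reranker: return the (labels, score) pair with the top score."""
    return max(hypotheses, key=lambda hyp: hyp[1])


# FST hypothesis with an assumed fixed score, plus an assumed statistical one.
fst_labels = fst_hypothesis(["play", "it"], {"media_active"})
candidates = [(fst_labels, 0.9), (["intent=Search"], 0.6)]
best = rerank(candidates)
```

Without `media_active` in the context, `fst_hypothesis` returns `None` for the same tokens, so only the statistical hypothesis would reach the reranker.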
-
Publication number: US11081104B1
Publication date: 2021-08-03
Application number: US15838917
Filing date: 2017-12-12
Applicant: Amazon Technologies, Inc.
Inventor: Chengwei Su, Sankaranarayanan Ananthakrishnan, Spyridon Matsoukas, Shirin Saleem, Rahul Gupta, Kavya Ravikumar, John Will Crimmins, Kelly James Vanee, John Pelak, Melanie Chie Bomke Gens
IPC: G10L15/18, G10L15/22, G10L15/06, G10L15/183, H04L29/08, G10L15/32, G06K9/00, H04W4/02, G10L15/26, G06F16/31, G06F40/295
Abstract: A natural language understanding system that can determine an overall score for a natural language hypothesis using hypothesis-specific component scores from different aspects of NLU processing as well as context data describing the context surrounding the utterance corresponding to the natural language hypotheses. The individual component scores may be input into a feature vector at a location corresponding to a type of a device captured by the utterance. Other locations in the feature vector corresponding to other device types may be populated with zero values. The feature vector may also be populated with other values representing other context data. The feature vector may then be multiplied by a weight vector comprising trained weights corresponding to the feature vector positions to determine a new overall score for each hypothesis, where the overall score incorporates the impact of the context data. Natural language hypotheses can be ranked using their respective new overall scores.
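The scoring step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the device types, the number of component scores per hypothesis, the context values, and every weight are hypothetical.

```python
# Hypothetical device types and component-score layout (not from the patent).
DEVICE_TYPES = ["speaker", "tv", "phone"]
NUM_COMPONENT_SCORES = 2  # e.g. an intent score and an entity score


def build_feature_vector(component_scores, device_type, context_values):
    """Put the component scores in the slot for the given device type,
    leave the slots of all other device types at zero, then append the
    remaining context values."""
    if len(component_scores) != NUM_COMPONENT_SCORES:
        raise ValueError("unexpected number of component scores")
    vector = [0.0] * (len(DEVICE_TYPES) * NUM_COMPONENT_SCORES)
    start = DEVICE_TYPES.index(device_type) * NUM_COMPONENT_SCORES
    vector[start:start + NUM_COMPONENT_SCORES] = component_scores
    return vector + list(context_values)


def overall_score(features, weights):
    """Dot product of the feature vector and the trained weight vector."""
    if len(features) != len(weights):
        raise ValueError("feature and weight vectors differ in length")
    return sum(f * w for f, w in zip(features, weights))


def rank_hypotheses(hypotheses, device_type, context_values, weights):
    """Return (name, score) pairs sorted from highest to lowest score."""
    scored = []
    for name, scores in hypotheses.items():
        features = build_feature_vector(scores, device_type, context_values)
        scored.append((name, overall_score(features, weights)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Assumed weights: one per device-type slot position, plus one context weight.
weights = [1.0, 1.0, 2.0, 4.0, 1.0, 1.0, 0.5]
hypotheses = {"PlayMusic": [0.5, 0.25], "PlayVideo": [0.25, 0.5]}
ranked = rank_hypotheses(hypotheses, "tv", [1.0], weights)
```

Because only the slot for the active device type is non-zero, the same component scores are effectively weighted differently per device type, which is how the context shifts the ranking in this sketch.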
-
Publication number: US11043205B1
Publication date: 2021-06-22
Application number: US15838974
Filing date: 2017-12-12
Applicant: Amazon Technologies, Inc.
Inventor: Chengwei Su, Sankaranarayanan Ananthakrishnan, Spyridon Matsoukas, Rahul Gupta, Kelly James Vanee
IPC: G10L15/22, G10L15/18, G10L15/06, G10L15/16, G10L15/183, G06N3/02, G06N20/00, G06F16/31, G06F40/295
Abstract: A natural language processing system that can determine an overall score for a natural language hypothesis using hypothesis-specific component scores from different aspects of NLU processing. The individual component scores may be weighted by weights trained to optimize the overall scores relative to each other. Each domain of the system may be configured with a separate component that determines the overall score with respect to the domain. Natural language hypotheses can be ranked using the overall score either within a specific domain or on a cross-domain basis.
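As a rough illustration of the weighted scoring in this abstract, the sketch below computes an overall score per hypothesis from per-domain weights and ranks either within one domain or across domains. The domains, weights, hypothesis labels, and component scores are all hypothetical, and the weights are hard-coded here rather than trained.

```python
# Hypothetical per-domain weights for three component scores.
DOMAIN_WEIGHTS = {
    "music": [0.5, 0.25, 0.25],
    "video": [0.25, 0.5, 0.25],
}


def domain_score(domain, component_scores):
    """Weighted sum of component scores using the domain's own weights."""
    weights = DOMAIN_WEIGHTS[domain]
    if len(weights) != len(component_scores):
        raise ValueError("component scores do not match the domain weights")
    return sum(w * s for w, s in zip(weights, component_scores))


def rank(hypotheses, domain=None):
    """Rank (domain, label, component_scores) tuples by overall score.

    With domain=None the ranking is cross-domain; otherwise only
    hypotheses from the given domain are ranked.
    """
    selected = [h for h in hypotheses if domain is None or h[0] == domain]
    scored = [(d, label, domain_score(d, scores)) for d, label, scores in selected]
    return sorted(scored, key=lambda h: h[2], reverse=True)


hypotheses = [
    ("music", "PlaySong", [1.0, 0.0, 0.0]),
    ("music", "PlayAlbum", [0.0, 1.0, 0.0]),
    ("video", "PlayMovie", [0.0, 2.0, 0.0]),
]
cross_domain = rank(hypotheses)
music_only = rank(hypotheses, domain="music")
```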
-
Publication number: US12087305B2
Publication date: 2024-09-10
Application number: US18324440
Filing date: 2023-05-26
Applicant: Amazon Technologies, Inc.
Inventor: Beiye Liu, Wael Hamza, Liwei Cai, Konstantine Arkoudas, Chengwei Su, Subendhu Rongali
CPC classification number: G10L15/26, G10L15/1822
Abstract: Techniques for performing spoken language understanding (SLU) processing are described. An SLU component may include an audio encoder configured to perform an audio-to-text processing task and an audio-to-NLU processing task. The SLU component may also include a joint decoder configured to perform the audio-to-text processing task, the audio-to-NLU processing task and a text-to-NLU processing task. Input audio data, representing a spoken input, is processed by the audio encoder and the joint decoder to determine NLU data corresponding to the spoken input.
-
Publication number: US12045288B1
Publication date: 2024-07-23
Application number: US17031062
Filing date: 2020-09-24
Applicant: Amazon Technologies, Inc.
Inventor: Ahmet Emre Barut, Chengwei Su, Weitong Ruan, Wael Hamza
IPC: G06F16/30, G06F16/532, G06F16/583, G06F16/9032, G06V20/20, G06N20/00
CPC classification number: G06F16/90332, G06F16/532, G06F16/583, G06V20/20, G06N20/00
Abstract: Devices and techniques are generally described for selection of objects in image data using natural language input. In various examples, first image data representing at least a first object and first natural language data may be received. In some examples, first embedding data representing the first natural language data may be generated. Second embedding data representing the first image data may be generated. Relative location data indicating a location of the first object in the first image data relative to at least one other object may be generated. The first embedding data, the second embedding data, and the relative location data may be input into a multi-modal transformer model. The multi-modal transformer model may determine that the first natural language data relates to the first object.
-
Publication number: US11682400B1
Publication date: 2023-06-20
Application number: US17106600
Filing date: 2020-11-30
Applicant: Amazon Technologies, Inc.
Inventor: Beiye Liu, Wael Hamza, Liwei Cai, Konstantine Arkoudas, Chengwei Su, Subendhu Rongali
CPC classification number: G10L15/26, G10L15/1822
Abstract: Techniques for performing spoken language understanding (SLU) processing are described. An SLU component may include an audio encoder configured to perform an audio-to-text processing task and an audio-to-NLU processing task. The SLU component may also include a joint decoder configured to perform the audio-to-text processing task, the audio-to-NLU processing task and a text-to-NLU processing task. Input audio data, representing a spoken input, is processed by the audio encoder and the joint decoder to determine NLU data corresponding to the spoken input.