ACCELERATING LANGUAGE MODEL INFERENCE WITH DYNAMIC MULTI-TOKEN SAMPLING

    Publication No.: US20250094709A1

    Publication date: 2025-03-20

    Application No.: US18827108

    Filing date: 2024-09-06

    Abstract: A method for performing multi-token prediction by an apparatus includes receiving, from an artificial intelligence (AI) assistance device, a request for an output token sequence that is subsequent to an input token sequence indicated by the request, predicting, by a trained machine learning model, a plurality of candidate output tokens, estimating joint probability distributions of one or more combinations of the plurality of candidate output tokens, calculating joint probabilities of the one or more combinations by masking the joint probability distributions with a co-occurrence weighted mask, determining, based on the joint probabilities, whether to reduce the number of candidate output tokens included in each combination of the one or more combinations, identifying, based on the joint probabilities, a combination of the one or more combinations as the output token sequence, and outputting, to the AI assistance device, a response to the request, the response comprising the output token sequence.
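
    The selection step described in this abstract can be illustrated with a minimal sketch. It is not the patented method: the abstract does not say how the joint distributions are estimated or how the mask is built, so the sketch assumes the joint probability of a combination is the product of per-position candidate probabilities, and that the co-occurrence weighted mask is a table of weights for adjacent token pairs. The names `best_combination`, `step_probs`, `cooccurrence`, and `threshold` are hypothetical.

```python
from itertools import product


def best_combination(step_probs, cooccurrence, threshold=0.1):
    """Pick the most probable multi-token combination, shortening it if needed.

    step_probs   : list of dicts (token -> probability), one per future position.
    cooccurrence : dict mapping adjacent (token_a, token_b) pairs to a weight;
                   pairs that are absent default to 1.0 (assumed mask form).
    threshold    : assumed confidence floor; below it, the combination length
                   is reduced by one token and the search is repeated.
    """
    n = len(step_probs)
    while n >= 1:
        best_seq, best_p = None, -1.0
        # Enumerate every combination of candidates over the first n positions.
        for combo in product(*(d.items() for d in step_probs[:n])):
            tokens = tuple(tok for tok, _ in combo)
            p = 1.0
            for _, q in combo:
                p *= q  # independence assumption for the unmasked joint
            for a, b in zip(tokens, tokens[1:]):
                p *= cooccurrence.get((a, b), 1.0)  # apply the mask weights
            if p > best_p:
                best_seq, best_p = tokens, p
        # Accept if confident enough, or if the length cannot shrink further.
        if best_p >= threshold or n == 1:
            return best_seq, best_p
        n -= 1
    return None, 0.0  # only reached when step_probs is empty
```

    With the toy candidates `[{"New": 0.9, "Old": 0.1}, {"York": 0.8, "Jersey": 0.2}]` and a mask that down-weights `("Old", "York")` to 0.1, a threshold of 0.5 returns the two-token combination `("New", "York")`, while a threshold of 0.8 falls back to the single token `("New",)`.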

    Apparatus and method for compositional spoken language understanding

    Publication No.: US12211486B2

    Publication date: 2025-01-28

    Application No.: US17647499

    Filing date: 2022-01-10

    Abstract: A method includes identifying multiple tokens contained in an input utterance. The method also includes generating slot labels for at least some of the tokens contained in the input utterance using a trained machine learning model. The method further includes determining at least one action to be performed in response to the input utterance based on at least one of the slot labels. The trained machine learning model is trained to use attention distributions generated such that (i) the attention distributions associated with tokens having dissimilar slot labels are forced to be different and (ii) the attention distribution associated with each token is forced to not focus primarily on that token itself.

    SYSTEM AND METHOD FOR ZERO-SHOT OBJECT NAVIGATION USING LARGE LANGUAGE MODELS

    Publication No.: US20240377829A1

    Publication date: 2024-11-14

    Application No.: US18501887

    Filing date: 2023-11-03

    Abstract: A method includes determining a specified object to locate within a surrounding environment. The method also includes causing a robot to capture an image and a depth map of the surrounding environment. The method further includes using a scene understanding model, predicting one or more rooms and one or more objects captured in the image. The method also includes updating a second map of the surrounding environment based on the predicted rooms, the predicted objects, the depth map, and a location of the robot. The method further includes determining a likelihood of the specified object being in a candidate room and a likelihood of the specified object being near a candidate object using a pre-trained large language model. The method also includes causing the robot to move to a next location for the robot to search for the specified object, based on the likelihoods and the second map.

    SYSTEM AND METHOD FOR MASK-BASED NEURAL BEAMFORMING FOR MULTI-CHANNEL SPEECH ENHANCEMENT

    Publication No.: US20240331715A1

    Publication date: 2024-10-03

    Application No.: US18457921

    Filing date: 2023-08-29

    CPC classification number: G10L21/0224 G10L2021/02166

    Abstract: A method includes receiving, during a first time window, a set of noisy audio signals from a plurality of audio input devices. The method also includes generating a noisy time-frequency representation based on the set of noisy audio signals. The method further includes providing the noisy time-frequency representation as an input to a mask estimation model trained to output a mask used to predict a clean time-frequency representation of clean speech audio from the noisy time-frequency representation. The method also includes determining beamforming filter weights based on the mask. The method further includes applying the beamforming filter weights to the noisy time-frequency representation to isolate the clean speech audio from the set of noisy audio signals. In addition, the method includes outputting the clean speech audio.

    System and method for complex task machine learning

    Publication No.: US11875231B2

    Publication date: 2024-01-16

    Application No.: US16661827

    Filing date: 2019-10-23

    CPC classification number: G06N20/00 G06N5/02

    Abstract: An electronic device for complex task machine learning includes at least one memory and at least one processor coupled to the at least one memory. The at least one processor is configured to receive an unknown command for performing a task and generate a prompt regarding the unknown command. The at least one processor is also configured to receive one or more instructions in response to the prompt, where each of the one or more instructions provides information on performing at least a portion of the task. The at least one processor is further configured to determine at least one action for each one of the one or more instructions. In addition, the at least one processor is configured to create a complex action for performing the task based on the at least one action for each one of the one or more instructions.

    APPARATUS AND METHOD FOR COMPOSITIONAL SPOKEN LANGUAGE UNDERSTANDING

    Publication No.: US20220375457A1

    Publication date: 2022-11-24

    Application No.: US17647499

    Filing date: 2022-01-10

    Abstract: A method includes identifying multiple tokens contained in an input utterance. The method also includes generating slot labels for at least some of the tokens contained in the input utterance using a trained machine learning model. The method further includes determining at least one action to be performed in response to the input utterance based on at least one of the slot labels. The trained machine learning model is trained to use attention distributions generated such that (i) the attention distributions associated with tokens having dissimilar slot labels are forced to be different and (ii) the attention distribution associated with each token is forced to not focus primarily on that token itself.

    METHOD AND APPARATUS FOR CLASSIFYING IMAGES USING AN ARTIFICIAL INTELLIGENCE MODEL

    Publication No.: US20220309774A1

    Publication date: 2022-09-29

    Application No.: US17701209

    Filing date: 2022-03-22

    Abstract: An apparatus for performing image processing may include at least one processor configured to: input an image to a vision transformer comprising a plurality of encoders that correspond to at least one fixed encoder and a plurality of adaptive encoders; process the image via the at least one fixed encoder to obtain image representations; determine one or more layers of the plurality of adaptive encoders to drop, by inputting the image representations to a policy network configured to determine layer dropout actions for the plurality of adaptive encoders; and obtain a class of the input image using remaining layers of the plurality of adaptive encoders other than the dropped one or more layers.
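
    The control flow in this abstract (fixed encoders, then a policy that decides which adaptive encoders to skip) can be sketched as follows. This is only an illustration of the flow: layers, the policy, and the classification head are modelled as plain Python callables, and every name (`classify_with_layer_dropping`, `policy`, `head`) is hypothetical rather than taken from the patent.

```python
def classify_with_layer_dropping(image, fixed_layers, adaptive_layers, policy, head):
    """Run fixed layers, ask a policy which adaptive layers to keep, then classify.

    fixed_layers, adaptive_layers : lists of callables, each mapping a
                                    representation to a new representation.
    policy : callable mapping the fixed-layer output to one keep/drop boolean
             per adaptive layer (stand-in for the policy network).
    head   : callable mapping the final representation to a class label.
    """
    x = image
    for layer in fixed_layers:
        x = layer(x)  # fixed encoders always run

    keep = list(policy(x))  # layer dropout actions, decided once per input
    if len(keep) != len(adaptive_layers):
        raise ValueError("policy must return one decision per adaptive layer")

    for layer, keep_layer in zip(adaptive_layers, keep):
        if keep_layer:
            x = layer(x)  # dropped layers are skipped entirely

    return head(x), keep
```

    Because the policy sees only the fixed-encoder output, the number of adaptive layers executed can differ from one input to the next, which is where any compute saving would come from.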

    Method to learn personalized intents

    Publication No.: US11182565B2

    Publication date: 2021-11-23

    Application No.: US15904203

    Filing date: 2018-02-23

    Abstract: A method includes retrieving, at an electronic device, a first natural language (NL) input. An intent of the first NL input is undetermined by both a generic parser and a personal parser. A paraphrase of the first NL input is retrieved at the electronic device. An intent of the paraphrase of the first NL input is determined using at least one of: the generic parser, the personal parser, or a combination thereof. A new personal intent for the first NL input is generated based on the determined intent. The personal parser is trained using existing personal intents and the new personal intent.
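
    The paraphrase fallback in this abstract can be sketched in a few lines. This is a toy illustration under strong assumptions: both parsers are modelled as dictionaries from text to intent label, and "training" the personal parser is reduced to adding one entry. The function names and the example intent label are hypothetical.

```python
def resolve_intent(text, generic_parser, personal_parser):
    """Return the intent either parser knows for `text`, or None if neither does."""
    intent = personal_parser.get(text)
    if intent is None:
        intent = generic_parser.get(text)
    return intent


def learn_personal_intent(nl_input, paraphrase, generic_parser, personal_parser):
    """Learn a personal intent for `nl_input` from a paraphrase both parsers can use.

    Returns the resolved intent, or None when the paraphrase is also unknown.
    """
    known = resolve_intent(nl_input, generic_parser, personal_parser)
    if known is not None:
        return known  # nothing to learn, the input is already understood

    intent = resolve_intent(paraphrase, generic_parser, personal_parser)
    if intent is None:
        return None  # the paraphrase did not help

    # Stand-in for retraining the personal parser on existing + new intents.
    personal_parser[nl_input] = intent
    return intent
```

    For example, if the generic parser maps "set an alarm for 7 am" to a hypothetical `alarm.create` intent, calling `learn_personal_intent("wake me at 7", "set an alarm for 7 am", ...)` stores that intent for "wake me at 7", so later lookups of the original phrasing succeed without a paraphrase.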
