-
Publication No.: US20250094709A1
Publication Date: 2025-03-20
Application No.: US18827108
Filing Date: 2024-09-06
Applicant: Samsung Electronics Co., Ltd.
Inventor: Shikhar TULI, Chi-Heng Lin, Yen-Chang Hsu, Yilin Shen, Hongxia Jin
IPC: G06F40/284
Abstract: A method for performing multi-token prediction by an apparatus includes receiving, from an artificial intelligence (AI) assistance device, a request for an output token sequence that is subsequent to an input token sequence indicated by the request, predicting, by a trained machine learning model, a plurality of candidate output tokens, estimating joint probability distributions of one or more combinations of the plurality of candidate output tokens, calculating joint probabilities of the one or more combinations by masking the joint probability distributions with a co-occurrence weighted mask, determining, based on the joint probabilities, whether to reduce the number of candidate output tokens included in each combination of the one or more combinations, identifying, based on the joint probabilities, a combination of the one or more combinations as the output token sequence, and outputting, to the AI assistance device, a response to the request, the response comprising the output token sequence.
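The masking and combination-reduction steps in the abstract above can be illustrated with a minimal sketch. It is not the patented method: it assumes two output positions, an independence-based joint estimate (product of per-position probabilities), a co-occurrence mask given as pair weights in [0, 1], and a fixed threshold for deciding when to fall back to a single token. The function name `pick_tokens` and all parameters are hypothetical.

```python
from itertools import product


def pick_tokens(first, second, cooccurrence, threshold=0.2):
    """Choose an output sequence of one or two tokens.

    first, second: dicts mapping candidate token -> predicted probability
        for the next two positions (hypothetical model outputs).
    cooccurrence: dict mapping (tok1, tok2) -> weight in [0, 1]; pairs
        that are missing get weight 0 (an assumed convention).
    threshold: assumed cutoff below which the combination is shortened.
    """
    # Estimate a joint distribution over two-token combinations.
    # Assumption: positions are independent, so joint = product.
    joint = {
        (a, b): pa * pb
        for (a, pa), (b, pb) in product(first.items(), second.items())
    }
    # Mask the joint distribution with the co-occurrence weights.
    masked = {pair: p * cooccurrence.get(pair, 0.0) for pair, p in joint.items()}
    best_pair = max(masked, key=masked.get)
    if masked[best_pair] >= threshold:
        return list(best_pair)
    # Joint probability too low: reduce the combination to a single token.
    return [max(first, key=first.get)]


first = {"New": 0.6, "the": 0.4}
second = {"York": 0.7, "cat": 0.3}
cooc = {("New", "York"): 1.0, ("the", "cat"): 0.8}
print(pick_tokens(first, second, cooc))
print(pick_tokens(first, second, cooc, threshold=0.5))
```

With a higher threshold the same inputs yield only the first token, which is the "reduce the number of candidate output tokens" branch in this sketch.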
-
Publication No.: US12211486B2
Publication Date: 2025-01-28
Application No.: US17647499
Filing Date: 2022-01-10
Applicant: Samsung Electronics Co., Ltd.
Inventor: Avik Ray, Yilin Shen, Hongxia Jin
Abstract: A method includes identifying multiple tokens contained in an input utterance. The method also includes generating slot labels for at least some of the tokens contained in the input utterance using a trained machine learning model. The method further includes determining at least one action to be performed in response to the input utterance based on at least one of the slot labels. The trained machine learning model is trained to use attention distributions generated such that (i) the attention distributions associated with tokens having dissimilar slot labels are forced to be different and (ii) the attention distribution associated with each token is forced to not focus primarily on that token itself.
-
Publication No.: US20240377829A1
Publication Date: 2024-11-14
Application No.: US18501887
Filing Date: 2023-11-03
Applicant: Samsung Electronics Co., Ltd.
Inventor: Yilin Shen, Kaiwen Zhou, Hongxia Jin
Abstract: A method includes determining a specified object to locate within a surrounding environment. The method also includes causing a robot to capture an image and a depth map of the surrounding environment. The method further includes using a scene understanding model, predicting one or more rooms and one or more objects captured in the image. The method also includes updating a second map of the surrounding environment based on the predicted rooms, the predicted objects, the depth map, and a location of the robot. The method further includes determining a likelihood of the specified object being in a candidate room and a likelihood of the specified object being near a candidate object using a pre-trained large language model. The method also includes causing the robot to move to a next location for the robot to search for the specified object, based on the likelihoods and the second map.
-
Publication No.: US20240331715A1
Publication Date: 2024-10-03
Application No.: US18457921
Filing Date: 2023-08-29
Applicant: Samsung Electronics Co., Ltd.
Inventor: Ching-Hua Lee, Chou-Chang Yang, Yilin Shen, Hongxia Jin
IPC: G10L21/0224
CPC classification number: G10L21/0224, G10L2021/02166
Abstract: A method includes receiving, during a first time window, a set of noisy audio signals from a plurality of audio input devices. The method also includes generating a noisy time-frequency representation based on the set of noisy audio signals. The method further includes providing the noisy time-frequency representation as an input to a mask estimation model trained to output a mask used to predict a clean time-frequency representation of clean speech audio from the noisy time-frequency representation. The method also includes determining beamforming filter weights based on the mask. The method further includes applying the beamforming filter weights to the noisy time-frequency representation to isolate the clean speech audio from the set of noisy audio signals. In addition, the method includes outputting the clean speech audio.
-
Publication No.: US11875231B2
Publication Date: 2024-01-16
Application No.: US16661827
Filing Date: 2019-10-23
Applicant: Samsung Electronics Co., Ltd.
Inventor: Avik Ray, Yilin Shen, Hongxia Jin
Abstract: An electronic device for complex task machine learning includes at least one memory and at least one processor coupled to the at least one memory. The at least one processor is configured to receive an unknown command for performing a task and generate a prompt regarding the unknown command. The at least one processor is also configured to receive one or more instructions in response to the prompt, where each of the one or more instructions provides information on performing at least a portion of the task. The at least one processor is further configured to determine at least one action for each one of the one or more instructions. In addition, the at least one processor is configured to create a complex action for performing the task based on the at least one action for each one of the one or more instructions.
-
Publication No.: US20220375457A1
Publication Date: 2022-11-24
Application No.: US17647499
Filing Date: 2022-01-10
Applicant: Samsung Electronics Co., Ltd.
Inventor: Avik Ray, Yilin Shen, Hongxia Jin
Abstract: A method includes identifying multiple tokens contained in an input utterance. The method also includes generating slot labels for at least some of the tokens contained in the input utterance using a trained machine learning model. The method further includes determining at least one action to be performed in response to the input utterance based on at least one of the slot labels. The trained machine learning model is trained to use attention distributions generated such that (i) the attention distributions associated with tokens having dissimilar slot labels are forced to be different and (ii) the attention distribution associated with each token is forced to not focus primarily on that token itself.
-
Publication No.: US20220309774A1
Publication Date: 2022-09-29
Application No.: US17701209
Filing Date: 2022-03-22
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Burak Uzkent, Vasili Ramanishka, Yilin Shen, Hongxia Jin
IPC: G06V10/82, G06V10/764
Abstract: An apparatus for performing image processing, may include at least one processor configured to: input an image to a vision transformer comprising a plurality of encoders that correspond to at least one fixed encoder and a plurality of adaptive encoders; process the image via the at least one fixed encoder to obtain image representations; determine one or more layers of the plurality of adaptive encoders to drop, by inputting the image representations to a policy network configured to determine layer dropout actions for the plurality of adaptive encoders; and obtain a class of the input image using remaining layers of the plurality of adaptive encoders other than the dropped one or more layers.
-
Publication No.: US11314940B2
Publication Date: 2022-04-26
Application No.: US15986633
Filing Date: 2018-05-22
Applicant: Samsung Electronics Co., Ltd.
Inventor: Avik Ray, Yilin Shen, Hongxia Jin
IPC: G10L15/22, G06F40/30, G06F9/451, G06N5/02, G06F40/205, G06F40/253, G10L15/07, G10L15/06
Abstract: A method includes determining, by an electronic device, a skill from a first natural language (NL) input. Upon successful determination of the skill, the first NL input is transmitted to a custom skill parser for determination of a skill intent. The custom skill parser is trained based on data including at least a custom training data set. Upon unsuccessful determination of the skill, the first NL input is transmitted to a generic parser for determination of a general intent of the first NL input.
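The routing described in the abstract above (custom skill parser when a skill is determined, generic parser otherwise) can be sketched as follows. This is only an illustration under stated assumptions: skill determination is replaced by simple keyword matching, the parsers are plain callables, and every name (`route_input`, `skill_keywords`, `custom_parsers`, `generic_parser`) is hypothetical rather than taken from the patent.

```python
def route_input(nl_input, skill_keywords, custom_parsers, generic_parser):
    """Return (skill, intent) for a natural language input.

    skill_keywords: dict mapping skill name -> list of trigger keywords
        (a stand-in for the skill determination step).
    custom_parsers: dict mapping skill name -> callable returning a skill intent.
    generic_parser: callable returning a general intent.
    """
    text = nl_input.lower()
    # Stand-in skill determination: first skill with a matching keyword.
    skill = None
    for name, keywords in skill_keywords.items():
        if any(keyword in text for keyword in keywords):
            skill = name
            break
    if skill is not None:
        # Skill determined: hand the input to that skill's custom parser.
        return skill, custom_parsers[skill](nl_input)
    # No skill determined: fall back to the generic parser.
    return None, generic_parser(nl_input)


skill_keywords = {"pizza": ["pizza"]}
custom_parsers = {"pizza": lambda s: "order_pizza"}
generic_parser = lambda s: "general_query"

print(route_input("Order a large pizza", skill_keywords, custom_parsers, generic_parser))
print(route_input("What time is it?", skill_keywords, custom_parsers, generic_parser))
```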
-
Publication No.: US11182565B2
Publication Date: 2021-11-23
Application No.: US15904203
Filing Date: 2018-02-23
Applicant: Samsung Electronics Co., Ltd.
Inventor: Avik Ray, Yilin Shen, Hongxia Jin
IPC: G06F40/35, G10L15/18, G06F3/16, G06F40/30, G06F40/247, G06F40/295, G10L15/07, G06F40/216
Abstract: A method includes retrieving, at an electronic device, a first natural language (NL) input. An intent of the first NL input is undetermined by both a generic parser and a personal parser. A paraphrase of the first NL input is retrieved at the electronic device. An intent of the paraphrase of the first NL input is determined using at least one of: the generic parser, the personal parser, or a combination thereof. A new personal intent for the first NL input is generated based on the determined intent. The personal parser is trained using existing personal intents and the new personal intent.
-
Publication No.: US20210342624A1
Publication Date: 2021-11-04
Application No.: US17231958
Filing Date: 2021-04-15
Applicant: Samsung Electronics Co., Ltd.
Inventor: Yu Wang, Yilin Shen, Hongxia Jin
Abstract: A method includes obtaining, using at least one processor of an electronic device, an image-query understanding model. The method also includes obtaining, using the at least one processor, an image and a user query associated with the image, where the image includes a target image area and the user query includes a target phrase. The method further includes retraining, using the at least one processor, the image-query understanding model using a correlation between the target image area and the target phrase to obtain a retrained image-query understanding model.
-