-
Publication No.: US20210056974A1
Publication date: 2021-02-25
Application No.: US16547263
Application date: 2019-08-21
Applicant: QUALCOMM Incorporated
Inventor: Byeonggeun KIM , Young Mo KANG , Sungrack YUN , Kyu Woong HWANG , Hye Jin JANG
IPC: G10L17/00
Abstract: A device to process an audio signal representing input sound includes a user voice verifier configured to generate a first indication based on whether the audio signal represents a user's voice. The device includes a speaking target detector configured to generate a second indication based on whether the audio signal represents at least one of a command or a question. The device includes an activation signal unit configured to selectively generate an activation signal based on the first indication and the second indication. The device also includes an automatic speech recognition engine configured to be activated, responsive to the activation signal, to process the audio signal.
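
The gating described in this abstract can be sketched as follows. This is a minimal illustration, not the claimed implementation: the names `Indications`, `activation_signal`, and `maybe_run_asr` are hypothetical, and combining the two indications with a logical AND is an assumption (the abstract only says the activation signal is generated "based on" both indications).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Indications:
    is_user_voice: bool            # first indication (user voice verifier)
    is_command_or_question: bool   # second indication (speaking target detector)


def activation_signal(ind: Indications) -> bool:
    """Assumed policy: activate only when both indications are true."""
    return ind.is_user_voice and ind.is_command_or_question


def maybe_run_asr(audio: str, ind: Indications,
                  asr: Callable[[str], str]) -> Optional[str]:
    """Run the ASR engine only when the activation signal is raised."""
    if activation_signal(ind):
        return asr(audio)
    return None  # ASR engine stays inactive
```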
-
Publication No.: US20240185088A1
Publication date: 2024-06-06
Application No.: US18323197
Application date: 2023-05-24
Applicant: QUALCOMM Incorporated
Inventor: Byeonggeun KIM , Juntae LEE , Seunghan YANG , Simyung CHANG
IPC: G06N3/0985 , G06N3/045 , G06N3/048
CPC classification number: G06N3/0985 , G06N3/045 , G06N3/048
Abstract: Certain aspects of the present disclosure provide techniques and apparatus for scalable weight reparameterization for efficient transfer learning. One example method generally includes training a first neural network to perform a task based on weights defined for a machine learning (ML) model trained to perform a different task and learned reparameterizing weights for each of a plurality of layers in the ML model; training a second neural network to generate a plurality of gating parameters based on a cost factor and the trained first neural network, each respective gating parameter of the plurality of gating parameters corresponding to weights in a respective layer of the plurality of layers; and updating the ML model based on the weights defined for the ML model, each gating parameter for each layer of the plurality of layers, and the learned reparameterizing weights for each layer of the plurality of layers.
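
The final update step in this abstract can be illustrated with a small sketch. The additive form `W + g * R` (base weights plus gated reparameterizing weights) is an assumption made for illustration; the abstract does not state how the three quantities are combined. The function names `merge_layer` and `merge_model` are hypothetical, and the training of the two networks is not shown.

```python
def merge_layer(base, reparam, gate):
    """Combine one layer's base weights with its gated reparameterizing weights.

    Assumed form: W' = W + gate * R, applied element-wise to a 2-D weight matrix.
    """
    return [[w + gate * r for w, r in zip(base_row, reparam_row)]
            for base_row, reparam_row in zip(base, reparam)]


def merge_model(base_layers, reparam_layers, gates):
    """Apply merge_layer to every layer, using one gating parameter per layer."""
    if not (len(base_layers) == len(reparam_layers) == len(gates)):
        raise ValueError("expected one reparameterization and one gate per layer")
    return [merge_layer(b, r, g)
            for b, r, g in zip(base_layers, reparam_layers, gates)]
```

Under this assumed form, a gate of 0 leaves a layer's pre-trained weights unchanged, which is one way a cost factor could trade adaptation against the number of modified layers.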
-
Publication No.: US20240119360A1
Publication date: 2024-04-11
Application No.: US18338174
Application date: 2023-06-20
Applicant: QUALCOMM Incorporated
Inventor: Hyesu LIM , Byeonggeun KIM , Sungha CHOI
IPC: G06N20/00
CPC classification number: G06N20/00
Abstract: Certain aspects of the present disclosure provide techniques and apparatuses for adapting a machine learning model for inferencing against a target data set in a shifted domain from a source data set used to train the machine learning model. An example method generally includes identifying one or more domain-sensitive layers in a machine learning model based on differences between outputs generated by one or more layers in the machine learning model for inputs in a source domain and inputs in a shifted domain. Normalizing values are updated for each respective domain-sensitive layer of the one or more domain-sensitive layers based on a mixing factor, fixed normalizing values for data in the source domain, and calculated normalizing values for data in the shifted domain. The updated normalizing values are applied to each respective domain-sensitive layer of the one or more domain-sensitive layers in the machine learning model.
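
The normalizing-value update in this abstract can be sketched as below. Linear interpolation between the fixed source statistics and the statistics calculated on shifted-domain data is an assumption; the abstract only says the update is "based on" a mixing factor and the two sets of values. The names `mix_norm_stats` and `normalize` are hypothetical, and the step that identifies domain-sensitive layers is not shown.

```python
def mix_norm_stats(source_mean, source_var, target_mean, target_var, alpha):
    """Blend per-channel normalization statistics.

    Assumed form: alpha * source + (1 - alpha) * target, so alpha = 1 keeps
    the fixed source statistics and alpha = 0 uses only shifted-domain ones.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    mean = [alpha * s + (1.0 - alpha) * t
            for s, t in zip(source_mean, target_mean)]
    var = [alpha * s + (1.0 - alpha) * t
           for s, t in zip(source_var, target_var)]
    return mean, var


def normalize(x, mean, var, eps=1e-5):
    """Normalize one feature vector with the given per-channel statistics."""
    return [(xi - m) / (v + eps) ** 0.5 for xi, m, v in zip(x, mean, var)]
```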
-
Publication No.: US20210304734A1
Publication date: 2021-09-30
Application No.: US16830029
Application date: 2020-03-25
Applicant: Qualcomm Incorporated
Inventor: Young Mo KANG , Sungrack YUN , Kyu Woong HWANG , Hye Jin JANG , Byeonggeun KIM
Abstract: In one embodiment, an electronic device includes an input device configured to provide an input stream, a first processing device, and a second processing device. The first processing device is configured to use a keyword-detection model to determine if the input stream comprises a keyword, wake up the second processing device in response to determining that a segment of the input stream comprises the keyword, and modify the keyword-detection model in response to a training input received from the second processing device. The second processing device is configured to use a first neural network to determine whether the segment of the input stream comprises the keyword and provide the training input to the first processing device in response to determining that the segment of the input stream does not comprise the keyword.
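
The two-stage flow in this abstract can be sketched roughly as follows. Everything here is illustrative: the first-stage "model" is reduced to a score threshold, "modifying the keyword-detection model" is modeled as raising that threshold, and the second-stage neural network is passed in as a plain callable. The names `FirstStageDetector`, `process_segment`, and `segment_score` are hypothetical.

```python
class FirstStageDetector:
    """Stand-in for the low-power keyword-detection model (assumed: a threshold)."""

    def __init__(self, threshold=0.5, step=0.05):
        self.threshold = threshold
        self.step = step

    def detects(self, segment_score):
        return segment_score >= self.threshold

    def apply_training_input(self):
        # Assumed adaptation: become stricter after a second-stage rejection.
        self.threshold = min(1.0, self.threshold + self.step)


def process_segment(segment_score, first_stage, second_stage_verify):
    """Return True only if both stages agree that the keyword is present."""
    if not first_stage.detects(segment_score):
        return False                      # second device is never woken up
    if second_stage_verify(segment_score):
        return True                       # keyword confirmed
    first_stage.apply_training_input()    # false alarm: send training input back
    return False
```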
-
Publication No.: US20240004889A1
Publication date: 2024-01-04
Application No.: US18153899
Application date: 2023-01-12
Applicant: QUALCOMM Incorporated
Inventor: Byeonggeun KIM , Juntae LEE , Simyung CHANG
IPC: G06F16/2458 , G06F16/28
CPC classification number: G06F16/2462 , G06F16/285
Abstract: Systems and techniques are provided for processing one or more data samples. For example, a neural network classifier can be trained to perform few-shot open-set recognition (FSOSR) based on a task-agnostic open-set prototype. A process can include determining one or more prototype representations for each class included in a plurality of support samples. A task-agnostic open-set prototype representation can be determined, in a same learned metric space as the one or more prototype representations. One or more distance metrics can be determined for each query sample of one or more query samples, based on the one or more prototype representations and the task-agnostic open-set prototype representation. Based on the one or more distance metrics, each query sample can be classified into one of the classes associated with the one or more prototype representations or an open-set class associated with the task-agnostic open-set prototype representation.
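
The classification step in this abstract can be sketched as nearest-prototype matching with one extra prototype for the open-set class. This is a simplified illustration: class prototypes are taken as the mean of the support embeddings and distances are Euclidean, both of which are assumptions, and the open-set prototype is passed in as a fixed vector rather than learned. The names `build_prototypes` and `classify` are hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def build_prototypes(support):
    """support maps each class label to a list of embedding vectors."""
    return {label: mean_vector(embs) for label, embs in support.items()}


def classify(query, prototypes, open_set_prototype, open_label="open-set"):
    """Assign the query to the nearest prototype, including the open-set one."""
    candidates = dict(prototypes)
    candidates[open_label] = open_set_prototype
    return min(candidates, key=lambda k: math.dist(query, candidates[k]))
```

For example, with support classes centered near `[0, 1]` and `[10, 11]` and an open-set prototype at `[5, 5]`, a query at `[5, 6]` falls closest to the open-set prototype and is rejected as unknown.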
-
Publication No.: US20230298592A1
Publication date: 2023-09-21
Application No.: US18153932
Application date: 2023-01-12
Applicant: QUALCOMM Incorporated
Inventor: Seunghan YANG , Byeonggeun KIM , Inseop CHUNG , Simyung CHANG
Abstract: Systems and techniques are provided for processing audio data. For example, the systems and techniques can be used for personalized keyword spotting through multi-task learning (PK-MTL). A process can include obtaining an audio sample, generating a representation of a keyword based on the audio sample, and generating a representation of a speaker based on the audio sample. The speaker can be associated with the keyword. A first similarity score can be determined based on a reference representation and one or more of the representation of the keyword and a representation of the speaker. The reference representation can be associated with one or more of the keyword and the speaker. A keyword spotting (KWS) output can be generated based on analyzing the first similarity score against at least a first threshold, wherein the KWS output accepts or rejects the audio sample as including a target keyword.
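
The accept/reject decision in this abstract can be sketched as a thresholded similarity score. Two choices below are assumptions: the keyword and speaker representations are concatenated before comparison, and similarity is measured with cosine similarity. The names `cosine_similarity` and `kws_decision` and the default threshold of 0.8 are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def kws_decision(keyword_repr, speaker_repr, reference, threshold=0.8):
    """Return (accepted, score) for one audio sample.

    Assumed: the reference representation has the same layout as the
    concatenated keyword + speaker representation.
    """
    combined = list(keyword_repr) + list(speaker_repr)
    score = cosine_similarity(combined, reference)
    return score >= threshold, score
```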
-
Publication No.: US20230298572A1
Publication date: 2023-09-21
Application No.: US18062976
Application date: 2022-12-07
Applicant: QUALCOMM Incorporated
Inventor: Byeonggeun KIM , Seunghan YANG , Inseop CHUNG , Simyung CHANG
CPC classification number: G10L15/16 , G06F18/22 , G10L2015/088
Abstract: Systems and techniques are provided for processing audio data. For example, a dummy prototypical network may be used to perform few-shot open-set keyword spotting (FSOS-KWS). A process can include determining one or more prototype representations based on a plurality of support samples associated with one or more classes. Each prototype representation may be associated with one of the class(es). A dummy prototype representation can be determined in a same learned metric space as the prototype representations. One or more distance metrics can be determined for each query sample of one or more query samples. The distance metrics may be based on the prototype representations and the dummy prototype representation. Each query sample can be classified based on the distance metrics. Each query sample may be classified into one of the class(es) associated with the prototype representations or into an open-set class associated with the dummy prototype representation.
-
Publication No.: US20220101827A1
Publication date: 2022-03-31
Application No.: US17038887
Application date: 2020-09-30
Applicant: QUALCOMM Incorporated
Inventor: Wonil CHANG , Jinseok LEE , Mingu LEE , Jinkyu LEE , Byeonggeun KIM , Dooyong SUNG , Jae-Won CHOI , Kyu Woong HWANG
Abstract: A system and method are provided for operating an always-on ASR (automatic speech recognition) system by selecting target keywords and continuously detecting the selected target keywords in voice commands in a mobile device. In the mobile device, a processor is configured to collect keyword candidates, collect usage frequency data for keywords in the keyword candidates, collect situational usage frequency data for the keywords in the keyword candidates, select target keywords from the keyword candidates based on the usage frequency data and the situational usage frequency data, and detect one or more of the target keywords in a voice command using continuous detection of the target keywords.
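
The keyword-selection step in this abstract can be sketched as a ranking over the candidates. The weighted-sum score and the `top_k` cutoff are assumptions made for illustration; the abstract only says selection is based on both kinds of frequency data. The function name `select_target_keywords` and its parameters are hypothetical.

```python
def select_target_keywords(candidates, usage_freq, situational_freq,
                           situation, top_k=3, weight=0.5):
    """Rank keyword candidates and keep the top_k as target keywords.

    usage_freq:        keyword -> overall usage count
    situational_freq:  situation -> {keyword -> usage count in that situation}
    weight:            assumed share given to the situational count
    """
    in_situation = situational_freq.get(situation, {})

    def score(keyword):
        return ((1.0 - weight) * usage_freq.get(keyword, 0)
                + weight * in_situation.get(keyword, 0))

    return sorted(candidates, key=score, reverse=True)[:top_k]
```

With overall counts `{"call": 10, "play": 8, "navigate": 2}` and driving-specific counts `{"navigate": 20, "call": 4}`, the top two keywords while driving are `"navigate"` and `"call"`, even though `"navigate"` is rare overall.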
-
Publication No.: US20240185078A1
Publication date: 2024-06-06
Application No.: US18456112
Application date: 2023-08-25
Applicant: QUALCOMM Incorporated
Inventor: Simyung CHANG , Byeonggeun KIM , Seunghan YANG , Kyuhong SHIM
Abstract: A processor-implemented method includes generating, for each input of a group of inputs, a clean sample and an augmented sample. The method also includes associating, for each input of the group of inputs, the clean sample with the augmented sample to form a positive pair. The method further includes associating, for each input of the group of inputs, the clean sample with another clean sample associated with another input of the group of inputs to form a negative pair. The method still further includes learning one or more representations of the group of inputs based on the positive pair and the negative pair of each input of the group of inputs.
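
The pair construction in this abstract can be sketched as below. The augmentation (small uniform noise) is a placeholder assumption, and the sketch pairs each clean sample with every other clean sample, whereas the abstract only requires a pairing with "another clean sample". The names `augment` and `build_pairs` are hypothetical, and the representation-learning step that consumes the pairs is not shown.

```python
import random


def augment(sample, rng, noise=0.1):
    """Placeholder augmentation: add small uniform noise to each value."""
    return [v + rng.uniform(-noise, noise) for v in sample]


def build_pairs(inputs, seed=0):
    """Return (positive_pairs, negative_pairs) for a group of inputs.

    Positive pair: (clean sample, its own augmented sample).
    Negative pair: (clean sample, clean sample of a different input).
    """
    rng = random.Random(seed)
    clean = [list(x) for x in inputs]
    augmented = [augment(x, rng) for x in clean]
    positives = list(zip(clean, augmented))
    negatives = [(clean[i], clean[j])
                 for i in range(len(clean))
                 for j in range(len(clean))
                 if i != j]
    return positives, negatives
```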
-
Publication No.: US20230281885A1
Publication date: 2023-09-07
Application No.: US17685278
Application date: 2022-03-02
Applicant: QUALCOMM Incorporated
Inventor: Hyunsin PARK , Juntae LEE , Simyung CHANG , Byeonggeun KIM , Jaewon CHOI , Kyu Woong HWANG
CPC classification number: G06T11/00 , G06F3/013 , G06V40/174 , G06V40/18
Abstract: Imaging systems and techniques are described. An imaging system receives image data representing at least a portion (e.g., a face) of a first user as captured by a first image sensor. The imaging system identifies that a gaze of the first user as represented in the image data is directed toward a displayed representation of at least a portion (e.g., a face) of a second user. The imaging system identifies an arrangement of representations of users for output. The imaging system generates modified image data based on the gaze and the arrangement at least in part by modifying the image data to modify at least the portion of the first user in the image data to be visually directed toward a direction corresponding to the second user based on the gaze and the arrangement. The imaging system outputs the modified image data arranged according to the arrangement.