-
Publication No.: US20230119791A1
Publication Date: 2023-04-20
Application No.: US17937765
Application Date: 2022-10-03
Applicant: QUALCOMM Incorporated
Inventor: Byeonggeun KIM, Seunghan YANG, Hyunsin PARK, Juntae LEE, Simyung CHANG
IPC: G10L21/034, G10L17/18, G10L25/30, G10L25/51, G10L17/04
Abstract: Techniques and apparatus for training a neural network to classify audio into one of a plurality of categories and using such a trained neural network. An example method generally includes receiving a data set including a plurality of audio samples. A relaxed feature-normalized data set is generated by normalizing each audio sample of the plurality of audio samples. A neural network is trained to classify audio into one of a plurality of categories based on the relaxed feature-normalized data set, and the trained neural network is deployed.
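The abstract does not define the relaxed normalization itself. The sketch below makes several assumptions. Each audio sample is a spectrogram stored as one list per frequency bin. "Relaxed" means blending a per-frequency instance normalization with a whole-sample normalization through a weight `lam`. The blending rule, the names `relaxed_feature_normalize` and `relaxed_normalize_dataset`, and the `eps` constant are all hypothetical and are not the patented method.

```python
import math
from typing import List

# Hypothetical layout: one list per frequency bin, one value per time frame.
Spectrogram = List[List[float]]


def relaxed_feature_normalize(x: Spectrogram, lam: float = 0.5, eps: float = 1e-5) -> Spectrogram:
    """Blend per-frequency instance normalization with whole-sample normalization.

    lam = 1.0 gives pure per-frequency normalization; lam = 0.0 gives pure
    whole-sample normalization. The blending rule is an illustrative assumption.
    """
    # Whole-sample statistics over every bin and frame.
    flat = [v for row in x for v in row]
    g_mean = sum(flat) / len(flat)
    g_std = math.sqrt(sum((v - g_mean) ** 2 for v in flat) / len(flat) + eps)

    out: Spectrogram = []
    for row in x:
        # Statistics of this frequency bin over all of its time frames.
        f_mean = sum(row) / len(row)
        f_std = math.sqrt(sum((v - f_mean) ** 2 for v in row) / len(row) + eps)
        out.append([
            lam * (v - f_mean) / f_std + (1.0 - lam) * (v - g_mean) / g_std
            for v in row
        ])
    return out


def relaxed_normalize_dataset(samples: List[Spectrogram], lam: float = 0.5) -> List[Spectrogram]:
    """Normalize every audio sample independently to build the training data set."""
    return [relaxed_feature_normalize(s, lam) for s in samples]


if __name__ == "__main__":
    sample = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
    print(relaxed_normalize_dataset([sample], lam=0.7))
```

In this sketch, a smaller `lam` preserves more of the level differences between frequency bins, which a pure per-frequency normalization would erase. The abstract does not say how the patent sets this trade-off.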
-
Publication No.: US20220405547A1
Publication Date: 2022-12-22
Application No.: US17807479
Application Date: 2022-06-17
Applicant: QUALCOMM Incorporated
Inventor: Byeonggeun KIM, Simyung Chang, Jangho Kim, Seunghan Yang, Kyu Woong Hwang
Abstract: Certain aspects of the present disclosure provide techniques for residual normalization. A first tensor comprising a frequency dimension and a temporal dimension is accessed. A second tensor is generated by applying a frequency-based instance normalization operation to the first tensor, comprising, for each respective frequency bin in the frequency dimension, computing a respective frequency-specific mean of the first tensor. A third tensor is generated by: scaling the first tensor by a scale value, and aggregating the scaled first tensor and the second tensor. The third tensor is provided as input to a layer of a neural network.
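Here is a minimal sketch of the residual normalization steps for a single-channel tensor stored as one list per frequency bin. The abstract names only the frequency-specific mean. Dividing by a per-bin standard deviation is an assumption, and so is treating "aggregating" as element-wise addition. The function names and `eps` are hypothetical.

```python
import math
from typing import List

# Layout: one list per frequency bin, one value per time frame (single channel).
Tensor2D = List[List[float]]


def freq_instance_norm(x: Tensor2D, eps: float = 1e-5) -> Tensor2D:
    """Second tensor: normalize each frequency bin with its own statistics over time."""
    out: Tensor2D = []
    for row in x:
        mean = sum(row) / len(row)  # frequency-specific mean over the temporal dimension
        # Assumption: also divide by the per-bin standard deviation.
        var = sum((v - mean) ** 2 for v in row) / len(row)
        std = math.sqrt(var + eps)
        out.append([(v - mean) / std for v in row])
    return out


def residual_norm(x: Tensor2D, scale: float, eps: float = 1e-5) -> Tensor2D:
    """Third tensor: scale * first tensor, aggregated with the second tensor."""
    second = freq_instance_norm(x, eps)
    # Assumption: aggregation is element-wise addition.
    return [
        [scale * a + b for a, b in zip(row_x, row_n)]
        for row_x, row_n in zip(x, second)
    ]
```

With `scale = 0.0` this reduces to plain frequency-wise instance normalization. A nonzero `scale` lets part of the unnormalized input reach the next layer, which is the residual path the abstract describes.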
-
Publication No.: US20220309344A1
Publication Date: 2022-09-29
Application No.: US17656621
Application Date: 2022-03-25
Applicant: QUALCOMM Incorporated
Inventor: Byeonggeun KIM, Simyung Chang, Jinkyu Lee, Dooyong Sung
Abstract: Certain aspects of the present disclosure provide techniques for efficient broadcasted residual machine learning. An input tensor comprising a frequency dimension and a temporal dimension is received, and the input tensor is processed with a first convolution operation to generate a multidimensional intermediate feature map comprising the frequency dimension and the temporal dimension. The multidimensional intermediate feature map is converted to a one-dimensional intermediate feature map in the temporal dimension using a frequency dimension reduction operation, and the one-dimensional intermediate feature map is processed using a second convolution operation to generate a temporal feature map. The temporal feature map is expanded to the frequency dimension using a broadcasting operation to generate a multidimensional output feature map, and the multidimensional output feature map is augmented with the multidimensional intermediate feature map via a first residual connection.
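The data flow in this abstract can be traced with a small single-channel sketch. The sketch makes several assumptions that the abstract leaves open. The first convolution runs along the frequency axis of each time frame. The frequency dimension reduction is an average over frequency bins. Both convolutions use zero "same" padding with odd-length kernels. The function names are hypothetical, and the sketch covers only the first residual connection.

```python
from typing import List, Sequence

# Layout: one list per frequency bin, one value per time frame (single channel).
Tensor2D = List[List[float]]


def conv1d_same(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """1-D cross-correlation with zero padding; output length equals input length (odd kernels)."""
    pad = len(kernel) // 2
    n = len(signal)
    out: List[float] = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - pad
            if 0 <= idx < n:
                acc += w * signal[idx]
        out.append(acc)
    return out


def broadcasted_residual_block(x: Tensor2D, freq_kernel: Sequence[float],
                               time_kernel: Sequence[float]) -> Tensor2D:
    n_freq, n_time = len(x), len(x[0])
    # First convolution (assumed along frequency, per time frame) -> (F, T) intermediate map.
    columns = [conv1d_same([x[f][t] for f in range(n_freq)], freq_kernel) for t in range(n_time)]
    intermediate = [[columns[t][f] for t in range(n_time)] for f in range(n_freq)]
    # Frequency dimension reduction (assumed average) -> 1-D map over time.
    reduced = [sum(intermediate[f][t] for f in range(n_freq)) / n_freq for t in range(n_time)]
    # Second convolution along time -> temporal feature map.
    temporal = conv1d_same(reduced, time_kernel)
    # Broadcast the temporal map across frequency and add the intermediate map
    # (first residual connection).
    return [[temporal[t] + intermediate[f][t] for t in range(n_time)] for f in range(n_freq)]
```

The temporal convolution runs on a 1-D map of length T rather than the full F × T map. In this sketch, that is where the savings come from. Broadcasting restores the frequency dimension without any extra parameters.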
-