-
Publication No.: US11003959B1
Publication Date: 2021-05-11
Application No.: US16440384
Application Date: 2019-06-13
Applicant: AMAZON TECHNOLOGIES, INC.
Inventor: Ilya Levner, Konstantinos Boulis, Gurbinder Gill, Canku Calargun, Prajwal Yadapadithaya, Venkata Krishnan Ramamoorthy, Zhaoqing Ma
Abstract: Categorizing images may include training a first neural network to cluster a plurality of images to obtain a first image embedding space, wherein a vector representation is determined for each of the plurality of images based on the training, determining a vector norm value corresponding to each of the plurality of images based on the vector representation for each of the plurality of images, and identifying a first subset of the images for which a corresponding vector norm value satisfies a predetermined vector norm quality threshold. Then, a second neural network may be trained using the first subset of images to obtain a second image embedding space, and the second image embedding space may be used to categorize additional images.
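The norm-based selection step in this abstract can be illustrated with a minimal sketch. It is not the patented method: the abstract does not say which norm is used or what "satisfies" means, so the L2 norm, the "greater than or equal to" comparison, the threshold value, and the names `l2_norm`, `select_by_norm`, and the sample image ids are all assumptions made for illustration.

```python
import math


def l2_norm(vec):
    """Euclidean (L2) norm of a vector; the choice of L2 is an assumption."""
    return math.sqrt(sum(x * x for x in vec))


def select_by_norm(embeddings, threshold):
    """Return the ids whose embedding norm is at least `threshold`.

    `embeddings` maps a hypothetical image id to its vector representation.
    Treating "norm >= threshold" as satisfying the quality threshold is an
    assumption; the abstract does not state the direction of the comparison.
    """
    return [
        image_id
        for image_id, vec in embeddings.items()
        if l2_norm(vec) >= threshold
    ]


# Toy vectors standing in for the first network's embeddings.
embeddings = {
    "img_a": [3.0, 4.0],  # norm 5.0
    "img_b": [0.1, 0.2],  # norm about 0.22
    "img_c": [1.0, 1.0],  # norm about 1.41
}

# The selected subset would then be used to train the second network.
selected = select_by_norm(embeddings, threshold=1.0)
print(selected)
```

In this sketch, only the images in `selected` would be passed on to train the second network.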
-
Publication No.: US10276185B1
Publication Date: 2019-04-30
Application No.: US15677659
Application Date: 2017-08-15
Applicant: Amazon Technologies, Inc.
Inventor: Zhaoqing Ma, Tony Roy Hardie, Christo Frank Devaraj
Abstract: A system configured to vary a speech speed of speech represented in input audio data without changing a pitch of the speech. The system may vary the speech speed based on a number of different inputs, including non-audio data, data associated with a command, or data associated with the voice message itself. The non-audio data may correspond to information about an account, device or user, such as user preferences, calendar entries, location information, etc. The system may analyze audio data associated with the command to determine command speech speed, identity of person listening, etc. The system may analyze the input audio data to determine a message speech speed, background noise level, identity of the person speaking, etc. Using all of these inputs, the system may dynamically determine a target speech speed and may generate output audio data having the target speech speed.
-
Publication No.: US11354936B1
Publication Date: 2022-06-07
Application No.: US16929387
Application Date: 2020-07-15
Applicant: Amazon Technologies, Inc.
Inventor: Dharmil Satishbhai Chandarana, Ilya Levner, Zhaoqing Ma, Prajwal Yadapadithaya, Riley James Williams, Canku Alp Calargun, Prama Anand
Abstract: Techniques for improved image classification are provided. Face embeddings are generated for each face depicted in a collection of images, and the face embeddings are clustered based on the individual whose face is depicted. Based on these clusters, each embedding is assigned a label reflecting the cluster assignments. Some or all of the face embeddings are then used to train a classifier model to generate cluster labels for new input images. This classifier model can then be used to process new images in an efficient manner, and classify them into appropriate clusters.
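The flow described in this abstract (cluster the embeddings, use the cluster assignments as labels, train a classifier, classify new inputs) can be sketched as follows. The abstract names neither a clustering algorithm nor a classifier model, so the greedy radius-based clustering and the nearest-centroid classifier below are stand-ins chosen for brevity, and every function name, the `radius` parameter, and the toy vectors are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cluster(embeddings, radius):
    """Greedy clustering (an assumed stand-in): each vector joins the first
    cluster whose current centroid lies within `radius`, else starts a new one.
    Returns one cluster label per input vector."""
    clusters = []
    labels = []
    for vec in embeddings:
        for idx, members in enumerate(clusters):
            if distance(vec, mean(members)) <= radius:
                members.append(vec)
                labels.append(idx)
                break
        else:
            clusters.append([vec])
            labels.append(len(clusters) - 1)
    return labels


def train_classifier(embeddings, labels):
    """'Train' a nearest-centroid classifier: one centroid per cluster label."""
    grouped = {}
    for vec, label in zip(embeddings, labels):
        grouped.setdefault(label, []).append(vec)
    return {label: mean(vecs) for label, vecs in grouped.items()}


def classify(centroids, vec):
    """Assign a new embedding to the label of its nearest centroid."""
    return min(centroids, key=lambda label: distance(vec, centroids[label]))


# Toy 2-D vectors standing in for face embeddings of two individuals.
faces = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
labels = cluster(faces, radius=1.0)
model = train_classifier(faces, labels)
new_label = classify(model, [4.8, 5.2])
print(labels, new_label)
```

Once the classifier exists, a new embedding is labeled with a single pass over the centroids rather than by re-clustering the whole collection, which is one way to read the abstract's "efficient manner".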
-
Publication No.: US11232808B2
Publication Date: 2022-01-25
Application No.: US16394717
Application Date: 2019-04-25
Applicant: Amazon Technologies, Inc.
Inventor: Zhaoqing Ma, Tony Roy Hardie, Christo Frank Devaraj
Abstract: A system configured to vary a speech speed of speech represented in input audio data without changing a pitch of the speech. The system may vary the speech speed based on a number of different inputs, including non-audio data, data associated with a command, or data associated with the voice message itself. The non-audio data may correspond to information about an account, device or user, such as user preferences, calendar entries, location information, etc. The system may analyze audio data associated with the command to determine command speech speed, identity of person listening, etc. The system may analyze the input audio data to determine a message speech speed, background noise level, identity of the person speaking, etc. Using all of these inputs, the system may dynamically determine a target speech speed and may generate output audio data having the target speech speed.
-
Publication No.: US20190318758A1
Publication Date: 2019-10-17
Application No.: US16394717
Application Date: 2019-04-25
Applicant: Amazon Technologies, Inc.
Inventor: Zhaoqing Ma, Tony Roy Hardie, Christo Frank Devaraj
Abstract: A system configured to vary a speech speed of speech represented in input audio data without changing a pitch of the speech. The system may vary the speech speed based on a number of different inputs, including non-audio data, data associated with a command, or data associated with the voice message itself. The non-audio data may correspond to information about an account, device or user, such as user preferences, calendar entries, location information, etc. The system may analyze audio data associated with the command to determine command speech speed, identity of person listening, etc. The system may analyze the input audio data to determine a message speech speed, background noise level, identity of the person speaking, etc. Using all of these inputs, the system may dynamically determine a target speech speed and may generate output audio data having the target speech speed.