-
Publication number: US10672387B2
Publication date: 2020-06-02
Application number: US15839499
Filing date: 2017-12-12
Applicant: GOOGLE LLC
Inventors: Richard Lyon, Christopher Hughes, Yuxuan Wang, Ryan Rifkin, Pascal Getreuer
Abstract: The various implementations described herein include methods, devices, and systems for recognizing speech, such as user commands. In one aspect, a method includes: (1) receiving audio input data via the one or more microphones; (2) generating a plurality of energy channels for the audio input data; (3) generating a feature vector by performing a per-channel normalization to each channel of the plurality of energy channels; and (4) obtaining recognized speech from the audio input utilizing the feature vector.
-
Publication number: US20190130192A1
Publication date: 2019-05-02
Application number: US15798733
Filing date: 2017-10-31
Applicant: Google LLC
Inventors: Alejandro Kauffmann, Andrew Dahley, Phuong Le, Mark Bowers, Ignacio Garcia Dorado, Robin Debreuil, William Lindmeier, Brian Allen, Ashley Ma, Pascal Getreuer
IPC classification: G06K9/00, H04N5/14, G11B27/036, H04N5/262
CPC classification: G06K9/00751, G06K9/00765, G11B27/031, G11B27/036, G11B27/28, G11B27/34, H04N5/144, H04N5/147, H04N5/2628
Abstract: The present disclosure provides systems and methods that generate a summary storyboard from a plurality of image frames. An example computer-implemented method can include inputting a plurality of image frames into a machine-learned model and receiving, as an output of the machine-learned model, object data that describes the respective locations of a plurality of objects recognized in the plurality of image frames. The method can include generating a plurality of image crops that respectively include the plurality of objects and arranging two or more of the plurality of image crops to generate a storyboard.
-
Publication number: US20180197533A1
Publication date: 2018-07-12
Application number: US15839499
Filing date: 2017-12-12
Applicant: GOOGLE LLC
Inventors: Richard Lyon, Christopher Hughes, Yuxuan Wang, Ryan Rifkin, Pascal Getreuer
Abstract: The various implementations described herein include methods, devices, and systems for recognizing speech, such as user commands. In one aspect, a method includes: (1) receiving audio input data via the one or more microphones; (2) generating a plurality of energy channels for the audio input data; (3) generating a feature vector by performing a per-channel normalization to each channel of the plurality of energy channels; and (4) obtaining recognized speech from the audio input utilizing the feature vector.
-