System and method for combining frame and segment level processing, via temporal pooling, for phonetic classification

Invention Grant

US09208778B2 System and method for combining frame and segment level processing, via temporal pooling, for phonetic classification (in force)


Patent Title: System and method for combining frame and segment level processing, via temporal pooling, for phonetic classification
Application No.: US14537400

Application Date: 2014-11-10
Publication No.: US09208778B2

Publication Date: 2015-12-08
Inventor: Sumit Chopra, Dimitrios Dimitriadis, Patrick Haffner
Applicant: AT&T Intellectual Property I, L.P.
Applicant Address: US GA Atlanta
Assignee: AT&T Intellectual Property I, L.P.
Current Assignee: AT&T Intellectual Property I, L.P.
Current Assignee Address: US GA Atlanta
Main IPC: G10L15/08
IPC: G10L15/08; G10L15/02; G10L15/16

Abstract:

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for combining frame and segment level processing, via temporal pooling, for phonetic classification. A frame processor unit receives an input and extracts the time-dependent features from the input. A plurality of pooling interface units generates a plurality of feature vectors based on pooling the time-dependent features and selecting a plurality of time-dependent features according to a plurality of selection strategies. Next, a plurality of segmental classification units generates scores for the feature vectors. Each segmental classification unit (SCU) can be dedicated to a specific pooling interface unit (PIU) to form a PIU-SCU combination. Multiple PIU-SCU combinations can be further combined to form an ensemble of combinations, and the ensemble can be diversified by varying the pooling operations used by the PIU-SCU combinations. Based on the scores, the plurality of segmental classification units selects a class label and returns a result.
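The data flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the two toy frame features, the nearest-centroid scoring, the centroid values, and the summing of scores across the ensemble are all assumptions made for illustration, and every function name here is hypothetical.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)

def frame_processor(signal, frame_len=4):
    # Toy frame processor (assumption): split the input into non-overlapping
    # frames and emit (mean amplitude, mean energy) for each frame.
    feats = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        feats.append((mean(frame), mean(x * x for x in frame)))
    return feats

def make_piu(select, op):
    # Pooling interface unit: pick frames with a selection strategy, then
    # pool each feature dimension over the chosen frames.
    def piu(features):
        chosen = select(features)
        return [op(column) for column in zip(*chosen)]
    return piu

def make_scu(centroids):
    # Segmental classification unit (assumption: nearest-centroid scoring).
    # Returns one score per class label; higher means a better match.
    def scu(vector):
        return {label: -sum((v - c) ** 2 for v, c in zip(vector, centroid))
                for label, centroid in centroids.items()}
    return scu

def classify(signal, combos):
    # Run every PIU-SCU combination on the same frame-level features and
    # sum the scores per label (assumed combination rule), then pick the best.
    features = frame_processor(signal)
    totals = {}
    for piu, scu in combos:
        for label, score in scu(piu(features)).items():
            totals[label] = totals.get(label, 0.0) + score
    return max(totals, key=totals.get)

def whole(features):
    return features

def first_half(features):
    return features[:max(1, len(features) // 2)]

# Hypothetical class centroids in the 2-D pooled feature space.
CENTROIDS = {"low": [0.0, 0.0], "high": [1.0, 1.0]}

# The ensemble is diversified by varying the selection strategy and pooling op.
COMBOS = [
    (make_piu(whole, mean), make_scu(CENTROIDS)),
    (make_piu(whole, max), make_scu(CENTROIDS)),
    (make_piu(first_half, mean), make_scu(CENTROIDS)),
]

label = classify([1.0] * 8, COMBOS)
```

Each entry in `COMBOS` pairs one pooling interface unit with its own classification unit, mirroring the dedicated PIU-SCU combinations in the abstract. Swapping `mean` for `max`, or `whole` for `first_half`, changes only the pooling side of a pair.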

Published/granted literature

US20150058012A1 System and Method for Combining Frame and Segment Level Processing, Via Temporal Pooling, for Phonetic Classification. Publication/grant date: 2015-02-26

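IPC classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/08	. Speech classification or search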