-
Publication No.: US08670983B2
Publication date: 2014-03-11
Application No.: US13221270
Filing date: 2011-08-30
Applicants: Jacob B. Garland, Jon A. Arrowood, Drew Lanham, Marsal Gavalda
Inventors: Jacob B. Garland, Jon A. Arrowood, Drew Lanham, Marsal Gavalda
IPC classification: G10L15/04
CPC classification: G10L25/00
Abstract: A method for determining a similarity between a first audio source and a second audio source includes: for the first audio source, determining a first frequency of occurrence for each of a plurality of phoneme sequences and determining a first weighted frequency for each of the plurality of phoneme sequences based on the first frequency of occurrence for the phoneme sequence; for the second audio source, determining a second frequency of occurrence for each of a plurality of phoneme sequences and determining a second weighted frequency for each of the plurality of phoneme sequences based on the second frequency of occurrence for the phoneme sequence; comparing the first weighted frequency for each phoneme sequence with the second weighted frequency for the corresponding phoneme sequence; and generating a similarity score representative of a similarity between the first audio source and the second audio source based on the results of the comparing.
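As an illustration only, the flow described in the abstract above (count phoneme sequences per audio source, weight the counts, compare the weighted values, produce a score) can be sketched in Python. The function names, the trigram length, the 1 + log(count) weighting, and the cosine comparison are assumptions made for this sketch; the abstract does not specify any of them.

```python
import math
from collections import Counter


def ngram_counts(phonemes, n=3):
    """Count every length-n phoneme sequence in a list of phoneme labels."""
    return Counter(tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1))


def weighted(counts):
    """Assumed weighting: dampen raw counts with 1 + log(count)."""
    return {seq: 1.0 + math.log(c) for seq, c in counts.items()}


def similarity(phonemes_a, phonemes_b, n=3):
    """Cosine similarity of the two weighted phoneme-sequence profiles."""
    wa = weighted(ngram_counts(phonemes_a, n))
    wb = weighted(ngram_counts(phonemes_b, n))
    # Compare each sequence's weight in A with the corresponding weight in B.
    dot = sum(w * wb.get(seq, 0.0) for seq, w in wa.items())
    norm_a = math.sqrt(sum(w * w for w in wa.values()))
    norm_b = math.sqrt(sum(w * w for w in wb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # at least one source is too short to yield any sequence
    return dot / (norm_a * norm_b)
```

Under these assumptions, two sources with identical phoneme streams score close to 1, and sources that share no phoneme sequence score 0.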
-
Publication No.: US20120059656A1
Publication date: 2012-03-08
Application No.: US13221270
Filing date: 2011-08-30
Applicants: Jacob B. Garland, Jon A. Arrowood, Drew Lanham, Marsal Gavalda
Inventors: Jacob B. Garland, Jon A. Arrowood, Drew Lanham, Marsal Gavalda
IPC classification: G10L15/04
CPC classification: G10L25/00
Abstract: A method for determining a similarity between a first audio source and a second audio source includes: for the first audio source, determining a first frequency of occurrence for each of a plurality of phoneme sequences and determining a first weighted frequency for each of the plurality of phoneme sequences based on the first frequency of occurrence for the phoneme sequence; for the second audio source, determining a second frequency of occurrence for each of a plurality of phoneme sequences and determining a second weighted frequency for each of the plurality of phoneme sequences based on the second frequency of occurrence for the phoneme sequence; comparing the first weighted frequency for each phoneme sequence with the second weighted frequency for the corresponding phoneme sequence; and generating a similarity score representative of a similarity between the first audio source and the second audio source based on the results of the comparing.
-
Publication No.: US20120278079A1
Publication date: 2012-11-01
Application No.: US13097830
Filing date: 2011-04-29
IPC classification: G10L15/04
CPC classification: G10L15/02, G10L19/24, G10L2015/088
Abstract: An audio processing system makes use of a number of levels of compression or data reduction, thereby providing reduced storage requirements while maintaining a high accuracy of keyword detection in the original audio input.
-
Publication No.: US09361879B2
Publication date: 2016-06-07
Application No.: US12391395
Filing date: 2009-02-24
Applicants: Robert W. Morris, Jon A. Arrowood, Mark A. Clements, Kenneth King Griggs, Peter S. Cardillo, Marsal Gavalda
Inventors: Robert W. Morris, Jon A. Arrowood, Mark A. Clements, Kenneth King Griggs, Peter S. Cardillo, Marsal Gavalda
IPC classification: G10L15/10, G10L15/187, G10L25/54, G10L15/08
CPC classification: G10L15/10, G10L15/187, G10L25/54, G10L2015/088
Abstract: In one aspect, a method for processing media includes accepting a query. One or more language patterns are identified that are similar to the query. A putative instance of the query is located in the media. The putative instance is associated with a corresponding location in the media. The media in a vicinity of the putative instance is compared to the identified language patterns and data characterizing the putative instance of the query is provided according to the comparing of the media to the language patterns, for example, as a score for the putative instance that is determined according to the comparing of the media to the language patterns.
-
Publication No.: US08719022B2
Publication date: 2014-05-06
Application No.: US13097830
Filing date: 2011-04-29
IPC classification: G10L15/00
CPC classification: G10L15/02, G10L19/24, G10L2015/088
Abstract: An audio processing system makes use of a number of levels of compression or data reduction, thereby providing reduced storage requirements while maintaining a high accuracy of keyword detection in the original audio input.
-
Publication No.: US20110044447A1
Publication date: 2011-02-24
Application No.: US12545282
Filing date: 2009-08-21
CPC classification: H04M3/51, G06T11/206, G10L2015/088, H04M2201/38, H04M2203/357
Abstract: Techniques for processing data representative of text associated with one or more content sources to generate a specification of a set of keyphrases of interest; processing a first set of audio signals collected during a first time period to generate first data characterizing putative occurrences of one or more keyphrases of the set in the first set of audio signals; evaluating the first data to generate keyphrase-specific comparison values for the first set of audio signals; deriving first trending data between the first set of audio signals and a second set of audio signals based in part on an analysis of the keyphrase-specific comparison values for the first set of audio signals relative to stored keyphrase-specific baseline values; and generating a visual representation of at least some of the first trending data and causing the visual representation of the first trending data to be presented on a display terminal.
-
Publication No.: US20100332225A1
Publication date: 2010-12-30
Application No.: US12493786
Filing date: 2009-06-29
CPC classification: G10L15/26
Abstract: Some general aspects relate to systems and methods for media processing. One aspect, for example, relates to a method for aligning multimedia recording with a transcript. A group of search terms are formed from the transcript, with each search term being associated with a location within the transcript. Putative locations of the search terms are determined in a time interval of the multimedia recording. For each search term, zero or more putative locations are determined and, for at least some of the search terms, multiple putative locations are determined in the time interval of the multimedia recording. According to a first sequencing constraint, a first representation of a group of sequences each of a subset of the putative locations of the search terms is formed. A second representation of a group of sequences each of a subset of the search terms is formed. Using the first and the second representations, the time interval of the multimedia recording is partially aligned with the transcript.
-
Publication No.: US07640161B2
Publication date: 2009-12-29
Application No.: US11748319
Filing date: 2007-05-14
Applicants: Robert W. Morris, Jon A. Arrowood, Marsal Gavalda, Peter S. Cardillo, Mark Finlay, Zahi Karam
Inventors: Robert W. Morris, Jon A. Arrowood, Marsal Gavalda, Peter S. Cardillo, Mark Finlay, Zahi Karam
IPC classification: G10L13/00
CPC classification: G10L15/26, G10L2015/025, G10L2015/0638, G10L2015/088
Abstract: An approach to improving the performance of a wordspotting system includes providing an interface for interactive improvement of a phonetic representation of a query based on an operator identifying true detections and false alarms in a data set.
-
Publication No.: US09001976B2
Publication date: 2015-04-07
Application No.: US13463104
Filing date: 2012-05-03
CPC classification: G10L15/07, G10L2015/088, H04M3/51, H04M2201/40, Y10S379/907
Abstract: A method for speaker adaptation includes receiving a plurality of media files, each associated with a call center agent of a plurality of call center agents and receiving a plurality of terms. Speech processing is performed on at least some of the media files to identify putative instances of at least some of the plurality of terms. Each putative instance is associated with a hit quality that characterizes a quality of recognition of the corresponding term. One or more call center agents for performing speaker adaptation are determined, including identifying call center agents that are associated with at least one media file that includes one or more putative instances with a hit quality below a predetermined threshold. Speaker adaptation is performed for each identified call center agent based on the media files associated with the identified call center agent and the identified instances of the plurality of terms.
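As an illustration only, the agent-selection step in the abstract above (pick the call center agents who have at least one media file containing a putative instance whose hit quality falls below a threshold) can be sketched in Python. The `Hit` record, its field names, and the 0-to-1 quality scale are hypothetical; the abstract does not define a data model or a score range.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One putative instance of a term found in an agent's media file (hypothetical record)."""
    agent_id: str
    media_file: str
    term: str
    quality: float  # assumed scale: higher means a more confident recognition


def agents_needing_adaptation(hits, threshold):
    """Return the agents with at least one hit whose quality is below the threshold."""
    return sorted({h.agent_id for h in hits if h.quality < threshold})


hits = [
    Hit("agent-1", "call-001.wav", "refund", 0.92),
    Hit("agent-2", "call-002.wav", "refund", 0.41),
    Hit("agent-2", "call-003.wav", "upgrade", 0.88),
]
selected = agents_needing_adaptation(hits, threshold=0.5)
```

In this sketch, speaker adaptation would then run only for the agents in `selected`, using their media files and the identified term instances.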
-
Publication No.: US20130294587A1
Publication date: 2013-11-07
Application No.: US13463104
Filing date: 2012-05-03
IPC classification: H04M1/64
CPC classification: G10L15/07, G10L2015/088, H04M3/51, H04M2201/40, Y10S379/907
Abstract: A method for speaker adaptation includes receiving a plurality of media files, each associated with a call center agent of a plurality of call center agents and receiving a plurality of terms. Speech processing is performed on at least some of the media files to identify putative instances of at least some of the plurality of terms. Each putative instance is associated with a hit quality that characterizes a quality of recognition of the corresponding term. One or more call center agents for performing speaker adaptation are determined, including identifying call center agents that are associated with at least one media file that includes one or more putative instances with a hit quality below a predetermined threshold. Speaker adaptation is performed for each identified call center agent based on the media files associated with the identified call center agent and the identified instances of the plurality of terms.