-
Publication No.: US20130060572A1
Publication date: 2013-03-07
Application No.: US13602991
Filing date: 2012-09-04
Applicant: Jacob B. Garland, Drew Lanham, Daryl Kip Watters, Marsal Gavalda, Mark Finlay, Kenneth K. Griggs
Inventor: Jacob B. Garland, Drew Lanham, Daryl Kip Watters, Marsal Gavalda, Mark Finlay, Kenneth K. Griggs
IPC classification: G10L15/04
Abstract: In an aspect, in general, a method for aligning an audio recording and a transcript includes receiving a transcript including a plurality of terms, each term of the plurality of terms associated with a time location within a different version of the audio recording, forming a plurality of search terms from the terms of the transcript, determining possible time locations of the search terms in the audio recording, determining a correspondence between time locations within the different version of the audio recording associated with the search terms and the possible time locations of the search terms in the audio recording, and aligning the audio recording and the transcript, including updating the time location associated with terms of the transcript based on the determined correspondence.
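The correspondence step in the abstract above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the two versions of the recording differ by a single constant time shift and that each transcript term is unique; the abstract does not state either restriction. The names `estimate_offset`, `realign`, `old_times`, `candidate_times`, and `resolution` are hypothetical and do not come from the patent.

```python
from collections import Counter


def estimate_offset(old_times, candidate_times, resolution=0.5):
    """Vote for the time shift (in seconds) that explains the most search hits.

    old_times: term -> time location in the *different* version of the audio.
    candidate_times: term -> possible time locations found in the current audio.
    resolution: bin width in seconds used to group near-equal shifts (assumed).
    """
    votes = Counter()
    for term, old_t in old_times.items():
        for new_t in candidate_times.get(term, []):
            # Each (old, candidate) pair implies one shift; quantize it into a bin.
            votes[round((new_t - old_t) / resolution)] += 1
    if not votes:
        raise ValueError("no search hits to align against")
    best_bin, _ = votes.most_common(1)[0]
    return best_bin * resolution


def realign(old_times, offset):
    """Update every term's time location by the estimated shift."""
    return {term: t + offset for term, t in old_times.items()}
```

False hits (for example a term that also matches elsewhere in the audio) imply scattered shifts and are outvoted by the consistent ones. A real system would likely need a piecewise or non-constant correspondence when material has been cut or inserted between versions.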
-
Publication No.: US09536567B2
Publication date: 2017-01-03
Application No.: US13602991
Filing date: 2012-09-04
Applicant: Jacob B. Garland, Drew Lanham, Daryl Kip Watters, Marsal Gavalda, Mark Finlay, Kenneth K. Griggs
Inventor: Jacob B. Garland, Drew Lanham, Daryl Kip Watters, Marsal Gavalda, Mark Finlay, Kenneth K. Griggs
Abstract: In an aspect, in general, a method for aligning an audio recording and a transcript includes receiving a transcript including a plurality of terms, each term of the plurality of terms associated with a time location within a different version of the audio recording, forming a plurality of search terms from the terms of the transcript, determining possible time locations of the search terms in the audio recording, determining a correspondence between time locations within the different version of the audio recording associated with the search terms and the possible time locations of the search terms in the audio recording, and aligning the audio recording and the transcript, including updating the time location associated with terms of the transcript based on the determined correspondence.
-
Publication No.: US20100299131A1
Publication date: 2010-11-25
Application No.: US12469916
Filing date: 2009-05-21
Applicant: Drew Lanham, Daryl Kip Watters, Marsal Gavalda
Inventor: Drew Lanham, Daryl Kip Watters, Marsal Gavalda
CPC classification: G10L15/10, G06K9/00711
Abstract: Some general aspects relate to systems, software, and methods for media processing. In one aspect, a script associated with a multimedia recording is accepted, wherein the script includes dialogue, speaker indications and video event indications. A group of search terms are formed from the dialogue, with each search term being associated with a location within the script. Zero or more putative locations of each of the search terms are identified in a time interval of the multimedia recording. For at least some of the search terms, multiple putative locations are identified in the time interval of the multimedia recording. The time interval of the multimedia recording and the script are partially aligned using the determined putative locations of the search terms and one or more of the following: a result of matching audio characteristics of the multimedia recording with the speaker indications, and a result of matching video characteristics of the multimedia recording with the video event indications. Based on a result of the partial alignment, event-localization information is generated. Further processing of the generated event-localization information is enabled.
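One way to picture the partial alignment described in the abstract above is as a choice of at most one putative location per search term such that the chosen times increase in script order. The sketch below uses a simple dynamic program that maximizes the summed hit scores. This is an illustrative stand-in, not the patented procedure: it uses only the search-term locations and ignores the speaker and video-event cues the abstract also mentions. The function name `partial_align` and the `(time, score)` hit format are assumptions.

```python
def partial_align(hits_per_term):
    """Pick a time-ordered subset of putative hits with maximal total score.

    hits_per_term[i] is a list of (time_sec, score) hits for the i-th search
    term in script order; the list may be empty (zero putative locations).
    Returns {script_index: chosen_time} for the terms that were aligned.
    """
    # Flatten to (script_index, time, score); enumerate keeps script order.
    flat = [(i, t, s) for i, hits in enumerate(hits_per_term) for t, s in hits]
    if not flat:
        return {}
    best = [s for _, _, s in flat]  # best chain score ending at each hit
    prev = [None] * len(flat)       # back-pointer to the previous hit in the chain
    for j, (ij, tj, sj) in enumerate(flat):
        for k in range(j):
            ik, tk, _ = flat[k]
            # A hit may extend a chain only if it is later in script AND in time.
            if ik < ij and tk < tj and best[k] + sj > best[j]:
                best[j] = best[k] + sj
                prev[j] = k
    # Walk back from the best-scoring chain end.
    j = max(range(len(flat)), key=best.__getitem__)
    chosen = {}
    while j is not None:
        i, t, _ = flat[j]
        chosen[i] = t
        j = prev[j]
    return dict(sorted(chosen.items()))
```

Terms with no hits, or whose hits all conflict with the best chain, stay unaligned, which is what makes the result partial. The quadratic loop is acceptable for a sketch; a long script would call for a faster formulation.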
-
Publication No.: US20120059656A1
Publication date: 2012-03-08
Application No.: US13221270
Filing date: 2011-08-30
Applicant: Jacob B. Garland, Jon A. Arrowood, Drew Lanham, Marsal Gavalda
Inventor: Jacob B. Garland, Jon A. Arrowood, Drew Lanham, Marsal Gavalda
IPC classification: G10L15/04
CPC classification: G10L25/00
Abstract: A method for determining a similarity between a first audio source and a second audio source includes: for the first audio source, determining a first frequency of occurrence for each of a plurality of phoneme sequences and determining a first weighted frequency for each of the plurality of phoneme sequences based on the first frequency of occurrence for the phoneme sequence; for the second audio source, determining a second frequency of occurrence for each of a plurality of phoneme sequences and determining a second weighted frequency for each of the plurality of phoneme sequences based on the second frequency of occurrence for the phoneme sequence; comparing the first weighted frequency for each phoneme sequence with the second weighted frequency for the corresponding phoneme sequence; and generating a similarity score representative of a similarity between the first audio source and the second audio source based on the results of the comparing.
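The count, weight, compare, score pipeline in the abstract above can be sketched as follows. The abstract does not say how the weighted frequency is computed or how the comparison is scored, so two common choices are assumed here: a sublinear weight `1 + log(count)` and cosine similarity. Phoneme sequences are taken to be fixed-length n-grams over an already-decoded phoneme list. All function names are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(phonemes, n=3):
    """Frequency of occurrence of each phoneme n-gram in one audio source."""
    return Counter(tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1))


def weighted(counts):
    """Assumed weighting: dampen very frequent sequences with 1 + log(count)."""
    return {gram: 1.0 + math.log(c) for gram, c in counts.items()}


def similarity(phonemes_a, phonemes_b, n=3):
    """Cosine similarity of the weighted n-gram profiles (assumed comparison)."""
    wa = weighted(ngram_counts(phonemes_a, n))
    wb = weighted(ngram_counts(phonemes_b, n))
    dot = sum(w * wb.get(gram, 0.0) for gram, w in wa.items())
    norm_a = math.sqrt(sum(w * w for w in wa.values()))
    norm_b = math.sqrt(sum(w * w for w in wb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a source too short to contain any n-gram
    return dot / (norm_a * norm_b)
```

Because all weights are positive, the score falls between 0 (no shared phoneme sequences) and 1 (identical weighted profiles). Other weightings, such as inverse document frequency over a corpus, would fit the same structure.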
-
Publication No.: US08670983B2
Publication date: 2014-03-11
Application No.: US13221270
Filing date: 2011-08-30
Applicant: Jacob B. Garland, Jon A. Arrowood, Drew Lanham, Marsal Gavalda
Inventor: Jacob B. Garland, Jon A. Arrowood, Drew Lanham, Marsal Gavalda
IPC classification: G10L15/04
CPC classification: G10L25/00
Abstract: A method for determining a similarity between a first audio source and a second audio source includes: for the first audio source, determining a first frequency of occurrence for each of a plurality of phoneme sequences and determining a first weighted frequency for each of the plurality of phoneme sequences based on the first frequency of occurrence for the phoneme sequence; for the second audio source, determining a second frequency of occurrence for each of a plurality of phoneme sequences and determining a second weighted frequency for each of the plurality of phoneme sequences based on the second frequency of occurrence for the phoneme sequence; comparing the first weighted frequency for each phoneme sequence with the second weighted frequency for the corresponding phoneme sequence; and generating a similarity score representative of a similarity between the first audio source and the second audio source based on the results of the comparing.
-
Publication No.: US20100274667A1
Publication date: 2010-10-28
Application No.: US12429218
Filing date: 2009-04-24
Applicant: Drew Lanham, Marsal Gavalda, John Willcutts, Gordon Edwards
Inventor: Drew Lanham, Marsal Gavalda, John Willcutts, Gordon Edwards
CPC classification: G06Q30/0251, G06F16/433, G06F16/48, G06Q30/02, G10L15/26
Abstract: A computer-implemented method provides access to multimedia content, which includes units of content that include audio components. Meta data for the units of content is formed to an association of key phrases detected in the audio components and the units. In some examples, forming the meta data includes determining a candidate set of key phrases associated with the unit of multimedia and searching for the presence of the candidate key phrases in the audio components. Forming the meta data then includes forming data representing the presence of key phrases in the audio components.
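The metadata-forming step in the abstract above (take candidate key phrases for a unit, search the unit's audio for them, record which ones are present) can be sketched as a small indexing routine. The audio search itself is outside the scope of this sketch: it is represented by a caller-supplied function `detect(unit_id, phrase)` that returns hit times in seconds. That interface and the names `build_metadata`, `phrases_by_unit`, and `units_by_phrase` are hypothetical.

```python
from collections import defaultdict


def build_metadata(candidates_by_unit, detect):
    """Associate units of content with the key phrases found in their audio.

    candidates_by_unit: unit_id -> iterable of candidate key phrases.
    detect: hypothetical audio-search callable, (unit_id, phrase) -> list of
            hit times in seconds (empty if the phrase was not found).
    Returns (phrases_by_unit, units_by_phrase).
    """
    phrases_by_unit = {}
    units_by_phrase = defaultdict(list)
    for unit_id, candidates in candidates_by_unit.items():
        found = {}
        for phrase in candidates:
            times = detect(unit_id, phrase)
            if times:
                # Record presence, with hit times, only for detected phrases.
                found[phrase] = sorted(times)
                units_by_phrase[phrase].append(unit_id)
        phrases_by_unit[unit_id] = found
    return phrases_by_unit, dict(units_by_phrase)
```

The inverse mapping `units_by_phrase` is what a retrieval or content-matching layer would query to find units of content in which a given phrase is spoken.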