System and method for extracting text captions from video and generating video summaries
    1.
    Granted patent
    System and method for extracting text captions from video and generating video summaries (status: in force)

    Publication No.: US07339992B2

    Publication date: 2008-03-04

    Application No.: US10494739

    Filing date: 2002-12-06

    Abstract: Caption boxes which are embedded in video content can be located and the text within the caption boxes decoded. Real-time processing is enhanced by locating caption box regions in the compressed video domain (210) and performing pixel-based processing operations within the region of the video frame in which a caption box is located. The caption boxes are further refined by identifying word regions (240) within the caption boxes and then applying character and word recognition processing (250) to the identified word regions. Domain-based models are used to improve text recognition results. The extracted caption box text can be used to detect events of interest in the video content, and a semantic model can be applied to extract a segment of video of the event of interest.
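
The word-region step described in the abstract can be illustrated with a small sketch. The abstract does not say how word regions are found, so the column-projection approach below is an assumption chosen for illustration, and the names `find_word_regions` and `min_gap` are hypothetical. It takes a binarized caption box (1 = text pixel) and splits it wherever at least `min_gap` consecutive columns contain no text.

```python
from typing import List, Tuple


def find_word_regions(box: List[List[int]], min_gap: int = 2) -> List[Tuple[int, int]]:
    """Return (start_col, end_col) spans, end exclusive, of candidate word regions.

    Assumption: `box` is a binarized caption box, one list of 0/1 values per row.
    """
    if not box:
        return []
    width = len(box[0])
    # Column projection: number of text pixels in each column.
    profile = [sum(row[c] for row in box) for c in range(width)]

    regions: List[Tuple[int, int]] = []
    start = None  # first column of the region being built
    gap = 0       # length of the current run of empty columns
    for c, count in enumerate(profile):
        if count > 0:
            if start is None:
                start = c
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # Close the region just before the run of empty columns.
                regions.append((start, c - gap + 1))
                start = None
                gap = 0
    if start is not None:
        regions.append((start, width - gap))
    return regions


box = [
    [1, 1, 0, 1, 1, 0, 0, 0, 1, 1],
    [1, 0, 0, 1, 1, 0, 0, 0, 0, 1],
]
print(find_word_regions(box))
```

A single empty column (column 2 above) is treated as spacing between characters, while the three-column gap separates two word regions. Each span would then be handed to character and word recognition.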

    System and method for extracting text captions from video and generating video summaries
    2.
    Granted patent
    System and method for extracting text captions from video and generating video summaries (status: in force)

    Publication No.: US08488682B2

    Publication date: 2013-07-16

    Application No.: US11960424

    Filing date: 2007-12-19

    Abstract: Caption boxes which are embedded in video content can be located and the text within the caption boxes decoded. Real-time processing is enhanced by locating caption box regions in the compressed video domain and performing pixel-based processing operations within the region of the video frame in which a caption box is located. The caption boxes are further refined by identifying word regions within the caption boxes and then applying character and word recognition processing to the identified word regions. Domain-based models are used to improve text recognition results. The extracted caption box text can be used to detect events of interest in the video content, and a semantic model can be applied to extract a segment of video of the event of interest.

    SYSTEM AND METHOD FOR EXTRACTING TEXT CAPTIONS FROM VIDEO AND GENERATING VIDEO SUMMARIES
    3.
    Patent application
    SYSTEM AND METHOD FOR EXTRACTING TEXT CAPTIONS FROM VIDEO AND GENERATING VIDEO SUMMARIES (status: in force)

    Publication No.: US20080303942A1

    Publication date: 2008-12-11

    Application No.: US11960424

    Filing date: 2007-12-19

    IPC classification: H04N7/00 H04N7/12

    Abstract: Caption boxes which are embedded in video content can be located and the text within the caption boxes decoded. Real-time processing is enhanced by locating caption box regions in the compressed video domain and performing pixel-based processing operations within the region of the video frame in which a caption box is located. The caption boxes are further refined by identifying word regions within the caption boxes and then applying character and word recognition processing to the identified word regions. Domain-based models are used to improve text recognition results. The extracted caption box text can be used to detect events of interest in the video content, and a semantic model can be applied to extract a segment of video of the event of interest.

    Multimedia integration description scheme, method and system for MPEG-7

    Publication No.: US09239877B2

    Publication date: 2016-01-19

    Application No.: US13169330

    Filing date: 2011-06-27

    IPC classification: G06F17/30

    Abstract: The invention provides a system and method for integrating multimedia descriptions in a way that allows humans, software components or devices to easily identify, represent, manage, retrieve, and categorize the multimedia content. In this manner, a user who may be interested in locating a specific piece of multimedia content from a database, Internet, or broadcast media, for example, may search for and find the multimedia content. In this regard, the invention provides a system and method that receives multimedia content and separates the multimedia content into separate components which are assigned to multimedia categories, such as image, video, audio, synthetic and text. Within each of the multimedia categories, the multimedia content is classified and descriptions of the multimedia content are generated. The descriptions are then formatted and integrated using a multimedia integration description scheme, and the multimedia integration description is generated for the multimedia content. The multimedia description is then stored in a database. As a result, a user may query a search engine, which then retrieves from the database the multimedia content whose integration description matches the query criteria specified by the user. The search engine can then provide the user a useful search result based on the multimedia integration description.

    Video concept classification using audio-visual atoms
    5.
    Granted patent
    Video concept classification using audio-visual atoms (status: in force)

    Publication No.: US08135221B2

    Publication date: 2012-03-13

    Application No.: US12574716

    Filing date: 2009-10-07

    IPC classification: G06K9/62 G06K9/54

    CPC classification: G06K9/00765 G10L25/00

    Abstract: A method for determining a classification for a video segment, comprising the steps of: breaking the video segment into a plurality of short-term video slices, each including a plurality of video frames and an audio signal; analyzing the video frames for each short-term video slice to form a plurality of region tracks; analyzing each region track to form a visual feature vector and a motion feature vector; analyzing the audio signal for each short-term video slice to determine an audio feature vector; forming a plurality of short-term audio-visual atoms for each short-term video slice by combining the visual feature vector and the motion feature vector for a particular region track with the corresponding audio feature vector; and using a classifier to determine a classification for the video segment responsive to the short-term audio-visual atoms.
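
The atom-forming step in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract says the vectors are "combined" without saying how, so plain concatenation is assumed here, and the names `RegionTrack`, `VideoSlice`, `atoms_for_slice` and `atoms_for_segment` are hypothetical. Feature extraction and the classifier itself are left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RegionTrack:
    visual: List[float]  # visual feature vector of one region track
    motion: List[float]  # motion feature vector of the same track


@dataclass
class VideoSlice:
    tracks: List[RegionTrack]  # region tracks found in this short-term slice
    audio: List[float]         # audio feature vector of this slice


def atoms_for_slice(video_slice: VideoSlice) -> List[List[float]]:
    """One atom per region track: visual + motion + the slice's audio vector.

    Assumption: "combining" is modeled as list concatenation.
    """
    return [t.visual + t.motion + video_slice.audio for t in video_slice.tracks]


def atoms_for_segment(slices: List[VideoSlice]) -> List[List[float]]:
    """Collect the atoms of every slice; a classifier would consume this set."""
    atoms: List[List[float]] = []
    for s in slices:
        atoms.extend(atoms_for_slice(s))
    return atoms


segment = [
    VideoSlice(
        tracks=[RegionTrack([0.1, 0.2], [0.0]), RegionTrack([0.7, 0.3], [0.5])],
        audio=[0.9, 0.4],
    ),
    VideoSlice(tracks=[RegionTrack([0.2, 0.2], [0.1])], audio=[0.8, 0.5]),
]
print(atoms_for_segment(segment))
```

Because every track in a slice shares that slice's audio vector, a slice with several region tracks yields several atoms that differ only in their visual and motion parts.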

    METHODS AND ARCHITECTURE FOR INDEXING AND EDITING COMPRESSED VIDEO OVER THE WORLD WIDE WEB
    6.
    Patent application
    METHODS AND ARCHITECTURE FOR INDEXING AND EDITING COMPRESSED VIDEO OVER THE WORLD WIDE WEB (status: under examination, published)

    Publication No.: US20110064136A1

    Publication date: 2011-03-17

    Application No.: US12874337

    Filing date: 2010-09-02

    IPC classification: H04N11/02

    CPC classification: G11B27/28 G11B27/034

    Abstract: A system and method are provided for editing and parsing compressed digital information. The compressed digital information may include visual information which is edited and parsed in the compressed domain. In a preferred embodiment, the present invention provides a method for detecting moving objects in a compressed digital bitstream which represents a sequence of fields or frames of video information for one or more captured scenes of video.

    Image description system and method
    7.
    Granted patent
    Image description system and method (status: in force)

    Publication No.: US07254285B1

    Publication date: 2007-08-07

    Application No.: US09831215

    Filing date: 1999-11-05

    IPC classification: G06K9/60

    CPC classification: G06K9/4685

    Abstract: Systems and methods for describing image content establish image description records which include an object set (24), an object hierarchy (26) and entity relation graphs (28). For image content, image objects can include global objects (O0 8) and local objects (O1 2 and O2 6). The image objects are further defined by a number of features of different classes (36, 38 and 40), which in turn are further defined by a number of feature descriptors. The relationships between and among the objects in the object set are defined by the object hierarchy (26) and entity relation graphs (28). The image description records provide a standard vehicle for describing the content and context of image information for subsequent access and processing by computer applications such as search engines, filters, and archive systems.
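
The record structure named in this abstract (object set, object hierarchy, entity relation graph) can be pictured with a small data-structure sketch. Everything concrete below is an assumption for illustration: the class names `ImageObject` and `ImageDescription`, the field layout, and the example feature classes and relation name are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ImageObject:
    object_id: str
    kind: str  # "global" or "local"
    # feature class -> descriptor name -> value (all names are illustrative)
    features: Dict[str, Dict[str, object]] = field(default_factory=dict)


@dataclass
class ImageDescription:
    object_set: Dict[str, ImageObject] = field(default_factory=dict)
    hierarchy: Dict[str, List[str]] = field(default_factory=dict)        # parent -> children
    relations: List[Tuple[str, str, str]] = field(default_factory=list)  # (source, relation, target)

    def add_object(self, obj: ImageObject, parent: Optional[str] = None) -> None:
        """Add an object to the set and, optionally, under a parent in the hierarchy."""
        self.object_set[obj.object_id] = obj
        self.hierarchy.setdefault(obj.object_id, [])
        if parent is not None:
            self.hierarchy.setdefault(parent, []).append(obj.object_id)

    def relate(self, source: str, relation: str, target: str) -> None:
        """Add one edge to the entity relation graph between two known objects."""
        for object_id in (source, target):
            if object_id not in self.object_set:
                raise KeyError(object_id)
        self.relations.append((source, relation, target))


record = ImageDescription()
record.add_object(ImageObject("O0", "global", {"semantic": {"text": "family photo"}}))
record.add_object(ImageObject("O1", "local", {"visual": {"color": "red"}}), parent="O0")
record.add_object(ImageObject("O2", "local"), parent="O0")
record.relate("O1", "left_of", "O2")
print(record.hierarchy["O0"], record.relations)
```

Keeping the hierarchy and the relation graph separate from the object set mirrors the abstract's split between what the objects are and how they relate to one another.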

    Method and apparatus for watermarking images
    8.
    Granted patent
    Method and apparatus for watermarking images (status: lapsed)

    Publication No.: US06879703B2

    Publication date: 2005-04-12

    Application No.: US10220776

    Filing date: 2002-01-10

    IPC classification: G06T1/00 H04N1/32 G06K9/10

    Abstract: Digital watermarks are embedded in image data (102) in order to enable authentication of the image data and/or replacement of rejected portions of the image data. Authentication codes are derived by comparing selected discrete cosine transform (DCT) (104) coefficients within DCT data (106) derived from the original, spatial-domain image data. The authentication codes thus generated are embedded in DCT coefficients (612) other than the ones which were used to derive the authentication codes. The resulting watermarked data can be sent or made available to one or more recipients who can compress or otherwise use the watermarked data. Image data derived from the watermarked data (e.g., compressed versions of the watermarked data) can be authenticated by: extracting the embedded authentication codes; comparing DCT coefficients derived from the coefficients from which the original authentication codes were generated; and determining whether the compared DCT coefficients are consistent with the extracted authentication codes.
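
The derive, embed and verify cycle in this abstract can be sketched on a single block of already-quantized integer DCT coefficients. The details are assumptions for illustration only: one bit per coefficient pair (1 if the first is at least as large as the second), bits stored in the parity of other coefficients, and the hypothetical names `derive_codes`, `embed_codes`, `extract_codes` and `is_authentic`. The DCT itself and the patent's handling of recompression are not modeled.

```python
from typing import List, Sequence, Tuple


def derive_codes(coeffs: Sequence[int], pairs: Sequence[Tuple[int, int]]) -> List[int]:
    """One authentication bit per pair, from comparing the two coefficients."""
    return [1 if coeffs[a] >= coeffs[b] else 0 for a, b in pairs]


def embed_codes(coeffs: Sequence[int], bits: Sequence[int], slots: Sequence[int]) -> List[int]:
    """Store each bit as the parity of a coefficient NOT used to derive the codes."""
    out = list(coeffs)
    for bit, slot in zip(bits, slots):
        if out[slot] % 2 != bit:
            out[slot] += 1  # smallest change that flips the parity
    return out


def extract_codes(coeffs: Sequence[int], slots: Sequence[int]) -> List[int]:
    """Read the embedded bits back from the parity of the slot coefficients."""
    return [coeffs[slot] % 2 for slot in slots]


def is_authentic(coeffs: Sequence[int], pairs: Sequence[Tuple[int, int]], slots: Sequence[int]) -> bool:
    """Check that the current coefficient relations match the embedded codes."""
    return derive_codes(coeffs, pairs) == extract_codes(coeffs, slots)


block = [52, -7, 13, 4, 9, 2, 0, 1]   # illustrative quantized coefficients
pairs = [(0, 1), (2, 3)]              # coefficients compared to derive codes
slots = [4, 5]                        # different coefficients that carry the codes

marked = embed_codes(block, derive_codes(block, pairs), slots)
print(is_authentic(marked, pairs, slots))

tampered = list(marked)
tampered[2] = 1                       # alter a coefficient used for the codes
print(is_authentic(tampered, pairs, slots))
```

Keeping `slots` disjoint from the indices in `pairs` matters: embedding must not disturb the relations the codes were derived from, which is the separation the abstract describes.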

    System and method for dynamically and interactively searching media data
    9.
    Granted patent
    System and method for dynamically and interactively searching media data (status: in force)

    Publication No.: US08364673B2

    Publication date: 2013-01-29

    Application No.: US12969101

    Filing date: 2010-12-15

    IPC classification: G06F7/00 G06F17/30

    Abstract: Systems and methods for searching a database of media content wherein the user can dynamically and interactively perform searches and navigate search results. One or more search anchors are received, and at least one of the search anchors is associated with an anchor cell on a navigation map. One or more documents assigned to at least one cell on the navigation map can be determined, and the cells are populated with search results based at least in part on the search anchors. At least one of the documents is then displayed to a user.
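
One way to picture a navigation map whose cells are filled from search anchors is the sketch below. The abstract does not specify how cells are populated, so the scheme here is an assumption: each anchor sits at a grid cell, every document has a relevance score per anchor, and a cell ranks documents by those scores weighted by inverse Manhattan distance to each anchor. The function name `populate_map`, its parameters, and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def populate_map(
    rows: int,
    cols: int,
    anchors: Dict[str, Cell],
    scores: Dict[str, Dict[str, float]],
    top_k: int = 1,
) -> Dict[Cell, List[str]]:
    """Assign the top_k documents to every cell of a rows x cols map.

    anchors: anchor name -> cell where that anchor is placed.
    scores:  document id -> {anchor name -> relevance score}.
    """
    grid: Dict[Cell, List[str]] = {}
    for r in range(rows):
        for c in range(cols):
            # Anchors closer to this cell get a larger weight (assumed scheme).
            weights = {
                name: 1.0 / (1 + abs(r - ar) + abs(c - ac))
                for name, (ar, ac) in anchors.items()
            }

            def cell_score(doc: str) -> float:
                return sum(w * scores[doc].get(name, 0.0) for name, w in weights.items())

            ranked = sorted(scores, key=cell_score, reverse=True)
            grid[(r, c)] = ranked[:top_k]
    return grid


anchors = {"beach": (0, 0), "city": (0, 2)}
scores = {
    "doc_a": {"beach": 0.9, "city": 0.1},
    "doc_b": {"beach": 0.1, "city": 0.9},
    "doc_c": {"beach": 0.6, "city": 0.6},
}
print(populate_map(1, 3, anchors, scores))
```

With this weighting, cells near an anchor show documents that match it strongly, while cells between anchors favor documents that match both, which is what makes moving across the map feel like blending queries.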

    System And Method For Annotating And Searching Media
    10.
    Patent application
    System And Method For Annotating And Searching Media (status: under examination, published)

    Publication No.: US20110314367A1

    Publication date: 2011-12-22

    Application No.: US13165553

    Filing date: 2011-06-21

    IPC classification: G06F17/20

    CPC classification: G06F16/437

    Abstract: A system and method for labeling and classifying multimedia data is provided that includes novel label propagation techniques and classification function characteristics. The system and method corrects and propagates a small number of potentially erroneous labels to a large amount of multimedia data and generates optimal ways of ranking, classification, and presentation of the data sets. The disclosed systems and methods improve upon prior systems and methods and provide an improved approach to the problems of imbalanced data sets and incorrect label data.
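
For readers unfamiliar with label propagation in general, a minimal sketch of the textbook technique follows. It is not the patent's method: the abstract does not describe its propagation rule, and the label-correction step it mentions is not modeled. The sketch assumes a symmetric similarity matrix, binary seed labels of +1 and -1, and the common update F = alpha * S * F + (1 - alpha) * Y with a row-normalized S. The names `propagate_labels`, `alpha` and `iterations` are illustrative.

```python
from typing import Dict, List


def propagate_labels(
    weights: List[List[float]],
    seeds: Dict[int, int],
    alpha: float = 0.8,
    iterations: int = 50,
) -> List[float]:
    """Spread a few +1/-1 seed labels over a similarity graph.

    weights: similarity matrix between items (weights[i][j] >= 0).
    seeds:   item index -> +1 or -1 for the few labeled items.
    Returns one score per item; its sign is the predicted label.
    """
    n = len(weights)
    # Row-normalize so each item averages over its neighbors.
    norm: List[List[float]] = []
    for row in weights:
        total = sum(row)
        norm.append([w / total if total else 0.0 for w in row])

    y = [float(seeds.get(i, 0)) for i in range(n)]
    f = list(y)
    for _ in range(iterations):
        f = [
            alpha * sum(norm[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
            for i in range(n)
        ]
    return f


# Items 0-1 and 2-3 are strongly similar; 1-2 are only weakly similar.
weights = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
scores = propagate_labels(weights, seeds={0: +1, 3: -1})
print([round(s, 3) for s in scores])
```

The unlabeled items 1 and 2 end up with the sign of their strongly connected labeled neighbor, which is the basic behavior any propagation scheme of this family relies on.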
