Systems and methods for the automatic extraction of audio excerpts
    1.
    Granted invention patent
    Legal status: Expired

    Publication number: US07260439B2

    Publication date: 2007-08-21

    Application number: US09985073

    Filing date: 2001-11-01

    CPC classification number: G11B27/28

    Abstract: A method of extracting audio excerpts comprises: segmenting audio data into a plurality of audio data segments; setting fitness criteria for the plurality of audio data segments; analyzing the plurality of audio data segments based on the fitness criteria; and selecting one of the plurality of audio data segments that satisfies the fitness criteria. In various exemplary embodiments, the method of extracting audio excerpts further comprises associating the selected one of the plurality of audio data segments with video data. In such embodiments, associating the selected one of the plurality of audio data segments with video data may comprise associating the selected one of the plurality of audio data segments with a keyframe.
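
    Illustrative sketch (not taken from the patent): the segment, set criteria, analyze, and select steps named in the abstract might look like the following in Python. Fixed-length segmentation, the `min_len` length constraint, and mean energy as the fitness measure are assumptions chosen for brevity; the abstract does not specify them, and all function names are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def segment(samples: Sequence[float], seg_len: int) -> List[Sequence[float]]:
    """Split samples into consecutive fixed-length segments (the last may be shorter)."""
    return [samples[i:i + seg_len] for i in range(0, len(samples), seg_len)]


def mean_energy(seg: Sequence[float]) -> float:
    """Assumed fitness measure: average squared amplitude of the segment."""
    return sum(x * x for x in seg) / len(seg) if seg else 0.0


def select_excerpt(
    samples: Sequence[float],
    seg_len: int,
    min_len: int,
    fitness: Callable[[Sequence[float]], float] = mean_energy,
) -> Optional[Tuple[int, Sequence[float]]]:
    """Return (index, segment) of the fittest segment with at least min_len samples."""
    candidates = [
        (i, s) for i, s in enumerate(segment(samples, seg_len)) if len(s) >= min_len
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda c: fitness(c[1]))


# Toy signal: a quiet segment, a loud segment, and a short trailing segment.
audio = [0.0, 0.1, 0.0, 0.1, 0.9, -0.8, 0.7, -0.9, 0.2, 0.1]
best = select_excerpt(audio, seg_len=4, min_len=4)
```

    In this toy run the short trailing segment is rejected by the length constraint, and the louder of the two remaining segments is returned together with its index.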

    Summarization of digital files
    2.
    Granted invention patent
    Legal status: In force

    Publication number: US07284004B2

    Publication date: 2007-10-16

    Application number: US10271407

    Filing date: 2002-10-15

    Abstract: Embodiments of the present invention provide a method for producing a summary of a digital file on one or more computers. The method includes segmenting the digital file into a plurality of segments, clustering said segments into a plurality of clusters, and selecting a cluster from said plurality of clusters, wherein said selected cluster includes segments representative of said digital file. Upon selection of a cluster, a segment of the cluster is provided as a summary of said digital file.
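
    Illustrative sketch (not taken from the patent): one possible reading of the segment, cluster, select, and summarize steps, in Python. Each segment is reduced to a single hypothetical feature value; the greedy threshold clustering, the "largest cluster is most representative" rule, and the "segment closest to the cluster mean" rule are all assumptions made for the example.

```python
from typing import List


def cluster_segments(features: List[float], threshold: float) -> List[List[int]]:
    """Greedy clustering: a segment joins the first cluster whose mean feature
    lies within `threshold`; otherwise it starts a new cluster."""
    clusters: List[List[int]] = []
    for idx, value in enumerate(features):
        for members in clusters:
            mean = sum(features[m] for m in members) / len(members)
            if abs(value - mean) <= threshold:
                members.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


def summarize(features: List[float], threshold: float) -> int:
    """Return the index of the segment offered as the summary."""
    clusters = cluster_segments(features, threshold)
    largest = max(clusters, key=len)  # assumed to be the representative cluster
    mean = sum(features[m] for m in largest) / len(largest)
    # Within that cluster, pick the segment closest to the cluster mean.
    return min(largest, key=lambda m: abs(features[m] - mean))


# One feature value per segment of a hypothetical digital file.
features = [0.10, 0.90, 0.12, 0.88, 0.11, 0.50]
clusters = cluster_segments(features, threshold=0.1)
summary_index = summarize(features, threshold=0.1)
```

    With these values the segments fall into three clusters, and the summary is drawn from the cluster that contains the most segments.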

    Systems and methods for media summarization
    3.
    Granted invention patent
    Legal status: In force

    Publication number: US07424150B2

    Publication date: 2008-09-09

    Application number: US10728777

    Filing date: 2003-12-08

    CPC classification number: G11B27/28

    Abstract: A stream of ordered information, such as, for example, audio, video and/or text data, can be windowed and parameterized. A similarity between the parameterized and windowed stream of ordered information can be determined, and a probabilistic decomposition or probabilistic matrix factorization, such as non-negative matrix factorization, can be applied to the similarity matrix. The component matrices resulting from the decomposition indicate major components or segments of the ordered information. Excerpts can then be extracted from the stream of ordered information based on the component matrices to generate a summary of the stream of ordered information.
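
    Illustrative sketch (not taken from the patent): a small pure-Python example of factorizing a similarity matrix with non-negative matrix factorization. The per-window scalar features, the similarity function, the rank of 2, and the use of Lee-Seung multiplicative updates are assumptions; the abstract names non-negative matrix factorization only as one example of a probabilistic decomposition.

```python
import random


def similarity_matrix(features):
    """Pairwise similarity in (0, 1]; identical windows score 1."""
    return [[1.0 / (1.0 + abs(a - b)) for b in features] for a in features]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(s, w, h):
    """Squared Frobenius norm of S - WH."""
    wh = matmul(w, h)
    return sum((s[i][j] - wh[i][j]) ** 2 for i in range(len(s)) for j in range(len(s[0])))


def nmf(s, rank, iterations=200, seed=0, eps=1e-9):
    """Approximate S by W H with W, H >= 0 using multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(s), len(s[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        wt = transpose(w)
        num, den = matmul(wt, s), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)] for k in range(rank)]
        ht = transpose(h)
        num, den = matmul(s, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)] for i in range(n)]
    return w, h


# Six windows: three similar windows followed by three different ones.
features = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
s = similarity_matrix(features)
w0, h0 = nmf(s, rank=2, iterations=0)  # the random starting point
w, h = nmf(s, rank=2)
# Assign each window to the component with the largest weight in its row of W.
labels = [max(range(2), key=lambda k: row[k]) for row in w]
```

    Runs of windows sharing a label can then be treated as candidate segments from which excerpts are drawn; how excerpts are actually chosen from the component matrices is not covered by this sketch.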

    Image classifying systems and methods
    4.
    Granted invention patent
    Legal status: In force

    Publication number: US07327347B2

    Publication date: 2008-02-05

    Application number: US10325913

    Filing date: 2002-12-23

    CPC classification number: G06F17/30265

    Abstract: Methods and systems for classifying images, such as photographs, allow a user to incorporate subjective judgments regarding photograph qualities when making classification decisions. A slide-show interface allows a user to classify and advance photographs with a one-key action or a single interaction event. The interface presents related information relevant to a displayed photograph that is to be classified, such as contiguous photographs, similar photographs, and other versions of the same photograph. The methods and systems provide an overview interface which allows a user to review and refine classification decisions in the context of the original sequence of photographs.

    Methods and systems for discriminative keyframe selection
    5.
    Granted invention patent
    Legal status: In force

    Publication number: US07778469B2

    Publication date: 2010-08-17

    Application number: US10678935

    Filing date: 2003-10-03

    CPC classification number: G11B27/28 G06K9/00711

    Abstract: Embodiments of the present invention provide a system and method for discriminatively selecting keyframes that are representative of segments of a source digital media and at the same time distinguishable from other keyframes representing other segments of the digital media. The method and system, in one embodiment, include pre-processing the source digital media to obtain feature vectors for frames of the media. A keyframe is then discriminatively selected as a representative for each segment of the source digital media, wherein said discriminative selection includes determining a similarity measure for each candidate keyframe, determining a dissimilarity measure for each candidate keyframe, and selecting the keyframe with the highest goodness value computed from the similarity and dissimilarity measures.
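
    Illustrative sketch (not taken from the patent): a toy version of discriminative keyframe selection in Python. Frames are reduced to one hypothetical feature value each; using absolute difference as the distance, and defining goodness as the mean distance to frames of other segments minus the mean distance to frames of the same segment, are assumptions about how the two measures could be combined.

```python
from typing import List


def select_keyframes(segments: List[List[float]]) -> List[int]:
    """For each segment, return the index (within that segment) of the frame
    with the highest goodness value."""
    keyframes = []
    for s, frames in enumerate(segments):
        others = [f for t, seg in enumerate(segments) if t != s for f in seg]

        def goodness(i: int) -> float:
            # Similarity term: a small mean distance to frames of the same segment.
            own = sum(abs(frames[i] - f) for f in frames) / len(frames)
            # Dissimilarity term: a large mean distance to frames of other segments.
            cross = sum(abs(frames[i] - f) for f in others) / len(others) if others else 0.0
            return cross - own

        keyframes.append(max(range(len(frames)), key=goodness))
    return keyframes


# Two segments of three frames each, one feature value per frame.
segments = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
picked = select_keyframes(segments)
```

    With two adjacent segments the selection moves away from the shared boundary, because frames near the boundary resemble the neighbouring segment; with a single segment the dissimilarity term vanishes and the most central frame is chosen.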

    Automatic generation of multimedia presentation
    6.
    Granted invention patent
    Legal status: In force

    Publication number: US07383509B2

    Publication date: 2008-06-03

    Application number: US10243220

    Filing date: 2002-09-13

    CPC classification number: G09B7/00 G09B5/00 G10L15/26 G10L2021/105

    Abstract: The present invention provides a system and method for automatically combining image and audio data to create a multimedia presentation. In one embodiment, audio and image data are received by the system. The audio data includes a list of events that correspond to points of interest in an audio file. The audio data may also include an audio file or audio stream. The received images are then matched to the audio file or stream using the time. In one embodiment, the events represent times within the audio file or stream at which there is a certain feature or characteristic in the audio file. The audio events list may be processed to remove, sort, predict, or otherwise generate audio events. Image processing may also occur, and may include image analysis to determine image matching to the event list, deleting images, and processing images to incorporate effects. Image effects may include cropping, panning, zooming, and other visual effects.
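
    Illustrative sketch (not taken from the patent): one way to match images to an audio event list by time, in Python. The minimum-gap filter stands in for the "remove, sort" processing of the event list, and the rule that each image change lands on the next surviving event is an assumption; file names and function names are hypothetical.

```python
from typing import List, Tuple


def filter_events(times: List[float], min_gap: float) -> List[float]:
    """Sort event times and drop any event closer than min_gap seconds
    to the previously kept event."""
    kept: List[float] = []
    for t in sorted(times):
        if not kept or t - kept[-1] >= min_gap:
            kept.append(t)
    return kept


def schedule_images(
    images: List[str],
    event_times: List[float],
    audio_duration: float,
    min_gap: float = 1.0,
) -> List[Tuple[str, float, float]]:
    """Return (image, start, end) triples so that each image change
    coincides with an audio event."""
    events = [t for t in filter_events(event_times, min_gap) if 0 < t < audio_duration]
    starts = [0.0] + events
    ends = events + [audio_duration]
    # zip stops at the shorter input: surplus images or surplus intervals are unused.
    return list(zip(images, starts, ends))


slides = schedule_images(
    ["a.jpg", "b.jpg", "c.jpg"],
    event_times=[4.0, 1.5, 1.8, 9.0],
    audio_duration=12.0,
)
```

    In this run the event at 1.8 s is dropped for being too close to the one at 1.5 s, and the three images cover the audio up to the event at 9.0 s; the remaining interval has no image because only three were supplied.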

    Capturing and producing shared multi-resolution video
    8.
    Granted invention patent
    Legal status: In force

    Publication number: US06839067B2

    Publication date: 2005-01-04

    Application number: US10205739

    Filing date: 2002-07-26

    CPC classification number: G08B13/19643 H04N7/142 H04N7/147 H04N7/15

    Abstract: A method and apparatus for providing multi-resolution video to multiple users under hybrid human and automatic control. Initial environment and close-up images are captured using a first camera and a PTZ camera. The initial images are then stored in memory. Current environment and close-up images are captured, and an estimated difference between the initial and current images and the true image is determined. The estimated differences are weighted and compared, and the stored images are updated. A close-up image is then provided to each user of the system. The close-up camera is then directed to a portion of the environment image having high distortion, and current environment and close-up images are captured again.

    Methods and apparatuses for segmenting an audio-visual recording using image similarity searching and audio speaker recognition
    9.
    Granted invention patent
    Legal status: In force

    Publication number: US06404925B1

    Publication date: 2002-06-11

    Application number: US09266561

    Filing date: 1999-03-11

    Abstract: Methods for segmenting audio-video recordings of meetings containing slide presentations by one or more speakers are described. These segments serve as indexes into the recorded meeting. If an agenda is provided for the meeting, these segments can be labeled using information from the agenda. The system automatically detects intervals of video that correspond to presentation slides. Under the assumption that only one person is speaking during an interval when slides are displayed in the video, possible speaker intervals are extracted from the audio soundtrack by finding these regions. Since the same speaker may talk across multiple slide intervals, the acoustic data from these intervals is clustered to yield an estimate of the number of distinct speakers and their order. Merged clustered audio intervals corresponding to a single speaker are then used as training data for a speaker segmentation system. Using speaker identification techniques, the full video is then segmented into individual presentations based on the extent of each presenter's speech. The speaker identification system optionally includes the construction of a hidden Markov model trained on the audio data from each slide interval. A Viterbi assignment then segments the audio according to speaker.
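
    Illustrative sketch (not taken from the patent): a minimal log-domain Viterbi decoder in Python, showing how a hidden Markov model can assign each audio frame to a speaker. The two speakers, the discrete "low"/"high" observations, and the hand-set probabilities are assumptions for the example; in the abstract the model is trained on audio data from the slide intervals.

```python
import math
from typing import Dict, List


def viterbi(
    observations: List[str],
    states: List[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> List[str]:
    """Most likely state sequence for the observations (all probabilities must be > 0)."""
    log = math.log
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    back = []
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Best predecessor state for arriving in state s at this frame.
            prev = max(states, key=lambda p: best[-1][p] + log(trans_p[p][s]))
            scores[s] = best[-1][prev] + log(trans_p[prev][s]) + log(emit_p[s][obs])
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


states = ["A", "B"]  # two hypothetical speakers
start_p = {"A": 0.5, "B": 0.5}
trans_p = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.1, "B": 0.9}}  # speakers rarely change
emit_p = {"A": {"low": 0.8, "high": 0.2}, "B": {"low": 0.2, "high": 0.8}}
frames = ["low", "low", "high", "low", "low", "high", "high", "high"]
path = viterbi(frames, states, start_p, trans_p, emit_p)
```

    Because speaker changes are given a low probability, the isolated "high" frame is absorbed into speaker A's turn, and the decoder switches to speaker B only for the sustained run at the end.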

    System and method for detecting and ranking images in order of usefulness based on vignette score
    10.
    Granted invention patent
    Legal status: In force

    Publication number: US07492921B2

    Publication date: 2009-02-17

    Application number: US11032576

    Filing date: 2005-01-10

    CPC classification number: G06F17/30247

    Abstract: A system and method for detecting useful images and for ranking images in order of usefulness based on a vignette score describing how closely each one resembles a “vignette,” or a central object or image surrounded by a featureless or deemphasized background. Several methods for determining an image's vignette score are disclosed as examples. Variance ratio analysis entails calculation of the ratio of variance between the edge region of the image and the entire image. Statistical model analysis entails developing a statistical classifier capable of determining a statistical model of each image class based on pre-entered training data. Spatial frequency analysis involves estimating the energy at different spatial frequencies in the central and edge regions and in the image as a whole. A vignette score is calculated as the ratio of mid-frequency energies in the edge region to the mid-frequency energies of the entire image.
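
    Illustrative sketch (not taken from the patent): the variance ratio analysis mentioned in the abstract, written in Python for a grayscale image stored as a list of rows. The border width that defines the edge region, the handling of a completely flat image, and the reading that a lower ratio means a more vignette-like image are assumptions.

```python
from statistics import pvariance
from typing import List, Optional


def variance_ratio(image: List[List[float]], border: int) -> Optional[float]:
    """Variance of the edge region divided by the variance of the whole image.
    Returns None when the whole image has zero variance (ratio undefined)."""
    height, width = len(image), len(image[0])
    all_px = [p for row in image for p in row]
    edge_px = [
        image[r][c]
        for r in range(height)
        for c in range(width)
        if r < border or r >= height - border or c < border or c >= width - border
    ]
    total = pvariance(all_px)
    if total == 0:
        return None
    return pvariance(edge_px) / total


# A featureless border around a contrasting centre: vignette-like.
vignette = [
    [10.0, 10.0, 10.0, 10.0],
    [10.0, 0.0, 255.0, 10.0],
    [10.0, 255.0, 0.0, 10.0],
    [10.0, 10.0, 10.0, 10.0],
]
# A checkerboard: the border is as busy as the rest of the image.
checker = [[255.0 if (r + c) % 2 else 0.0 for c in range(4)] for r in range(4)]

vignette_ratio = variance_ratio(vignette, border=1)
checker_ratio = variance_ratio(checker, border=1)
```

    Ranking images by this ratio in ascending order would place the vignette-like example ahead of the checkerboard; the spatial frequency and statistical model variants described in the abstract are not covered here.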
