Augmenting a training set for document categorization
    2.
    Granted patent
    Status: in force

    Publication No.: US09058382B2

    Publication date: 2015-06-16

    Application No.: US12254798

    Filing date: 2008-10-20

    IPC classification: G06F7/00 G06F17/30

    Abstract: A method and system for augmenting a training set used to train a classifier of documents is provided. The augmentation system augments a training set with training data derived from features of documents based on a document hierarchy. The training data of the initial training set may be derived from the root documents of the hierarchies of documents. The augmentation system generates additional training data that includes an aggregate feature that represents the overall characteristics of a hierarchy of documents, rather than just the root document. After the training data is generated, the augmentation system augments the initial training set with the newly generated training data.
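
    The augmentation step described in this abstract can be illustrated with a minimal sketch. It assumes term counts as the document feature and a simple sum over the hierarchy as the aggregate feature; the Document class and the function names are hypothetical and are not taken from the patent.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Document:
    """A node in a document hierarchy (hypothetical structure)."""
    terms: list[str]
    children: list["Document"] = field(default_factory=list)


def term_counts(doc: Document) -> Counter:
    """Feature of a single document: raw term counts (an assumed feature)."""
    return Counter(doc.terms)


def aggregate_counts(root: Document) -> Counter:
    """Aggregate feature: term counts summed over the whole hierarchy."""
    total = term_counts(root)
    for child in root.children:
        total += aggregate_counts(child)
    return total


def augment(hierarchies: list[tuple[Document, str]]) -> list[tuple[Counter, str]]:
    """Return the root-only examples followed by the aggregate examples."""
    initial = [(term_counts(root), label) for root, label in hierarchies]
    extra = [(aggregate_counts(root), label) for root, label in hierarchies]
    return initial + extra


# One labeled hierarchy: a root page with two child pages.
site = Document(["sports"], [Document(["football", "scores"]), Document(["sports", "news"])])
training_set = augment([(site, "sports")])
```

    Here the initial training set holds one example built from the root document alone, and the augmented set adds a second example whose counts cover all three documents.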

    Web forum crawling using skeletal links
    3.
    Granted patent
    Status: in force

    Publication No.: US08700600B2

    Publication date: 2014-04-15

    Application No.: US13351952

    Filing date: 2012-01-17

    IPC classification: G06F7/20 G06F17/30

    CPC classification: G06F17/30864

    Abstract: A method and system for identifying informative links of a web site for use in crawling the web site is provided. A forum crawler analyzes sample web pages of a web forum to identify informative links and then crawls the web forum by following links determined to be informative and not following other links. The forum crawler system determines whether links are informative based on whether they are part of the overall structure of the web site or are used to select sequential information that has been split onto multiple web pages.

    Selecting advertisements based on serving area and map area
    4.
    Granted patent
    Status: in force

    Publication No.: US08666821B2

    Publication date: 2014-03-04

    Application No.: US11467810

    Filing date: 2006-08-28

    IPC classification: G06Q30/00 G06F7/00 G06F17/30

    Abstract: Methods and systems for selecting advertisements to present to a user of a computing device are provided. An advertisement system selects advertisements to display to a user based on the serving area of candidate advertisements. The advertisement system selects those candidate advertisements whose serving area encompasses the user's current location. The advertisement system may also select candidate advertisements to present to a user based on a map area currently being displayed to the user. The advertisement system may filter the candidate advertisements based on the provider location being within the map area that is currently being displayed to the user.
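
    The two filters in this abstract can be sketched as follows. The sketch assumes that a serving area is a circle around the provider location and that the displayed map area is a latitude/longitude bounding box; the Ad class, the function names, and the sample coordinates are illustrative only.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class Ad:
    name: str
    lat: float        # provider latitude in degrees
    lon: float        # provider longitude in degrees
    radius_km: float  # serving area modeled as a circle (assumption)


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance computed with the haversine formula."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def serves_user(ad, user_lat, user_lon):
    """True if the ad's serving area encompasses the user's location."""
    return distance_km(ad.lat, ad.lon, user_lat, user_lon) <= ad.radius_km


def in_map_area(ad, south, west, north, east):
    """True if the provider lies in the box; ignores boxes crossing the antimeridian."""
    return south <= ad.lat <= north and west <= ad.lon <= east


def select_ads(ads, user, map_area):
    """Keep ads that serve the user and whose provider is inside the map area."""
    return [ad for ad in ads if serves_user(ad, *user) and in_map_area(ad, *map_area)]


ads = [
    Ad("cafe", 47.61, -122.33, 10),    # close to the user and inside the map
    Ad("shop", 47.25, -122.44, 60),    # serves the user but lies outside the map
    Ad("garage", 45.52, -122.68, 20),  # too far away to serve the user
]
user = (47.62, -122.35)
map_area = (47.5, -122.5, 47.8, -122.2)  # south, west, north, east
selected = [ad.name for ad in select_ads(ads, user, map_area)]
```

    Only the first ad passes both checks in this example.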

    Music recommendation using emotional allocation modeling
    5.
    Granted patent
    Status: in force

    Publication No.: US08650094B2

    Publication date: 2014-02-11

    Application No.: US12116855

    Filing date: 2008-05-07

    IPC classification: G06Q30/00

    Abstract: An exemplary method includes defining a vocabulary for emotions; extracting descriptions for songs; generating distributions for the songs in an emotion space based at least in part on the vocabulary and the extracted descriptions; extracting salient words from a document; generating a distribution for the document in an emotion space based at least in part on the vocabulary and the extracted salient words; and matching the distribution for the document to one or more of the distributions for the songs. Various other exemplary methods, devices, systems, etc., are also disclosed.
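
    The matching step listed in this abstract can be sketched under simple assumptions: a small hand-written emotion vocabulary, distributions built from normalized word counts, and histogram intersection as the matching measure. None of these choices come from the patent, the sketch assumes the salient words and song descriptions have already been extracted, and all names are hypothetical.

```python
from collections import Counter

# Hypothetical emotion vocabulary: emotion -> words that signal it.
VOCAB = {
    "happy": {"joy", "sunny", "upbeat", "party"},
    "sad": {"tears", "lonely", "rain", "loss"},
    "calm": {"quiet", "gentle", "soft", "peace"},
}


def emotion_distribution(words):
    """Normalized count of cue words per emotion; uniform if nothing matches."""
    counts = Counter()
    for word in words:
        for emotion, cues in VOCAB.items():
            if word in cues:
                counts[emotion] += 1
    total = sum(counts.values())
    if total == 0:
        return {emotion: 1 / len(VOCAB) for emotion in VOCAB}
    return {emotion: counts[emotion] / total for emotion in VOCAB}


def overlap(p, q):
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return sum(min(p[emotion], q[emotion]) for emotion in VOCAB)


def recommend(document_words, song_descriptions):
    """Return the song whose emotion distribution best matches the document's."""
    doc = emotion_distribution(document_words)
    songs = {title: emotion_distribution(desc) for title, desc in song_descriptions.items()}
    return max(songs, key=lambda title: overlap(doc, songs[title]))


songs = {
    "Rainy Night": ["tears", "rain", "soft"],
    "Beach Day": ["sunny", "party", "upbeat"],
}
choice = recommend(["lonely", "rain", "quiet", "walk"], songs)
```

    The document words map to a mostly sad, partly calm distribution, so the first song is the closer match.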

    Scoring relevance of a document based on image text
    6.
    Granted patent
    Status: in force

    Publication No.: US08645370B2

    Publication date: 2014-02-04

    Application No.: US12972259

    Filing date: 2010-12-17

    IPC classification: G06F17/30

    CPC classification: G06F17/30864 G06F17/30265

    Abstract: A method and system for determining relevance of a document having text and images to a text string is provided. A scoring system identifies image text associated with an image of the document. The scoring system calculates an image score indicating relevance of the image text to the text string. The image score may be used in many applications, such as searching, summary generation, document classification, image search, and image classification.
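
    A minimal sketch of an image score, assuming that image text means a caption or alt text and that relevance is the share of query terms found in that text. The weighted combination with a body-text score is an added assumption to show one possible use in search; it is not described in the abstract.

```python
import re


def tokens(text):
    """Lowercased alphanumeric terms of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def image_score(image_text, query):
    """Share of query terms that occur in the image text (assumed measure)."""
    query_terms = tokens(query)
    if not query_terms:
        return 0.0
    return len(query_terms & tokens(image_text)) / len(query_terms)


def document_score(body_text, image_texts, query, image_weight=0.3):
    """Blend body relevance with the best image score; the weight is arbitrary."""
    body = image_score(body_text, query)  # same term-overlap measure applied to the body
    best_image = max((image_score(text, query) for text in image_texts), default=0.0)
    return (1 - image_weight) * body + image_weight * best_image


caption_score = image_score("Golden Gate Bridge at sunset", "golden gate bridge")
page_score = document_score(
    "Travel notes from San Francisco",
    ["Golden Gate Bridge at sunset"],
    "golden gate bridge",
)
```

    In this example the body text shares no terms with the query, so the page score comes entirely from the image caption.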

    Ranking advertisement(s) based upon advertisement feature(s)
    7.
    Granted patent
    Status: in force

    Publication No.: US08620912B2

    Publication date: 2013-12-31

    Application No.: US12816533

    Filing date: 2010-06-16

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06Q30/0241 G06Q30/0251

    Abstract: While browsing, a user may interact with a wide variety of images. The user may upload and share images taken with a digital camera and/or search for images using a search engine. Because images are rich in contextual information, it may be advantageous to provide additional information, such as adjacent market advertising based upon matching advertisements with contextual information of the images. Accordingly, a query image may be used to retrieve a video frame set. The video frame set may be expanded with related video frames corresponding to adjacent markets. The expanded video frame set may be grouped into clusters of similar frames. The clusters may be used to rank advertisements based upon how similar the advertisements are to the clusters and/or video frames within the clusters. In this way, one or more ranked advertisements may be presented with the query image.

    IDENTIFICATION OF DUPLICATES WITHIN AN IMAGE SPACE
    8.
    Patent application
    Status: in force

    Publication No.: US20130287302A1

    Publication date: 2013-10-31

    Application No.: US13459777

    Filing date: 2012-04-30

    IPC classification: G06K9/62 G06K9/46

    Abstract: Implementations for identifying duplicate images in an image space are described. An image space is partitioned into a plurality of coarse clusters based on signatures of the images within the image space. The signatures are determined from compact descriptors of the images. Refined clusters that include one or more images of an individual coarse cluster are created based on pair-wise comparisons of the compact descriptors of images in the coarse cluster, and the refined clusters are identified as sets of duplicate images. The refined clusters are grown by searching in similar coarse clusters for images to add to the refined clusters.
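
    The coarse-then-refined clustering in this abstract can be sketched as below. The sketch assumes 16-bit integer descriptors, a signature made of the top four bits, and Hamming distance for the pair-wise comparison; these are illustrative choices, not the patent's. The final step of growing refined clusters from similar coarse clusters is left out.

```python
from collections import defaultdict
from itertools import combinations


def signature(descriptor: int, bits: int = 16, prefix: int = 4) -> int:
    """Signature derived from the descriptor: its top `prefix` bits (assumption)."""
    return descriptor >> (bits - prefix)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two descriptors."""
    return bin(a ^ b).count("1")


def find_duplicates(descriptors: dict[str, int], max_distance: int = 2) -> list[set[str]]:
    # Step 1: partition the images into coarse clusters keyed by signature.
    coarse = defaultdict(list)
    for name, desc in descriptors.items():
        coarse[signature(desc)].append(name)

    # Step 2: pair-wise comparison inside each coarse cluster (union-find merge).
    refined = []
    for names in coarse.values():
        parent = {n: n for n in names}

        def root(n):
            while parent[n] != n:
                n = parent[n]
            return n

        for a, b in combinations(names, 2):
            if hamming(descriptors[a], descriptors[b]) <= max_distance:
                parent[root(a)] = root(b)

        groups = defaultdict(set)
        for n in names:
            groups[root(n)].add(n)
        # Only groups with more than one image count as duplicate sets.
        refined.extend(g for g in groups.values() if len(g) > 1)
    return refined


images = {
    "a.jpg": 0b1010_0000_1111_0000,
    "a_copy.jpg": 0b1010_0000_1111_0001,  # one bit away from a.jpg
    "b.jpg": 0b1010_1111_0000_1111,       # same coarse cluster, different image
    "c.jpg": 0b0101_0000_1111_0000,       # different coarse cluster
}
duplicates = find_duplicates(images)
```

    The coarse step keeps c.jpg from ever being compared with the other three images, which is the point of partitioning before the pair-wise pass.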

    Interactive framework for name disambiguation
    9.
    Granted patent
    Status: in force

    Publication No.: US08538898B2

    Publication date: 2013-09-17

    Application No.: US13118404

    Filing date: 2011-05-28

    IPC classification: G06N5/00

    CPC classification: G06N99/005 G06F17/30616

    Abstract: A “Name Disambiguator” provides various techniques for implementing an interactive framework for resolving or disambiguating entity names (associated with objects such as publications) for entity searches where two or more same or similar names may refer to different entities. More specifically, the Name Disambiguator uses a combination of user input and automatic models to address the disambiguation problem. In various embodiments, the Name Disambiguator uses a two-part process, including: 1) a global SVM trained from large sets of documents or objects in a simulated interactive mode, and 2) further personalization of local SVM models (associated with individual names or groups of names such as, for example, a group of coauthors) derived from the global SVM model. The result of this process is that large sets of documents or objects are rapidly and accurately condensed or clustered into ordered sets that are organized by entity names.

    Annotation by search
    10.
    Granted patent
    Status: in force

    Publication No.: US08341112B2

    Publication date: 2012-12-25

    Application No.: US11419368

    Filing date: 2006-05-19

    IPC classification: G06F7/00

    CPC classification: G06F17/30265

    Abstract: Annotation by search is described. In one aspect, a data store is searched for images that are semantically related to a baseline annotation of a given image and visually similar to the given image. The given image is then annotated with common concepts of annotations associated with at least a subset of the semantically and visually related images.
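
    The search-then-annotate flow in this abstract can be sketched with a toy in-memory data store. Semantic relatedness is approximated by shared keywords, visual similarity by Euclidean distance between small feature vectors, and "common concepts" by words found in more than half of the retrieved annotations. All of these are assumptions made for illustration, as are the store contents and names.

```python
from collections import Counter
from math import dist

# Hypothetical data store: annotation text plus a tiny visual feature vector.
STORE = [
    {"text": "sunset over the beach with palm trees", "features": (0.9, 0.4, 0.1)},
    {"text": "beach sunset and orange sky", "features": (0.8, 0.5, 0.1)},
    {"text": "beach volleyball game", "features": (0.2, 0.3, 0.9)},
    {"text": "city skyline at sunset", "features": (0.85, 0.45, 0.15)},
]
STOPWORDS = {"the", "and", "with", "over", "at", "a"}


def words(text):
    """Lowercased words of an annotation, minus stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def annotate(baseline, features, max_distance=0.2):
    """Annotate an image with words common to related, similar-looking images."""
    base = words(baseline)
    related = [
        entry for entry in STORE
        if base & words(entry["text"])                         # semantically related
        and dist(features, entry["features"]) <= max_distance  # visually similar
    ]
    counts = Counter(w for entry in related for w in words(entry["text"]))
    # Keep words that occur in more than half of the related annotations.
    return sorted(w for w, n in counts.items() if n * 2 > len(related))


labels = annotate("beach", (0.88, 0.42, 0.1))
```

    The volleyball image shares the baseline keyword but looks different, and the skyline image looks similar but lacks the keyword, so neither contributes to the result.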
