KEYWORDS EXTRACTION AND ENRICHMENT VIA CATEGORIZATION SYSTEMS
    1.
    Patent application
    KEYWORDS EXTRACTION AND ENRICHMENT VIA CATEGORIZATION SYSTEMS (status: in force)

    Publication No.: US20120166441A1

    Publication date: 2012-06-28

    Application No.: US12978169

    Filing date: 2010-12-23

    IPC classification: G06F17/30

    CPC classification: G06F17/3071

    Abstract: Techniques for determining a set of keywords associated with a document are provided. A document is received that may be classified into a taxonomy that includes a plurality of categories. A categorization ranking is determined for each category for the received document. A set of categories of the taxonomy having highest categorization rankings is determined for the received document. Documents representing the set of categories having highest categorization rankings are combined together into a cumulative representative text that includes a plurality of terms. A cumulative term corpus importance score is determined for each term in the cumulative representative text. The cumulative term corpus importance score for a particular term indicates an importance of the particular term in a context of the cumulative representative text. A set of terms of the cumulative representative text having highest cumulative term corpus importance scores is selected to be keywords for the received document.
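
The pipeline described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the cosine-similarity ranking, the TF-IDF-style importance score, and every name below (`extract_keywords`, `category_docs`, `top_categories`, `top_terms`) are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def extract_keywords(document, category_docs, top_categories=2, top_terms=3):
    """Return (best_categories, keywords) for `document`.

    `category_docs` maps a category name to one representative text.
    """
    doc_vec = Counter(tokenize(document))
    cat_vecs = {name: Counter(tokenize(text)) for name, text in category_docs.items()}

    # Step 1: categorization ranking per category (assumed: cosine similarity).
    ranking = sorted(cat_vecs, key=lambda n: cosine(doc_vec, cat_vecs[n]), reverse=True)
    best = ranking[:top_categories]

    # Step 2: combine the representative texts of the best categories.
    cumulative = Counter()
    for name in best:
        cumulative.update(cat_vecs[name])

    # Step 3: importance score (assumed: term frequency in the cumulative
    # text times a smoothed inverse document frequency over all categories).
    n = len(cat_vecs)

    def score(term):
        df = sum(1 for vec in cat_vecs.values() if term in vec)
        return cumulative[term] * (math.log((1 + n) / (1 + df)) + 1.0)

    # Step 4: keep the highest-scoring terms; ties break alphabetically.
    keywords = sorted(cumulative, key=lambda t: (-score(t), t))[:top_terms]
    return best, keywords
```

Because the keywords are drawn from the category texts rather than from the document itself, the result can contain terms the document never mentions, which is the "enrichment" the title refers to.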

    Keywords extraction and enrichment via categorization systems
    2.
    Granted patent
    Keywords extraction and enrichment via categorization systems (status: in force)

    Publication No.: US09342590B2

    Publication date: 2016-05-17

    Application No.: US12978169

    Filing date: 2010-12-23

    IPC classification: G06F17/30

    CPC classification: G06F17/3071

    Abstract: Techniques for determining a set of keywords associated with a document are provided. A document is received that may be classified into a taxonomy that includes a plurality of categories. A categorization ranking is determined for each category for the received document. A set of categories of the taxonomy having highest categorization rankings is determined for the received document. Documents representing the set of categories having highest categorization rankings are combined together into a cumulative representative text that includes a plurality of terms. A cumulative term corpus importance score is determined for each term in the cumulative representative text. The cumulative term corpus importance score for a particular term indicates an importance of the particular term in a context of the cumulative representative text. A set of terms of the cumulative representative text having highest cumulative term corpus importance scores is selected to be keywords for the received document.

    Hierarchical Content Classification Into Deep Taxonomies
    3.
    Patent application
    Hierarchical Content Classification Into Deep Taxonomies (status: under examination, published)

    Publication No.: US20110282858A1

    Publication date: 2011-11-17

    Application No.: US12777260

    Filing date: 2010-05-11

    IPC classification: G06F17/30

    CPC classification: G06F16/353

    Abstract: A document may be classified by traversing a hierarchical classification tree and comparing the words in the document to words in documents representing the nodes on the classification tree. The document may be classified by traversing the classification tree and generating a comparison score based on word comparisons. The score may be used to trim the classification tree or to advance to another node on the tree. The score may be based on a scarcity or importance of individual words in the document compared to the scarcity or importance of words in the category. The result may be a set of classifications with scores for those classifications.
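
The traversal described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the method claimed in the patent: the rarity weighting, the `threshold` used for trimming, and the names `Node` and `classify` are all hypothetical.

```python
import math
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    text: str  # representative document for this category
    children: list = field(default_factory=list)


def words(text):
    """Set of lowercase alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def all_nodes(node):
    """Yield the node and every descendant."""
    yield node
    for child in node.children:
        yield from all_nodes(child)


def classify(document, root, threshold=0.2):
    """Return [(category_path, score)] sorted by descending score."""
    nodes = list(all_nodes(root))
    node_words = {id(n): words(n.text) for n in nodes}

    def rarity(word):
        # Assumed scarcity measure: words found in few nodes weigh more.
        df = sum(1 for ws in node_words.values() if word in ws)
        return math.log((1 + len(nodes)) / (1 + df)) + 1.0

    doc_words = words(document)
    total = sum(rarity(w) for w in doc_words) or 1.0
    results = []

    def visit(node, path):
        for child in node.children:
            shared = doc_words & node_words[id(child)]
            score = sum(rarity(w) for w in shared) / total
            if score < threshold:
                continue  # trim this subtree
            child_path = path + [child.name]
            results.append(("/".join(child_path), score))
            visit(child, child_path)  # advance to the next level

    visit(root, [])
    return sorted(results, key=lambda r: -r[1])
```

A low-scoring node is skipped together with all of its descendants, so the comparison work is spent only on branches that still look plausible.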

    Online relevance engine
    4.
    Granted patent
    Online relevance engine (status: in force)

    Publication No.: US08135739B2

    Publication date: 2012-03-13

    Application No.: US12344812

    Filing date: 2008-12-29

    IPC classification: G06F17/30 G06F7/00

    CPC classification: G06F17/30864

    Abstract: Information is automatically located which is relevant to source content that a user is viewing on a user interface without requiring the user to perform an additional search or navigate links of the source content. The source content can be, e.g., a web page or a document from a word processing or email application. The relevant information can include images, videos, web pages, maps or other location-based information, people-based information and special services which aggregate different types of information. Related content is located by analyzing textual content, user behavior and connectivity relative to the source. The related content is scored for similarity to the source. Content which is sufficiently similar but not too similar is selected. Similar related content is grouped to select representative results. The selected content is filtered in multiple stages based on attribute priorities to avoid unnecessary processing of content which is filtered out at an early stage.
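
The selection rule "sufficiently similar but not too similar" amounts to keeping candidates whose similarity falls inside a band. The sketch below shows only that step. Jaccard word overlap, the `low` and `high` bounds, and the function names are assumptions for illustration; the abstract does not say which similarity measure or thresholds are used.

```python
def jaccard(a, b):
    """Word-set overlap between two strings, in the range 0.0 to 1.0."""
    a_words, b_words = set(a.lower().split()), set(b.lower().split())
    union = a_words | b_words
    return len(a_words & b_words) / len(union) if union else 0.0


def select_related(source, candidates, low=0.2, high=0.8):
    """Keep candidates related to `source` without being near-duplicates."""
    scored = [(jaccard(source, c), c) for c in candidates]
    scored.sort(key=lambda pair: -pair[0])  # most similar first
    return [c for score, c in scored if low <= score <= high]
```

The upper bound drops near-copies of the page the user is already viewing, and the lower bound drops unrelated content.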

    ONLINE RELEVANCE ENGINE
    5.
    Patent application
    ONLINE RELEVANCE ENGINE (status: in force)

    Publication No.: US20100169331A1

    Publication date: 2010-07-01

    Application No.: US12344812

    Filing date: 2008-12-29

    IPC classification: G06F7/06 G06F17/30 G06F7/00

    CPC classification: G06F17/30864

    Abstract: Information is automatically located which is relevant to source content that a user is viewing on a user interface without requiring the user to perform an additional search or navigate links of the source content. The source content can be, e.g., a web page or a document from a word processing or email application. The relevant information can include images, videos, web pages, maps or other location-based information, people-based information and special services which aggregate different types of information. Related content is located by analyzing textual content, user behavior and connectivity relative to the source. The related content is scored for similarity to the source. Content which is sufficiently similar but not too similar is selected. Similar related content is grouped to select representative results. The selected content is filtered in multiple stages based on attribute priorities to avoid unnecessary processing of content which is filtered out at an early stage.

    Flat Navigation of Information and Content Presented on User Monitor
    6.
    Patent application
    Flat Navigation of Information and Content Presented on User Monitor (status: under examination, published)

    Publication No.: US20100162174A1

    Publication date: 2010-06-24

    Application No.: US12344104

    Filing date: 2008-12-24

    IPC classification: G06F3/048

    CPC classification: G06F3/048 G06F2203/04806

    Abstract: A method of presenting information on a display monitor within a computing environment includes accessing a website containing a related collection of electronic pages, crawling the website to obtain raw image data for at least some of each of the pages, porting the raw image data into a template so that each of the crawled pages is converted into a corresponding information panel containing a mapping of the content of its respective corresponding page, and displaying each of the information panels on a respective display monitor so all of the panels are viewable to a user in a single screen shot. Related methods, apparatus, and systems are further provided.

    Semantic Image Collection Visualization
    7.
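    Patent application
    Semantic Image Collection Visualization (status: under examination, published)

    Publication No.: US20090313558A1

    Publication date: 2009-12-17

    Application No.: US12137157

    Filing date: 2008-06-11

    IPC classification: G06F7/06 G06F17/30 G06F3/048

    CPC classification: G06F16/583 G06F16/951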

    Abstract: A service provides an image collection as a visual preview of content pages having a link in or otherwise related to a current page. A first content page is provided to a user and may have one or more links to additional content pages. Each of the related content pages may have one or more images. Selected images of the one or more content pages are provided in an image collection. The images may be positioned in rows, columns, or some other manner within the collection. The image collection is prepared dynamically from related content pages when the current page is loaded and does not require any software in the current content page to be changed as the linked content pages change.

    Dynamically Providing Relevant Browser Content
    8.
    Patent application
    Dynamically Providing Relevant Browser Content (status: under examination, published)

    Publication No.: US20090313536A1

    Publication date: 2009-12-17

    Application No.: US12136889

    Filing date: 2008-06-11

    IPC classification: G06F17/00

    CPC classification: G06F16/972

    Abstract: A requested content page is provided with additional relevant content that is dynamically generated. A page originally requested by a browser application is generated and examined to determine key words, address information, and other information for which relevant content may be retrieved. The other information may not be part of the original page content, but it can be the relation between the content page and other pages. The relevant content is determined based on the results of the content page examination. After retrieving the relevant content, the retrieved content is embedded into the requested content page and provided to the requesting user. The retrieved relevant content may be provided with the requested content page in a designated portion within the requested content page, near related content in the page, and/or displayed in response to user input as a pop-up window or in a preview pane. Relevant content can be determined, retrieved and embedded in a content page by a relevant content engine implemented as a server application, client application or browser application plug-in.

    Search session with refinement
    9.
    Granted patent
    Search session with refinement (status: in force)

    Publication No.: US08332393B2

    Publication date: 2012-12-11

    Application No.: US12907193

    Filing date: 2010-10-19

    Applicant: Oded Elyada

    Inventor: Oded Elyada

    IPC classification: G06F7/00

    CPC classification: G06F17/30864 G06F17/30648

    Abstract: A search system may use a stateful session that suggests new keywords for refining a search with each iteration of a search sequence. The keywords may be derived from a set of previous search results, or may be identified from a taxonomy of terms. A user may be able to select the keywords to include or exclude from a user interface to further refine the search. In some embodiments, the user interface may also include various metadata parameters to include or exclude. The system may use one or more conventional query-based search engines and may be implemented as a client application, intermediate service, or as part of a search engine.
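
A stateful refinement session of the kind this abstract describes might look like the sketch below. It covers only keyword suggestions derived from previous results. The `SearchSession` class, its `include`/`exclude` sets, and the `search_fn` callback standing in for a conventional query-based search engine are hypothetical names, not taken from the patent.

```python
from collections import Counter


class SearchSession:
    """Keeps include/exclude keywords across iterations of a search."""

    def __init__(self, search_fn):
        self.search_fn = search_fn  # assumed: query string -> list of result strings
        self.include = set()
        self.exclude = set()

    def run(self, query, max_suggestions=5):
        """Return (filtered_results, suggested_keywords) for this iteration."""
        results = []
        for result in self.search_fn(query):
            terms = set(result.lower().split())
            if self.include <= terms and not (self.exclude & terms):
                results.append(result)

        # Suggest frequent terms from the current results that the user
        # has not already used in the query or in either keyword set.
        counts = Counter(w for r in results for w in set(r.lower().split()))
        known = set(query.lower().split()) | self.include | self.exclude
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        suggestions = [w for w, _ in ranked if w not in known]
        return results, suggestions[:max_suggestions]
```

Each call to `run` reuses the keyword sets the user has built up so far, so adding a suggested term to `include` or `exclude` narrows the next iteration without rewriting the query.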
