-
Publication No.: US08954425B2
Publication date: 2015-02-10
Application No.: US12796345
Filing date: 2010-06-08
Applicant: Rong Xiao , Qiang Hao , Changhu Wang , Rui Cai , Lei Zhang
Inventor: Rong Xiao , Qiang Hao , Changhu Wang , Rui Cai , Lei Zhang
IPC classification: G06F17/30
CPC classification: G06F17/30867 , G06F17/30241
Abstract: Described herein is a technology that facilitates efficient automated mining of topic-related aspects of user-generated content based on automated analysis of the user-generated content. Locations are automatically learned based on dividing documents into document segments, and decomposing the segments into local topics and global topics. Techniques are described that facilitate automatically extracting snippets. These techniques include, for example, computer annotating travelogues with learned tags and images, performing topic learning to obtain an interest model, performing location matching based on the interest model, calculating geographic and semantic relevance scores, ranking snippets based on the geographic and semantic relevance scores, and searching snippets with a “location+context term” query.
-
Publication No.: US20110302162A1
Publication date: 2011-12-08
Application No.: US12796345
Filing date: 2010-06-08
Applicant: Rong Xiao , Qiang Hao , Changhu Wang , Rui Cai , Lei Zhang
Inventor: Rong Xiao , Qiang Hao , Changhu Wang , Rui Cai , Lei Zhang
IPC classification: G06F17/30
CPC classification: G06F17/30867 , G06F17/30241
Abstract: Described herein is a technology that facilitates efficient automated mining of topic-related aspects of user-generated content based on automated analysis of the user-generated content. Locations are automatically learned based on dividing documents into document segments, and decomposing the segments into local topics and global topics. Techniques are described that facilitate automatically extracting snippets. These techniques include, for example, computer annotating travelogues with learned tags and images, performing topic learning to obtain an interest model, performing location matching based on the interest model, calculating geographic and semantic relevance scores, ranking snippets based on the geographic and semantic relevance scores, and searching snippets with a “location+context term” query.
-
Publication No.: US08458115B2
Publication date: 2013-06-04
Application No.: US12796303
Filing date: 2010-06-08
Applicant: Rui Cai , Qiang Hao , Changhu Wang , Rong Xiao , Lei Zhang
Inventor: Rui Cai , Qiang Hao , Changhu Wang , Rong Xiao , Lei Zhang
CPC classification: G06F17/30707
Abstract: Described herein is a technology that facilitates efficient automated mining of topic-related aspects of user-generated content based on automated analysis of the user-generated content. Locations are automatically learned based on dividing documents into document segments, and decomposing the segments into local topics and global topics. Techniques described herein include, for example, computer annotating travelogues with learned tags, performing topic learning to obtain an interest model, and performing location matching based on the interest model.
-
Publication No.: US08856129B2
Publication date: 2014-10-07
Application No.: US13237142
Filing date: 2011-09-20
IPC classification: G06F17/30
CPC classification: G06F17/30864 , G06F17/3071
Abstract: This document describes techniques that label text nodes of a seed site for each of a plurality of verticals. Once a seed site is labeled for a given vertical, the techniques extract features from the labeled text nodes of the seed site. The techniques learn vertical knowledge for the seed site based on the human labels and the extracted features, and adapt the learned vertical knowledge to a new web site to automatically and accurately identify attributes and extract attribute values targeted within a given vertical for structured web data extraction.
-
Publication No.: US20140029856A1
Publication date: 2014-01-30
Application No.: US13561718
Filing date: 2012-07-30
IPC classification: G06K9/46
CPC classification: G06K9/46 , G06K9/4676 , G06K9/469
Abstract: The techniques discussed herein discover three-dimensional (3-D) visual phrases for an object based on a 3-D model of the object. The techniques then describe the 3-D visual phrases. Once described, the techniques use the 3-D visual phrases to detect the object in an image (e.g., object recognition).
-
Publication No.: US20130073514A1
Publication date: 2013-03-21
Application No.: US13237142
Filing date: 2011-09-20
IPC classification: G06F17/30
CPC classification: G06F17/30864 , G06F17/3071
Abstract: This document describes techniques that label text nodes of a seed site for each of a plurality of verticals. Once a seed site is labeled for a given vertical, the techniques extract features from the labeled text nodes of the seed site. The techniques learn vertical knowledge for the seed site based on the human labels and the extracted features, and adapt the learned vertical knowledge to a new web site to automatically and accurately identify attributes and extract attribute values targeted within a given vertical for structured web data extraction.
-
Publication No.: US08983201B2
Publication date: 2015-03-17
Application No.: US13561718
Filing date: 2012-07-30
CPC classification: G06K9/46 , G06K9/4676 , G06K9/469
Abstract: The techniques discussed herein discover three-dimensional (3-D) visual phrases for an object based on a 3-D model of the object. The techniques then describe the 3-D visual phrases. Once described, the techniques use the 3-D visual phrases to detect the object in an image (e.g., object recognition).
-
Publication No.: US20110302124A1
Publication date: 2011-12-08
Application No.: US12796303
Filing date: 2010-06-08
Applicant: Rui Cai , Qiang Hao , Changhu Wang , Rong Xiao , Lei Zhang
Inventor: Rui Cai , Qiang Hao , Changhu Wang , Rong Xiao , Lei Zhang
CPC classification: G06F17/30707
Abstract: Described herein is a technology that facilitates efficient automated mining of topic-related aspects of user-generated content based on automated analysis of the user-generated content. Locations are automatically learned based on dividing documents into document segments, and decomposing the segments into local topics and global topics. Techniques described herein include, for example, computer annotating travelogues with learned tags, performing topic learning to obtain an interest model, and performing location matching based on the interest model.
-
Publication No.: US09495453B2
Publication date: 2016-11-15
Application No.: US13114643
Filing date: 2011-05-24
Applicant: Rui Cai , Xiaodong Fan , Lei Zhang
Inventor: Rui Cai , Xiaodong Fan , Lei Zhang
CPC classification: G06F17/30864 , G06F17/30705 , G06F17/30861 , G06F17/3089 , G06F17/30899
Abstract: Web crawling policies are generated based on user web browsing statistics. User browsing statistics are aggregated at the granularity of resource identifier patterns (such as URL patterns) that denote groups of resources within a particular domain or website that share syntax at a certain level of granularity. The web crawl policies rank the resource identifier patterns according to their associated aggregated user browsing statistics. A crawl ordering defined by the web crawl policies is used to download and discover new resources within a domain or website.
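Illustration (not part of the patent record): the abstract above describes aggregating browsing statistics per URL pattern and ranking the patterns to order a crawl. A minimal Python sketch of that idea follows. The pattern rule (replace digit runs in the path with `*`), the `(url, visit_count)` input format, and the function names `url_pattern`, `build_crawl_policy`, and `order_frontier` are all assumptions made for this example; the patent's actual pattern generation and ranking statistics may differ.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse


def url_pattern(url: str) -> str:
    """Map a URL to a coarse pattern (assumed rule: digit runs become '*')."""
    parts = urlparse(url)
    path = re.sub(r"\d+", "*", parts.path)
    return f"{parts.netloc}{path}"


def build_crawl_policy(visits):
    """Aggregate (url, visit_count) pairs per pattern and rank the patterns.

    Returns a list of (pattern, total_visits), most visited first;
    ties are broken alphabetically so the result is deterministic.
    """
    totals = defaultdict(int)
    for url, count in visits:
        totals[url_pattern(url)] += count
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


def order_frontier(frontier, policy):
    """Order candidate URLs by the rank of their pattern in the policy.

    URLs whose pattern has no browsing statistics are crawled last.
    """
    rank = {pattern: i for i, (pattern, _) in enumerate(policy)}
    return sorted(frontier, key=lambda u: rank.get(url_pattern(u), len(rank)))


visits = [
    ("http://example.com/thread/12", 5),
    ("http://example.com/thread/99", 7),
    ("http://example.com/user/3", 2),
]
policy = build_crawl_policy(visits)
frontier = [
    "http://example.com/user/8",
    "http://example.com/about",
    "http://example.com/thread/1",
]
crawl_order = order_frontier(frontier, policy)
```

In this sketch, thread pages are crawled before user pages because their pattern accumulated more visits, and the unseen `/about` page is deferred to the end.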
-
Publication No.: US08370119B2
Publication date: 2013-02-05
Application No.: US12389368
Filing date: 2009-02-19
Applicant: Rui Cai , Jiang-Ming Yang , Lei Zhang , Wei-Ying Ma
Inventor: Rui Cai , Jiang-Ming Yang , Lei Zhang , Wei-Ying Ma
IPC classification: G06G7/48
CPC classification: G06F17/218 , G06F8/75 , G06F17/27
Abstract: Website design pattern modeling technique embodiments are presented that model a website's design patterns. This can be based on the website's layout elements, its URL tokens, or both. When based on both, the design patterns can be modeled separately using first the layout elements and then the URL tokens, or vice versa. Alternatively, the modeling can be based on coupled layout and URL token patterns. In operation, the modeling involves first identifying layout elements and/or URL tokens found on at least some of the pages of the website. The website design patterns are then modeled based on the occurrences of the identified layout elements and/or URL tokens in pages of the website. In cases where a coupled modeling scheme is employed, a modeling technique that exploits the correlations between the layout elements and URL tokens is used.