-
Publication No.: US20110264664A1
Publication date: 2011-10-27
Application No.: US12764989
Filing date: 2010-04-22
Applicant: Rong Xiao, Jiangming Yang, Lei Zhang, Xingrong Chen
Inventor: Rong Xiao, Jiangming Yang, Lei Zhang, Xingrong Chen
IPC classification: G06F17/30
CPC classification: G06F17/30613
Abstract: Concepts and technologies are described herein for identifying location names within document text. Through an implementation of the concepts and technologies presented herein, functionality can be provided for identifying location names within articles, websites, travelogues, or other such documents. For instance, documents containing the names of cities, regions, countries, landmarks, or other locations may be associated with those locations. The location names may be unambiguously identified even when a location name also has a common word meaning that is not associated with a location, or when a location name may be associated with more than one location.
-
Publication No.: US20100211533A1
Publication date: 2010-08-19
Application No.: US12388517
Filing date: 2009-02-18
Applicant: Jiangming Yang, Rui Cai, Lei Zhang, Wei-Ying Ma
Inventor: Jiangming Yang, Rui Cai, Lei Zhang, Wei-Ying Ma
CPC classification: G06N20/00, G06F16/958
Abstract: The web forum data extraction technique is designed to extract structured data from web forums using both page-level information and site-level knowledge. To do this, the technique finds the kinds of page objects a forum site has, which object a page belongs to, and how different page objects are connected with each other. This information can be obtained by re-constructing the sitemap of the target forum, which is based on a Data Object Model of the target forum. The web forum data extraction technique collects three kinds of evidence for data extraction: 1) inner-page features, which cover both semantic and layout information on an individual page; 2) inter-vertex features, which describe linkage-related observations; and 3) inner-vertex features, which characterize interrelationships among pages in one vertex. The technique employs Markov Logic Networks to combine the types of evidence statistically for inference and thereby can extract the desired structures.
-
Publication No.: US08572076B2
Publication date: 2013-10-29
Application No.: US12764977
Filing date: 2010-04-22
Applicant: Rong Xiao, Jiangming Yang, Lei Zhang, Xingrong Chen
Inventor: Rong Xiao, Jiangming Yang, Lei Zhang, Xingrong Chen
IPC classification: G06F17/30
CPC classification: G06F17/30616
Abstract: Concepts and technologies are described herein for mining location contexts within document text. Through an implementation of the concepts and technologies presented herein, functionality can be provided for location context mining within articles, websites, travelogues, or other such documents. A location context is a concept associated with a specific location. For example, the contexts “beach” and “hula” are associated with Hawaii. Similarly, “glacier” and “polar bear” are contexts associated with Alaska. Location context mining can automatically discover locations and location contexts by mining information from a set of documents. User interfaces to support queries of the mined information are also presented herein.
-
Publication No.: US08099408B2
Publication date: 2012-01-17
Application No.: US12163895
Filing date: 2008-06-27
Applicant: Lei Zhang, Wei-Ying Ma, Wei Lai, Jiangming Yang, Rui Cai
Inventor: Lei Zhang, Wei-Ying Ma, Wei Lai, Jiangming Yang, Rui Cai
CPC classification: G06F17/30864
Abstract: A method and system for identifying informative links of a web site for use in crawling the web site is provided. A forum crawler analyzes sample web pages of a web forum to identify informative links and then crawls the web forum by following links determined to be informative and not following other links. The forum crawler system determines whether links are informative based on whether they are part of the overall structure of the web site or are used to select sequential information that has been split onto multiple web pages.
-
Publication No.: US20110264655A1
Publication date: 2011-10-27
Application No.: US12764977
Filing date: 2010-04-22
Applicant: Rong Xiao, Jiangming Yang, Lei Zhang, Xingrong Chen
Inventor: Rong Xiao, Jiangming Yang, Lei Zhang, Xingrong Chen
IPC classification: G06F17/30
CPC classification: G06F17/30616
Abstract: Concepts and technologies are described herein for mining location contexts within document text. Through an implementation of the concepts and technologies presented herein, functionality can be provided for location context mining within articles, websites, travelogues, or other such documents. A location context is a concept associated with a specific location. For example, the contexts “beach” and “hula” are associated with Hawaii. Similarly, “glacier” and “polar bear” are contexts associated with Alaska. Location context mining can automatically discover locations and location contexts by mining information from a set of documents. User interfaces to support queries of the mined information are also presented herein.
-
Publication No.: US07962487B2
Publication date: 2011-06-14
Application No.: US12345645
Filing date: 2008-12-29
Applicant: Qi Liu, Ruihua Song, Jiangming Yang
Inventor: Qi Liu, Ruihua Song, Jiangming Yang
IPC classification: G06F17/30
CPC classification: G06F17/30864
Abstract: Techniques described herein allow for the creation of tools for improving search engine performance. Specifically, these tools focus on producing more relevant search engine results via a URL-based query clustering method. These tools first extract tokens from Uniform Resource Locators associated with search queries. With these tokens, these tools form query clusters of common tokens. The resulting clusters can be used to help understand the similarities in user search queries via URL-based cluster queries to produce more relevant search results.
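Note: The abstract for US07962487B2 describes extracting tokens from URLs associated with search queries and grouping queries that share tokens. The following is a minimal sketch of that idea, not the patented method. It assumes tokens come from the URL path only, that a small stop list is applied, and that a cluster needs at least two queries; the names `url_tokens`, `cluster_queries`, and the sample data are hypothetical.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Illustrative stop list of uninformative path tokens (an assumption).
STOP_TOKENS = {"html", "htm", "index"}

def url_tokens(url):
    """Split a URL's path into lowercase alphanumeric tokens."""
    path = urlparse(url).path.lower()
    raw = re.split(r"[^a-z0-9]+", path)
    return {t for t in raw if t and t not in STOP_TOKENS}

def cluster_queries(query_to_urls):
    """Group queries under each URL token that they have in common."""
    clusters = defaultdict(set)
    for query, urls in query_to_urls.items():
        for url in urls:
            for token in url_tokens(url):
                clusters[token].add(query)
    # Keep only tokens shared by at least two distinct queries.
    return {tok: qs for tok, qs in clusters.items() if len(qs) >= 2}

# Hypothetical query-to-URL associations.
clicks = {
    "cheap flights": ["http://www.example.com/travel/flights"],
    "airline tickets": ["http://www.example.org/flights/deals"],
    "python tutorial": ["http://docs.example.net/python/tutorial.html"],
}
clusters = cluster_queries(clicks)
print(clusters)
```

In this sample, the two travel queries land in one cluster because both of their URLs contain the path token `flights`, even though the query strings share no words.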
-
Publication No.: US20110078139A1
Publication date: 2011-03-31
Application No.: US12568749
Filing date: 2009-09-29
Applicant: Rong Xiao, Jiangming Yang, Lei Zhang, Xingrong Chen
Inventor: Rong Xiao, Jiangming Yang, Lei Zhang, Xingrong Chen
CPC classification: G06F17/30241, G06F17/3087, Y10S707/919
Abstract: A location extraction component analyzes a set of travelogues to identify all of the locations mentioned therein. A co-occurrence extraction component computes co-occurrence values for the identified locations. When the identity of a specified location is received, suggested locations for the specified location are identified through the use of the co-occurrence values. A map is displayed that encompasses an area including the specified location and the suggested locations. The map might include indicators for the specified location and for each of the suggested locations. Attributes of the indicators, such as their size or color, can be modified based upon the co-occurrence value associated with the corresponding suggested location.
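Note: The abstract for US20110078139A1 describes computing co-occurrence values for locations found in travelogues and using them to suggest related locations. The sketch below is one simple reading of that idea, not the patented method. It assumes the co-occurrence value is the number of travelogues mentioning both locations, and that location extraction has already been done; `cooccurrence_counts`, `suggest`, and the sample travelogues are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(travelogue_locations):
    """Count, for each unordered pair of locations, how many travelogues mention both."""
    counts = Counter()
    for locations in travelogue_locations:
        # Deduplicate and sort so each pair has one canonical (a, b) key.
        for a, b in combinations(sorted(set(locations)), 2):
            counts[(a, b)] += 1
    return counts

def suggest(counts, location, top_n=3):
    """Return up to top_n (location, count) pairs that co-occur most with `location`."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == location:
            scores[b] += n
        elif b == location:
            scores[a] += n
    return scores.most_common(top_n)

# Hypothetical output of a location extraction step, one list per travelogue.
travelogues = [
    ["Seattle", "Mount Rainier", "Vancouver"],
    ["Seattle", "Mount Rainier"],
    ["Seattle", "Portland"],
    ["Vancouver", "Whistler"],
]
counts = cooccurrence_counts(travelogues)
suggestions = suggest(counts, "Seattle")
print(suggestions)
```

The counts returned by `suggest` could then drive an indicator attribute such as marker size, as the abstract mentions; that mapping is left out here.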
-
Publication No.: US08977632B2
Publication date: 2015-03-10
Application No.: US12568749
Filing date: 2009-09-29
Applicant: Rong Xiao, Jiangming Yang, Lei Zhang, Xingrong Chen
Inventor: Rong Xiao, Jiangming Yang, Lei Zhang, Xingrong Chen
IPC classification: G06F17/30
CPC classification: G06F17/30241, G06F17/3087, Y10S707/919
Abstract: A location extraction component analyzes a set of travelogues to identify all of the locations mentioned therein. A co-occurrence extraction component computes co-occurrence values for the identified locations. When the identity of a specified location is received, suggested locations for the specified location are identified through the use of the co-occurrence values. A map is displayed that encompasses an area including the specified location and the suggested locations. The map might include indicators for the specified location and for each of the suggested locations. Attributes of the indicators, such as their size or color, can be modified based upon the co-occurrence value associated with the corresponding suggested location.
-
Publication No.: US08051083B2
Publication date: 2011-11-01
Application No.: US12103712
Filing date: 2008-04-16
Applicant: Wei Lai, Rui Cai, Jiangming Yang, Lei Zhang, Wei-Ying Ma
Inventor: Wei Lai, Rui Cai, Jiangming Yang, Lei Zhang, Wei-Ying Ma
CPC classification: G06Q10/10
Abstract: Described is a technology by which forum web pages are processed into clusters for classification purposes, including by determining repetitive regions between pages and associating pages that have similar repetitive regions into a common cluster. Patterns corresponding to the regions are determined, and a feature set based at least in part on those patterns (e.g., pattern frequency) is extracted from the page. The feature set of a page is compared against the feature set of another page to determine similarity therewith, e.g., via a feature space distance computation that is evaluated against a threshold distance.
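Note: The abstract for US08051083B2 describes building a feature set from pattern frequencies and comparing pages by a feature space distance evaluated against a threshold. The sketch below illustrates that comparison under several assumptions that are not in the source: each page is already reduced to a list of repetitive-region pattern labels, the distance is Euclidean over relative frequencies, the threshold is 0.5, and clustering is a greedy single pass. All names and the sample pages are hypothetical.

```python
import math
from collections import Counter

def feature_vector(patterns):
    """Map each pattern label to its relative frequency on the page."""
    total = len(patterns)
    if total == 0:
        return {}
    return {p: c / total for p, c in Counter(patterns).items()}

def distance(u, v):
    """Euclidean distance between two sparse feature vectors."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u.get(k, 0.0) - v.get(k, 0.0)) ** 2 for k in keys))

def cluster_pages(pages, threshold=0.5):
    """Greedily assign each page to the first cluster whose representative is close enough."""
    clusters = []  # list of (representative vector, [page ids])
    for page_id, patterns in pages.items():
        vec = feature_vector(patterns)
        for rep, members in clusters:
            if distance(rep, vec) <= threshold:
                members.append(page_id)
                break
        else:
            # No existing cluster is within the threshold: start a new one.
            clusters.append((vec, [page_id]))
    return [members for _, members in clusters]

# Hypothetical pattern labels detected on three forum pages.
pages = {
    "thread-1": ["post", "post", "post", "nav"],
    "thread-2": ["post", "post", "nav"],
    "board-1": ["topic-row", "topic-row", "topic-row", "nav"],
}
groups = cluster_pages(pages)
print(groups)
```

With these inputs the two thread pages fall within the threshold of each other, while the board page starts its own cluster.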
-
Publication No.: US20110078575A1
Publication date: 2011-03-31
Application No.: US12568735
Filing date: 2009-09-29
Applicant: Rong Xiao, Jiangming Yang, Lei Zhang, Xingrong Chen
Inventor: Rong Xiao, Jiangming Yang, Lei Zhang, Xingrong Chen
IPC classification: G06F3/00
CPC classification: G06F17/30241
Abstract: A map user interface control provides functionality for displaying a map in conjunction with the display of a Web page. The map control operates in combination with a location extraction component that analyzes the contents of the Web page to identify locations mentioned therein. Once the location extraction component has identified the locations mentioned in the Web page, a map is generated that encompasses the locations identified in the Web page. Once the map has been generated, the map control displays the map in conjunction with the display of the Web page. The map might include visual indicators corresponding to the locations mentioned in the Web page. The map might also include visual indicators corresponding to other locations near the locations identified in the Web page that have been identified using co-occurrence values generated through an analysis of a set of travelogues.
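Note: The abstract for US20110078575A1 describes generating a map that encompasses the locations identified in a Web page. One simple way to pick such a map area is a latitude/longitude bounding box, sketched below. This is an assumption for illustration, not the method in the source; the function `bounding_box`, the `margin` parameter, and the coordinates are hypothetical, and the sketch ignores areas that cross the antimeridian.

```python
def bounding_box(points, margin=0.0):
    """Return (south, west, north, east) covering all (lat, lon) points.

    `margin` pads the box on every side, in degrees.
    """
    if not points:
        raise ValueError("at least one location is required")
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return (
        min(lats) - margin,
        min(lons) - margin,
        max(lats) + margin,
        max(lons) + margin,
    )

# Hypothetical coordinates for three locations extracted from a page.
extracted = [(10.0, 20.0), (12.5, 18.0), (11.0, 25.0)]
box = bounding_box(extracted)
print(box)
```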