-
Publication No.: US08423538B1
Publication date: 2013-04-16
Application No.: US12938205
Filing date: 2010-11-02
Applicant: Eldar Sadikov, Jayant Madhavan, Alon Halevy
Inventor: Eldar Sadikov, Jayant Madhavan, Alon Halevy
CPC classification: G06N7/005, G06F17/30389, G06F17/30463, G06F17/30598, G06F17/30958, G06F17/30979
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for clustering query refinements. One method includes building a representation of a graph for a first query, wherein the graph has a node for the first query, a node for each of a plurality of refinements for the first query, and a node for each document in the document sets of the refinements, and wherein the graph has edges from the first query node to each of the refinement nodes, edges from the first query to each document in the respective document set of the first query, edges from each refinement to each document in the respective document set of the refinement, and edges from each refinement to each co-occurring query of the refinement. The method further includes clustering the refinements into refinement clusters by partitioning the refinement nodes in the graph into proper subsets.
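The graph described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the patented method: the function names, the node tagging scheme, and the clustering rule are all assumptions. The abstract does not say how the refinement nodes are partitioned, so the sketch substitutes a simple stand-in (refinements are merged when the Jaccard overlap of their neighbor sets reaches a hypothetical threshold).

```python
from collections import defaultdict


def build_graph(query, query_docs, refinement_docs, co_occurring):
    """Build adjacency sets with the four edge types named in the abstract.

    Nodes are tagged tuples: ("q", query), ("r", refinement),
    ("d", document), ("c", co-occurring query). The tags are an
    illustrative convention, not taken from the source.
    """
    edges = defaultdict(set)
    root = ("q", query)
    for doc in query_docs:                      # query -> its documents
        edges[root].add(("d", doc))
    for ref, docs in refinement_docs.items():
        edges[root].add(("r", ref))             # query -> refinement
        out = edges[("r", ref)]                 # creates the node even if empty
        for doc in docs:                        # refinement -> its documents
            out.add(("d", doc))
        for other in co_occurring.get(ref, ()):  # refinement -> co-occurring query
            out.add(("c", other))
    return edges


def cluster_refinements(edges, threshold=0.5):
    """Partition refinement nodes into disjoint clusters.

    Stand-in rule (an assumption): two refinements join the same cluster
    when the Jaccard overlap of their neighbor sets is >= threshold.
    Clusters are the connected components under that rule (union-find).
    """
    refs = sorted(n for n in edges if n[0] == "r")
    parent = {r: r for r in refs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]       # path halving
            x = parent[x]
        return x

    for i, a in enumerate(refs):
        for b in refs[i + 1:]:
            union = edges[a] | edges[b]
            if union and len(edges[a] & edges[b]) / len(union) >= threshold:
                parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for r in refs:
        clusters[find(r)].add(r[1])
    return sorted(sorted(c) for c in clusters.values())


graph = build_graph(
    "mars",
    query_docs={"d1", "d9"},
    refinement_docs={
        "mars rover": {"d1", "d2"},
        "mars curiosity": {"d1", "d2", "d3"},
        "mars candy": {"d9"},
    },
    co_occurring={},
)
clusters = cluster_refinements(graph)
```

With the sample data, the two rover-related refinements share two of three documents and fall into one cluster, while "mars candy" stays on its own.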
-
Publication No.: US08244743B2
Publication date: 2012-08-14
Application No.: US12796142
Filing date: 2010-06-08
CPC classification: G06F17/30241, G06F17/30091, G06F17/30312, G06F17/30554, G06F17/30991, G06F17/30994
Abstract: Aspects of the invention provide a service for data management and integration across a wide range of applications. Clustered computers may be arranged in a cloud-type configuration for storing and handling large amounts of user data under the control of a front-end management server. Communities of distributed users may collaborate on the data across multiple enterprises. Very large tabular data files are uploaded to the storage facilities. The data files are maintained as tables, and a composite table of related information is created and maintained in response to user queries. Different ways of visualizing the data are provided. Depending on the amount of information that can be displayed, features in a spatial index may be thinned for presentation. Spatial and structured queries are processed and results are intersected to obtain information for display.
-
Publication No.: US20110302194A1
Publication date: 2011-12-08
Application No.: US12796142
Filing date: 2010-06-08
IPC classification: G06F17/30
CPC classification: G06F17/30241, G06F17/30091, G06F17/30312, G06F17/30554, G06F17/30991, G06F17/30994
Abstract: Aspects of the invention provide a service for data management and integration across a wide range of applications. Clustered computers may be arranged in a cloud-type configuration for storing and handling large amounts of user data under the control of a front-end management server. Communities of distributed users may collaborate on the data across multiple enterprises. Very large tabular data files are uploaded to the storage facilities. The data files are maintained as tables, and a composite table of related information is created and maintained in response to user queries. Different ways of visualizing the data are provided. Depending on the amount of information that can be displayed, features in a spatial index may be thinned for presentation. Spatial and structured queries are processed and results are intersected to obtain information for display.
-
Publication No.: US08732116B1
Publication date: 2014-05-20
Application No.: US13412346
Filing date: 2012-03-05
Applicant: Hazem Elmeleegy, Jayant Madhavan, Alon Halevy
Inventor: Hazem Elmeleegy, Jayant Madhavan, Alon Halevy
CPC classification: G06F17/30569
Abstract: List information can be extracted into database tables. A number of fields is independently determined for each item in a list. The number of database table columns is determined from the most common number of list item fields. New fields are determined for items with more fields than database columns. Null fields are inserted into items with fewer fields than database columns. Information from items having the same number of fields as database columns is written to database table rows. Information from each field is written to a corresponding database table column. Streaks of poorly matching cells in a database table row are determined. Streak cells are merged and new cells are determined. Null cells are inserted if the number of new cells is less than the number of cells in the streak. Information from the new cells is written to the table row and columns that define the streak.
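The first steps of this abstract (count fields per item, take the most common count as the column count, pad short items with nulls) can be sketched as follows. This is a rough illustration under stated assumptions: `list_to_rows` is a hypothetical name, splitting on a fixed separator stands in for the independent field determination, and folding overflow fields into the last column stands in for the "new fields are determined" step, which the abstract does not specify. The streak-repair pass is not covered.

```python
from collections import Counter


def list_to_rows(items, sep=","):
    """Turn list items into equal-width rows.

    Returns (n_cols, rows). Splitting on `sep` is an assumption made
    for illustration; the source does not describe the splitter.
    """
    if not items:
        return 0, []

    # Determine the fields of every item independently.
    split = [[field.strip() for field in item.split(sep)] for item in items]

    # Column count = most common number of fields across items.
    n_cols = Counter(len(fields) for fields in split).most_common(1)[0][0]

    rows = []
    for fields in split:
        if len(fields) > n_cols:
            # Stand-in for re-determining fields: fold the overflow
            # into the last column.
            fields = fields[:n_cols - 1] + [sep.join(fields[n_cols - 1:])]
        elif len(fields) < n_cols:
            # Pad short items with null fields.
            fields = fields + [None] * (n_cols - len(fields))
        rows.append(fields)
    return n_cols, rows


n_cols, rows = list_to_rows([
    "Paris, France, 2.1M",
    "Rome, Italy, 2.8M",
    "Oslo, Norway",
    "Bern, Switzerland, 0.1M, capital",
])
```

In the sample, three fields is the most common count, so the "Oslo" row gains one null cell and the "Bern" row has its two trailing fields folded into the third column.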
-
Publication No.: US20130031083A1
Publication date: 2013-01-31
Application No.: US12062274
Filing date: 2008-04-03
Applicant: Jayant Madhavan, David Ko, Lucja A. Kot, Alon Halevy
Inventor: Jayant Madhavan, David Ko, Lucja A. Kot, Alon Halevy
CPC classification: G06F16/951
Abstract: Among other disclosed subject matter, a computer-implemented method of analyzing a form page for indexing includes identifying a form page that is configured for use in requesting any of multiple target pages, the form page including at least one text input control for retrieving any of the multiple target pages. The method includes identifying at least one keyword as being informative with regard to the text input control. The method includes updating an indexing record associated with the form page to reflect the identified keyword.
-
Publication No.: US20120278313A1
Publication date: 2012-11-01
Application No.: US13547409
Filing date: 2012-07-12
IPC classification: G06F17/30
CPC classification: G06F17/30241, G06F17/30091, G06F17/30312, G06F17/30554, G06F17/30991, G06F17/30994
Abstract: Aspects of the invention provide a service for data management and integration across a wide range of applications. Clustered computers may be arranged in a cloud-type configuration for storing and handling large amounts of user data under the control of a front-end management server. Communities of distributed users may collaborate on the data across multiple enterprises. Very large tabular data files are uploaded to the storage facilities. The data files are maintained as tables, and a composite table of related information is created and maintained in response to user queries. Different ways of visualizing the data are provided. Depending on the amount of information that can be displayed, features in a spatial index may be thinned for presentation. Spatial and structured queries are processed and results are intersected to obtain information for display.
-
Publication No.: US08140533B1
Publication date: 2012-03-20
Application No.: US12694160
Filing date: 2010-01-26
Applicant: Hazem Elmeleegy, Jayant Madhavan, Alon Halevy
Inventor: Hazem Elmeleegy, Jayant Madhavan, Alon Halevy
CPC classification: G06F17/30569
Abstract: Computer-implemented methods and apparatus for extracting list information into database tables. A number of fields is independently determined for each item in a list. The number of database table columns is determined from the most common number of list item fields. New fields are determined for items with more fields than database columns. Null fields are inserted into items with fewer fields than database columns. Information from items having the same number of fields as database columns is written to database table rows. Information from each field is written to a corresponding database table column. Streaks of poorly matching cells in a database table row are determined. Streak cells are merged and new cells are determined. Null cells are inserted if the number of new cells is less than the number of cells in the streak. Information from the new cells is written to the table row and columns that define the streak.
-
Publication No.: US08589425B2
Publication date: 2013-11-19
Application No.: US13547409
Filing date: 2012-07-12
CPC classification: G06F17/30241, G06F17/30091, G06F17/30312, G06F17/30554, G06F17/30991, G06F17/30994
Abstract: Aspects of the invention provide a service for data management and integration across a wide range of applications. Clustered computers may be arranged in a cloud-type configuration for storing and handling large amounts of user data under the control of a front-end management server. Communities of distributed users may collaborate on the data across multiple enterprises. Very large tabular data files are uploaded to the storage facilities. The data files are maintained as tables, and a composite table of related information is created and maintained in response to user queries. Different ways of visualizing the data are provided. Depending on the amount of information that can be displayed, features in a spatial index may be thinned for presentation. Spatial and structured queries are processed and results are intersected to obtain information for display.
-
Publication No.: US20130031503A1
Publication date: 2013-01-31
Application No.: US11872621
Filing date: 2007-10-15
Applicant: Jayant Madhavan, Alon Halevy, David Ko
Inventor: Jayant Madhavan, Alon Halevy, David Ko
IPC classification: G06F3/048
CPC classification: G06F17/2211, G06F17/243, G06F17/30864
Abstract: Among other disclosure, a computer-implemented method of analyzing a form page for indexing includes identifying a form page that is configured for use in requesting any of multiple target pages. The form page includes multiple input controls. The method includes identifying at least one of the multiple input controls as being informative with regard to requesting the multiple target pages. The method includes updating an indexing record associated with the form page to reflect the identification.
-
Publication No.: US20100198837A1
Publication date: 2010-08-05
Application No.: US12512908
Filing date: 2009-07-30
Applicant: Fei Wu, Jayant Madhavan, Alon Halevy
Inventor: Fei Wu, Jayant Madhavan, Alon Halevy
IPC classification: G06F17/30
CPC classification: G06F17/30554, G06F17/30528, G06F17/3053, G06F17/30672, G06F17/30867
Abstract: Methods, systems, and apparatus, including computer program products, for generating aspects associated with entities. In some implementations, a method includes receiving data identifying an entity; generating a group of candidate aspects for the entity; modifying the group of candidate aspects to generate a group of modified candidate aspects comprising combining similar candidate aspects and grouping candidate aspects using one or more aspect classes each associated with one or more candidate aspects; ranking one or more modified candidate aspects in the group of modified candidate aspects based on a diversity score and a popularity score; and storing an association between one or more highest ranked modified candidate aspects and the entity. The aspects can be used to organize and present search results in response to queries for the entity.
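One way to picture a ranking that balances a popularity score against a diversity score is a greedy selection loop. The sketch below is an assumption-laden illustration, not the scoring defined in the patent: `rank_aspects`, the term-set representation of an aspect, the Jaccard-based diversity measure, and the `weight` parameter are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_aspects(popularity, terms, k=3, weight=0.5):
    """Greedily pick up to k aspects.

    popularity: aspect -> score in [0, 1] (assumed precomputed).
    terms:      aspect -> set of words describing it (assumed).
    Each round picks the aspect maximizing
        weight * popularity + (1 - weight) * diversity,
    where diversity = 1 - (max similarity to any aspect already chosen).
    """
    chosen = []
    remaining = set(popularity)
    while remaining and len(chosen) < k:

        def score(aspect):
            similarity = max(
                (jaccard(terms[aspect], terms[c]) for c in chosen),
                default=0.0,
            )
            return weight * popularity[aspect] + (1 - weight) * (1 - similarity)

        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(remaining), key=score)
        chosen.append(best)
        remaining.remove(best)
    return chosen


ranked = rank_aspects(
    popularity={"hotels": 0.9, "cheap hotels": 0.8, "weather": 0.5},
    terms={
        "hotels": {"hotels"},
        "cheap hotels": {"cheap", "hotels"},
        "weather": {"weather"},
    },
)
```

In the sample, "weather" outranks the more popular "cheap hotels" in the second round because it shares no terms with the already chosen "hotels".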