-
Publication No.: US07921416B2
Publication date: 2011-04-05
Application No.: US11551336
Filing date: 2006-10-20
Applicant: Marcus Felipe Fontoura, Vanja Josifovski, Shanmugasundaram Ravikumar, Christopher Olston, Benjamin Clay Reed, Andrew Tomkins
Inventor: Marcus Felipe Fontoura, Vanja Josifovski, Shanmugasundaram Ravikumar, Christopher Olston, Benjamin Clay Reed, Andrew Tomkins
IPC classification: G06F9/45
CPC classification: G06F17/30427, G06F17/3041
Abstract: The present invention, in an example embodiment, provides a special-purpose formal language and translator for the parallel processing of large databases in a distributed system. The special-purpose language has features of both a declarative programming language and a procedural programming language and supports the co-grouping of tables, each with an arbitrary alignment function, and the specification of procedural operations to be performed on the resulting co-groups. The language's translator translates a program in the language into optimized structured calls to an application programming interface for implementations of functionality related to the parallel processing of tasks over a distributed system. In an example embodiment, the application programming interface includes interfaces for MapReduce functionality, whose implementations are supplemented by the embodiment.
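Illustrative sketch (not taken from the patent): the abstract mentions co-grouping tables, each with its own alignment function, and then running procedural operations on the resulting co-groups. The Python below shows one plausible in-memory reading of that idea. The function name `cogroup`, the row layout, and the sample tables are all assumptions made for illustration; the sketch does not model the patented language, its translator, or any MapReduce API.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, Iterable, List, Tuple

Row = Tuple[Any, ...]
KeyFn = Callable[[Row], Hashable]  # hypothetical "alignment function"


def cogroup(*inputs: Tuple[Iterable[Row], KeyFn]) -> Dict[Hashable, Tuple[List[Row], ...]]:
    """Group several tables by per-table key functions.

    Returns a mapping from key to a tuple holding one list of rows per
    input table, in input order. A table with no rows for a key
    contributes an empty list.
    """
    n = len(inputs)
    groups = defaultdict(lambda: tuple([] for _ in range(n)))
    for i, (rows, key_fn) in enumerate(inputs):
        for row in rows:
            groups[key_fn(row)][i].append(row)
    return dict(groups)


# Hypothetical sample tables: (user, url) visits and (user, age) profiles.
visits = [("alice", "a.com"), ("bob", "b.com"), ("alice", "c.com")]
ages = [("alice", 30)]

co_groups = cogroup((visits, lambda r: r[0]), (ages, lambda r: r[0]))

# A procedural step applied to each co-group: count visits for users
# that have a profile row.
visit_counts = {
    user: len(user_visits)
    for user, (user_visits, user_ages) in co_groups.items()
    if user_ages
}
```

In a distributed setting the grouping would presumably be carried out by a shuffle phase rather than a single dictionary; this sketch only fixes the shape of the data handed to the per-group procedure.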
-
Publication No.: US20080098370A1
Publication date: 2008-04-24
Application No.: US11551336
Filing date: 2006-10-20
Applicant: Marcus Felipe Fontoura, Vanja Josifovski, Shanmugasundaram Ravikumar, Christopher Olston, Benjamin Clay Reed, Andrew Tomkins
Inventor: Marcus Felipe Fontoura, Vanja Josifovski, Shanmugasundaram Ravikumar, Christopher Olston, Benjamin Clay Reed, Andrew Tomkins
IPC classification: G06F9/45
CPC classification: G06F17/30427, G06F17/3041
Abstract: The present invention, in an example embodiment, provides a special-purpose formal language and translator for the parallel processing of large databases in a distributed system. The special-purpose language has features of both a declarative programming language and a procedural programming language and supports the co-grouping of tables, each with an arbitrary alignment function, and the specification of procedural operations to be performed on the resulting co-groups. The language's translator translates a program in the language into optimized structured calls to an application programming interface for implementations of functionality related to the parallel processing of tasks over a distributed system. In an example embodiment, the application programming interface includes interfaces for MapReduce functionality, whose implementations are supplemented by the embodiment.
-
Publication No.: US20080010250A1
Publication date: 2008-01-10
Application No.: US11483047
Filing date: 2006-07-07
Applicant: Marcus Felipe Fontoura, Vanja Josifovski, Christopher Olston, Shanmugasundaram Ravikumar, Andrew Tomkins
Inventor: Marcus Felipe Fontoura, Vanja Josifovski, Christopher Olston, Shanmugasundaram Ravikumar, Andrew Tomkins
IPC classification: G06F17/30
CPC classification: G06F16/3325, G06F16/951
Abstract: An improved system and method is provided for searching a collection of objects that may be located in hierarchies of auxiliary information for retrieval of response objects. A framework to perform a generalization search in hierarchies may be used to generalize a search by moving up to a higher level in a hierarchy of taxonomies or to specialize a search by moving down to a lower level in the hierarchy of taxonomies. Once the system may decide to enumerate response objects at a particular level of generalization, a budgeted generalization search may be used for enumerating a set of response objects within a budgeted cost.
-
Publication No.: US07991769B2
Publication date: 2011-08-02
Application No.: US11483048
Filing date: 2006-07-07
Applicant: Marcus Felipe Fontoura, Vanja Josifovski, Christopher Olston, Shanmugasundaram Ravikumar, Andrew Tomkins
Inventor: Marcus Felipe Fontoura, Vanja Josifovski, Christopher Olston, Shanmugasundaram Ravikumar, Andrew Tomkins
IPC classification: G06F17/30
CPC classification: G06F17/30646, G06F17/30864
Abstract: An improved system and method is provided for searching a collection of objects that may be located in hierarchies of auxiliary information for retrieval of response objects. A framework to perform a generalization search in hierarchies may be used to generalize a search by moving up to a higher level in a hierarchy of taxonomies or to specialize a search by moving down to a lower level in the hierarchy of taxonomies. Once the system may decide to enumerate response objects at a particular level of generalization, a budgeted generalization search may be used for enumerating a set of response objects within a budgeted cost.
-
Publication No.: US20080010251A1
Publication date: 2008-01-10
Application No.: US11483048
Filing date: 2006-07-07
Applicant: Marcus Felipe Fontoura, Vanja Josifovski, Christopher Olston, Shanmugasundaram Ravikumar, Andrew Tomkins
Inventor: Marcus Felipe Fontoura, Vanja Josifovski, Christopher Olston, Shanmugasundaram Ravikumar, Andrew Tomkins
IPC classification: G06F17/30
CPC classification: G06F17/30646, G06F17/30864
Abstract: An improved system and method is provided for searching a collection of objects that may be located in hierarchies of auxiliary information for retrieval of response objects. A framework to perform a generalization search in hierarchies may be used to generalize a search by moving up to a higher level in a hierarchy of taxonomies or to specialize a search by moving down to a lower level in the hierarchy of taxonomies. Once the system may decide to enumerate response objects at a particular level of generalization, a budgeted generalization search may be used for enumerating a set of response objects within a budgeted cost.
-
Publication No.: US07970760B2
Publication date: 2011-06-28
Application No.: US12046123
Filing date: 2008-03-11
Applicant: Christopher Olston, Sandeep Pandey
Inventor: Christopher Olston, Sandeep Pandey
CPC classification: G06F17/30864
Abstract: Methods, systems, and computer readable media comprising instructions for identifying needy queries for which additional responsive content is needed. A method comprises receiving a query comprising one or more terms and retrieving one or more content items identified as responsive to the query, the one or more content items ranked according to one or more ranking techniques. A score is generated for the one or more ranked content items identified as responsive to the query. A determination is thereafter made as to whether the query is needy based upon a comparison of the one or more scores associated with the one or more content items identified as responsive to the query and a needy query score threshold.
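Illustrative sketch (not taken from the patent): the abstract describes deciding whether a query is "needy" by comparing the scores of its ranked results against a needy query score threshold, but it does not say how the scores are combined. The Python below assumes one simple rule, namely that a query is needy when its best result score falls below the threshold. The function name `is_needy` and that rule are assumptions for illustration only.

```python
from typing import Sequence


def is_needy(result_scores: Sequence[float], threshold: float) -> bool:
    """Return True when a query appears to need more responsive content.

    Assumed rule: the query is needy if it has no results at all, or if
    even its highest-scoring result is below the threshold.
    """
    if not result_scores:
        return True
    return max(result_scores) < threshold


# Hypothetical scores for the ranked results of two queries.
well_served = is_needy([0.91, 0.74, 0.40], threshold=0.5)
under_served = is_needy([0.32, 0.18], threshold=0.5)
```

Other aggregation rules, such as comparing the mean of the top-k scores, would fit the abstract's wording equally well.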
-
Publication No.: US07899807B2
Publication date: 2011-03-01
Application No.: US12004881
Filing date: 2007-12-20
Applicant: Christopher Olston, Sandeep Pandey
Inventor: Christopher Olston, Sandeep Pandey
CPC classification: G06F17/30864
Abstract: An improved system and method for crawl ordering of a web crawler by impact upon search results of a search engine is provided. Content-independent features of uncrawled web pages may be obtained, and the impact of uncrawled web pages may be estimated for queries of a workload using the content-independent features. The impact of uncrawled web pages may be estimated for queries by computing an expected impact score for uncrawled web pages that match needy queries. Query sketches may be created for a subset of the queries by computing an expected impact score for crawled web pages and uncrawled web pages matching the queries. Web pages may then be selected to fetch using a combined query-based estimate and query-independent estimate of the impact of fetching the web pages on search query results.
-
Publication No.: US20100114867A1
Publication date: 2010-05-06
Application No.: US12266364
Filing date: 2008-11-06
Applicant: Christopher Olston
Inventor: Christopher Olston
CPC classification: G06F17/30442, G06F17/30566, G06F17/30864
Abstract: A method and system are given for providing a virtual environment spanning a desktop and a cloud. In one example, the method includes receiving a query template over a data set that resides in the cloud, optimizing the query template to segment the query template into an offline phase and an online phase, executing the offline phase on the cloud to build one or more indexes, and sending the one or more indexes to the desktop.
-
Publication No.: US20090182706A1
Publication date: 2009-07-16
Application No.: US12015392
Filing date: 2008-01-16
IPC classification: G06F17/30
CPC classification: G06F17/30442
Abstract: Computer-implemented methods, modules and clients relate to expanded, pruned sample table for testing database queries against a base table. The expanded, pruned sample table is formed from the base table by a process of initial sampling, synthesis, and pruning.
-
Publication No.: US08745183B2
Publication date: 2014-06-03
Application No.: US11588020
Filing date: 2006-10-26
Applicant: Christopher Olston
Inventor: Christopher Olston
IPC classification: G06F15/173, G06F7/00, G06F17/00, G06F17/30
CPC classification: G06F17/30899
Abstract: An improved system and method is provided for adaptively refreshing a web page. A base version of the web page may be partitioned into a collection of fragments. Then the collection of fragments may be compared with the corresponding fragments of a recent version of the web page to determine a divergence measurement of the difference between the base version and the recent version of the web page. The divergence measurement may be recorded in a change profile representing a change history of the web page that includes a sequence of numeric pairs indicating a time offset and a divergence measurement of the difference between a version of the web page at the time offset and a base version of the web page. The refresh period for the web page may be adjusted by applying an adaptive refresh policy using the divergence measurements recorded in the change profile.
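Illustrative sketch (not taken from the patent): the abstract describes measuring divergence between fragments of a base and a recent page version, recording (time offset, divergence) pairs in a change profile, and adjusting the refresh period from that profile. The Python below is one possible reading. The set-based divergence measure, the halve-or-double policy, the threshold, and the period bounds are all assumptions for illustration; the abstract does not specify any of them.

```python
from typing import List, Sequence, Tuple

# Change profile: sequence of (time offset in hours, divergence in [0, 1]).
ChangeProfile = List[Tuple[float, float]]


def divergence(base_fragments: Sequence[str], recent_fragments: Sequence[str]) -> float:
    """Assumed measure: fraction of distinct base fragments missing from the recent version."""
    base = set(base_fragments)
    if not base:
        return 0.0
    return len(base - set(recent_fragments)) / len(base)


def adjust_refresh_period(
    period_hours: float,
    profile: ChangeProfile,
    threshold: float = 0.2,
    min_period: float = 1.0,
    max_period: float = 168.0,
) -> float:
    """Assumed policy: halve the period when the latest recorded divergence
    exceeds the threshold, otherwise double it, clamped to the given bounds."""
    if not profile:
        return period_hours
    _, latest_divergence = profile[-1]
    if latest_divergence > threshold:
        new_period = period_hours / 2
    else:
        new_period = period_hours * 2
    return min(max(new_period, min_period), max_period)


# Hypothetical page split into four fragments, one of which changed after 24 hours.
profile: ChangeProfile = []
d = divergence(["header", "news", "ads", "footer"], ["header", "sports", "ads", "footer"])
profile.append((24.0, d))
next_period = adjust_refresh_period(24.0, profile)
```

A policy that looks at the whole profile rather than only the latest pair would also be consistent with the abstract.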