-
Publication No.: US07149735B2
Publication date: 2006-12-12
Application No.: US10603035
Filing date: 2003-06-24
Applicants: Surajit Chaudhuri , Venkatesh Ganti , Luis Gravano
Inventors: Surajit Chaudhuri , Venkatesh Ganti , Luis Gravano
IPC classification: G06F17/30
CPC classification: G06F17/30985 , Y10S707/99936
Abstract: A method of estimating the selectivity of a given string predicate in a database query. In the method, selectivities of substrings of various lengths are estimated. For example, the selectivity of substrings of each length from 1 (or some constant q) up to the length of the given string predicate may be estimated. The method then selects a candidate substring for each substring length based on the estimated selectivities of the substrings. The estimated selectivities of the candidate substrings are combined, and the combined estimate is returned as the estimated selectivity of the given string predicate.
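The abstract's steps (estimate substring selectivities per length, pick one candidate per length, combine the candidates) can be sketched roughly as follows. This is a minimal illustration, not the patented method: the exact per-row substring counts, the choice of the least selective substring as the candidate, and the use of `min` to combine candidates are all assumptions, and the names `build_substring_table` and `estimate_selectivity` are hypothetical.

```python
from collections import Counter


def build_substring_table(rows, max_len):
    """Count, for every substring of length 1..max_len, how many rows contain it."""
    table = Counter()
    for row in rows:
        seen = {row[i:i + n]
                for n in range(1, max_len + 1)
                for i in range(len(row) - n + 1)}
        table.update(seen)  # each row contributes at most 1 per substring
    return table


def estimate_selectivity(pattern, table, n_rows, max_len, min_len=1):
    """Estimate the fraction of rows that contain `pattern` as a substring."""
    candidates = []
    for n in range(min_len, min(len(pattern), max_len) + 1):
        subs = [pattern[i:i + n] for i in range(len(pattern) - n + 1)]
        # Assumption: the candidate for length n is the rarest substring.
        candidates.append(min(table.get(s, 0) / n_rows for s in subs))
    # Assumption: combine candidates by taking the tightest upper bound.
    return min(candidates) if candidates else 1.0


rows = ["database", "data", "base", "query"]
table = build_substring_table(rows, max_len=3)
print(estimate_selectivity("data", table, len(rows), max_len=3))
```

Because every row that matches the full pattern must contain each of its substrings, each candidate here is an upper bound on the true selectivity, which is why `min` is a safe (if crude) way to combine them in this sketch.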
-
Publication No.: US07007039B2
Publication date: 2006-02-28
Application No.: US09881500
Filing date: 2001-06-14
Applicants: Surajit Chaudhuri , Nicolas Bruno , Luis Gravano
Inventors: Surajit Chaudhuri , Nicolas Bruno , Luis Gravano
IPC classification: G06F17/30
CPC classification: G06F17/30463 , G06F17/30536 , Y10S707/99935
Abstract: In a database system, a method of maintaining a self-tuning histogram having a plurality of existing rectangular buckets arranged in a hierarchical manner, each defined by at least two bucket boundaries, a bucket volume, and a bucket frequency. At least one new bucket is created in response to a query on the database. Each new bucket is contained within at least one existing bucket; the new bucket becomes a child bucket and the existing bucket containing it becomes its parent bucket. The boundaries of each new bucket correspond to a region of the database accessed by the query, and the frequency of the new bucket is the number of data records returned by the query. Buckets may be merged based on a merge criterion, such as similar bucket density, when the total number of buckets exceeds a predetermined budget.
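The bucket hierarchy described in this abstract can be sketched as a small tree of boxes. The sketch below is an illustration under stated assumptions, not the patented procedure: it assumes each query region lies fully inside one existing bucket, it subtracts the child's frequency from the parent (the abstract does not say how the parent is adjusted), and it merges only parent/child pairs. `Bucket`, `add_feedback`, and `merge_most_similar` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Bucket:
    lo: tuple            # lower corner of the rectangle
    hi: tuple            # upper corner of the rectangle
    freq: float          # rows in this bucket, excluding its children
    children: list = field(default_factory=list)

    def volume(self):
        v = 1.0
        for a, b in zip(self.lo, self.hi):
            v *= b - a
        return v

    def contains(self, lo, hi):
        return all(a <= x and y <= b
                   for a, b, x, y in zip(self.lo, self.hi, lo, hi))


def add_feedback(root, lo, hi, rows_returned):
    """Create a child bucket for the query region inside the deepest enclosing bucket."""
    parent = root
    while True:
        inner = next((c for c in parent.children if c.contains(lo, hi)), None)
        if inner is None:
            break
        parent = inner
    child = Bucket(lo, hi, rows_returned)
    # Assumption: rows now attributed to the child leave the parent.
    parent.freq = max(parent.freq - rows_returned, 0)
    parent.children.append(child)
    return child


def density(b):
    own = b.volume() - sum(c.volume() for c in b.children)
    return b.freq / own if own > 0 else 0.0


def count(b):
    return 1 + sum(count(c) for c in b.children)


def merge_most_similar(root):
    """Fold the child whose density is closest to its parent's back into the parent."""
    best, stack = None, [root]
    while stack:
        p = stack.pop()
        for c in p.children:
            gap = abs(density(p) - density(c))
            if best is None or gap < best[0]:
                best = (gap, p, c)
            stack.append(c)
    if best is not None:
        _, p, c = best
        p.freq += c.freq
        p.children.remove(c)
        p.children.extend(c.children)
```

A caller would invoke `merge_most_similar` repeatedly while `count(root)` exceeds the bucket budget.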
-
Publication No.: US5806061A
Publication date: 1998-09-08
Application No.: US859556
Filing date: 1997-05-20
Applicants: Surajit Chaudhuri , Luis Gravano
Inventors: Surajit Chaudhuri , Luis Gravano
CPC classification: G06F17/30017 , Y10S707/99933 , Y10S707/99945
Abstract: A method for optimizing the cost of searches through a multimedia repository is disclosed, where the repository contains a plurality of objects having at least two different attributes, such as color in a newspaper photograph and text in the subtitle. The method comprises selecting a ranking expression, translating the ranking expression into resulting filter conditions, and then optimizing the resulting filter conditions to perform the search. A database look-up step is included which determines the cost of performing searches over the various subconditions of the filter condition. The least costly subcondition is searched first to retrieve objects from the multimedia repository. The remaining subconditions are then evaluated on the retrieved objects using either a search step or a probe step, depending on the determined cost of performing each. A further database look-up step predicts the grade of match necessary in the translated ranking expression to retrieve at least the number of objects requested in the search.
-
Publication No.: US10032131B2
Publication date: 2018-07-24
Application No.: US13527601
Filing date: 2012-06-20
Applicants: Tao Cheng , Kris Ganjam , Kaushik Chakrabarti , Zhimin Chen , Vivek R. Narasayya , Surajit Chaudhuri
Inventors: Tao Cheng , Kris Ganjam , Kaushik Chakrabarti , Zhimin Chen , Vivek R. Narasayya , Surajit Chaudhuri
Abstract: A data service system is described herein which processes raw data assets from at least one network-accessible system (such as a search system), to produce processed data assets. Enterprise applications can then leverage the processed data assets to perform various environment-specific tasks. In one implementation, the data service system can generate any of: synonym resources for use by an enterprise application in providing synonyms for specified terms associated with entities; augmentation resources for use by an enterprise application in providing supplemental information for specified seed information; and spelling-correction resources for use by an enterprise application in providing spelling information for specified terms, and so on.
-
Publication No.: US08874592B2
Publication date: 2014-10-28
Application No.: US11427287
Filing date: 2006-06-28
Applicants: Gary W. Flake , William H. Gates, III , Eric J. Horvitz , Joshua T. Goodman , Surajit Chaudhuri , Trenholme J. Griffin , Oliver Hurst-Hiller , Kenneth A. Moss
Inventors: Gary W. Flake , William H. Gates, III , Eric J. Horvitz , Joshua T. Goodman , Surajit Chaudhuri , Trenholme J. Griffin , Oliver Hurst-Hiller , Kenneth A. Moss
IPC classification: G06F17/30
CPC classification: G06F17/30867 , G06F17/30061 , G06F17/30241 , G06F17/3087 , G06Q30/0261
Abstract: The subject disclosure pertains to web searches and more particularly to influencing resultant content to increase relevancy. The resultant content can be influenced by reconfiguring a query and/or filtering results based on user location and/or context information (e.g., user characteristics/profile, prior interaction/usage temporal, current events, and third party state/context ...). Furthermore, the disclosure provides for query execution on at least a subset of designated web content, for example as specified by a user. Further, a localized marketing system is disclosed that provides discount offers to users that match merchant criteria including proximity. A system for actively probing populations of users with different parameters and monitoring responses can be employed to collect data for identifying the best discounts and deadlines to offer to users to achieve desired results.
-
Publication No.: US20130275436A1
Publication date: 2013-10-17
Application No.: US13444717
Filing date: 2012-04-11
Applicants: Surajit Chaudhuri , Lev Novik , John C. Platt
Inventors: Surajit Chaudhuri , Lev Novik , John C. Platt
IPC classification: G06F17/30
CPC classification: G06F16/319 , G06F16/245
Abstract: Various embodiments promote the discoverability of data that can be contained within a database. In one or more embodiments, data within a database is organized in a structure having a schema. The structure and data can be processed in a manner that renders one or more pseudo-documents, each of which constitutes a sub-structure that can be indexed. Once produced and indexed, the pseudo-documents constitute a set of searchable objects, each of which relationally points back to its associated structure within the database. Searches can then be performed against the pseudo-documents, which in turn return a set of search results. The set of search results can include multiple sub-sets of pseudo-documents, each sub-set of which is associated with a different structure.
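One way to picture the pseudo-document idea is to render each table row as a small text document that remembers which table and row it came from, index those documents by term, and group search hits by source table. The sketch below assumes one pseudo-document per row and a plain inverted index; both are simplifications, and `render_pseudo_documents`, `build_index`, and `search` are hypothetical names rather than anything defined by the application.

```python
from collections import defaultdict


def render_pseudo_documents(tables):
    """Turn each row into a text document that points back to (table, row)."""
    docs = []
    for table_name, rows in tables.items():
        for row_id, row in enumerate(rows):
            text = " ".join(f"{col} {val}" for col, val in row.items())
            docs.append({"table": table_name, "row": row_id, "text": text.lower()})
    return docs


def build_index(docs):
    """Build an inverted index: term -> set of pseudo-document ids."""
    index = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for term in doc["text"].split():
            index[term].add(doc_id)
    return index


def search(term, docs, index):
    """Return matching row ids grouped by the table they point back to."""
    grouped = defaultdict(list)
    for doc_id in sorted(index.get(term.lower(), ())):
        grouped[docs[doc_id]["table"]].append(docs[doc_id]["row"])
    return dict(grouped)


tables = {
    "customers": [{"name": "Ada", "city": "Paris"}],
    "orders": [{"item": "map", "ship_to": "Paris"},
               {"item": "pen", "ship_to": "Rome"}],
}
docs = render_pseudo_documents(tables)
print(search("Paris", docs, build_index(docs)))
```

The grouped result mirrors the abstract's last sentence: one sub-set of hits per underlying structure.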
-
Publication No.: US20130091120A1
Publication date: 2013-04-11
Application No.: US13253315
Filing date: 2011-10-05
IPC classification: G06F17/30
CPC classification: G06F17/30303 , G06F17/30533
Abstract: A fuzzy joins system that is integrated in a database system generates fuzzy joins between records from two datasets. The fuzzy joins system includes a tokenizer to generate tokens for data records and a transformer to find transforms for the tokens. The fuzzy joins system invokes a signature generator, running within a runtime layer of the database system, to generate signatures for data records based on the tokens and their transforms. Subsequently, an equi-join operation joins records from the two datasets that share at least one equal signature. A similarity calculator, running within a runtime layer of the database system, computes a similarity measure using the token information of the joined records. If the similarity measure for any two records is above a threshold, the fuzzy joins system generates a fuzzy join between those two records.
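The pipeline in this abstract (tokenize, apply transforms, generate signatures, equi-join on signatures, then filter by similarity) can be sketched in a few lines. Everything concrete below is an assumption made for illustration: whitespace tokenization, a hypothetical `TRANSFORMS` table that maps tokens to a canonical form, using every transformed token as a signature, and Jaccard similarity as the measure.

```python
from collections import defaultdict

# Hypothetical transform table: token -> canonical token.
TRANSFORMS = {"bob": "robert", "st": "street"}


def tokens(text):
    """Tokenize on whitespace and apply the transforms."""
    return {TRANSFORMS.get(t, t) for t in text.lower().split()}


def fuzzy_join(left, right, threshold=0.5):
    # Signature index over the right-hand dataset (signature = transformed token).
    index = defaultdict(set)
    for j, rec in enumerate(right):
        for sig in tokens(rec):
            index[sig].add(j)

    pairs = []
    for rec in left:
        lt = tokens(rec)
        # Equi-join step: candidates share at least one signature.
        candidates = set().union(*(index.get(sig, set()) for sig in lt))
        for j in sorted(candidates):
            rt = tokens(right[j])
            similarity = len(lt & rt) / len(lt | rt)  # Jaccard
            if similarity > threshold:
                pairs.append((rec, right[j]))
    return pairs


print(fuzzy_join(["Bob Smith"], ["Robert Smith", "Alice Jones"]))
```

Using every token as a signature keeps the sketch short but produces many candidates; a real system would presumably use a more selective signature scheme so the equi-join prunes harder before similarity is computed.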
-
Publication No.: US08332388B2
Publication date: 2012-12-11
Application No.: US12818237
Filing date: 2010-06-18
CPC classification: G06F17/30463
Abstract: Technology is described for transformation rule profiling for a query optimizer. The method can include obtaining a database query configured to be optimized by the query optimizer of a database system. An optimized query plan for the database query can be found using a host set of transformation rules. One transformation rule can be removed and checked at a time. Each transformation rule can be checked to determine whether the transformation rule affects an optimal query plan output. A test query plan can be generated after each transformation rule has been removed. The query optimizer can determine whether the test query plan is different from the optimized query plan in the absence of the removed transformation rule. An equivalent set of transformation rules can be created that includes transformation rules where the test query plan generated from the equivalent set of transformation rules is equivalent to the optimized plan.
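The leave-one-out check described in this abstract can be illustrated with a toy optimizer. The sketch below is only a model of the idea: plans are plain strings, `optimize` exhaustively applies hypothetical rewrite rules and returns the cheapest plan, and `profile_rules` removes one rule at a time to see whether the chosen plan changes. None of these names or the cost function come from the patent.

```python
def optimize(query, rules, cost):
    """Toy optimizer: explore every plan reachable via the rules, return the cheapest."""
    seen = {query}
    frontier = [query]
    while frontier:
        plan = frontier.pop()
        for rule in rules.values():
            alt = rule(plan)
            if alt is not None and alt not in seen:
                seen.add(alt)
                frontier.append(alt)
    # Ties are broken by plan text so the result is deterministic.
    return min(seen, key=lambda p: (cost(p), p))


def profile_rules(query, rules, cost):
    """Remove one rule at a time and keep the rules whose absence changes the plan."""
    baseline = optimize(query, rules, cost)
    essential = {}
    for name in rules:
        reduced = {n: r for n, r in rules.items() if n != name}
        if optimize(query, reduced, cost) != baseline:
            essential[name] = rules[name]
    return baseline, essential


# Hypothetical rules and cost model for the demonstration.
RULES = {
    "use_index_a": lambda p: p.replace("scan(A)", "index(A)") if "scan(A)" in p else None,
    "commute": lambda p: " join ".join(reversed(p.split(" join "))),
}


def cost(plan):
    return 10 * plan.count("scan") + plan.count("index")


baseline, essential = profile_rules("scan(A) join scan(B)", RULES, cost)
print(baseline, sorted(essential))
```

Note that checking rules one at a time can miss interactions between rules (for example, two rules that can each substitute for the other), so a caller would want to confirm that re-optimizing with only the retained rules still yields the baseline plan.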
-
Publication No.: US20110314000A1
Publication date: 2011-12-22
Application No.: US12818237
Filing date: 2010-06-18
IPC classification: G06F17/30
CPC classification: G06F17/30463
Abstract: Technology is described for transformation rule profiling for a query optimizer. The method can include obtaining a database query configured to be optimized by the query optimizer of a database system. An optimized query plan for the database query can be found using a host set of transformation rules. One transformation rule can be removed and checked at a time. Each transformation rule can be checked to determine whether the transformation rule affects an optimal query plan output. A test query plan can be generated after each transformation rule has been removed. The query optimizer can determine whether the test query plan is different from the optimized query plan in the absence of the removed transformation rule. An equivalent set of transformation rules can be created that includes transformation rules where the test query plan generated from the equivalent set of transformation rules is equivalent to the optimized plan.
-
Publication No.: US08032546B2
Publication date: 2011-10-04
Application No.: US12031715
Filing date: 2008-02-15
Applicants: Arvind Arasu , Surajit Chaudhuri
Inventors: Arvind Arasu , Surajit Chaudhuri
CPC classification: G06F17/30569 , G06F17/30675 , G06F17/30985
Abstract: A transformation-based record matching technique. The technique provides a flexible way to account for synonyms and more general forms of string equivalences when performing record matching by taking as explicit input user-defined transformation rules (such as, for example, the fact that “Robert” and “Bob” are synonymous). The input string and user-defined transformation rules are used to generate a larger set of strings which are used when performing record matching. Both the input string and data elements in a database can be transformed using the user-defined transformation rules in order to generate a larger set of potential record matches. These potential record matches can then be subjected to a threshold test in order to determine one or more best matches. Additionally, signature-based similarity functions are used to improve the computational efficiency of the technique.
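The expansion step in this abstract (apply user-defined rules to both the input string and the stored records, then threshold the resulting matches) might look roughly like the sketch below. It assumes token-level rewrite rules, Jaccard similarity over tokens, and a brute-force comparison of all variants; the signature-based speed-up mentioned in the abstract is not modeled. `RULES`, `expand`, and `best_matches` are hypothetical names.

```python
from itertools import product

# Hypothetical user-defined transformation rules: token -> equivalent tokens.
RULES = {"bob": ["robert"], "robert": ["bob"]}


def expand(text, rules):
    """Generate every variant of `text` obtainable by rewriting individual tokens."""
    options = [[tok] + rules.get(tok, []) for tok in text.lower().split()]
    return {" ".join(choice) for choice in product(*options)}


def jaccard(a, b):
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def best_matches(query, records, rules, threshold=0.8):
    """Return records whose best variant-to-variant similarity passes the threshold."""
    query_variants = expand(query, rules)
    scored = []
    for rec in records:
        score = max(jaccard(q, r)
                    for q in query_variants
                    for r in expand(rec, rules))
        if score >= threshold:
            scored.append((score, rec))
    return [rec for score, rec in sorted(scored, key=lambda item: -item[0])]


print(best_matches("Bob Smith", ["Robert Smith", "Bob Jones"], RULES))
```

The number of variants grows multiplicatively with the number of rewritable tokens, which is one reason a practical system would avoid enumerating them all.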