Invention Grant
- Patent Title: Integrated fuzzy joins in database management systems
- Application No.: US13253315
- Application Date: 2011-10-05
- Publication No.: US09317544B2
- Publication Date: 2016-04-19
- Inventor: Kris Ganjam, Vivek Ravindranath Narasayya, Raghav Kaushik, Arvind Arasu, Surajit Chaudhuri
- Applicant: Kris Ganjam, Vivek Ravindranath Narasayya, Raghav Kaushik, Arvind Arasu, Surajit Chaudhuri
- Applicant Address: US WA Redmond
- Assignee: Microsoft Corporation
- Current Assignee: Microsoft Corporation
- Current Assignee Address: US WA Redmond
- Agent: Alin Corie; Sandy Swain; Micky Minhas
- Main IPC: G06F7/00
- IPC: G06F7/00; G06F17/30

Abstract:
A fuzzy joins system that is integrated in a database system generates fuzzy joins between records from two datasets. The fuzzy joins system includes a tokenizer to generate tokens for data records and a transformer to find transforms for the tokens. The fuzzy joins system invokes a signature generator, running within a runtime layer of the database system, to generate signatures for data records based on the tokens and their transforms. Subsequently, an equi-join operation joins the records from the two datasets with at least one equal signature. A similarity calculator, running within a runtime layer of the database system, computes a similarity measure using the token information of the joined records. If the similarity measure for any two records is above a threshold, the fuzzy joins system generates a fuzzy join between such two records.
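
The pipeline described in the abstract (tokenize, transform, generate signatures, equi-join on signatures, then verify with a similarity measure) can be illustrated with a minimal Python sketch. This is not the patented implementation. The transform table, the choice of using every transformed token as a signature, Jaccard similarity, and all function names below are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical transform table (an assumption, not from the patent):
# maps a token to an equivalent canonical token.
TRANSFORMS = {"corp": "corporation", "inc": "incorporated", "st": "street"}


def tokenize(text):
    """Lowercase, split on whitespace, and strip trailing/leading '.' and ','."""
    return [t.strip(".,") for t in text.lower().split() if t.strip(".,")]


def expand(tokens):
    """Apply the transform table and return the resulting token set."""
    return frozenset(TRANSFORMS.get(t, t) for t in tokens)


def signatures(token_set):
    """Simplest possible scheme (assumed here): each token is a signature."""
    return token_set


def jaccard(a, b):
    """Jaccard similarity of two token sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fuzzy_join(left, right, threshold=0.5):
    """Return (left_record, right_record, score) for pairs above threshold."""
    right_tokens = [expand(tokenize(r)) for r in right]

    # Build a signature -> right-record index; probing it below plays the
    # role of the equi-join on equal signatures.
    index = defaultdict(set)
    for j, toks in enumerate(right_tokens):
        for sig in signatures(toks):
            index[sig].add(j)

    results = []
    for l in left:
        ltoks = expand(tokenize(l))
        # Candidate pairs share at least one signature.
        candidates = set()
        for sig in signatures(ltoks):
            candidates |= index.get(sig, set())
        # Verify each candidate with the similarity measure.
        for j in sorted(candidates):
            score = jaccard(ltoks, right_tokens[j])
            if score > threshold:
                results.append((l, right[j], score))
    return results


if __name__ == "__main__":
    left = ["Microsoft Corp.", "Main St Bakery"]
    right = ["Microsoft Corporation", "Bakery on Main Street", "Contoso Inc"]
    for pair in fuzzy_join(left, right):
        print(pair)
```

In this sketch the signature step only narrows the candidate pairs; the similarity check decides which pairs are kept. A production signature scheme would emit far fewer signatures per record than one per token.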
Public/Granted literature
- US20130091120A1 INTEGRATED FUZZY JOINS IN DATABASE MANAGEMENT SYSTEMS, Public/Granted day: 2013-04-11