Finding Related Entities For Search Queries
    11.
    Patent application
    Finding Related Entities For Search Queries (status: in force)

    Publication No.: US20080306908A1

    Publication date: 2008-12-11

    Application No.: US11758024

    Filing date: 2007-06-05

    IPC classification: G06F17/30

    CPC classification: G06F17/278 G06F17/30864

    Abstract: Architecture for finding related entities for web search queries. An extraction component takes a document as input and outputs all the mentions (or occurrences) of named entities such as names of people, organizations, locations, and products in the document, as well as entity metadata. An indexing component takes a document identifier (docID) and the set of mentions of named entities, and stores and indexes the information for retrieval. A document-based search component takes a keyword query and returns the docIDs of the top documents matching the query. A retrieval component takes a docID as input, accesses the information stored by the indexing component, and returns the set of mentions of named entities in the document. This information is then passed to an entity scoring and thresholding component that computes an aggregate score for each entity and selects the entities to return to the user.
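The retrieval, scoring, and thresholding steps described in this abstract can be sketched roughly as follows. This is only an illustration: the abstract does not say how the aggregate score is computed, so the rank-discounted sum of mention counts, the `MENTION_INDEX` dictionary, and the function names `retrieve_mentions` and `related_entities` are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the indexing component:
# docID -> list of (entity, mention_count) pairs.
MENTION_INDEX = {
    "d1": [("Seattle", 3), ("Microsoft", 2)],
    "d2": [("Microsoft", 4), ("Bill Gates", 1)],
    "d3": [("Seattle", 1)],
}


def retrieve_mentions(doc_id):
    """Retrieval component: return the stored entity mentions for a docID."""
    return MENTION_INDEX.get(doc_id, [])


def related_entities(ranked_doc_ids, threshold):
    """Aggregate a score per entity over the top documents, then threshold.

    Assumed scoring rule (not taken from the patent): each mention count is
    divided by the 1-based rank of the document it appears in, and the
    results are summed per entity.
    """
    scores = defaultdict(float)
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        for entity, count in retrieve_mentions(doc_id):
            scores[entity] += count / rank
    kept = [(entity, score) for entity, score in scores.items() if score >= threshold]
    # Highest aggregate score first.
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

Calling `related_entities(["d1", "d2", "d3"], threshold=1.0)` on the sample index keeps "Microsoft" and "Seattle" and drops "Bill Gates", whose aggregate score of 0.5 falls below the threshold.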

    DESIGNING RECORD MATCHING QUERIES UTILIZING EXAMPLES
    12.
    Patent application
    DESIGNING RECORD MATCHING QUERIES UTILIZING EXAMPLES (status: in force)

    Publication No.: US20070294221A1

    Publication date: 2007-12-20

    Application No.: US11424191

    Filing date: 2006-06-14

    IPC classification: G06F17/30

    Abstract: The subject disclosure pertains to a powerful and flexible framework for record matching. The framework facilitates design of a record matching query or package composed of a set of well-defined primitive operators (e.g., relational, data cleaning ...), which can ultimately be executed to match records. To assist design of such packages, a learning technique based on examples is provided. More specifically, a set of matching and non-matching record pairs can be input and employed to facilitate automatic package generation. A generated package can subsequently be transformed manually and/or automatically into a semantically equivalent form optimized for execution.

    Detecting duplicate records in database
    14.
    Granted patent
    Detecting duplicate records in database (status: in force)

    Publication No.: US06961721B2

    Publication date: 2005-11-01

    Application No.: US10186031

    Filing date: 2002-06-28

    IPC classification: G06F17/30 G06F7/00

    Abstract: The invention concerns the detection of duplicate tuples in a database. Previous domain-independent detection of duplicated tuples relied on standard similarity functions (e.g., edit distance, cosine metric) between multi-attribute tuples. However, such prior art approaches result in large numbers of false positives if they are used to identify domain-specific abbreviations and conventions. In accordance with the invention, a process for duplicate detection is implemented based on interpreting records from multiple dimensional tables in a data warehouse, which are associated with hierarchies specified through key-foreign key relationships in a snowflake schema. The invention exploits the extra knowledge available from the table hierarchy to develop a high-quality, scalable duplicate detection process.

    Segmentation of strings into structured records
    15.
    Patent application
    Segmentation of strings into structured records (status: in force)

    Publication No.: US20050234906A1

    Publication date: 2005-10-20

    Application No.: US10825488

    Filing date: 2004-04-14

    IPC classification: G06F7/00 G06F17/30

    Abstract: A system for segmenting strings into component parts for use with a database management system. A reference table of string records is segmented into multiple substrings corresponding to database attributes. The substrings within an attribute are analyzed to provide a state model that assumes a beginning, a middle, and an ending token topology for that attribute. A null token takes into account an empty attribute component, and copying of states allows for erroneous token insertions and misordering. Once the model is created from the clean data, the process breaks or parses an input record into a sequence of tokens. The process then determines a most probable segmentation of the input record by comparing the tokens of the input record with the state models derived for attributes from the reference table.
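A much-simplified sketch of the idea in this abstract is shown below. It is not the patented method: instead of a probabilistic state model it only records which tokens were seen at the beginning, middle, and end of each attribute in the reference table, and it scores candidate segmentations by counting positional matches. Null tokens and state copying are not modeled, attributes are assumed to appear in a fixed order, and the names `build_models`, `score_segment`, and `segment` are hypothetical.

```python
from itertools import combinations_with_replacement


def build_models(reference_rows, attributes):
    """For each attribute, collect tokens seen at the beginning, middle,
    and end of its values in the clean reference table."""
    models = {a: {"begin": set(), "middle": set(), "end": set()} for a in attributes}
    for row in reference_rows:
        for attr, value in zip(attributes, row):
            tokens = value.lower().split()
            if not tokens:
                continue
            models[attr]["begin"].add(tokens[0])
            models[attr]["end"].add(tokens[-1])
            models[attr]["middle"].update(tokens[1:-1])
    return models


def score_segment(tokens, model):
    """Count how many tokens sit in a position the model has seen before."""
    if not tokens:
        return 0
    score = int(tokens[0] in model["begin"]) + int(tokens[-1] in model["end"])
    score += sum(1 for t in tokens[1:-1] if t in model["middle"])
    return score


def segment(record, models, attributes):
    """Try every ordered split of the record's tokens into one segment per
    attribute and return the highest-scoring split (first one on ties)."""
    tokens = record.lower().split()
    n, k = len(tokens), len(attributes)
    best_score, best_parts = -1, None
    # k - 1 non-decreasing cut points; repeated cuts give empty segments.
    for cuts in combinations_with_replacement(range(n + 1), k - 1):
        bounds = (0,) + cuts + (n,)
        parts = [tokens[bounds[i]:bounds[i + 1]] for i in range(k)]
        score = sum(score_segment(p, models[a]) for p, a in zip(parts, attributes))
        if score > best_score:
            best_score, best_parts = score, parts
    return {a: " ".join(p) for a, p in zip(attributes, best_parts)}


ATTRIBUTES = ["name", "address", "city"]
REFERENCE = [
    ("Jane Doe", "12 Main Street", "Seattle"),
    ("John Smith", "40 Oak Avenue", "Portland"),
]
MODELS = build_models(REFERENCE, ATTRIBUTES)
```

With this toy reference table, `segment("John Doe 12 Oak Street Seattle", MODELS, ATTRIBUTES)` splits the input into name "john doe", address "12 oak street", and city "seattle". Exhaustive enumeration is used only to keep the sketch short; it would not scale to long records.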

    Identifying entity synonyms
    16.
    Granted patent

    Publication No.: US09600566B2

    Publication date: 2017-03-21

    Application No.: US12779964

    Filing date: 2010-05-14

    IPC classification: G06F17/00 G06F17/30 G06F17/27

    CPC classification: G06F17/30684 G06F17/2795

    Abstract: Embodiments for identifying an entity synonym of an entity are described. A query log is stored in a database located on at least one computing device. A candidate generation module can select a candidate query in the query log that shares a click on a URL with the entity. A correlated tag module can generate a set of phrase-tag pairs for the entity and the candidate query and measure a mutual information value for each phrase-tag pair. A candidate filtering module can determine a click similarity value between the candidate query and the entity based on a set of URLs selected in the search engine results and a tag similarity value based on the mutual information values. A candidate query is selected as an entity synonym if the click similarity value and the tag similarity value are greater than their respective predetermined thresholds.
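The final filtering step in this abstract can be illustrated with a short sketch. The abstract does not define the click similarity measure, so Jaccard overlap of the clicked-URL sets is used here as an assumption. The tag similarity is taken as a precomputed input rather than derived from mutual information values, and the function names `click_similarity` and `is_synonym` are hypothetical.

```python
def click_similarity(entity_urls, candidate_urls):
    """Jaccard overlap of the URL sets clicked for the entity and for the
    candidate query (an assumed measure, not specified in the abstract)."""
    a, b = set(entity_urls), set(candidate_urls)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_synonym(click_sim, tag_sim, click_threshold, tag_threshold):
    """Accept the candidate only if both similarity values exceed their
    own thresholds, as the abstract requires."""
    return click_sim > click_threshold and tag_sim > tag_threshold
```

For example, clicked-URL sets `{"u1", "u2", "u3"}` and `{"u2", "u3", "u4"}` share two of four distinct URLs, giving a click similarity of 0.5. The candidate is accepted only when the tag similarity also clears its threshold.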

    Assisted query formation, validation, and result previewing in a database having a complex schema

    Publication No.: US08996559B2

    Publication date: 2015-03-31

    Application No.: US14058184

    Filing date: 2013-10-18

    IPC classification: G06F17/30

    Abstract: Disclosed are a method, a device and/or a system of assisted query formation, validation, and result previewing in a database having a complex schema. In one aspect, a method of a query editor includes generating a data profile which includes a set of characteristics captured at various granularities of an initial result set generated from an initial query using a processor and a memory. The method determines what a user expects in the initial result set of the initial query and/or a subsequent result set of a subsequent query based on the data profile and/or a heuristically estimated data profile. The method includes enabling the user to evaluate a semantic accuracy of the subsequent query based on the likely expectation of the user as determined through the set of characteristics of the data profile. The set of characteristics may include metadata of the initial query.

    Assisted query formation, validation, and result previewing in a database having a complex schema
    18.
    Granted patent
    Assisted query formation, validation, and result previewing in a database having a complex schema (status: in force)

    Publication No.: US08965915B2

    Publication date: 2015-02-24

    Application No.: US14058189

    Filing date: 2013-10-18

    IPC classification: G06F17/30

    Abstract: Disclosed are a method, a device and/or a system of assisted query formation, validation, and result previewing in a database having a complex schema. In one aspect, a method of a query editor includes generating a data profile which includes a set of characteristics captured at various granularities of an initial result set generated from an initial query using a processor and a memory. The method determines what a user expects in the initial result set of the initial query and/or a subsequent result set of a subsequent query based on the data profile and/or a heuristically estimated data profile. The method includes enabling the user to evaluate a semantic accuracy of the subsequent query based on the likely expectation of the user as determined through the set of characteristics of the data profile. The set of characteristics may include metadata of the initial query.

    Curated answers community automatically populated through user query monitoring
    19.
    Granted patent
    Curated answers community automatically populated through user query monitoring (status: in force)

    Publication No.: US08935272B2

    Publication date: 2015-01-13

    Application No.: US14058206

    Filing date: 2013-10-18

    IPC classification: G06F7/00 G06F17/30

    Abstract: In one embodiment, a method of a curated answers system includes automatically populating a profile markup page of a user with information describing an initial query of a database that the user has generated using a processor and a memory, determining that another user of the database has submitted a similar query that is semantically proximate to the initial query of the database that the user has generated, and presenting the profile markup page of the user to the other user. The method of the curated answers system may include enabling the other user to communicate with the user through a communication channel on the profile markup page. A question of the other user may be published to the user on the profile markup page of the user, and/or another profile markup page of the other user. The question may be associated as being posted by the other user.

    EDITABLE AND SEARCHABLE MARKUP PAGES AUTOMATICALLY POPULATED THROUGH USER QUERY MONITORING
    20.
    Patent application
    EDITABLE AND SEARCHABLE MARKUP PAGES AUTOMATICALLY POPULATED THROUGH USER QUERY MONITORING (status: in force)

    Publication No.: US20140279845A1

    Publication date: 2014-09-18

    Application No.: US14058208

    Filing date: 2013-10-18

    IPC classification: G06F17/30

    Abstract: Disclosed are a method, a device and/or a system of editable and searchable markup pages automatically populated through query monitoring of users of a database. In one aspect, a method includes automatically generating an editable markup page and/or a page name based on an initial query of a database using a processor and a memory, associating the generated markup page with a user of the database, and appending information to the editable markup page based on a similar query of the database by another user. The method may include permitting other users of the database to access, modify, append, and/or delete entries from the editable markup page.
