Spelling correction with liaoalphagrams and inverted index
    1.
    Granted patent (in force)

    Publication No.: US07856598B2

    Publication date: 2010-12-21

    Application No.: US11481750

    Filing date: 2006-07-06

    CPC classification: G06F17/273

    Abstract: Systems, methods, media, and other embodiments associated with (non)contiguous n-gram based spell correction are described. One exemplary system embodiment includes logic for creating contiguous and non-contiguous trigrams, logic for creating an inverted index relating trigrams and the words from which they were generated, and logic for comparing trigrams associated with a word to spell check to trigrams associated with the words selected using the inverted index.
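As a rough illustration of the trigram-plus-inverted-index idea in the abstract above, here is a minimal Python sketch. The abstract does not define how non-contiguous trigrams are formed, so this sketch assumes they are trigrams that skip one character inside a four-character window. The `$` boundary marker, the Jaccard similarity used for the comparison step, and the names `trigrams`, `build_index`, and `suggest` are likewise hypothetical choices, not the patented method.

```python
from collections import defaultdict


def trigrams(word: str) -> set[str]:
    """Contiguous trigrams plus one-gap ("non-contiguous") trigrams of a word."""
    w = f"${word.lower()}$"  # '$' marks word boundaries (an assumption of this sketch)
    grams = {w[i:i + 3] for i in range(len(w) - 2)}  # contiguous trigrams
    for i in range(len(w) - 3):
        a, b, c, d = w[i:i + 4]
        grams.add(a + c + d)  # skip the 2nd character of the 4-char window
        grams.add(a + b + d)  # skip the 3rd character of the 4-char window
    return grams


def build_index(dictionary: list[str]) -> tuple[dict[str, set[str]], dict[str, set[str]]]:
    """Inverted index: trigram -> dictionary words that produced it."""
    index: dict[str, set[str]] = defaultdict(set)
    grams_of: dict[str, set[str]] = {}
    for word in dictionary:
        grams_of[word] = trigrams(word)
        for gram in grams_of[word]:
            index[gram].add(word)
    return index, grams_of


def suggest(word: str, index, grams_of, k: int = 3) -> list[str]:
    query = trigrams(word)
    # Only words sharing at least one trigram with the query are compared.
    candidates = set().union(*(index.get(g, set()) for g in query))

    def similarity(candidate: str) -> float:
        # Jaccard overlap of the two trigram sets (hypothetical scoring choice).
        other = grams_of[candidate]
        return len(query & other) / len(query | other)

    return sorted(candidates, key=lambda c: (-similarity(c), c))[:k]


index, grams_of = build_index(["receive", "recipe", "deceive", "relieve"])
print(suggest("recieve", index, grams_of))  # ['receive', 'relieve', 'deceive']
```

Because candidates come only from the inverted index, the misspelled word is compared against words that share at least one trigram with it rather than against the whole dictionary.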

    Combined database index of unstructured and structured columns
    2.
    Granted patent (in force)

    Publication No.: US06980976B2

    Publication date: 2005-12-27

    Application No.: US09928894

    Filing date: 2001-08-13

    IPC classification: G06F7/00 G06F17/30

    Abstract: A database management system and method provides access to a data table having structured data and unstructured data. A user interface allows a user to issue instructions to the database management system such as to build an index based on the structured and unstructured data and to search the data table. An indexing logic generates an index structure by combining the structured and unstructured data. With this index structure, a single query can contain search conditions from both the structured data and the unstructured data. In this manner, efficiency for searching the data table for combined structured and unstructured conditions is improved.
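The abstract does not say how structured and unstructured data are combined into one index structure. One simple way to get the single-query behavior it describes, assumed here purely for illustration, is to store structured column values as pseudo-tokens (`column=value`) in the same inverted index as the text words. `build_combined_index` and `search` are hypothetical names, and the AND-only query semantics is a simplification.

```python
from collections import defaultdict

Row = dict[str, object]


def build_combined_index(rows: list[Row], text_col: str, struct_cols: list[str]) -> dict[str, set[int]]:
    """One inverted index holding both free-text tokens and structured values."""
    index: dict[str, set[int]] = defaultdict(set)
    for row in rows:
        for token in str(row[text_col]).lower().split():
            index[token].add(row["id"])  # unstructured: word -> row ids
        for col in struct_cols:
            index[f"{col}={row[col]}"].add(row["id"])  # structured value as a pseudo-token
    return index


def search(index, text_terms: list[str], struct_conds: dict[str, object]) -> list[int]:
    """Single query mixing text terms and structured conditions (AND semantics)."""
    keys = [t.lower() for t in text_terms] + [f"{c}={v}" for c, v in struct_conds.items()]
    result = None
    for key in keys:
        postings = index.get(key, set())
        result = set(postings) if result is None else result & postings
    return sorted(result or set())


rows = [
    {"id": 1, "category": "camera", "description": "Compact digital camera with zoom"},
    {"id": 2, "category": "phone", "description": "Digital phone with camera"},
    {"id": 3, "category": "camera", "description": "Classic film camera"},
]
index = build_combined_index(rows, "description", ["category"])
print(search(index, ["digital"], {"category": "camera"}))  # [1]
```

Both kinds of condition are answered by intersecting postings from the same structure, so no separate lookup against the structured columns is needed.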

    Method and system for response time optimization of data query rankings and retrieval
    3.
    Granted patent (in force)

    Publication No.: US06947920B2

    Publication date: 2005-09-20

    Application No.: US09885356

    Filing date: 2001-06-20

    Applicant: Shamim A. Alpha

    Inventor: Shamim A. Alpha

    IPC classification: G06F17/30

    Abstract: A method and system for optimizing response time for data query rankings and retrieval is provided. In response to a received search query that contains one or more terms, an information retrieval system identifies a candidate set of documents that match any of the terms. Terms are assigned a term weight making them more or less relevant in relation to other terms. A ranking logic defines score bins from a total score range based on possible matched term weights. A relationship is established that classifies a document into a score bin based on a sum of term weights from matched terms. Documents that match more term weights have higher total relevance scores than documents that match less term weights. The most relevant documents are retrievable without having to retrieve the entire set of candidate documents and without having to compute total relevance scores for all the candidate documents.
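A minimal Python sketch of bin-based retrieval, under stated assumptions: each bin here is one exact achievable sum of matched term weights (the abstract only says bins are defined from the total score range), integer weights are used so sums compare exactly, and `full_score` is a hypothetical stand-in for the expensive total relevance computation that the method tries to avoid for low-ranked candidates.

```python
from collections import defaultdict
from typing import Callable


def top_k_by_bins(query_weights: dict[str, int], postings: dict[str, set[str]],
                  k: int, full_score: Callable[[str], float]) -> list[str]:
    # Bin each candidate by the summed weight of the query terms it matches,
    # using only the postings lists (cheap).
    matched: dict[str, int] = defaultdict(int)
    for term, weight in query_weights.items():
        for doc in postings.get(term, set()):
            matched[doc] += weight
    bins: dict[int, list[str]] = defaultdict(list)
    for doc, total in matched.items():
        bins[total].append(doc)

    # Visit bins from the highest matched weight down; the (expensive) full
    # relevance score is computed only for documents in bins actually visited.
    result: list[str] = []
    for total in sorted(bins, reverse=True):
        result.extend(sorted(bins[total], key=full_score, reverse=True))
        if len(result) >= k:
            break
    return result[:k]


scored: list[str] = []


def full_score(doc: str) -> float:  # hypothetical expensive scorer; records its calls
    scored.append(doc)
    return {"d1": 0.5, "d2": 0.4, "d3": 0.9}.get(doc, 0.0)


weights = {"oracle": 3, "index": 2, "text": 1}
postings = {"oracle": {"d1", "d2"}, "index": {"d1", "d3"}, "text": {"d3", "d4"}}
top = top_k_by_bins(weights, postings, 2, full_score)
print(top)  # ['d1', 'd3']
```

In the example, `d4` falls in the lowest bin (matched weight 1) and is never passed to `full_score`.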

    Spelling correction with liaoalphagrams and inverted index
    4.
    Patent application (in force)

    Publication No.: US20080010316A1

    Publication date: 2008-01-10

    Application No.: US11481750

    Filing date: 2006-07-06

    IPC classification: G06F17/00 G06F17/30 G06F7/00

    CPC classification: G06F17/273

    Abstract: Systems, methods, media, and other embodiments associated with (non)contiguous n-gram based spell correction are described. One exemplary system embodiment includes logic for creating contiguous and non-contiguous trigrams, logic for creating an inverted index relating trigrams and the words from which they were generated, and logic for comparing trigrams associated with a word to spell check to trigrams associated with the words selected using the inverted index.

    Document ranking with sub-query series
    5.
    Granted patent (in force)

    Publication No.: US07849077B2

    Publication date: 2010-12-07

    Application No.: US11481686

    Filing date: 2006-07-06

    IPC classification: G06F17/30

    CPC classification: G06F17/30672 G06F17/30864

    Abstract: Systems, methods, media, and other embodiments associated with ranking documents by providing a search engine with a series of sub-queries generated from an original query are described. One example system includes input logic for receiving a query. The example system may include a relaxation logic configured to produce sub-queries from the query. The sub-queries may describe metadata string matching, content string matching, and/or metadata numerical attribute analysis. The sub-queries may be provided by an output logic to a search engine in an order that facilitates defining document relevance without requiring post-retrieval relevance ranking.
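The sketch below illustrates the sub-query idea with an in-memory list of documents standing in for a search engine. The five relaxation steps, their order, the `title`/`body` fields, and the names `relax` and `ranked_search` are assumptions for illustration; the metadata numerical attribute analysis mentioned in the abstract is not modeled. A document's rank is fixed by the first (strictest) sub-query that returns it, so no relevance scores are computed after retrieval.

```python
from typing import Callable

Doc = dict[str, str]


def words(text: str) -> list[str]:
    return text.lower().split()


def relax(query: str) -> list[tuple[str, Callable[[Doc], bool]]]:
    """Sub-queries ordered from strictest to loosest (hypothetical relaxation steps)."""
    terms = words(query)
    phrase = " ".join(terms)
    return [
        ("title phrase", lambda d: phrase in d["title"].lower()),
        ("title all terms", lambda d: all(t in words(d["title"]) for t in terms)),
        ("body phrase", lambda d: phrase in d["body"].lower()),
        ("body all terms", lambda d: all(t in words(d["body"]) for t in terms)),
        ("body any term", lambda d: any(t in words(d["body"]) for t in terms)),
    ]


def ranked_search(query: str, docs: list[Doc], limit: int = 10) -> list[str]:
    ranking: list[str] = []
    seen: set[int] = set()
    for _name, matches in relax(query):  # each pass stands in for one search-engine call
        for i, doc in enumerate(docs):
            if i not in seen and matches(doc):
                seen.add(i)  # the earliest (strictest) matching sub-query fixes the rank
                ranking.append(doc["title"])
        if len(ranking) >= limit:
            break
    return ranking[:limit]


docs = [
    {"title": "Spell checking", "body": "oracle text index for spell checking"},
    {"title": "Oracle Text index tuning", "body": "tuning guide"},
    {"title": "Notes", "body": "text about an index and oracle"},
    {"title": "Unrelated", "body": "nothing here"},
]
print(ranked_search("oracle text index", docs))
# ['Oracle Text index tuning', 'Spell checking', 'Notes']
```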

    Document ranking with sub-query series
    6.
    Patent application (in force)

    Publication No.: US20080010268A1

    Publication date: 2008-01-10

    Application No.: US11481686

    Filing date: 2006-07-06

    IPC classification: G06F17/30 G06F7/00

    CPC classification: G06F17/30672 G06F17/30864

    Abstract: Systems, methods, media, and other embodiments associated with ranking documents by providing a search engine with a series of sub-queries generated from an original query are described. One example system includes input logic for receiving a query. The example system may include a relaxation logic configured to produce sub-queries from the query. The sub-queries may describe metadata string matching, content string matching, and/or metadata numerical attribute analysis. The sub-queries may be provided by an output logic to a search engine in an order that facilitates defining document relevance without requiring post-retrieval relevance ranking.

    Methods and systems for determining a language of a document
    7.
    Granted patent (in force)

    Publication No.: US07191116B2

    Publication date: 2007-03-13

    Application No.: US09884403

    Filing date: 2001-06-19

    Applicant: Shamim A. Alpha

    Inventor: Shamim A. Alpha

    IPC classification: G06F17/20

    CPC classification: G06F17/275

    Abstract: A system and method for determining the language of an unknown document is provided. For a set of candidate languages, a negative assumption is set for each candidate language that the document is not that language and the system attempts to prove the negative assumption is wrong. If the negative assumption fails for one language, then the document is identified as being in that language. The present system and method provides a higher degree of accuracy when determining the language of a document.
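A minimal Python sketch of the negative-assumption approach, assuming that the evidence used to refute "the document is not in language L" is the presence of at least `min_hits` distinct common function words of L. The marker word lists, `min_hits`, and `detect_language` are hypothetical; the abstract does not specify what evidence the system uses.

```python
from typing import Optional

# Hypothetical marker words per candidate language (illustrative only).
MARKERS: dict[str, set[str]] = {
    "english": {"the", "and", "is", "of", "to"},
    "french": {"le", "la", "et", "est", "les"},
    "german": {"der", "die", "und", "ist", "das"},
}


def detect_language(text: str, markers: dict[str, set[str]] = MARKERS,
                    min_hits: int = 2) -> Optional[str]:
    tokens = set(text.lower().split())
    # Start from "the document is NOT in language L" for every L and try to refute it.
    refuted = [lang for lang, vocab in markers.items() if len(tokens & vocab) >= min_hits]
    # Identify the language only when exactly one negative assumption fails.
    return refuted[0] if len(refuted) == 1 else None


print(detect_language("the cat is on the mat"))     # english
print(detect_language("le chat est sur la table"))  # french
print(detect_language("hello world"))               # None
```

The sketch reads "fails for one language" as "fails for exactly one language" and returns `None` otherwise; how the patented system handles zero or several refuted assumptions is not stated in the abstract.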

    Method and system of language detection
    8.
    Granted patent (in force)

    Publication No.: US07979266B2

    Publication date: 2011-07-12

    Application No.: US11700672

    Filing date: 2007-01-31

    Applicant: Shamim A. Alpha

    Inventor: Shamim A. Alpha

    IPC classification: G06F17/20

    CPC classification: G06F17/275

    Abstract: Systems, methods, computer-readable media and other embodiments are provided for automatically determining a language of a document from a set of candidate languages. In one embodiment, a system includes a logic for setting an assumption value associated with each of the languages of the set of candidate languages where the assumption value indicates that the document is not in the language. A language analyzer determines the language and generates an output that indicates that the document is one language of the candidate languages when the assumption value for the one language passes a threshold value.
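The abstract leaves open how the assumption value is updated and in which direction the threshold is "passed". This sketch assumes the value starts at 1.0 (the document is fully assumed not to be in the language), is multiplied by a hypothetical `decay` factor for every token found in that language's lexicon, and that the language is reported once its value drops below `threshold`. The lexicons, parameter values, and the name `detect_streaming` are illustrative only.

```python
from typing import Iterable, Optional


def detect_streaming(tokens: Iterable[str], lexicons: dict[str, set[str]],
                     decay: float = 0.5, threshold: float = 0.1) -> Optional[str]:
    # Assumption value 1.0 means "fully assume the document is NOT in this language".
    value = {lang: 1.0 for lang in lexicons}
    for token in tokens:
        word = token.lower()
        for lang, lexicon in lexicons.items():
            if word in lexicon:
                value[lang] *= decay  # each hit is evidence against the negative assumption
                if value[lang] < threshold:  # threshold crossed: report this language
                    return lang
    return None  # no language crossed the threshold


lexicons = {
    "english": {"the", "and", "is", "of"},
    "german": {"der", "und", "ist", "das"},
}
print(detect_streaming("the house of the rising sun and more".split(), lexicons))  # english
```

With `decay=0.5` and `threshold=0.1`, four lexicon hits are needed (1.0 to 0.5, 0.25, 0.125, 0.0625). Because the function returns as soon as one language crosses the threshold, it can stop reading a long document early.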

    Linguistically aware link analysis method and system

    Publication No.: US07010527B2

    Publication date: 2006-03-07

    Application No.: US09928962

    Filing date: 2001-08-13

    Applicant: Shamim A. Alpha

    Inventor: Shamim A. Alpha

    IPC classification: G06F17/30

    Abstract: A method and system for determining relevance rankings for pages identified in a search query is provided. In response to the search query, an information retrieval system identifies candidate pages/documents from a network that potentially match the search query. A relevance ranking system determines a relevance value for each of candidate pages so that the most relevant pages are displayed to a user. The relevance value is based on a combination of content-based relevance values of the pages and link values determined from a link structure of the pages. A link value is a function of a probability that a user will follow the link as compared to following all other links. With the present invention, improved relevance rankings are obtained for a candidate set of pages.
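The abstract defines a link value through the probability of following a link relative to all other links, and combines it with a content-based relevance value. The sketch below makes two assumptions to show that arithmetic: the "linguistic awareness" is modeled as a higher hypothetical weight for links whose source and target pages share a language, and the combination is a linear mix controlled by a hypothetical `alpha`. It computes a single step over incoming links rather than an iterative, PageRank-style propagation; `link_values` and `rank` are illustrative names.

```python
from collections import defaultdict


def link_values(links: dict[str, list[str]], page_lang: dict[str, str],
                same_lang: float = 2.0, other_lang: float = 1.0) -> dict[str, float]:
    """Sum, over incoming links, of the probability that a user follows that link."""
    values: dict[str, float] = defaultdict(float)
    for src, targets in links.items():
        # Hypothetical linguistic weighting: same-language links are likelier to be followed.
        weights = [same_lang if page_lang[t] == page_lang[src] else other_lang for t in targets]
        total = sum(weights)
        for target, w in zip(targets, weights):
            values[target] += w / total  # P(follow this link) vs. all links on src
    return dict(values)


def rank(candidates: list[str], content_score: dict[str, float],
         links: dict[str, list[str]], page_lang: dict[str, str],
         alpha: float = 0.7) -> list[str]:
    lv = link_values(links, page_lang)
    combined = {p: alpha * content_score[p] + (1 - alpha) * lv.get(p, 0.0) for p in candidates}
    return sorted(candidates, key=combined.__getitem__, reverse=True)


links = {"a": ["b", "c"], "d": ["b"]}
page_lang = {"a": "en", "b": "en", "c": "fr", "d": "fr"}
print(rank(["b", "c"], {"b": 0.5, "c": 0.6}, links, page_lang))  # ['b', 'c']
```

Page `c` has the higher content score, but `b` receives a same-language link from `a` (probability 2/3) plus the only link from `d` (probability 1), so its link value lifts it to the top of the combined ranking.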