Extraction of information from documents
    21.
    Granted invention patent
    Extraction of information from documents (In force)

    Publication number: US07469251B2

    Publication date: 2008-12-23

    Application number: US11192687

    Filing date: 2005-07-29

    IPC classification: G06F17/30

    CPC classification: G06F17/211, Y10S707/99935

    Abstract: An information extraction model is trained on format features identified within labeled training documents. Information is extracted from a document by assigning labels to units based on the format features of the units within the document. A begin label and an end label are identified, and the information between the begin label and the end label is extracted. The extracted information can be used in various document processing tasks such as ranking.
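
The extraction step described in the abstract can be sketched as follows. This is only an illustration, not the patented method: the label names `BEGIN`, `END`, and `OTHER`, the function `extract_between_labels`, and the choice to include the labeled units themselves in the result are all assumptions, and the labels are written by hand here rather than predicted by a trained model.

```python
from typing import List, Optional


def extract_between_labels(units: List[str], labels: List[str]) -> Optional[str]:
    """Return the units from the first BEGIN label through the next END label.

    Hypothetical helper: returns None when either label is missing.
    """
    try:
        begin = labels.index("BEGIN")
        end = labels.index("END", begin)  # search for END at or after BEGIN
    except ValueError:
        return None
    return " ".join(units[begin:end + 1])


# Hand-assigned labels stand in for the output of a trained model.
units = ["Annual", "Report", "2004", "Prepared", "by", "Finance"]
labels = ["BEGIN", "OTHER", "END", "OTHER", "OTHER", "OTHER"]
title = extract_between_labels(units, labels)
```

In a real system the labels would come from a classifier over format features such as font size or position, which this sketch leaves out.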

    Mining latent associations of objects using a typed mixture model
    22.
    Invention application
    Mining latent associations of objects using a typed mixture model (In force)

    Publication number: US20080147654A1

    Publication date: 2008-06-19

    Application number: US11786636

    Filing date: 2007-04-12

    Applicant: Yunbo Cao, Hang Li

    Inventor: Yunbo Cao, Hang Li

    IPC classification: G06F17/30

    CPC classification: G06F17/30

    Abstract: A typed separable mixture model is used to mine associative relationships between sets of objects. Instead of modeling only one type of co-occurrence among the sets of objects, the typed separable mixture model can model multiple different types of co-occurrences among more than two sets of objects, as well as co-occurrences that exist in different contexts.

    Learning and using generalized string patterns for information extraction
    23.
    Granted invention patent
    Learning and using generalized string patterns for information extraction (In force)

    Publication number: US07299228B2

    Publication date: 2007-11-20

    Application number: US10733541

    Filing date: 2003-12-11

    Applicant: Yunbo Cao, Hang Li

    Inventor: Yunbo Cao, Hang Li

    IPC classification: G06F17/30

    CPC classification: G06F17/3061, Y10S707/99936

    Abstract: The present invention relates to extracting information from an information source. During extraction, strings in the information source are accessed. These strings are matched against generalized extraction patterns that include words and wildcards. The wildcards denote that at least one word in an individual string can be skipped in order to match the individual string to an individual generalized extraction pattern.
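
A minimal sketch of matching a string against a pattern of words and wildcards is shown below. It assumes that a wildcard is written `*` and stands for one or more skipped words, which is one reading of the abstract; the function names `pattern_to_regex` and `matches` are hypothetical and the patent may define matching differently.

```python
import re


def pattern_to_regex(pattern: str):
    """Compile a whitespace-separated pattern of words and '*' wildcards.

    Assumption: '*' skips one or more whole words; other tokens match literally.
    """
    parts = []
    for token in pattern.split():
        if token == "*":
            parts.append(r"\S+(?:\s+\S+)*?")  # one or more words
        else:
            parts.append(re.escape(token))
    return re.compile(r"\s+".join(parts))


def matches(pattern: str, string: str) -> bool:
    """Return True when the whole string fits the generalized pattern."""
    return pattern_to_regex(pattern).fullmatch(string) is not None


skips_two_words = matches("Seattle is a * city", "Seattle is a large coastal city")
skips_nothing = matches("Seattle is a * city", "Seattle is a city")
```

Here `skips_two_words` is true because the wildcard absorbs "large coastal", while `skips_nothing` is false because the wildcard has no word left to skip.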

    Search By Document Type And Relevance
    24.
    Invention application
    Search By Document Type And Relevance (Under examination, published)

    Publication number: US20070150473A1

    Publication date: 2007-06-28

    Application number: US11383638

    Filing date: 2006-05-16

    Applicant: Hang Li, Yunbo Cao, Jun Xu

    Inventor: Hang Li, Yunbo Cao, Jun Xu

    IPC classification: G06F7/00

    CPC classification: G06F16/951

    Abstract: A method of finding documents, comprising: ranking documents according to relevance to form a ranked relevance list, ranking documents according to type to form a ranked type list, and combining the ranked relevance list and the ranked type list to form a list of documents ranked by relevance and type.
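
The abstract does not say how the two ranked lists are combined. One simple possibility, shown here purely as an assumption, is a weighted sum of each document's position in the two lists; the function `combine_rankings` and its `weight` parameter are hypothetical names.

```python
from typing import List


def combine_rankings(relevance_ranked: List[str],
                     type_ranked: List[str],
                     weight: float = 0.5) -> List[str]:
    """Merge two ranked lists by a weighted sum of rank positions (lower is better).

    Assumption: a document missing from one list gets the worst possible position.
    """
    rel_pos = {doc: i for i, doc in enumerate(relevance_ranked)}
    type_pos = {doc: i for i, doc in enumerate(type_ranked)}
    worst = max(len(relevance_ranked), len(type_ranked))

    def score(doc: str) -> float:
        return (weight * rel_pos.get(doc, worst)
                + (1 - weight) * type_pos.get(doc, worst))

    # Sort the ids first so that ties are broken deterministically.
    return sorted(sorted(set(rel_pos) | set(type_pos)), key=score)


by_relevance = ["d1", "d2", "d3"]
by_type = ["d3", "d1", "d2"]
combined = combine_rankings(by_relevance, by_type)
```

With equal weights, `d1` comes first because it ranks well in both lists, and `d3` overtakes `d2` because it tops the type list.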

    Factoid-based searching
    25.
    Invention application
    Factoid-based searching (In force)

    Publication number: US20070136280A1

    Publication date: 2007-06-14

    Application number: US11302560

    Filing date: 2005-12-13

    IPC classification: G06F17/30

    CPC classification: G06F17/30616

    Abstract: A query and a factoid type selection are received from a user. An index of passages, indexed based on factoids, is accessed, and passages that are related to the query and that have the selected factoid type are retrieved. The retrieved passages are ranked based on a calculated score and provided to the user in rank order.
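
The retrieval flow in the abstract can be sketched with a small in-memory index. Everything below is an assumption made for illustration: the index layout (factoid type mapped to a list of passages), the function name `search`, and the scoring by count of shared query terms are not taken from the patent.

```python
from typing import Dict, List


def search(query: str, factoid_type: str, index: Dict[str, List[str]]) -> List[str]:
    """Return passages of the selected factoid type, ranked by query-term overlap."""
    terms = set(query.lower().split())
    hits = []
    for passage in index.get(factoid_type, []):
        overlap = len(terms & set(passage.lower().split()))
        if overlap:  # keep only passages related to the query
            hits.append((overlap, passage))
    hits.sort(key=lambda hit: (-hit[0], hit[1]))  # best score first
    return [passage for _, passage in hits]


# Hypothetical index: passages grouped by the type of factoid they contain.
index = {
    "date": ["the treaty was signed in 1998",
             "the bridge opened in 1937 after the treaty"],
    "person": ["the treaty was negotiated by the ambassador"],
}
date_hits = search("treaty signed", "date", index)
```

The first passage ranks above the second because it shares two query terms rather than one.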

    Ranking and accessing definitions of terms
    26.
    Invention application
    Ranking and accessing definitions of terms (Lapsed)

    Publication number: US20060248049A1

    Publication date: 2006-11-02

    Application number: US11115500

    Filing date: 2005-04-27

    Applicant: Yunbo Cao, Hang Li, Jun Xu

    Inventor: Yunbo Cao, Hang Li, Jun Xu

    IPC classification: G06F17/30

    CPC classification: G06F17/30654, G06F2216/03

    Abstract: A method of processing information is provided. The method includes collecting text strings of definition candidates from a data source. The definition candidates are ranked based on the text strings.

    Text mining method
    27.
    Invention application
    Text mining method (Under examination, published)

    Publication number: US20050283357A1

    Publication date: 2005-12-22

    Application number: US10970586

    Filing date: 2004-10-21

    IPC classification: G06F17/28, G06F17/30

    CPC classification: G06F16/313

    Abstract: A method for performing data mining is provided. The method includes selecting at least one data source of unstructured text. Additionally, a transformation is selected to identify a list of terms in the unstructured text. A run-time path is established to connect the data source to the transformation and to load the identified list of terms into a destination database.

    Method and apparatus for browsing document content
    28.
    Invention application
    Method and apparatus for browsing document content (In force)

    Publication number: US20050108266A1

    Publication date: 2005-05-19

    Application number: US10714540

    Filing date: 2003-11-14

    Applicant: Yunbo Cao, Hang Li

    Inventor: Yunbo Cao, Hang Li

    IPC classification: G06F17/00, G06F17/22, G06F17/30

    Abstract: A computer-implemented method is provided that includes receiving a document and determining a file type for the document. In addition, the document is segmented into blocks of text as a function of the file type, and at least one keyword and a summary are generated for the document.

    Two stage search
    29.
    Granted invention patent
    Two stage search (In force)

    Publication number: US08849787B2

    Publication date: 2014-09-30

    Application number: US13343160

    Filing date: 2012-01-04

    Applicant: Yunbo Cao, Hang Li

    Inventor: Yunbo Cao, Hang Li

    IPC classification: G06F7/00, G06F17/30

    CPC classification: G06F17/30684

    Abstract: A two stage model identifies individuals having knowledge in a subject matter area relevant to a query. A relevance model receives a query and identifies documents, or other information, relevant to the query. A co-occurrence model identifies individuals, in the retrieved documents, related to the subject matter of the query. The identified individuals can be scored by combining scores from the relevance model and the co-occurrence model, and output in a rank-ordered list.
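
The two stages can be sketched as below. This is a toy illustration under stated assumptions, not the patented models: relevance is taken to be the fraction of query terms found in a document, co-occurrence is taken to be a simple mention of a person in a retrieved document, and the two are combined by multiplication and summed per person. The names `relevance`, `rank_experts`, `top_docs`, and the sample people are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def relevance(query: str, text: str) -> float:
    """Stage 1 score (assumed): fraction of query terms that occur in the text."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    words = set(text.lower().split())
    return sum(term in words for term in terms) / len(terms)


def rank_experts(query: str, documents: List[Dict], top_docs: int = 2) -> List[Tuple[str, float]]:
    """Stage 2 (assumed): credit each person mentioned in a retrieved document."""
    scored = sorted(((relevance(query, d["text"]), d) for d in documents),
                    key=lambda pair: pair[0], reverse=True)
    totals = defaultdict(float)
    for score, doc in scored[:top_docs]:
        for person in doc["people"]:
            totals[person] += score * 1.0  # co-occurrence score fixed at 1.0
    ranked = [(p, s) for p, s in totals.items() if s > 0]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


documents = [
    {"text": "search ranking with learning to rank", "people": ["Alice", "Bob"]},
    {"text": "ranking functions for web search", "people": ["Alice"]},
    {"text": "quarterly budget review", "people": ["Carol"]},
]
experts = rank_experts("search ranking", documents)
```

Alice ranks first because she appears in both retrieved documents, and Carol is absent because her document is never retrieved.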

    Clustering question search results based on topic and focus
    30.
    Granted invention patent
    Clustering question search results based on topic and focus (In force)

    Publication number: US08024332B2

    Publication date: 2011-09-20

    Application number: US12185702

    Filing date: 2008-08-04

    IPC classification: G06F17/30, G06F7/00

    CPC classification: G06F17/30696

    Abstract: A method and system are provided for presenting questions that are relevant to a queried question, based on clusters of topics and clusters of focuses of the questions. A question search system provides a collection of questions. Each question of the collection has an associated topic and focus. Upon receiving a queried question, the question search system identifies questions of the collection that may be relevant to the queried question and generates a score or ranking indicating the relevance of the identified questions. The question search system clusters the identified questions into topic clusters of questions with similar topics. The question search system may also cluster the questions within each topic cluster into focus clusters of questions with similar focuses.
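
The nested clustering described in the abstract can be sketched as a two-level grouping. The sketch assumes that each retrieved question already carries `topic`, `focus`, and `score` fields and that "similar" means an exact string match; both are simplifications, and `cluster_questions` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List


def cluster_questions(results: List[Dict]) -> Dict[str, Dict[str, List[str]]]:
    """Group questions by topic, then by focus, keeping higher scores first."""
    clusters = defaultdict(lambda: defaultdict(list))
    for question in sorted(results, key=lambda q: q["score"], reverse=True):
        clusters[question["topic"]][question["focus"]].append(question["text"])
    # Convert to plain dicts so the result is easy to print or compare.
    return {topic: dict(focuses) for topic, focuses in clusters.items()}


results = [
    {"text": "How do I reset my router password?", "topic": "router", "focus": "password", "score": 0.9},
    {"text": "How can I change the router password?", "topic": "router", "focus": "password", "score": 0.7},
    {"text": "How do I update router firmware?", "topic": "router", "focus": "firmware", "score": 0.8},
    {"text": "How do I reset my phone?", "topic": "phone", "focus": "reset", "score": 0.5},
]
clusters = cluster_questions(results)
```

A production system would need a similarity measure over topics and focuses in place of exact matching.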
