Publication number: US20060047646A1
Publication date: 2006-03-02
Application number: US10943652
Filing date: 2004-09-01
Applicants: David Maluf, David Bell, Mohana Gurram, Yuri Gawdiak
Inventors: David Maluf, David Bell, Mohana Gurram, Yuri Gawdiak
CPC classification: G06F16/8358
Abstract: Method and system for querying a collection of unstructured and semi-structured documents in a specified database to identify the presence of, and provide context and/or content for, keywords and/or keyphrases. The documents are analyzed and assigned a node structure, including an ordered sequence of mutually exclusive node segments or strings. Each node has an associated set of at least four, five or six attributes with node information and can represent a format marker or text, with the last node in any node segment usually being a text node. A keyword (or keyphrase) query is specified, the query is converted to a statement that is recognized and responded to by the specified database, and the last node in each node segment is searched for a match with the keyword. When a match is found at a query node, or at a node determined with reference to a query node, the system displays the context and/or the content of the query node.
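The query flow in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the table layout, the attribute names (`node_id`, `segment_id`, `position`, `node_type`, `data`), the sample segments, and the use of an in-memory SQLite database as the "specified database" are all assumptions made for illustration, since the abstract does not name the node attributes or the database. The sketch stores each node segment as ordered rows, converts a keyword into a SQL statement, matches only the last (text) node of each segment, and reports the format markers of that segment as context alongside the matched text as content.

```python
import sqlite3

# Hypothetical sample data: each node segment is an ordered list of
# (node_type, data) pairs that ends in a text node.
SEGMENTS = [
    [("marker", "h1"), ("text", "Budget Summary")],
    [("marker", "h2"), ("marker", "p"), ("text", "Travel costs rose in the third quarter.")],
    [("marker", "p"), ("text", "Staffing remained flat.")],
]


def load_nodes(conn, segments):
    """Store every node as one row; the column set is an assumed attribute set."""
    conn.execute(
        "CREATE TABLE nodes ("
        "node_id INTEGER PRIMARY KEY, segment_id INTEGER, "
        "position INTEGER, node_type TEXT, data TEXT)"
    )
    for segment_id, segment in enumerate(segments):
        for position, (node_type, data) in enumerate(segment):
            conn.execute(
                "INSERT INTO nodes (segment_id, position, node_type, data) "
                "VALUES (?, ?, ?, ?)",
                (segment_id, position, node_type, data),
            )


def keyword_to_statement(keyword):
    """Convert a keyword query into a statement the database understands."""
    sql = (
        "SELECT n.segment_id, n.data FROM nodes AS n "
        "WHERE n.position = (SELECT MAX(m.position) FROM nodes AS m "
        "                    WHERE m.segment_id = n.segment_id) "
        "AND n.node_type = 'text' "
        "AND instr(lower(n.data), lower(?)) > 0 "
        "ORDER BY n.segment_id"
    )
    return sql, (keyword,)


def search(conn, keyword):
    """Return context (format markers) and content (text) for each match."""
    sql, params = keyword_to_statement(keyword)
    results = []
    for segment_id, content in conn.execute(sql, params).fetchall():
        markers = conn.execute(
            "SELECT data FROM nodes WHERE segment_id = ? "
            "AND node_type = 'marker' ORDER BY position",
            (segment_id,),
        )
        context = [row[0] for row in markers]
        results.append({"segment": segment_id, "context": context, "content": content})
    return results


conn = sqlite3.connect(":memory:")
load_nodes(conn, SEGMENTS)
hits = search(conn, "travel")
for hit in hits:
    print(hit["context"], hit["content"])
```

Restricting the match to the last node of each segment means a keyword that happens to equal a format marker name (for example `h2`) produces no hit; only text nodes are searched.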