-
Publication No.: US20080082505A1
Publication date: 2008-04-03
Application No.: US11851260
Filing date: 2007-09-06
Applicant: Tomoharu Kokubu, Toshihiko Manabe, Tetsuya Sakai
Inventor: Tomoharu Kokubu, Toshihiko Manabe, Tetsuya Sakai
IPC classification: G06F17/30
CPC classification: G06F16/24534, G06F16/2452, G06F16/3332
Abstract: A document searching apparatus includes an input unit that inputs a search query for conducting a search in a structured document, the structured document being obtained by expressing elements included in a document in a hierarchical manner; a query converting unit that converts a query sentence constituting the search query and a search target element of the query sentence according to a predetermined rule so as to generate a new search query; a document searching unit that searches the structured document by using the new search query; and a search-result presenting unit that presents a result of the search.
-
Publication No.: US08812504B2
Publication date: 2014-08-19
Application No.: US13216380
Filing date: 2011-08-24
Applicant: Tomoharu Kokubu, Toshihiko Manabe, Kosei Fume, Wataru Nakano, Hiromi Wakaki
Inventor: Tomoharu Kokubu, Toshihiko Manabe, Kosei Fume, Wataru Nakano, Hiromi Wakaki
CPC classification: G06F17/3064
Abstract: According to one embodiment, a keyword presentation apparatus includes an extraction unit, a selection unit and a clustering unit. The extraction unit is configured to extract, as technical terms, morpheme strings, which are not defined in a general concept dictionary, from a document set. The selection unit is configured to evaluate relevancies between each of basic term candidates and the technical terms, and to preferentially select basic term candidates having high relevancies as basic terms. The clustering unit is configured to calculate weighted sums of statistical degrees of correlation between the basic terms based on the document set, to calculate conceptual degrees of correlation between the basic terms based on the general concept dictionary, and to cluster the basic terms based on the weighted sums.
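The clustering step in this abstract can be illustrated with a minimal sketch. The abstract does not say how either degree of correlation is computed or which clustering method is used, so everything specific here is an assumption: Jaccard co-occurrence stands in for the statistical correlation, a shared-prefix ratio over a toy `concept_paths` mapping stands in for the general concept dictionary, and single-link grouping with a `threshold` stands in for the clustering. The function names and the `weight` parameter are hypothetical.

```python
from itertools import combinations


def statistical_correlation(a, b, docs):
    """Jaccard co-occurrence of two terms over a document set (assumed measure)."""
    docs_a = {i for i, d in enumerate(docs) if a in d}
    docs_b = {i for i, d in enumerate(docs) if b in d}
    union = docs_a | docs_b
    return len(docs_a & docs_b) / len(union) if union else 0.0


def conceptual_correlation(a, b, concept_paths):
    """Shared-prefix ratio of two concept paths in a toy concept dictionary."""
    path_a, path_b = concept_paths.get(a, ()), concept_paths.get(b, ())
    shared = 0
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        shared += 1
    longest = max(len(path_a), len(path_b))
    return shared / longest if longest else 0.0


def cluster_terms(terms, docs, concept_paths, weight=0.5, threshold=0.5):
    """Group terms whose weighted sum of both correlations reaches the threshold."""
    parent = {t: t for t in terms}

    def find(t):
        # Union-find root lookup with path halving.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in combinations(terms, 2):
        score = (weight * statistical_correlation(a, b, docs)
                 + (1 - weight) * conceptual_correlation(a, b, concept_paths))
        if score >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for t in terms:
        clusters.setdefault(find(t), []).append(t)
    return sorted(sorted(c) for c in clusters.values())
```

Here `docs` is a list of term sets, one per document. Raising `weight` makes the grouping follow co-occurrence in the document set more closely; lowering it favors the dictionary.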
-
Publication No.: US20120078907A1
Publication date: 2012-03-29
Application No.: US13216380
Filing date: 2011-08-24
Applicant: Tomoharu Kokubu, Toshihiko Manabe, Kosei Fume, Wataru Nakano, Hiromi Wakaki
Inventor: Tomoharu Kokubu, Toshihiko Manabe, Kosei Fume, Wataru Nakano, Hiromi Wakaki
IPC classification: G06F17/30
CPC classification: G06F17/3064
Abstract: According to one embodiment, a keyword presentation apparatus includes an extraction unit, a selection unit and a clustering unit. The extraction unit is configured to extract, as technical terms, morpheme strings, which are not defined in a general concept dictionary, from a document set. The selection unit is configured to evaluate relevancies between each of basic term candidates and the technical terms, and to preferentially select basic term candidates having high relevancies as basic terms. The clustering unit is configured to calculate weighted sums of statistical degrees of correlation between the basic terms based on the document set, to calculate conceptual degrees of correlation between the basic terms based on the general concept dictionary, and to cluster the basic terms based on the weighted sums.
-
Publication No.: US20090138473A1
Publication date: 2009-05-28
Application No.: US12205636
Filing date: 2008-09-05
Applicant: Toshihiko Manabe, Tomoharu Kokubu
Inventor: Toshihiko Manabe, Tomoharu Kokubu
IPC classification: G06F17/30
CPC classification: G06F16/334
Abstract: An apparatus for retrieving structured documents includes a first categorizing unit configured to categorize components into a first component of typical descriptions and a second component of atypical descriptions, based on statistics information for the components, a second categorizing unit configured to categorize the terms into a first term whose appearance ratio in the first component exceeds a threshold and a second term whose appearance ratio in the first component is not more than the threshold, an extraction unit configured to extract a set of structured documents each having the first component including the first term and the second component from the structured documents, and a ranking unit configured to rank the set of structured documents by a retrieval score calculated based on a relation between the second term and the second component.
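The second categorizing unit in this abstract splits terms by a threshold on their appearance ratio in the typical-description components. A minimal sketch follows, assuming that "appearance ratio" means the fraction of a term's occurrences that fall inside typical components; the abstract does not define it. The function name, the `(term, component)` input shape, and the default `threshold` are hypothetical.

```python
from collections import Counter


def split_terms_by_appearance_ratio(occurrences, typical_components, threshold=0.5):
    """Split terms into (first_terms, second_terms).

    occurrences: iterable of (term, component_name) pairs.
    typical_components: names of components treated as typical descriptions.
    A term is a "first term" if its ratio of occurrences inside typical
    components exceeds the threshold, otherwise a "second term".
    """
    total = Counter()
    in_typical = Counter()
    for term, component in occurrences:
        total[term] += 1
        if component in typical_components:
            in_typical[term] += 1

    first_terms, second_terms = set(), set()
    for term, count in total.items():
        ratio = in_typical[term] / count
        if ratio > threshold:
            first_terms.add(term)
        else:
            second_terms.add(term)
    return first_terms, second_terms
```

A term whose ratio equals the threshold lands in the second group, matching the abstract's "not more than the threshold" wording.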
-
Publication No.: US09146999B2
Publication date: 2015-09-29
Application No.: US12372570
Filing date: 2009-02-17
Applicant: Masaru Suzuki, Tomoharu Kokubu
Inventor: Masaru Suzuki, Tomoharu Kokubu
CPC classification: G06F17/30867
Abstract: A search keyword improvement apparatus includes a unit extracting a word as an additional keyword candidate from a new document, number of times of appearance of the word in the new document being greater than number of times of appearance of the word in each of first documents except for the new document, if the new document and a new search target identification information item which is used to search the new document are accumulated, a unit generating a first search query based on an input keyword, a second search target associated with the input keyword, and one of the additional keywords, and generating a second search query, a unit moving the additional keyword candidate and the third search target identification information item, if the desired search result is selected from a third search result list corresponding to the second search query.
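The extraction rule in this abstract (a word qualifies when it appears more often in the new document than in each of the other documents) can be sketched as follows. This is an illustration only: whitespace tokenization and lowercasing are assumptions, and the function name is hypothetical. The abstract does not describe how words are segmented.

```python
from collections import Counter


def additional_keyword_candidates(new_doc, earlier_docs):
    """Return words that occur more often in new_doc than in every earlier document."""
    new_counts = Counter(new_doc.lower().split())
    earlier_counts = [Counter(doc.lower().split()) for doc in earlier_docs]
    return sorted(
        word
        for word, count in new_counts.items()
        # Counter returns 0 for words an earlier document never uses.
        if all(count > counts[word] for counts in earlier_counts)
    )
```

Common words tend to drop out under this rule because some earlier document usually contains them at least as often.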
-
Publication No.: US20090248674A1
Publication date: 2009-10-01
Application No.: US12372570
Filing date: 2009-02-17
Applicant: Masaru SUZUKI, Tomoharu Kokubu
Inventor: Masaru SUZUKI, Tomoharu Kokubu
IPC classification: G06F17/30
CPC classification: G06F17/30867
Abstract: A search keyword improvement apparatus includes a unit extracting a word as an additional keyword candidate from a new document, number of times of appearance of the word in the new document being greater than number of times of appearance of the word in each of first documents except for the new document, if the new document and a new search target identification information item which is used to search the new document are accumulated, a unit generating a first search query based on an input keyword, a second search target associated with the input keyword, and one of the additional keywords, and generating a second search query, a unit moving the additional keyword candidate and the third search target identification information item, if the desired search result is selected from a third search result list corresponding to the second search query.