DOCUMENT SEARCHING APPARATUS AND COMPUTER PROGRAM PRODUCT THEREFOR
    1.
    Invention application
    Status: Under examination (published)

    Publication No.: US20080082505A1

    Publication date: 2008-04-03

    Application No.: US11851260

    Filing date: 2007-09-06

    IPC classification: G06F17/30

    Abstract: A document searching apparatus includes an input unit that inputs a search query for conducting a search in a structured document, the structured document being obtained by expressing elements included in a document in a hierarchical manner; a query converting unit that converts a query sentence constituting the search query and a search target element of the query sentence according to a predetermined rule so as to generate a new search query; a document searching unit that searches the structured document by using the new search query; and a search-result presenting unit that presents a result of the search.

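The query-conversion step in this abstract can be illustrated with a minimal Python sketch. The abstract does not disclose the "predetermined rule", so the rule tables `ELEMENT_RULES` and `TERM_RULES`, the sample XML, and the substring matching below are all assumptions made for illustration, not the patented method.

```python
import xml.etree.ElementTree as ET

# Hypothetical rewrite rules (assumptions, not taken from the patent).
ELEMENT_RULES = {"title": "section"}           # widen the search target element
TERM_RULES = {"car": ["car", "automobile"]}    # expand terms of the query sentence

def convert_query(target, sentence):
    """Apply the rules to produce a new (target element, term list) query."""
    new_target = ELEMENT_RULES.get(target, target)
    terms = []
    for word in sentence.lower().split():
        terms.extend(TERM_RULES.get(word, [word]))
    return new_target, terms

def search(root, target, terms):
    """Return the normalized text of each `target` element containing any term."""
    hits = []
    for elem in root.iter(target):
        text = " ".join(" ".join(elem.itertext()).lower().split())
        if any(term in text for term in terms):  # simple substring match
            hits.append(text)
    return hits

DOC = """<doc>
  <section><title>Engines</title><p>An automobile engine burns fuel.</p></section>
  <section><title>Car care</title><p>Wash it weekly.</p></section>
  <section><title>Bikes</title><p>Two wheels.</p></section>
</doc>"""

root = ET.fromstring(DOC)
original_hits = search(root, "title", ["car"])
new_target, new_terms = convert_query("title", "Car")
converted_hits = search(root, new_target, new_terms)
```

In this toy document the original query matches one title, while the converted query also reaches the section that mentions "automobile" only in its body text.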

    Keyword presentation apparatus and method
    2.
    Granted patent
    Status: In force

    Publication No.: US08812504B2

    Publication date: 2014-08-19

    Application No.: US13216380

    Filing date: 2011-08-24

    IPC classification: G06F17/00 G06F17/30

    CPC classification: G06F17/3064

    Abstract: According to one embodiment, a keyword presentation apparatus includes an extraction unit, a selection unit and a clustering unit. The extraction unit is configured to extract, as technical terms, morpheme strings, which are not defined in a general concept dictionary, from a document set. The selection unit is configured to evaluate relevancies between each of basic term candidates and the technical terms, and to preferentially select basic term candidates having high relevancies as basic terms. The clustering unit is configured to calculate weighted sums of statistical degrees of correlation between the basic terms based on the document set, to calculate conceptual degrees of correlation between the basic terms based on the general concept dictionary, and to cluster the basic terms based on the weighted sums.

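The clustering step in this abstract can be sketched as follows. The abstract does not define either degree of correlation, so the Jaccard co-occurrence measure, the path-prefix similarity over a toy concept dictionary, the `weight` and `threshold` parameters, and the single-link merging are all assumptions chosen for illustration.

```python
from itertools import combinations

def statistical_corr(a, b, docs):
    """Assumed measure: Jaccard overlap of the documents containing each term."""
    da = {i for i, d in enumerate(docs) if a in d}
    db = {i for i, d in enumerate(docs) if b in d}
    union = da | db
    return len(da & db) / len(union) if union else 0.0

def conceptual_corr(a, b, concept_paths):
    """Assumed measure: fraction of shared leading nodes in two dictionary paths."""
    pa, pb = concept_paths[a], concept_paths[b]
    shared = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        shared += 1
    return shared / max(len(pa), len(pb))

def cluster_terms(terms, docs, concept_paths, weight=0.5, threshold=0.5):
    """Single-link clustering: merge two terms when their weighted sum >= threshold."""
    parent = {t: t for t in terms}

    def find(t):
        while parent[t] != t:
            t = parent[t]
        return t

    for a, b in combinations(terms, 2):
        score = (weight * statistical_corr(a, b, docs)
                 + (1 - weight) * conceptual_corr(a, b, concept_paths))
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for t in terms:
        groups.setdefault(find(t), []).append(t)
    return sorted(sorted(g) for g in groups.values())

# Toy document set (each document is a set of terms) and concept dictionary.
docs = [{"image", "video", "codec"}, {"image", "video"},
        {"query", "index"}, {"query", "index", "image"}]
concept_paths = {
    "image": ["thing", "media", "image"],
    "video": ["thing", "media", "video"],
    "query": ["thing", "information", "query"],
    "index": ["thing", "information", "index"],
}
clusters = cluster_terms(["image", "video", "query", "index"], docs, concept_paths)
```

With these toy inputs the media terms and the retrieval terms end up in separate clusters, because only pairs within each group reach the threshold.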

    KEYWORD PRESENTATION APPARATUS AND METHOD
    3.
    Invention application
    Status: In force

    Publication No.: US20120078907A1

    Publication date: 2012-03-29

    Application No.: US13216380

    Filing date: 2011-08-24

    IPC classification: G06F17/30

    CPC classification: G06F17/3064

    Abstract: According to one embodiment, a keyword presentation apparatus includes an extraction unit, a selection unit and a clustering unit. The extraction unit is configured to extract, as technical terms, morpheme strings, which are not defined in a general concept dictionary, from a document set. The selection unit is configured to evaluate relevancies between each of basic term candidates and the technical terms, and to preferentially select basic term candidates having high relevancies as basic terms. The clustering unit is configured to calculate weighted sums of statistical degrees of correlation between the basic terms based on the document set, to calculate conceptual degrees of correlation between the basic terms based on the general concept dictionary, and to cluster the basic terms based on the weighted sums.

    APPARATUS AND METHOD FOR RETRIEVING STRUCTURED DOCUMENTS
    4.
    Invention application
    Status: Under examination (published)

    Publication No.: US20090138473A1

    Publication date: 2009-05-28

    Application No.: US12205636

    Filing date: 2008-09-05

    IPC classification: G06F17/30

    CPC classification: G06F16/334

    Abstract: An apparatus for retrieving structured documents includes a first categorizing unit configured to categorize components into a first component of typical descriptions and a second component of atypical descriptions, based on statistics information for the components, a second categorizing unit configured to categorize the terms into a first term whose appearance ratio in the first component exceeds a threshold and a second term whose appearance ratio in the first component is not more than the threshold, an extraction unit configured to extract a set of structured documents each having the first component including the first term and the second component from the structured documents, and a ranking unit configured to rank the set of structured documents by a retrieval score calculated based on a relation between the second term and the second component.

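The second categorizing unit in this abstract splits terms by their appearance ratio in the typical (first) component. A minimal Python sketch of that split follows. It assumes the components have already been labelled typical or atypical, and it assumes the ratio is a term's count in typical components divided by its total count; the abstract does not give the exact formula, and the `threshold` default is hypothetical.

```python
from collections import Counter

def categorize_terms(typical_texts, atypical_texts, threshold=0.5):
    """Split terms into first terms (ratio > threshold) and second terms (otherwise)."""
    typical = Counter(w for t in typical_texts for w in t.lower().split())
    atypical = Counter(w for t in atypical_texts for w in t.lower().split())
    first_terms, second_terms = set(), set()
    for term in typical.keys() | atypical.keys():
        # Assumed definition: share of the term's occurrences in typical components.
        ratio = typical[term] / (typical[term] + atypical[term])
        if ratio > threshold:
            first_terms.add(term)
        else:
            second_terms.add(term)  # "not more than the threshold"
    return first_terms, second_terms

# Toy components: boilerplate-like (typical) and free-text (atypical) fields.
typical_texts = ["error report", "error report"]
atypical_texts = ["disk error after upgrade", "printer jams"]
first_terms, second_terms = categorize_terms(typical_texts, atypical_texts)
```

Here "error" appears in both kinds of component but mostly in typical ones, so it is treated as a first term, while the words found only in free text become second terms.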

    Information search apparatus and system
    5.
    Granted patent
    Status: In force

    Publication No.: US09003284B2

    Publication date: 2015-04-07

    Application No.: US13369417

    Filing date: 2012-02-09

    Abstract: According to one embodiment, an information search apparatus includes a generation unit, a selection unit, a search unit and a display unit. The generation unit generates recognition candidate character strings based on shapes of strokes and combinations of the shapes. The selection unit calculates reliability values for the recognition candidate character strings and selects search keys from the recognition candidate character strings. The search unit searches a database for second character strings including the search keys, and obtains one or more result character strings indicating search results of each of the search keys. The display unit displays the one or more result character strings corresponding to each of the search keys distinctively.

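The selection and search steps in this abstract can be sketched as below. Stroke recognition itself is out of scope here, so the candidate strings and their reliability values are supplied as toy inputs. The `min_reliability` and `max_keys` parameters and the substring matching are assumptions for illustration.

```python
def select_search_keys(candidates, min_reliability=0.6, max_keys=3):
    """Pick the most reliable recognition candidates as search keys."""
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    return [text for text, score in ranked if score >= min_reliability][:max_keys]

def search_by_key(keys, database):
    """Group database strings by the search key they contain."""
    return {key: [entry for entry in database if key in entry] for key in keys}

# Hypothetical recognition candidates with reliability values.
candidates = {"cat": 0.9, "cot": 0.7, "oat": 0.3}
database = ["catalog", "cottage", "scatter", "oatmeal"]

keys = select_search_keys(candidates)
results = search_by_key(keys, database)
```

Keeping the results grouped per key mirrors the abstract's requirement that result strings be displayed distinctively for each search key.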

    Feature-vector generation apparatus, search apparatus, feature-vector generation method, search method and program
    6.
    Granted patent
    Status: In force

    Publication No.: US08036261B2

    Publication date: 2011-10-11

    Application No.: US11269640

    Filing date: 2005-11-09

    IPC classification: G06F7/00 H04B1/66 G10L15/26

    Abstract: A feature-vector generation apparatus includes an input unit configured to input content data including at least one of video data and audio data, a generation unit configured to generate a feature vector, based on information indicating a time at which a characterizing state of the content data appears, the characterizing state being characterized by a change of the at least one of the video data and the audio data, and a storage unit configured to store the content data and the feature vector.

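One way to picture a feature vector built from the times at which a characterizing state appears is a fixed-length histogram of event times. The sketch below is only an illustration under that assumption: the abstract does not say how the vector is constructed, and the function names, the bin count, and the L1 distance used for comparison are hypothetical.

```python
def feature_vector(event_times, duration, bins=4):
    """Count characterizing events (for example scene cuts) per equal time bin."""
    counts = [0] * bins
    for t in event_times:
        idx = min(int(t / duration * bins), bins - 1)  # clamp t == duration
        counts[idx] += 1
    return counts

def l1_distance(u, v):
    """Sum of absolute differences between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

# Toy event times in seconds for two 10-second clips.
clip_a = feature_vector([1.0, 2.0, 7.0], duration=10.0)
clip_b = feature_vector([0.5, 1.5, 6.0], duration=10.0)
clip_c = feature_vector([9.0, 9.5, 9.9], duration=10.0)
```

Clips whose changes occur at similar times get identical or nearby vectors, so a search could rank stored content by a distance such as `l1_distance` to the query clip's vector.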

    INFORMATION SEARCH APPARATUS AND SYSTEM
    8.
    Invention application
    Status: In force

    Publication No.: US20120139859A1

    Publication date: 2012-06-07

    Application No.: US13369417

    Filing date: 2012-02-09

    IPC classification: G06F3/041

    Abstract: According to one embodiment, an information search apparatus includes a generation unit, a selection unit, a search unit and a display unit. The generation unit generates recognition candidate character strings based on shapes of strokes and combinations of the shapes. The selection unit calculates reliability values for the recognition candidate character strings and selects search keys from the recognition candidate character strings. The search unit searches a database for second character strings including the search keys, and obtains one or more result character strings indicating search results of each of the search keys. The display unit displays the one or more result character strings corresponding to each of the search keys distinctively.

    Feature-vector generation apparatus, search apparatus, feature-vector generation method, search method and program
    9.
    Invention application
    Status: In force

    Publication No.: US20060101065A1

    Publication date: 2006-05-11

    Application No.: US11269640

    Filing date: 2005-11-09

    IPC classification: G06F17/00 G06F7/00

    Abstract: A feature-vector generation apparatus includes an input unit configured to input content data including at least one of video data and audio data, a generation unit configured to generate a feature vector, based on information indicating a time at which a characterizing state of the content data appears, the characterizing state being characterized by a change of the at least one of the video data and the audio data, and a storage unit configured to store the content data and the feature vector.

    Information retrieval system, method, and program
    10.
    Invention application
    Status: In force

    Publication No.: US20060173682A1

    Publication date: 2006-08-03

    Application No.: US11230540

    Filing date: 2005-09-21

    IPC classification: G10L17/00

    Abstract: An information retrieval system includes speech recognition means for performing speech recognition on a spoken question to generate first text information, generation means for modifying the first text information to generate second text information as an interrogative to make a search for an answer to the question, and search means for searching for the answer in a document database by using the second text information.