-
Publication No.: US08768941B2
Publication date: 2014-07-01
Application No.: US13380561
Filing date: 2010-07-23
Applicant: Toshiko Matsumoto
Inventor: Toshiko Matsumoto
IPC classification: G06F17/30
CPC classification: G06F17/30722, G06F17/212, G06F17/2745, G06T1/00
Abstract: A technique is provided for automatically acquiring metadata with respect to various organizations, significantly reducing the man-hours required to prepare models for metadata extraction. The input is a pair consisting of a document and the metadata appearing in it. Layout features, proximate text string features, and partial text string features are computed both for the metadata and for text strings that are not metadata, and from these the use of the layout feature, the proximate text string, and the partial text string in automatic metadata acquisition is configured automatically (see FIG. 1).
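The abstract of US08768941B2 describes learning, from a document paired with its known metadata, how layout, proximate text string, and partial text string features should be used for extraction. The following is a minimal sketch of that idea only, not the patented method: the `TextBox` type, the concrete feature definitions, and the smoothed log-ratio weighting are all assumptions introduced for illustration.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TextBox:
    """Hypothetical text string with a page position normalized to 0..1."""
    text: str
    x: float
    y: float


def features(box, boxes):
    """Return illustrative layout, proximate-text, and partial-text features."""
    feats = {f"layout:row{int(box.y * 4)}"}  # layout: coarse vertical band
    # Proximate text string: nearest box to the left on roughly the same line.
    left = [b for b in boxes
            if b is not box and abs(b.y - box.y) < 0.02 and b.x < box.x]
    if left:
        nearest = max(left, key=lambda b: b.x)
        feats.add(f"near:{nearest.text.lower()}")
    # Partial text string: whitespace-separated tokens of the string itself.
    for token in box.text.split():
        feats.add(f"part:{token.lower()}")
    return feats


def learn_weights(boxes, metadata_texts):
    """Weight each feature by a smoothed log-ratio of metadata vs. non-metadata counts."""
    pos, neg = Counter(), Counter()
    for box in boxes:
        target = pos if box.text in metadata_texts else neg
        target.update(features(box, boxes))
    return {f: math.log((pos[f] + 1) / (neg[f] + 1)) for f in set(pos) | set(neg)}


def best_candidate(boxes, weights):
    """Pick the box whose features score highest; unseen features score 0."""
    return max(boxes,
               key=lambda b: sum(weights.get(f, 0.0) for f in features(b, boxes)))


# Training pair: one document plus the metadata value that appears in it.
train = [TextBox("Invoice No.", 0.1, 0.1), TextBox("A-1001", 0.3, 0.1),
         TextBox("Total", 0.1, 0.8), TextBox("500", 0.3, 0.8)]
weights = learn_weights(train, {"A-1001"})

# A new document with a similar layout but different values.
new_doc = [TextBox("Invoice No.", 0.1, 0.12), TextBox("B-2002", 0.35, 0.12),
           TextBox("Total", 0.1, 0.85), TextBox("900", 0.3, 0.85)]
best = best_candidate(new_doc, weights)
print(best.text)
```

In this toy setup the proximate text string ("Invoice No." to the left) carries most of the signal, because the metadata value itself changes from document to document. A real system would need many training pairs and a proper classifier.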
-
Publication No.: US20130091091A1
Publication date: 2013-04-11
Application No.: US13696881
Filing date: 2011-06-28
Applicant: Toshiko Matsumoto
Inventor: Toshiko Matsumoto
CPC classification: G06F17/22, G06F17/2294, G06F17/277, G06F17/30011, G06F17/30616
Abstract: A technique is provided by which, in a foreign-language document in which words are separated by space characters, each word boundary can be reliably detected and a space character reliably reinserted at that boundary. Whether adjacent characters belong to the same word (i.e., whether a space is absent or present between them) is decided using: a decision based on English writing rules; a decision based on whether a space character exists in the source document data; a decision based on the identity of the character string objects containing the adjacent characters; and a decision based on the gap between the character string objects containing the adjacent characters. These decisions are preferably applied in the order listed.
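The abstract of US20130091091A1 lists four decisions applied in a preferred order. A rough sketch of such a cascade is shown below. The `Glyph` type, the specific punctuation rules, the treatment of each step as "decide or pass to the next step", and the gap threshold are assumptions made for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Glyph:
    """Hypothetical character record taken from source document data."""
    char: str
    object_id: int             # character string object containing this glyph
    x_start: float
    x_end: float
    space_after: bool = False  # source data holds a space after this glyph


# Each rule returns True (space), False (no space), or None (undecided).
def by_writing_rule(a: Glyph, b: Glyph) -> Optional[bool]:
    if b.char in ",.;:)!?":                  # no space before closing punctuation
        return False
    if a.char in ",;" and b.char.isalpha():  # space after a comma or semicolon
        return True
    return None


def by_source_space(a: Glyph, b: Glyph) -> Optional[bool]:
    return True if a.space_after else None


def by_object_identity(a: Glyph, b: Glyph) -> Optional[bool]:
    # Assumption: glyphs in one string object with no recorded space share a word.
    return False if a.object_id == b.object_id else None


def by_gap(a: Glyph, b: Glyph, threshold: float = 0.25) -> bool:
    # Assumption: a gap wider than a fraction of the glyph width means a space.
    return (b.x_start - a.x_end) > threshold * (a.x_end - a.x_start)


def needs_space(a: Glyph, b: Glyph) -> bool:
    """Apply the four decisions in the order given in the abstract."""
    for rule in (by_writing_rule, by_source_space, by_object_identity):
        decision = rule(a, b)
        if decision is not None:
            return decision
    return by_gap(a, b)


def rebuild(glyphs) -> str:
    """Reassemble text, reinserting spaces at detected word boundaries."""
    out = []
    for i, g in enumerate(glyphs):
        out.append(g.char)
        if i + 1 < len(glyphs) and needs_space(g, glyphs[i + 1]):
            out.append(" ")
    return "".join(out)


sample = [Glyph("a", 1, 0.0, 1.0), Glyph("b", 1, 1.5, 2.5),
          Glyph("c", 2, 2.6, 3.6), Glyph("d", 3, 4.6, 5.6)]
print(rebuild(sample))
```

Ordering the cheap, high-confidence checks first means the geometric gap test, which is the most sensitive to fonts and layout, only runs when nothing else has decided.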
-