-
Publication No.: US20170091175A1
Publication date: 2017-03-30
Application No.: US14870730
Filing date: 2015-09-30
Inventors: HIROSHI KANAYAMA, RISA KAWANAKA
CPC classification: G06F17/289, G06F17/275, G06F17/30719
Abstract: A method of question answering from multilingual information sources is disclosed. The present invention discloses a method, a computer system and a program product for selecting an information source language of an information source, the method comprising: receiving a question; analyzing the question to obtain category information of at least one word included in the question; obtaining a word included in the category information as an estimated topic or region related to the question; determining a candidate for an information source language using the estimated topic or region; and selecting the information source language and corresponding information sources for retrieving documents to generate an answer to the question.
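The steps listed in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the lookup tables `WORD_CATEGORIES`, `REGION_LANGUAGE` and `SOURCES`, the function name `select_source_language`, and the simple vote-counting rule are all hypothetical stand-ins chosen for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical toy data: category labels attached to known words.
WORD_CATEGORIES: Dict[str, List[str]] = {
    "sumo": ["Sports in Japan", "Japanese martial arts"],
    "yokozuna": ["Sumo terminology", "Sports in Japan"],
    "bundesliga": ["Football leagues in Germany"],
}
# Hypothetical mapping from a region word to a source language code.
REGION_LANGUAGE: Dict[str, str] = {"japan": "ja", "germany": "de"}
# Hypothetical information sources available per language.
SOURCES: Dict[str, List[str]] = {
    "ja": ["ja_corpus"],
    "de": ["de_corpus"],
    "en": ["en_corpus"],
}


def select_source_language(question: str, default: str = "en") -> Tuple[str, List[str]]:
    """Pick a source language from region words found in word categories."""
    votes: Counter = Counter()
    for raw in question.lower().split():
        word = raw.strip("?.,!")
        # Category information of the word (empty if the word is unknown).
        for category in WORD_CATEGORIES.get(word, []):
            # A word inside the category label serves as the estimated region.
            for token in category.lower().split():
                if token in REGION_LANGUAGE:
                    votes[REGION_LANGUAGE[token]] += 1
    # Assumption: the most frequently suggested language wins.
    language = votes.most_common(1)[0][0] if votes else default
    return language, SOURCES[language]


lang, sources = select_source_language("Who is the youngest yokozuna in sumo history?")
print(lang, sources)
```

When no region word is found, the sketch falls back to the `default` language; how the actual method ranks or combines candidate languages is not stated in the abstract.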
-
Publication No.: US20180032511A1
Publication date: 2018-02-01
Application No.: US15729794
Filing date: 2017-10-11
Inventors: HIROSHI KANAYAMA, RISA KAWANAKA
Abstract: A method of question answering from multilingual information sources is disclosed. The present invention discloses a method, a computer system and a program product for selecting an information source language of an information source, the method including: receiving a question; analyzing the question to obtain category information of a word included in the question; obtaining a word included in the category information as an estimated topic or region related to the question; determining a candidate for an information source language using the estimated topic or region; and selecting the information source language and corresponding information sources for retrieving documents to generate an answer to the question.
-
Publication No.: US20240220723A1
Publication date: 2024-07-04
Application No.: US18091909
Filing date: 2022-12-30
Inventors: TAKUMA UDAGAWA, HIROSHI KANAYAMA, Issei Yoshida
IPC classification: G06F40/284, G06F40/30
CPC classification: G06F40/284, G06F40/30
Abstract: A probability of a given token of a given text being a beginning of sentence is computed and a probability of the given token of the given text being an end of sentence is computed. The probability of the token being the beginning of sentence and the probability of the token being the end of sentence are combined to determine a probability of a given span of text being a sentential unit. The given span of text is identified as most probably being the sentential unit.
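A small sketch may help make the span scoring concrete. The abstract only says the two probabilities are "combined"; the product used here is an assumption, as are the function name `most_probable_span`, the optional `max_len` limit, and the example probabilities, which would in practice come from a trained token classifier.

```python
from typing import List, Optional, Tuple


def most_probable_span(
    p_begin: List[float],
    p_end: List[float],
    max_len: Optional[int] = None,
) -> Tuple[int, int, float]:
    """Return (start, end, score) of the highest-scoring span, end inclusive.

    p_begin[i]: probability that token i begins a sentence.
    p_end[j]:   probability that token j ends a sentence.
    Assumption: a span (i, j) is scored as p_begin[i] * p_end[j].
    """
    if len(p_begin) != len(p_end) or not p_begin:
        raise ValueError("p_begin and p_end must be non-empty and equal in length")
    n = len(p_begin)
    best = (0, 0, p_begin[0] * p_end[0])
    for i in range(n):
        # Optionally cap the span length to keep the search small.
        last = n if max_len is None else min(n, i + max_len)
        for j in range(i, last):
            score = p_begin[i] * p_end[j]
            if score > best[2]:
                best = (i, j, score)
    return best


# Hypothetical per-token probabilities for a four-token text.
p_begin = [0.9, 0.1, 0.2, 0.7]
p_end = [0.1, 0.8, 0.1, 0.9]
start, end, score = most_probable_span(p_begin, p_end)
print(start, end, round(score, 2))
```

The exhaustive search is quadratic in the number of tokens; `max_len` is one simple way to bound it, though the abstract does not say how candidate spans are enumerated.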
-
Publication No.: US20220382972A1
Publication date: 2022-12-01
Application No.: US17303349
Filing date: 2021-05-27
Inventors: YOUSEF EL-KURDI, Radu Florian, HIROSHI KANAYAMA, Efsun Kayi, LAURA CHITICARIU, Takuya Ohko, Robert Todd Ward
IPC classification: G06F40/205, G06F40/47, G06N3/08, G06N3/04
Abstract: An approach for generating synthetic treebanks to be used in training a parser in a production system is provided. A processor receives a request to generate one or more synthetic treebanks from a production system, wherein the request indicates a language for the one or more synthetic treebanks. A processor retrieves at least one corpus of text in which the requested language is present. A processor provides the at least one corpus to a transformer enhanced parser neural network model. A processor generates at least one synthetic treebank associated with a string of text from the at least one corpus of text in which the requested language is present. A processor sends the at least one synthetic treebank to the production system, wherein the production system trains a parser utilized by the production system with the at least one synthetic treebank.
-