-
Publication No.: US20240135099A1
Publication date: 2024-04-25
Application No.: US18048626
Filing date: 2022-10-20
IPC classification: G06F40/253, G06F40/117, G06F40/279
CPC classification: G06F40/253, G06F40/117, G06F40/279
Abstract: Computer technology for determining and tagging parts of speech in a text (that is, PoS tagging), where the context used by the natural language processing machine logic (for example, NLP software) includes both: (i) other words in the sentence under analysis where a given word to be tagged appears; and (ii) words in other sentences besides the sentence under analysis. Other context sentences may be selected randomly, by Next Sentence Prediction technology and/or by choosing sentences in textual proximity to the sentence under analysis.
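Illustrative sketch (not taken from the patent): the abstract names three ways to pick extra context sentences. The Python below shows only the random and textual-proximity options; Next Sentence Prediction is omitted because it needs a trained model. The function name `build_context`, its parameters, and the tie-breaking rule are assumptions made for this example.

```python
import random

def build_context(sentences, target_idx, k=2, strategy="proximity", seed=0):
    """Return up to k other sentences to pair with sentences[target_idx]."""
    others = [i for i in range(len(sentences)) if i != target_idx]
    if strategy == "proximity":
        # Nearest sentences by index distance; ties go to the earlier sentence.
        chosen = sorted(others, key=lambda i: (abs(i - target_idx), i))[:k]
    elif strategy == "random":
        chosen = random.Random(seed).sample(others, min(k, len(others)))
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # Keep document order so a tagger would see the context in reading order.
    return [sentences[i] for i in sorted(chosen)]

doc = [
    "I saw a bear.",
    "It can bear the cold.",
    "Bears sleep in winter.",
    "Winter is long.",
]
context = build_context(doc, target_idx=1, k=2)
random_context = build_context(doc, target_idx=1, k=2, strategy="random")
```

Here the sentence under analysis is "It can bear the cold."; a tagger would receive it together with `context` so that the ambiguous word "bear" can be disambiguated with help from neighbouring sentences.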
-
Publication No.: US11494433B2
Publication date: 2022-11-08
Application No.: US16406416
Filing date: 2019-05-08
Inventors: Yoshinori Kabeya, Toru Nagano, Masayuki Suzuki, Issei Yoshida
IPC classification: G06F16/683, G06F16/632, G06F16/635, G06F16/638, G10L15/22, G06F16/332
Abstract: A system and method for expanding a question and answer (Q&A) database. The method includes obtaining a set of Q&A documents and speech recognition results, each Q&A document in the set having an identifier, and each speech recognition result having an identifier common with the identifier of a relevant Q&A document, and adding one or more repetition parts extracted from the speech recognition results to a corresponding Q&A document in the set to generate an expanded set of Q&A documents for increasing Q&A document extraction accuracy.
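Illustrative sketch (not taken from the patent): the identifier-based join described above can be pictured as follows. The abstract does not say how repetition parts are extracted, so the `repeated_bigrams` helper, which treats any word pair occurring more than once in a transcript as a repetition part, is purely a stand-in assumption; `expand_qa` and the sample data are likewise hypothetical.

```python
from collections import Counter

def repeated_bigrams(transcript):
    """Stand-in extractor: word pairs that occur more than once."""
    words = transcript.lower().split()
    counts = Counter(zip(words, words[1:]))
    return [" ".join(pair) for pair, n in counts.items() if n > 1]

def expand_qa(docs, transcripts, extract=repeated_bigrams):
    """docs: {doc_id: text}; transcripts: iterable of (doc_id, text)."""
    expanded = dict(docs)
    for doc_id, transcript in transcripts:
        if doc_id not in expanded:
            continue  # no Q&A document shares this identifier
        for part in extract(transcript):
            expanded[doc_id] += " " + part
    return expanded

docs = {"q1": "How do I reset my password?"}
transcripts = [
    ("q1", "forgot password yes forgot password"),
    ("q9", "unrelated call unrelated call"),
]
expanded = expand_qa(docs, transcripts)
```

The appended phrases give a retrieval system extra surface forms (here, "forgot password") to match against spoken queries.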
-
Publication No.: US11327980B2
Publication date: 2022-05-10
Application No.: US16854709
Filing date: 2020-04-21
Inventor: Issei Yoshida
IPC classification: G06F16/2457, G06F16/22, G06F16/23, G06F16/2455, G06F16/248, G06F16/93, G06F16/338
Abstract: A method is provided that stores, in a secondary memory, an index structure including, for each given word from among words included in documents, a group of document IDs of documents including the given word. The method stores an index structure subset in a main memory which is faster than the secondary memory. The method acquires a keyword and identifies any documents including the keyword. The method finds top-K frequent words among the words included in the identified documents by: identifying, for each given group in descending order of the number of the document IDs therein, the number of document IDs of the identified documents in the given group, from the subset when the number of document IDs in the given group is within the range, and from the index structure otherwise; and presenting words of top-K groups with a largest amount of the document IDs identified.
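Illustrative sketch (not taken from the patent): the Python below shows only the descending-group-size scan for top-K co-occurring words over an in-memory inverted index. It does not model the main-memory subset versus secondary-memory split. The early exit relies on the fact that a group can never contribute more matches than its own size. The function name `top_k_words`, the exclusion of the keyword itself, and the sample index are assumptions for this example.

```python
import heapq

def top_k_words(index, keyword, k):
    """index: {word: set of document IDs}. Returns up to k co-occurring words."""
    matched = index.get(keyword, set())
    best = []  # min-heap of (count, word); best[0] is the current k-th best
    # Visit groups from largest to smallest (ties broken alphabetically).
    for word, ids in sorted(index.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        if word == keyword:
            continue
        if len(best) == k and len(ids) <= best[0][0]:
            break  # no remaining group is large enough to beat the k-th count
        count = len(ids & matched)
        if count == 0:
            continue
        if len(best) < k:
            heapq.heappush(best, (count, word))
        elif count > best[0][0]:
            heapq.heapreplace(best, (count, word))
    return [w for _, w in sorted(best, key=lambda cw: (-cw[0], cw[1]))]

index = {
    "search": {1, 2, 3, 4},
    "index": {1, 2, 3},
    "memory": {2, 3, 5, 6, 7},
    "cache": {5, 6},
    "disk": {1},
}
top = top_k_words(index, "search", 2)
```

In this example the scan stops before reading the "cache" and "disk" groups, because neither is large enough to displace the current top two.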
-
Publication No.: US11308274B2
Publication date: 2022-04-19
Application No.: US16415576
Filing date: 2019-05-17
IPC classification: G06F40/205, G06F17/15, G06F40/284
Abstract: A computer-implemented method is provided. The method includes acquiring a seed word; calculating a similarity score of each of a plurality of words relative to the seed word for each of a plurality of models to calculate a weighted sum of similarity scores for each of the plurality of words; outputting a plurality of candidate words among the plurality of words; acquiring annotations indicating at least one of preferred words and non-preferred words among the plurality of the candidate words; updating weights of the plurality of models in a manner to cause weighted sums of similarity scores for the preferred words to be relatively larger than the weighted sums of the similarity scores for the non-preferred words, based on the annotations; and grouping the plurality of candidate words output based on the weighted sum of similarity scores calculated with updated weights of the plurality of models.
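Illustrative sketch (not taken from the patent): the abstract does not specify the update rule, so the Python below assumes a simple one. Each model's weight is shifted by the gap between its mean score on preferred words and on non-preferred words, then the weights are clipped at zero and renormalized. The names `weighted_scores` and `update_weights`, the learning rate `lr`, and the sample scores are all hypothetical. The final grouping step is not shown.

```python
def weighted_scores(scores, weights):
    """scores: {word: [similarity to the seed word under each model]}."""
    return {
        word: sum(w * s for w, s in zip(weights, per_model))
        for word, per_model in scores.items()
    }

def update_weights(scores, weights, preferred, non_preferred, lr=0.5):
    n = len(weights)

    def mean_vec(words):
        if not words:
            return [0.0] * n
        return [sum(scores[w][m] for w in words) / len(words) for m in range(n)]

    pos, neg = mean_vec(preferred), mean_vec(non_preferred)
    # Reward models that score preferred words above non-preferred ones.
    raw = [max(0.0, w + lr * (p - q)) for w, p, q in zip(weights, pos, neg)]
    total = sum(raw)
    return [r / total for r in raw] if total > 0 else list(weights)

# Two hypothetical models scoring three candidate words against a seed word.
scores = {"cat": [0.9, 0.2], "dog": [0.8, 0.3], "car": [0.1, 0.9]}
weights = [0.5, 0.5]
before = weighted_scores(scores, weights)
new_weights = update_weights(scores, weights, ["cat", "dog"], ["car"])
after = weighted_scores(scores, new_weights)
```

After the update, the first model (which agrees with the annotations) carries most of the weight, so "cat" and "dog" pull further ahead of "car" in the weighted sums.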
-
Publication No.: US11182437B2
Publication date: 2021-11-23
Application No.: US15795071
Filing date: 2017-10-26
Inventor: Issei Yoshida
Abstract: Aspects of the invention are configured to perform an operation comprising receiving a query specifying an AND condition and an OR condition, determining, based on an AND index structure, a set of documents, of a plurality of documents in a corpus, satisfying the AND condition of the query, computing a query similarity score for a first document in the set of documents, wherein the query similarity score is based on a first hash value computed for the OR condition of the query, a weight value for the OR condition, and a second hash value for the first document specified in an OR index, and returning an indication of the first document and the query similarity score as responsive to the query.
-
Publication No.: US20210326533A1
Publication date: 2021-10-21
Application No.: US16853420
Filing date: 2020-04-20
IPC classification: G06F40/289, G06F40/247, G06K9/62
Abstract: A computer-implemented method is provided for estimating output confidence of a black box Application Programming Interface (API). The method includes generating paraphrases for an input text. The method further includes calculating a distance between the input text and each respective one of the paraphrases. The method also includes sorting the paraphrases in ascending order of the distance. The method additionally includes selecting a top predetermined number of the paraphrases. The method further includes inputting the input text and the selected paraphrases into the API to obtain an output confidence score for each of the input text and the selected paraphrases. The method also includes estimating, by a hardware processor, the output confidence of the input text from a robustness of output scores of the input text and the selected paraphrases.
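Illustrative sketch (not taken from the patent): the steps listed in the abstract can be pictured as below. Paraphrase generation is assumed to have happened already, the distance is an assumed string-based measure built on the standard library's `difflib.SequenceMatcher`, and "robustness" is assumed to mean the mean score minus the population standard deviation. The abstract fixes none of these choices. The `api` argument and both stub APIs are hypothetical.

```python
from difflib import SequenceMatcher
from statistics import mean, pstdev

def estimate_confidence(text, paraphrases, api, top_n=3):
    """api: callable mapping a text to a confidence score in [0, 1]."""
    def distance(a, b):
        return 1.0 - SequenceMatcher(None, a, b).ratio()

    # Sort paraphrases by ascending distance and keep the closest top_n.
    closest = sorted(paraphrases, key=lambda p: distance(text, p))[:top_n]
    scores = [api(t) for t in [text] + closest]
    # Assumed robustness measure: penalize scores that vary across paraphrases.
    return max(0.0, mean(scores) - pstdev(scores))

text = "book a flight to Tokyo"
paraphrases = [
    "book a flight to Tokyo please",
    "reserve a plane ticket for Tokyo",
    "I want to fly to Tokyo",
]

def stable_api(t):
    return 0.9  # same score for every input

def brittle_api(t):
    return 0.9 if t == text else 0.1  # confident only on the exact input

stable = estimate_confidence(text, paraphrases, stable_api, top_n=2)
brittle = estimate_confidence(text, paraphrases, brittle_api, top_n=2)
```

Both stub APIs report 0.9 for the original input, but the brittle one receives a much lower estimate because its score collapses on near-identical paraphrases.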
-
Publication No.: US10733218B2
Publication date: 2020-08-04
Application No.: US14970741
Filing date: 2015-12-16
Inventors: Miki Enoki, Issei Yoshida
IPC classification: G06F16/31
Abstract: A system to reduce a required memory area (storage capacity) and save time and effort for updating target attributes in aggregation processing is disclosed. The system for aggregating data includes an index storing unit for storing DtoK indices arranged in predetermined order, each of the indices specifying a list of attributes included in a target data item from identification information of the target data item, and a word list that is a list of attributes included in a plurality of the target data items, and an aggregation processing unit for finding, for each attribute, target data items including the attribute and executing aggregation processing for aggregating attributes whose relation with the target data items meets a predetermined standard. A link is created for each attribute in the word list for sequentially following an element in the index for each target data item, and target data items are found based thereon.
-
Publication No.: US20240232528A9
Publication date: 2024-07-11
Application No.: US18048626
Filing date: 2022-10-21
IPC classification: G06F40/253, G06F40/117, G06F40/279
CPC classification: G06F40/253, G06F40/117, G06F40/279
Abstract: Computer technology for determining and tagging parts of speech in a text (that is, PoS tagging), where the context used by the natural language processing machine logic (for example, NLP software) includes both: (i) other words in the sentence under analysis where a given word to be tagged appears; and (ii) words in other sentences besides the sentence under analysis. Other context sentences may be selected randomly, by Next Sentence Prediction technology and/or by choosing sentences in textual proximity to the sentence under analysis.
-
Publication No.: US20240220723A1
Publication date: 2024-07-04
Application No.: US18091909
Filing date: 2022-12-30
Inventors: Takuma Udagawa, Hiroshi Kanayama, Issei Yoshida
IPC classification: G06F40/284, G06F40/30
CPC classification: G06F40/284, G06F40/30
Abstract: A probability of a given token of a given text being a beginning of sentence is computed and a probability of the given token of the given text being an end of sentence is computed. The probability of the token being the beginning of sentence and the probability of the token being the end of sentence are combined to determine a probability of a given span of text being a sentential unit. The given span of text is identified as most probably being the sentential unit.
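Illustrative sketch (not taken from the patent): the abstract says the two probabilities are "combined" without saying how. The Python below assumes the simplest combination, a product of the begin probability of the first token and the end probability of the last token, and picks the highest-scoring span by exhaustive search. The function name `best_span`, the `max_len` parameter, and the sample probabilities are hypothetical.

```python
def best_span(p_begin, p_end, max_len=None):
    """Return ((start, end), score) for the most probable span, end inclusive."""
    n = len(p_begin)
    best, best_score = None, -1.0
    for i in range(n):
        last = n if max_len is None else min(n, i + max_len)
        for j in range(i, last):
            # Assumed combination: P(span i..j) = P(begin at i) * P(end at j).
            score = p_begin[i] * p_end[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best, best_score

tokens = ["hello", "world", ".", "next", "one"]
p_begin = [0.90, 0.05, 0.01, 0.70, 0.10]  # made-up per-token probabilities
p_end = [0.02, 0.10, 0.95, 0.05, 0.60]
span, score = best_span(p_begin, p_end)
sentence = tokens[span[0]: span[1] + 1]
```

With these made-up probabilities the selected span covers "hello world ." because its begin and end scores are jointly the highest.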
-
Publication No.: US20230394243A1
Publication date: 2023-12-07
Application No.: US18449970
Filing date: 2023-08-15
IPC classification: G06F40/30, G06F40/289, G06F40/247, G06F16/242, G06F16/33, G06F16/36, G06F40/10, G06F40/40, G10L15/18, G06F18/22, G06F18/21, G06F18/2321
CPC classification: G06F40/30, G06F40/289, G06F40/247, G06F16/242, G06F16/3344, G06F16/36, G06F40/10, G06F40/40, G10L15/1822, G06F18/22, G06F18/217, G06F18/2321
Abstract: A computer-implemented method is provided for estimating output confidence of a black box Application Programming Interface (API). The method includes generating paraphrases for an input text. The method further includes calculating a distance between the input text and each respective one of the paraphrases. The method also includes sorting the paraphrases in ascending order of the distance. The method additionally includes selecting a top predetermined number of the paraphrases. The method further includes inputting the input text and the selected paraphrases into the API to obtain an output confidence score for each of the input text and the selected paraphrases. The method also includes estimating, by a hardware processor, the output confidence of the input text from a robustness of output scores of the input text and the selected paraphrases.