-
Publication No.: US20220350827A1
Publication date: 2022-11-03
Application No.: US17763793
Filing date: 2020-09-22
Applicant: SEMICONDUCTOR ENERGY LABORATORY CO., LTD.
Inventor: Kunitaka YAMAMOTO , Kazuki HIGASHI , Yoshitaka DOZEN
IPC: G06F16/33 , G06F40/279
Abstract: Input of natural language as query text and a search from a plurality of documents are enabled, and a portion highly relevant to the input text is presented to a reader. A document data processing system including a document readout unit that reads out a plurality of subject documents, a document division unit that divides each of the plurality of subject documents into a plurality of blocks, a first distributed representation acquisition unit that acquires a distributed representation of a word in each of the blocks, a first distributed representation retention unit that stores the distributed representation acquired by the first distributed representation acquisition unit on a subject-document-by-subject-document basis and on a block-by-block basis, a query text readout unit that reads out query text, a second distributed representation acquisition unit that extracts a word included in the query text and acquires a distributed representation of the word, a second distributed representation retention unit that stores the distributed representation acquired by the second distributed representation acquisition unit, and a similarity calculation unit that compares the distributed representation of the word included in the query text and the distributed representation of the word included in each of the blocks and calculates similarity of each of the blocks is provided.
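The block-level comparison described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the claimed implementation: the `EMBEDDINGS` table, the blank-line block splitting, and the "best match per query word, then average" scoring rule are all assumptions made for the example, and the function names are hypothetical.

```python
import math

# Hypothetical toy embeddings; a real system would obtain distributed
# representations from a trained language model.
EMBEDDINGS = {
    "battery": [0.9, 0.1, 0.0],
    "electrode": [0.8, 0.2, 0.1],
    "display": [0.1, 0.9, 0.0],
    "pixel": [0.0, 0.8, 0.2],
    "transistor": [0.3, 0.3, 0.8],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def split_into_blocks(document):
    """Assumed division rule: one block per blank-line-separated paragraph."""
    return [b.strip() for b in document.split("\n\n") if b.strip()]

def word_vectors(text):
    """Look up a vector for every known word in the text."""
    words = [w.strip(".,;:").lower() for w in text.split()]
    return [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]

def build_store(documents):
    """Retain vectors per subject document and per block."""
    store = {}
    for doc_id, text in documents.items():
        for idx, block in enumerate(split_into_blocks(text)):
            store[(doc_id, idx)] = word_vectors(block)
    return store

def rank_blocks(query, store):
    """Score each block by the mean best-match cosine over the query words."""
    q_vecs = word_vectors(query)
    scored = []
    for (doc_id, idx), b_vecs in store.items():
        if q_vecs and b_vecs:
            score = sum(max(cosine(q, b) for b in b_vecs) for q in q_vecs) / len(q_vecs)
        else:
            score = 0.0
        scored.append((score, doc_id, idx))
    return sorted(scored, reverse=True)

documents = {
    "doc_a": "The battery uses a new electrode.\n\nThe display has a dense pixel array.",
    "doc_b": "Each transistor drives one pixel.",
}
ranking = rank_blocks("battery electrode", build_store(documents))
```

Here `ranking` lists `(score, document id, block index)` tuples with the most relevant block first, which is the portion that would be presented to the reader.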
-
Publication No.: US20250068669A1
Publication date: 2025-02-27
Application No.: US18945924
Filing date: 2024-11-13
Applicant: SEMICONDUCTOR ENERGY LABORATORY CO., LTD.
Inventor: Kunitaka YAMAMOTO , Junpei MOMO , Kazuki HIGASHI
IPC: G06F16/35 , G06F16/93 , G06F40/268
Abstract: A document search system that enables efficient document search regardless of the ability of a user is achieved. Document search is performed using a document search system in which database document data is stored. After first document data and second document data are input to the document search system, the document search system extracts a plurality of terms from the first document data. The extraction of the terms is performed using morphological analysis, for example. Next, the extracted terms are weighted on the basis of the second document data. For example, texts included in a document represented by the second document data are classified into first and second texts. Among the terms extracted from the first document data, the weight of the term included in the first text is set larger than the weights of the other terms. The classification of the texts can be performed in accordance with a rule basis or using machine learning. After that, the similarity of the database document data to the first document data is calculated on the basis of the weighted term.
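The weighting step in this abstract might look like the sketch below. It is only an illustration under stated assumptions: whitespace tokenization stands in for morphological analysis, the rule "a sentence containing the word `claim` belongs to the first text" is a made-up rule-based classifier, and the weights 2.0 and 1.0 are arbitrary.

```python
from collections import Counter

def extract_terms(text):
    """Stand-in for morphological analysis: lowercase whitespace tokens."""
    tokens = (w.strip(".,;:()").lower() for w in text.split())
    return [t for t in tokens if t]

def classify_sentences(second_doc, cue="claim"):
    """Hypothetical rule: sentences containing the cue word form the first text."""
    sentences = [s.strip() for s in second_doc.split(".") if s.strip()]
    first = [s for s in sentences if cue in s.lower()]
    second = [s for s in sentences if cue not in s.lower()]
    return first, second

def weight_terms(first_doc, second_doc, high=2.0, low=1.0):
    """Terms of the first document that appear in the first text get the larger weight."""
    first_text, _ = classify_sentences(second_doc)
    emphasized = {t for s in first_text for t in extract_terms(s)}
    return {t: (high if t in emphasized else low) for t in set(extract_terms(first_doc))}

def similarity(weights, db_doc):
    """Weighted count of query terms occurring in a database document."""
    counts = Counter(extract_terms(db_doc))
    return sum(w * counts[t] for t, w in weights.items())

first_doc = "oxide semiconductor transistor with gate electrode"
second_doc = ("The examiner notes the claim lacks novelty for the gate electrode. "
              "The drawings are acceptable.")
database = {
    "d1": "a gate electrode over an oxide layer",
    "d2": "an oxide semiconductor transistor",
}
weights = weight_terms(first_doc, second_doc)
scores = {doc_id: similarity(weights, text) for doc_id, text in database.items()}
```

With these inputs `d1` outranks `d2` even though it shares the same number of terms with the first document, because `gate` and `electrode` carry the larger weight.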
-
Publication No.: US20220197935A1
Publication date: 2022-06-23
Application No.: US17612248
Filing date: 2020-05-11
Applicant: SEMICONDUCTOR ENERGY LABORATORY CO., LTD.
Inventor: Kunitaka YAMAMOTO , Junpei MOMO , Kazuki HIGASHI
IPC: G06F16/35 , G06F40/268 , G06F16/93
Abstract: A document search system that enables efficient document search regardless of the ability of a user is achieved. Document search is performed using a document search system in which database document data is stored. After first document data and second document data are input to the document search system, the document search system extracts a plurality of terms from the first document data. The extraction of the terms is performed using morphological analysis, for example. Next, the extracted terms are weighted on the basis of the second document data. For example, texts included in a document represented by the second document data are classified into first and second texts. Among the terms extracted from the first document data, the weight of the term included in the first text is set larger than the weights of the other terms. The classification of the texts can be performed in accordance with a rule basis or using machine learning. After that, the similarity of the database document data to the first document data is calculated on the basis of the weighted term.
-
Publication No.: US20230350949A1
Publication date: 2023-11-02
Application No.: US17791316
Filing date: 2020-12-28
Applicant: Semiconductor Energy Laboratory Co., Ltd.
Inventor: Junpei MOMO , Kazuki HIGASHI , Motoki NAKASHIMA
IPC: G06F16/901 , G06F40/284 , G06F40/211 , G06F16/33 , G06F16/93
CPC classification number: G06F16/9024 , G06F40/284 , G06F40/211 , G06F16/3344 , G06F16/93
Abstract: A document retrieval system retrieving a document with the concept of the document taken into account is provided. The system includes a processing portion and the processing portion creates a retrieval graph from a retrieval composition. The retrieval graph includes first to m-th retrieval local graphs (m is an integer of greater than or equal to 1), and the retrieval local graphs are each constituted by two nodes and one edge. The processing portion performs retrieval of first to m-th sentences on a reference document. The i-th sentence (i is an integer of greater than or equal to 1 and less than or equal to m) includes one of the two nodes in the i-th retrieval local graph or a related term or a hyponym of the one of the two nodes; the other of the two nodes in the i-th retrieval local graph or a related term or a hyponym of the other of the two nodes; and the edge in the i-th retrieval local graph or a related term or a hyponym of the edge. A mark is assigned to the score of the reference document in accordance with the number of sentences included in the reference document among the first to m-th sentences.
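One way to picture the scoring in this abstract is the sketch below, where each retrieval local graph is a `(node, edge, node)` triple. The `RELATED` table of related terms and hyponyms, the word-set matching, and all function names are assumptions for illustration; the application does not specify them here.

```python
# Hypothetical table of related terms and hyponyms for nodes and edges.
RELATED = {
    "transistor": {"fet", "tft"},
    "includes": {"comprises", "has"},
    "oxide": {"igzo"},
}

def variants(term):
    """The term itself plus its related terms and hyponyms."""
    return {term} | RELATED.get(term, set())

def sentence_matches(sentence, local_graph):
    """True if the sentence contains both nodes and the edge, or variants of them."""
    words = set(sentence.lower().replace(",", " ").split())
    return all(words & variants(term) for term in local_graph)

def score_document(document, retrieval_graph):
    """Add one mark per local graph matched by at least one sentence."""
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    return sum(
        1
        for graph in retrieval_graph
        if any(sentence_matches(s, graph) for s in sentences)
    )

# Retrieval graph with m = 2 local graphs, each (node, edge, node).
retrieval_graph = [
    ("transistor", "includes", "oxide"),
    ("oxide", "includes", "indium"),
]
reference = "The TFT comprises IGZO. The layer is thin."
score = score_document(reference, retrieval_graph)
```

Only the first local graph is matched (through the variants `tft`, `comprises`, and `igzo`), so this reference document receives a score of 1.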
-
Publication No.: US20210026861A1
Publication date: 2021-01-28
Application No.: US17064871
Filing date: 2020-10-07
Applicant: Semiconductor Energy Laboratory Co., Ltd.
Inventor: Kazuki HIGASHI , Junpei MOMO
IPC: G06F16/2457 , G06F40/268 , G06N20/00 , G06N3/08 , G06F40/279 , G06F16/93
Abstract: A highly accurate document search, particularly a search for a document relating to intellectual property, is achieved with an easy input method. A document search system includes a processing portion. The processing portion has a function of extracting a keyword included in text data, a function of extracting a related term of the keyword from words included in a plurality of pieces of first reference text analysis data, a function of giving a weight to each of the keyword and the related term, a function of giving a score to each of a plurality of pieces of second reference text analysis data on the basis of the weight, a function of ranking the plurality of pieces of second reference text analysis data on the basis of the score to generate ranking data, and a function of outputting the ranking data.
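The keyword, related-term, weighting, scoring, and ranking functions listed in this abstract could be sketched as below. Several simplifications are assumed: relatedness is approximated by co-occurrence within one reference text, keywords get weight 1.0 and related terms 0.5, and the stop-word list is a hypothetical placeholder.

```python
from collections import Counter

STOPWORDS = {"a", "an", "the", "of", "with", "and", "is", "in", "for"}

def tokenize(text):
    tokens = (w.strip(".,;:").lower() for w in text.split())
    return [t for t in tokens if t]

def extract_keywords(text):
    """Unique non-stop-words, in order of first appearance."""
    return [w for w in dict.fromkeys(tokenize(text)) if w not in STOPWORDS]

def related_terms(keyword, reference_texts, top_n=1):
    """Assumed relatedness: words co-occurring with the keyword most often."""
    counts = Counter()
    for text in reference_texts:
        words = set(tokenize(text)) - STOPWORDS
        if keyword in words:
            counts.update(words - {keyword})
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:top_n]]

def build_weights(text, reference_texts):
    """Weight 1.0 for keywords, 0.5 for related terms that are not keywords."""
    weights = {}
    for keyword in extract_keywords(text):
        weights[keyword] = 1.0
        for term in related_terms(keyword, reference_texts):
            weights.setdefault(term, 0.5)
    return weights

def rank(weights, candidates):
    """Score each candidate by the summed weights of the terms it contains."""
    scores = {
        doc_id: sum(weights.get(w, 0.0) for w in set(tokenize(text)))
        for doc_id, text in candidates.items()
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

first_reference = [
    "The secondary battery includes a cathode.",
    "A battery cathode with lithium.",
]
second_reference = {
    "p1": "A cathode for a lithium battery.",
    "p2": "A display panel.",
    "p3": "A secondary battery with a cathode.",
}
weights = build_weights("secondary battery", first_reference)
ranking = rank(weights, second_reference)
```

The resulting `ranking` plays the role of the ranking data: `cathode` is picked up as a related term, so `p1` scores above `p2` even though it lacks the keyword `secondary`.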
-
Publication No.: US20220245181A1
Publication date: 2022-08-04
Application No.: US17622930
Filing date: 2020-06-22
Applicant: SEMICONDUCTOR ENERGY LABORATORY CO., LTD.
Inventor: Yoshitaka DOZEN , Kazuki HIGASHI , Kunitaka YAMAMOTO
IPC: G06F16/33 , G06F40/40 , G06F40/253 , G06F40/279
Abstract: A reading comprehension support system or a reading comprehension support method that enables natural language to be input as query text and presents a reader with a part that is highly related to the input text is provided. The reading comprehension support system includes a document readout unit that reads out a subject document, a document division unit that divides the subject document into a plurality of blocks, a first distributed representation acquisition unit that acquires a distributed representation of a word in each of the plurality of blocks, a query text readout unit that reads out query text, a second distributed representation acquisition unit that extracts a word included in the query text and acquires a distributed representation of the word, and a similarity acquisition unit that compares distributed representations of words between the query text and each of the plurality of blocks and obtains similarity. From words included in the block, the similarity acquisition unit searches for a word that matches a word included in the query text, and obtains similarity between a distributed representation of the matching word in the block and a distributed representation of the matching word in the query text.
-
Publication No.: US20220207070A1
Publication date: 2022-06-30
Application No.: US17600280
Filing date: 2020-04-16
Applicant: Semiconductor Energy Laboratory Co., Ltd.
Inventor: Kazuki HIGASHI , Junpei MOMO
IPC: G06F16/36 , G06F40/242 , G06F40/247 , G06F40/279
Abstract: Highly accurate document search, especially intellectual property-related document search, is achieved with a simple input method. A processing portion has a function of generating text analysis data from text data input to an input portion; a function of extracting a search word from words included in the text analysis data; and a function of generating first search data from the search word on the basis of weight dictionary data and thesaurus data. A memory portion stores second search data generated when the first search data is modified by a user. The processing portion updates the thesaurus data in accordance with the second search data.
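The feedback loop in this abstract, where user edits to the search data flow back into the thesaurus, might be sketched as follows. The data layout (nested dictionaries of relatedness values), the fixed `step` adjustment, and the function names are assumptions for illustration only; the abstract does not say how the thesaurus is updated.

```python
def generate_search_data(search_words, weight_dict, thesaurus):
    """First search data: each search word plus its synonyms, with weights."""
    data = {}
    for word in search_words:
        base = weight_dict.get(word, 1.0)
        data[word] = base
        for synonym, relatedness in thesaurus.get(word, {}).items():
            data.setdefault(synonym, base * relatedness)
    return data

def update_thesaurus(thesaurus, search_words, first, second, step=0.25):
    """Assumed update rule: lower relatedness for synonyms the user removed,
    raise it for synonyms whose weight the user increased."""
    for word in search_words:
        for synonym in list(thesaurus.get(word, {})):
            current = thesaurus[word][synonym]
            if synonym not in second:
                thesaurus[word][synonym] = max(0.0, current - step)
            elif second[synonym] > first.get(synonym, 0.0):
                thesaurus[word][synonym] = min(1.0, current + step)
    return thesaurus

weight_dict = {"display": 2.0}
thesaurus = {"display": {"screen": 0.5, "monitor": 0.5}}
search_words = ["display"]

first_search_data = generate_search_data(search_words, weight_dict, thesaurus)
# Second search data: the user dropped "monitor" and raised "screen".
second_search_data = {"display": 2.0, "screen": 1.5}
update_thesaurus(thesaurus, search_words, first_search_data, second_search_data)
```

After the update, `screen` is more strongly tied to `display` and `monitor` less so, which changes the first search data generated for the next query.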
-
Publication No.: US20210398025A1
Publication date: 2021-12-23
Application No.: US17292783
Filing date: 2019-11-06
Applicant: Semiconductor Energy Laboratory Co., Ltd.
Inventor: Kunitaka YAMAMOTO , Junpei MOMO , Kazuki HIGASHI , Takahiro FUKUTOME
IPC: G06N20/20
Abstract: A novel content classification method is provided. A content classification method using machine learning for a learning model and a classifier fabrication method are provided. In Step 1, a data set containing a plurality of contents is acquired. Learning labels are attached to m contents, and the learning labels are not attached to the remaining contents. In Step 2, a first learning model is created by machine learning using the m contents. In Step 3, judgment labels are attached to the plurality of contents using the first learning model and are displayed on a GUI. In Step 4, new learning labels are attached to k contents in the plurality of contents. In Step 5, a second learning model is created by the machine learning using the k contents. In Step 6, judgment labels are attached to the plurality of contents using the second learning model and are displayed on the GUI.
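Steps 1 to 6 of this abstract can be walked through with a toy example. The sketch below assumes each content is a single numeric feature and uses a nearest-centroid model as a stand-in for the unspecified machine learning method; the GUI display is replaced by plain lists, and the relabeling in Step 4 is hard-coded where a user would act.

```python
def train(labeled):
    """Nearest-centroid model: mean feature value per label."""
    sums = {}
    for x, y in labeled:
        total, n = sums.get(y, (0.0, 0))
        sums[y] = (total + x, n + 1)
    return {y: total / n for y, (total, n) in sums.items()}

def judge(model, contents):
    """Attach a judgment label to every content (closest centroid wins)."""
    return [min(model, key=lambda y: abs(x - model[y])) for x in contents]

# Step 1: data set; learning labels are attached to m = 2 contents only.
contents = [1.0, 2.0, 4.0, 6.0, 8.0, 9.0]
learning_labels = {0: "A", 5: "B"}

# Step 2: first learning model from the m labeled contents.
first_model = train([(contents[i], y) for i, y in learning_labels.items()])

# Step 3: judgment labels for all contents (would be shown on a GUI).
first_judgment = judge(first_model, contents)

# Step 4: new learning labels on k = 3 contents; the user corrects index 2.
new_labels = {0: "A", 2: "B", 5: "B"}

# Step 5: second learning model from the k labeled contents.
second_model = train([(contents[i], y) for i, y in new_labels.items()])

# Step 6: judgment labels from the second model.
second_judgment = judge(second_model, contents)
```

The correction at index 2 shifts the `B` centroid from 9.0 to 6.5, so the content with feature 4.0 moves from `A` to `B` in the second round.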
-
Publication No.: US20240403351A1
Publication date: 2024-12-05
Application No.: US18806826
Filing date: 2024-08-16
Applicant: Semiconductor Energy Laboratory Co., Ltd.
Inventor: Kazuki HIGASHI , Junpei MOMO
IPC: G06F16/36 , G06F40/242 , G06F40/247 , G06F40/279
Abstract: Highly accurate document search, especially intellectual property-related document search, is achieved with a simple input method. A processing portion has a function of generating text analysis data from text data input to an input portion; a function of extracting a search word from words included in the text analysis data; and a function of generating first search data from the search word on the basis of weight dictionary data and thesaurus data. A memory portion stores second search data generated when the first search data is modified by a user. The processing portion updates the thesaurus data in accordance with the second search data.
-
Publication No.: US20240273108A1
Publication date: 2024-08-15
Application No.: US18635181
Filing date: 2024-04-15
Applicant: Semiconductor Energy Laboratory Co., Ltd.
Inventor: Kazuki HIGASHI , Junpei MOMO
IPC: G06F16/2457 , G06F16/93 , G06F40/268 , G06F40/279 , G06N3/08 , G06N20/00 , G06Q10/10 , G06Q50/18
CPC classification number: G06F16/24578 , G06F16/93 , G06F40/268 , G06F40/279 , G06N3/08 , G06N20/00 , G06F2216/11 , G06Q10/10 , G06Q50/184
Abstract: A highly accurate document search, particularly a search for a document relating to intellectual property, is achieved with an easy input method. A document search system includes a processing portion. The processing portion has a function of extracting a keyword included in text data, a function of extracting a related term of the keyword from words included in a plurality of pieces of first reference text analysis data, a function of giving a weight to each of the keyword and the related term, a function of giving a score to each of a plurality of pieces of second reference text analysis data on the basis of the weight, a function of ranking the plurality of pieces of second reference text analysis data on the basis of the score to generate ranking data, and a function of outputting the ranking data.