-
Publication No.: US11620448B2
Publication Date: 2023-04-04
Application No.: US17008572
Filing Date: 2020-08-31
Applicant: Recruit Co., Ltd.
Inventors: Yoshihiko Suhara, Behzad Golshan, Yuliang Li, Chen Chen, Xiaolan Wang, Jinfeng Li, Wang-Chiew Tan, Çağatay Demiralp, Aaron Traylor
IPC Classification: G06F40/284, G06F16/35, G06K9/62, G06N7/00
Abstract: Disclosed embodiments relate to natural language processing. Techniques can include receiving input text, extracting, from the input text, at least one modifier and aspect pair, receiving data from a knowledgebase, based on the at least one modifier and aspect pair and commonsense data, generating one or more premise embeddings, converting the input text into tokens, generating at least one vector for one or more of the tokens based on an analysis of the tokens, combining the at least one vector with the one or more premise embeddings to create at least one combined vector, and analyzing the at least one combined vector, wherein the analysis generates an output indicative of a feature of the input text.
-
Publication No.: US20220351071A1
Publication Date: 2022-11-03
Application No.: US17246354
Filing Date: 2021-04-30
Applicant: Recruit Co., Ltd.
Inventors: Yuliang Li, Xiaolan Wang, Zhengjie Miao
IPC Classification: G06N20/00, G06N5/02, G06F16/21, G06F40/284
Abstract: Disclosed embodiments relate to generating training data for a machine learning model. Techniques can include accessing a machine learning model from a machine learning model repository and identifying a data set associated with the machine learning model. The identified data set is utilized to generate a set of data augmentation operators. The data augmentation operators are applied to a selected sequence of tokens associated with the machine learning model to generate sequences of tokens. A subset of the sequences of tokens is selected and stored in a training data repository. The stored sequences of tokens are provided to the machine learning model as training data.
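The operator-and-select flow described in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: the abstract says the operators are generated from a data set, whereas the two operators here (`delete_token`, `swap_adjacent`) are hand-written stand-ins, and the selection rule (drop unchanged and duplicate sequences, keep the first few) is an assumption made only for illustration.

```python
import random
from typing import Callable, List

Tokens = List[str]
Operator = Callable[[Tokens, random.Random], Tokens]


def delete_token(tokens: Tokens, rng: random.Random) -> Tokens:
    """Hypothetical operator: drop one randomly chosen token."""
    if len(tokens) < 2:
        return list(tokens)
    i = rng.randrange(len(tokens))
    return tokens[:i] + tokens[i + 1:]


def swap_adjacent(tokens: Tokens, rng: random.Random) -> Tokens:
    """Hypothetical operator: swap one randomly chosen pair of neighbours."""
    if len(tokens) < 2:
        return list(tokens)
    i = rng.randrange(len(tokens) - 1)
    out = list(tokens)
    out[i], out[i + 1] = out[i + 1], out[i]
    return out


def augment(seed_tokens: Tokens, operators: List[Operator],
            n_candidates: int, keep: int, seed: int = 0) -> List[Tokens]:
    """Apply operators to one seed sequence, then select a subset."""
    rng = random.Random(seed)
    candidates = [rng.choice(operators)(seed_tokens, rng)
                  for _ in range(n_candidates)]
    # Assumed selection rule: discard unchanged and duplicate sequences.
    selected: List[Tokens] = []
    for cand in candidates:
        if cand != seed_tokens and cand not in selected:
            selected.append(cand)
    return selected[:keep]
```

A call such as `augment(["the", "room", "was", "very", "clean"], [delete_token, swap_adjacent], n_candidates=20, keep=3)` returns at most three distinct variants that could then be stored as additional training sequences.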
-
Publication No.: US20220067298A1
Publication Date: 2022-03-03
Application No.: US17008563
Filing Date: 2020-08-31
Applicant: Recruit Co., Ltd.
Inventors: Behzad Golshan, Chen Chen, Wang-Chiew Tan, Danni Ma
IPC Classification: G06F40/35, G06F40/284, G06F40/268, G06K9/62
Abstract: Disclosed embodiments relate to aligning pairs of sentences. Techniques can include receiving a plurality of sentences; generating a graph for each of at least two sentences of the plurality of sentences, wherein generating a graph for each sentence of the at least two sentences comprises: identifying one or more tokens for the sentence; and connecting via edges the one or more tokens; generating a combined graph for the at least two sentences wherein generating a combined graph comprises: aligning the identified tokens of the at least two sentences of the plurality of sentences; identifying matching and non-matching tokens between the at least two sentences based on the alignment; and merging matching tokens into a combined graph node.
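As a rough illustration of the steps in this abstract (per-sentence token graphs, alignment, merging of matching tokens), the following Python sketch uses the standard library's `difflib.SequenceMatcher` as the aligner. The abstract does not name an alignment algorithm, so that choice, the whitespace tokenizer, and the function names `sentence_graph` and `combined_graph` are all assumptions.

```python
from difflib import SequenceMatcher


def sentence_graph(sentence):
    """Tokens as nodes, with an edge between each pair of consecutive tokens."""
    tokens = sentence.lower().split()  # assumed tokenizer: whitespace split
    edges = [(i, i + 1) for i in range(len(tokens) - 1)]
    return tokens, edges


def combined_graph(sent_a, sent_b):
    """Merge two sentence graphs; aligned matching tokens share one node."""
    tokens_a, _ = sentence_graph(sent_a)
    tokens_b, _ = sentence_graph(sent_b)
    nodes = []               # (token, set of sentences that contain it)
    path_a, path_b = [], []  # node ids each sentence walks through, in order
    matcher = SequenceMatcher(a=tokens_a, b=tokens_b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Matching tokens: one combined node used by both sentences.
            for tok in tokens_a[i1:i2]:
                nodes.append((tok, frozenset({"a", "b"})))
                path_a.append(len(nodes) - 1)
                path_b.append(len(nodes) - 1)
        else:
            # Non-matching tokens: separate nodes per sentence.
            for tok in tokens_a[i1:i2]:
                nodes.append((tok, frozenset({"a"})))
                path_a.append(len(nodes) - 1)
            for tok in tokens_b[j1:j2]:
                nodes.append((tok, frozenset({"b"})))
                path_b.append(len(nodes) - 1)
    edges = set()
    for path in (path_a, path_b):
        edges.update(zip(path, path[1:]))
    return nodes, sorted(edges)
```

For "the room was clean" and "the room was very dirty", the three shared tokens collapse into combined nodes, and the graph branches after "was" into one path per sentence.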
-
Publication No.: US20240338391A1
Publication Date: 2024-10-10
Application No.: US18295757
Filing Date: 2023-04-04
Applicant: Recruit Co., Ltd.
Inventors: Yutong Shao, Nikita Bhutani, Sajjadur Rahman, Estevam Hruschka
CPC Classification: G06F16/313, G06F16/38
Abstract: Disclosed embodiments relate to entity set expansion to associate with a text corpus. Techniques can include receiving unstructured data and a set of concepts associated with the data to determine, using a language model, a set of candidate entities in the data associated with the set of concepts, wherein the association is measured based on the relevancy of each candidate entity of the set of candidate entities to the context of the data. Techniques can then determine, using a plurality of methods, associations between each candidate entity in the set of candidate entities and each concept in the set of concepts, wherein each candidate entity is assigned a rank for each method of the plurality of methods. Techniques can use the assigned ranks to determine a combined rank of each candidate entity of the set of candidate entities, wherein the combined rank of each candidate entity is based on the assigned rank of that candidate entity for each method of the plurality of methods. Techniques can finally expand the entity set by determining a subset of entities of the set of candidate entities based on the combined rank of each candidate entity, wherein the subset of entities forms the expanded entity set associated with the data.
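The rank-combination step in this abstract can be sketched in a few lines of Python. The abstract does not say how per-method ranks are combined, so the mean rank used below is an assumption, as are the function names and the requirement that every method scores the same candidates.

```python
from typing import Dict, List


def ranks_from_scores(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank 1 goes to the highest score; ties are broken by entity name."""
    ordered = sorted(scores, key=lambda e: (-scores[e], e))
    return {entity: rank for rank, entity in enumerate(ordered, start=1)}


def expand_entity_set(method_scores: List[Dict[str, float]],
                      top_k: int) -> List[str]:
    """Combine per-method ranks (assumed: mean rank) and keep the best top_k."""
    per_method = [ranks_from_scores(s) for s in method_scores]
    # Assumption: every method scored the same set of candidate entities.
    candidates = set(per_method[0])
    combined = {e: sum(r[e] for r in per_method) / len(per_method)
                for e in candidates}
    return sorted(combined, key=lambda e: (combined[e], e))[:top_k]
```

With two hypothetical methods that both place "laptop" last for a food-related concept, `expand_entity_set(..., top_k=2)` keeps the two food entities and drops "laptop".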
-
Publication No.: US20230342558A1
Publication Date: 2023-10-26
Application No.: US17660813
Filing Date: 2022-04-26
Applicant: Recruit Co., Ltd.
Inventors: Jin Wang, Yuliang Li, Wataru Hirota
IPC Classification: G06F40/40, G06F40/284, G06F40/205
CPC Classification: G06F40/40, G06F40/284, G06F40/205
Abstract: Disclosed embodiments relate to generalized entity matching. Techniques can include receiving a data pair of two entities that may be pre-processed to have parsable data structures, and serializing the data pair into a sequence of tokens based on the data structure of each entity in the data pair. Techniques can further include encoding the serialized data pair to include topic attributes that may be mapped to data in the data pair, where the topic of the mapped data matches the topic represented by the topic attribute and the data in the data pair is concatenated. Techniques can further include pooling attributes in the data pair based on contextualized attribute representations of each encoded entity in the data pair and the schema of each entity of the data pair, where the contextualized attribute representations are based on a first token of each encoded attribute in the sequence of tokens, and predicting matching labels between the data pairs based on the pooled attributes.
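Only the serialization step of this abstract lends itself to a short example. The Python sketch below flattens two entities with different schemas into one token sequence; the marker tokens `[CLS]`, `[COL]`, `[VAL]`, and `[SEP]` and the dotted naming of nested attributes are assumptions, not details taken from the patent, and the encoding, pooling, and prediction steps are not shown.

```python
def serialize_entity(entity, prefix=""):
    """Flatten a (possibly nested) dict into attribute/value tokens."""
    tokens = []
    for key in sorted(entity):
        value = entity[key]
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested structure: recurse with a dotted attribute name.
            tokens.extend(serialize_entity(value, prefix=name + "."))
        else:
            # Each attribute starts with a marker token, so a model could
            # later pool on the first token of every attribute.
            tokens.extend(["[COL]", name, "[VAL]", *str(value).split()])
    return tokens


def serialize_pair(left, right):
    """Serialize a data pair into a single sequence of tokens."""
    return ["[CLS]", *serialize_entity(left), "[SEP]",
            *serialize_entity(right), "[SEP]"]
```

Because each entity is serialized from its own structure, the two sides of the pair do not need to share a schema.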
-
Publication No.: US10176654B2
Publication Date: 2019-01-08
Application No.: US15769419
Filing Date: 2016-10-19
Applicant: Recruit Co., Ltd.
Abstract: A suspicious person detection technology which is less likely to cause a blind spot of detection of a suspicious person is provided. A suspicious person detection system detects a suspicious person present in a predetermined area and includes a probe request detection terminal (100) configured to detect a probe request transmitted from a mobile terminal (400) to generate probe information including first identification information specific to the mobile terminal which transmits the probe information, and an analyzing apparatus (200) configured to acquire the probe information from the probe request detection terminal, and, in the case where the first identification information included in the probe information matches none of one or more pieces of second identification information set in advance, transmit suspicious person information indicating that a suspicious person is detected to a predetermined information processing apparatus (300).
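The matching rule applied by the analyzing apparatus can be sketched as a simple allowlist check. In the Python sketch below, the `ProbeInfo` fields, the use of a MAC-style string as the identifier, and the case-insensitive normalization are assumptions for illustration; the abstract only says that a terminal is flagged when its identification information matches none of the pre-registered entries.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ProbeInfo:
    terminal_id: str   # first identification information (assumed: MAC-like string)
    detector_id: str   # which detection terminal observed the probe request


def normalize(identifier: str) -> str:
    """Assumed normalization so that case and padding do not affect matching."""
    return identifier.strip().lower()


def find_suspicious(probes: Iterable[ProbeInfo],
                    registered_ids: Iterable[str]) -> List[ProbeInfo]:
    """Return probes whose terminal id matches none of the registered ids."""
    known = {normalize(i) for i in registered_ids}
    return [p for p in probes if normalize(p.terminal_id) not in known]
```

Each returned `ProbeInfo` corresponds to one piece of suspicious person information that would be forwarded to the information processing apparatus.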
-
Publication No.: US12093646B2
Publication Date: 2024-09-17
Application No.: US17151088
Filing Date: 2021-01-15
Applicant: Recruit Co., Ltd.
Inventors: Zhengjie Miao, Yuliang Li, Xiaolan Wang, Wang-Chiew Tan
IPC Classification: G06F40/284, G06F40/289, G06N5/04, G06N20/00
CPC Classification: G06F40/284, G06F40/289, G06N5/04, G06N20/00
Abstract: Disclosed embodiments relate to extracting classification information from input text. Techniques can include obtaining input text, identifying a plurality of tokens in the input text, pre-training a machine learning model, determining tagging information of the plurality of tokens using a first classification layer of the machine learning model, pairing sequences of tokens using the tagging information associated with the plurality of tokens, wherein the paired sequences of tokens are determined by a second classification layer, determining one or more attribute classifiers to apply to the one or more paired sequences, wherein the attribute classifiers are determined by a third classification layer of the machine learning model, evaluating sentiments of the paired sequences, wherein the sentiments of the paired sequences are determined by a fourth classification layer of the language machine learning model, aggregating sentiments of the paired sequences associated with an attribute classifier, and storing the aggregated sentiments.
-
Publication No.: US11934783B2
Publication Date: 2024-03-19
Application No.: US18295735
Filing Date: 2023-04-04
Applicant: Recruit Co., Ltd.
Inventors: Yoshihiko Suhara, Behzad Golshan, Yuliang Li, Chen Chen, Xiaolan Wang, Jinfeng Li, Wang-Chiew Tan, Çağatay Demiralp, Aaron Traylor
IPC Classification: G06F40/284, G06F16/35, G06F18/211, G06N7/01
CPC Classification: G06F40/284, G06F16/35, G06F18/211, G06N7/01
Abstract: Disclosed embodiments relate to natural language processing. Techniques can include receiving input text, extracting, from the input text, at least one modifier and aspect pair, receiving data from a knowledgebase, based on the at least one modifier and aspect pair and commonsense data, generating one or more premise embeddings, converting the input text into tokens, generating at least one vector for one or more of the tokens based on an analysis of the tokens, combining the at least one vector with the one or more premise embeddings to create at least one combined vector, and analyzing the at least one combined vector, wherein the analysis generates an output indicative of a feature of the input text.
-
Publication No.: US11741312B2
Publication Date: 2023-08-29
Application No.: US17008563
Filing Date: 2020-08-31
Applicant: Recruit Co., Ltd.
Inventors: Behzad Golshan, Chen Chen, Wang-Chiew Tan, Danni Ma
IPC Classification: G06F40/35, G06F40/268, G06F40/284, G06F18/2323
CPC Classification: G06F40/35, G06F18/2323, G06F40/268, G06F40/284
Abstract: Disclosed embodiments relate to aligning pairs of sentences. Techniques can include receiving a plurality of sentences; generating a graph for each of at least two sentences of the plurality of sentences, wherein generating a graph for each sentence of the at least two sentences comprises: identifying one or more tokens for the sentence; and connecting via edges the one or more tokens; generating a combined graph for the at least two sentences wherein generating a combined graph comprises: aligning the identified tokens of the at least two sentences of the plurality of sentences; identifying matching and non-matching tokens between the at least two sentences based on the alignment; and merging matching tokens into a combined graph node.
-
Publication No.: US20240289629A1
Publication Date: 2024-08-29
Application No.: US18305657
Filing Date: 2023-04-24
Applicant: Recruit Co., Ltd.
Inventors: Runhui Wang, Yuliang Li, Jin Wang
IPC Classification: G06N3/09, G06F16/28, G06N3/045, G06N3/0464
CPC Classification: G06N3/09, G06F16/285, G06N3/045, G06N3/0464
Abstract: Disclosed embodiments relate to data management of entity pairs. Techniques can include receiving at least two sets of data and a data management task request, with each set of data including a set of entities. Techniques can determine a location of each entity in the received data sets in a representative space by determining a representative structure of the set of entities. Techniques can then determine, for an entity, a set of representative entity pairs from each set of the at least two sets of data based on how close they are in the representative space. Techniques can then analyze the set of representative entity pairs to identify the most similar entity pairs to include in a set of candidate pairs by determining the closeness of the locations of the entities in each entity pair in the representative space. Techniques can then determine matched entity pairs of the candidate pairs using a first machine learning model that is trained using the candidate pairs by applying labels, and utilize the matched pairs to perform the requested data management task.
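The candidate-pair step in this abstract, which pairs entities that lie close together in the representative space, can be approximated with a nearest-neighbour search. The Python sketch below assumes the entity vectors are already available and uses cosine similarity as the closeness measure; both are assumptions, and the trained matching model that follows this step is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def candidate_pairs(left: Dict[str, Sequence[float]],
                    right: Dict[str, Sequence[float]],
                    k: int = 1) -> List[Tuple[str, str]]:
    """For each left entity, pair it with its k closest right entities."""
    pairs = []
    for lid, lvec in left.items():
        nearest = sorted(right,
                         key=lambda rid: (-cosine(lvec, right[rid]), rid))[:k]
        pairs.extend((lid, rid) for rid in nearest)
    return pairs
```

The resulting pairs would form the candidate set that a downstream matcher labels as matched or not matched.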