META-LEARNING DATA AUGMENTATION FRAMEWORK

    Publication No.: US20220351071A1

    Publication date: 2022-11-03

    Application No.: US17246354

    Filing date: 2021-04-30

    Abstract: Disclosed embodiments relate to generating training data for a machine learning model. Techniques can include accessing a machine learning model from a machine learning model repository and identifying a data set associated with the machine learning model. The identified data set is utilized to generate a set of data augmentation operators. The data augmentation operators are applied to a selected sequence of tokens associated with the machine learning model to generate sequences of tokens. A subset of the sequences of tokens is selected and stored in a training data repository. The stored sequences of tokens are provided to the machine learning model as training data.
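The operator-then-select flow in this abstract can be illustrated with a minimal sketch. The abstract says the operators are generated from a data set; the two hand-written operators below (`delete_token`, `swap_adjacent`), the function names, and the de-duplicating selection rule are all hypothetical stand-ins chosen for illustration, not the patented method.

```python
import random
from typing import Callable, List

Tokens = List[str]
Operator = Callable[[Tokens, random.Random], Tokens]


def delete_token(tokens: Tokens, rng: random.Random) -> Tokens:
    """Hypothetical operator: drop one randomly chosen token."""
    if len(tokens) < 2:
        return list(tokens)
    i = rng.randrange(len(tokens))
    return tokens[:i] + tokens[i + 1:]


def swap_adjacent(tokens: Tokens, rng: random.Random) -> Tokens:
    """Hypothetical operator: swap one randomly chosen adjacent pair."""
    if len(tokens) < 2:
        return list(tokens)
    i = rng.randrange(len(tokens) - 1)
    out = list(tokens)
    out[i], out[i + 1] = out[i + 1], out[i]
    return out


def augment(tokens: Tokens, operators: List[Operator],
            n_per_operator: int, seed: int = 0) -> List[Tokens]:
    """Apply every operator n_per_operator times to the selected sequence."""
    rng = random.Random(seed)
    return [op(tokens, rng) for op in operators for _ in range(n_per_operator)]


def select_subset(original: Tokens, candidates: List[Tokens],
                  limit: int) -> List[Tokens]:
    """Assumed selection rule: keep distinct sequences that differ from the original."""
    seen = set()
    kept: List[Tokens] = []
    for cand in candidates:
        key = tuple(cand)
        if key == tuple(original) or key in seen:
            continue
        seen.add(key)
        kept.append(cand)
        if len(kept) == limit:
            break
    return kept


source_tokens = ["the", "room", "was", "clean"]
generated = augment(source_tokens, [delete_token, swap_adjacent], n_per_operator=3)
training_sequences = select_subset(source_tokens, generated, limit=4)
```

In a real pipeline the selection step would more plausibly be driven by a learned filter or a validation signal; the de-duplication here only marks where that step sits.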

    SYSTEMS AND METHODS FOR UNSUPERVISED PARAPHRASE MINING

    Publication No.: US20220067298A1

    Publication date: 2022-03-03

    Application No.: US17008563

    Filing date: 2020-08-31

    Abstract: Disclosed embodiments relate to aligning pairs of sentences. Techniques can include receiving a plurality of sentences; generating a graph for each of at least two sentences of the plurality of sentences, wherein generating a graph for each sentence of the at least two sentences comprises: identifying one or more tokens for the sentence; and connecting via edges the one or more tokens; generating a combined graph for the at least two sentences, wherein generating a combined graph comprises: aligning the identified tokens of the at least two sentences of the plurality of sentences; identifying matching and non-matching tokens between the at least two sentences based on the alignment; and merging matching tokens into a combined graph node.
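A minimal sketch of the align-and-merge step follows. It assumes each sentence graph is a simple chain of tokens and uses the standard-library `difflib.SequenceMatcher` as the aligner; the abstract does not name an alignment method, so both choices and the function name `combined_graph` are illustrative assumptions.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

Node = Tuple[str, str]   # (token, origin) where origin is "both", "a" or "b"
Edge = Tuple[int, int]   # indices into the node list


def combined_graph(a: List[str], b: List[str]) -> Tuple[List[Node], List[Edge]]:
    """Merge two token chains; aligned matching tokens share one node."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    nodes: List[Node] = []
    path_a: List[int] = []   # node indices visited by sentence a, in order
    path_b: List[int] = []   # node indices visited by sentence b, in order
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Matching tokens: one merged node on both paths.
            for tok in a[i1:i2]:
                nodes.append((tok, "both"))
                path_a.append(len(nodes) - 1)
                path_b.append(len(nodes) - 1)
        else:
            # Non-matching tokens: separate nodes on each sentence's own path.
            for tok in a[i1:i2]:
                nodes.append((tok, "a"))
                path_a.append(len(nodes) - 1)
            for tok in b[j1:j2]:
                nodes.append((tok, "b"))
                path_b.append(len(nodes) - 1)
    edges = set()
    for path in (path_a, path_b):
        edges.update(zip(path, path[1:]))
    return nodes, sorted(edges)


nodes, edges = combined_graph(
    "the room was very clean".split(),
    "the room was spotless".split(),
)
```

In this example the shared prefix collapses into merged nodes, and the two divergent tails ("very clean" and "spotless") branch from the last merged node, which is where paraphrase candidates would show up.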

    SYSTEMS AND METHODS FOR ENTITY SET EXPANSION

    Publication No.: US20240338391A1

    Publication date: 2024-10-10

    Application No.: US18295757

    Filing date: 2023-04-04

    Applicant: Recruit Co., Ltd.

    IPC classification: G06F16/31 G06F16/38

    CPC classification: G06F16/313 G06F16/38

    Abstract: Disclosed embodiments relate to entity set expansion to associate with a text corpus. Techniques can include receiving unstructured data and a set of concepts associated with the data to determine, using a language model, a set of candidate entities in the data associated with the set of concepts, wherein the association is measured based on the relevancy of each candidate entity of the set of candidate entities to the context of the data. Techniques can then determine, using a plurality of methods, associations between each candidate entity in the set of candidate entities and each concept in the set of concepts, wherein each candidate entity is assigned a rank for each method of the plurality of methods. Techniques can use the assigned ranks to determine a combined rank of each candidate entity of the set of candidate entities, wherein the combined rank of each candidate entity is based on the assigned rank of that candidate entity for each method of the plurality of methods. Techniques can finally expand the entity set by determining a subset of entities of the set of candidate entities based on the combined rank of each candidate entity, wherein the subset of entities forms the expanded entity set associated with the data.
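The rank-combination step can be sketched as follows. The abstract does not say how per-method ranks are combined; this example assumes mean reciprocal rank as the fusion rule, and the function names and scores are hypothetical.

```python
from typing import Dict, List


def ranks_from_scores(scores: Dict[str, float]) -> Dict[str, int]:
    """Turn one method's association scores into ranks (1 = strongest)."""
    ordered = sorted(scores, key=lambda entity: (-scores[entity], entity))
    return {entity: rank for rank, entity in enumerate(ordered, start=1)}


def combined_rank(per_method_ranks: List[Dict[str, int]]) -> Dict[str, float]:
    """Assumed fusion rule: mean reciprocal rank across all methods."""
    entities = per_method_ranks[0].keys()
    n = len(per_method_ranks)
    return {e: sum(1.0 / ranks[e] for ranks in per_method_ranks) / n
            for e in entities}


def expand_entity_set(per_method_scores: List[Dict[str, float]], k: int) -> List[str]:
    """Keep the k candidates with the best combined rank."""
    fused = combined_rank([ranks_from_scores(s) for s in per_method_scores])
    return sorted(fused, key=lambda e: (-fused[e], e))[:k]


# Hypothetical scores from three ranking methods for one concept.
method_scores = [
    {"wifi": 0.9, "pool": 0.7, "taxi": 0.2},
    {"wifi": 0.6, "pool": 0.8, "taxi": 0.1},
    {"wifi": 0.5, "pool": 0.4, "taxi": 0.3},
]
expanded = expand_entity_set(method_scores, k=2)
```

Fusing ranks rather than raw scores avoids having to calibrate the score scales of the individual methods against each other.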

    SYSTEMS AND METHODS FOR GENERALIZED ENTITY MATCHING

    Publication No.: US20230342558A1

    Publication date: 2023-10-26

    Application No.: US17660813

    Filing date: 2022-04-26

    Applicant: Recruit Co., Ltd.

    Abstract: Disclosed embodiments relate to generalized entity matching. Techniques can include receiving a data pair of two entities that may be pre-processed to have parsable data structures, and serializing the data pair into a sequence of tokens based on the data structure of each entity in the data pair. Techniques can further include encoding the serialized data pair to include topic attributes that may be mapped to data in the data pair, where the topic of the mapped data matches the topic represented by the topic attribute and the data in the data pair is concatenated. Techniques can further include pooling attributes in the data pair based on contextualized attribute representations of each encoded entity in the data pair and the schema of each entity of the data pair, where the contextualized attribute representations are based on a first token of each encoded attribute in the sequence of tokens, and predicting matching labels between the data pairs based on the pooled attributes.
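The serialization step can be pictured with a minimal sketch. The marker tokens `[CLS]`, `[SEP]`, `[COL]` and `[VAL]`, the flat-dictionary input, and the function names are assumptions made for illustration; the abstract does not specify a token format. The positions returned by `attribute_positions` correspond to the "first token of each encoded attribute" that a pooling layer would read.

```python
from typing import Dict, List


def serialize_entity(entity: Dict[str, str]) -> str:
    """Flatten one entity into '[COL] name [VAL] value' segments."""
    return " ".join(f"[COL] {key} [VAL] {entity[key]}" for key in sorted(entity))


def serialize_pair(left: Dict[str, str], right: Dict[str, str]) -> List[str]:
    """Serialize an entity pair into one whitespace-tokenized sequence."""
    text = f"[CLS] {serialize_entity(left)} [SEP] {serialize_entity(right)} [SEP]"
    return text.split()


def attribute_positions(tokens: List[str]) -> List[int]:
    """Index of the first token of every encoded attribute."""
    return [i for i, tok in enumerate(tokens) if tok == "[COL]"]


# The two entities deliberately have different schemas.
tokens = serialize_pair(
    {"name": "Acme Inc", "city": "Tokyo"},
    {"title": "ACME"},
)
positions = attribute_positions(tokens)
```

A real system would feed `tokens` to a language-model tokenizer and pool the hidden states at `positions`; that part is omitted here.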

    Suspicious person detection system, suspicious person detection method

    Publication No.: US10176654B2

    Publication date: 2019-01-08

    Application No.: US15769419

    Filing date: 2016-10-19

    Applicant: Recruit Co., Ltd.

    Abstract: A suspicious person detection technology which is less likely to cause a blind spot of detection of a suspicious person is provided. A suspicious person detection system detects a suspicious person present in a predetermined area and includes a probe request detection terminal (100) configured to detect a probe request transmitted from a mobile terminal (400) to generate probe information including first identification information specific to the mobile terminal which transmits the probe information, and an analyzing apparatus (200) configured to acquire the probe information from the probe request detection terminal, and, in the case where the first identification information included in the probe information matches none of one or more pieces of second identification information set in advance, transmit suspicious person information indicating that a suspicious person is detected to a predetermined information processing apparatus (300).
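The matching rule applied by the analyzing apparatus can be sketched as below. The abstract does not say what the identification information is; this sketch assumes a MAC-address-like string, and the class, field, and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class ProbeInfo:
    terminal_id: str   # first identification information (assumed MAC-like)
    detector_id: str   # which detection terminal reported the probe request


def normalize_id(raw: str) -> str:
    """Canonical form so 'AA-BB-..' and 'aa:bb:..' compare equal."""
    return raw.strip().lower().replace("-", ":")


def analyze(probe: ProbeInfo, registered: Set[str]) -> Optional[Dict[str, str]]:
    """Return suspicious-person information if the ID matches no registered ID."""
    terminal = normalize_id(probe.terminal_id)
    if terminal in registered:
        return None
    return {
        "event": "suspicious_person_detected",
        "terminal_id": terminal,
        "detector_id": probe.detector_id,
    }


# Second identification information set in advance (hypothetical values).
registered_ids = {normalize_id("AA-BB-CC-00-11-22")}
alert = analyze(ProbeInfo("de:ad:be:ef:00:01", "entrance-1"), registered_ids)
```

Transmission of the alert to the information processing apparatus is left out; the returned dictionary stands in for that message.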

    Systems and methods for semi-supervised extraction of text classification information

    Publication No.: US12093646B2

    Publication date: 2024-09-17

    Application No.: US17151088

    Filing date: 2021-01-15

    Applicant: Recruit Co., Ltd.

    Abstract: Disclosed embodiments relate to extracting classification information from input text. Techniques can include obtaining input text, identifying a plurality of tokens in the input text, pre-training a machine learning model, determining tagging information of the plurality of tokens using a first classification layer of the machine learning model, pairing sequences of tokens using the tagging information associated with the plurality of tokens, wherein the paired sequences of tokens are determined by a second classification layer, determining one or more attribute classifiers to apply to the one or more paired sequences, wherein the attribute classifiers are determined by a third classification layer of the machine learning model, evaluating sentiments of the paired sequences, wherein the sentiments of the paired sequences are determined by a fourth classification layer of the language machine learning model, aggregating sentiments of the paired sequences associated with an attribute classifier, and storing the aggregated sentiments.
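Only the final aggregation step lends itself to a short example. The sketch below assumes the four classification layers have already produced, for each paired sequence, an attribute and a sentiment label; the record layout and the counting rule are illustrative assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def aggregate_sentiments(pairs: Iterable[Dict[str, str]]) -> Dict[str, Dict[str, int]]:
    """Count sentiment labels per attribute over all paired sequences."""
    totals = defaultdict(Counter)
    for pair in pairs:
        totals[pair["attribute"]][pair["sentiment"]] += 1
    return {attribute: dict(counts) for attribute, counts in totals.items()}


# Hypothetical output of the upstream layers for three review sentences.
paired_sequences = [
    {"aspect": "room", "opinion": "spotless", "attribute": "cleanliness", "sentiment": "positive"},
    {"aspect": "bathroom", "opinion": "dirty", "attribute": "cleanliness", "sentiment": "negative"},
    {"aspect": "staff", "opinion": "friendly", "attribute": "staff", "sentiment": "positive"},
]
summary = aggregate_sentiments(paired_sequences)
```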

    Systems and methods for unsupervised paraphrase mining

    Publication No.: US11741312B2

    Publication date: 2023-08-29

    Application No.: US17008563

    Filing date: 2020-08-31

    Applicant: Recruit Co., Ltd.

    Abstract: Disclosed embodiments relate to aligning pairs of sentences. Techniques can include receiving a plurality of sentences; generating a graph for each of at least two sentences of the plurality of sentences, wherein generating a graph for each sentence of the at least two sentences comprises: identifying one or more tokens for the sentence; and connecting via edges the one or more tokens; generating a combined graph for the at least two sentences, wherein generating a combined graph comprises: aligning the identified tokens of the at least two sentences of the plurality of sentences; identifying matching and non-matching tokens between the at least two sentences based on the alignment; and merging matching tokens into a combined graph node.

    SYSTEMS AND METHODS FOR MULTI-PURPOSE DATA MANAGEMENT

    Publication No.: US20240289629A1

    Publication date: 2024-08-29

    Application No.: US18305657

    Filing date: 2023-04-24

    Applicant: Recruit Co., Ltd.

    Abstract: Disclosed embodiments relate to data management of entity pairs. Techniques can include receiving at least two sets of data and a data management task request, with each including a set of entities. Techniques can determine a location of each entity in the received data sets in a representative space by determining a representative structure of the set of entities. Techniques can then determine, for an entity, a set of representative entity pairs from each set of the at least two sets of data based on how close they are in the representative space. Techniques can then analyze the set of representative entity pairs to identify the most similar entity pairs to include in a set of candidate pairs by determining the closeness of the locations of the entities in each entity pair in the representative space. Techniques can then determine matched entity pairs of the candidate pairs using a first machine learning model that is trained using the candidate pairs by applying labels, and utilize the matched pairs to perform the requested data management task.
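The candidate-pair step can be sketched as a nearest-neighbour search in the representative space. This example assumes entities are already embedded as plain vectors and uses cosine similarity with a top-k cut and a threshold; the metric, the parameters `k` and `min_sim`, and the function names are hypothetical, and the trained matching model is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def candidate_pairs(left: Dict[str, Sequence[float]],
                    right: Dict[str, Sequence[float]],
                    k: int, min_sim: float) -> List[Tuple[str, str, float]]:
    """For each left entity keep its k closest right entities above min_sim."""
    pairs: List[Tuple[str, str, float]] = []
    for left_id, left_vec in left.items():
        scored = sorted(
            ((cosine(left_vec, right_vec), right_id)
             for right_id, right_vec in right.items()),
            reverse=True,
        )
        for sim, right_id in scored[:k]:
            if sim >= min_sim:
                pairs.append((left_id, right_id, sim))
    return sorted(pairs, key=lambda pair: -pair[2])


# Hypothetical two-dimensional embeddings for two small data sets.
set_a = {"a1": [1.0, 0.0], "a2": [0.0, 1.0]}
set_b = {"b1": [0.9, 0.1], "b2": [0.1, 0.9], "b3": [-1.0, 0.0]}
candidates = candidate_pairs(set_a, set_b, k=1, min_sim=0.8)
```

Restricting the matcher to these candidates keeps it from scoring every cross-set pair, which is the usual motivation for a blocking step of this kind.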