Information processing device, learning method, and storage medium

    Publication No.: US11176327B2

    Publication date: 2021-11-16

    Application No.: US16373564

    Filing date: 2019-04-02

    Applicant: FUJITSU LIMITED

    Inventor: Yuji Mizobuchi

    Abstract: A non-transitory computer-readable storage medium storing a program that causes a computer to execute a process, the process including: learning distributed representations of words included in a word space of a first language using a learner for learning the distributed representations; classifying words included in a word space of a second language different from the first language into words common to words included in the word space of the first language and words not common to words included in the word space of the first language; and replacing distributed representations of the common words included in the word space of the second language with distributed representations of the words, corresponding to the common words, in the first language and adjusting a parameter of the learner.

    Context Driven Dynamic Actions Embedded in Messages

    Publication No.: US20210328952A1

    Publication date: 2021-10-21

    Application No.: US17364970

    Filing date: 2021-07-01

    Applicant: VMware, Inc.

    Abstract: Disclosed are various approaches for dynamically creating content to present to a user based on an identified intent, or other context, associated with a message (e.g., email). A message that is received from a message server can be analyzed to identify the message content within the message prior to distribution to the recipient client device. Trained intent identification models can be applied to the identified message content to determine an intent, or other type of context, associated with the message. Upon identifying the intent, the message header can be modified to include the intent prior to forwarding the message to the recipient client device. The client device can then display a user interface including the message and a user interface element corresponding to a third-party service. The user interface element can be dynamically generated to include an action component that, upon selection, triggers an action associated with the intent.

    SELF ADAPTIVE SCANNING
    Invention application

    Publication No.: US20210312140A1

    Publication date: 2021-10-07

    Application No.: US16840491

    Filing date: 2020-04-06

    Abstract: A method for classifying a digital asset, comprising: retrieving from a repository, according to an initial identifier, a digital asset; computing at least one asset score, each indicative of a degree of confidence that the digital asset has a respective offense classification selected from a plurality of offense classifications; subject to the at least one asset score being less than at least one threshold score: adding to a list of identifiers at least one identifier extracted from the digital asset; and in each of at least one iteration: retrieving from the repository at least one other digital asset, according to at least one selected identifier selected from the list of identifiers; computing a plurality of other asset scores, each associated with one other digital asset of the at least one other digital asset and indicative of another degree of confidence that the other digital asset has another respective offense classification.
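
The iterative scan in this abstract can be sketched roughly as follows. This is an illustration only, not the patented method: the repository is assumed to be a plain dictionary, and `scan`, `score`, `extract_ids`, and `max_iterations` are hypothetical names introduced here. The abstract does not say how scores are computed, how identifiers are selected, or when iteration stops.

```python
def scan(repository, initial_id, score, extract_ids, threshold, max_iterations=3):
    """Score an asset and, if the score is inconclusive, follow its identifiers.

    Returns a dict mapping each visited identifier to its score.
    `score` and `extract_ids` are caller-supplied (hypothetical) functions.
    """
    asset = repository[initial_id]
    scores = {initial_id: score(asset)}
    if scores[initial_id] >= threshold:
        return scores  # confident enough, so no further scanning

    # Score below threshold: queue the identifiers found in the asset.
    pending = [i for i in extract_ids(asset) if i in repository]
    for _ in range(max_iterations):
        # Select identifiers not yet scored, dropping duplicates in order.
        selected = list(dict.fromkeys(i for i in pending if i not in scores))
        if not selected:
            break
        pending = []
        for ident in selected:
            other = repository[ident]
            scores[ident] = score(other)
            pending.extend(i for i in extract_ids(other) if i in repository)
    return scores


# Toy repository: each asset has text and links to other assets.
repository = {
    "a": {"text": "hello", "links": ["b", "c"]},
    "b": {"text": "spam spam", "links": ["a"]},
    "c": {"text": "ok", "links": []},
}
toy_score = lambda asset: asset["text"].count("spam") / 2
toy_links = lambda asset: asset["links"]

result = scan(repository, "a", toy_score, toy_links, threshold=0.5)
```

In this toy run, asset "a" scores below the threshold, so its linked assets "b" and "c" are retrieved and scored in the first iteration; the back-link from "b" to "a" is ignored because "a" already has a score.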

    Probabilistic word embeddings for text classification

    Publication No.: US11120223B2

    Publication date: 2021-09-14

    Application No.: US16444794

    Filing date: 2019-06-18

    Applicant: SAP SE

    Abstract: Disclosed are systems, methods, and non-transitory computer-readable media for probabilistic word embeddings for text classification. A text classification system receives a message including a keyword and determines an embedding probability distribution representing the keyword. The text classification system then determines an embedding value for the keyword based on the embedding probability distribution. The text classification system uses the embedding value as input into a set of mathematical functions, yielding a first set of coefficient values for the keyword. Each respective mathematical function from the set corresponds to a respective classification label from a set of classification labels and defines a continuous surface. Each respective mathematical function is determined from embedding values for a set of known keywords, distribution variance values for the set of known keywords, and a subset of coefficient values for the set of known keywords that corresponds to the respective classification label.

    TOKENIZED CACHE
    Invention application

    Publication No.: US20210279290A1

    Publication date: 2021-09-09

    Application No.: US17318324

    Filing date: 2021-05-12

    Abstract: Methods of and systems for searching a catalog include parsing the items of the catalog into tokens, determining the frequency with which each token appears in the catalog, and storing the frequencies in a cache. Queries to the catalog are likewise parsed into tokens, and the tokens of the query string are compared to frequency values in the cache to identify a smaller search space within the catalog.
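
A minimal sketch of the idea, under assumptions the abstract does not state: tokens are lowercase whitespace-delimited words, "frequency" is the number of catalog items containing a token, and the search space is narrowed to the items containing the rarest query token. The function names `tokenize`, `build_token_cache`, and `narrow_search` are hypothetical.

```python
from collections import Counter


def tokenize(text):
    """Split text into lowercase whitespace-delimited tokens (assumed scheme)."""
    return text.lower().split()


def build_token_cache(catalog):
    """Count, for each token, how many catalog items contain it."""
    cache = Counter()
    for item in catalog:
        cache.update(set(tokenize(item)))
    return cache


def narrow_search(query, catalog, cache):
    """Return only the items that contain the rarest known query token."""
    known = [t for t in tokenize(query) if cache.get(t, 0) > 0]
    if not known:
        return []  # no query token occurs in the catalog
    rarest = min(known, key=lambda t: cache[t])
    return [item for item in catalog if rarest in tokenize(item)]


catalog = ["red cotton shirt", "blue cotton shirt", "red wool scarf"]
cache = build_token_cache(catalog)
candidates = narrow_search("wool shirt", catalog, cache)
```

Here "wool" appears in one item and "shirt" in two, so the query is narrowed to the single item containing "wool" before any more expensive matching would run.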

    LEARNING DEVICE, EXTRACTION DEVICE, AND LEARNING METHOD

    Publication No.: US20210264108A1

    Publication date: 2021-08-26

    Application No.: US17275919

    Filing date: 2019-09-02

    Inventor: Takeshi Yamada

    IPC classification: G06F40/216 G06N20/00

    Abstract: An extraction apparatus (10) includes: a pre-processing unit (141) configured to perform, on training data that is written in a natural language and is obtained by tagging important description portions, pre-processing for calculating pointwise mutual information indicating a degree of relevance to a tag for each word, and for deleting description portions with low relevance to the tag from the training data based on the pointwise mutual information of each word; and a learning unit (142) configured to learn the pre-processed training data and generate a list of conditional probabilities relating to the tagged description portions.
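
The pre-processing step can be illustrated with the standard definition of pointwise mutual information, PMI(w, tag) = log(P(w, tag) / (P(w) P(tag))). The sketch below is an assumption-laden illustration rather than the apparatus itself: probabilities are estimated from sentence-level counts, a "description portion" is taken to be a sentence, and the pruning rule, the threshold, and the names `word_pmi` and `prune` are hypothetical.

```python
import math
from collections import Counter


def word_pmi(sentences):
    """Compute PMI between each word and the 'tagged' label.

    `sentences` is a list of (list_of_words, is_tagged) pairs. Words that
    never occur in a tagged sentence are omitted from the result.
    """
    n = len(sentences)
    n_tagged = sum(1 for _, tagged in sentences if tagged)
    word_count = Counter()   # sentences containing the word
    joint_count = Counter()  # tagged sentences containing the word
    for words, tagged in sentences:
        for w in set(words):
            word_count[w] += 1
            if tagged:
                joint_count[w] += 1
    pmi = {}
    for w, joint in joint_count.items():
        pmi[w] = math.log((joint / n) / ((word_count[w] / n) * (n_tagged / n)))
    return pmi


def prune(sentences, pmi, threshold):
    """Keep sentences with at least one word whose PMI exceeds the threshold."""
    return [
        (words, tagged)
        for words, tagged in sentences
        if any(pmi.get(w, float("-inf")) > threshold for w in words)
    ]


sentences = [
    (["server", "error", "timeout"], True),
    (["server", "restart"], False),
    (["error", "log"], True),
    (["server", "status", "ok"], False),
]
pmi = word_pmi(sentences)
kept = prune(sentences, pmi, threshold=0.0)
```

In this toy data, "error" occurs only in tagged sentences and gets a positive PMI, while "server" occurs mostly in untagged ones and gets a negative PMI, so the two untagged sentences are dropped.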

    AUTOMATED IDENTIFICATION OF CONCEPT LABELS FOR A TEXT FRAGMENT

    Publication No.: US20210248322A1

    Publication date: 2021-08-12

    Application No.: US16784000

    Filing date: 2020-02-06

    Applicant: Adobe Inc.

    Abstract: A technique for intelligently identifying concept labels for a text fragment where the identified concept labels are representative of and semantically relevant to the information contained by the text fragment is provided. The technique includes determining, using a knowledge base storing information for a reference set of concept labels, a first subset of concept labels that are relevant to the information contained by the text fragment. The technique includes ordering the first subset of concept labels according to their relevance scores and performing dependency analysis on the ordered list of concept labels. Based on the dependency analysis, the technique includes identifying concept labels for a text fragment that are more independent of each other (e.g., more distinct and non-overlapping), representative of, and semantically relevant to the information represented by the text fragment.

    Model learning device, method and recording medium for learning neural network model

    Publication No.: US11081105B2

    Publication date: 2021-08-03

    Application No.: US16333156

    Filing date: 2017-09-05

    Abstract: A model learning device comprises: an initial value setting part that uses a parameter of a learned first model including a neural network to set a parameter of a second model including a neural network having a same network structure as the first model; a first output probability distribution calculating part that calculates a first output probability distribution including a distribution of an output probability of each unit on an output layer, using learning features and the first model; a second output probability distribution calculating part that calculates a second output probability distribution including a distribution of an output probability of each unit on the output layer, using learning features and the second model; and a modified model update part that obtains a weighted sum of a second loss function calculated from correct information and from the second output probability distribution, and a cross entropy between the first output probability distribution and the second output probability distribution, and updates the parameter of the second model so as to reduce the weighted sum.
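
The weighted-sum objective described here can be sketched for a single training example as below. Several details are assumptions not given in the abstract: the output distributions come from a softmax over logits, the "second loss function" is taken to be cross entropy against a one-hot correct label, and the weights are (1 - alpha) and alpha. The names `softmax`, `cross_entropy`, `combined_loss`, and `alpha` are hypothetical.

```python
import math


def softmax(logits):
    """Convert logits to a probability distribution over output units."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, predicted):
    """H(target, predicted) = -sum_j target_j * log(predicted_j)."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)


def combined_loss(first_logits, second_logits, label, alpha):
    """Weighted sum of the label loss and the first-vs-second cross entropy."""
    p_first = softmax(first_logits)    # first (already learned) model
    p_second = softmax(second_logits)  # second model being updated
    one_hot = [1.0 if j == label else 0.0 for j in range(len(second_logits))]
    label_loss = cross_entropy(one_hot, p_second)
    model_loss = cross_entropy(p_first, p_second)
    return (1 - alpha) * label_loss + alpha * model_loss


loss = combined_loss([2.0, 0.5, -1.0], [1.0, 1.0, 0.0], label=0, alpha=0.5)
```

A training loop would then update the second model's parameters in the direction that reduces this value; with `alpha = 0` the objective reduces to the ordinary label loss alone.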

    DATA ANALYTICS SYSTEM AND METHODS FOR TEXT DATA

    Publication No.: US20210232764A1

    Publication date: 2021-07-29

    Application No.: US17232546

    Filing date: 2021-04-16

    Abstract: Aspects of the subject disclosure may include, for example, a process that performs a statistical, natural-language processing analysis on a group of text documents to determine a group of topics. The topics are determined according to parameters obtained by training on a sample of documents. One or more topics in a subset of topics are associated with each document, resulting in topic-document pairs. A bias is identified for each topic-document pair, and clusters of topics are created from the subset of topics. Each cluster of topics is determined from a value for each bias of each topic-document pair and from a frequency of occurrence of each topic. Each cluster is presentable according to a corresponding image configuration based on all or a subset of the bias dimensions and the frequency of occurrence of topics in a cluster that distinguishes the cluster from other clusters. Other embodiments are disclosed.