Document classification device and trained model

    Publication No.: US12118308B2

    Publication date: 2024-10-15

    Application No.: US17041299

    Filing date: 2019-05-29

    Applicant: NTT DOCOMO, INC.

    Inventor: Taishi Ikeda

    Abstract: A document classification device generates, by machine learning, a document classification model that takes a document as input and outputs identification information identifying a classification result. The device includes: an acquisition unit configured to acquire learning data including a document and the identification information correlated with the document; a feature extracting unit configured to extract, as features, the words included in the document and character information, which is one or more character strings extractable from the words, each consisting of a single character of a word or a plurality of consecutive characters within a word; and a model generating unit configured to perform machine learning on the basis of the features extracted from the document and the identification information correlated with the document, and to generate the document classification model.
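The feature scheme in this abstract (whole words plus character strings taken from inside each word) can be illustrated with a minimal sketch. The abstract does not name a learning algorithm, so the multinomial naive Bayes classifier below is an assumption chosen for brevity, and `extract_features`, `train`, `classify`, and `max_n` are hypothetical names.

```python
import math
from collections import Counter, defaultdict


def extract_features(document, max_n=3):
    """Return each word plus the character n-grams (length 1..max_n) inside it."""
    features = []
    for word in document.lower().split():
        features.append("w:" + word)  # word feature
        for n in range(1, max_n + 1):
            for i in range(len(word) - n + 1):
                features.append("c:" + word[i:i + n])  # character feature
    return features


def train(learning_data, max_n=3):
    """Fit a multinomial naive Bayes model (an assumed learner) on (document, label) pairs."""
    feature_counts = defaultdict(Counter)
    label_counts = Counter()
    vocabulary = set()
    for document, label in learning_data:
        label_counts[label] += 1
        for feature in extract_features(document, max_n):
            feature_counts[label][feature] += 1
            vocabulary.add(feature)
    return {
        "feature_counts": feature_counts,
        "label_counts": label_counts,
        "vocabulary": vocabulary,
        "max_n": max_n,
    }


def classify(model, document):
    """Return the label with the highest log-probability under add-one smoothing."""
    total_docs = sum(model["label_counts"].values())
    vocab_size = len(model["vocabulary"])
    best_label, best_score = None, -math.inf
    for label, doc_count in model["label_counts"].items():
        counts = model["feature_counts"][label]
        total = sum(counts.values())
        score = math.log(doc_count / total_docs)
        for feature in extract_features(document, model["max_n"]):
            score += math.log((counts[feature] + 1) / (total + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Because character strings are shared between related word forms, a model trained only on "password" can still classify an unseen form such as "passwords", which is the practical benefit of adding character information to word features.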

    Text simplification with minimal hallucination

    Publication No.: US12118295B2

    Publication date: 2024-10-15

    Application No.: US18045551

    Filing date: 2022-10-11

    Applicant: ADOBE INC.

    Abstract: Systems and methods for text simplification are described. Embodiments of the present disclosure identify a simplified text that includes original information from a complex text and additional information that is not in the complex text. Embodiments then compute an entailment score for each sentence of the simplified text using a neural network, wherein the entailment score indicates whether the sentence of the simplified text includes information from the corresponding sentence of the complex text. Embodiments then generate a modified text based on the entailment score, the simplified text, and the complex text, wherein the modified text includes the original information and excludes the additional information. Embodiments may then present the modified text to a user via a user interface.
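The per-sentence filtering step can be sketched as follows. The abstract computes the entailment score with a neural network; the token-overlap scorer here is only a stand-in so the example runs without a model, and any callable with the same signature could replace it. The fallback rule (reuse the complex sentence when the score is low), the `threshold` value, and all function names are assumptions for illustration.

```python
def overlap_entailment_score(premise, hypothesis):
    """Stand-in scorer: fraction of hypothesis tokens that also occur in the premise.

    A real system would call a neural entailment model here instead.
    """
    premise_tokens = set(premise.lower().split())
    hypothesis_tokens = hypothesis.lower().split()
    if not hypothesis_tokens:
        return 0.0
    supported = sum(1 for token in hypothesis_tokens if token in premise_tokens)
    return supported / len(hypothesis_tokens)


def filter_simplification(complex_sentences, simplified_sentences,
                          score_fn=overlap_entailment_score, threshold=0.8):
    """Keep a simplified sentence only if its aligned complex sentence supports it.

    Assumes the two lists are aligned one-to-one. Unsupported sentences fall
    back to the complex sentence, so added (hallucinated) content is dropped.
    """
    modified = []
    for complex_sentence, simple_sentence in zip(complex_sentences, simplified_sentences):
        score = score_fn(complex_sentence, simple_sentence)
        modified.append(simple_sentence if score >= threshold else complex_sentence)
    return modified
```

For example, a simplified sentence that adds "in paris" to a source sentence that never mentions a location scores below the threshold and is replaced by the source sentence.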

    DISTRIBUTED SPOKEN LANGUAGE INTERFACE FOR CONTROL OF APPARATUSES

    Publication No.: US20240330590A1

    Publication date: 2024-10-03

    Application No.: US18404666

    Filing date: 2024-01-04

    IPC classification: G06F40/289

    CPC classification: G06F40/289

    Abstract: Technologies are provided for a distributed spoken language interface for speech control of multiple apparatuses. In some aspects, a first apparatus can receive an audio signal representative of speech. The first apparatus can detect a keyphrase by applying a keyphrase recognition model to the speech. The keyphrase can include a first string of characters defining an identifier corresponding to at least one second apparatus and a second string of characters defining a command. The first apparatus can cause, based on the identifier, a communication unit integrated in the first apparatus to send the keyphrase to the at least one second apparatus. The at least one second apparatus can receive the keyphrase and can cause one or more components to execute the command.
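The routing described in this abstract can be sketched in a few lines. The example assumes the keyphrase recognition model has already turned the audio into text, and that the identifier is the first whitespace-separated token with the command being the remainder; both are simplifications, and `Apparatus`, `FirstApparatus`, and `split_keyphrase` are hypothetical names. A direct method call stands in for the communication unit.

```python
from dataclasses import dataclass, field


def split_keyphrase(keyphrase):
    """Split a recognized keyphrase into (identifier, command).

    Assumption: the identifier is the first token and the command is the rest.
    """
    identifier, _, command = keyphrase.strip().partition(" ")
    return identifier.lower(), command.strip().lower()


@dataclass
class Apparatus:
    """A second apparatus that executes commands addressed to it."""
    identifier: str
    executed: list = field(default_factory=list)

    def receive(self, keyphrase):
        _, command = split_keyphrase(keyphrase)
        self.executed.append(command)  # stand-in for driving real components


class FirstApparatus:
    """The apparatus that hears the speech and forwards the keyphrase."""

    def __init__(self, peers):
        self.peers = {peer.identifier: peer for peer in peers}

    def handle_keyphrase(self, keyphrase):
        """Forward the keyphrase to the addressed peer; return True if it was sent."""
        identifier, command = split_keyphrase(keyphrase)
        target = self.peers.get(identifier)
        if target is None or not command:
            return False
        target.receive(keyphrase)  # stand-in for the communication unit
        return True
```

Note that the whole keyphrase, not just the command, is forwarded, which matches the abstract: the receiving apparatus extracts the command itself.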

    SYSTEMS AND METHODS FOR IDENTIFYING AND ANALYZING RISK EVENTS FROM DATA SOURCES

    Publication No.: US20240311567A1

    Publication date: 2024-09-19

    Application No.: US18421095

    Filing date: 2024-01-24

    Abstract: Conventional methods of analyzing social media content involve performing sentiment analysis to understand the sentiment around events and their effects on communities. However, such analysis may not be completely accurate and is prone to errors. The present disclosure provides a system and method that identify and analyze risk events from data collected from various sources. Key phrases obtained from the sources are received, pre-processed, and clustered based on the frequency of incoming words. The clustered dataset is classified into one or more categories based on a polarity score. The dataset of a specific category (e.g., the negative category) is analyzed to identify events and topics, which are then grouped using an associated label to obtain grouped entities. Each entity is then ranked and assigned a risk score to identify high-risk events, which are analyzed using simulation and optimization technique(s), and an explainability text is generated for the analyzed risk events.
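A heavily simplified sketch of the classify, group, and rank stages is given below. The abstract does not say how the polarity score or the risk score is computed, so the tiny word lexicons, the grouping label (the first negative term in a phrase), and the risk score (each group's share of all negative phrases) are all assumptions for illustration. Frequency-based clustering, simulation, optimization, and explainability text are omitted.

```python
from collections import Counter

# Hypothetical lexicons; a real system would use a trained polarity model.
NEGATIVE_TERMS = {"fire", "flood", "outage", "strike"}
POSITIVE_TERMS = {"reopened", "restored", "resolved"}


def polarity(phrase):
    """Positive-term count minus negative-term count for one key phrase."""
    tokens = phrase.lower().split()
    positives = sum(token in POSITIVE_TERMS for token in tokens)
    negatives = sum(token in NEGATIVE_TERMS for token in tokens)
    return positives - negatives


def rank_risk_events(key_phrases):
    """Group negative phrases by label and rank the groups by an assumed risk score."""
    groups = Counter()
    for phrase in key_phrases:
        if polarity(phrase) >= 0:
            continue  # only the negative category is analyzed further
        # Negative polarity guarantees at least one negative term is present.
        label = next(t for t in phrase.lower().split() if t in NEGATIVE_TERMS)
        groups[label] += 1
    total = sum(groups.values())
    ranked = sorted(groups.items(), key=lambda item: (-item[1], item[0]))
    return [(label, count / total) for label, count in ranked]
```

The sort key breaks ties alphabetically so the ranking is deterministic when two groups have the same count.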

    Website analyzing method

    Publication No.: US12093650B2

    Publication date: 2024-09-17

    Application No.: US17580863

    Filing date: 2022-01-21

    Applicant: Jim Liu

    Inventor: Jim Liu

    Abstract: The present invention includes the steps of: loading a website project; counting the keywords of the website project to obtain a word sum of the website project; counting, for each of multiple anchor types, a first anchor word count to obtain a first anchor word sum for that anchor type; dividing the first anchor word sum of each anchor type by the word sum to obtain a first internal anchor type percentage for each of the anchor types; loading multiple first default anchor type percentages; and, for each of the anchor types, when the first internal anchor type percentage is greater than or equal to the first default anchor type percentage, marking the first internal anchor type percentage and displaying the results, so as to prevent over-modifying the keywords corresponding to the marked anchor type and thus keep the website project from being blacklisted.
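The arithmetic in this abstract is a per-type ratio followed by a threshold comparison. The sketch below assumes the word sum and per-type anchor word sums have already been counted; the function name, the anchor type names, and the example default percentages are hypothetical.

```python
def mark_overused_anchor_types(word_sum, anchor_word_sums, default_percentages):
    """Return {anchor_type: (percentage, marked)} for each anchor type.

    percentage = 100 * anchor word sum / word sum. A type is marked when its
    percentage is greater than or equal to the default percentage for that type.
    Types without a default percentage are never marked (an assumption).
    """
    if word_sum <= 0:
        raise ValueError("word_sum must be positive")
    report = {}
    for anchor_type, anchor_sum in anchor_word_sums.items():
        percentage = 100 * anchor_sum / word_sum
        limit = default_percentages.get(anchor_type)
        marked = limit is not None and percentage >= limit
        report[anchor_type] = (percentage, marked)
    return report


# Example with made-up numbers: 200 keywords in the project.
report = mark_overused_anchor_types(
    word_sum=200,
    anchor_word_sums={"exact_match": 30, "branded": 20},
    default_percentages={"exact_match": 10.0, "branded": 40.0},
)
```

Here `exact_match` works out to 15% against a 10% default and is marked, while `branded` at 10% against a 40% default is not.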

    Systems and methods for producing a semantic representation of a document

    Publication No.: US12093648B2

    Publication date: 2024-09-17

    Application No.: US17178754

    Filing date: 2021-02-18

    Applicant: Nice Ltd.

    Inventor: Stephen Lauber

    Abstract: A system and method for determining an embedding for a document (e.g., representing the document in vector space) by: determining a preliminary document embedding for the document; determining a document topic embedding based on a set of nearest topics to the preliminary document embedding; determining, for each phrase in the document, a topic relevancy score based on the document topic embedding and the embedding associated with the phrase; using a ranking algorithm to determine a saliency score for each phrase in the document, each saliency score based on the topic relevancy score for the phrase and an inverse frequency score for the phrase; and calculating an embedding for the document using the saliency scores and the embeddings of the phrases in the document.
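The sequence of steps in this abstract can be sketched with plain Python lists as vectors. Several choices below are assumptions, since the abstract leaves them open: the preliminary embedding is the mean of the phrase embeddings, the topic embedding is the mean of the `k` nearest topics by cosine similarity, topic relevancy is cosine similarity clipped at zero, and saliency is simply relevancy multiplied by the inverse frequency score (a stand-in for the unspecified ranking algorithm). All function and parameter names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def document_embedding(phrase_embeddings, inverse_frequencies, topic_embeddings, k=2):
    """Saliency-weighted average of phrase embeddings, following the steps above."""
    # 1. Preliminary document embedding (assumed: mean of phrase embeddings).
    preliminary = mean(phrase_embeddings)
    # 2. Document topic embedding from the k topics nearest to the preliminary embedding.
    nearest = sorted(topic_embeddings,
                     key=lambda topic: cosine(preliminary, topic),
                     reverse=True)[:k]
    topic_embedding = mean(nearest)
    # 3-4. Topic relevancy per phrase, combined with inverse frequency into saliency.
    saliency = [
        max(cosine(topic_embedding, phrase), 0.0) * inverse_frequency
        for phrase, inverse_frequency in zip(phrase_embeddings, inverse_frequencies)
    ]
    total = sum(saliency)
    if total == 0:
        return preliminary  # no phrase is on-topic; fall back to the plain mean
    # 5. Final embedding: phrase embeddings weighted by normalized saliency.
    dimensions = len(preliminary)
    return [
        sum(weight * phrase[d] for weight, phrase in zip(saliency, phrase_embeddings)) / total
        for d in range(dimensions)
    ]
```

The effect is that phrases aligned with the document's dominant topics pull the final embedding toward themselves, while off-topic or very common phrases contribute little.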