Capturing and extracting fragmented data and data processing using machine learning

    Publication No.: US11113742B2

    Publication date: 2021-09-07

    Application No.: US16575644

    Filing date: 2019-09-19

    Abstract: One or more aspects of the disclosure generally relate to computing devices, computing systems, and computer software that may be used for capturing and extracting fragmented data and for data processing using machine learning. Some aspects disclosed herein are directed to, for example, a system and method comprising generating a display for receiving fragmented data associated with a user. The method may comprise sending, to a user device associated with the user, the display for receiving fragmented data. A computing device may receive, from the user device and via the display for receiving fragmented data, first fragmented data associated with the user. The computing device may extract a plurality of data entries from the first fragmented data. A request for data associated with a first data entry of the plurality of data entries may be sent to the user device. The computing device may determine a data category for each data entry of the plurality of data entries. Based on the determined data category for each data entry of the plurality of data entries, the method may comprise determining one or more of a number of entries in each data category or an amount associated with each data category.
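
The extract-categorize-aggregate steps in this abstract can be illustrated with a minimal sketch. It assumes a hypothetical input format of one "description amount" pair per line and a hypothetical keyword map standing in for the machine-learning categorizer; the patent does not specify either, and `extract_entries`, `categorize`, and `summarize` are illustrative names only.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical format: "description amount" on one line; other lines are ignored.
ENTRY_PATTERN = re.compile(r"^[ \t]*(.+?)[ \t]+(\d+(?:\.\d{1,2})?)[ \t]*$", re.MULTILINE)


def extract_entries(fragmented_text: str) -> List[Tuple[str, float]]:
    """Pull (description, amount) entries out of loosely formatted text."""
    return [(desc, float(amount)) for desc, amount in ENTRY_PATTERN.findall(fragmented_text)]


def categorize(description: str, keywords: Dict[str, Set[str]]) -> str:
    """Return the first category sharing a word with the description (stand-in for a model)."""
    words = set(description.lower().split())
    for category, vocab in keywords.items():
        if words & vocab:
            return category
    return "other"


def summarize(entries: List[Tuple[str, float]],
              keywords: Dict[str, Set[str]]) -> Tuple[Dict[str, int], Dict[str, float]]:
    """Number of entries and total amount per category."""
    counts: Dict[str, int] = defaultdict(int)
    totals: Dict[str, float] = defaultdict(float)
    for desc, amount in entries:
        cat = categorize(desc, keywords)
        counts[cat] += 1
        totals[cat] += amount
    return dict(counts), dict(totals)


if __name__ == "__main__":
    text = "Coffee 3.50\nUber ride 12.00\ngarbage\nLatte 4.25"
    keywords = {"food": {"coffee", "latte"}, "transport": {"uber", "taxi"}}
    print(summarize(extract_entries(text), keywords))
```

In a real system the keyword lookup would be replaced by a trained classifier; the aggregation step stays the same.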

    Neural machine translation systems
    Type: granted invention patent

    Publication No.: US11113480B2

    Publication date: 2021-09-07

    Application No.: US16336870

    Filing date: 2017-09-25

    Applicant: GOOGLE LLC

    IPC classification: G06F40/58, G06F40/44, G06N3/04

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for neural machine translation. One of the systems includes an encoder neural network comprising: an input forward long short-term memory (LSTM) layer configured to process each input token in the input sequence in a forward order to generate a respective forward representation of each input token, an input backward LSTM layer configured to process each input token in a backward order to generate a respective backward representation of each input token, and a plurality of hidden LSTM layers configured to process a respective combined representation of each of the input tokens in the forward order to generate a respective encoded representation of each of the input tokens; and a decoder subsystem configured to receive the respective encoded representations and to process the encoded representations to generate an output sequence.
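
The encoder data flow described above (forward LSTM, backward LSTM, combined representation, stacked hidden LSTM layers) can be traced with a toy sketch. Assumptions: each LSTM has hidden size 1, tokens are already 1-dimensional "embeddings", the weights are arbitrary, and the forward and backward outputs are combined by concatenation (the abstract only says "combined"). The decoder subsystem is omitted. `TinyLSTM` and `encode` are hypothetical names, not the patent's implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


@dataclass
class TinyLSTM:
    """One LSTM layer with hidden size 1. Gate order: input, forget, output, candidate."""
    w: Sequence[Sequence[float]]  # per gate: one input weight per input feature
    u: Sequence[float]            # per gate: recurrent weight
    b: Sequence[float]            # per gate: bias

    def run(self, xs: Sequence[Sequence[float]]) -> List[float]:
        """Process xs in the given order; return the hidden state after each step."""
        h, c = 0.0, 0.0
        hs: List[float] = []
        for x in xs:
            pre = [sum(wk * xk for wk, xk in zip(self.w[g], x)) + self.u[g] * h + self.b[g]
                   for g in range(4)]
            i, f, o = sigmoid(pre[0]), sigmoid(pre[1]), sigmoid(pre[2])
            c = f * c + i * math.tanh(pre[3])  # update the cell state
            h = o * math.tanh(c)               # gated hidden state
            hs.append(h)
        return hs


def encode(tokens: Sequence[float], fwd: TinyLSTM, bwd: TinyLSTM,
           hidden: Sequence[TinyLSTM]) -> List[float]:
    xs = [[t] for t in tokens]
    forward = fwd.run(xs)                 # forward order
    backward = bwd.run(xs[::-1])[::-1]    # backward order, realigned to token positions
    layer_in = [[fw, bw] for fw, bw in zip(forward, backward)]  # assumed: concatenation
    for layer in hidden:                  # stacked hidden layers, forward order
        layer_in = [[h] for h in layer.run(layer_in)]
    return [v[0] for v in layer_in]       # one encoded value per input token


if __name__ == "__main__":
    fwd = TinyLSTM(w=[[0.5]] * 4, u=[0.1] * 4, b=[0.0] * 4)
    bwd = TinyLSTM(w=[[-0.5]] * 4, u=[0.2] * 4, b=[0.0] * 4)
    hidden = [TinyLSTM(w=[[0.3, 0.3]] * 4, u=[0.1] * 4, b=[0.0] * 4),
              TinyLSTM(w=[[1.0]] * 4, u=[0.0] * 4, b=[0.0] * 4)]
    print(encode([1.0, -2.0, 0.5], fwd, bwd, hidden))
```

Reversing the backward outputs keeps position k aligned with token k, so each combined vector sees context from both directions.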

    Word embeddings and virtual terms
    Type: granted invention patent

    Publication No.: US11048884B2

    Publication date: 2021-06-29

    Application No.: US17060198

    Filing date: 2020-10-01

    Abstract: A computing system receives a collection comprising multiple sets of ordered terms, including a first set. The system generates a dataset indicating an association between each pair of terms within a same set of the collection by generating co-occurrence score(s) for the first set. The system generates computed probabilities based on the co-occurrence score(s) for the first set. The computed probabilities indicate a likelihood that one term in a given pair of terms of the collection appears in a given set of the collection given that another term in the given pair of terms of the collection occurs. The system smoothes the computed probabilities by adding one or more random observations. The system generates one or more association indications for the first set based on the smoothed computed probabilities. The system outputs an indication of the dataset. Additionally, or alternatively, based on association measure(s), the system generates a virtual term.
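
The co-occurrence and conditional-probability steps can be sketched as follows. The abstract smooths by "adding one or more random observations"; this sketch swaps in deterministic additive (pseudo-count) smoothing, where `pseudo_obs` extra observations are assumed to contain the pair with probability `prior`. Both parameters and the function names are hypothetical, and the virtual-term step is not shown.

```python
from collections import Counter
from itertools import combinations
from typing import Sequence, Tuple


def cooccurrence(collection: Sequence[Sequence[str]]) -> Tuple[Counter, Counter]:
    """Count how many sets contain each term and each unordered pair of terms."""
    term_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for terms in collection:
        unique = sorted(set(terms))          # sorted so pairs have a canonical order
        term_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return term_counts, pair_counts


def smoothed_conditional(a: str, b: str, term_counts: Counter, pair_counts: Counter,
                         pseudo_obs: float = 1.0, prior: float = 0.5) -> float:
    """Estimate P(a in set | b in set) with additive smoothing (assumed scheme)."""
    pair = tuple(sorted((a, b)))
    return (pair_counts[pair] + pseudo_obs * prior) / (term_counts[b] + pseudo_obs)


if __name__ == "__main__":
    sets = [["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    terms, pairs = cooccurrence(sets)
    print(smoothed_conditional("a", "b", terms, pairs))
    print(smoothed_conditional("a", "b", terms, pairs, pseudo_obs=0.0))
```

With `pseudo_obs=0` the estimate reduces to the raw ratio of co-occurrences to occurrences of `b`; larger values pull rare pairs toward `prior`.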

    Dynamic extraction of contextually-coherent text blocks

    Publication No.: US11031003B2

    Publication date: 2021-06-08

    Application No.: US15990405

    Filing date: 2018-05-25

    Abstract: Technology is disclosed for providing dynamic identification and extraction or tagging of contextually-coherent text blocks from an electronic document. In an embodiment, an electronic document may be parsed into a plurality of content tokens that each corresponds to a portion of the electronic document, such as a sentence or a paragraph. Employing a sliding window approach, a number of token groups are independently analyzed, where each group of tokens has a different number of tokens included therein. Each token group is analyzed to determine confidence scores for various determinable contexts based on content included in the token set. The confidence scores can then be processed for each token group to determine an entropy score for the token group. In this way, one of the analyzed token groups can be selected as a representative text block that corresponds to one of the plurality of determinable contexts. A corresponding portion of the electronic document can be tagged with a corresponding context determined based on the analyzed content included therein, and provided for output.
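
The group-size search with entropy scoring can be sketched for a single window start position. Assumptions: tokens are sentences, a hypothetical keyword-count "classifier" (`CONTEXTS`, `confidence_scores`) replaces the trained model that would produce confidence scores, and the group with the lowest Shannon entropy (most decisive context) is selected.

```python
import math
import re
from typing import Dict, Optional, Sequence, Set, Tuple

# Hypothetical keyword lists standing in for a trained context classifier.
CONTEXTS: Dict[str, Set[str]] = {
    "billing": {"invoice", "payment", "refund"},
    "shipping": {"delivery", "package", "ships"},
}


def confidence_scores(group: Sequence[str]) -> Dict[str, float]:
    """Keyword hits per context, add-one smoothed and normalized to sum to 1."""
    words = re.findall(r"[a-z]+", " ".join(group).lower())
    raw = {ctx: 1 + sum(w in vocab for w in words) for ctx, vocab in CONTEXTS.items()}
    total = sum(raw.values())
    return {ctx: n / total for ctx, n in raw.items()}


def entropy(probs: Dict[str, float]) -> float:
    """Shannon entropy in nats; low entropy means one context dominates."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)


def best_block(tokens: Sequence[str], start: int,
               max_size: int) -> Optional[Tuple[int, str, float]]:
    """Score groups of 1..max_size tokens from `start`; keep the lowest-entropy group."""
    best: Optional[Tuple[int, str, float]] = None
    for size in range(1, min(max_size, len(tokens) - start) + 1):
        scores = confidence_scores(tokens[start:start + size])
        h = entropy(scores)
        if best is None or h < best[2]:
            best = (size, max(scores, key=scores.get), h)
    return best


if __name__ == "__main__":
    sentences = ["The invoice is attached.", "Payment is due Friday.",
                 "Your package ships Monday."]
    print(best_block(sentences, 0, 3))
```

Here the two billing sentences form the most coherent block; adding the shipping sentence mixes contexts and raises the entropy.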

    Sentence generating method and apparatus

    Publication No.: US10990765B2

    Publication date: 2021-04-27

    Application No.: US16745947

    Filing date: 2020-01-17

    Inventors: Jihyun Lee, Hoshik Lee

    Abstract: A sentence generating method includes: generating a corresponding word set of a source word set generated based on a source sentence; generating words by performing decoding based on feature vectors generated through encoding of the source sentence; adjusting a probability of at least one of the generated words based on either one or both of the source word set and the corresponding word set; and selecting character strings from different character strings including each of the generated words based on the adjusted probability and the probability as unadjusted.
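
One decoding step of the probability adjustment can be sketched as below. Assumptions: the corresponding word set (for example, dictionary translations of the source words) is already available, words in it get a hypothetical multiplicative `boost` before renormalization, and candidates are kept if they rank in the top k under either the adjusted or the unadjusted distribution. This is one plausible reading of the abstract, simplified from whole character strings to single words.

```python
from typing import Dict, List, Set


def adjust_probabilities(probs: Dict[str, float], corresponding_words: Set[str],
                         boost: float = 3.0) -> Dict[str, float]:
    """Scale up words found in the corresponding word set, then renormalize."""
    scaled = {w: p * (boost if w in corresponding_words else 1.0) for w, p in probs.items()}
    total = sum(scaled.values())
    return {w: p / total for w, p in scaled.items()}


def select_candidates(probs: Dict[str, float], corresponding_words: Set[str],
                      k: int = 1, boost: float = 3.0) -> List[str]:
    """Keep the top-k words under the adjusted and under the unadjusted probabilities."""
    adjusted = adjust_probabilities(probs, corresponding_words, boost)
    by_adjusted = sorted(adjusted, key=adjusted.get, reverse=True)[:k]
    by_original = sorted(probs, key=probs.get, reverse=True)[:k]
    return list(dict.fromkeys(by_adjusted + by_original))  # merge, drop duplicates


if __name__ == "__main__":
    step_probs = {"house": 0.5, "home": 0.3, "building": 0.2}
    print(select_candidates(step_probs, {"building"}, k=1))
```

Keeping both rankings means a boosted dictionary word can enter the beam without discarding the decoder's own best guess.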

    Methods and systems for scalable machine translation

    Publication No.: US10902217B1

    Publication date: 2021-01-26

    Application No.: US16117745

    Filing date: 2018-08-30

    Applicant: Dessert Labs PBC

    Abstract: The present disclosure provides methods and systems for generating a rule for machine translation. The method comprises receiving a user input indicating a change of a rule; retrieving a prior version of the rule from a storage unit upon receiving the user input; identifying one or more metrics from a plurality of metrics based at least in part on the prior version of the rule; computing, with aid of one or more processors, a value for each of the one or more metrics by evaluating each metric against a set of examples, wherein the value represents the impact of the change of the rule; and comparing the value to a pre-determined threshold to determine whether the change of the rule is acceptable.
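
The rule-change check can be sketched with a toy word-substitution translator. Assumptions: each rule maps one source word to a target word, the storage unit is a hypothetical in-memory dict of version histories (`RuleStore`), and a single exact-match metric replaces the patent's metric-selection step. The impact value is the metric change from the prior rule version to the proposed one, compared against `threshold`.

```python
from typing import Dict, List, Sequence, Tuple

RuleStore = Dict[str, List[str]]  # source word -> version history of its translation


def translate(text: str, rules: Dict[str, str]) -> str:
    """Word-by-word substitution; words without a rule pass through unchanged."""
    return " ".join(rules.get(w, w) for w in text.split())


def exact_match(rules: Dict[str, str], examples: Sequence[Tuple[str, str]]) -> float:
    """Fraction of examples whose translation equals the reference exactly."""
    hits = sum(translate(src, rules) == ref for src, ref in examples)
    return hits / len(examples)


def evaluate_change(store: RuleStore, word: str, new_target: str,
                    examples: Sequence[Tuple[str, str]],
                    threshold: float = 0.0) -> Tuple[float, bool]:
    """Return (impact, acceptable) for changing the rule for `word` to `new_target`."""
    prior_rules = {w: versions[-1] for w, versions in store.items()}  # latest stored versions
    proposed = {**prior_rules, word: new_target}
    impact = exact_match(proposed, examples) - exact_match(prior_rules, examples)
    return impact, impact >= threshold


if __name__ == "__main__":
    store: RuleStore = {"bonjour": ["hi"], "monde": ["world"], "ami": ["friend"]}
    examples = [("bonjour monde", "hello world"), ("bonjour ami", "hello friend")]
    print(evaluate_change(store, "bonjour", "hello", examples))
```

A caller would append `new_target` to the rule's history only when the change is accepted, so the next evaluation compares against it.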