IDENTIFICATION OF READING ORDER TEXT SEGMENTS WITH A PROBABILISTIC LANGUAGE MODEL

    Publication No.: US20180267956A1

    Publication Date: 2018-09-20

    Application No.: US15462684

    Filing Date: 2017-03-17

    IPC Classification: G06F17/27

    Abstract: A computer-implemented method and system identifies the correct structured reading-order sequence of text segments extracted from a file structured in a portable document format. A probabilistic language model is generated from a large text corpus and comprises observed word sequence patterns for a given language. The language model measures whether splicing together a first text segment with a continuation text segment yields a phrase that is more likely than the phrases obtained by splicing the first text segment with other continuation text segments. Sets of text segments are provided to the probabilistic language model: a first set includes the first text segment and a first continuation text segment, and a second set includes the first text segment and a second continuation text segment. A score is obtained for each set of text segments, indicating the likelihood that the set provides a correct structured reading-order sequence. The probabilistic language model may be generated in accordance with a Recurrent Neural Network or an n-gram model.
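
The scoring step described in the abstract can be illustrated with a small sketch. It assumes a bigram (n-gram with n = 2) model with add-one smoothing and a length-normalised log-probability score; the function names, the toy corpus, and the smoothing and normalisation choices are all assumptions made for illustration and are not taken from the patent.

```python
import math
from collections import Counter


def train_bigram_model(corpus_sentences):
    """Count unigrams and bigrams observed in a (toy) text corpus."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in corpus_sentences:
        tokens = sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def splice_score(model, first_segment, continuation):
    """Average add-one-smoothed bigram log-probability of the spliced phrase.

    Averaging over bigrams keeps candidates of different lengths comparable
    (an assumption of this sketch, not something the abstract specifies).
    """
    unigrams, bigrams = model
    vocab_size = max(len(unigrams), 1)
    tokens = (first_segment + " " + continuation).lower().split()
    pairs = list(zip(tokens, tokens[1:]))
    if not pairs:
        return float("-inf")
    total = 0.0
    for prev, cur in pairs:
        prob = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
        total += math.log(prob)
    return total / len(pairs)


def best_continuation(model, first_segment, candidates):
    """Return the candidate continuation whose splice scores highest."""
    return max(candidates, key=lambda c: splice_score(model, first_segment, c))


# Hypothetical toy corpus standing in for the "large text corpus".
corpus = [
    "the quick brown fox jumps over the lazy dog",
    "the lazy dog sleeps in the sun",
    "a quick brown fox runs",
]
model = train_bigram_model(corpus)
first = "the quick brown"
candidates = ["fox jumps over", "dog sleeps in"]
chosen = best_continuation(model, first, candidates)
```

Each (first segment, continuation) pair plays the role of one "set of text segments", and the pair with the highest score is taken as the likelier reading order. A Recurrent Neural Network language model could supply the same kind of score in place of the bigram counts.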

    CONTENT PRESENTATION BASED ON A MULTI-TASK NEURAL NETWORK

    Publication No.: US20170251081A1

    Publication Date: 2017-08-31

    Application No.: US15053448

    Filing Date: 2016-02-25

    Abstract: Techniques for predictively selecting a content presentation in a client-server computing environment are described. In an example, a content management system detects an interaction of a client with a server and accesses client features. Responses of the client to potential content presentations are predicted based on a multi-task neural network. The client features are mapped to input nodes, and the potential content presentations are associated with tasks mapped to output nodes of the multi-task neural network. The tasks specify usages of the potential content presentations in response to the interaction with the server. In an example, the content management system selects the content presentation from the potential content presentations based on the predicted responses. For instance, the content presentation is selected based on having the highest likelihood. The content management system provides the content presentation to the client based on the task corresponding to the content presentation.
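
The selection step described in the abstract can be sketched as a forward pass through a small multi-task network: client features enter at the input nodes, a shared hidden layer feeds one output node per task, and the presentation with the highest predicted likelihood is chosen. The layer sizes, the weights, the tanh and sigmoid activations, and the task names `banner` and `popup` are hypothetical values chosen for illustration; the abstract does not specify them.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_responses(client_features, hidden_weights, task_weights):
    """Forward pass: shared hidden layer, then one sigmoid output per task."""
    hidden = [
        math.tanh(sum(w * f for w, f in zip(row, client_features)))
        for row in hidden_weights
    ]
    return {
        task: sigmoid(sum(w * h for w, h in zip(weights, hidden)))
        for task, weights in task_weights.items()
    }


def select_presentation(client_features, hidden_weights, task_weights):
    """Pick the task (content presentation) with the highest predicted likelihood."""
    predictions = predict_responses(client_features, hidden_weights, task_weights)
    best_task = max(predictions, key=predictions.get)
    return best_task, predictions


# Hypothetical, untrained weights; a real system would learn these from data.
hidden_weights = [[1.0, 0.0], [0.0, 1.0]]        # 2 input nodes -> 2 hidden nodes
task_weights = {
    "banner": [2.0, 0.0],                        # output node for task "banner"
    "popup": [-1.0, 0.5],                        # output node for task "popup"
}
client_features = [1.0, 0.0]                     # e.g. encoded client attributes

best_task, predictions = select_presentation(
    client_features, hidden_weights, task_weights
)
```

Sharing the hidden layer across tasks is what makes the network multi-task: every output node reads the same learned representation of the client features, while each task keeps its own output weights.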