Using visual features to identify document sections

    Publication No.: US10565444B2

    Publication date: 2020-02-18

    Application No.: US15698212

    Application date: 2017-09-07

    IPC classification: G06K9/00 G06K9/62 G06K9/34

    Abstract: A method, computer system, and a computer program product for identifying sections in a document based on a plurality of visual features is provided. The present invention may include receiving a plurality of documents. The present invention may also include extracting a plurality of content blocks. The present invention may further include determining the plurality of visual features. The present invention may then include grouping the extracted plurality of content blocks into a plurality of categories. The present invention may also include generating a plurality of closeness scores for the plurality of categories by utilizing a Visual Similarity Measure. The present invention may further include generating a plurality of Association Matrices on the plurality of categories for each of the received plurality of documents based on the Visual Similarity Measure. The present invention may further include merging the plurality of categories into a plurality of clusters.
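The abstract lists the pipeline steps but does not specify the visual features or the similarity function. The following is a minimal Python sketch of those steps for a single document, under stated assumptions: the features (font size, bold flag, indent), the `closeness` formula standing in for the Visual Similarity Measure, and the 0.8 merge threshold are all hypothetical choices, not the patented method. Blocks with identical features form a category, pairwise closeness scores form the Association Matrix, and categories whose score meets the threshold are merged into clusters with union-find.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class ContentBlock:
    text: str
    font_size: float   # hypothetical visual feature, in points (assumed > 0)
    bold: bool         # hypothetical visual feature
    indent: int        # hypothetical visual feature: left indent in points


def group_into_categories(blocks):
    """Blocks with identical visual features fall into the same category."""
    categories = {}
    for block in blocks:
        key = (block.font_size, block.bold, block.indent)
        categories.setdefault(key, []).append(block)
    return categories


def closeness(a, b):
    """Hypothetical stand-in for the Visual Similarity Measure: 1.0 for identical
    feature tuples, lower as font size, weight, and indent diverge."""
    size_diff = abs(a[0] - b[0]) / max(a[0], b[0])
    bold_diff = 0.0 if a[1] == b[1] else 1.0
    indent_diff = min(abs(a[2] - b[2]) / 36.0, 1.0)   # 36 pt cap is an arbitrary choice
    return 1.0 - (size_diff + bold_diff + indent_diff) / 3.0


def association_matrix(categories):
    """Pairwise closeness scores between the categories of one document."""
    keys = sorted(categories)
    return keys, [[closeness(a, b) for b in keys] for a in keys]


def merge_categories(keys, matrix, threshold=0.8):
    """Union-find merge of categories whose closeness score meets the threshold."""
    parent = list(range(len(keys)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(keys)), 2):
        if matrix[i][j] >= threshold:
            parent[find(i)] = find(j)
    clusters = {}
    for i, key in enumerate(keys):
        clusters.setdefault(find(i), []).append(key)
    return list(clusters.values())


blocks = [
    ContentBlock("1. Introduction", 14.0, True, 0),
    ContentBlock("Body text.", 11.0, False, 0),
    ContentBlock("1.1 Background", 13.0, True, 0),
    ContentBlock("More body text.", 11.0, False, 0),
    ContentBlock("2. Methods", 14.0, True, 0),
]
keys, matrix = association_matrix(group_into_categories(blocks))
print(merge_categories(keys, matrix))
# [[(11.0, False, 0)], [(13.0, True, 0), (14.0, True, 0)]]
```

In this toy input the two bold heading styles end up in one cluster and body text in another. Running the pipeline once per received document yields one Association Matrix per document; how the matrices are combined across documents is not stated in the abstract and is left out here.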

    Using visual features to identify document sections

    Publication No.: US20190073528A1

    Publication date: 2019-03-07

    Application No.: US15698212

    Application date: 2017-09-07

    IPC classification: G06K9/00 G06K9/62

    Abstract: A method, computer system, and a computer program product for identifying sections in a document based on a plurality of visual features is provided. The present invention may include receiving a plurality of documents. The present invention may also include extracting a plurality of content blocks. The present invention may further include determining the plurality of visual features. The present invention may then include grouping the extracted plurality of content blocks into a plurality of categories. The present invention may also include generating a plurality of closeness scores for the plurality of categories by utilizing a Visual Similarity Measure. The present invention may further include generating a plurality of Association Matrices on the plurality of categories for each of the received plurality of documents based on the Visual Similarity Measure. The present invention may further include merging the plurality of categories into a plurality of clusters.

    Natural language bias detection in conversational system environments

    Publication No.: US20220374604A1

    Publication date: 2022-11-24

    Application No.: US17324043

    Application date: 2021-05-18

    Abstract: A method, apparatus and computer program for detecting natural language (NL) bias by a conversational system is described. Embodiments of the invention determine an NL bias in a set of training questions used to train a machine learning model used by the conversational system to select a user intent. Other embodiments of the invention determine an NL bias in a user question received by the conversational system as compared to the set of training questions. The NL bias causes the machine learning model to preferentially associate user queries with a user intent. In respective embodiments, the system takes a corrective action to adjust the NL bias of the training questions or the user question.
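The abstract does not say how NL bias is measured. One simple heuristic, assumed here for illustration and not taken from the patent, treats bias as an intent-neutral word (for example "please") that occurs almost exclusively in one intent's training questions, so a model may learn to route any question containing it to that intent. The sketch below flags such words in the training set and then in an incoming user question; `neutral_terms`, `min_count`, and `min_share` are hypothetical parameters.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case word tokens; apostrophes kept so "don't" stays one token."""
    return re.findall(r"[a-z']+", text.lower())


def find_biased_terms(training, neutral_terms, min_count=3, min_share=0.9):
    """training maps intent -> list of training questions.

    Flags each intent-neutral term that appears in at least `min_count`
    questions, at least `min_share` of which belong to a single intent.
    Returns {term: intent it is skewed toward}.
    """
    per_intent = defaultdict(Counter)   # intent -> term -> questions containing it
    total = Counter()                   # term -> questions containing it, all intents
    for intent, questions in training.items():
        for question in questions:
            terms = set(tokenize(question)) & neutral_terms   # count once per question
            per_intent[intent].update(terms)
            total.update(terms)

    biased = {}
    for term, count in total.items():
        if count < min_count:
            continue
        intent, top = max(((name, counts[term]) for name, counts in per_intent.items()),
                          key=lambda pair: pair[1])
        if top / count >= min_share:
            biased[term] = intent
    return biased


def biased_terms_in_question(question, biased):
    """Return flagged terms present in a user question and the intent each favors."""
    return {term: biased[term] for term in set(tokenize(question)) if term in biased}


training = {
    "reset_password": [
        "please reset my password",
        "please help me reset my password",
        "please I forgot my password",
    ],
    "check_balance": [
        "what is my balance",
        "show my account balance",
        "how much money is in my account",
    ],
}
biased = find_biased_terms(training, neutral_terms={"please", "my", "i", "me"})
print(biased)                                                       # {'please': 'reset_password'}
print(biased_terms_in_question("please show my balance", biased))  # {'please': 'reset_password'}
```

"my" is not flagged because it is split evenly across both intents. In this sketch's terms, a corrective action could be adding training questions that use the flagged word to other intents, or removing it from the user question before classification; both are illustrative options only.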

    System and method for searching audio data

    Publication No.: US11210337B2

    Publication date: 2021-12-28

    Application No.: US16161930

    Application date: 2018-10-16

    Abstract: An audio search system is configured to perform a native search of one or more audio input files in response to a search query. The audio search system is connected to a corpus of audio files representing words, syllables, and characters that may be found in an audio input file. The audio search system has a memory storing instructions and a processing device configured to execute the instructions to receive a search query for searching one or more audio input files, convert the search query into an audio search expression, identify one or more meta-tags in the audio search expression, select a machine learning model based on the one or more meta-tags, and use the machine learning model to search the one or more audio input files for segments of the audio input file that are results of the search query.
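The steps in the abstract (convert the query into an audio search expression, identify its meta-tags, select a model, run it over each file) can be sketched as a small dispatch layer. Everything concrete here is hypothetical: the `key:value` meta-tag syntax, the `lang` tag, the model registry, and `stub_model`, which stands in for a trained model by matching pre-labeled `(start_sec, end_sec, unit)` tuples rather than searching raw audio natively.

```python
import re
from dataclasses import dataclass, field

# Hypothetical meta-tag syntax: key:value tokens inside the query, e.g. "lang:fr".
TAG_PATTERN = re.compile(r"(\w+):(\S+)")


@dataclass
class AudioSearchExpression:
    terms: list[str]                                        # terms left after removing tags
    meta_tags: dict[str, str] = field(default_factory=dict)


def to_search_expression(query: str) -> AudioSearchExpression:
    """Split a text query into plain search terms and key:value meta-tags."""
    meta_tags = dict(TAG_PATTERN.findall(query))
    terms = TAG_PATTERN.sub("", query).split()
    return AudioSearchExpression(terms, meta_tags)


def select_model(expression, registry, default_key="default"):
    """Pick a model keyed by the hypothetical 'lang' meta-tag, else the default."""
    key = expression.meta_tags.get("lang", default_key)
    return registry.get(key, registry[default_key])


def stub_model(audio, terms):
    """Hypothetical stand-in for a trained model. Each 'audio file' is a list of
    (start_sec, end_sec, unit) tuples; return spans whose unit matches a term."""
    return [(start, end) for start, end, unit in audio if unit in terms]


def search_audio(query, audio_files, registry):
    """Return {file name: [(start_sec, end_sec), ...]} for files with matches."""
    expression = to_search_expression(query)
    model = select_model(expression, registry)
    results = {}
    for name, audio in audio_files.items():
        segments = model(audio, expression.terms)
        if segments:
            results[name] = segments
    return results


files = {"call1.wav": [(0.0, 0.5, "hello"), (0.5, 1.2, "invoice")]}
print(search_audio("invoice", files, {"default": stub_model}))  # {'call1.wav': [(0.5, 1.2)]}
```

Keeping model selection behind a registry lookup means a new meta-tag value only needs a new registry entry, not a change to the search loop.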