Detecting document text that is hard to read
    Type: Granted invention patent
    Status: In force

    Publication number: US08990224B1

    Publication date: 2015-03-24

    Application number: US13674320

    Filing date: 2012-11-12

    Applicant: Google Inc.

    CPC classification number: G06F7/00 G06F17/214 G06F17/30 G06Q10/10 G06Q30/0201

    Abstract: A computer system is configured to determine portions of text extracted from a corresponding group of documents; process a particular portion of text by a set of filters, where the particular portion of text may correspond to a particular document, and where each of the filters may generate a respective score based on processing the particular portion of text; calculate a readability score based on the respective scores generated by the filters; determine that the readability score satisfies a threshold score; and generate or select a new portion of text, for the particular document, based on determining that the readability score satisfies the threshold score.
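
The pipeline in the abstract (several filters each score a portion of text, the scores are combined into one readability score, and a new portion is generated or selected when that score satisfies a threshold) can be sketched roughly as follows. This is a minimal illustration, not the patented method: the two filters, the averaging rule, the 0.3 threshold, and the convention that a higher score means harder to read are all assumptions made here, since the abstract does not specify them.

```python
from typing import Callable, Sequence

# A filter maps a portion of text to a score in [0, 1].
# Assumption: higher means harder to read.
Filter = Callable[[str], float]


def long_word_filter(text: str) -> float:
    """Hypothetical filter: fraction of words longer than 12 characters."""
    words = text.split()
    if not words:
        return 1.0  # treat empty text as unreadable
    return sum(len(w) > 12 for w in words) / len(words)


def symbol_filter(text: str) -> float:
    """Hypothetical filter: fraction of non-space characters that are
    neither letters nor digits."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 1.0
    return sum(not c.isalnum() for c in chars) / len(chars)


FILTERS: Sequence[Filter] = (long_word_filter, symbol_filter)


def readability_score(text: str, filters: Sequence[Filter]) -> float:
    """Combine the per-filter scores. A plain average is assumed here."""
    scores = [f(text) for f in filters]
    return sum(scores) / len(scores)


def choose_text(candidates: Sequence[str],
                filters: Sequence[Filter],
                threshold: float = 0.3) -> str:
    """Keep the first candidate unless its score satisfies the threshold
    (assumed: score >= threshold); in that case select the candidate
    with the lowest score instead."""
    current = candidates[0]
    if readability_score(current, filters) >= threshold:
        return min(candidates, key=lambda t: readability_score(t, filters))
    return current


if __name__ == "__main__":
    portions = ["$$$ ### @@@ !!!", "A short, clear summary of the page."]
    print(choose_text(portions, FILTERS))
```

Here the first portion consists only of symbols, so its combined score reaches the assumed threshold and the second portion is selected for the document instead. A real system would likely weight the filters and tune the threshold on data rather than use the fixed values above.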
