Technique for document editorial quality assessment
    4.
    Patent application
    Status: in force

    Publication number: US20060100852A1

    Publication date: 2006-05-11

    Application number: US10969119

    Filing date: 2004-10-20

    IPC classification: G06F17/27

    CPC classification: G06F17/271 G06F17/2785

    Abstract: A computer-implemented system and method for assessing the editorial quality of a textual unit (document, paragraph or sentence) is provided. The method includes generating a plurality of training-time feature vectors by automatically extracting features from first and last versions of training documents. The method also includes training a machine-learned classifier based on the plurality of training-time feature vectors. A run-time feature vector is generated for the textual unit to be assessed by automatically extracting features from the textual unit. The run-time feature vector is evaluated using the machine-learned classifier to provide an assessment of the editorial quality of the textual unit.
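
    The train-then-assess flow in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the two features (average words per sentence, ratio of adjacent repeated words), the nearest-centroid classifier, and the toy training pairs. The abstract names neither the features nor the classifier type.

```python
import math
import re

def extract_features(text):
    """Return [average words per sentence, adjacent-repeat ratio] (illustrative features)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return [0.0, 0.0]
    repeats = sum(1 for a, b in zip(words, words[1:]) if a == b)
    return [len(words) / max(len(sentences), 1), repeats / len(words)]

def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def train(document_pairs):
    """Learn one centroid for first versions and one for last versions."""
    first = [extract_features(first_version) for first_version, _ in document_pairs]
    last = [extract_features(last_version) for _, last_version in document_pairs]
    return {"first": centroid(first), "last": centroid(last)}

def assess(model, text):
    """Score in [0, 1]; values above 0.5 mean the text resembles last versions."""
    vector = extract_features(text)
    d_first = math.dist(vector, model["first"])
    d_last = math.dist(vector, model["last"])
    total = d_first + d_last
    return 0.5 if total == 0 else d_first / total

# Hypothetical (first version, last version) training pairs.
TRAINING_PAIRS = [
    ("the the report is is done", "The report is done."),
    ("we we saw the the results", "We saw the results."),
]
model = train(TRAINING_PAIRS)
polished_score = assess(model, "The plan is clear.")
rough_score = assess(model, "it it is is fine")
```

    With this toy data the polished sentence scores above 0.5 and the rough one below it. A realistic system would need far richer features and a much larger set of document revisions.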

    General purpose correction of grammatical and word usage errors
    6.
    Granted patent
    Status: in force

    Publication number: US09262397B2

    Publication date: 2016-02-16

    Application number: US12961516

    Filing date: 2010-12-07

    Abstract: Architecture that detects and corrects writing errors in a human language based on the utilization of three different stages: error detection, correction candidate generation, and correction candidate ranking. The architecture is a generic framework for generating fluent alternatives to non-grammatical word sequences in a written sample. Error detection is addressed by a suite of language model related scores and other scores such as parse scores that can identify a particularly unlikely sequence of words. Correction candidate generation is addressed by a lookup in a very large corpus of “correct” English that looks for alternative arrangements of the same or similar words or subsequences of these words in the same context. Correction candidate ranking is addressed by a language model ranker.
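
    The three stages can be sketched on a toy scale. The assumptions are substantial: a three-sentence list stands in for the very large corpus, an unseen bigram stands in for the suite of detection scores, the lookup only finds rearrangements of exactly the same words, and an add-one-smoothed bigram model stands in for the ranker. All names below are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Tiny stand-in for a large corpus of "correct" English.
CORPUS = [
    "i have been to paris",
    "she has been to rome",
    "we have been to london",
]

unigrams, bigrams = Counter(), Counter()
arrangements = defaultdict(set)  # sorted word tuple -> attested word orders
for sentence in CORPUS:
    tokens = sentence.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))
    for n in range(2, len(tokens) + 1):
        for start in range(len(tokens) - n + 1):
            window = tuple(tokens[start:start + n])
            arrangements[tuple(sorted(window))].add(window)
VOCAB_SIZE = len(unigrams)

def detect(tokens):
    """Stage 1: indices of bigrams never seen in the corpus."""
    return [i for i, pair in enumerate(zip(tokens, tokens[1:])) if bigrams[pair] == 0]

def generate(tokens, flagged):
    """Stage 2: look up attested orderings of the same words in the flagged span."""
    start, end = flagged[0], flagged[-1] + 2
    key = tuple(sorted(tokens[start:end]))
    candidates = [tokens]
    for order in sorted(arrangements.get(key, ())):
        candidates.append(tokens[:start] + list(order) + tokens[end:])
    return candidates

def lm_score(tokens):
    """Stage 3: add-one-smoothed bigram log-probability."""
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + VOCAB_SIZE))
        for a, b in zip(tokens, tokens[1:])
    )

def correct(sentence):
    tokens = sentence.split()
    flagged = detect(tokens)
    if not flagged:
        return sentence
    return " ".join(max(generate(tokens, flagged), key=lm_score))

fixed = correct("i have to been paris")
```

    Here `fixed` is "i have been to paris": the span "have to been paris" is flagged, the lookup finds the attested ordering "have been to paris", and the ranker prefers it over the original.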

    Click-through prediction for news queries
    7.
    Granted patent
    Status: in force

    Publication number: US08719298B2

    Publication date: 2014-05-06

    Application number: US12469692

    Filing date: 2009-05-21

    IPC classification: G06F17/30

    CPC classification: G06F17/30864

    Abstract: Described is estimating whether an online search query is a news-related query, and if so, outputting news-related results in association with other search results returned in response to the query. The query is processed into features, including by accessing corpora that correspond to relatively current events, e.g., recently crawled from news and blog articles. A corpus of static reference data, such as an online encyclopedia, may be used to help determine whether the query is less likely to be about current events. Features include frequency-related data and context-related data corresponding to frequency and context information maintained in the corpora. Additional features may be obtained by processing text of the query itself, e.g., “query-only” features.
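
    A rough sketch of how a query might be turned into such features follows. The feature names, the toy corpora, and the choice of mean relative term frequency are illustrative assumptions. The abstract does not specify the exact features or the model that consumes them, so no classifier is shown.

```python
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def term_frequencies(documents):
    """Return (term counts, total token count) for a list of documents."""
    counts = Counter()
    for document in documents:
        counts.update(tokenize(document))
    return counts, sum(counts.values())

def query_features(query, recent_docs, reference_docs):
    terms = tokenize(query)
    recent_counts, recent_total = term_frequencies(recent_docs)
    reference_counts, reference_total = term_frequencies(reference_docs)

    def mean_relative_frequency(counts, total):
        if not terms or total == 0:
            return 0.0
        return sum(counts[t] for t in terms) / (len(terms) * total)

    # Context feature: recent documents in which all query terms co-occur.
    co_occurrences = sum(
        1 for document in recent_docs
        if terms and set(terms) <= set(tokenize(document))
    )
    return {
        "recent_freq": mean_relative_frequency(recent_counts, recent_total),
        "reference_freq": mean_relative_frequency(reference_counts, reference_total),
        "recent_co_occurrences": co_occurrences,
        "query_length": len(terms),  # a "query-only" feature
    }

# Hypothetical corpora: recently crawled news vs. static reference text.
RECENT = ["earthquake hits coastal city", "rescue teams respond to earthquake"]
REFERENCE = [
    "an earthquake is the shaking of the surface of the earth",
    "a city is a large human settlement",
]
features = query_features("earthquake city", RECENT, REFERENCE)
```

    For this query the terms are relatively more frequent in the recent corpus than in the reference corpus, which is the kind of signal a trained model could use when estimating news intent.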

    Providing context for web articles
    8.
    Granted patent
    Status: in force

    Publication number: US08630972B2

    Publication date: 2014-01-14

    Application number: US12143765

    Filing date: 2008-06-21

    IPC classification: G06F17/00 G06N7/00 G06N7/08

    CPC classification: G06F17/30014

    Abstract: An overwhelming number of articles are available every day via the internet. Unfortunately, it is impossible to peruse more than a handful, and it is difficult to ascertain an article's social context. The techniques disclosed herein address this problem by harnessing implicit and explicit contextual information from social media. By extracting text surrounding a hyperlink to an article in a post and assessing the article as a function of content surrounding the hyperlink, an article's social context is determined and presented. Additionally, articles that are sufficiently similar in content may be grouped to establish a many-to-one relationship between posts and an article, creating a more accurate assessment.
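
    The extraction step, pulling the text around a hyperlink out of each post and grouping it by linked article, might look like the sketch below. It assumes posts arrive as simple HTML strings with double-quoted href attributes, uses a fixed word window, and groups by exact URL rather than by content similarity. The example.com URL and all function names are hypothetical.

```python
import re
from collections import defaultdict

ANCHOR = re.compile(r'<a\s+href="([^"]+)"[^>]*>(.*?)</a>', re.IGNORECASE | re.DOTALL)
TAG = re.compile(r"<[^>]+>")

def link_contexts(post_html, window=5):
    """Return (url, surrounding text) for each hyperlink in one post."""
    results = []
    for match in ANCHOR.finditer(post_html):
        # Strip markup, then keep up to `window` words on each side of the link.
        before = TAG.sub(" ", post_html[:match.start()]).split()[-window:]
        after = TAG.sub(" ", post_html[match.end():]).split()[:window]
        results.append((match.group(1), " ".join(before + after)))
    return results

def contexts_by_article(posts, window=5):
    """Group link contexts from many posts under the article they point to."""
    grouped = defaultdict(list)
    for post in posts:
        for url, context in link_contexts(post, window):
            grouped[url].append(context)
    return dict(grouped)

POSTS = [
    'Great read on budgets: <a href="http://example.com/a">this piece</a> is surprisingly balanced.',
    'I disagree with <a href="http://example.com/a">the article</a> entirely.',
]
social_context = contexts_by_article(POSTS)
```

    Both posts end up under the same article key, giving the many-to-one relationship between posts and an article. Assessing the collected text (for sentiment or topic, say) would be a separate step not shown here.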

    Summarization of attached, linked or related materials
    9.
    Granted patent
    Status: in force

    Publication number: US08209617B2

    Publication date: 2012-06-26

    Application number: US11801810

    Filing date: 2007-05-11

    IPC classification: G06F3/00

    CPC classification: G06Q10/107 G06F17/30719

    Abstract: A summarization system and method. The summarization method includes utilizing a first body of information to obtain a second body of information, which is identified (by a hyperlink, an attachment identifier, a reference, etc.) in the first body of information. A summary of the obtained second body of information is then computed. The computed summary can be displayed to a user and/or stored for later use.
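
    As an illustration of this flow, the sketch below finds hyperlinks in a message, resolves each against an in-memory store, and computes a short extractive summary. The store (a plain dict standing in for fetching a page or opening an attachment), the frequency-based sentence scoring, and the example URL are all assumptions. The abstract does not say how the summary is computed.

```python
import re
from collections import Counter

URL = re.compile(r"https?://\S+")

def content_words(text):
    """Lowercased words of four or more letters (a crude stop-word filter)."""
    return re.findall(r"[a-z]{4,}", text.lower())

def summarize(text, max_sentences=1):
    """Keep the sentences whose words are most frequent in the document."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    frequency = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        return sum(frequency[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])  # restore document order
    return " ".join(sentences[i] for i in chosen)

def summarize_linked(message, store, max_sentences=1):
    """Map each link found in `message` to a summary of the linked body."""
    summaries = {}
    for link in URL.findall(message):
        if link in store:  # unresolved links are skipped
            summaries[link] = summarize(store[link], max_sentences)
    return summaries

# Hypothetical linked document and message.
STORE = {
    "https://example.com/report": (
        "The budget grew this year. "
        "The budget covers travel and the budget covers equipment. "
        "Lunch was good."
    )
}
result = summarize_linked("Please review https://example.com/report before Friday.", STORE)
```

    The returned mapping could then be displayed next to the message or stored for later use, as the abstract describes.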

    Interface and methods for collecting aligned editorial corrections into a database
    10.
    Patent application
    Status: in force

    Publication number: US20080103759A1

    Publication date: 2008-05-01

    Application number: US11589126

    Filing date: 2006-10-27

    IPC classification: G06F17/20

    CPC classification: G06F17/2827 G06F17/24

    Abstract: A method for providing aligned editorial corrections to a database is discussed. The method includes receiving a first text in a language and organizing the first text into one or more sentences. The method further includes editing a copy of the first text to create a second text. The second text is in the language of the first text. The method further includes aligning the sentences of the first text with corresponding sentences of the second text and storing the aligned sentences on a computer readable medium. A system for providing a data structure having aligned editorial corrections is also discussed. The system includes an alignment component for receiving a first text and organizing the first text into sentences. The system also includes a user interface configured to provide a second text, wherein the second text is an edited version of the first text in the language of the first text.
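
    The alignment step can be sketched as follows. This is a simplified illustration, not the method claimed: sentences are split on terminal punctuation, each original sentence is matched greedily and in order to its most similar edited sentence using `difflib.SequenceMatcher`, and the result is serialized as JSON in place of a database write. The 0.5 similarity threshold and the record layout are assumptions.

```python
import json
import re
from difflib import SequenceMatcher

def split_sentences(text):
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def align(original_text, edited_text, threshold=0.5):
    """Pair each original sentence with its edited counterpart, or None if dropped."""
    originals = split_sentences(original_text)
    edits = split_sentences(edited_text)
    aligned, next_j = [], 0
    for sentence in originals:
        best_j, best_ratio = None, threshold
        # Only look forward so the alignment preserves sentence order.
        for j in range(next_j, len(edits)):
            ratio = SequenceMatcher(None, sentence, edits[j]).ratio()
            if ratio > best_ratio:
                best_j, best_ratio = j, ratio
        aligned.append({
            "original": sentence,
            "edited": edits[best_j] if best_j is not None else None,
        })
        if best_j is not None:
            next_j = best_j + 1
    return aligned

ORIGINAL = "He go to school yesterday. The weather were nice. Delete me please."
EDITED = "He went to school yesterday. The weather was nice."
pairs = align(ORIGINAL, EDITED)
serialized = json.dumps(pairs, ensure_ascii=False)  # ready to write to storage
```

    Each record keeps the uncorrected and corrected sentence side by side. The third original sentence has no counterpart in the edited text, so its "edited" field is None.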
