Learning synonymous object names from anchor texts
    31.
    Granted invention patent
    Learning synonymous object names from anchor texts (status: in force)

    Publication No.: US08738643B1

    Publication date: 2014-05-27

    Application No.: US11833180

    Filing date: 2007-08-02

    IPC classes: G06F17/30 G06F7/00

    CPC classes: G06F17/2235 G06F17/30864

    Abstract: A repository contains objects representing entities. The objects also include facts about the represented entities. The facts are derived from source documents. A synonymous name of an object is determined by identifying a source document from which one or more facts of the entity represented by the object were derived, identifying a plurality of linking documents that link to the source document through hyperlinks, each hyperlink having an anchor text, processing the anchor texts in the plurality of linking documents to generate a collection of synonym candidates for the entity represented by the object, and selecting a synonymous name for the entity represented by the object from the collection of synonym candidates.
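
    The candidate-generation and selection steps in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: the function names, the whitespace/case normalization, the list of generic anchor phrases, and the "most frequent candidate wins" selection rule are all assumptions, since the abstract does not say how a synonym is selected.

```python
from collections import Counter

def synonym_candidates(anchor_texts):
    """Count normalized anchor texts; each distinct text is one synonym candidate."""
    normalized = (" ".join(t.lower().split()) for t in anchor_texts)
    return Counter(t for t in normalized if t)

def select_synonym(anchor_texts, known_name,
                   generic=("click here", "here", "link")):
    """Pick the most frequent candidate that is neither generic nor the known name.

    Frequency-based selection and the `generic` stop list are assumptions
    made for this sketch only.
    """
    counts = synonym_candidates(anchor_texts)
    skip = {known_name.lower(), *generic}
    for text, _count in counts.most_common():
        if text not in skip:
            return text
    return None

# Anchor texts of hyperlinks that point at one (hypothetical) source document.
anchors = ["IBM", "Big Blue", "big  blue", "click here", "click here",
           "click here", "International Business Machines", "Big Blue"]
print(select_synonym(anchors, "International Business Machines"))  # prints: big blue
```

    Counting normalized strings keeps the sketch short; a real system would presumably also weigh the quality of the linking documents.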

    Domain-specific sentiment classification
    32.
    Granted invention patent
    Domain-specific sentiment classification (status: in force)

    Publication No.: US08356030B2

    Publication date: 2013-01-15

    Application No.: US13163623

    Filing date: 2011-06-17

    IPC classes: G06F17/30

    CPC classes: G06F17/30616

    Abstract: A domain-specific sentiment classifier that can be used to score the polarity and magnitude of sentiment expressed by domain-specific documents is created. A domain-independent sentiment lexicon is established and a classifier uses the lexicon to score sentiment of domain-specific documents. Sets of high-sentiment documents having positive and negative polarities are identified. The n-grams within the high-sentiment documents are filtered to remove extremely common n-grams. The filtered n-grams are saved as a domain-specific sentiment lexicon and are used as features in a model. The model is trained using a set of training documents which may be manually or automatically labeled as to their overall sentiment to produce sentiment scores for the n-grams in the domain-specific sentiment lexicon. This lexicon is used by the domain-specific sentiment classifier.
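
    The lexicon-building stages of this abstract (score documents with a domain-independent lexicon, keep the high-sentiment ones, extract n-grams, drop the extremely common ones) might look like the sketch below. The toy lexicon, the thresholds, and all function names are hypothetical, and the final model-training step that assigns scores to the n-grams is deliberately left out.

```python
from collections import Counter

# Toy domain-independent lexicon; the words and scores are made up for illustration.
GENERAL_LEXICON = {"great": 1.0, "love": 1.0, "awful": -1.0, "hate": -1.0}

def tokens(doc):
    return doc.lower().split()

def general_score(doc):
    """Sum the domain-independent lexicon scores over the document's tokens."""
    return sum(GENERAL_LEXICON.get(t, 0.0) for t in tokens(doc))

def ngrams(doc, n_max=2):
    """Return the set of all 1..n_max-grams in the document."""
    toks = tokens(doc)
    return {" ".join(toks[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(toks) - n + 1)}

def domain_lexicon(docs, threshold=1.0, max_doc_fraction=0.5):
    """Collect n-grams from high-sentiment documents, minus very common ones.

    `threshold` and `max_doc_fraction` are assumed tuning knobs.
    """
    high = [d for d in docs if abs(general_score(d)) >= threshold]
    doc_freq = Counter(g for d in docs for g in ngrams(d))
    limit = max_doc_fraction * len(docs)
    return {g for d in high for g in ngrams(d) if doc_freq[g] <= limit}

docs = ["great camera sharp lens", "awful camera blurry lens",
        "the camera has a lens", "love the sharp zoom"]
lexicon = domain_lexicon(docs)
print("blurry" in lexicon, "camera" in lexicon)  # prints: True False
```

    Here "camera" is dropped because it appears in more than half of the documents, while "blurry" survives as a candidate domain-specific sentiment term.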

    Domain-Specific Sentiment Classification
    33.
    Invention patent application
    Domain-Specific Sentiment Classification (status: in force)

    Publication No.: US20110252036A1

    Publication date: 2011-10-13

    Application No.: US13163623

    Filing date: 2011-06-17

    IPC classes: G06F17/30

    CPC classes: G06F17/30616

    Abstract: A domain-specific sentiment classifier that can be used to score the polarity and magnitude of sentiment expressed by domain-specific documents is created. A domain-independent sentiment lexicon is established and a classifier uses the lexicon to score sentiment of domain-specific documents. Sets of high-sentiment documents having positive and negative polarities are identified. The n-grams within the high-sentiment documents are filtered to remove extremely common n-grams. The filtered n-grams are saved as a domain-specific sentiment lexicon and are used as features in a model. The model is trained using a set of training documents which may be manually or automatically labeled as to their overall sentiment to produce sentiment scores for the n-grams in the domain-specific sentiment lexicon. This lexicon is used by the domain-specific sentiment classifier.

    System and method for updating facts in a fact repository
    35.
    Granted invention patent
    System and method for updating facts in a fact repository (status: in force)

    Publication No.: US07739212B1

    Publication date: 2010-06-15

    Application No.: US11692475

    Filing date: 2007-03-28

    IPC classes: G06F17/00

    CPC classes: G06F17/30631

    Abstract: Metadata is used to determine rules that can be applied to facts. In one embodiment, correlations are identified among types of objects and the attributes of the facts associated with those objects. In another embodiment, correlations are identified among types of objects, the attributes of the facts associated with the objects, and the format and/or range of the values of the facts having those attributes. When a correlation exists between objects of a given type and the attributes of the facts associated with objects of that type, a rule is created for objects of that type. The rule is applied to objects of the given type.
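
    One way to picture the first embodiment is to treat "correlation" as the fraction of objects of a type that carry a given attribute, and to create a rule when that fraction is high. The sketch below does exactly that; the object layout, the `min_support` threshold, and the choice to apply a rule by reporting missing attributes are assumptions, because the abstract does not define how correlations are measured or how rules are applied.

```python
from collections import Counter, defaultdict

def learn_rules(objects, min_support=0.8):
    """Map each object type to the attributes most objects of that type have.

    `objects` is a list of dicts shaped like
    {"type": str, "facts": {attribute: value}} (a hypothetical layout).
    """
    type_counts = Counter(o["type"] for o in objects)
    attr_counts = defaultdict(Counter)
    for o in objects:
        for attr in set(o["facts"]):
            attr_counts[o["type"]][attr] += 1
    return {t: {a for a, c in attrs.items() if c / type_counts[t] >= min_support}
            for t, attrs in attr_counts.items()}

def missing_attributes(obj, rules):
    """Apply the rule for the object's type: list expected attributes it lacks."""
    return rules.get(obj["type"], set()) - set(obj["facts"])

people = [
    {"type": "person", "facts": {"name": "Ada", "birth_date": "1815-12-10"}},
    {"type": "person", "facts": {"name": "Alan", "birth_date": "1912-06-23"}},
    {"type": "person", "facts": {"name": "Grace"}},
]
rules = learn_rules(people, min_support=0.6)
print(missing_attributes(people[2], rules))  # prints: {'birth_date'}
```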

    Lempel-Ziv data compression technique utilizing a dictionary pre-filled with frequent letter combinations, words and/or phrases
    36.
    Reissue patent
    Lempel-Ziv data compression technique utilizing a dictionary pre-filled with frequent letter combinations, words and/or phrases (status: in force)

    Publication No.: USRE41152E1

    Publication date: 2010-02-23

    Application No.: US09952602

    Filing date: 2001-09-14

    IPC classes: G06F7/00 G06F15/00 H03M7/00

    Abstract: An adaptive compression technique which is an improvement to Lempel-Ziv (LZ) compression techniques, both as applied for purposes of reducing required storage space and for reducing the transmission time associated with transferring data from point to point. Pre-filled compression dictionaries are utilized to address the problem with prior Lempel-Ziv techniques in which the compression software starts with an empty compression dictionary, whereby little compression is achieved until the dictionary has been filled with sequences common in the data being compressed. In accordance with the invention, the compression dictionary is pre-filled, prior to the beginning of the data compression, with letter sequences, words and/or phrases frequent in the domain from which the data being compressed is drawn. The letter sequences, words, and/or phrases used in the pre-filled compression dictionary may be determined by statistically sampling text data from the same genre of text. Multiple pre-filled dictionaries may be utilized by the compression software at the beginning of the compression process, where the most appropriate dictionary for maximum compression is identified and used to compress the current data. These modifications are made to any of the known Lempel-Ziv compression techniques based on the variants detailed in 1977 and 1978 articles by Ziv and Lempel.
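
    To make the pre-filling idea concrete, here is a small sketch built on LZW, a well-known variant in the 1978 family. LZW is chosen only for illustration and is not claimed to be the patent's exact scheme; the function names and the sample phrases are hypothetical. Both sides must start from the same pre-filled table, and every prefix of a phrase is added so that the encoder's one-byte-at-a-time greedy match can reach the full phrase.

```python
def build_table(prefill=()):
    """Initial table: all single bytes, then each prefill phrase and its prefixes."""
    table = {bytes([i]): i for i in range(256)}
    for phrase in prefill:
        for end in range(2, len(phrase) + 1):
            table.setdefault(phrase[:end], len(table))
    return table

def compress(data, prefill=()):
    """LZW-encode `data` (bytes) into a list of integer codes."""
    table = build_table(prefill)
    codes, w = [], b""
    for i in range(len(data)):
        c = data[i:i + 1]
        if w + c in table:
            w += c                       # keep extending the current match
        else:
            codes.append(table[w])       # emit longest match found
            table[w + c] = len(table)    # learn the new sequence
            w = c
    if w:
        codes.append(table[w])
    return codes

def decompress(codes, prefill=()):
    """Invert `compress`; `prefill` must match the one used when compressing."""
    if not codes:
        return b""
    strings = {code: s for s, code in build_table(prefill).items()}
    prev = strings[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in strings:
            entry = strings[code]
        elif code == len(strings):       # code refers to the entry being built
            entry = prev + prev[:1]
        else:
            raise ValueError(f"invalid code {code}")
        out.append(entry)
        strings[len(strings)] = prev + entry[:1]
        prev = entry
    return b"".join(out)

text = b"the cat and the hat and the bat"
phrases = [b"the ", b"and "]             # assumed to be frequent in the domain
assert decompress(compress(text, phrases), phrases) == text
print(len(compress(text)), len(compress(text, phrases)))
```

    On this short input the pre-filled table produces fewer output codes than the plain table, which is the effect the abstract describes for data at the start of a stream. Choosing among several pre-filled dictionaries could be sketched by compressing a sample with each one and keeping the shortest result.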

    Method and system for providing audio playback of a multi-source document
    37.
    Granted invention patent
    Method and system for providing audio playback of a multi-source document (status: in force)

    Publication No.: US06446041B1

    Publication date: 2002-09-03

    Application No.: US09428259

    Filing date: 1999-10-27

    IPC classes: G10L1308

    Abstract: A multi-source input and playback utility that accepts inputs from various sources, transcribes the inputs as text, and plays aloud user-selected portions of the text is disclosed. The user may select a portion of the text and request audio playback thereof. The utility examines each transcribed word in the selected text. If stored audio data is associated with a given word, that audio data is retrieved and played. If no audio data is associated, then a text-to-speech entry or series of entries is retrieved and played instead.
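
    The per-word decision described in this abstract can be sketched as a simple fallback. The `Word` type and `playback_plan` function are hypothetical, and the sketch only decides what to play for each word; actual audio output and the lookup of text-to-speech entries are outside its scope.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

@dataclass
class Word:
    text: str
    audio: Optional[bytes] = None  # stored audio for this word, if any

def playback_plan(selection: List[Word]) -> List[Tuple[str, Union[bytes, str]]]:
    """For each selected word, prefer stored audio; otherwise fall back to TTS."""
    plan = []
    for word in selection:
        if word.audio is not None:
            plan.append(("recorded", word.audio))
        else:
            plan.append(("tts", word.text))  # text to hand to a speech synthesizer
    return plan

# "hello" was dictated (audio kept), "world" was typed (no audio).
selection = [Word("hello", audio=b"\x01\x02"), Word("world")]
print(playback_plan(selection))
```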

    Background audio recovery system
    38.
    Granted invention patent
    Background audio recovery system (status: in force)

    Publication No.: US06415258B1

    Publication date: 2002-07-02

    Application No.: US09413262

    Filing date: 1999-10-06

    IPC classes: G10L1914

    CPC classes: G06F3/167 G10L2015/223

    Abstract: A background audio recovery system displays an inactive status indicator for a speech recognition program module in an application program. To prevent losses of dictated speech when a speech recognition program module is inadvertently assigned to an inactive mode, the background audio recovery system determines whether an audio input device is receiving audio. If audio is being received by the audio input device, the background audio recovery system stores the audio data for later retrieval by the user. When a user issues a command to activate the speech recognition program module, the background audio recovery system initiates a background audio program module for manipulating the stored audio data that was recorded while the speech recognition program module was assigned to an inactive mode.
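
    The buffering behavior in this abstract can be illustrated with a small state holder: audio that arrives while recognition is inactive is stored, and activating recognition hands the stored audio back for processing. The class and method names are hypothetical, and the `recognize` callback stands in for a real speech recognition module.

```python
class BackgroundAudioRecovery:
    """Keeps audio that arrives while speech recognition is inactive."""

    def __init__(self):
        self.active = False
        self.buffer = []

    def on_audio(self, chunk, recognize):
        """Route a chunk to the recognizer if active, else store it for later."""
        if self.active:
            return recognize(chunk)
        self.buffer.append(chunk)
        return None

    def activate(self):
        """Switch to active mode and return the audio recorded while inactive."""
        self.active = True
        recovered, self.buffer = self.buffer, []
        return recovered

recovery = BackgroundAudioRecovery()
recovery.on_audio(b"dictated while inactive", recognize=len)
print(recovery.activate())  # prints: [b'dictated while inactive']
```

    What happens to the recovered audio (transcribe it, discard it, or replay it) would be up to the background audio program module the abstract mentions.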
