System and method for automatically ranking lines of text
    2.
    Granted patent
    Status: In force

    Publication No.: US08005845B2

    Publication date: 2011-08-23

    Application No.: US12124086

    Filing date: 2008-05-20

    IPC classification: G06F17/30

    CPC classification: G06F17/30675

    Abstract: Disclosed are apparatus and methods for ranking lines of text. In one embodiment, an intent of a query is ascertained. A relevance of each one of a plurality of lines of text of a document is determined based upon the intent of the query, content of the query, and content of each of the plurality of lines of text. The plurality of lines of text may then be ranked according to the determined relevance of each of the plurality of lines of text.
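
The ranking flow described in this abstract (ascertain intent, score each line, sort) can be sketched roughly as follows. This is only an illustration under stated assumptions: the intent vocabulary `INTENT_TERMS`, the word-overlap scoring, and the `intent_weight` parameter are all hypothetical and do not come from the patent.

```python
import re

# Hypothetical intent vocabulary; purely illustrative, not taken from the patent.
INTENT_TERMS = {
    "purchase": {"price", "buy", "order", "shipping"},
    "contact": {"phone", "email", "address", "hours"},
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def ascertain_intent(query):
    """Pick the intent whose vocabulary overlaps the query most (None if no overlap)."""
    words = set(tokenize(query))
    best, best_overlap = None, 0
    for intent, vocab in INTENT_TERMS.items():
        overlap = len(words & vocab)
        if overlap > best_overlap:
            best, best_overlap = intent, overlap
    return best

def relevance(line, query, intent, intent_weight=0.5):
    """Score a line from query content, line content, and the query intent."""
    line_words = set(tokenize(line))
    score = float(len(line_words & set(tokenize(query))))
    if intent is not None:
        # Assumed heuristic: lines containing intent-related terms get a bonus.
        score += intent_weight * len(line_words & INTENT_TERMS[intent])
    return score

def rank_lines(lines, query):
    """Return the lines sorted from most to least relevant."""
    intent = ascertain_intent(query)
    return sorted(lines, key=lambda ln: relevance(ln, query, intent), reverse=True)
```

For example, `rank_lines(document_lines, "buy red bicycle")` would place a line mentioning the bicycle, its price, and shipping ahead of a line that only mentions the color red.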

    System and method for extracting entities of interest from text using n-gram models
    4.
    Granted patent
    Status: In force

    Publication No.: US07493293B2

    Publication date: 2009-02-17

    Application No.: US11421379

    Filing date: 2006-05-31

    IPC classification: G06F15/18

    CPC classification: G06F17/278

    Abstract: A document (or multiple documents) is analyzed to identify entities of interest within that document. This is accomplished by constructing n-gram or bi-gram models that correspond to different kinds of text entities, such as chemistry-related words and generic English words. The models can be constructed from training text selected to reflect a particular kind of text entity. The document is tokenized, and the tokens are run against the models to determine, for each token, which kind of text entity is most likely to be associated with that token. The entities of interest in the document can then be annotated accordingly.
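
A minimal sketch of the idea in this abstract: train one model per kind of text entity, then assign each token to the model that scores it highest. The abstract does not say whether the n-grams are over characters or words; the character-level bigrams, the add-one smoothing, the tiny training lists, and the names `CharBigramModel`, `classify`, and `annotate` are assumptions made for illustration.

```python
import math
from collections import Counter

class CharBigramModel:
    """Character bigram model with add-one smoothing (an assumed design)."""

    def __init__(self, training_words):
        self.bigrams = Counter()
        self.contexts = Counter()
        self.alphabet = set()
        for word in training_words:
            padded = "^" + word.lower() + "$"   # mark word start and end
            self.alphabet.update(padded)
            for a, b in zip(padded, padded[1:]):
                self.bigrams[(a, b)] += 1
                self.contexts[a] += 1

    def log_prob(self, token, vocab_size):
        """Smoothed log-probability of the token under this model."""
        padded = "^" + token.lower() + "$"
        return sum(
            math.log((self.bigrams[(a, b)] + 1) / (self.contexts[a] + vocab_size))
            for a, b in zip(padded, padded[1:])
        )

def classify(token, models):
    """Return the name of the model under which the token is most likely."""
    # Shared vocabulary size (plus one slot for unseen characters) keeps scores comparable.
    vocab_size = len(set().union(*(m.alphabet for m in models.values()))) + 1
    return max(models, key=lambda kind: models[kind].log_prob(token, vocab_size))

def annotate(text, models):
    """Tokenize on whitespace and label every token with its most likely kind."""
    return [(token, classify(token, models)) for token in text.split()]

# Toy training text; a real system would use far larger, curated corpora.
models = {
    "chemistry": CharBigramModel(["methyl", "ethyl", "propyl", "butyl"]),
    "english": CharBigramModel(["the", "house", "walk", "table"]),
}
```

With these toy models, `annotate("pentyl table", models)` labels the first token as chemistry-like and the second as generic English, even though "pentyl" never appeared in the training text.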

    SYSTEM AND METHOD FOR IMPROVING THE PERFORMANCE OF OPERATIONS REQUIRING PARITY READS IN A STORAGE ARRAY SYSTEM
    5.
    Patent application
    Status: Lapsed

    Publication No.: US20080155194A1

    Publication date: 2008-06-26

    Application No.: US12037480

    Filing date: 2008-02-26

    IPC classification: G06F12/00

    CPC classification: G06F11/1076 G06F2211/1054

    Abstract: A system for improving a performance of a write process in an exemplary RAID system reduces a number of IOs required for a short write in a RAID algorithm by using a replicated-parity drive. Parity is stored on the parity portion of the disk drives. A replicated-parity drive comprises all the parity information. Parity information for each parity drive is co-located or mirrored on the replicated-parity portion of the disk drives for fast access during a read portion of the read-modify-write process. Consequently, the system accesses parity data with one seek, as opposed to P seeks in a conventional disk array system utilizing P parity drives.
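
For context, the read-modify-write cycle that this abstract aims to speed up can be sketched with plain XOR parity. This is a simplification and an assumption: the patent concerns arrays with P parity drives and a replicated-parity drive, neither of which is modeled here, and the helper names `xor_blocks` and `short_write` are hypothetical.

```python
from functools import reduce

def xor_blocks(*blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

def short_write(stripe, parity, index, new_data):
    """Read-modify-write of one data block.

    Reads the old data and old parity, computes the new parity as
    old_parity XOR old_data XOR new_data, and writes the new data.
    The other data blocks in the stripe are never read.
    """
    old_data = stripe[index]                              # read old data
    new_parity = xor_blocks(parity, old_data, new_data)   # read old parity, modify
    stripe[index] = new_data                              # write new data
    return new_parity                                     # caller writes new parity

# Toy stripe of three 2-byte data blocks with a single XOR parity block.
stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(*stripe)
parity = short_write(stripe, parity, 1, b"\xaa\xbb")
```

After the short write, the returned parity equals the XOR of all current data blocks. The parity read inside `short_write` is the step the abstract targets: with P parity drives it costs P seeks, which a replicated-parity drive is said to reduce to one.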

    System, method, and service for using a focused random walk to produce samples on a topic from a collection of hyper-linked pages
    7.
    Granted patent
    Status: Lapsed

    Publication No.: US07640488B2

    Publication date: 2009-12-29

    Application No.: US11004412

    Filing date: 2004-12-04

    IPC classification: G06F17/00 G06F17/20

    CPC classification: G06F17/30864

    Abstract: A focused random walk system produces samples of on-topic pages from a collection of hyper-linked pages such as Web pages. The focused random walk system utilizes a focused random walk to produce a focused sample, which is a random sample of Web pages focused on a topic. The focused random walk system uniformly samples pages iteratively, where each iteration follows a random link from a union of the in-links and out-links of a page. The system then classifies this randomly selected link to determine whether the page is on-topic. The random walk sampling process could comprise a hard-focus method that selects only on-topic pages at each step of the focused random walk, or a soft-focus method that allows limited divergence to off-topic pages.
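
The hard-focus and soft-focus walks described in this abstract might look roughly like the sketch below. It is an assumption-laden illustration: the graph is a small in-memory dictionary, the classifier is a caller-supplied predicate, `max_off_topic` is a hypothetical knob for the "limited divergence" of the soft-focus variant, and no correction is applied to make the resulting sample uniform.

```python
import random

def build_neighbors(out_links):
    """Map every page to the union of its in-links and out-links."""
    neighbors = {page: set(targets) for page, targets in out_links.items()}
    for page, targets in out_links.items():
        for target in targets:
            neighbors.setdefault(target, set()).add(page)
    return neighbors

def focused_random_walk(out_links, is_on_topic, start, steps,
                        max_off_topic=0, seed=None):
    """Collect on-topic pages visited by a focused random walk.

    max_off_topic=0 gives the hard-focus walk (never step onto an off-topic
    page); a positive value allows that many consecutive off-topic steps.
    """
    rng = random.Random(seed)
    neighbors = build_neighbors(out_links)
    current, off_topic_run, samples = start, 0, []
    for _ in range(steps):
        candidates = sorted(neighbors.get(current, ()))
        if not candidates:
            break                                   # dead end: nothing to follow
        candidate = rng.choice(candidates)          # random in-link or out-link
        if is_on_topic(candidate):
            current, off_topic_run = candidate, 0
            samples.append(candidate)
        elif off_topic_run < max_off_topic:
            current, off_topic_run = candidate, off_topic_run + 1
        # Otherwise stay on the current page and try another link next step.
    return samples

# Toy link graph: pages "a", "b", "c" are on-topic, "x" is off-topic.
graph = {"a": ["b"], "b": ["c"], "c": ["x"], "x": ["a"]}
on_topic = {"a", "b", "c"}
hard = focused_random_walk(graph, on_topic.__contains__, "a", steps=50, seed=1)
soft = focused_random_walk(graph, on_topic.__contains__, "a", steps=50,
                           max_off_topic=2, seed=1)
```

Both walks record only on-topic pages; the soft-focus walk may additionally pass through "x" on its way between them.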

    SYSTEM AND METHOD FOR AUTOMATICALLY RANKING LINES OF TEXT
    8.
    Patent application
    Status: In force

    Publication No.: US20090292683A1

    Publication date: 2009-11-26

    Application No.: US12124086

    Filing date: 2008-05-20

    IPC classification: G06F17/30

    CPC classification: G06F17/30675

    Abstract: Disclosed are apparatus and methods for ranking lines of text. In one embodiment, an intent of a query is ascertained. A relevance of each one of a plurality of lines of text of a document is determined based upon the intent of the query, content of the query, and content of each of the plurality of lines of text. The plurality of lines of text may then be ranked according to the determined relevance of each of the plurality of lines of text.

    System and method for tolerating multiple storage device failures in a storage system with constrained parity in-degree
    9.
    Granted patent
    Status: Lapsed

    Publication No.: US07519629B2

    Publication date: 2009-04-14

    Application No.: US10956466

    Filing date: 2004-09-30

    IPC classification: G06F17/00

    CPC classification: G06F11/1076 Y10S707/99953

    Abstract: A fault-tolerant system for storage arrays has constraints on the number of data from which each redundancy value is computed. The fault-tolerant system has embodiments that are supported on small array sizes to arbitrarily large array sizes, and can tolerate a large number T of failures. Certain embodiments can tolerate many instances of more than T failures. The fault-tolerant system has efficient XOR-based encoding, recovery, and updating algorithms and has simple redundancy formulas. The fault-tolerant system has improved IO seek costs for certain multiple-element sequential host updates.
