Predicting data for document attributes based on aggregated data for repeated URL patterns
    1.
    Type: Granted invention patent
    Status: In force

    Publication No.: US08645367B1

    Publication date: 2014-02-04

    Application No.: US12719762

    Filing date: 2010-03-08

    IPC classification: G06F7/00

    CPC classification: G06F17/30864

    Abstract: One or more hierarchies of string patterns are generated from a plurality of URL strings according to a pattern extraction procedure. Repeated string patterns are selected from the generated hierarchies of string patterns. A URL class is defined for each of the selected repeated string patterns. Each URL class is associated with a respective group of URL strings in the plurality of URL strings, where the respective group of URL strings contains a repeated string pattern that defines the URL class. Respective aggregated data is calculated for each URL class. The respective aggregated data is based on respective data of each respective document of each URL string in the group of URL strings associated with the URL class. Respective data for a respective document referenced by a lookup-URL is predicted based on respective aggregated data of one or more of the URL classes.

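The flow described in this abstract (pattern hierarchy, repeated patterns, per-class aggregation, prediction for a lookup URL) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration, not the patented procedure: patterns are taken to be host/path prefixes, a pattern counts as "repeated" when at least `min_count` URLs share it, the aggregate is a plain mean, and prediction uses the most specific matching class. The function names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def prefix_patterns(url):
    """Return host/path prefixes of a URL, from least to most specific.

    Assumption: the pattern hierarchy is simply the chain of path prefixes.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    patterns = [parts.netloc]
    # The last segment names the document itself, so it is not a pattern.
    for segment in segments[:-1]:
        patterns.append(patterns[-1] + "/" + segment)
    return patterns


def build_url_classes(url_data, min_count=2):
    """Map each repeated pattern (a URL class) to the mean of its members' data."""
    groups = defaultdict(list)
    for url, value in url_data.items():
        for pattern in prefix_patterns(url):
            groups[pattern].append(value)
    return {
        pattern: sum(values) / len(values)
        for pattern, values in groups.items()
        if len(values) >= min_count  # keep only repeated patterns
    }


def predict(lookup_url, url_classes):
    """Predict data for a lookup URL from its most specific matching class."""
    for pattern in reversed(prefix_patterns(lookup_url)):
        if pattern in url_classes:
            return url_classes[pattern]
    return None  # no URL class covers this URL
```

With this sketch, a lookup URL that was never seen inherits the aggregate of the deepest directory it shares with known URLs, and falls back to shallower classes (ultimately the host) when the deeper pattern was not repeated.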

    Scalable system for determining short paths within web link network
    2.
    Type: Granted invention patent
    Status: In force

    Publication No.: US08825646B1

    Publication date: 2014-09-02

    Application No.: US12537717

    Filing date: 2009-08-07

    IPC classification: G06F7/00 G06F17/30

    Abstract: Systems and methods for finding multiple shortest paths. A directed graph representing web resources and links is divided into shards, each shard comprising a portion of the graph representing multiple web resources. Each of the shards is assigned to a server, and a distance table is calculated in parallel for each of the web resources in each shard using a nearest seed computation in the server to which the shard was assigned.

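To make the "nearest seed computation" and the per-shard distance tables concrete, here is a small single-process Python sketch. It is an illustration under stated assumptions, not the distributed system in the patent: the nearest seed is found with an ordinary multi-source Dijkstra search, and sharding is simulated by splitting the finished table with a caller-supplied `shard_of` function. Both function names are hypothetical.

```python
import heapq


def nearest_seed_table(graph, seeds):
    """Multi-source Dijkstra: map each reachable node to (distance, nearest seed).

    `graph` maps a node to a list of (neighbor, edge_length) pairs.
    """
    table = {}
    heap = [(0, seed, seed) for seed in seeds]
    heapq.heapify(heap)
    while heap:
        dist, node, seed = heapq.heappop(heap)
        if node in table:
            continue  # already settled at a distance no larger than this one
        table[node] = (dist, seed)
        for neighbor, length in graph.get(node, []):
            if neighbor not in table:
                heapq.heappush(heap, (dist + length, neighbor, seed))
    return table


def split_into_shards(table, num_shards, shard_of):
    """Partition a distance table into per-shard tables.

    Assumption: `shard_of(node)` returns an integer; in a real deployment each
    shard would live on its own server and be computed in parallel.
    """
    shards = [dict() for _ in range(num_shards)]
    for node, entry in table.items():
        shards[shard_of(node) % num_shards][node] = entry
    return shards
```

A deterministic `shard_of` (for example, a checksum of the URL) keeps the assignment of web resources to shards stable across runs.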

    System and method for electronic communication management
    3.
    Type: Granted invention patent
    Status: Lapsed

    Publication No.: US07099855B1

    Publication date: 2006-08-29

    Application No.: US09754179

    Filing date: 2001-01-03

    IPC classification: G06F17/00 G06N5/02

    CPC classification: G06Q10/00 G06N5/022 G10L15/26

    Abstract: A system and method for electronic communication management comprises a universal data model, a modeling engine, and an adaptive knowledge base. The modeling engine includes a natural language processor and a statistical modeler. A communication is translated from its native format into the universal data model. The modeling engine determines the intent of the communication using the natural language processor and the statistical modeler. A response is generated, either automatically or by an agent. An audit module analyzes each response and provides feedback to the modeling engine and the adaptive knowledge base. The modeling engine uses the feedback to update models in the adaptive knowledge base. The modeling engine supports various application specific modules.


    Distributed parallel determination of single and multiple source shortest paths in large directed graphs
    4.
    Type: Granted invention patent
    Status: In force

    Publication No.: US08631094B1

    Publication date: 2014-01-14

    Application No.: US12537681

    Filing date: 2009-08-07

    CPC classification: G06F17/10 G06F17/30958

    Abstract: Systems and methods for checkpointing a computation distributed over multiple peer servers. On each server, sequentially storing checkpoints collectively representing a current state of the computation on that server as of a most recent checkpoint, each checkpoint having a checkpoint timestamp. When restarting a first server, rebuilding a most recent state of the first server from the checkpoints written by the first server through a most recent checkpoint having a most recent checkpoint timestamp, and requesting from each of the other peer servers updates from the most recent checkpoint timestamp time of the first server. On each server, in response to a first request for updates as of a particular time, deriving the requested updates from the state data in the server uncommitted to a checkpoint and the state data in checkpoints of the server that have a timestamp no earlier than the particular time of the first request, and providing the requested updates to the first server.


    System and method for electronic communication management
    7.
    Type: Granted invention patent
    Status: In force

    Publication No.: US07644057B2

    Publication date: 2010-01-05

    Application No.: US10839829

    Filing date: 2004-05-05

    IPC classification: G06F17/00

    Abstract: A system and method for classifying text includes a pre-processor, a knowledge base, and a statistical engine. The pre-processor identifies concepts in the text and creates a structured text object that contains the concepts. The structured text object is then passed to a statistical engine, which applies statistical information provided in nodes of a knowledge base to the structured text object in order to calculate a set of match scores, each match score representing the relevance of the text to an associated one of a plurality of predefined categories. The pre-processor may be implemented in the form of an interpreter which selects and executes a script that includes language- and scenario-specific instructions for performing linguistic and semantic analysis of the text.


    System and method for increasing email productivity

    Publication No.: US09699129B1

    Publication date: 2017-07-04

    Application No.: US10610964

    Filing date: 2003-06-30

    IPC classification: G06F15/16 H04L12/58

    Abstract: A system and method for increasing email productivity based on an analysis of the content of received email messages. The system includes a content analysis engine that analyzes the content of a received email message using natural language processing techniques. A prioritization module produces a priority score and a priority level for the message using a prioritization knowledge base. A message sorting module produces a set of suggested folders for the message using a sorting knowledge base. A junkmail module produces a junkmail score for the message using a junkmail knowledge base. The prioritization knowledge base, the sorting knowledge base, and the junkmail knowledge base are updated with feedback from the user for each received email message, which allows the system to learn the user's preferences in real time.

    Producing a ranking for pages using distances in a web-link graph
    9.
    Type: Granted invention patent
    Status: In force

    Publication No.: US09165040B1

    Publication date: 2015-10-20

    Application No.: US11546755

    Filing date: 2006-10-12

    Applicant: Nissan Hajaj

    Inventor: Nissan Hajaj

    IPC classification: G06F7/00 G06F17/30

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for producing a ranking for pages on the web. In one aspect, a system receives a set of pages to be ranked, wherein the set of pages are interconnected with links. The system also receives a set of seed pages which include outgoing links to the set of pages. The system then assigns lengths to the links based on properties of the links and properties of the pages attached to the links. The system next computes shortest distances from the set of seed pages to each page in the set of pages based on the lengths of the links between the pages. Next, the system determines a ranking score for each page in the set of pages based on the computed shortest distances. The system then produces a ranking for the set of pages based on the ranking scores for the set of pages.

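The steps in this abstract (assign link lengths, compute shortest distances from seed pages, turn distances into ranking scores, sort) can be sketched in Python as follows. The abstract does not say how lengths or scores are computed, so both formulas below are assumptions chosen only for illustration: each link's length is `log(1 + out-degree)` of its source page, and a page's score is `exp(-distance)`. The function names are hypothetical.

```python
import heapq
import math


def shortest_distances(links, seeds):
    """Dijkstra from all seed pages at once.

    `links` maps a page to the list of pages it links to.
    Assumption: a link's length is log(1 + out-degree of its source page),
    so links from pages with many outgoing links count as "longer".
    """
    dist = {seed: 0.0 for seed in seeds}
    heap = [(0.0, seed) for seed in seeds]
    heapq.heapify(heap)
    while heap:
        d, page = heapq.heappop(heap)
        if d > dist.get(page, math.inf):
            continue  # stale heap entry
        targets = links.get(page, [])
        length = math.log(1 + len(targets))
        for target in targets:
            candidate = d + length
            if candidate < dist.get(target, math.inf):
                dist[target] = candidate
                heapq.heappush(heap, (candidate, target))
    return dist


def rank_pages(pages, links, seeds):
    """Order pages by a score that decays with distance from the seeds.

    Assumption: score = exp(-distance); unreachable pages score 0.
    """
    dist = shortest_distances(links, seeds)
    scores = {page: math.exp(-dist.get(page, math.inf)) for page in pages}
    return sorted(pages, key=lambda page: scores[page], reverse=True)
```

Any monotonically decreasing function of distance would produce the same ordering; the exponential is used here only so that unreachable pages get a well-defined score of zero.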

    System and method for classifying text
    10.
    Type: Granted invention patent
    Status: Lapsed

    Publication No.: US07752159B2

    Publication date: 2010-07-06

    Application No.: US11843909

    Filing date: 2007-08-23

    IPC classification: G06F19/00

    Abstract: A system and method for classifying text includes a pre-processor, a knowledge base, and a statistical engine. The pre-processor identifies concepts in the text and creates a structured text object that contains the concepts. The structured text object is then passed to a statistical engine, which applies statistical information provided in nodes of a knowledge base to the structured text object in order to calculate a set of match scores, each match score representing the relevance of the text to an associated one of a plurality of predefined categories. The pre-processor may be implemented in the form of an interpreter which selects and executes a script that includes language- and scenario-specific instructions for performing linguistic and semantic analysis of the text.
