METHOD AND SYSTEM FOR WEB EXTRACTION
    1.
    Type: Patent application
    Status: Under examination (published)

    Publication number: US20120005207A1

    Publication date: 2012-01-05

    Application number: US12828305

    Filing date: 2010-07-01

    CPC classification number: G06F16/9535

    Abstract: A method includes generating a plurality of sets of pairs of records from a set of records, for each attribute-position pair in the set of records. Each attribute-position pair is indicative of a position of an attribute in a record. Further, the method includes forming, electronically, a plurality of groups, each group comprising two attribute-position pairs having different attributes. The method also includes determining, electronically for each group, the number of pairs of records that are common to the two attribute-position pairs of that group. Furthermore, the method includes extracting results based on a first group of the plurality of groups if the number of pairs of records common to the two attribute-position pairs of the first group is greater than a second threshold, is the highest among the plurality of groups, and no group having three or more attribute-position pairs with different attributes is possible.

    METHOD AND SYSTEM FOR DETERMINING SIMILARITY SCORE
    2.
    Type: Patent application
    Status: In force

    Publication number: US20110225173A1

    Publication date: 2011-09-15

    Application number: US12721577

    Filing date: 2010-03-11

    CPC classification number: G06K9/3266 G06K9/723 G06K2209/01

    Abstract: A method includes generating, electronically, one or more matching patterns for one or more pairs of attribute values. Each pair includes two attribute values: a first attribute value from a first record and a second attribute value from a second record. The first attribute value and the second attribute value satisfy a first criterion. Further, the method includes identifying, electronically, a matching segment between the first attribute value and the second attribute value of a first pair. The method also includes repeating the identifying for each pair. Moreover, the method includes computing a similarity score for the first pair using one of the first pair and the matching segment, based on the one or more matching patterns and the matching segments of the one or more pairs satisfying a second criterion. The method also includes repeating the computing for each pair.

    Method of aggregate statistic computation
    3.
    Type: Granted patent
    Status: In force

    Publication number: US07738404B2

    Publication date: 2010-06-15

    Application number: US11656465

    Filing date: 2007-01-23

    Abstract: A method of grouping nodes within a distributed network is provided. The example method includes performing a leader node self determination operation by which each node within the distributed network determines whether to become a leader node or a non-leader node, each leader node being the leader of a group including at least one node. Next, requests are sent, from each leader node, requesting at least one non-leader node to join the group associated with the leader node. First received requests are accepted, at each non-leader node, such that accepting non-leader nodes transition from a non-leader node to a dependent node dependent upon the requesting leader node. A next set of requests are sent, from each remaining non-leader node, requesting to join the group associated with at least one leader node. A determination is made, at each requested leader node, as to whether to accept the non-leader node into the group associated with the requested leader node. Based on the determination, at each requested leader node, the non-leader node is either accepted into the group associated with the requested leader node, or is alternatively rejected from the group.
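The grouping rounds described in this abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the patented procedure: the self-election probability `p_leader`, the number of invitations per leader, the `max_group_size` acceptance rule, and the fallback when no node elects itself are all hypothetical choices that the abstract does not specify.

```python
import random


def group_nodes(node_ids, p_leader=0.3, invites_per_leader=2,
                max_group_size=4, seed=0):
    """Simulate the leader/non-leader grouping rounds (illustrative only)."""
    rng = random.Random(seed)

    # Round 1: each node decides on its own whether to become a leader.
    leaders = [n for n in node_ids if rng.random() < p_leader]
    if not leaders:
        leaders = [node_ids[0]]  # hypothetical fallback so a leader exists
    groups = {leader: [leader] for leader in leaders}
    non_leaders = [n for n in node_ids if n not in groups]
    leader_of = {}

    # Round 2: each leader invites some non-leaders; a non-leader accepts
    # the first invitation it receives and becomes a dependent node.
    for leader in leaders:
        invited = rng.sample(non_leaders,
                             min(invites_per_leader, len(non_leaders)))
        for n in invited:
            if n not in leader_of:
                leader_of[n] = leader
                groups[leader].append(n)

    # Round 3: remaining non-leaders ask a leader to join; the leader
    # accepts or rejects (here: based on an assumed size limit).
    rejected = []
    for n in non_leaders:
        if n in leader_of:
            continue
        target = rng.choice(leaders)
        if len(groups[target]) < max_group_size:
            leader_of[n] = target
            groups[target].append(n)
        else:
            rejected.append(n)
    return groups, rejected


if __name__ == "__main__":
    groups, rejected = group_nodes(list(range(20)))
    print(groups, rejected)
```

Every node ends up either in exactly one group or in the rejected list, which is the invariant the rounds are meant to preserve.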

    Streaming algorithms for robust, real-time detection of DDoS attacks
    4.
    Type: Granted patent
    Status: In force

    Publication number: US07669241B2

    Publication date: 2010-02-23

    Application number: US10954901

    Filing date: 2004-09-30

    CPC classification number: H04L29/06027 H04L63/1458 H04L65/607

    Abstract: A distinct-count estimate is obtained in a guaranteed small footprint using a two-level hash, distinct-count sketch. A first hash fills the first-level hash buckets with an exponentially decreasing number of data-elements. These are then uniformly hashed to an array of second-level hash tables, each of which has an associated total-element counter and bit-location counters. These counters are used to identify singletons and so provide a distinct-sample and a distinct-count. An estimate of the total distinct-count is obtained by dividing the distinct-count by the probability of mapping a data-element to that bucket. An estimate of the total distinct-source frequencies of destination addresses can be found in a similar fashion. By further associating the distinct-count sketch with a list of singletons, a total singleton count, and a heap containing the destination addresses ordered by their distinct-source frequencies, a tracking distinct-count sketch may be formed that has considerably improved query time.
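The singleton test based on a total-element counter and bit-location counters can be sketched as follows. This is a minimal illustration, not the patented sketch: the class name `CountSignature`, the 32-bit element width, and the assumption that the first-level hash maps an element to bucket `level` with probability 2^-(level+1) are all hypothetical details chosen for the example.

```python
class CountSignature:
    """Total counter plus one counter per bit position of the elements."""

    def __init__(self, bits=32):
        self.total = 0
        self.bit_counts = [0] * bits

    def update(self, value, delta=1):
        """Insert (delta=+1) or delete (delta=-1) a non-negative integer
        that fits in the configured number of bits."""
        self.total += delta
        for i in range(len(self.bit_counts)):
            if (value >> i) & 1:
                self.bit_counts[i] += delta

    def singleton(self):
        """Return the one distinct value held, or None if the bucket is
        empty or holds two or more distinct values."""
        if self.total <= 0:
            return None
        value = 0
        for i, count in enumerate(self.bit_counts):
            if count == self.total:
                value |= 1 << i      # every element has this bit set
            elif count != 0:
                return None          # elements disagree on this bit
        return value


def estimate_total(distinct_in_bucket, level):
    """Scale a bucket's distinct count by the inverse of the (assumed)
    probability 2**-(level + 1) of landing in that first-level bucket."""
    probability = 2.0 ** -(level + 1)
    return distinct_in_bucket / probability
```

Because two distinct values always differ in at least one bit, a bucket holding more than one distinct value has some bit counter strictly between zero and the total, which is what the check detects without storing the elements themselves.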

    Tracking set-expression cardinalities over continuous update streams
    5.
    Type: Granted patent
    Status: In force

    Publication number: US07596544B2

    Publication date: 2009-09-29

    Application number: US11025355

    Filing date: 2004-12-29

    CPC classification number: G06F17/30469 Y10S707/99932

    Abstract: A method of estimating set-expression cardinalities over data streams with guaranteed small maintenance time per data-element update. The method only examines each data element once and uses a limited amount of memory. The time-efficient stream synopsis extends 2-level hash-sketches by randomly, but uniformly, pre-hashing data-elements prior to logarithmically hashing them to a first-level hash-table. This generates a set of independent 2-level hash-sketches. The set-union cardinality can be estimated by determining the smallest hash-bucket index j at which only a predetermined fraction of the b hash-buckets has a non-empty union |A∪B|. Once a set-union cardinality is estimated, general set-expression cardinalities may be estimated by counting witness elements for the set-expression, i.e., those first-level hash-buckets that are both a singleton for the set-expression and a set-union singleton. The set-expression cardinality is the set-union cardinality times the number of witness elements divided by the number of hash-buckets.
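The final estimation step in this abstract (union cardinality times the fraction of buckets that are witnesses) can be written out as a short sketch. The bucket representation used here, a pair of explicit Python sets per bucket, is an assumption made purely for readability; a real stream synopsis would keep compact counters rather than the elements, and the function names are hypothetical.

```python
def count_witnesses(buckets, expr):
    """Count buckets that are a singleton for the union and also a
    singleton for the set expression.

    buckets: list of (a_items, b_items) pairs of sets, one per hash bucket.
    expr:    function (a_items, b_items) -> set, e.g. intersection.
    """
    witnesses = 0
    for a_items, b_items in buckets:
        union = a_items | b_items
        if len(union) == 1 and len(expr(a_items, b_items)) == 1:
            witnesses += 1
    return witnesses


def estimate_expression_cardinality(union_estimate, witnesses, num_buckets):
    """Union cardinality times witnesses divided by number of buckets,
    as stated in the abstract."""
    return union_estimate * witnesses / num_buckets


if __name__ == "__main__":
    buckets = [({1}, {1}), ({2}, set()), (set(), {3}), ({4, 5}, {4})]
    w = count_witnesses(buckets, lambda a, b: a & b)
    print(estimate_expression_cardinality(100, w, len(buckets)))
```

In this toy input only the first bucket is a witness for the intersection, so a union estimate of 100 over 4 buckets yields an estimate of 25.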

    Method and Apparatus for Efficient Aggregate Computation over Data Streams
    6.
    Type: Patent application
    Status: In force

    Publication number: US20090006346A1

    Publication date: 2009-01-01

    Application number: US11770926

    Filing date: 2007-06-29

    Abstract: Improved techniques are disclosed for processing data stream queries wherein a data stream is obtained, a set of aggregate queries to be executed on the data stream is obtained, and a query plan for executing the set of aggregate queries on the data stream is generated. In a first method, the generated query plan includes generating at least one intermediate aggregate query, wherein the intermediate aggregate query combines a subset of aggregate queries from the set of aggregate queries so as to pre-aggregate data from the data stream prior to execution of the subset of aggregate queries such that the generated query plan is optimized for computational expense based on a given cost model. In a second method, the generated query plan includes identifying similar filters in two or more aggregate queries of the set of aggregate queries and combining the similar filters into a single filter such that the single filter is usable to pre-filter data input to the two or more aggregate queries.
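The idea of an intermediate aggregate query in the first method can be illustrated with count queries that group by different attributes. The sketch below is an assumption-laden example, not the disclosed planner: it hard-codes one intermediate aggregate instead of choosing it from a cost model, and the record fields `src` and `dst` and both function names are hypothetical.

```python
from collections import Counter


def pre_aggregate(stream, keys):
    """Intermediate aggregate: count records grouped by all of `keys`."""
    counts = Counter()
    for record in stream:
        counts[tuple(record[k] for k in keys)] += 1
    return counts


def roll_up(intermediate, intermediate_keys, query_keys):
    """Answer a coarser group-by count from the intermediate aggregate
    instead of re-reading the stream."""
    positions = [intermediate_keys.index(k) for k in query_keys]
    result = Counter()
    for key, count in intermediate.items():
        result[tuple(key[i] for i in positions)] += count
    return result


if __name__ == "__main__":
    stream = [
        {"src": "a", "dst": "x"},
        {"src": "a", "dst": "y"},
        {"src": "b", "dst": "x"},
        {"src": "a", "dst": "x"},
    ]
    shared = pre_aggregate(stream, ["src", "dst"])
    print(roll_up(shared, ["src", "dst"], ["src"]))
    print(roll_up(shared, ["src", "dst"], ["dst"]))
```

Both per-source and per-destination counts are derived from the single `(src, dst)` pass, so the stream is scanned once instead of once per query. Whether such sharing pays off in practice depends on the cost model the abstract mentions.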

    Method of aggregate statistic computation
    7.
    Type: Patent application
    Status: In force

    Publication number: US20080175169A1

    Publication date: 2008-07-24

    Application number: US11656465

    Filing date: 2007-01-23

    Abstract: A method of grouping nodes within a distributed network is provided. The example method includes performing a leader node self determination operation by which each node within the distributed network determines whether to become a leader node or a non-leader node, each leader node being the leader of a group including at least one node. Next, requests are sent, from each leader node, requesting at least one non-leader node to join the group associated with the leader node. First received requests are accepted, at each non-leader node, such that accepting non-leader nodes transition from a non-leader node to a dependent node dependent upon the requesting leader node. A next set of requests are sent, from each remaining non-leader node, requesting to join the group associated with at least one leader node. A determination is made, at each requested leader node, as to whether to accept the non-leader node into the group associated with the requested leader node. Based on the determination, at each requested leader node, the non-leader node is either accepted into the group associated with the requested leader node, or is alternatively rejected from the group.

    System and method for optimally configuring border gateway selection for transit traffic flows in a computer network
    8.
    Type: Granted patent
    Status: In force

    Publication number: US07197040B2

    Publication date: 2007-03-27

    Application number: US10186761

    Filing date: 2002-07-01

    CPC classification number: H04L47/10 H04L45/04 H04L45/30 H04L45/38 H04L47/125

    Abstract: A system for, and method of, configuring border gateway selection for transit traffic flows in a computer network. In one embodiment, the system includes: (1) a border gateway modeler that builds a model of cooperating border gateways, the model including capacities of the border gateways and (2) a traffic flow optimizer, associated with the border gateway modeler, that initially assigns traffic to the border gateways in accordance with a generalized assignment problem and subsequently reassigns the traffic to the border gateways based on cost until the capacities are respected.

    Document descriptor extraction method
    9.
    Type: Granted patent
    Status: In force

    Publication number: US07080314B1

    Publication date: 2006-07-18

    Application number: US09595719

    Filing date: 2000-06-16

    CPC classification number: G06F17/2247

    Abstract: The present invention discloses a document descriptor extraction method and system. The document descriptor extraction method and system creates a document descriptor by generalizing input sequences within a document; factoring the input sequences and generalized input sequences; and selecting a document descriptor from the input sequences, generalized sequences, and factored sequences, preferably using minimum descriptor length (MDL) principles. Novel algorithms are employed to perform the generalizing, factoring, and selecting.

    Method for identifying outliers in large data sets
    10.
    Type: Granted patent
    Status: Lapsed

    Publication number: US06643629B2

    Publication date: 2003-11-04

    Application number: US09442912

    Filing date: 1999-11-18

    CPC classification number: G06F17/3061 G06F2216/03 Y10S706/925

    Abstract: A new method for identifying a predetermined number of data points of interest in a large data set. The data points of interest are ranked in relation to the distance to their neighboring points. The method employs partition-based detection algorithms to partition the data points and then compute upper and lower bounds for each partition. These bounds are then used to eliminate those partitions that cannot contain any of the predetermined number of data points of interest. The data points of interest are then computed from the remaining partitions that were not eliminated. The present method eliminates a significant number of data points from consideration as the points of interest, thereby resulting in substantial savings in computational expense compared to conventional methods employed to identify such points.
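For orientation, the ranking criterion alone can be sketched with a brute-force baseline. This is not the partition-based method of the abstract: it omits the partitioning and bound-based pruning entirely, and it assumes that "distance to neighboring points" means the Euclidean distance to the k-th nearest neighbor, which the abstract does not state. The function name and parameters are hypothetical.

```python
import math


def top_outliers(points, k, n):
    """Return the n points whose k-th nearest neighbor is farthest away.

    Brute force, O(N^2 log N): every point is compared with every other
    point, which is the cost a pruning scheme would aim to avoid.
    Requires len(points) > k.
    """
    scored = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q)
                       for j, q in enumerate(points) if j != i)
        scored.append((dists[k - 1], p))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:n]]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(top_outliers(pts, k=2, n=1))
```

A pruning approach of the kind the abstract describes would compute per-partition bounds on this same score and skip partitions whose upper bound is too small to reach the top n.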
