Method and device for deduplicating web page

    Publication No.: US10346257B2

    Publication date: 2019-07-09

    Application No.: US14581464

    Application date: 2014-12-23

    Abstract: A method and a device are described for de-duplicating a web page. The method includes: extracting at least one core sentence from a target web page; mapping each core sentence to a unique numeric value to form a first numeric value set; determining the intersection of the first numeric value set with each second numeric value set, counting the numeric values in each intersection, and determining the maximum of those counts; and, when the ratio of that maximum to the total number of numeric values in the first numeric value set is greater than a set threshold, processing the target web page as a duplicate web page. In embodiments of the present invention, web page de-duplication can be more accurate and more robust to noise, and the scale of computation can be reduced.
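The steps in the abstract can be sketched in Python as follows. This is only an illustrative sketch, not the patented implementation: the sentence-splitting rule, the minimum sentence length, the MD5-based mapping (which is collision-resistant in practice but not strictly unique), the 0.8 threshold, and all function names are assumptions. It also assumes that each "second numeric value set" belongs to a page that has already been indexed.

```python
import hashlib
import re


def core_sentences(text, min_len=20):
    """Assumed extraction rule: split on sentence punctuation, keep long sentences."""
    parts = re.split(r"[.!?。!?]\s*", text)
    return [p.strip() for p in parts if len(p.strip()) >= min_len]


def fingerprint(sentence):
    """Map a sentence to a 64-bit integer taken from its MD5 digest (assumed mapping)."""
    digest = hashlib.md5(sentence.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def value_set(text):
    """Build the numeric value set for one page."""
    return {fingerprint(s) for s in core_sentences(text)}


def is_duplicate(target_text, indexed_sets, threshold=0.8):
    """Return True when the best overlap ratio exceeds the threshold."""
    first = value_set(target_text)
    if not first:
        return False  # no core sentence was extracted, so nothing can be compared
    # Size of the largest intersection with any already-indexed set.
    max_overlap = max((len(first & second) for second in indexed_sets), default=0)
    return max_overlap / len(first) > threshold
```

Because short fragments such as menu labels fall below `min_len` and never enter the set, a copy of a page with different navigation text still produces the same numeric values, which is one way to read the anti-noise claim.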

    5. METHOD AND DEVICE FOR DEDUPLICATING WEB PAGE
    Type: Patent application
    Status: Under examination (published)

    Publication No.: US20150142760A1

    Publication date: 2015-05-21

    Application No.: US14581464

    Application date: 2014-12-23

    CPC classification number: G06F11/1453 G06F16/958

    Abstract: A method and a device are described for de-duplicating a web page. The method includes: extracting at least one core sentence from a target web page; mapping each core sentence to a unique numeric value to form a first numeric value set; determining the intersection of the first numeric value set with each second numeric value set, counting the numeric values in each intersection, and determining the maximum of those counts; and, when the ratio of that maximum to the total number of numeric values in the first numeric value set is greater than a set threshold, processing the target web page as a duplicate web page. In embodiments of the present invention, web page de-duplication can be more accurate and more robust to noise, and the scale of computation can be reduced.

    6. Network Handover Method, Terminal, Controller, Gateway, and System
    Type: Patent application
    Status: In force

    Publication No.: US20150023321A1

    Publication date: 2015-01-22

    Application No.: US14487583

    Application date: 2014-09-16

    Abstract: A network handover method, a terminal, a controller, a gateway, and a system are described. The method includes: when a terminal accesses a first network, converting data including an initial TCP four-tuple into data including a first TCP four-tuple, and sending the data including the first TCP four-tuple to a gateway; and, after handing over from the first network to a second network, converting data including the initial TCP four-tuple into data including a second TCP four-tuple, and sending the data including the second TCP four-tuple to the gateway. This keeps the connection from being interrupted by the network handover, improves user experience, and reduces the complexity of the network handover process.
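The terminal-side conversion described above can be pictured with a small Python sketch. It is only a sketch under stated assumptions: the abstract does not say how the first and second four-tuples are obtained or how packets are represented, so `FourTuple`, `Packet`, `TerminalTranslator`, and its `attach` and `outbound` methods are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, replace
from typing import NamedTuple


class FourTuple(NamedTuple):
    """Source/destination address and port pair identifying a TCP connection."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


@dataclass(frozen=True)
class Packet:
    four_tuple: FourTuple
    payload: bytes


class TerminalTranslator:
    """Rewrites the stable initial four-tuple into the one used on the current network."""

    def __init__(self):
        # Assumed state: initial four-tuple -> four-tuple for the network in use.
        self._active = {}

    def attach(self, initial, current):
        """Record the four-tuple to use after attaching to, or handing over to, a network."""
        self._active[initial] = current

    def outbound(self, packet):
        """Return a copy of the packet carrying the network-specific four-tuple."""
        return replace(packet, four_tuple=self._active[packet.four_tuple])
```

In this sketch the application keeps sending packets that carry the initial four-tuple. Calling `attach(initial, first)` on the first network and `attach(initial, second)` after the handover changes only what the gateway sees, so the application's view of the connection stays the same.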
