Extraction of Content from a Web Page
    1.
    Type: Patent application
    Status: Pending (published)

    Publication No.: US20130283148A1

    Publication date: 2013-10-24

    Application No.: US13817656

    Filing date: 2010-10-26

    IPC classification: G06F17/22

    CPC classification: G06F17/2247 G06F16/986

    Abstract: A system and method are provided for extracting main content from a web page. Web page segmentation is performed on a web page to provide affinity-grouped segments. Descriptive features of at least one of the affinity-grouped segments are computed. At least one of the affinity-grouped segments is classified as a main body segment based on the computed descriptive features. Additional affinity-grouped segments are classified as to a document function based on the computed descriptive features. Classified affinity-grouped segments are assembled according to their classified document functions to provide the main content.
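
The pipeline in this abstract (compute descriptive features per segment, classify each segment by document function, assemble the classified segments) can be illustrated with a small sketch. It is not the patented method: the `Segment` fields, the three features, the thresholds, and the role names `navigation`, `title`, `main_body`, and `other` are all assumptions made for illustration, and segmentation is assumed to have happened already.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical affinity-grouped segment from a prior segmentation step."""
    text: str
    link_words: int   # number of words that sit inside hyperlinks
    font_size: int


def features(seg: Segment) -> dict:
    """Compute assumed descriptive features for one segment."""
    words = len(seg.text.split())
    return {
        "words": words,
        "link_density": seg.link_words / words if words else 0.0,
        "font_size": seg.font_size,
    }


def classify(seg: Segment) -> str:
    """Assign an assumed document function using hand-picked thresholds."""
    f = features(seg)
    if f["link_density"] > 0.5:
        return "navigation"
    if f["font_size"] >= 20 and f["words"] <= 15:
        return "title"
    if f["words"] >= 8:
        return "main_body"
    return "other"


def assemble(segments) -> str:
    """Emit title segments first, then main-body segments; drop the rest."""
    labelled = [(classify(s), s.text) for s in segments]
    ordered = [
        text
        for role in ("title", "main_body")
        for label, text in labelled
        if label == role
    ]
    return "\n".join(ordered)


page = [
    Segment("Home News Sports Weather", link_words=4, font_size=12),
    Segment("Local team wins the final", link_words=0, font_size=24),
    Segment("The match ended late on Saturday after two periods of extra time.",
            link_words=0, font_size=14),
    Segment("Share this", link_words=2, font_size=10),
]
main_content = assemble(page)
print(main_content)
```

A real classifier would more plausibly be trained on labelled pages than built from fixed thresholds; the rule-based version is used here only to keep the example self-contained.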

    Segmenting a Web Page into Coherent Functional Blocks
    2.
    Type: Patent application
    Status: Pending (published)

    Publication No.: US20130275854A1

    Publication date: 2013-10-17

    Application No.: US13635410

    Filing date: 2010-04-19

    IPC classification: G06F17/22

    CPC classification: G06F17/2247 G06F17/2705

    Abstract: Segmenting a web page (110) into coherent function blocks (705-1 to 705-8) includes parsing content from the web page (110) into multiple coherent, collectively exhaustive nodes (405-1 to 405-37); calculating at least one matrix (500, 600, 605-1 to 605-4) of affinity values between each of the nodes (405-1 to 405-37); and clustering the nodes (405-1 to 405-37) into functional blocks (705-1 to 705-8) based on the affinity values in the at least one matrix (500, 600, 605-1 to 605-4).
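
The three steps named in this abstract (parse into nodes, compute a matrix of pairwise affinity values, cluster nodes into blocks) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the method claimed in the application: the `Node` fields, the affinity formula (vertical proximity scaled by a font-size match), the 0.5 threshold, and the choice of single-link clustering with union-find are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Node:
    """Hypothetical leaf node parsed from a page (fields are assumptions)."""
    text: str
    top: float       # vertical position in pixels
    font_size: int


def affinity(a: Node, b: Node) -> float:
    """Assumed affinity in (0, 1]: nearby, similarly styled nodes score high."""
    proximity = 1.0 / (1.0 + abs(a.top - b.top) / 100.0)
    style = 1.0 if a.font_size == b.font_size else 0.8
    return proximity * style


def affinity_matrix(nodes):
    """Build the symmetric matrix of pairwise affinity values."""
    n = len(nodes)
    matrix = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = affinity(nodes[i], nodes[j])
    return matrix


def cluster(nodes, threshold=0.5):
    """Single-link clustering: join nodes whose affinity meets the threshold."""
    matrix = affinity_matrix(nodes)
    parent = list(range(len(nodes)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(nodes)), 2):
        if matrix[i][j] >= threshold:
            parent[find(j)] = find(i)

    blocks = {}
    for i, node in enumerate(nodes):
        blocks.setdefault(find(i), []).append(node.text)
    return list(blocks.values())


nodes = [
    Node("Home", 0, 12),
    Node("About", 10, 12),
    Node("Headline", 300, 24),
    Node("Body text", 340, 14),
    Node("Copyright", 900, 10),
]
blocks = cluster(nodes)
print(blocks)
```

The abstract allows for more than one affinity matrix; combining several (for example, one per feature) before clustering would be a natural extension of this single-matrix sketch.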

    Obtaining Rendering Co-ordinates Of Visible Text Elements
    4.
    Type: Patent application
    Status: Pending (published)

    Publication No.: US20130159889A1

    Publication date: 2013-06-20

    Application No.: US13808856

    Filing date: 2010-07-07

    IPC classification: G06F3/0481

    Abstract: A computer-implemented method for obtaining the rendering co-ordinates of visible text elements on a web page is disclosed. The web page is represented by an input data structure comprising a plurality of text nodes, each of which represents a text element on the web page. The method comprises the following steps: a) using a computer device, wrapping each of the plurality of text nodes in a pair of mark-up language tags; b) using said computer device, obtaining the co-ordinates of a bounding rectangle for each text node using the mark-up language tags; c) using said computer device, attaching an attribute specifying the co-ordinates of the bounding rectangle to each text node; and d) using said computer device, determining whether each text node is invisible, and if it is, excluding it from an output data structure comprising the plurality of text nodes and attached attributes.
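
Steps a) to d) of this abstract can be mimicked outside a browser with a small sketch. Everything concrete here is an assumption: the `TextNode` class, the use of `<span>` as the wrapping tag, the `measure` callback that stands in for the layout engine, and the rule that a zero-area rectangle means "invisible". In a browser, step b) would typically rely on a DOM call such as `getBoundingClientRect()`, though the abstract does not name one.

```python
from dataclasses import dataclass, field
from html import escape
from typing import Callable, Dict, List, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, width, height)


@dataclass
class TextNode:
    """Hypothetical text node from the input data structure."""
    text: str
    attrs: Dict[str, str] = field(default_factory=dict)


def annotate(nodes: List[TextNode], measure: Callable[[str], Rect]) -> List[TextNode]:
    """Wrap, measure, and tag each node; return only the visible ones."""
    output = []
    for index, node in enumerate(nodes):
        wrapper_id = f"t{index}"
        # a) wrap the text node in a pair of mark-up tags
        node.attrs["wrapper"] = f'<span id="{wrapper_id}">{escape(node.text)}</span>'
        # b) obtain the bounding rectangle via the wrapper
        left, top, width, height = measure(wrapper_id)
        # c) attach the co-ordinates as an attribute
        node.attrs["rect"] = f"{left},{top},{width},{height}"
        # d) exclude nodes judged invisible (assumed: zero-area rectangle)
        if width > 0 and height > 0:
            output.append(node)
    return output


# Stand-in for a layout engine: wrapper id -> rendered rectangle.
fake_layout = {
    "t0": (0, 0, 120, 16),
    "t1": (0, 0, 0, 0),      # not rendered
    "t2": (0, 20, 300, 16),
}

nodes = [TextNode("Hello"), TextNode("hidden"), TextNode("World & more")]
visible = annotate(nodes, fake_layout.__getitem__)
print([n.text for n in visible])
```

Passing the measurement function in as a parameter keeps the sketch runnable without a browser; a real implementation would query the rendered page instead of a dictionary.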

    Selecting web page content based on user permission for collecting user-selected content
    5.
    Type: Granted patent
    Status: In force

    Publication No.: US09448695B2

    Publication date: 2016-09-20

    Application No.: US13817725

    Filing date: 2010-12-14

    IPC classification: G06F3/0484 G06Q30/02

    CPC classification: G06F3/0484 G06Q30/0269

    Abstract: A method, system, and computer program product for selecting web page content based on user permission for collecting user-selected content within web pages (FIG. 4, 400) may comprise accessing web page data associated with a currently viewed web page (FIG. 4, 400), the web page data comprising a popular selection of content on the currently viewed web page (FIG. 4, 408) (505), with an electronic client device, presenting the popular selection of content of the currently viewed web page (FIG. 4, 400) to a user (535), and prompting the user to agree to the use of the user's selected content within a number of web pages in exchange for use of the popular selection of content on the web page (FIG. 4, 400). The web page content is selected, based on the user's response.

    Method and system for providing print content to a client
    6.
    Type: Granted patent
    Status: In force

    Publication No.: US09152357B2

    Publication date: 2015-10-06

    Application No.: US13032824

    Filing date: 2011-02-23

    IPC classification: G06F3/12

    Abstract: A request for print content is received at a network server system. The request includes variable user input. Webpage content is obtained based at least in part on the variable user input. A subset of the webpage content is identified as print content. A print-ready layout of the print content is formed and the print content in the print-ready layout is provided, via network connection, to a client in response to the request.

    Creating Applications for Popular Web Page Content
    7.
    Type: Patent application
    Status: Pending (published)

    Publication No.: US20130275859A1

    Publication date: 2013-10-17

    Application No.: US13817731

    Filing date: 2010-12-14

    IPC classification: G06F17/22

    CPC classification: G06F17/2247 G06Q30/02

    Abstract: A method of creating an application for the popular selection of content on a web page (FIG. 4, 400) may comprise collecting web page data associated with a web page (FIG. 4, 400), the web page data comprising a selection of content on the web page (FIG. 4, 400) (Block 505), with a processor, determining among the selection of content of the web page, which content is popular (Block 510), and creating an application based on the popular selection of content of the web page (Block 515).

    Selecting Web Page Content Based on User Permission for Collecting User-Selected Content
    8.
    Type: Patent application
    Status: In force

    Publication No.: US20130275889A1

    Publication date: 2013-10-17

    Application No.: US13817725

    Filing date: 2010-12-14

    IPC classification: G06F3/0484

    CPC classification: G06F3/0484 G06Q30/0269

    Abstract: A method, system, and computer program product for selecting web page content based on user permission for collecting user-selected content within web pages (FIG. 4, 400) may comprise accessing web page data associated with a currently viewed web page (FIG. 4, 400), the web page data comprising a popular selection of content on the currently viewed web page (FIG. 4, 408) (505), with an electronic client device, presenting the popular selection of content of the currently viewed web page (FIG. 4, 400) to a user (535), and prompting the user to agree to the use of the user's selected content within a number of web pages in exchange for use of the popular selection of content on the web page (FIG. 4, 400). The web page content is selected, based on the user's response.

    METHOD AND SYSTEM FOR PROVIDING PRINT CONTENT TO A CLIENT
    9.
    Type: Patent application
    Status: In force

    Publication No.: US20120212772A1

    Publication date: 2012-08-23

    Application No.: US13032824

    Filing date: 2011-02-23

    IPC classification: G06F3/12

    Abstract: A request for print content is received at a network server system. The request includes variable user input. Webpage content is obtained based at least in part on the variable user input. A subset of the webpage content is identified as print content. A print-ready layout of the print content is formed and the print content in the print-ready layout is provided, via network connection, to a client in response to the request.

    Transformation of a Document into Interactive Media Content
    10.
    Type: Patent application
    Status: Pending (published)

    Publication No.: US20130205202A1

    Publication date: 2013-08-08

    Application No.: US13817643

    Filing date: 2011-07-31

    IPC classification: G06F17/22

    CPC classification: G06F17/2264 G06F17/218

    Abstract: Systems and methods are provided for transforming a document into interactive media content. A system can include a memory for storing computer executable instructions and a processing unit for accessing the memory and executing the computer executable instructions. The computer executable instructions can include an engine to generate a dynamic composition of the text blocks and visual blocks of the document, based on semantic features of the text blocks and the visual blocks, to provide the interactive media content.
