Method and System for Form-Filling Crawl and Associating Rich Keywords
    1.
    Patent application
    Status: Active

    Publication No.: US20110087646A1

    Publication date: 2011-04-14

    Application No.: US12576011

    Filing date: 2009-10-08

    IPC classification: G06F7/10 G06F17/30

    CPC classification: G06F17/30864

    Abstract: Techniques are provided for the efficient location, processing, and retrieval of local product information derived from web pages generally locatable through form queries submitted to web pages often referred to as the “deep” or “hidden” web. In an embodiment, information such as product information and dealer-location information is located on a web page form such as a dealer-locator form. After location of a suitable web page form, editorial wrapping is performed to create an automated information extraction process. Using the automated information extractor, deep-web crawling is performed. A grid-based extraction of individual business records is performed, and matching and ingestion are performed in conjunction with a business listing database. Finally, metadata tags are added to entries in the business listing database. Metadata tags also may be added to entries in other databases.
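
The form-filling step of the crawl described in this abstract can be illustrated with a minimal sketch. It only generates the query URLs that a crawler would submit to a dealer-locator form, one per candidate input value, and performs no network access. The endpoint URL, the field names `zip` and `radius`, and the helper `build_form_queries` are assumptions made for illustration; the abstract does not specify how form inputs are enumerated.

```python
from urllib.parse import urlencode


def build_form_queries(action_url, field_name, values, fixed_fields=None):
    """Yield one GET URL per candidate value of a single form field.

    Hypothetical helper: a real dealer-locator form may use POST,
    several varying fields, or session tokens.
    """
    fixed = dict(fixed_fields or {})
    for value in values:
        params = dict(fixed)
        params[field_name] = value
        # Sort the parameters so the generated URLs are deterministic.
        yield f"{action_url}?{urlencode(sorted(params.items()))}"


queries = list(
    build_form_queries(
        "https://example.com/dealer-locator",  # assumed form endpoint
        "zip",                                 # assumed name of the varying field
        ["94089", "10001"],                    # candidate ZIP codes to submit
        {"radius": "25"},                      # assumed fixed field
    )
)
for url in queries:
    print(url)
```

Each generated URL corresponds to one form submission; the result pages would then be passed to the extraction step.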

    Method and system for form-filling crawl and associating rich keywords
    2.
    Granted patent
    Status: Active

    Publication No.: US08793239B2

    Publication date: 2014-07-29

    Application No.: US12576011

    Filing date: 2009-10-08

    IPC classification: G06F17/30 G06F7/00

    CPC classification: G06F17/30864

    Abstract: Techniques are provided for the efficient location, processing, and retrieval of local product information derived from web pages generally locatable through form queries submitted to web pages often referred to as the “deep” or “hidden” web. In an embodiment, information such as product information and dealer-location information is located on a web page form such as a dealer-locator form. After location of a suitable web page form, editorial wrapping is performed to create an automated information extraction process. Using the automated information extractor, deep-web crawling is performed. A grid-based extraction of individual business records is performed, and matching and ingestion are performed in conjunction with a business listing database. Finally, metadata tags are added to entries in the business listing database. Metadata tags also may be added to entries in other databases.
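
The matching, ingestion, and metadata-tagging steps named in this abstract might look roughly like the sketch below. It matches extracted dealer records to existing business listings on a normalized name and phone number, then attaches a metadata tag to each matched listing. Exact-key matching, the record fields `name` and `phone`, the tag text, and the functions `normalize_key` and `ingest` are all assumptions; the abstract does not describe the matching criteria.

```python
import re


def normalize_key(name, phone):
    """Build a comparison key: lowercase alphanumeric name plus phone digits."""
    name_key = re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()
    digits = re.sub(r"\D", "", phone)
    return name_key, digits


def ingest(extracted_records, listings, tag):
    """Tag listings that match an extracted record; return unmatched records."""
    index = {normalize_key(l["name"], l["phone"]): l for l in listings}
    unmatched = []
    for rec in extracted_records:
        listing = index.get(normalize_key(rec["name"], rec["phone"]))
        if listing is None:
            unmatched.append(rec)  # candidate for a new database entry
        else:
            listing.setdefault("tags", set()).add(tag)
    return unmatched


# Hypothetical data: one existing listing, two records extracted from a crawl.
listings = [{"name": "Bob's Bikes", "phone": "(408) 555-0100"}]
extracted = [
    {"name": "BOB'S BIKES", "phone": "408-555-0100"},
    {"name": "Cycle World", "phone": "408-555-0199"},
]
unmatched = ingest(extracted, listings, "Acme Bicycles dealer")
print(listings[0]["tags"], unmatched)
```

A production matcher would likely need fuzzy comparison of names and addresses; exact keys are used here only to keep the example short.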

    Extracting rich temporal context for business entities and events
    3.
    Granted patent
    Status: Active

    Publication No.: US08606564B2

    Publication date: 2013-12-10

    Application No.: US12917389

    Filing date: 2010-11-01

    IPC classification: G06F17/27 G06F17/30

    Abstract: Methods and apparatus for performing computer-implemented extraction of temporal information for business entities and events are disclosed. In one embodiment, a sequence of text is obtained. A label is assigned to one or more of a plurality of segments of the text such that each of the one or more of the plurality of segments of the text is classified as temporal data in one of a plurality of classes of temporal data. One or more rules are applied to the one or more segments of the text that have been classified as temporal data to generate a structured representation of the temporal data, where the rules include one or more schematic rules. Each of the schematic rules pertains to one or more of the plurality of classes of temporal data and indicates a structure in which temporal data in the corresponding one or more of the plurality of classes is to be stored.
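
The two stages in this abstract, labeling text segments with a class of temporal data and then applying per-class schematic rules to obtain a structured representation, can be sketched as follows. Regular expressions stand in for the labeler here, which is a substitution made for brevity; the abstract does not say how labels are assigned. The class names `DAY_RANGE` and `TIME_RANGE`, the output fields, and all function names are assumptions.

```python
import re

DAY = r"(Mon|Tue|Wed|Thu|Fri|Sat|Sun)"

# Assumed classes of temporal data and the patterns used to label segments.
LABEL_PATTERNS = {
    "DAY_RANGE": re.compile(rf"\b{DAY}-{DAY}\b"),
    "TIME_RANGE": re.compile(r"\b(\d{1,2})(am|pm)-(\d{1,2})(am|pm)\b"),
}


def label_segments(text):
    """Return (label, match) pairs for temporal segments, in text order."""
    found = []
    for label, pattern in LABEL_PATTERNS.items():
        for m in pattern.finditer(text):
            found.append((m.start(), label, m))
    return [(label, m) for _, label, m in sorted(found, key=lambda t: t[0])]


def to_24h(hour, meridiem):
    """Convert a 12-hour clock value to a 24-hour integer hour."""
    hour = int(hour) % 12
    return hour + 12 if meridiem == "pm" else hour


# One schematic rule per class: it fixes the structure the data is stored in.
SCHEMATIC_RULES = {
    "DAY_RANGE": lambda m: {"start_day": m.group(1), "end_day": m.group(2)},
    "TIME_RANGE": lambda m: {
        "open": to_24h(m.group(1), m.group(2)),
        "close": to_24h(m.group(3), m.group(4)),
    },
}


def extract(text):
    """Label the text, then apply the schematic rule for each labeled segment."""
    return [
        {"class": label, **SCHEMATIC_RULES[label](m)}
        for label, m in label_segments(text)
    ]


result = extract("Open Mon-Fri 9am-5pm")
print(result)
```

Keeping the rules in a table keyed by class mirrors the abstract's statement that each schematic rule pertains to one or more classes of temporal data.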

    EXTRACTING RICH TEMPORAL CONTEXT FOR BUSINESS ENTITIES AND EVENTS
    4.
    Patent application
    Status: Active

    Publication No.: US20120109637A1

    Publication date: 2012-05-03

    Application No.: US12917389

    Filing date: 2010-11-01

    IPC classification: G06F17/27 G06F17/30

    Abstract: Methods and apparatus for performing computer-implemented extraction of temporal information for business entities and events are disclosed. In one embodiment, a sequence of text is obtained. A label is assigned to one or more of a plurality of segments of the text such that each of the one or more of the plurality of segments of the text is classified as temporal data in one of a plurality of classes of temporal data. One or more rules are applied to the one or more segments of the text that have been classified as temporal data to generate a structured representation of the temporal data, where the rules include one or more schematic rules. Each of the schematic rules pertains to one or more of the plurality of classes of temporal data and indicates a structure in which temporal data in the corresponding one or more of the plurality of classes is to be stored.
