Method and system for form-filling crawl and associating rich keywords
    4.
    Granted patent
    Legal status: in force

    Publication No.: US08793239B2

    Publication date: 2014-07-29

    Application No.: US12576011

    Filing date: 2009-10-08

    IPC classification: G06F17/30 G06F7/00

    CPC classification: G06F17/30864

    Abstract: Techniques are provided for the efficient location, processing, and retrieval of local product information derived from web pages generally locatable through form queries submitted to web pages often referred to as the “deep” or “hidden” web. In an embodiment, information such as product information and dealer-location information is located on a web page form such as a dealer-locator form. After location of a suitable web page form, editorial wrapping is performed to create an automated information extraction process. Using the automated information extractor, deep-web crawling is performed. A grid-based extraction of individual business records is performed, and matching and ingestion are performed in conjunction with a business listing database. Finally, metadata tags are added to entries in the business listing database. Metadata tags also may be added to entries in other databases.
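The crawl, match, and tag steps named in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Listing` record, the `normalize` matching key, the `fake_dealer_locator` stand-in for a wrapped dealer-locator form, and the choice to query the form once per ZIP code. It is not the patented method and does not attempt to model the grid-based extraction.

```python
from dataclasses import dataclass, field


@dataclass
class Listing:
    """Hypothetical entry in a business listing database."""
    name: str
    zip_code: str
    tags: set = field(default_factory=set)


def normalize(name):
    """Crude matching key (an assumption): lowercase alphanumerics only."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def crawl_and_tag(submit_form, zip_codes, listings, tag):
    """Submit one form query per ZIP code, match each record, tag the match."""
    index = {(normalize(item.name), item.zip_code): item for item in listings}
    unmatched = []
    for zip_code in zip_codes:
        for record in submit_form(zip_code):
            key = (normalize(record["name"]), record["zip"])
            listing = index.get(key)
            if listing is None:
                unmatched.append(record)  # candidate for ingestion as a new entry
            else:
                listing.tags.add(tag)  # metadata tag on the matched entry
    return unmatched


def fake_dealer_locator(zip_code):
    """Hypothetical stand-in for a wrapped form; no network access."""
    data = {
        "94301": [{"name": "Acme Bikes, Inc.", "zip": "94301"}],
        "94040": [{"name": "Velo Shop", "zip": "94040"}],
    }
    return data.get(zip_code, [])


listings = [Listing("ACME Bikes Inc", "94301")]
unmatched = crawl_and_tag(
    fake_dealer_locator, ["94301", "94040", "94022"], listings, "brand:ExampleBrand"
)
```

In this toy run the Acme entry receives the tag, while the Velo Shop record has no match in the database and is returned for possible ingestion.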

    Method and System for Form-Filling Crawl and Associating Rich Keywords
    5.
    Patent application
    Legal status: in force

    Publication No.: US20110087646A1

    Publication date: 2011-04-14

    Application No.: US12576011

    Filing date: 2009-10-08

    IPC classification: G06F7/10 G06F17/30

    CPC classification: G06F17/30864

    Abstract: Techniques are provided for the efficient location, processing, and retrieval of local product information derived from web pages generally locatable through form queries submitted to web pages often referred to as the “deep” or “hidden” web. In an embodiment, information such as product information and dealer-location information is located on a web page form such as a dealer-locator form. After location of a suitable web page form, editorial wrapping is performed to create an automated information extraction process. Using the automated information extractor, deep-web crawling is performed. A grid-based extraction of individual business records is performed, and matching and ingestion are performed in conjunction with a business listing database. Finally, metadata tags are added to entries in the business listing database. Metadata tags also may be added to entries in other databases.

    SELECTIVELY ADDING SOCIAL DIMENSION TO WEB SEARCHES
    6.
    Patent application
    Legal status: in force

    Publication No.: US20110264648A1

    Publication date: 2011-10-27

    Application No.: US12764818

    Filing date: 2010-04-21

    IPC classification: G06F17/30

    CPC classification: G06F17/30867

    Abstract: Embodiments are directed towards managing a display of search results by employing a query-classification for a search query to selectively display trust search results that are displayed distinct from non-trust search results. A search query is classified into a query-class. A search is then performed over non-trust sources, and selectively over trust data sources, to obtain non-trust and trust search results, respectively. The trust search results are rank ordered based on various categories of search criteria, including, for example, explicit and implicit relationships. Based on the query-class, a different number of trust search results may be displayed. Further, a position at which the trust search results may be displayed may be based on the query-class. Moreover, the non-trust search results are displayed distinct or separate from the trust search results to readily distinguish a type of source of the search results.
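A minimal Python sketch of the idea in the abstract: the query class decides how many trust results appear and where, and trust results are kept in a section separate from the ordinary results. The class names, the keyword classifier, the `DISPLAY_POLICY` table, and the result fields (`relationship`, `score`) are all hypothetical; the sketch also takes both result lists as inputs rather than performing the searches.

```python
# Hypothetical policy: query class -> (max trust results shown, page position).
DISPLAY_POLICY = {
    "local": (3, "top"),
    "product": (2, "side"),
    "navigational": (0, None),
}


def classify(query):
    """Toy keyword classifier; a real system would likely use a trained model."""
    words = set(query.lower().split())
    if words & {"near", "restaurant", "plumber"}:
        return "local"
    if words & {"buy", "review", "price"}:
        return "product"
    return "navigational"


def build_page(query, non_trust, trust):
    """Return separate sections so trust results stay distinct from the rest."""
    limit, position = DISPLAY_POLICY[classify(query)]
    # Rank trust results: explicit relationships first, then by descending score.
    ranked = sorted(
        trust, key=lambda r: (r["relationship"] != "explicit", -r["score"])
    )
    return {"web": non_trust, "trust": ranked[:limit], "trust_position": position}


trust_results = [
    {"source": "friend-of-friend", "relationship": "implicit", "score": 0.9},
    {"source": "direct contact", "relationship": "explicit", "score": 0.5},
]
local_page = build_page("pizza restaurant near me", ["result a", "result b"], trust_results)
nav_page = build_page("yahoo mail", ["result c"], trust_results)
```

Here the local query shows both trust results at the top with the explicit relationship ranked first, while the navigational query shows none.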

    APPARATUS AND METHODS FOR OPERATOR TRAINING IN INFORMATION EXTRACTION
    7.
    Patent application
    Legal status: in force

    Publication No.: US20100227301A1

    Publication date: 2010-09-09

    Application No.: US12398126

    Filing date: 2009-03-04

    IPC classification: G09B19/00

    CPC classification: G09B19/00

    Abstract: Disclosed are methods and apparatus for extracting information from one or more documents. A training and execution plan is received, and such plan specifies invocation of a trainer operator for initiating training of a trainee operator based on a set of training documents so as to generate a new trained operator that is to then be invoked so as to extract information from one or more unknown documents. The trainee operator is configured to extract information from one or more unknown documents, and each training document is associated with classified information. After receipt of the training and execution plan, the trainer operator is automatically executed to train the trainee operator based on the specified training documents so as to generate a new trained operator for extracting information from documents. The new trained operator is a new version of the trainee operator. After receipt of the training and execution plan, both the trainee operator and the new trained operator are automatically retained for later use in extracting information from one or more unknown documents. After receipt of the training and execution plan, the new trained operator is automatically executed on one or more unknown documents so as to extract information from such one or more unknown documents.
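The plan-driven flow in the abstract (train, retain both operator versions, then execute the new one) can be sketched in Python as follows. The `KeywordOperator` class, the `trainer` function, the plan dictionary keys, and the version-keyed `registry` are hypothetical names chosen for illustration, and the keyword-voting "training" is a deliberately trivial stand-in for whatever learning method an actual system would use.

```python
from collections import Counter


class KeywordOperator:
    """Toy extraction operator: labels a document by its known keywords."""

    def __init__(self, version, keywords=None):
        self.version = version
        self.keywords = dict(keywords or {})  # word -> label

    def extract(self, document):
        votes = Counter(
            self.keywords[w] for w in document.lower().split() if w in self.keywords
        )
        return votes.most_common(1)[0][0] if votes else None


def trainer(trainee, training_docs):
    """Return a new version of the trainee; the trainee itself is not modified."""
    keywords = dict(trainee.keywords)
    for text, label in training_docs:
        for word in text.lower().split():
            keywords.setdefault(word, label)
    return KeywordOperator(trainee.version + 1, keywords)


def run_plan(plan, registry):
    """Train, retain both operator versions, then extract from unknown documents."""
    trainee = registry[plan["trainee_version"]]
    trained = trainer(trainee, plan["training_docs"])
    registry[trained.version] = trained  # the trainee stays in the registry too
    return [trained.extract(doc) for doc in plan["unknown_docs"]]


registry = {0: KeywordOperator(0)}
plan = {
    "trainee_version": 0,
    "training_docs": [
        ("invoice total due", "billing"),
        ("meeting agenda notes", "minutes"),
    ],
    "unknown_docs": ["total due friday", "weekly agenda"],
}
labels = run_plan(plan, registry)
```

Keeping the registry keyed by version means the untrained operator remains available for later use alongside the newly trained one.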

    Selectively adding social dimension to web searches
    8.
    Granted patent
    Legal status: in force

    Publication No.: US08880520B2

    Publication date: 2014-11-04

    Application No.: US12764818

    Filing date: 2010-04-21

    IPC classification: G06F17/30

    CPC classification: G06F17/30867

    Abstract: Embodiments are directed towards managing a display of search results by employing a query-classification for a search query to selectively display trust search results that are displayed distinct from non-trust search results. A search query is classified into a query-class. A search is then performed over non-trust sources, and selectively over trust data sources, to obtain non-trust and trust search results, respectively. The trust search results are rank ordered based on various categories of search criteria, including, for example, explicit and implicit relationships. Based on the query-class, a different number of trust search results may be displayed. Further, a position at which the trust search results may be displayed may be based on the query-class. Moreover, the non-trust search results are displayed distinct or separate from the trust search results to readily distinguish a type of source of the search results.

    Apparatus and methods for operator training in information extraction
    9.
    Granted patent
    Legal status: in force

    Publication No.: US08412652B2

    Publication date: 2013-04-02

    Application No.: US12398126

    Filing date: 2009-03-04

    IPC classification: G06F15/18

    CPC classification: G09B19/00

    Abstract: After receipt of a training and execution plan, a trainer operator is automatically executed to train a trainee operator based on specified training documents so as to generate a new trained operator for extracting information from documents. The new trained operator is a new version of the trainee operator. Both the trainee operator and the new trained operator are automatically retained for later use in extracting information from one or more unknown documents. After receipt of the training and execution plan, the new trained operator is automatically executed on one or more unknown documents so as to extract information from such one or more unknown documents.
