Abstract:
Techniques are provided for efficiently locating, processing, and retrieving local product information derived from web pages that are generally reachable through form queries, a portion of the web often referred to as the “deep” or “hidden” web. In an embodiment, information such as product information and dealer-location information is located on a web page form such as a dealer-locator form. After a suitable web page form is located, editorial wrapping is performed to create an automated information extraction process. Using the automated information extractor, deep-web crawling is performed. A grid-based extraction of individual business records is performed, and matching and ingestion are performed in conjunction with a business listing database. Finally, metadata tags are added to entries in the business listing database. Metadata tags also may be added to entries in other databases.
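
The matching, ingestion, and tagging steps described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the abstract: the `Listing` type, the `match_and_tag` function, the phone-number matching key, and the tag format are all hypothetical names and choices introduced here.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Listing:
    """Hypothetical entry in a business listing database."""
    name: str
    phone: str
    tags: set = field(default_factory=set)


def normalize_phone(phone: str) -> str:
    """Keep digits only so differently formatted numbers compare equal."""
    return re.sub(r"\D", "", phone)


def match_and_tag(records, listings, tag):
    """Match crawled records to listings by phone (an assumed matching key).

    Matched listings receive the metadata tag; unmatched records are
    ingested as new listings and tagged as well.
    """
    by_phone = {normalize_phone(entry.phone): entry for entry in listings}
    for rec in records:
        key = normalize_phone(rec["phone"])
        listing = by_phone.get(key)
        if listing is None:
            # Ingestion: no existing entry matched, so add a new one.
            listing = Listing(name=rec["name"], phone=rec["phone"])
            listings.append(listing)
            by_phone[key] = listing
        listing.tags.add(tag)
    return listings


# Illustrative data only; the names, numbers, and tag are made up.
db = [Listing("Acme Hardware", "(555) 010-1234")]
crawled = [
    {"name": "ACME Hardware Inc.", "phone": "555-010-1234"},
    {"name": "Bolt Supply", "phone": "555-010-9999"},
]
match_and_tag(crawled, db, "sells:ExampleBrand")
```

Matching on a normalized phone number is only one plausible key; a production matcher would likely also compare names and addresses.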
Abstract:
Methods and apparatus for performing computer-implemented extraction of temporal information for business entities and events are disclosed. In one embodiment, a sequence of text is obtained. A label is assigned to one or more of a plurality of segments of the text such that each of the one or more of the plurality of segments of the text is classified as temporal data in one of a plurality of classes of temporal data. One or more rules are applied to the one or more segments of the text that have been classified as temporal data to generate a structured representation of the temporal data, where the rules include one or more schematic rules. Each of the schematic rules pertains to one or more of the plurality of classes of temporal data and indicates a structure in which temporal data in the corresponding one or more of the plurality of classes is to be stored.
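
The two stages described above, labeling segments with a temporal class and then applying per-class schematic rules to build a structured record, might be sketched as follows. This is a minimal illustration under stated assumptions: the class names `DAY_RANGE` and `TIME_RANGE`, the regular-expression labeler, and the output field names are hypothetical, and an actual labeler could instead be a trained sequence model.

```python
import re

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
DAY = r"(Mon|Tue|Wed|Thu|Fri|Sat|Sun)"

# Hypothetical temporal classes, each with a pattern that labels matching segments.
PATTERNS = [
    ("DAY_RANGE", re.compile(rf"\b{DAY}-{DAY}\b")),
    ("TIME_RANGE", re.compile(r"\b(\d{1,2})(am|pm)-(\d{1,2})(am|pm)\b")),
]


def label_segments(text):
    """Return (class, match) pairs for segments classified as temporal data."""
    found = []
    for cls, pattern in PATTERNS:
        for m in pattern.finditer(text):
            found.append((m.start(), cls, m))
    found.sort(key=lambda item: item[0])  # keep segments in text order
    return [(cls, m) for _, cls, m in found]


def to_24h(hour, meridiem):
    """Convert a 12-hour clock value such as ('5', 'pm') to 'HH:00'."""
    hour = int(hour) % 12
    if meridiem == "pm":
        hour += 12
    return f"{hour:02d}:00"


def day_range_rule(m):
    """Schematic rule: store a day range as an explicit list of days."""
    start, end = DAYS.index(m.group(1)), DAYS.index(m.group(2))
    if end < start:  # range wraps past Sunday, e.g. Sat-Mon
        return {"days": DAYS[start:] + DAYS[:end + 1]}
    return {"days": DAYS[start:end + 1]}


def time_range_rule(m):
    """Schematic rule: store a time range as open and close times."""
    return {
        "open": to_24h(m.group(1), m.group(2)),
        "close": to_24h(m.group(3), m.group(4)),
    }


# Each schematic rule pertains to one class and fixes the stored structure.
SCHEMATIC_RULES = {"DAY_RANGE": day_range_rule, "TIME_RANGE": time_range_rule}


def extract(text):
    """Label temporal segments, then apply the rule for each segment's class."""
    record = {}
    for cls, m in label_segments(text):
        record.update(SCHEMATIC_RULES[cls](m))
    return record


hours = extract("Open Mon-Fri 9am-5pm")
```

Keeping the rules in a table keyed by class means a new class of temporal data can be supported by adding one pattern and one rule, without changing `extract`.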