摘要:
Disclosed are methods and apparatus for extracting (or annotating) structured information from web content. Web content of interest from a particular domain is represented as one or more tree instances having a plurality of branching nodes that each correspond to a web object such that the tree instances correspond to one or more structured data instances. The particular domain is associated with domain knowledge that includes one or more presentation rulesets that each specifies a particular structure for a set of data instances, a domain-specific concept labeler, one or more specified properties of the web objects in the tree instances, and a concept schema that specifies a representation of the data to be extracted from the web content. A structured data instance that conforms to the concept schema is extracted from the one or more tree instances based on the domain knowledge for the particular domain. Extraction of the structured data instances is accomplished by (i) using the domain-specific concept labeler to annotate a subset of nodes of the tree instances; and (ii) using a locally adaptive concept annotator to extract the structured data instances based on the annotated segments and the local properties associated with such annotated segments. The extracted structured data instance is stored as structured output records in a database.
摘要:
A technique is described that reduces the complexity and resource consumption associated with performing record expiry in a distributed database system. In accordance with the technique, a record is checked to see if it has expired only when it has been accessed for a read or a write. If at the time of a read a record is determined to have expired, then it is not served. If at the time of a write a record is determined to have expired, then the write is treated as an insertion of a new record, and steps are taken to treat the insertion consistently with regard to the previous expired version. A background process is used to delete records that have not been written to or actively deleted by a client after expiration.
摘要:
Methods and apparatus are provided for improved schema mapping of source documents to target documents. A list of matches are generated between at least one source table and at least one target table. One or more of the matches are annotated with a logical condition providing a context in which the match applies. Matches can be annotated with a logical condition, for example, by generating a set of candidate view conditions, C, to be applied to the one or more source tables. A schema match algorithm can generate the list of matches. Candidate logical conditions can be identified, for example, by (i) creating a set of views for categorical attributes in the tables and adding a view for each partitioning of the attribute values; (ii) using a classifier built on target attribute values; or (iii) evaluating internal features of a source table.
摘要:
Radio Frequency Identification (RFID) tags are used for automatically determining the connectivity or alignment between physical components, including, for example, connectivity of network cables and device ports, as well as alignment of components assembled by automated manufacturing systems. In one embodiment of the invention, accurate determinations of the physical three-dimensional locations of cables and equipment are employed to determine which cables are plugged into which device ports of which pieces of equipment. In another embodiment of the invention, multiple RFID tags are used to determine the appropriate alignment between components being assembled by an automated manufacturing system.
摘要:
For use with a database of data records stored in a memory, a system and method for increasing a memory capacity and a memory database employing the system or the method. The system includes: (1) a time stamping controller that assigns a time stamp to transactions to be performed on the database, the time stamp operates to preserve an order of the transactions, (2) a versioning controller that creates multiple versions of ones of the data records affected by the transactions that are update transactions and (3) an aging controller, which is associated with each of the time stamping and versioning controllers, that monitors a measurable characteristic of the memory and deletes ones of the multiple versions of the ones of the data records in response to the time stamp and the measurable characteristic thereby to increase memory capacity.