Abstract:
A system and method are described for large-scale, entity-specific classification: for each entity in a collection of entities, the entity-specific set of candidates drawn from a collection of candidates is classified. The collection of entities may comprise a specific category or domain of entities (e.g., schools, restaurants, manufacturers, products, events, people). Candidates may comprise webpages or other resources having resource identifiers. Entity-specific sets of candidates may be found by leveraging search engine query results, and user interactions with those results, for queries based on entity-specific attributes. The relationship(s) or class(es) for which candidate resources are classified relative to a specific entity may comprise an authoritative, official home page (OHP) or another class (e.g., fan page, review, aggregator). A feature generator generates entity-specific features for the candidates. Based on those features, one or more classifiers rank each candidate for a specific class with respect to a specific entity.
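A minimal sketch of the feature-generation and ranking step, assuming hypothetical entity-specific features (overlap of the entity name with the URL host and with the page title, and URL path depth) plus a hypothetical click-share signal standing in for user interaction with search results. The weights are made-up stand-ins for a trained classifier; the abstract does not say which features or classifier are used.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    url: str
    title: str
    click_share: float  # hypothetical signal: share of clicks for entity queries


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens; every other character is a separator."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return set(cleaned.split())


def features(entity_name: str, cand: Candidate) -> list[float]:
    """Hypothetical entity-specific features for one candidate."""
    name = tokens(entity_name)
    parts = cand.url.split("//", 1)[-1].split("/")
    host, path = parts[0], [p for p in parts[1:] if p]
    return [
        len(name & tokens(host)) / len(name),        # entity name appears in host
        len(name & tokens(cand.title)) / len(name),  # entity name appears in title
        float(len(path)),                            # home pages tend to be shallow
        cand.click_share,                            # user interaction with results
    ]


# Hand-set weights standing in for a trained OHP classifier (hypothetical values).
WEIGHTS = [2.0, 1.5, -0.5, 3.0]
BIAS = -2.0


def ohp_score(entity_name: str, cand: Candidate) -> float:
    """Logistic score that the candidate is the entity's official home page."""
    z = BIAS + sum(w * f for w, f in zip(WEIGHTS, features(entity_name, cand)))
    return 1.0 / (1.0 + math.exp(-z))


def rank_candidates(entity_name: str, cands: list[Candidate]) -> list[tuple[float, str]]:
    """Rank one entity's candidate set, highest OHP score first."""
    scored = [(ohp_score(entity_name, c), c.url) for c in cands]
    return sorted(scored, reverse=True)
```

Ranking for another class (fan page, review, aggregator) would need its own weight vector under this sketch.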
Abstract:
A system is disclosed for reconciling opinions generated by agents with respect to one or more predicates. The disclosed system may use observed variables and a probabilistic model including latent parameters to estimate a truth score associated with each of the predicates. The truth score, as well as one or more of the latent parameters of the probabilistic model, may be estimated based on the observed variables. The truth score generated by the disclosed system may enable publishers to reliably represent the truth of a predicate to interested users.
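One way to estimate truth scores jointly with latent parameters is expectation-maximization over a simple model in which each agent has a single latent reliability (the probability that its opinion is correct) and all predicates share a latent prior. This one-coin, Dawid-Skene-style model is an assumption swapped in for illustration, not the model claimed in the abstract, and opinions are assumed to be binary.

```python
import math
from collections import defaultdict

EPS = 1e-6


def clamp(p: float) -> float:
    """Keep probabilities away from 0 and 1 so logarithms stay finite."""
    return min(max(p, EPS), 1.0 - EPS)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def estimate_truth(opinions: dict, iters: int = 50, init_reliability: float = 0.7):
    """EM for a one-coin agent-reliability model (illustrative assumption).

    opinions maps (agent, predicate) -> bool, the agent's claim.
    Returns (truth, reliability): truth[p] estimates P(p is true | opinions),
    reliability[a] estimates the probability that agent a is correct.
    """
    by_pred = defaultdict(list)
    by_agent = defaultdict(list)
    for (agent, pred), claim in opinions.items():
        by_pred[pred].append((agent, claim))
        by_agent[agent].append((pred, claim))

    reliability = {a: init_reliability for a in by_agent}
    prior = 0.5
    truth = {}
    for _ in range(iters):
        # E-step: posterior truth score of each predicate under current parameters.
        for pred, votes in by_pred.items():
            log_true = math.log(prior)
            log_false = math.log(1.0 - prior)
            for agent, claim in votes:
                r = reliability[agent]
                log_true += math.log(r if claim else 1.0 - r)
                log_false += math.log(1.0 - r if claim else r)
            truth[pred] = sigmoid(log_true - log_false)
        # M-step: re-estimate the latent parameters from the soft truth scores.
        for agent, votes in by_agent.items():
            agree = sum(truth[p] if claim else 1.0 - truth[p] for p, claim in votes)
            reliability[agent] = clamp(agree / len(votes))
        prior = clamp(sum(truth.values()) / len(truth))
    return truth, reliability
```

Starting every reliability above 0.5 tells EM to assume agents are better than chance; without that, the labels true and false could swap.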
Abstract:
A method of detecting and recovering from data corruption of a database is characterized by the steps of protecting data of the database with codewords, one codeword for each region of the database, and verifying that a codeword matches its associated data before the data is read from the database, thereby preventing transaction-carried corruption. A deferred maintenance scheme is recommended for the codewords protecting the database: the method may comprise the steps of protecting data of the database with codewords, one codeword for each region of the database, and maintaining the codewords asynchronously to improve the concurrency of the database. Moreover, the database may be audited using the codewords, noting them in a table and protecting regions of the database with latches. Once codeword values are computed and checked against the values noted in memory, a flush can cause codewords from outstanding log records to be applied to the stored codeword table.
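A minimal sketch of per-region codewords checked before every read, assuming a byte-array store, fixed-size regions, and CRC-32 (zlib.crc32) as the codeword function; the abstract does not specify the codeword function. Codewords are maintained synchronously here, so the deferred maintenance, auditing, and latching described above are not modeled.

```python
import zlib


class CorruptionError(Exception):
    def __init__(self, region: int):
        super().__init__(f"codeword mismatch in region {region}")
        self.region = region


class CodewordProtectedStore:
    """Byte array split into fixed-size regions, one codeword per region."""

    def __init__(self, data: bytes, region_size: int):
        self.data = bytearray(data)
        self.region_size = region_size
        n_regions = (len(self.data) + region_size - 1) // region_size
        self.codewords = [self._codeword(r) for r in range(n_regions)]

    def _codeword(self, region: int) -> int:
        start = region * self.region_size
        return zlib.crc32(self.data[start:start + self.region_size])

    def _regions(self, offset: int, length: int) -> range:
        """Indices of the regions covering bytes [offset, offset + length)."""
        return range(offset // self.region_size,
                     (offset + length - 1) // self.region_size + 1)

    def write(self, offset: int, payload: bytes) -> None:
        """Legitimate update path: change the data, then maintain its codewords."""
        if offset + len(payload) > len(self.data):
            raise ValueError("write past end of store")
        self.data[offset:offset + len(payload)] = payload
        for r in self._regions(offset, len(payload)):
            self.codewords[r] = self._codeword(r)

    def read(self, offset: int, length: int) -> bytes:
        """Verify every covering region's codeword before returning the data."""
        for r in self._regions(offset, length):
            if self._codeword(r) != self.codewords[r]:
                raise CorruptionError(r)
        return bytes(self.data[offset:offset + length])
```

A stray write that bypasses `write()` (simulated by modifying `store.data` directly) leaves the codeword stale, so the next read of that region raises instead of passing corrupted bytes to a transaction.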
Abstract:
For use with a database of data records organized into components, the database being stored in a memory, a processing system for, and method of, physically versioning the database. In one embodiment, the processing system includes: (1) a component copier that creates a physical copy of an original component to be affected by an update transaction to be applied to the database, and that causes pointers in nodes of the physical copy to point to other nodes in the physical copy; (2) a data updater, associated with the component copier, that applies the update transaction to the physical copy to create therefrom a new physical version, the original component remaining unaffected by the update transaction; and (3) a pointer updater, associated with the data updater, that employs an atomic word write to revise a component pointer associated with the database so that the pointer points to the new physical version.
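The copy, update, and pointer-swap sequence can be sketched as follows, assuming a component is a singly linked list of nodes and an update transaction is a function applied to the copy. A single Python reference assignment stands in for the atomic word write; all names are hypothetical.

```python
import copy
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    key: int
    value: str
    next: Optional["Node"] = None


def to_list(head: Optional[Node]) -> list:
    """Read a component by following its node pointers."""
    out = []
    while head is not None:
        out.append((head.key, head.value))
        head = head.next
    return out


class VersionedDatabase:
    def __init__(self):
        self.components: dict[str, Optional[Node]] = {}  # component pointers

    def apply_update(self, name: str,
                     update: Callable[[Optional[Node]], Optional[Node]]) -> Optional[Node]:
        original = self.components[name]
        # Component copier: deepcopy allocates new nodes and points their `next`
        # fields at the copies, so the copy never references the original nodes.
        shadow = copy.deepcopy(original)
        # Data updater: the transaction modifies only the physical copy.
        new_version = update(shadow)
        # Pointer updater: one reference assignment stands in for the atomic word
        # write; readers still holding `original` keep seeing the old version.
        self.components[name] = new_version
        return original


def set_value(key: int, value: str) -> Callable[[Optional[Node]], Optional[Node]]:
    """Hypothetical update transaction: overwrite the value stored under key."""
    def txn(head: Optional[Node]) -> Optional[Node]:
        node = head
        while node is not None:
            if node.key == key:
                node.value = value
            node = node.next
        return head
    return txn
```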
Abstract:
Methods and apparatus are provided for inferring regular expressions that parse and extract information from line-oriented data. A regular expression is generated that matches a line of text by: evaluating a plurality of characters of the line of text to identify one or more domains associated with each of the plurality of characters; assigning a run-length to each of the identified domains; populating a data structure having a data position corresponding to each of the characters with the identified domains and corresponding run-lengths; and generating the regular expression based on the data structure.
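A minimal sketch of the four steps, assuming each character is mapped only to its most specific domain (decimal digit, uppercase letter, lowercase letter, whitespace, or a literal character) rather than to several candidate domains, and that digit and letter runs become capture groups so the inferred expression can also extract fields.

```python
import re

# (membership test, regex for the domain, capture for extraction?); first match wins.
DOMAINS = [
    (str.isdecimal, r"\d", True),
    (lambda c: "A" <= c <= "Z", "[A-Z]", True),
    (lambda c: "a" <= c <= "z", "[a-z]", True),
    (str.isspace, r"\s", False),
]


def domain_of(ch: str) -> tuple[str, bool]:
    """Return (pattern, capture) for the most specific domain of one character."""
    for test, pattern, capture in DOMAINS:
        if test(ch):
            return pattern, capture
    return re.escape(ch), False  # punctuation and the like are matched literally


def infer_regex(line: str) -> str:
    n = len(line)
    domains = [domain_of(ch) for ch in line]
    # Data structure: one slot per character position holding (domain, run length).
    table = [None] * n
    i = 0
    while i < n:
        j = i
        while j < n and domains[j] == domains[i]:
            j += 1
        for k in range(i, j):
            table[k] = (domains[i], j - i)
        i = j
    # Generate the expression by walking the table one run at a time.
    parts = []
    i = 0
    while i < n:
        (pattern, capture), run = table[i]
        atom = pattern if run == 1 else f"{pattern}{{{run}}}"
        parts.append(f"({atom})" if capture else atom)
        i += run
    return "".join(parts)
```

For the line `2024-01-15 ERROR disk` this yields an expression equivalent to `(\d{4})-(\d{2})-(\d{2})\s([A-Z]{5})\s([a-z]{4})`, which also parses other lines with the same layout.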
Abstract:
A repeatable cryptographic key is generated based on varying parameters that represent physical measurements. Locations within a share table, which stores valid and invalid cryptographic shares, are identified as a function of the received varying parameters. The share table is configured such that locations expected to be identified by legitimate access attempts contain valid cryptographic shares, and locations not expected to be identified by legitimate access attempts contain invalid cryptographic shares. The share table configuration may be modified based on the history of prior legitimate access attempts. In various embodiments, the stored shares may be encrypted or compressed. A keystroke-feature authentication embodiment uses the inventive techniques to implement an authentication system that authenticates based on both an entered password and the manner in which the password is entered (e.g., keystroke dynamics). Another embodiment uses the inventive techniques to protect sensitive database information that is accessible using DNA measurements.
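The share-table idea can be sketched with Shamir secret sharing, which is an assumption here since the abstract does not name the sharing scheme. Each of m binary features selects one of two locations in its row; the key is the constant term of a degree m-1 polynomial, so m valid shares recover it by Lagrange interpolation at zero, while a single invalid share yields a wrong value. Share encryption, compression, and history-based table updates are omitted, and `random.Random` is used only for reproducibility where a real system would use the `secrets` module.

```python
import random

PRIME = 2**127 - 1  # prime modulus for the share arithmetic


def eval_poly(coeffs: list[int], x: int) -> int:
    """Horner evaluation of sum(coeffs[i] * x**i) mod PRIME."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % PRIME
    return acc


def interpolate_at_zero(points: list[tuple[int, int]]) -> int:
    """Lagrange interpolation of the polynomial through `points`, evaluated at 0."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * -xj % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


def build_share_table(key: int, expected: list, rng: random.Random) -> list:
    """Row i holds the shares selected by feature value 0 and by value 1.

    expected[i] is the value legitimate attempts produce (0 or 1), or None when
    the feature is not distinguishing and both values count as legitimate.
    """
    m = len(expected)
    coeffs = [key] + [rng.randrange(PRIME) for _ in range(m - 1)]
    table = []
    for i, exp in enumerate(expected):
        row = []
        for bit in (0, 1):
            x = 2 * i + bit + 1  # distinct, nonzero x-coordinates
            valid = exp is None or exp == bit
            y = eval_poly(coeffs, x) if valid else rng.randrange(PRIME)
            row.append((x, y))
        table.append(row)
    return table


def recover_key(table: list, features: list[int]) -> int:
    """Each measured feature bit selects one location per row of the table."""
    return interpolate_at_zero([table[i][bit] for i, bit in enumerate(features)])
```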
Abstract:
A method of detecting and recovering from data corruption of a database is characterized by the step of logging information about reads of a database in memory to detect errors in data of the database, wherein the errors arise from bad writes of data to the database, from erroneous input of data to the database by users, or from logical errors in the code of a transaction. The read-logging method may be implemented in a plurality of database recovery models, including a cache-recovery model, a prior-state model, a redo-transaction model, and a delete-transaction model. In the delete-transaction model, it is assumed that logical information is not available to allow transactions to be redone after a possible error; the effects of transactions that read corrupted data are deleted from history, and any data written by a transaction that read corrupted data is treated as corrupted.
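Given logged read sets and write sets, the delete-transaction rule can be sketched as one pass over the history, assuming each transaction is summarized by the items it read and wrote and that the initially corrupted items are already known. The names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    tid: str
    reads: frozenset   # items captured by read logging
    writes: frozenset  # items captured by the ordinary update log


def txn(tid: str, reads: str, writes: str) -> Txn:
    """Hypothetical convenience constructor: each item is a one-letter name."""
    return Txn(tid, frozenset(reads), frozenset(writes))


def delete_transaction_recovery(history: list, initially_corrupt: set):
    """Delete-transaction model: in history order, delete every transaction that
    read corrupted data and treat everything it wrote as corrupted."""
    corrupt = set(initially_corrupt)
    kept, deleted = [], []
    for t in history:
        if t.reads & corrupt:
            deleted.append(t.tid)
            corrupt |= t.writes
        else:
            kept.append(t.tid)
    return kept, deleted, corrupt
```

Because corruption spreads only through logged reads, a transaction that never touched a corrupted item (T2 and T4 in the test) survives even though it ran after the error.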
Abstract:
For use with a central database associated with a server of a network, the central database having distributed counterparts stored in volatile memories of clients of the network to allow operations to be performed locally thereon, the central database further having multiple checkpoints and a stable log stored in the server for tracking operations on the central database to allow corresponding operations to be made to the multiple checkpoints, the stable log having tails stored in the volatile memories to track operations on corresponding ones of the distributed counterparts, and the distributed counterparts being subject to corruption, a system for, and method of, restoring a distributed counterpart stored in one of the volatile memories. The system includes: (1) a checkpoint determination controller that determines which of the multiple checkpoints is the most recently completed checkpoint and copies that checkpoint to the one of the volatile memories to serve as an unrevised database for reconstructing the distributed counterpart; and (2) an operation application controller that retrieves selected operations from the stable log and from the tail corresponding to the distributed counterpart and applies those operations to the unrevised database, thereby restoring the distributed counterpart.
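A minimal sketch of the restore path, assuming each log record is a hypothetical `(lsn, key, value)` tuple meaning "set key to value", each checkpoint records the last log sequence number (LSN) it reflects and whether it completed, and the local tail may overlap the stable log.

```python
import copy
from dataclasses import dataclass


@dataclass
class Checkpoint:
    lsn: int         # last log sequence number reflected in the image
    completed: bool  # a checkpoint still being written must not be used
    image: dict


def restore_counterpart(checkpoints: list, stable_log: list, tail: list) -> dict:
    """Rebuild a client's copy from a checkpoint plus log records (hypothetical format)."""
    done = [c for c in checkpoints if c.completed]
    if not done:
        raise ValueError("no completed checkpoint to restore from")
    # Checkpoint determination: the most recently completed checkpoint.
    base = max(done, key=lambda c: c.lsn)
    db = copy.deepcopy(base.image)  # unrevised database in client memory
    # Operation application: records newer than the checkpoint from the stable log
    # and the local tail, keyed by LSN so a record present in both applies once.
    pending = {op[0]: op for op in stable_log + tail if op[0] > base.lsn}
    for lsn in sorted(pending):
        _, key, value = pending[lsn]
        db[key] = value
    return db
```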
Abstract:
Methods and apparatus are provided for identifying repairs of constraint violations in data comprising a plurality of records, where each record has a plurality of cells. A database is processed based on a plurality of constraints that data in the database must satisfy. At least one constraint violation to be resolved is identified based on a cost of repair, and the corresponding records to be resolved and the equivalent cells in the data that violate the identified constraint are identified. A value for each of the equivalent cells can optionally be determined, and the determined value can be assigned to each of the equivalent cells. The constraint violation selected for resolution may be, for example, the one with the lowest cost. The cost of repairing a constraint violation is based on a distance metric between attribute values.
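A minimal sketch for a single functional dependency (for example zip -> city), assuming Levenshtein edit distance as the distance metric and restricting each equivalence class's repair value to a value already present in that class. The cheapest violation is repaired first; the cost model in the abstract may weight cells differently, and the function names are hypothetical.

```python
from collections import defaultdict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance by dynamic programming over prefixes."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def repair_fd(records: list, lhs: str, rhs: str) -> list:
    """Repair violations of the dependency lhs -> rhs in place, cheapest first.

    Returns the repairs in order as (lhs value, assigned rhs value, cost).
    """
    repairs = []
    while True:
        groups = defaultdict(list)
        for idx, rec in enumerate(records):
            groups[rec[lhs]].append(idx)
        violations = []
        for key, idxs in groups.items():
            values = [records[i][rhs] for i in idxs]
            if len(set(values)) < 2:
                continue  # the rhs cells of this class already agree
            # Equivalent cells must end up equal; price each existing value.
            costs = {v: sum(edit_distance(x, v) for x in values) for v in set(values)}
            target = min(sorted(costs), key=costs.get)  # ties broken alphabetically
            violations.append((costs[target], str(key), idxs, target))
        if not violations:
            return repairs
        cost, key, idxs, target = min(violations, key=lambda v: (v[0], v[1]))
        for i in idxs:
            records[i][rhs] = target
        repairs.append((key, target, cost))
```

In the test data the misspelled "Murray Hil" costs one edit to fix, so that class is repaired before the more expensive New York/Boston conflict.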
Abstract:
Methods and apparatus are provided for incremental update of an XML tree defined from a recursive XML view of a relational database. A method comprises the steps of detecting at least one change to the relational database; providing one or more queries to the relational database to map the change to the relational database into changes to the XML tree, wherein at least one component of a definition of the one or more queries is executed a plurality of times in traversing a path through the XML tree; and applying the mapped change to the XML tree. A bud-cut method and a reduction approach are presented.
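A naive sketch of mapping one relational change into XML tree changes, assuming the base data is an acyclic parent-child edge relation and a hypothetical recursive view that emits one `<node>` element per path. It handles only edge insertions and does not implement the bud-cut or reduction techniques: it re-runs the recursive view definition for the inserted subtree and grafts a copy under every occurrence of the parent.

```python
import copy
import xml.etree.ElementTree as ET


def build_subtree(node_id: str, edges: dict) -> ET.Element:
    """Hypothetical recursive view: evaluate it from node_id (edges must be acyclic)."""
    elem = ET.Element("node", id=node_id)
    for child in sorted(edges.get(node_id, ())):
        elem.append(build_subtree(child, edges))
    return elem


def apply_edge_insert(tree: ET.Element, edges: dict, parent: str, child: str) -> int:
    """Map the relational insert edge(parent, child) to XML subtree insertions.

    Assumes the edge is new. Returns the number of grafted subtrees.
    """
    edges.setdefault(parent, set()).add(child)  # the base-table change
    delta = build_subtree(child, edges)  # view query re-run for the new part only
    # A recursive view can place the same parent at several paths in the tree.
    targets = [e for e in tree.iter("node") if e.get("id") == parent]
    for target in targets:
        target.append(copy.deepcopy(delta))
    return len(targets)


def canon(elem: ET.Element) -> tuple:
    """Order-insensitive form for comparing against full recomputation."""
    return (elem.get("id"), tuple(sorted(canon(c) for c in elem)))
```

Comparing `canon` of the incrementally updated tree with `canon` of a freshly published tree is a cheap way to check such an update rule on small inputs.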