Abstract:
A system and method for processing a source document to generate a set of related documents. The system includes a textual analytics system that analyzes unstructured data contained in the source document and extracts a set of structured information about it, and a compare system that identifies a set of related documents by comparing that structured information with metadata indexed from a set of publications.
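A minimal sketch of the extract-then-compare flow, assuming the "structured information" is reduced to a set of normalized keywords and that each indexed publication's metadata is also a keyword set. The stop-word list, `extract_structured_info`, `find_related`, and the choice of Jaccard similarity are all hypothetical stand-ins; the abstract does not say how extraction or comparison is performed.

```python
import re
from typing import Dict, List, Set

# Hypothetical stop-word list used only for this illustration.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "to", "in", "is", "on", "with"}

def extract_structured_info(text: str) -> Set[str]:
    """Reduce unstructured text to a set of normalized keywords
    (a stand-in for the 'structured information' in the abstract)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if t not in STOP_WORDS}

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def find_related(source_text: str, index: Dict[str, Set[str]], top_k: int = 3) -> List[str]:
    """Rank indexed publications by keyword overlap with the source document."""
    info = extract_structured_info(source_text)
    scored = [(jaccard(info, meta), pub_id) for pub_id, meta in index.items()]
    scored = [(s, p) for s, p in scored if s > 0]  # drop unrelated publications
    scored.sort(key=lambda sp: (-sp[0], sp[1]))    # best score first, then by id
    return [pub_id for _, pub_id in scored[:top_k]]

if __name__ == "__main__":
    index = {
        "pub-1": {"raid", "parity", "disk"},
        "pub-2": {"random", "walk", "web", "sampling"},
        "pub-3": {"parity", "xor", "erasure"},
    }
    print(find_related("Parity updates on a RAID disk array", index))
```

A production system would likely index richer metadata (authors, classifications, citations) and use a search engine rather than a linear scan; the linear scan only keeps the comparison step visible.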
Abstract:
Disclosed are apparatus and methods for ranking lines of text. In one embodiment, an intent of a query is ascertained. A relevance of each one of a plurality of lines of text of a document is determined based upon the intent of the query, content of the query, and content of each of the plurality of lines of text. The plurality of lines of text may then be ranked according to the determined relevance of each of the plurality of lines of text.
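The ranking step can be sketched as follows, under loudly stated assumptions: intent is ascertained by matching the query against a hypothetical table of cue words (`INTENT_CUES`), and a line's relevance is its term overlap with the query plus a weighted overlap with the cue words of the detected intent. The abstract does not specify how intent is represented or how the three inputs are combined; `ascertain_intent`, `rank_lines`, and `intent_weight` are illustrative names only.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical intent -> cue-word table (not from the source).
INTENT_CUES: Dict[str, Set[str]] = {
    "price": {"price", "cost", "usd", "cheap"},
    "location": {"located", "address", "street", "near"},
}

def tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def ascertain_intent(query: str) -> str:
    """Pick the intent whose cue words overlap the query most (ties -> first)."""
    q = tokens(query)
    best = max(INTENT_CUES, key=lambda name: len(q & INTENT_CUES[name]))
    return best if q & INTENT_CUES[best] else "general"

def rank_lines(query: str, lines: List[str], intent_weight: float = 0.5) -> List[Tuple[float, str]]:
    """Score each line by query-term overlap plus weighted intent-cue overlap."""
    q = tokens(query)
    cues = INTENT_CUES.get(ascertain_intent(query), set())
    scored = []
    for line in lines:
        t = tokens(line)
        scored.append((len(q & t) + intent_weight * len(cues & t), line))
    # sorted() is stable, so equally relevant lines keep document order.
    return sorted(scored, key=lambda sl: -sl[0])

if __name__ == "__main__":
    lines = [
        "Parking is available on Main Street.",
        "Parking cost is 5 USD per hour.",
        "Opening hours are 9 to 5.",
    ]
    for score, line in rank_lines("cost of parking", lines):
        print(score, line)
```

Here the query "cost of parking" is read as a price intent, so the line mentioning a cost in USD outranks the line that only mentions parking.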
Abstract:
Disclosed herein are systems and methods for identifying phrases using break points. Break points can be located at stop words found in the content, and the resulting phrases can be used to generate a summary of the content.
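A minimal sketch of stop-word break points, assuming that every stop word and every punctuation mark ends the current phrase, and that a naive summary is simply the most frequent phrases. The stop-word list and the frequency-based summary are hypothetical choices; the abstract does not say how phrases are turned into a summary.

```python
import re
from collections import Counter
from typing import List

# Hypothetical stop-word list; each occurrence acts as a break point.
STOP_WORDS = {"a", "an", "the", "of", "and", "or", "to", "in", "on",
              "for", "is", "are", "was", "with", "by"}

def extract_phrases(text: str) -> List[str]:
    """Split text into phrases at stop words and punctuation (break points)."""
    phrases: List[str] = []
    current: List[str] = []
    for tok in re.findall(r"[A-Za-z0-9]+|[.,;:!?]", text):
        if tok.lower() in STOP_WORDS or not tok[0].isalnum():
            if current:  # close the phrase built so far
                phrases.append(" ".join(current))
            current = []
        else:
            current.append(tok.lower())
    if current:
        phrases.append(" ".join(current))
    return phrases

def summarize(text: str, n: int = 3) -> List[str]:
    """Naive summary: the n most frequent phrases (ties keep first-seen order)."""
    return [p for p, _ in Counter(extract_phrases(text)).most_common(n)]

if __name__ == "__main__":
    print(extract_phrases("The cat sat on the mat."))
    print(summarize("Random walk sampling of the web. Random walk sampling is fast.", 2))
```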
Abstract:
A document (or multiple documents) is analyzed to identify entities of interest within it. This is accomplished by constructing n-gram (for example, bigram) models that correspond to different kinds of text entities, such as chemistry-related words and generic English words. Each model can be constructed from training text selected to reflect a particular kind of text entity. The document is tokenized, and each token is evaluated against the models to determine which kind of text entity is most likely to be associated with it. The entities of interest in the document can then be annotated accordingly.
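The token classification can be sketched with character-bigram models, one per entity kind, each scoring a token by its log-likelihood. Using characters rather than words, the add-one (Laplace) smoothing, whitespace tokenization, and the tiny training lists are all assumptions made for illustration; the abstract only says that n-gram or bigram models are built from kind-specific training text.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

class CharBigramModel:
    """Character-bigram model with add-one (Laplace) smoothing."""

    def __init__(self, training_words: List[str]) -> None:
        self.bigrams: Counter = Counter()
        self.unigrams: Counter = Counter()
        alphabet = set()
        for word in training_words:
            padded = f"^{word.lower()}$"  # ^ and $ mark word boundaries
            alphabet.update(padded)
            for a, b in zip(padded, padded[1:]):
                self.bigrams[(a, b)] += 1
                self.unigrams[a] += 1
        self.vocab_size = len(alphabet) + 1  # +1 reserves mass for unseen characters

    def log_prob(self, word: str) -> float:
        padded = f"^{word.lower()}$"
        return sum(
            math.log((self.bigrams[(a, b)] + 1) / (self.unigrams[a] + self.vocab_size))
            for a, b in zip(padded, padded[1:])
        )

def annotate(text: str, models: Dict[str, CharBigramModel]) -> List[Tuple[str, str]]:
    """Label each whitespace token with the entity kind whose model scores it highest."""
    return [(tok, max(models, key=lambda kind: models[kind].log_prob(tok)))
            for tok in text.split()]

if __name__ == "__main__":
    models = {
        "chemistry": CharBigramModel(["methane", "ethanol", "benzene", "chloride",
                                      "hydroxide", "propane", "sulfate", "acetone"]),
        "english": CharBigramModel(["the", "was", "mixed", "with", "and",
                                    "then", "heated", "water", "slowly", "into"]),
    }
    print(annotate("ethane was", models))
```

"ethane" never appears in the chemistry training list, but its character bigrams (`et`, `th`, `ha`, `an`, `ne`) are common there, so it is labeled as chemistry.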
Abstract:
A system for improving the performance of the write process in an exemplary RAID system reduces the number of IOs required for a short write in a RAID algorithm by using a replicated-parity drive. Parity is stored on the parity portions of the disk drives, and the replicated-parity drive holds all of the parity information: the parity for each parity drive is co-located, or mirrored, on the replicated-parity portion of the disk drives for fast access during the read phase of the read-modify-write process. Consequently, the system can access the parity data with one seek, as opposed to P seeks in a conventional disk array system that uses P parity drives.
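The read-modify-write step that the replicated-parity drive speeds up can be illustrated for the common case of a single XOR parity: the new parity is the old parity XOR the old data XOR the new data, so a short write needs only the old data block and the old parity, not the rest of the stripe. This sketch assumes simple XOR parity (it does not model multi-parity codes or seek scheduling), and `xor_bytes` and `short_write_parity` are illustrative names.

```python
def xor_bytes(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

def short_write_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """Read-modify-write update: P_new = P_old XOR D_old XOR D_new."""
    return xor_bytes(old_parity, old_data, new_data)

if __name__ == "__main__":
    stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    parity = xor_bytes(*stripe)                 # full-stripe parity
    new_block = b"\xaa\xbb"
    updated = short_write_parity(stripe[1], new_block, parity)
    # The short-write result matches recomputing parity over the whole stripe.
    print(updated == xor_bytes(stripe[0], new_block, stripe[2]))
```

With P parity drives, the "read" half of this update must fetch P old parity blocks; the abstract's point is that mirroring them onto one replicated-parity region lets that read happen with a single seek.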
Abstract:
This patent application pertains to answer model comparison. One implementation can determine a first frequency at which an individual answer category appears in an individual slot on a query results page when a first model is used. The implementation can then ascertain a second frequency at which the same answer category appears in the same slot when a second model is used, and can calibrate the second model so that the second frequency approaches the first frequency.
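One way to picture the calibration, assuming (hypothetically) that the second model shows the answer category in the slot whenever its score for a query meets a threshold: measure the first model's slot frequency, then pick the threshold so that about the same fraction of queries clears it. The threshold mechanism and the names `slot_frequency` and `calibrate_threshold` are not from the source; the abstract does not say which model parameter is calibrated.

```python
from typing import Sequence

def slot_frequency(placements: Sequence[str], category: str) -> float:
    """Fraction of result pages on which `category` occupies the slot."""
    if not placements:
        return 0.0
    return sum(1 for c in placements if c == category) / len(placements)

def calibrate_threshold(scores: Sequence[float], target_frequency: float) -> float:
    """Pick a score threshold so roughly `target_frequency` of queries meet it.
    Tied scores at the cut-off can push the realized frequency slightly higher."""
    if not scores:
        raise ValueError("need at least one score")
    ranked = sorted(scores, reverse=True)
    k = round(target_frequency * len(ranked))  # queries that should show the category
    if k <= 0:
        return ranked[0] + 1.0  # above every (finite) score: never show it
    return ranked[min(k, len(ranked)) - 1]

if __name__ == "__main__":
    first_model = ["weather", "news", "weather", "weather", "images"]
    target = slot_frequency(first_model, "weather")
    second_model_scores = [0.9, 0.2, 0.75, 0.4, 0.6]  # hypothetical per-query scores
    threshold = calibrate_threshold(second_model_scores, target)
    print(target, threshold)
```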
Abstract:
A focused random walk system produces samples of on-topic pages from a collection of hyperlinked pages such as Web pages. The system uses a focused random walk to produce a focused sample, that is, a random sample of Web pages focused on a topic. It samples pages uniformly and iteratively: each iteration follows a random link drawn from the union of the current page's in-links and out-links. The system then classifies the page reached through this randomly selected link to determine whether it is on-topic. The random walk sampling process can use either a hard-focus method, which selects only on-topic pages at each step of the walk, or a soft-focus method, which allows limited divergence to off-topic pages.
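A minimal sketch of the walk over in-links plus out-links, with a caller-supplied classifier `is_on_topic`. Setting `max_off_topic=0` gives the hard-focus behavior; a positive value is one hypothetical reading of "limited divergence" (at most that many consecutive off-topic steps). This sketch omits any correction (for example, for page degree) that truly uniform sampling would require, and all function and parameter names are illustrative.

```python
import random
from typing import Callable, Dict, List, Optional, Set

def undirected_neighbors(out_links: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Union of in-links and out-links for every page."""
    nbrs: Dict[str, Set[str]] = {}
    for page, outs in out_links.items():
        nbrs.setdefault(page, set()).update(outs)
        for target in outs:
            nbrs.setdefault(target, set()).add(page)
    return nbrs

def focused_walk(out_links: Dict[str, Set[str]], start: str,
                 is_on_topic: Callable[[str], bool], steps: int,
                 max_off_topic: int = 0, seed: Optional[int] = None) -> List[str]:
    """Random walk that records on-topic pages.
    max_off_topic=0: hard focus (only on-topic pages are entered).
    max_off_topic>0: soft focus (up to that many consecutive off-topic steps)."""
    rng = random.Random(seed)
    nbrs = undirected_neighbors(out_links)
    current, off_run, sample = start, 0, []
    for _ in range(steps):
        candidates = sorted(nbrs.get(current, set()))  # sorted for reproducibility
        if off_run >= max_off_topic:
            candidates = [p for p in candidates if is_on_topic(p)]
        if not candidates:
            break  # dead end: no admissible neighbor
        current = rng.choice(candidates)
        if is_on_topic(current):
            off_run = 0
            sample.append(current)
        else:
            off_run += 1
    return sample

if __name__ == "__main__":
    graph = {"a": {"b", "x"}, "b": {"c"}, "x": {"c"}, "c": {"a"}}
    print(focused_walk(graph, "a", lambda p: p != "x", steps=10, seed=1))
```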
Abstract:
A fault-tolerant system for storage arrays constrains the number of data elements from which each redundancy value is computed. Embodiments of the system support array sizes ranging from small to arbitrarily large and can tolerate a large number T of failures; certain embodiments can also tolerate many instances of more than T failures. The system has efficient XOR-based encoding, recovery, and updating algorithms and simple redundancy formulas, and it reduces IO seek costs for certain multiple-element sequential host updates.
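To make "constraints on the number of data elements per redundancy value" and XOR-based encode/recover/update concrete, here is a deliberately simple sketch: each parity value covers at most `k` consecutive data elements. It is not the disclosed construction; it tolerates only one lost element per group rather than T failures, and `encode`, `recover`, and `update` are illustrative names.

```python
from functools import reduce
from operator import xor
from typing import List, Optional

def encode(data: List[int], k: int) -> List[int]:
    """One XOR parity per group of at most k consecutive data elements."""
    return [reduce(xor, data[i:i + k], 0) for i in range(0, len(data), k)]

def recover(data: List[Optional[int]], parity: List[int], k: int) -> List[int]:
    """Rebuild at most one missing element (None) per group from its parity."""
    out = list(data)
    for g, p in enumerate(parity):
        idx = range(g * k, min((g + 1) * k, len(out)))
        missing = [i for i in idx if out[i] is None]
        if len(missing) > 1:
            raise ValueError(f"group {g} lost {len(missing)} elements; one parity cannot rebuild them")
        if missing:
            # XOR of the parity with the surviving members yields the lost element.
            out[missing[0]] = reduce(xor, (out[i] for i in idx if out[i] is not None), p)
    return out

def update(parity: List[int], k: int, index: int, old: int, new: int) -> None:
    """Short update: fold the change into the single affected parity value."""
    parity[index // k] ^= old ^ new

if __name__ == "__main__":
    data = [3, 5, 7, 9, 11]
    parity = encode(data, k=2)
    print(parity)
    print(recover([3, None, 7, 9, None], parity, k=2))
```

Because each parity touches at most `k` elements, a rebuild reads at most `k` surviving values and an update changes exactly one parity value, which is the kind of cost the constraint is meant to bound.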