Abstract:
Methods and apparatus are provided for automatically detecting spelling errors in one or more documents, such as documents being processed for the creation of a lexicon. According to one aspect of the invention, a spelling error is detected in one or more documents by determining if at least one given word in the one or more documents satisfies predefined misspelling criteria, wherein the predefined misspelling criteria comprise the at least one given word having a frequency below a predefined low threshold and the at least one given word being within a predefined edit distance of one or more other words in the one or more documents having a frequency above a predefined high threshold; and identifying a given word as a potentially misspelled word if the given word satisfies the predefined misspelling criteria.
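The two-part criterion described in this abstract (a rare word lying close to a frequent word) can be illustrated with a minimal sketch. The function names, the default thresholds, and the choice of Levenshtein distance as the edit distance are assumptions made for illustration only; the abstract does not specify them.

```python
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # replacement or match
        prev = curr
    return prev[-1]


def find_potential_misspellings(words, low=2, high=5, max_dist=1):
    """Flag words rarer than `low` that lie within `max_dist` edits of a
    word more frequent than `high`. All thresholds are hypothetical."""
    counts = Counter(words)
    frequent = [w for w, c in counts.items() if c > high]
    flagged = {}
    for word, count in counts.items():
        if count < low:
            near = [f for f in frequent if edit_distance(word, f) <= max_dist]
            if near:
                flagged[word] = near
    return flagged


if __name__ == "__main__":
    corpus = ["lexicon"] * 6 + ["lexicn", "graph"]
    print(find_potential_misspellings(corpus))
```

In this sketch "graph" is rare but is not flagged, because no frequent word lies within the edit-distance bound; only a word meeting both conditions is reported.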
Abstract:
Methods and apparatus are provided for performing spelling corrections using one or more variant hash tables. The spelling of at least one candidate word is corrected by obtaining at least one variant dictionary hash table based on variants of a set of known correctly spelled words, wherein the variants are obtained by applying one or more of a deletion, insertion, replacement, and transposition operation on the correctly spelled words; obtaining from the candidate word one or more lookup variants using one or more of the deletion, insertion, replacement, and transposition operations; evaluating one or more of the candidate word and the lookup variants against the at least one variant dictionary hash table; and indicating a candidate correction if there is at least one match in the at least one variant dictionary hash table.
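A minimal sketch of the variant-table lookup follows. As a simplifying assumption it uses only the deletion operation, on both the dictionary side and the candidate side; the abstract permits any combination of deletion, insertion, replacement, and transposition. A plain Python dict stands in for the variant dictionary hash table, and all function names are hypothetical.

```python
from collections import defaultdict


def deletion_variants(word):
    """All strings obtained by deleting exactly one character from `word`."""
    return {word[:i] + word[i + 1:] for i in range(len(word))}


def build_variant_table(dictionary):
    """Map each correctly spelled word, and each of its deletion variants,
    back to the word(s) that produced it."""
    table = defaultdict(set)
    for word in dictionary:
        table[word].add(word)
        for variant in deletion_variants(word):
            table[variant].add(word)
    return table


def candidate_corrections(candidate, table):
    """Look up the candidate and its deletion variants in the table and
    return every dictionary word that matches."""
    matches = set()
    for key in {candidate} | deletion_variants(candidate):
        matches |= table.get(key, set())
    return matches


if __name__ == "__main__":
    table = build_variant_table(["lexicon", "graph", "page"])
    print(candidate_corrections("lexcion", table))
```

Deleting on both sides also catches single replacements and adjacent transpositions, since both strings reduce to a shared shorter variant. It can also match some words two edits away, so a caller may wish to verify matches with an exact distance check.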
Abstract:
Faults and errors are diagnosed from a repository of directed graphs. Subsets of all the possible questions and answers in the fault diagnosis process are encoded as directed graphs. Downloading subsets from a repository to a remote user substantially reduces the number of transmissions between the user and the repository.
Abstract:
A page-ranking method includes mining a portion of content of a user workstation which is connectable to a network to detect references to pages of the network. The pages may be ranked based on the detected references.
Abstract:
A very fast method for correcting the spelling of a word or phrase in a document proceeds in two steps: first applying a very fast approximate method for eliminating most candidate words from consideration (without computing the exact edit distance between the given word whose spelling is to be corrected and any candidate word), followed by a “slow method” which computes the exact edit distance between the word whose spelling is to be corrected and each of the few remaining candidate words. The combination results in a method that is almost as fast as the fast approximate method and as exact as the slow method.
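The two-step structure can be sketched as a cheap filter followed by an exact computation. The abstract does not say which approximate method is used; the sketch below assumes a bag (character multiset) distance, which never exceeds the Levenshtein distance and so never discards a candidate that lies within the bound. The function names and the `max_dist` parameter are hypothetical.

```python
from collections import Counter


def bag_distance(a, b):
    """Cheap lower bound on Levenshtein distance, computed from character
    counts only (no alignment of the two strings is performed)."""
    ca, cb = Counter(a), Counter(b)
    return max(sum((ca - cb).values()), sum((cb - ca).values()))


def edit_distance(a, b):
    """Exact Levenshtein distance (the "slow method" in this sketch)."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, 1):
        curr = [i]
        for j, ch_b in enumerate(b, 1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def correct(word, dictionary, max_dist=2):
    """Step 1: drop candidates whose lower bound already exceeds max_dist.
    Step 2: compute the exact distance only for the survivors."""
    survivors = [w for w in dictionary if bag_distance(word, w) <= max_dist]
    scored = [(edit_distance(word, w), w) for w in survivors]
    return sorted((d, w) for d, w in scored if d <= max_dist)


if __name__ == "__main__":
    print(correct("lexcion", ["lexicon", "graph", "page", "lexical"]))
```

Because the filter is a lower bound, the final result equals what the exact method alone would return, while the quadratic-time distance is computed only for the few candidates that pass step 1.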