Abstract:
Techniques are presented to determine user-interest sensitive condensations of a passage. One or more passages are selected, and user interest information, condensation transformations and optional meaning distortion constraints are identified. The foci of user interest within the selected passages are determined based on the similarity of the elements in the selected passages to elements in the user interest information. The condensation transformations are applied to the selected passages to preferentially retain user foci while eliding less salient information. The resultant condensate signals the user-interest sensitive meaning of the passage. Meaning distortion constraints are optionally applied in conjunction with the condensation transformations, or in creating the condensation transformations, to reduce the likelihood of distorting the meaning of the passage.
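The selection of foci and the elision of less salient material can be illustrated with a minimal sketch. It assumes, purely for illustration, that a passage element is a comma- or semicolon-delimited clause, that similarity is plain word overlap with the interest terms, and that the only meaning distortion constraint is "never drop a clause containing a negation". The names `condense` and `PROTECTED` are hypothetical and do not come from the abstract.

```python
import re

# Hypothetical, very crude meaning distortion constraint: clauses containing
# a negation are never elided, since dropping them could invert the meaning.
PROTECTED = {"not", "no", "never"}

def condense(passage, interests):
    """Keep clauses that overlap the user's interest terms; elide the rest."""
    interest_set = {term.lower() for term in interests}
    clauses = [c.strip() for c in re.split(r"[,;]", passage) if c.strip()]
    kept = []
    for clause in clauses:
        words = set(re.findall(r"[a-z0-9']+", clause.lower()))
        is_focus = bool(words & interest_set)   # similarity = word overlap
        is_protected = bool(words & PROTECTED)  # distortion constraint
        if is_focus or is_protected:
            kept.append(clause)
    return ", ".join(kept)

passage = ("The drug reduced symptoms, although it did not help older "
           "patients, according to a small trial, and costs were low")
print(condense(passage, ["symptoms"]))
```

A real condensation transformation would operate on parsed linguistic structure rather than on punctuation, so this only shows the control flow: score elements against the interest information, then retain foci and constrained elements.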
Abstract:
Techniques are provided to construct and use user-interest sensitive indicators of search results. A set of documents is determined based on one or more search terms. Passages within each selected document are identified based on the search terms. Condensation transformations are applied to the passages to preferentially retain elements of the passage based on the search terms and user interest information. The resultant indicator provides a user-interest sensitive signal of the meaning of the passage.
Abstract:
A top-down technique for character text recognition of an image comprises a left-to-right analysis of each image line. A current image portion is selected. Possible text prefixes are selected from a dictionary. The upper and lower text contours of the text prefixes are compared with a bitmap of the current image portion. A distance value is generated, indicating the quality of the comparison. The prefixes are then added to an agenda of prefixes. Based on the distance value, corresponding to the similarity of the upper shapes and lower shapes of the possible prefix to the bitmap of the image portion, a list of the text prefixes generating the best distance values is selected from the agenda. From the selected list, a new list of extended text prefixes is obtained from the dictionary and added to the agenda. The process is repeated until the current image portion ends. At this point, the possible text prefix having the best total distance value is selected as the list of text characters corresponding to the image portion. The total distance value is the sum of all of the distance values of the text characters forming the text prefix. Possible text words are selected from the agenda using beam searching techniques, either against a threshold or by limiting the number of possible text prefixes selected to a predetermined number of the currently most probable text prefixes.
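The agenda-and-beam loop described above can be sketched as follows. This is an illustrative toy, not the patented method: the image portion is assumed to be already segmented into one observation per character, each observation is a hypothetical `(has_ascender, has_descender)` pair standing in for the upper and lower contours, and the distance is a simple mismatch count. The function names are assumptions.

```python
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")

def contour(ch):
    """Toy stand-in for upper/lower contour shape of a character."""
    return (1 if ch in ASCENDERS else 0, 1 if ch in DESCENDERS else 0)

def distance(ch, observed):
    """Smaller is better: mismatch between expected and observed contours."""
    upper, lower = contour(ch)
    return abs(upper - observed[0]) + abs(lower - observed[1])

def recognize(observed, dictionary, beam_width=3):
    """Left-to-right beam search over dictionary prefixes."""
    agenda = [(0, "")]  # entries are (total_distance, prefix)
    for obs in observed:
        extended = {}
        for total, prefix in agenda:
            n = len(prefix)
            for word in dictionary:
                # Extend the prefix by one character along dictionary words.
                if word.startswith(prefix) and len(word) > n:
                    candidate = word[: n + 1]
                    score = total + distance(candidate[-1], obs)
                    if candidate not in extended or score < extended[candidate]:
                        extended[candidate] = score
        # Beam pruning: keep only the currently best few prefixes.
        agenda = sorted((s, p) for p, s in extended.items())[:beam_width]
    complete = [(s, p) for s, p in agenda if p in dictionary]
    return min(complete)[1] if complete else None

# Observations shaped like "dog": ascender, plain, descender.
print(recognize([(1, 0), (0, 0), (0, 1)], ["dog", "cat", "pig", "bad"]))
```

Here the beam is limited by a fixed count; pruning against a distance threshold, the other option mentioned above, would replace the slice with a filter on the score.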
Abstract:
A data structure for use in hyphenation is created by including hyphen codes at the acceptable hyphenation points of words and then collapsing the words into a minimal state determinized FSM data structure. The transitions of the data structure are sorted so that a hyphen code that has alternatives is positioned before its alternatives. The data structure is then encoded for compactness. In searching with a word, if a mismatch occurs in the branch of the data structure that depends from a hyphen code, the search continues with its alternatives, because a match could be found in a branch depending from one of the alternatives. The data structure may be accessed with a hyphenated word to check hyphenation or spelling. It may be accessed with an unhyphenated word to retrieve its hyphenation points. It may be accessed with a number corresponding to a word to retrieve that word with its hyphenation points. Retrieved hyphenation points may be used in selecting where to hyphenate a word that has more than one hyphenation point, as in justification of text.
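The lookup behaviour, in which a hyphen code is tried before its alternatives and the search falls back to those alternatives on a mismatch, can be sketched with a plain trie. This is a simplification: the abstract describes a minimized, compactly encoded FSM, whereas the nested dictionaries below share no suffixes and are not encoded. The names `build_trie` and `hyphenate`, and the use of `None` as an end-of-word key, are assumptions for illustration.

```python
HYPHEN = "-"
END = None  # key marking that a word ends at this node

def build_trie(hyphenated_words):
    """Store words with hyphen codes at their hyphenation points."""
    root = {}
    for word in hyphenated_words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root

def hyphenate(trie, word):
    """Return the stored hyphenated form of an unhyphenated word, or None."""
    def walk(node, i):
        # The hyphen branch is tried first; if it leads to a mismatch,
        # the search continues with the alternative transitions.
        if HYPHEN in node:
            rest = walk(node[HYPHEN], i)
            if rest is not None:
                return HYPHEN + rest
        if i == len(word):
            return "" if END in node else None
        ch = word[i]
        if ch in node:
            rest = walk(node[ch], i + 1)
            if rest is not None:
                return ch + rest
        return None
    return walk(trie, 0)

trie = build_trie(["re-act", "rec-ord"])
print(hyphenate(trie, "record"))  # hyphen after "re" fails, fallback succeeds
```

In this example the branch after `re-` only leads to `act`, so looking up `record` mismatches there and continues with the alternative `c` transition.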
Abstract:
A data storage medium stores string data that can be used in character recognition and instructions for accessing the string data. The string data includes data units that can be accessed by a processor in executing the instructions. The processor can use character data indicating characters of a string to access a sequence of the data units that ends with an ending subsequence. The ending subsequence includes acceptance information indicating whether a string whose sequence of data units ends with the ending subsequence is an acceptable string. If so, the ending subsequence also includes category set information indicating a set of categories for strings whose sequences end with the ending subsequence. The categories can include words, numbers, compound words, and so forth. The acceptance information can include a bit in a character label data unit that includes information indicating the character type of an ending character. The acceptance information can also include an acceptance data unit whose value indicates an acceptable string ending. The acceptance data unit can be followed by category data units, each with a value indicating a category. The category data units can be used to obtain a bit vector for a string, each bit of which indicates whether the string is in one of the categories. For compactness, all or part of an ending subsequence can be shared by plural acceptable strings. Looping can be used to represent a category with a potentially infinite number of strings, such as numbers.
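A minimal sketch of acceptance information, category sets and looping follows. The state table, the `DIGIT` label and the category order are all hypothetical; the abstract's compact data-unit encoding and shared ending subsequences are not modelled. The sketch only shows how an ending can yield a category bit vector and how a loop lets one state represent an unbounded set of numbers.

```python
CATEGORIES = ("word", "number", "compound")  # bit 0, bit 1, bit 2

# state -> {label: next_state}; the label "DIGIT" matches any decimal digit.
TRANSITIONS = {
    "start": {"c": "c", "DIGIT": "num"},
    "c": {"a": "ca"},
    "ca": {"t": "cat"},
    "cat": {},
    "num": {"DIGIT": "num"},  # loop: covers infinitely many digit strings
}

# Ending information: only listed states are acceptable string endings,
# and each carries the set of categories for strings ending there.
ENDINGS = {"cat": ("word",), "num": ("number",)}

def lookup(string):
    """Return a category bit vector for an acceptable string, else None."""
    state = "start"
    for ch in string:
        label = "DIGIT" if ch in "0123456789" else ch
        arcs = TRANSITIONS[state]
        if label not in arcs:
            return None
        state = arcs[label]
    categories = ENDINGS.get(state)
    if categories is None:
        return None  # the string is a prefix only, not acceptable
    return sum(1 << CATEGORIES.index(name) for name in categories)

print(lookup("cat"), lookup("2024"), lookup("ca"))
```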
Abstract:
Unification of a disjunctive system is performed based on context identifiers within data structures that correspond to disjunctions. Each context identifier is a logical combination of choices, with each choice identifying one of the disjuncts of a disjunction in the system. Each choice can include a disjunction identifier and a choice identifier identifying one of the disjuncts of the identified disjunction. The logical combination of choices in a context identifier thus corresponds to a combination of disjuncts, all of which could be from different disjunctions. If two data units have context identifiers identifying contexts that are genuine alternatives, those data units are not unified. Data units that have context identifiers that are not genuine alternatives are unified. A set of context-value pairs, referred to as a disjunctive value, can be unified with another disjunctive value by considering all combinations of pairs of context identifiers that include one context identifier from each disjunctive value. The number of combinations of context identifiers in each disjunctive value is reduced by combining context-value pairs: Pairs with equal value tokens are combined by merging their context identifiers and unifying the value tokens. Pairs with f-structures as values are combined by merging context identifiers and unifying the f-structures. If it is necessary to insert a pointer, the pointer is inserted so that it initially leads to a disjunctive value, with the source of the pointer indicating which of the context-value pairs in the disjunctive value is to be accessed.
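The pairwise treatment of context identifiers can be sketched under strong simplifying assumptions: a context is modelled as a conjunction of choices, written as a `frozenset` of hypothetical `(disjunction_id, choice_id)` pairs; values are atomic tokens that unify only when equal; f-structures and pointers are left out. The function names are illustrative.

```python
def compatible(ctx_a, ctx_b):
    """Two contexts are genuine alternatives when they select different
    disjuncts of the same disjunction; otherwise they are compatible."""
    chosen = dict(ctx_a)
    return all(chosen.get(disj, choice) == choice for disj, choice in ctx_b)

def unify_disjunctive(dv_a, dv_b):
    """Unify two disjunctive values, each a list of (context, value) pairs.

    Returns a dict mapping each surviving value token to the list of merged
    contexts in which it holds (pairs with equal tokens are combined).
    """
    merged = {}
    for ctx_a, val_a in dv_a:
        for ctx_b, val_b in dv_b:
            if not compatible(ctx_a, ctx_b):
                continue  # genuine alternatives are never unified
            if val_a != val_b:
                continue  # atomic tokens clash in this combined context
            merged.setdefault(val_a, []).append(ctx_a | ctx_b)
    return merged

dv_a = [(frozenset({("d1", 1)}), "sg"), (frozenset({("d1", 2)}), "pl")]
dv_b = [(frozenset({("d2", 1)}), "sg"), (frozenset({("d1", 2)}), "pl")]
print(unify_disjunctive(dv_a, dv_b))
```

In the example, the pair with contexts `d1=1` and `d1=2` is skipped because those contexts are genuine alternatives, while `d1=1` and `d2=1` come from different disjunctions and are merged.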
Abstract:
The present invention solves a number of problems in using stems (canonical indicators of word meanings) in full-text retrieval of natural language documents, and thus permits recall to be improved without sacrificing precision. It uses various arrangements of finite-state transducers to accurately encode a number of desirable ways of mapping back and forth between words and stems, taking into account both systematic aspects of a language's morphological rule system and also the word-by-word irregularities that also occur. The techniques described apply generally across the languages of the world and are not just limited to simple suffixing languages like English. Although the resulting transducers can have many states and transitions or arcs, they can be compacted by finite-state compression algorithms so that they can be used effectively in resource-limited applications. The invention contemplates the information retrieval system comprising the novel finite state transducer as a database and a processor for responding to user queries, for searching the database, and for outputting proper responses, if they exist, as well as the novel database used in such a system and methods for constructing the novel database.
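The idea of mapping in both directions between words and stems, combining systematic rules with word-by-word irregularities, can be illustrated with a small stand-in. This is explicitly not a finite-state transducer: it uses a hypothetical suffix list, a hypothetical exception table and an inverted dictionary, and only shows how a stem relation lets a query term retrieve its morphological variants.

```python
from collections import defaultdict

IRREGULAR = {"went": "go", "mice": "mouse"}  # word-by-word exceptions
SUFFIXES = ("ing", "ed", "s")                # toy systematic rules

def stem(word):
    """Map a word to its stem: exceptions first, then suffix rules."""
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def build_relation(vocabulary):
    """Invert the mapping so a stem leads back to every word sharing it."""
    stem_to_words = defaultdict(set)
    for word in vocabulary:
        stem_to_words[stem(word)].add(word)
    return stem_to_words

def expand_query(term, stem_to_words):
    """Word -> stem -> all variant words, to improve recall."""
    return sorted(stem_to_words.get(stem(term), {term}))

vocab = ["walk", "walks", "walked", "walking", "go", "went", "mice", "mouse"]
relation = build_relation(vocab)
print(expand_query("went", relation))
print(expand_query("walking", relation))
```

A transducer encodes the same relation as a single compact network that can be run in either direction, which is what makes the compression mentioned above worthwhile.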
Abstract:
An FSM data structure is encoded by generating a transition unit of data corresponding to each transition which leads ultimately to a final state of the FSM. Information about the states is included in the transition units, so that the encoded data structure can be written without state units of data. The incoming transition units to a final state each contain an indication of finality. The incoming transition units to a state which has no outgoing transition units each contain a branch ending indication. The outgoing transition units of each state are ordered into a comparison sequence for comparison with a received element, and all but the last outgoing transition unit contain an alternative indication of a subsequent alternative outgoing transition. The indications are incorporated with the label of each transition unit into a single byte, and the remaining byte values are allocated among a number of pointer data units, some of which begin full length pointers and some of which begin pointer indexes to tables where pointers are entered. The pointers may be used where a state has a large number of incoming transitions or where the block of transition units depending from a state is broken down to speed access. The first outgoing transition unit of a state is positioned immediately after one of the incoming transitions so that it may be found without a pointer. Each alternative outgoing transition unit is stored immediately after the block beginning with the previous outgoing transition unit so that it may be found by proceeding through the transition units until the number of alternative bits and the number of branch ending bits balance.
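A reduced sketch of the state-free encoding follows, assuming a tree-shaped FSM (a trie over lowercase `a`-`z`) so that no pointer units are needed. The bit layout is an assumption chosen for illustration: five low bits for the label and one bit each for finality, branch ending and alternative. The lookup finds the next alternative by advancing until alternative bits and branch ending bits balance.

```python
FINAL, END, ALT = 0x20, 0x40, 0x80  # assumed flag bits
LABEL_MASK = 0x1F                   # low five bits: letter code 1..26

def build_trie(words):
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = True  # finality marker
    return root

def encode(node, out=None):
    """Emit one byte per transition, depth first, with no state units."""
    if out is None:
        out = bytearray()
    labels = sorted(k for k in node if k is not None)
    for i, ch in enumerate(labels):
        child = node[ch]
        unit = ord(ch) - ord("a") + 1
        if None in child:
            unit |= FINAL                 # transition enters a final state
        if not any(k is not None for k in child):
            unit |= END                   # target has no outgoing transitions
        if i < len(labels) - 1:
            unit |= ALT                   # a later alternative exists
        out.append(unit)
        encode(child, out)                # first outgoing unit follows directly
    return out

def skip_branch(units, pos):
    """From a unit with ALT set, advance until ALT and END bits balance."""
    depth = 0
    while True:
        depth += bool(units[pos] & ALT)
        depth -= bool(units[pos] & END)
        pos += 1
        if depth == 0:
            return pos

def accepts(units, word):
    if not word or not units:
        return False
    pos = 0
    for i, ch in enumerate(word):
        code = ord(ch) - ord("a") + 1
        while (units[pos] & LABEL_MASK) != code:
            if not units[pos] & ALT:
                return False
            pos = skip_branch(units, pos)
        if i == len(word) - 1:
            return bool(units[pos] & FINAL)
        if units[pos] & END:
            return False
        pos += 1
    return False

units = encode(build_trie(["ab", "ac", "d"]))
print(len(units), accepts(units, "ac"), accepts(units, "a"))
```

Because the FSM here is a tree, every state has exactly one incoming transition; shared states in a minimized FSM are where the pointer units described above would come in.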
Abstract:
A lightweight, compact, portable advertising display apparatus provides an attractive, rotatable, miniature billboard assembly for displaying advertising material to viewing areas about the advertising display apparatus. The advertising display apparatus includes a plastic shaft with a convenient handle and a special coupling head which fits into corresponding keyholes in the container assembly to provide for easy assembly and disassembly of the unit when inserting and removing different advertising material.