Abstract:
An approach is provided that returns to a user of a natural language processing (NLP) system a simplified set of text whose complexity is appropriate to the user's reading level. The approach receives a word in a first natural language and retrieves a first set of complexity data for the word in that language. The approach translates the word into one or more translated words, with each translated word corresponding to one or more second natural languages. The approach then retrieves second sets of complexity data, each set corresponding to a different translated word. Finally, the approach determines the complexity of the word in the first natural language based on an analysis of the first set and the second sets of complexity data.
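The cross-language complexity estimate can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method. The score tables `COMPLEXITY` and `TRANSLATIONS`, the 0 to 1 scale, and the equal-weight blend of source and translated scores are all hypothetical choices; the abstract only says the two kinds of complexity data are analyzed together.

```python
from statistics import mean

# Hypothetical complexity scores in [0, 1] keyed by (language, word); a real system
# might derive them from corpus frequency, word length, or syllable counts.
COMPLEXITY = {
    ("en", "ubiquitous"): 0.9,
    ("es", "ubicuo"): 0.8,
    ("fr", "omniprésent"): 0.7,
    ("en", "cat"): 0.1,
}

# Hypothetical translation table standing in for a machine translation service.
TRANSLATIONS = {
    ("en", "ubiquitous"): {"es": "ubicuo", "fr": "omniprésent"},
}


def word_complexity(word: str, lang: str, source_weight: float = 0.5) -> float:
    """Blend the word's own complexity with the mean complexity of its translations."""
    source_score = COMPLEXITY[(lang, word)]
    translations = TRANSLATIONS.get((lang, word), {})
    # Each (language, translated word) pair is looked up in the same score table.
    translated_scores = [COMPLEXITY[key] for key in translations.items() if key in COMPLEXITY]
    if not translated_scores:
        return source_score  # no second-language evidence available
    return source_weight * source_score + (1 - source_weight) * mean(translated_scores)


def needs_simplification(word: str, lang: str, reading_level: float) -> bool:
    """Flag words whose blended complexity exceeds the reader's level (same 0-1 scale)."""
    return word_complexity(word, lang) > reading_level
```

With these made-up scores, "ubiquitous" blends to roughly 0.825 and would be flagged for a reader at level 0.5, while "cat" would not.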
Abstract:
A computer-implemented method, carried out by one or more processors, is provided for consolidating an index entry of a dictionary. In an embodiment, the method comprises: receiving a set of parameters that indicates at least a prefix length and a hash value length; receiving a first term for entry into an index; converting the first term according to the set of parameters; and, responsive to determining that the converted first term is not present in the index, storing the converted first term in the index.
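A short sketch of the convert-then-store-if-absent flow, assuming (hypothetically) that conversion keeps the first `prefix_length` characters verbatim and replaces the remainder with the first `hash_length` hex digits of its SHA-256 digest. The abstract does not name a hash function or say which part of the term is hashed; `IndexParams`, `convert_term`, and `ConsolidatedIndex` are illustrative names.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class IndexParams:
    prefix_length: int  # leading characters of the term kept verbatim
    hash_length: int    # hex digits of the hash kept after the prefix


def convert_term(term: str, params: IndexParams) -> str:
    """Keep a readable prefix and replace the remainder with a truncated hash."""
    prefix = term[: params.prefix_length]
    remainder = term[params.prefix_length:]
    digest = hashlib.sha256(remainder.encode("utf-8")).hexdigest()
    return prefix + digest[: params.hash_length]


@dataclass
class ConsolidatedIndex:
    params: IndexParams
    entries: set[str] = field(default_factory=set)

    def add(self, term: str) -> bool:
        """Store the converted term only if absent; return whether it was stored."""
        converted = convert_term(term, self.params)
        if converted in self.entries:
            return False
        self.entries.add(converted)
        return True
```

Shorter hash lengths shrink the index but raise the chance that distinct terms collide and are consolidated into one entry, so the two parameters trade size against precision.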
Abstract:
An approach includes a method implemented in a computer infrastructure having computer executable code tangibly embodied in a computer readable storage medium having programming instructions. The programming instructions are configured to: receive a bilingual text that comprises a first set of characters in a Latin-based language and a second set of characters in a non-Latin-based language; convert the second set of characters in the bilingual text into a third set of characters in the Latin-based language based on a lookup table; add a prefix character and a postfix character to each converted word in the third set of characters; and output an encoded representation of the bilingual text.
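The lookup-and-mark encoding might look like the sketch below. The tiny Cyrillic-to-Latin table and the `{` and `}` marker characters are hypothetical stand-ins; the abstract specifies neither the source script nor the marker characters.

```python
# Hypothetical lookup table: a few Cyrillic letters mapped to Latin transliterations.
LOOKUP = {"п": "p", "р": "r", "и": "i", "в": "v", "е": "e", "т": "t", "м": "m"}

# Hypothetical marker characters that flag a word as converted.
PREFIX, POSTFIX = "{", "}"


def encode_bilingual(text: str) -> str:
    """Transliterate non-Latin words via LOOKUP and wrap each converted word in markers."""
    encoded = []
    for word in text.split():  # note: runs of whitespace collapse to single spaces
        if any(ch in LOOKUP for ch in word):
            converted = "".join(LOOKUP.get(ch, ch) for ch in word)
            encoded.append(f"{PREFIX}{converted}{POSTFIX}")
        else:
            encoded.append(word)  # Latin-based words pass through unchanged
    return " ".join(encoded)
```

For example, `encode_bilingual("hello привет мир")` yields `"hello {privet} {mir}"`. The markers let a downstream consumer tell a transliterated word apart from a native Latin word with the same spelling.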
Abstract:
An information management system creates data structures based entirely on the content of source files and then compares these data structures to discover synergies and commonalities. In one embodiment, the system accepts a first collection of source files and extracts text from each source file. The text is compared to tags in one or more dictionaries, each comprising a hierarchical listing of tags, and the tags matching the text are associated with each source file. The system then generates a virtual relational network in which each source file having matching tags is a node, and tags associated with two or more source files are links between the nodes. This virtual relational network may be compared with another virtual relational network to discover common nodes or links. Source files later added to a collection are massively linked by associating all tags from all source files with the newly added source file, and vice versa.
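A minimal sketch of tagging files and building the node-and-link network, under several assumptions: the dictionary contents are hypothetical, its hierarchy is simply flattened into one tag set, text matching is naive lowercase word matching, and two networks are compared by file name and file-pair links.

```python
import re
from itertools import combinations

# Hypothetical hierarchical dictionary: parent tag -> child tags.
DICTIONARY = {"science": ["physics", "chemistry"], "finance": ["bonds", "equity"]}


def flatten(dictionary: dict[str, list[str]]) -> set[str]:
    """Collect every parent and child tag into one set (the hierarchy is not used further here)."""
    tags = set(dictionary)
    for children in dictionary.values():
        tags.update(children)
    return tags


def tag_files(files: dict[str, str], dictionary: dict[str, list[str]]) -> dict[str, set[str]]:
    """Associate each source file with the dictionary tags that appear in its text."""
    tags = flatten(dictionary)
    tagged = {}
    for name, text in files.items():
        matched = set(re.findall(r"[a-z]+", text.lower())) & tags
        if matched:  # files without matching tags do not become nodes
            tagged[name] = matched
    return tagged


def build_network(tagged: dict[str, set[str]]):
    """Nodes are tagged files; each pair of files sharing tags is linked by those tags."""
    nodes = set(tagged)
    links = {}
    for a, b in combinations(sorted(tagged), 2):
        shared = tagged[a] & tagged[b]
        if shared:
            links[(a, b)] = shared
    return nodes, links


def compare_networks(net1, net2):
    """Return the nodes and links (file pairs) common to two networks."""
    (nodes1, links1), (nodes2, links2) = net1, net2
    return nodes1 & nodes2, set(links1) & set(links2)
```

Keying links by file pair keeps comparison simple; a variant keyed by tag would instead surface shared topics between collections that contain different files.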
Abstract:
Nutritional information for a food recipe is gathered to determine the recipe's nutritional value table. A computing device may extract and analyze the unstructured text of a food recipe to obtain a plurality of ingredients and the quantity of each ingredient. The computing device may also access the dietary preferences of a user. The nutritional information of the recipe may be calculated from the nutritional value of each ingredient and compiled into a nutritional value table. The computing device may determine whether the recipe corresponds with the dietary preferences of the user. If it does not, an ingredient that causes the recipe not to correspond with the dietary preferences is removed from the recipe, creating an altered recipe, and the nutritional value table of the altered recipe is displayed to the user.
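The parse, tabulate, and alter steps can be sketched as below. All nutrition figures, the preference-to-exclusion mapping, and the "N g ingredient" pattern are hypothetical; real recipe text would need far more robust parsing and unit handling.

```python
import re

# Hypothetical nutrition data per 100 g: (kcal, protein in g, fat in g).
NUTRITION_PER_100G = {
    "flour": (364.0, 10.0, 1.0),
    "sugar": (387.0, 0.0, 0.0),
    "butter": (717.0, 1.0, 81.0),
}

# Hypothetical dietary preferences mapped to the ingredients they exclude.
EXCLUSIONS = {"dairy-free": {"butter"}, "sugar-free": {"sugar"}}

# Naive pattern for quantities such as "200 g flour" in free text.
QUANTITY = re.compile(r"(\d+(?:\.\d+)?)\s*g\s+([a-z]+)")


def parse_ingredients(text: str) -> dict[str, float]:
    """Extract ingredient -> grams from unstructured recipe text."""
    return {name: float(qty) for qty, name in QUANTITY.findall(text.lower())}


def nutrition_table(ingredients: dict[str, float]) -> dict[str, float]:
    """Sum per-ingredient nutrition scaled by quantity, rounded to one decimal."""
    totals = {"kcal": 0.0, "protein_g": 0.0, "fat_g": 0.0}
    for name, grams in ingredients.items():
        kcal, protein, fat = NUTRITION_PER_100G[name]
        factor = grams / 100.0
        totals["kcal"] += kcal * factor
        totals["protein_g"] += protein * factor
        totals["fat_g"] += fat * factor
    return {key: round(value, 1) for key, value in totals.items()}


def alter_for_preferences(ingredients: dict[str, float], preferences: list[str]) -> dict[str, float]:
    """Drop every ingredient that conflicts with any of the user's preferences."""
    excluded = set().union(*(EXCLUSIONS[p] for p in preferences))
    return {name: grams for name, grams in ingredients.items() if name not in excluded}
```

For "Mix 200 g flour with 100 g sugar, then fold in 50 g butter.", the full table totals 1473.5 kcal; for a dairy-free user the butter is removed and the altered recipe's table shows 1115.0 kcal.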
Abstract:
An approach is provided in which a conversion manager receives a conversion request that identifies a conversion mode corresponding to a first category and a second category. The conversion manager identifies one or more first terms, corresponding to the first category, that are included in a page of text. The conversion manager then selects one or more second terms corresponding to the second category and replaces each first term with its corresponding second term.
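One way to picture the conversion manager is a mode-keyed term mapping applied with whole-word matching. The US-to-UK English mode and its terms are a hypothetical example of a first and second category, and matching here is case-sensitive.

```python
import re

# Hypothetical conversion modes: (first category, second category) -> term mapping.
CONVERSION_MODES = {
    ("us_english", "uk_english"): {"elevator": "lift", "truck": "lorry", "apartment": "flat"},
}


def convert_page(page: str, first: str, second: str) -> tuple[str, set[str]]:
    """Replace first-category terms in the page with their second-category counterparts."""
    mapping = CONVERSION_MODES[(first, second)]
    # Longest terms first so multi-word terms win over their sub-terms.
    alternatives = "|".join(re.escape(t) for t in sorted(mapping, key=len, reverse=True))
    pattern = re.compile(rf"\b(?:{alternatives})\b")
    found = set(pattern.findall(page))  # first terms actually present on the page
    converted = pattern.sub(lambda m: mapping[m.group(0)], page)
    return converted, found
```

Calling `convert_page("Take the elevator to the apartment.", "us_english", "uk_english")` returns the converted text "Take the lift to the flat." together with the set of first terms that were found.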
Abstract:
An encoding unit encodes each of a set of first words in a target file using a first code allocation rule. Each first word has an appearance frequency greater than that of the word positioned at a given ordinal rank in word frequency information, which records word frequencies across a plurality of files that includes the target file; the first code allocation rule is generated from this word frequency information. The encoding unit also encodes at least one second word in the target file into a code having a first code length, using a second code allocation rule that differs from the first code allocation rule. The second word has an appearance frequency smaller than that of the word positioned at the given ordinal rank.
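The two-rule scheme can be sketched with bit strings. The code lengths (8 and 16 bits), the leading-bit flag that keeps the two code spaces prefix-free, and the order-of-first-use assignment for rare words are all hypothetical choices. The abstract does not say how words exactly at the threshold frequency are handled; here they fall to the second rule.

```python
from collections import Counter

SHORT_BITS, LONG_BITS = 8, 16  # hypothetical code lengths for the two rules


def build_first_rule(files: list[str], rank: int) -> dict[str, str]:
    """First rule: short codes for words more frequent than the word at `rank`."""
    freq = Counter(word for text in files for word in text.split())
    ordered = sorted(freq, key=lambda w: (-freq[w], w))  # frequency desc, then alphabetical
    threshold = freq[ordered[rank]] if rank < len(ordered) else 0
    frequent = [w for w in ordered if freq[w] > threshold]
    if len(frequent) > 2 ** (SHORT_BITS - 1):
        raise ValueError("too many frequent words for the short code space")
    # Leading '0' marks a short code; the remaining bits are the word's position.
    return {w: "0" + format(i, f"0{SHORT_BITS - 1}b") for i, w in enumerate(frequent)}


def encode_file(text: str, first_rule: dict[str, str], second_rule: dict[str, str]) -> str:
    """Encode a target file; words outside the first rule get fixed-length second-rule codes."""
    bits = []
    for word in text.split():
        if word in first_rule:
            bits.append(first_rule[word])
        else:
            if word not in second_rule:
                # Leading '1' marks a long code; codes are assigned in order of first use.
                second_rule[word] = "1" + format(len(second_rule), f"0{LONG_BITS - 1}b")
            bits.append(second_rule[word])
    return "".join(bits)
```

Because every short code starts with 0 and every long code with 1, a decoder can read the first bit to know how many bits to consume. This sketch does not guard against more than 2^15 distinct rare words.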
Abstract:
Test cases for a text annotator are generated by determining the types of inputs to the annotator and analyzing language structures in a corpus to identify sentence types and grammar constructs. An input type can correspond to multiple grammar constructs. Test cases are generated by performing grammar tree transformations on fragments selected from the corpus based on the sentence types and the grammar constructs. Additional test cases are generated by replacing starting phrases in selected fragments with substitute phrases from dictionaries associated with the input types; a dictionary can include a false synonym for an input type for purposes of negative testing. The two generation approaches can be combined, that is, by performing one or more successive, different grammar tree transformations to yield a sentence that is then subjected to phrase substitution.
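The phrase-substitution half of the approach can be sketched as follows; grammar tree transformations are not shown. The "Medication" input type, its substitute phrases, and its false synonym are hypothetical dictionary entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneratedCase:
    text: str
    input_type: str
    expect_annotation: bool  # False for negative tests built from false synonyms


# Hypothetical per-input-type dictionaries of substitute phrases and false synonyms.
DICTIONARIES = {
    "Medication": {
        "substitutes": ["Ibuprofen", "Acetaminophen"],
        "false_synonyms": ["Aspiration"],
    },
}


def substitute_starting_phrase(fragment: str, starting_phrase: str, input_type: str) -> list[GeneratedCase]:
    """Generate positive and negative cases by swapping the fragment's starting phrase."""
    if not fragment.startswith(starting_phrase):
        return []
    rest = fragment[len(starting_phrase):]
    entry = DICTIONARIES[input_type]
    cases = [GeneratedCase(p + rest, input_type, True) for p in entry["substitutes"]]
    cases += [GeneratedCase(p + rest, input_type, False) for p in entry["false_synonyms"]]
    return cases
```

From the fragment "Aspirin was given daily." this yields two positive cases ("Ibuprofen ...", "Acetaminophen ...") and one negative case ("Aspiration ...") that the annotator should not tag as a medication.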
Abstract:
Abbreviations can be handled by a computer system that receives a message specifying a recipient and a sender. A first text portion of the message is identified as being associated with an abbreviation. Profiles of the sender and the recipient are used to identify a set of one or more solutions for the first text portion. The solutions are scored based on the online content exposure information in the recipient's profile, and, based on the scoring, a particular solution is selected for use. The text body of the message is modified to include the particular solution, and the modified message is transmitted to the recipient.
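A sketch of candidate collection and exposure-based scoring, assuming hypothetical profile dictionaries, a simple all-caps heuristic for spotting abbreviations, and scoring by a raw exposure count. Keeping the abbreviation in parentheses after the chosen expansion is a design choice, not something the abstract specifies; transmission is omitted.

```python
import re

# Hypothetical profiles: abbreviation expansions each user has used, plus counts of the
# recipient's exposure to online content on each topic.
SENDER_PROFILE = {"expansions": {"ML": ["machine learning", "markup language"]}, "exposure": {}}
RECIPIENT_PROFILE = {
    "expansions": {"ML": ["milliliter"]},
    "exposure": {"machine learning": 12, "milliliter": 3},
}


def candidate_solutions(abbrev: str, sender: dict, recipient: dict) -> list[str]:
    """Collect expansions from both profiles, preserving order and dropping duplicates."""
    solutions: list[str] = []
    for profile in (sender, recipient):
        for expansion in profile["expansions"].get(abbrev, []):
            if expansion not in solutions:
                solutions.append(expansion)
    return solutions


def expand_abbreviations(body: str, sender: dict, recipient: dict) -> str:
    """Replace each all-caps token with the expansion the recipient has seen most online."""
    def replace(match: re.Match) -> str:
        abbrev = match.group(0)
        solutions = candidate_solutions(abbrev, sender, recipient)
        if not solutions:
            return abbrev  # no known expansion; leave the text untouched
        best = max(solutions, key=lambda s: recipient["exposure"].get(s, 0))
        return f"{best} ({abbrev})"

    return re.sub(r"\b[A-Z]{2,}\b", replace, body)
```

With these profiles, "The ML results are in." becomes "The machine learning (ML) results are in.", because the recipient's exposure count for "machine learning" outscores the other candidates.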