Abstract:
Described herein are systems and methods for disambiguating entities. Information is extracted from multiple documents to generate a set of entity profiles, equivalence among the entity profiles is determined using similarity-matching algorithms, and the information in the correlated entity profiles is integrated. Also described are systems and methods for representing the entities in a document in the Resource Description Framework (RDF) and leveraging the resulting features to determine the similarity among a plurality of entities. An entity may be a person, place, location, or other entity type.
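The following Python sketch illustrates the profile-matching and integration steps. It makes several assumptions that are not part of the abstract. An entity profile is modeled as a name plus key/value attributes (the hypothetical `EntityProfile` class). Similarity is Jaccard overlap of token sets with a hypothetical threshold of 0.5. Matched profiles are grouped transitively with union-find and then merged. Jaccard and the threshold are stand-ins chosen for illustration, not the specific similarity-matching algorithms described, and the RDF representation is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class EntityProfile:
    """Hypothetical entity profile: a name plus attribute key/value pairs."""
    name: str
    attributes: dict = field(default_factory=dict)


def features(profile):
    # Feature set: lowercase name tokens plus "key=value" attribute strings.
    tokens = {t.lower() for t in profile.name.split()}
    tokens |= {f"{k}={v}".lower() for k, v in profile.attributes.items()}
    return tokens


def jaccard(a, b):
    # Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def disambiguate(profiles, threshold=0.5):
    """Group profiles whose pairwise similarity meets the (assumed) threshold,
    then integrate each group into a single merged profile."""
    parent = list(range(len(profiles)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    feats = [features(p) for p in profiles]
    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):
            if jaccard(feats[i], feats[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, p in enumerate(profiles):
        groups.setdefault(find(i), []).append(p)

    merged = []
    for members in groups.values():
        attrs = {}
        for m in members:
            for k, v in m.attributes.items():
                attrs.setdefault(k, v)  # first value wins on conflicts
        # Keep the longest name as the canonical label.
        name = max((m.name for m in members), key=len)
        merged.append(EntityProfile(name, attrs))
    return merged
```

With this feature set, "John Smith" (city Boston) and "John A. Smith" (city Boston, employer Acme) share 3 of 5 features (similarity 0.6). They are therefore merged into one profile, while an unrelated "Jane Doe" stays separate. Union-find makes matching transitive. This is convenient for a sketch, but it can chain loosely related profiles together, so a real system would likely need a stricter clustering rule.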
Abstract:
Described herein are systems and methods for transliterating and translating non-Romanized source-language text strings from a plurality of electronic sources into Romanized target-language text strings. The source text strings are converted to a standard document encoding format and split into smaller units. The smaller units are transformed into entity profiles, the entity profiles are processed with data from external databases, the entities in the entity profiles are translated into the Romanized target language, and the entities are output in a plurality of data formats for external systems.
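A minimal Python sketch of this pipeline follows. It makes several assumptions. The "standard document encoding format" is taken to be a decoded Unicode string in NFC normalization form. "Smaller units" are whitespace-delimited tokens. The external database is a hypothetical in-memory dictionary (`EXTERNAL_DB`). The romanization table (`TRANSLIT`) is a toy Cyrillic-to-Latin mapping, not any published standard. The sketch only transliterates; it does not perform translation.

```python
import csv
import io
import json
import unicodedata

# Toy Cyrillic-to-Latin table (hypothetical; covers only a few letters and
# does not implement any published romanization standard).
TRANSLIT = {
    "м": "m", "о": "o", "с": "s", "к": "k", "в": "v",
    "а": "a", "р": "r", "и": "i", "я": "ya",
}

# Hypothetical stand-in for an external reference database.
EXTERNAL_DB = {"москва": {"type": "location", "country": "RU"}}


def to_standard_encoding(raw: bytes, encoding: str) -> str:
    # Decode from the source encoding, then apply Unicode NFC normalization.
    return unicodedata.normalize("NFC", raw.decode(encoding))


def split_units(text: str) -> list:
    # Units are whitespace-delimited tokens with edge punctuation stripped.
    stripped = (t.strip(".,;:!?") for t in text.split())
    return [t for t in stripped if t]


def transliterate(word: str) -> str:
    # Map each character; characters missing from the table pass through.
    return "".join(TRANSLIT.get(ch, ch) for ch in word.lower()).capitalize()


def build_profiles(units):
    profiles = []
    for unit in units:
        profile = {"source": unit}
        profile.update(EXTERNAL_DB.get(unit.lower(), {}))  # enrichment step
        profile["romanized"] = transliterate(unit)
        profiles.append(profile)
    return profiles


def export(profiles):
    # Emit the same profiles as JSON and as CSV for downstream systems.
    as_json = json.dumps(profiles, ensure_ascii=False)
    fields = sorted({k for p in profiles for k in p})
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)  # missing keys -> ""
    writer.writeheader()
    writer.writerows(profiles)
    return as_json, buf.getvalue()
```

For example, KOI8-R bytes for "Москва, Россия" decode to two units. The units romanize to "Moskva" and "Rossiya", and only the first picks up `type`/`country` from the hypothetical database. Emitting both JSON and CSV from the same profile list is one way to serve "a plurality of data formats" without duplicating the pipeline.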