Publication (Announcement) No.: US07222297B2
Publication (Announcement) Date: 2007-05-22
Application No.: US10044913
Filing Date: 2002-01-15
IPC Classification: G06N3/00
CPC Classification: G06F17/30528 , G06F17/30011 , G06F17/30598 , G06F17/30616 , G06N5/00 , G06Q10/10 , G06Q30/02 , G06Q30/0281 , G06Q50/01 , Y10S707/99936 , Y10S707/99945
Abstract: A system, method, and processor readable medium for normalizing documents using extensible markup language (XML). The system may determine a type of object repository storing at least one object. The object may include metadata. The system may then identify the object stored in the object repository. At least one portion of the one object may be extracted from the repository, wherein the portion is extracted in extensible markup language (XML) format. Preferably, some of the metadata is preserved. The metadata preserved may include at least one of author, title, subject, date created, date modified, list of modifiers, and link list information. The portion may then be transmitted to a processor. The processor may perform one or more processes on the portion. A mapping may be performed that maps at least one field in the object with a field designation identifier. The processor may include at least one of a full-text engine, a metrics engine, and a taxonomy engine.
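The normalization flow summarized in the abstract (determine the repository type, extract part of an object as XML, preserve selected metadata, and map source fields to field designation identifiers) might look roughly like the sketch below. It is an illustration only, not the patented implementation: the repository types, the `FIELD_MAP` table, the element names, and the `normalize_object` function are all hypothetical and do not come from the patent.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from repository-specific field names to
# normalized field designation identifiers (assumed for illustration).
FIELD_MAP = {
    "file_system": {"owner": "author", "name": "title", "mtime": "date_modified"},
    "mail_store": {"from": "author", "subject": "title", "sent": "date_created"},
}


def normalize_object(repo_type, obj):
    """Return an XML string for one repository object.

    repo_type selects the field mapping; obj is a dict with a
    "metadata" dict and a "content" string (an assumed shape).
    """
    mapping = FIELD_MAP[repo_type]
    root = ET.Element("document", {"source": repo_type})
    meta = ET.SubElement(root, "metadata")
    for source_field, value in obj["metadata"].items():
        target = mapping.get(source_field)
        if target is None:
            continue  # unmapped fields are dropped in this sketch
        ET.SubElement(meta, target).text = str(value)
    ET.SubElement(root, "body").text = obj["content"]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    message = {
        "metadata": {"from": "alice", "subject": "Status", "x-spam": "0"},
        "content": "Weekly status report.",
    }
    print(normalize_object("mail_store", message))
```

The resulting XML string could then be handed to whichever downstream processor applies, such as a full-text, metrics, or taxonomy engine; those components are outside the scope of this sketch.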