System and method for adaptive sentence boundary disambiguation
    Granted invention patent (status: in force)

    Publication number: US08131546B1

    Publication date: 2012-03-06

    Application number: US11965934

    Filing date: 2007-12-28

    Applicant: Keith Zoellner

    Inventor: Keith Zoellner

    IPC classification: G10L17/00

    Abstract: Embodiments disclosed herein provide a system and method useful for pre-processing non-sentence text extracted from business documents (e.g., malformed bulleted lists, runaway sentence identification, spatially separated data, etc.). One embodiment includes two heuristic algorithms: one searches for sentences in a document, and the other looks for non-sentences (e.g., lists, tables, tabs, names of people, addresses, etc.) in the same document. In one embodiment, when malformed text is encountered, a particular character (e.g., “?”) is inserted to signal to a natural language processing layer that this set of “words” represents a logical construct and should be evaluated independently of other sentences. Embodiments disclosed herein allow non-sentence text, which is linguistically dry but contextually rich, to be included in natural language processing. Embodiments disclosed herein also help reduce false-positive concept extraction assertions by the natural language processing layer.
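    The two heuristics and the marker insertion described in this abstract can be illustrated with a minimal Python sketch. The specific rules below (bullet prefixes, tab characters, missing terminal punctuation, a four-word minimum) and the names `is_sentence`, `is_non_sentence`, and `preprocess` are illustrative assumptions, not the patented heuristics; the marker simply defaults to the abstract's example character so that each fragment is closed off as its own unit.

```python
import re

# Assumed default marker, taken from the abstract's example character.
MARKER = "?"

# Assumed bullet forms: "-", "*", "•", or "1." / "1)" at the start of a line.
BULLET = re.compile(r"^\s*(?:[-*\u2022]|\d+[.)])\s+")
# Terminal punctuation, optionally followed by closing quotes/brackets.
TERMINAL = re.compile(r"[.!?][\"')\]]*\s*$")


def is_sentence(line: str, min_words: int = 4) -> bool:
    """Heuristic 1: capitalized, long enough, ends with terminal punctuation."""
    words = line.split()
    return (
        len(words) >= min_words
        and words[0][:1].isupper()
        and TERMINAL.search(line) is not None
    )


def is_non_sentence(line: str) -> bool:
    """Heuristic 2: bulleted items, tab-separated cells, unterminated fragments."""
    return (
        BULLET.match(line) is not None
        or "\t" in line
        or TERMINAL.search(line) is None
    )


def preprocess(lines: list[str], marker: str = MARKER) -> list[str]:
    """Keep sentences as-is; close each non-sentence fragment with the marker."""
    out = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if is_sentence(line) and not is_non_sentence(line):
            out.append(line.strip())
        else:
            # Drop the bullet, turn tabs into spaces, collapse whitespace.
            text = " ".join(BULLET.sub("", line).replace("\t", " ").split())
            out.append(f"{text} {marker}")
    return out
```

    For example, `preprocess(["The contract was signed in May.", "- Acme Corp\tAustin", "Jane Doe"])` keeps the first line unchanged and turns the bulleted, tab-separated item and the bare name into `"Acme Corp Austin ?"` and `"Jane Doe ?"`, so a downstream sentence splitter treats each as a separate unit.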

    System and method for involving users in object management
    Granted invention patent (status: in force)

    Publication number: US07844582B1

    Publication date: 2010-11-30

    Application number: US11262411

    Filing date: 2005-10-28

    CPC classification: G06F17/3012 G06F21/554

    Abstract: Systems and methods are disclosed for associating objects in a managed storage environment with a user and involving that user in policy implementations or decisions associated with these objects. These systems and methods may allow a single user identity for the managed storage environment to be assigned to a user and associated with a set of user identities, each corresponding to the user's identity within a particular domain. Before or after the user's enterprise-wide identity is established, data and metadata may be obtained about objects residing in one or more domains in the enterprise. Objects within these domains can then be associated with the user through the set of user identities, and a report can be generated for the user based on these objects, including the policies associated with them.
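    The identity mapping in this abstract (one enterprise-wide identity linked to per-domain identities, which are then used to find the user's objects and report their policies) can be sketched as follows. The data model, the domain names, and the function names `associate` and `report` are hypothetical illustrations, not the patented schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ManagedObject:
    path: str
    domain: str                 # domain in which the object resides
    owner: str                  # owner identity as recorded by that domain
    policies: tuple[str, ...] = ()


@dataclass
class EnterpriseUser:
    name: str                   # the single enterprise-wide identity
    # domain name -> the user's identity within that domain
    identities: dict[str, str] = field(default_factory=dict)


def associate(user: EnterpriseUser, objects: list[ManagedObject]) -> list[ManagedObject]:
    """Return the objects whose per-domain owner matches one of the user's identities."""
    return [o for o in objects if user.identities.get(o.domain) == o.owner]


def report(user: EnterpriseUser, objects: list[ManagedObject]) -> dict[str, list[str]]:
    """Map each object path associated with the user to its policies."""
    return {o.path: list(o.policies) for o in associate(user, objects)}
```

    Matching on the (domain, owner) pair rather than on a bare account name avoids conflating two different people who happen to share a login name in unrelated domains.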

    Method and apparatus for harvesting file system metadata
    Granted invention patent (status: in force)

    Publication number: US07801894B1

    Publication date: 2010-09-21

    Application number: US11262283

    Filing date: 2005-10-28

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30091

    Abstract: A harvester is disclosed for harvesting metadata of managed objects (files and directories) across file systems that are generally not interoperable in an enterprise environment. Harvested metadata may include: 1) file system attributes such as size, owner, and recency; 2) content-specific attributes such as the presence or absence of various keywords (or combinations of keywords) within documents, as well as concepts composed of natural language entities; 3) synthetic attributes such as mathematical checksums or hashes of file contents; and 4) high-level semantic attributes that serve to classify and categorize files and documents. The classification itself can trigger an action in compliance with a policy rule. Harvested metadata are stored in a metadata repository to facilitate automated or semi-automated application of policies.
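    A single-machine sketch of the four metadata categories and the classification-triggers-policy step might look like the following. It assumes a local directory tree rather than heterogeneous enterprise file systems, uses an in-memory dict as a stand-in for the metadata repository, and reduces content analysis to lowercase keyword matching (no natural-language concept extraction). `KEYWORDS`, `POLICY_RULES`, and the function names are hypothetical.

```python
import hashlib
import os
from pathlib import Path

KEYWORDS = ("confidential", "invoice")          # hypothetical, lowercase keywords
POLICY_RULES = {"sensitive": "flag-for-review"}  # hypothetical category -> action


def harvest(path: Path, keywords=KEYWORDS) -> dict:
    """Collect the four kinds of metadata for one file."""
    st = path.stat()
    data = path.read_bytes()
    text = data.decode("utf-8", errors="ignore").lower()
    found = sorted(k for k in keywords if k in text)
    return {
        "path": str(path),
        # 1) file system attributes
        "size": st.st_size,
        "owner_uid": st.st_uid,   # POSIX owner; reported as 0 on Windows
        "mtime": st.st_mtime,     # recency
        # 2) content-specific attributes (keyword presence only)
        "keywords": found,
        # 3) synthetic attribute
        "sha256": hashlib.sha256(data).hexdigest(),
        # 4) high-level classification (toy rule)
        "category": "sensitive" if "confidential" in found else "general",
    }


def harvest_tree(root, repository: dict) -> None:
    """Walk a directory tree and store each file's metadata in the repository."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            meta = harvest(Path(dirpath) / name)
            repository[meta["path"]] = meta


def apply_policies(repository: dict) -> dict:
    """Return path -> action for every file whose category matches a policy rule."""
    return {
        p: POLICY_RULES[m["category"]]
        for p, m in repository.items()
        if m["category"] in POLICY_RULES
    }
```

    Separating harvesting from `apply_policies` mirrors the abstract's split between storing metadata in a repository and applying policies to it later, so rules can change without re-reading file contents.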

    System and method for classifying objects
    Granted invention patent (status: lapsed)

    Publication number: US07610285B1

    Publication date: 2009-10-27

    Application number: US11524831

    Filing date: 2006-09-21

    IPC classification: G06F17/30

    Abstract: Embodiments of a classification pipeline disclosed herein can both collect data as it occurs and dynamically redact it, so that ongoing statistics can be gathered and maintained while the total storage capacity dedicated to this purpose stays constrained. Various types of information can be extracted from, or obtained about, an object through the classification pipeline. In one embodiment, the classification pipeline comprises a plurality of layers implemented as a set of services available to network clients through a Web interface or an application programming interface (API). Each client can subscribe to one or more layers of the classification pipeline as needed and tailor its classification pipeline configuration through the interface. The classification pipeline can be configured to collaborate with other software to provide a consistent snapshot of the state of a network environment based on the data collected at that time.
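    The layered, subscription-based pipeline and the bounded-storage statistics can be sketched in-process as below. This is an assumption-laden illustration: layers are plain Python callables rather than Web/API services, "dynamic redaction" is modeled as keeping per-object results only in a fixed-size `deque` while aggregate counts persist, and the class and method names are hypothetical.

```python
from collections import Counter, deque
from typing import Callable

Layer = Callable[[dict], dict]  # a layer maps an object to attribute -> value


class ClassificationPipeline:
    def __init__(self, max_samples: int = 100):
        self._layers: dict[str, Layer] = {}             # insertion order = pipeline order
        self._subscriptions: dict[str, list[str]] = {}  # client -> subscribed layers
        self.counts: Counter = Counter()                # running "attr=value" tallies
        self.recent: deque = deque(maxlen=max_samples)  # bounded per-object buffer

    def add_layer(self, name: str, fn: Layer) -> None:
        self._layers[name] = fn

    def subscribe(self, client: str, layers: list[str]) -> None:
        unknown = set(layers).difference(self._layers)
        if unknown:
            raise KeyError(f"unknown layers: {sorted(unknown)}")
        # Store the subscription in pipeline order, not request order.
        self._subscriptions[client] = [n for n in self._layers if n in layers]

    def process(self, client: str, obj: dict) -> dict:
        """Run only the client's subscribed layers and update the statistics."""
        result: dict = {}
        for name in self._subscriptions.get(client, []):
            result.update(self._layers[name](obj))
        self.counts.update(f"{k}={v}" for k, v in result.items())
        self.recent.append(result)  # oldest entry is discarded once full
        return result
```

    With this design, the storage for per-object detail is capped at `max_samples` entries regardless of how many objects flow through, while the `Counter` grows only with the number of distinct attribute values observed.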