Method and system for automatically merging files into a single instance store
    121.
    Granted invention patent
    Status: in force

    Publication No.: US06389433B1

    Publication date: 2002-05-14

    Application No.: US09354660

    Filing date: 1999-07-16

    IPC classification: G06F12/00

    Abstract: A method and system that operate as a background process to automatically identify duplicate files and merge them into single-instance files, wherein the duplicate files become independent links to the single-instance files. A groveler maintains a database of information about the files on a volume, including each file's size and a checksum (signature) based on the file contents. The groveler periodically runs in the background to scan the USN log, a log that dynamically records file system activity. New or modified files detected in the USN log are queued as work items, each work item representing a file. The volume itself may also be scanned to add work items to the queue; this happens initially or when there is a potential problem with the USN log. The groveler periodically removes items from the queue, calculates the signature of the corresponding file contents, and uses the signature and file size to query the database for matching files. The groveler then compares any matching files with the file corresponding to the work item to check for an exact duplicate and, if one is found, calls a single instance store facility to merge the files and create independent links to those files.
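
    The groveler loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names (`signature`, `same_bytes`, `grovel`, `demo`), the use of SHA-256 as the signature, the in-memory dictionary standing in for the database, and the `merge` callback standing in for the single instance store facility are all assumptions. Scanning the USN log is omitted; the queue is filled by hand.

```python
import hashlib
import tempfile
from collections import deque
from pathlib import Path


def signature(path: Path) -> str:
    """Checksum of the file contents (SHA-256 is an assumption)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def same_bytes(a: Path, b: Path) -> bool:
    """Exact comparison; equal signatures alone do not prove equality."""
    return a.read_bytes() == b.read_bytes()


def grovel(work_queue: deque, database: dict, merge) -> list:
    """Drain the queue and return the (duplicate, kept) pairs passed to `merge`."""
    merged = []
    while work_queue:
        path = work_queue.popleft()
        # The database is keyed by (file size, signature), as in the abstract.
        key = (path.stat().st_size, signature(path))
        candidates = database.setdefault(key, [])
        match = next(
            (c for c in candidates if c != path and same_bytes(c, path)), None
        )
        if match is not None:
            merge(path, match)  # hypothetical stand-in for the SIS facility
            merged.append((path, match))
        elif path not in candidates:
            candidates.append(path)
    return merged


def demo() -> list:
    """Queue three small files, two of which are identical."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        files = [("a.txt", b"hello"), ("b.txt", b"hello"), ("c.txt", b"world")]
        for name, data in files:
            (root / name).write_bytes(data)
        queue = deque(root / name for name, _ in files)
        merged = grovel(queue, {}, merge=lambda dup, keep: None)
        return [(dup.name, keep.name) for dup, keep in merged]
```

    The sketch does not purge stale database entries when a file is modified; a real groveler would have to handle that case.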


    On-line mining of quantitative association rules
    122.
    Granted invention patent
    Status: expired

    Publication No.: US6092064A

    Publication date: 2000-07-18

    Application No.: US964064

    Filing date: 1997-11-04

    IPC classification: G06F19/00 G06F17/30

    Abstract: A computer method for online mining of quantitative association rules, consisting of two stages: a preprocessing stage followed by an online rule generation stage. The preprocessing stage reduces the required computational effort by organizing the relationships between antecedent attributes into a hierarchically arranged multidimensional indexing structure. The resulting structure facilitates the second stage, online processing, which generates the quantitative association rules. This second stage uses the multidimensional index structure created by the preprocessing stage: it first finds the areas in the data that correspond to the rules, then uses a merging step to create a merged tree that carefully combines interesting regions to give a hierarchical representation of the rule set. The merged tree is then used to generate the rules.
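
    The split between a one-time preprocessing pass and cheap online queries can be illustrated with a much simpler structure than the one in the patent. The sketch below uses a flat grid over two antecedent attributes instead of a hierarchical index, and it omits the merged tree entirely; `build_index`, `rule_stats`, the cell width, and the record layout `(x, y, has_consequent)` are assumptions made for illustration only.

```python
from collections import defaultdict


def build_index(records, cell):
    """Preprocessing: bucket records by grid cell of the two antecedent
    attributes, storing [record count, count with the consequent]."""
    index = defaultdict(lambda: [0, 0])
    for x, y, has_consequent in records:
        stats = index[(int(x // cell), int(y // cell))]
        stats[0] += 1
        stats[1] += int(has_consequent)
    return index


def rule_stats(index, n_records, cell, x_range, y_range):
    """Online stage: support and confidence of the rule
    'x in x_range and y in y_range => consequent'.
    Ranges are half-open [lo, hi) and assumed aligned to cell boundaries."""
    total = hits = 0
    for cx in range(int(x_range[0] // cell), int(x_range[1] // cell)):
        for cy in range(int(y_range[0] // cell), int(y_range[1] // cell)):
            cell_total, cell_hits = index.get((cx, cy), (0, 0))
            total += cell_total
            hits += cell_hits
    support = hits / n_records if n_records else 0.0
    confidence = hits / total if total else 0.0
    return support, confidence
```

    Once the index is built, each candidate rule is evaluated by summing a few cell counters rather than rescanning the records, which is the effect the preprocessing stage is meant to achieve.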


Mapping words, phrases using sequential-pattern to find user specific trends in a text database
    123.
    Granted invention patent
    Status: expired

    Publication No.: US6006223A

    Publication date: 1999-12-21

    Application No.: US909911

    Filing date: 1997-08-12

    IPC classification: G06F17/30

    Abstract: A method and apparatus for mining text databases, employing sequential pattern phrase identification and shape queries, to discover trends. The method passes over a desired database using a dynamically generated shape query. Documents within the database are selected based on specific classifications and user-defined partitions. Once a partition is specified, transaction IDs are assigned to the words in the text documents depending on their placement within each document. The transaction IDs encode the position of each word within the document and also represent sentence, paragraph, and section breaks; in one embodiment they are represented as long integers with the sentence boundaries. A maximum and minimum gap between words in the phrases may be specified, as may the minimum support that all phrases must meet for the selected time period. A generalized sequential pattern method is used to generate those phrases in each partition that meet the minimum support threshold. The shape query engine takes the set of phrases for the partition of interest and selects those that match a given shape query. A query may request a trend such as "recent upwards trend", "recent spikes in usage", "downward trends", or "resurgence of usage". Once the phrases matching the shape query are found, they are presented to the user.
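
    One way to picture transaction IDs that encode both word position and structural breaks is to number words consecutively and add a large jump at each boundary, so that a maximum-gap constraint automatically keeps phrases inside one sentence. The sketch below is only an illustration of that idea: the gap constants, the regular expressions used to split sentences and words, and the names `assign_transaction_ids` and `within_gap` are assumptions, not the encoding used in the patent, and section breaks are left out.

```python
import re

# Hypothetical offsets, chosen only so that boundaries are easy to see.
SENTENCE_GAP = 1_000
PARAGRAPH_GAP = 100_000


def assign_transaction_ids(text):
    """Return (word, transaction_id) pairs; sentence and paragraph
    boundaries appear as jumps in the ID sequence."""
    pairs, next_id = [], 0
    for paragraph in filter(None, (p.strip() for p in text.split("\n\n"))):
        for sentence in filter(None, re.split(r"(?<=[.!?])\s+", paragraph)):
            for word in re.findall(r"[A-Za-z0-9']+", sentence):
                pairs.append((word.lower(), next_id))
                next_id += 1
            next_id += SENTENCE_GAP
        next_id += PARAGRAPH_GAP
    return pairs


def within_gap(id_a, id_b, max_gap):
    """True if the second word follows the first within max_gap positions."""
    return 0 < id_b - id_a <= max_gap
```

    With these constants, any `max_gap` below `SENTENCE_GAP` rejects word pairs that straddle a sentence break, while a larger value admits phrases that span sentences within the same paragraph.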


Method and apparatus for data access and update in a shared file environment
    124.
    Granted invention patent
    Status: expired

    Publication No.: US5790848A

    Publication date: 1998-08-04

    Application No.: US384706

    Filing date: 1995-02-03

    Applicant: Scott Wlaschin

    Inventor: Scott Wlaschin

    IPC classification: G06F17/30 G06F13/00

    Abstract: A distributed storage system provides a method and apparatus for storing, retrieving, and sharing data items across multiple physical storage devices that may not always be connected with one another. The distributed storage system of the present invention comprises one or more "partitions" on distinct storage devices, with each partition comprising a group of associated data files. Partitions can be of various types. Journal partitions may be written to by a user and contain the user's updates to shared files. In the preferred embodiment, journal partitions reside on a storage device associated with a client computer in a client-server architecture. Other types of partitions, library and archive partitions, may reside on storage devices associated with a server computer in a client-server architecture. The files on the journal partitions of the various clients may, at various times, be merged into a file resident within the library partition. If two or more clients attempt to update or alter data related to the same file, the system resolves the conflict between the clients to determine which updates, if any, should be stored in the library partition. The merge operation may occur at various time intervals or be event driven. The archive partition stores files from the library partition.
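
    The journal-to-library merge can be sketched as below. The abstract does not say how conflicts are resolved, so this example assumes a simple rule (the update with the latest timestamp wins, with ties broken by client name); the `Update` record, its fields, and `merge_journals` are likewise hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    client: str
    file: str
    timestamp: int
    content: str


def merge_journals(library, journals):
    """Merge per-client journal updates into a copy of the library.

    Returns (new_library, conflicts), where conflicts lists the files
    that more than one client tried to update.
    """
    by_file = {}
    for journal in journals:
        for update in journal:
            by_file.setdefault(update.file, []).append(update)

    merged, conflicts = dict(library), []
    for name, updates in by_file.items():
        if len({u.client for u in updates}) > 1:
            conflicts.append(name)
        # Assumed policy: the latest timestamp wins, ties go to the
        # lexicographically larger client name.
        winner = max(updates, key=lambda u: (u.timestamp, u.client))
        merged[name] = winner.content
    return merged, sorted(conflicts)
```

    Because the function returns a new dictionary instead of mutating `library`, the previous library state remains available, for example to be copied into an archive partition.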


    Use of symmetric multiprocessors for multiple hypothesis tracking
    125.
    Granted invention patent
    Status: expired

    Publication No.: US5765166A

    Publication date: 1998-06-09

    Application No.: US636435

    Filing date: 1996-04-23

    IPC classification: G01S13/72 G06F17/30

    Abstract: A parallel processing approach for use in multiple hypothesis tracking applications that provides partitioning and load balancing to achieve greater processing efficiency. The present invention comprises a plurality of processors that are each coupled to a shared memory and that communicate with a central database stored in the shared memory. The central database is organized as a collection of radar tracks. Radar data is supplied to the processors as an input data stream organized in terms of radar tracks. The parallel processors are configured so that the next available processor retrieves the next successive measurement data point from the input data stream; updates tracks in the database using each retrieved measurement data point, with all processors operating independently without external synchronization; partitions the database into noninteracting clusters, with partitioning executed in parallel by the plurality of processors, which again operate independently without external synchronization; retrieves the next successive cluster; forms and selects hypotheses based on the retrieved cluster; and updates the database based on the selected hypotheses. The present invention achieves an efficient implementation of multiple hypothesis tracking to provide for real-time multiprocessing. Parallelization of non-interactive and interactive multiple hypothesis tracking functions is readily achieved using the present invention. A parallel processing method for use in multiple hypothesis tracking applications is also disclosed.
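
    The step that partitions the track database into noninteracting clusters can be illustrated with a union-find pass. The abstract does not define when two tracks interact; this sketch assumes they interact when they share at least one measurement, and it runs sequentially rather than across processors. The function name and the input layout (track ID mapped to a set of measurement IDs) are hypothetical.

```python
def partition_into_clusters(track_measurements):
    """Group tracks into clusters so that no measurement is shared
    between two different clusters (union-find over track IDs)."""
    parent = {track: track for track in track_measurements}

    def find(track):
        # Follow parent links to the cluster representative,
        # shortening the path along the way (path halving).
        while parent[track] != track:
            parent[track] = parent[parent[track]]
            track = parent[track]
        return track

    owner = {}  # measurement ID -> first track seen using it
    for track, measurements in track_measurements.items():
        for measurement in measurements:
            if measurement in owner:
                parent[find(track)] = find(owner[measurement])
            else:
                owner[measurement] = track

    clusters = {}
    for track in track_measurements:
        clusters.setdefault(find(track), []).append(track)
    return sorted(sorted(members) for members in clusters.values())
```

    Since the resulting clusters share no measurements, each one could be handed to whichever worker is free next for hypothesis formation, which matches the load-balancing pattern the abstract describes.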
