Method for analyzing data and performing lexical analysis
    1.
    Granted patent
    Method for analyzing data and performing lexical analysis (in force)

    Publication No.: US08099722B2

    Publication date: 2012-01-17

    Application No.: US11776299

    Filing date: 2007-07-11

    Applicant: John Fairweather

    Inventor: John Fairweather

    IPC classification: G06F9/45 G06F9/445

    Abstract: A system and method provide the ability to construct lexical analyzers on the fly in an efficient and pervasive manner. The system and method split the table describing the automata into two distinct tables and split the lexical analyzer into two phases, one for each table. The two phases consist of a single transition algorithm and a range transition algorithm, both of which are table driven and permit the dynamic modification of those tables during operation. A third ‘entry point’ table may also be used to speed up the process of finding the first table element from state 0 for any given input character.
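
The two-table idea in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not give the table layouts, so the dictionaries below, the class name `TwoTableLexer`, and all method names are assumptions made for illustration. Phase 1 tries an exact (state, character) transition, phase 2 falls back to a character-range transition, and an entry table short-cuts the first step out of state 0.

```python
class TwoTableLexer:
    """Toy two-phase, table-driven scanner (hypothetical layout)."""

    def __init__(self):
        self.single = {}     # phase 1: (state, char) -> next state
        self.ranges = {}     # phase 2: state -> [(low, high, next state)]
        self.entry = {}      # entry points: first char -> state reached from state 0
        self.accepting = {}  # state -> token name
        self._states = 1     # state 0 is the start state

    def new_state(self):
        state = self._states
        self._states += 1
        return state

    def add_keyword(self, word, token):
        """Add exact-match transitions for `word`; may be called between scans."""
        state = 0
        for ch in word:
            nxt = self.single.get((state, ch))
            if nxt is None:
                nxt = self.new_state()
                self.single[(state, ch)] = nxt
                if state == 0:
                    self.entry[ch] = nxt
            state = nxt
        self.accepting[state] = token

    def add_range(self, src, low, high, dst):
        """Add a range transition: any char in [low, high] moves src -> dst."""
        self.ranges.setdefault(src, []).append((low, high, dst))

    def _step(self, state, ch):
        if state == 0 and ch in self.entry:      # entry-point shortcut
            return self.entry[ch]
        nxt = self.single.get((state, ch))       # phase 1: single transition
        if nxt is not None:
            return nxt
        for low, high, dst in self.ranges.get(state, ()):  # phase 2: ranges
            if low <= ch <= high:
                return dst
        return None

    def match(self, text, pos=0):
        """Return (token, lexeme) for the longest match at `pos`, or None."""
        state, best, i = 0, None, pos
        while i < len(text):
            state = self._step(state, text[i])
            if state is None:
                break
            i += 1
            if state in self.accepting:
                best = (self.accepting[state], text[pos:i])
        return best


lexer = TwoTableLexer()
lexer.add_keyword("if", "IF")
lexer.add_keyword("int", "INT")
digits = lexer.new_state()
lexer.add_range(0, "0", "9", digits)       # first digit
lexer.add_range(digits, "0", "9", digits)  # further digits
lexer.accepting[digits] = "NUMBER"
```

Because both tables are plain data, a call such as `lexer.add_keyword("while", "WHILE")` after scanning has started changes what the next `match` call recognizes, which mirrors the dynamic modification the abstract mentions. With Python dictionaries the entry table is redundant; it is kept only to show where such a shortcut would sit in an array-based layout.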

    System and method for parsing data
    2.
    Granted patent
    System and method for parsing data (in force)

    Publication No.: US07210130B2

    Publication date: 2007-04-24

    Application No.: US10357324

    Filing date: 2003-02-03

    Applicant: John Fairweather

    Inventor: John Fairweather

    IPC classification: G06F9/45

    Abstract: A dynamically extensible approach to parsing textual input consisting of a predictive parser and associated predictive parser generator is provided. The combination, together with a plug-in/resolver architecture, provides the ability to handle a set of languages that is vastly larger than that conventionally handled by predictive parsing techniques. The generator accepts extended BNF language specifications containing embedded reverse-polish plug-in call specifications giving the plug-in number to be called as well as an arbitrary textual parameter to be passed to the plug-in. The parser supports the ability to register a ‘resolver’ function as well as one or more custom reverse-polish plug-in handlers, which are passed the textual parameter(s) specified in the extended BNF and have full control over the parsing and evaluation stacks. The ‘resolver’ is called with a ‘no action’ parameter when the parser first encounters a token in the input stream and may modify the token as necessary. The resolver is also called when the parser must evaluate or assign an entry on the evaluation stack, at which time it can implement additional behaviors depending on the language or environment. Finally, the ‘resolver’ is called when the parse terminates. The ‘resolver’ is the primary mechanism whereby more complex languages can be handled and is also a key part of connecting to external systems or storage when the parser is used in an interpreted context. The reverse-polish plug-in functions are provided with an API to allow full control over and access to the parser stacks and can rapidly be configured to implement almost any language construct.
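
As a rough illustration of the plug-in/resolver arrangement described above, the sketch below runs a tiny table-driven predictive parser over whitespace-separated tokens. Everything in it is assumed for the example: the grammar, the hand-written `TABLE` (standing in for the generator's output), the tuple encoding `(plug-in number, text parameter)` for embedded plug-in calls, and the action names `"no_action"`, `"evaluate"`, and `"terminate"`.

```python
# Hypothetical parse table for:  expr -> operand tail
#                                tail -> '+' operand <@1:add> tail
#                                      | '-' operand <@1:sub> tail | (empty)
# Tuples stand for embedded reverse-polish plug-in calls.
TABLE = {
    "expr": {"operand": ["operand", "tail"]},
    "tail": {
        "+": ["+", "operand", (1, "add"), "tail"],
        "-": ["-", "operand", (1, "sub"), "tail"],
        "$": [],
    },
}


def classify(token):
    return token if token in ("+", "-", "$") else "operand"


def parse(tokens, plugins, resolver, start="expr"):
    toks = list(tokens) + ["$"]
    pos, resolved = 0, -1
    parse_stack, eval_stack = ["$", start], []
    while parse_stack:
        top = parse_stack.pop()
        if pos > resolved:  # first encounter of this token: let the resolver see it
            toks[pos] = resolver("no_action", toks[pos])
            resolved = pos
        look = toks[pos]
        if isinstance(top, tuple):          # embedded plug-in call
            number, param = top
            plugins[number](param, eval_stack, resolver)
        elif top in TABLE:                  # nonterminal: predict a production
            rhs = TABLE[top].get(classify(look))
            if rhs is None:
                raise SyntaxError(f"unexpected {look!r} while parsing {top}")
            parse_stack.extend(reversed(rhs))
        else:                               # terminal: must match the lookahead
            if classify(look) != top:
                raise SyntaxError(f"expected {top}, got {look!r}")
            if top == "operand":
                eval_stack.append(look)
            pos += 1
    resolver("terminate", None)
    return eval_stack


def arithmetic_plugin(param, stack, resolver):
    """Plug-in 1: pop two entries, resolve them, push the result."""
    right = resolver("evaluate", stack.pop())
    left = resolver("evaluate", stack.pop())
    stack.append(left + right if param == "add" else left - right)


def make_resolver(env, log):
    def resolver(action, value):
        log.append(action)
        if action == "no_action":           # may rewrite the token
            return value.lower()
        if action == "evaluate":            # turn a stack entry into a value
            if isinstance(value, int):
                return value
            return env[value] if value in env else int(value)
        return None
    return resolver


log = []
result = parse("X + 5 - 3".split(), {1: arithmetic_plugin},
               make_resolver({"x": 10}, log))
```

Here the resolver lower-cases `X` on first sight and later looks `x` up in an environment, so the language-specific behavior lives outside the parser loop. That separation is the design point the abstract makes; the specific calling convention shown is only one way to arrange it.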

    System for exchanging binary data
    3.
    Granted patent
    System for exchanging binary data (in force)

    Publication No.: US07158984B2

    Publication date: 2007-01-02

    Application No.: US10357325

    Filing date: 2003-02-03

    Applicant: John Fairweather

    Inventor: John Fairweather

    IPC classification: G06F17/30

    Abstract: A strongly-typed, distributed, run-time system capable of describing and manipulating arbitrarily complex, non-flat, binary data derived from type descriptions in a standard (or slightly extended) programming language, including handling of type inheritance. The system is composed of four primary components. First, a plurality of databases having binary type and field descriptions. Second, a run-time modifiable type compiler that is capable of generating type databases either via explicit API calls or by compilation of unmodified header files or individual type definitions in a standard programming language. Third, a complete API suite for access to type information as well as full support for reading and writing types, type relationships and inheritance, and type fields, given knowledge of the unique numeric type ID and the field name/path. Finally, a hashing process for converting type names to unique type IDs (which may also incorporate a number of logical flags relating to the nature of the type). Further extensions and improvements are also provided as described herein.
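
The fourth component, hashing type names into numeric IDs that also carry logical flags, can be sketched as follows. The abstract does not specify the hash function, the ID width, or the flag layout, so the 64-bit BLAKE2b digest, the four low flag bits, and the flag names below are all assumptions chosen for illustration.

```python
import hashlib

FLAG_BITS = 4                       # assumed: low 4 bits of the ID hold flags
FLAG_MASK = (1 << FLAG_BITS) - 1
IS_POINTER = 0x1                    # hypothetical flag names
IS_ARRAY = 0x2


def type_id(name, flags=0):
    """Hash a type name to a 64-bit ID whose low bits carry the flags."""
    digest = hashlib.blake2b(name.encode("utf-8"), digest_size=8).digest()
    base = int.from_bytes(digest, "big") & ~FLAG_MASK  # clear the flag bits
    return base | (flags & FLAG_MASK)


def flags_of(tid):
    return tid & FLAG_MASK


def base_of(tid):
    """The part of the ID that depends only on the type name."""
    return tid & ~FLAG_MASK
```

Because the ID is derived from the name alone, two machines that hash the same type name independently arrive at the same ID without exchanging a registry, which is presumably why a hashing step suits a distributed type system. A real system would also need a policy for hash collisions, which this sketch ignores.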

    SYSTEM AND METHOD FOR ANALYZING DATA
    4.
    Patent application
    SYSTEM AND METHOD FOR ANALYZING DATA (in force)

    Publication No.: US20080016503A1

    Publication date: 2008-01-17

    Application No.: US11776299

    Filing date: 2007-07-11

    Applicant: John Fairweather

    Inventor: John Fairweather

    IPC classification: G06F9/45

    Abstract: A system and method provide the ability to construct lexical analyzers on the fly in an efficient and pervasive manner. The system and method split the table describing the automata into two distinct tables and split the lexical analyzer into two phases, one for each table. The two phases consist of a single transition algorithm and a range transition algorithm, both of which are table driven and permit the dynamic modification of those tables during operation. A third ‘entry point’ table may also be used to speed up the process of finding the first table element from state 0 for any given input character.

    System and method for managing collections of data on a network
    5.
    Granted patent
    System and method for managing collections of data on a network (in force)

    Publication No.: US07308449B2

    Publication date: 2007-12-11

    Application No.: US10357304

    Filing date: 2003-02-03

    Applicant: John Fairweather

    Inventor: John Fairweather

    IPC classification: G06F17/30

    Abstract: The present invention enables the creation, management, retrieval, and distribution of massively large collections of information that can be shared across a distributed network without building absolute references or even requiring pre-existing knowledge of the data and data structures being stored in such an environment. The system includes the following components: (1) a ‘flat’ data model wherein arbitrarily complex structures can be instantiated within a single memory allocation (including both the aggregation arrangements and the data itself, as well as any cross references between them via ‘relative’ references); (2) a run-time type system capable of defining and accessing binary strongly-typed data; (3) a set of ‘containers’ within which information encoded according to the system can be physically stored and which preferably include a memory resident form, a file-based form, and a server-based form; (4) a client-server environment that is tied to the type system and capable of interpreting and executing all necessary collection manipulations remotely; (5) a basic aggregation structure providing as a minimum ‘parent’, ‘nextChild’, ‘previousChild’, ‘firstChild’, and ‘lastChild’ links or equivalents; and (6) a data attachment structure (whose size may vary) to which strongly typed data can be attached and which is associated in some manner with (and possibly identical to) a containing aggregation node in the collection. Additional extensions and modifications to the system are also specified herein.
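
Components (1) and (5) can be pictured with a minimal sketch in which a whole tree lives in one `bytearray` and every link is stored as a signed offset relative to the node that holds it. The record layout (five 32-bit links plus a payload length), the convention that a link of 0 means "none", and the name `FlatCollection` are assumptions for illustration, not the patent's format.

```python
import struct

# Assumed node header: parent, nextChild, previousChild, firstChild, lastChild
# (signed offsets relative to this node; 0 means "no link"), then payload length.
HEADER = struct.Struct("<5iI")
PARENT, NEXT, PREV, FIRST, LAST = range(5)


class FlatCollection:
    def __init__(self):
        self.buf = bytearray()          # the single memory allocation

    def _get(self, node, link):
        return struct.unpack_from("<i", self.buf, node + 4 * link)[0]

    def _set(self, node, link, rel):
        struct.pack_into("<i", self.buf, node + 4 * link, rel)

    def add(self, payload, parent=None):
        """Append a node holding `payload`; return its offset in the buffer."""
        node = len(self.buf)
        self.buf += HEADER.pack(0, 0, 0, 0, 0, len(payload)) + payload
        if parent is not None:
            self._set(node, PARENT, parent - node)
            last = self._get(parent, LAST)
            if last == 0:               # first child of this parent
                self._set(parent, FIRST, node - parent)
            else:                       # link after the current last child
                sibling = parent + last
                self._set(sibling, NEXT, node - sibling)
                self._set(node, PREV, sibling - node)
            self._set(parent, LAST, node - parent)
        return node

    def payload(self, node):
        size = struct.unpack_from("<I", self.buf, node + 20)[0]
        start = node + HEADER.size
        return bytes(self.buf[start:start + size])

    def children(self, node):
        rel = self._get(node, FIRST)
        child = node + rel
        while rel != 0:
            yield child
            rel = self._get(child, NEXT)
            child += rel


tree = FlatCollection()
root = tree.add(b"root")
tree.add(b"a", root)
tree.add(b"b", root)

# Relocate the same bytes to a different base address: no link needs fixing.
moved = FlatCollection()
moved.buf = bytearray(b"pad") + tree.buf
```

Since no node stores an absolute address, the buffer can be written to a file, sent to a server, or embedded at another offset (as `moved` does) and still be traversed, which is the property the abstract attributes to ‘relative’ references.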

    LANGUAGE INDEPENDENT STEMMING
    6.
    Patent application
    LANGUAGE INDEPENDENT STEMMING (in force)

    Publication No.: US20080228748A1

    Publication date: 2008-09-18

    Application No.: US11687402

    Filing date: 2007-03-16

    Applicant: John Fairweather

    Inventor: John Fairweather

    IPC classification: G06F17/30

    CPC classification: G06F17/30613 G06F17/3066

    Abstract: A stemming framework for combining stemming algorithms together in a multilingual environment to obtain improved stemming behavior over any individual stemming algorithm, together with a new language independent stemming algorithm based on shortest path techniques. The stemmer essentially treats the stemming problem as a simple instance of the shortest path problem where the cost for each path can be computed from its word component and its number of characters. The goal of the stemmer is to find the shortest path to construct the entire word. The stemmer uses dynamic dictionaries constructed as lexical analyzer state transition tables to recognize the various allowable word parts for any given language in order to obtain maximum speed. The stemming framework provides the necessary logic to combine multiple stemmers in parallel and to merge their results to obtain the best behavior. Mapping dictionaries handle irregular plurals, tense, phrase mapping and proper name recognition.
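
The shortest-path view of stemming can be sketched as a search over character positions: each dictionary fragment that matches at position i is an edge from i to i plus its length, and the cheapest path from 0 to the end of the word gives the segmentation. The fragment dictionary, the fragment kinds, and the cost formula below are invented for the example (the abstract says only that cost depends on the word component and its character count), and a plain dictionary stands in for the state transition tables.

```python
# Hypothetical fragment dictionary: fragment -> kind of word component.
PARTS = {
    "un": "prefix", "re": "prefix",
    "walk": "root", "happi": "root", "play": "root",
    "ing": "suffix", "ness": "suffix", "ed": "suffix", "s": "suffix",
}
KIND_COST = {"prefix": 2.0, "root": 1.0, "suffix": 2.0}  # assumed weights


def cost(part):
    """Assumed cost: depends on component kind and on fragment length."""
    return KIND_COST[PARTS[part]] + 1.0 / len(part)


def segment(word):
    """Cheapest way to build `word` from known fragments, or None."""
    best = {0: (0.0, [])}                # position -> (cost so far, fragments)
    for i in range(len(word)):
        if i not in best:
            continue                     # position not reachable
        base_cost, path = best[i]
        for part in PARTS:
            if word.startswith(part, i):
                j = i + len(part)
                candidate = (base_cost + cost(part), path + [part])
                if j not in best or candidate[0] < best[j][0]:
                    best[j] = candidate
    return best[len(word)][1] if len(word) in best else None


def stem(word):
    """Return the first root fragment on the cheapest path, if any."""
    path = segment(word)
    if path is None:
        return None
    roots = [p for p in path if PARTS[p] == "root"]
    return roots[0] if roots else None
```

Positions only ever increase, so the graph is acyclic and a single left-to-right pass is enough; a general shortest-path algorithm would give the same result. Returning `None` for an unrecognized word leaves room for the framework described in the abstract to fall back on another stemmer.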

    Method for analyzing data and performing lexical analysis
    7.
    Granted patent
    Method for analyzing data and performing lexical analysis (in force)

    Publication No.: US07328430B2

    Publication date: 2008-02-05

    Application No.: US10357326

    Filing date: 2003-02-03

    Applicant: John Fairweather

    Inventor: John Fairweather

    IPC classification: G06F9/45 G06F9/44

    Abstract: A system and method provide the ability to construct lexical analyzers on the fly in an efficient and pervasive manner. The system and method split the table describing the automata into two distinct tables and split the lexical analyzer into two phases, one for each table. The two phases consist of a single transition algorithm and a range transition algorithm, both of which are table driven and permit the dynamic modification of those tables during operation. A third ‘entry point’ table may also be used to speed up the process of finding the first table element from state 0 for any given input character.

    Data flow scheduling environment with formalized pin-base interface and input pin triggering by data collections
    8.
    Granted patent
    Data flow scheduling environment with formalized pin-base interface and input pin triggering by data collections (in force)

    Publication No.: US07308674B2

    Publication date: 2007-12-11

    Application No.: US10357285

    Filing date: 2003-02-03

    Applicant: John Fairweather

    Inventor: John Fairweather

    IPC classification: G06F9/44

    Abstract: A system and method for implementing a data-flow based system includes three basic components: a data-flow based scheduling environment that balances the needs of data initiated program execution as a result of flows with other practical considerations such as user responsiveness, event driven invocation, user interface considerations, and the need to also support control-flow based paradigms where required; a visual programming language, based on the flow of strongly-typed run-time accessible data and data collections between small control-flow based locally and network distributed functional building-blocks, known as widgets; and a formalized pin-based interface to allow access to data-flow contents from the executing code within the widgets. The pins on the widgets include both pins used to control execution of a widget as well as pins used to receive data input from a data flow. The system and method further include a debugging environment that enables visual debugging of one or more widgets (or collections of widgets). Data control techniques include the concepts of “OR” and “AND” consumption thereby permitting either consumption immediately or only after all widget inputs have received the token. Additional extensions to this framework will also be described that relate to the environment, the programming language and the interface.
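
The “OR” and “AND” consumption rule is stated only briefly, so the sketch below is one possible reading rather than the patent's mechanism: a token on a flow fans out to several widget input pins, and the flow counts as consumed either as soon as any pin has received the token ("OR") or only once every pin has ("AND"). The class name `Flow`, the pin names, and the `consumed` property are assumptions.

```python
class Flow:
    """A token travelling to a set of widget input pins (illustrative only)."""

    def __init__(self, token, pins, mode="OR"):
        if mode not in ("OR", "AND"):
            raise ValueError("mode must be 'OR' or 'AND'")
        self.token = token
        self.mode = mode
        self.pending = set(pins)    # pins that have not yet received the token
        self.received = set()       # pins that have

    def receive(self, pin):
        """Deliver the token to `pin` and record the receipt."""
        if pin not in self.pending and pin not in self.received:
            raise KeyError(pin)
        self.pending.discard(pin)
        self.received.add(pin)
        return self.token

    @property
    def consumed(self):
        if self.mode == "OR":
            return bool(self.received)   # any one receipt is enough
        return not self.pending          # AND: every pin must have received it


or_flow = Flow("sample", {"widgetA.in", "widgetB.in"}, mode="OR")
and_flow = Flow("sample", {"widgetA.in", "widgetB.in"}, mode="AND")
or_flow.receive("widgetA.in")
and_flow.receive("widgetA.in")
```

After one delivery each, `or_flow.consumed` is true while `and_flow.consumed` stays false until `widgetB.in` has also received the token. A scheduler could use that flag to decide when a token may be released from the flow.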

    System and method for mining data

    Publication No.: US20060235811A1

    Publication date: 2006-10-19

    Application No.: US11455304

    Filing date: 2006-06-16

    Applicant: John Fairweather

    Inventor: John Fairweather

    IPC classification: G06F15/18

    Abstract: A system and method for extracting data, hereinafter referred to as MitoMine, that produces a strongly-typed ontology defined collection referencing (and cross referencing) all extracted records. The input to the mining process can be any data source, such as a text file delimited into a set of possibly dissimilar records. MitoMine contains parser routines and post processing functions, known as ‘munchers’. The parser routines can be accessed either via a batch mining process or as part of a running server process connected to a live source. Munchers can be registered on a per data-source basis in order to process the records produced, possibly writing them to an external database and/or a set of servers. The present invention also embeds an interpreted ontology based language within a compiler/interpreter (for the source format) such that the statements of the embedded language are executed as a result of the source compiler ‘recognizing’ a given construct within the source and extracting the corresponding source content. In this way, the execution of the statements in the embedded program will occur in a sequence that is dictated wholly by the source content. This system and method therefore make it possible to bulk extract free-form data from such sources as CD-ROMs, the web, etc. and have the resultant structured data loaded into an ontology based system.
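
A much-simplified sketch of the idea that embedded statements run when the source parser recognizes a construct: each rule pairs a pattern for the source format with an action that writes into the current record, and registered ‘munchers’ post-process the finished records. The rule format, the blank-line record delimiter, the field names, and the function `mine` are assumptions made for this example; regular expressions stand in for the source-format compiler and Python callables for the embedded language.

```python
import re


def set_field(name, convert=str):
    """Build an action that stores the matched text in the current record."""
    def action(record, match):
        record[name] = convert(match.group(1))
    return action


# Hypothetical rules: (pattern recognized in the source, embedded action).
RULES = [
    (re.compile(r"^Name:\s*(.+)$"), set_field("name")),
    (re.compile(r"^Born:\s*(\d+)$"), set_field("born", int)),
]


def mine(text, rules, munchers=()):
    """Extract blank-line-delimited records, then run each muncher over them."""
    records, current = [], {}
    for line in text.splitlines():
        if not line.strip():            # blank line ends the current record
            if current:
                records.append(current)
                current = {}
            continue
        for pattern, action in rules:   # the first matching rule fires
            match = pattern.match(line)
            if match:
                action(current, match)
                break
    if current:
        records.append(current)
    for muncher in munchers:
        records = [muncher(record) for record in records]
    return records


def add_century(record):
    """Example muncher: derive an extra field from an extracted one."""
    return {**record, "century": record["born"] // 100 + 1}


source = "Name: Ada\nBorn: 1815\n\nBorn: 1912\nName: Alan\n"
people = mine(source, RULES, munchers=[add_century])
```

The two records list their fields in different orders, yet both are extracted correctly because the order in which actions run follows the source text, not the rule list. That is the property the abstract describes as execution "dictated wholly by the source content".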

    System and method for managing memory

    Publication No.: US07103749B2

    Publication date: 2006-09-05

    Application No.: US10357288

    Filing date: 2003-02-03

    Applicant: John Fairweather

    Inventor: John Fairweather

    IPC classification: G06F12/00

    Abstract: A new memory tuple is described that creates both a handle as well as a reference to an item within the handle. The reference is created using an offset value that defines the physical offset of the data within the memory block. Thereafter, if references are passed in terms of their offset value, this value will be the same in any copy of the handle regardless of the machine. In a distributed computing environment, equivalence between handles is established in a single transaction between two communicating machines. Thereafter, the two machines can communicate about specific handle contents simply by using offsets.
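
The handle-plus-offset tuple can be illustrated with a small sketch in which a handle wraps one resizable memory block and a reference is simply the byte offset of an item inside it. The class name `Handle`, the length-prefixed item layout, and the use of a byte-for-byte copy to stand in for the single transfer between machines are assumptions for the example.

```python
class Handle:
    """One resizable memory block; items are addressed by offset (illustrative)."""

    def __init__(self, block=b""):
        self.block = bytearray(block)

    def append(self, data):
        """Store `data` with a 4-byte length prefix; return its offset."""
        offset = len(self.block)
        self.block += len(data).to_bytes(4, "little") + data
        return offset

    def read(self, offset):
        size = int.from_bytes(self.block[offset:offset + 4], "little")
        return bytes(self.block[offset + 4:offset + 4 + size])

    def transfer(self):
        """Simulate sending the whole block to another machine once."""
        return Handle(bytes(self.block))


local = Handle()
first = local.append(b"alpha")
second = local.append(b"beta")
remote = local.transfer()   # equivalence established in one transaction
```

After the transfer, `remote.read(second)` returns the same bytes as `local.read(second)`, so the two sides can refer to an item by passing just the integer offset. A raw pointer would not survive the copy, because the block sits at a different address on each machine.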