Method and system for searching and retrieving documents
    1.
    Granted invention patent
    Status: Lapsed

    Publication No.: US07209913B2

    Publication date: 2007-04-24

    Application No.: US10034477

    Filing date: 2001-12-28

    IPC classification: G06F17/30 G06F7/00

    Abstract: A system (100) for searching and retrieving documents includes a database (106), a memory device (108), a user interface device (102) and a controller (104). The database (106) stores documents. The memory device (108) stores software, tokens and an index. The software performs methods according to a background routine (118) and a foreground routine (116). Each token (e.g., speed) has related expressions (e.g., miles per hour, mph, kilometers per hour, and kph) assigned to it that define the token. In the index, each document containing an occurrence of one of a token's related expressions is assigned to that token. Responsive to a user interface process (120), the user interface device (102) accepts and sends search queries containing a token and receives information about the documents in which the token's related expressions occur. The controller (104) is electrically coupled to the memory device (108), the user interface device (102) and the database (106). Responsive to the foreground routine (116) in the software, the controller (104) manages communications between the memory device (108) and the user interface device (102) to answer search queries containing the token. Responsive to the background routine (118) in the software, the controller (104) also manages communications between the memory device (108) and the database (106) to create the index.
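    The split between a background routine that builds the index and a foreground routine that answers token queries can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation. The token table, the whole-word case-insensitive matching rule, and the names `TOKENS`, `build_index` and `search` are all assumptions.

```python
import re
from collections import defaultdict

# Hypothetical token table: each token maps to the related expressions that define it.
TOKENS = {
    "speed": ["miles per hour", "mph", "kilometers per hour", "kph"],
}


def build_index(documents, tokens):
    """Background routine: assign each document to every token whose related
    expressions occur in it (case-insensitive, whole-word matching)."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token, expressions in tokens.items():
            if any(re.search(r"\b" + re.escape(e) + r"\b", text, re.IGNORECASE)
                   for e in expressions):
                index[token].add(doc_id)
    return index


def search(index, token):
    """Foreground routine: answer a token query from the prebuilt index only."""
    return sorted(index.get(token, set()))


documents = {
    "d1": "The car reached 120 mph on the straight.",
    "d2": "Top speed is 200 kilometers per hour.",
    "d3": "Fuel economy improved this year.",
}
index = build_index(documents, TOKENS)
print(search(index, "speed"))  # ['d1', 'd2']
```

    All expression matching happens in `build_index`, so a query for `speed` never rescans the documents. The query still returns `d1`, even though the word "speed" never appears in it.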

    System and method for gathering, indexing, and supplying publicly available data charts

    Publication No.: US06996268B2

    Publication date: 2006-02-07

    Application No.: US10034317

    Filing date: 2001-12-28

    IPC classification: G06K9/62

    Abstract: A system, method and search engine for searching images for data contained therein. Training images are provided and image attributes are extracted from the training images. Attributes extracted from training images include image features characteristic of a particular numerically generated image type, such as horizontal lines, vertical lines, percentage white area, circular arcs and text. Then, the training images are classified according to extracted attributes and a particular classifier is selected for each group of training images. Classifiers can include classification trees, discriminant functions, regression trees, support vector machines, neural nets and hidden Markov models. Available images are collected from remotely connected computers, e.g., over the Internet. Collected images are indexed and provided for interrogation by users. As a user enters queries, indexed images are identified and returned to the user. The user may provide additional data as supplemental data to the extracted image data. A chart, representative of the supplemented data, may be generated and provided to the user in response to a particular query.
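    The attribute-extraction step can be illustrated on a toy binary image. The sketch below computes three of the attributes the abstract names: percentage white area, horizontal lines and vertical lines. It then applies a hand-written rule. The 90% line threshold, the rule and the function names are hypothetical. A real system would instead train one of the classifiers listed in the abstract.

```python
def extract_attributes(image):
    """Simple chart attributes from a binary image (rows of 1 = dark, 0 = white)."""
    height, width = len(image), len(image[0])
    dark = sum(sum(row) for row in image)
    columns = [sum(image[r][c] for r in range(height)) for c in range(width)]
    return {
        "white_fraction": 1 - dark / (height * width),
        # A "line" is a row or column that is dark along at least 90% of its length.
        "horizontal_lines": sum(1 for row in image if sum(row) >= 0.9 * width),
        "vertical_lines": sum(1 for col in columns if col >= 0.9 * height),
    }


def classify(attrs):
    """Hypothetical hand-written rule standing in for a trained classifier."""
    if attrs["horizontal_lines"] and attrs["vertical_lines"]:
        return "axis-based chart"
    return "other"


# A 5x5 toy image: a y-axis (left column), an x-axis (bottom row) and one bar.
image = [
    [1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
]
attrs = extract_attributes(image)
print(attrs, classify(attrs))
```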

    System and method for gathering, indexing, and supplying publicly available data charts
    3.
    Granted invention patent
    Status: In force

    Publication No.: US08799772B2

    Publication date: 2014-08-05

    Application No.: US11160964

    Filing date: 2005-07-18

    IPC classification: G06F17/30 G06F17/40

    Abstract: A system, method and search engine for searching images for data contained therein. Training images are provided and image attributes are extracted from the training images. Attributes extracted from training images include image features characteristic of a particular numerically generated image type, such as horizontal lines, vertical lines, percentage white area, circular arcs and text. Then, the training images are classified according to extracted attributes and a particular classifier is selected for each group of training images. Classifiers can include classification trees, discriminant functions, regression trees, support vector machines, neural nets and hidden Markov models. Available images are collected from remotely connected computers, e.g., over the Internet. Collected images are indexed and provided for interrogation by users. As a user enters queries, indexed images are identified and returned to the user. The user may provide additional data as supplemental data to the extracted image data. A chart, representative of the supplemented data, may be generated and provided to the user in response to a particular query.

    User-Guided Regular Expression Learning
    4.
    Invention patent application
    Status: In force

    Publication No.: US20100205201A1

    Publication date: 2010-08-12

    Application No.: US12369216

    Filing date: 2009-02-11

    IPC classification: G06F17/30

    CPC classification: G06F17/30985 G06F17/30648

    Abstract: A method, device, and computer program product are provided for regular expression learning. An initial regular expression may be received from a user. The initial regular expression is executed over a database. Positive matches and negative matches are labeled. The initial regular expression and the labeled positive and negative matches are input to a transformation process. The transformation process may iteratively apply character class restrictions, quantifier restrictions, and negative lookaheads to the initial regular expression to transform it into a pool of candidate regular expressions. The transformation process may apply the character class restrictions, the quantifier restrictions, and the negative lookaheads one at a time. A candidate regular expression is selected from the pool of candidate regular expressions; the selected candidate has the best F-measure in the pool.
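    The candidate-generation and selection loop can be sketched with Python's `re` module. Everything specific here is an assumption for illustration:

    - the three transformations (`\w` to `\d`, `+` to `{1,4}`, and a leading `(?!0)` lookahead);
    - applying them breadth-wise for a fixed number of rounds;
    - using `fullmatch` to decide whether a candidate still matches a labeled string.

    The F-measure is the usual harmonic mean 2PR/(P+R) of precision P and recall R over the labeled matches.

```python
import re

# Hypothetical transformations; each returns a more restrictive version of a regex.
TRANSFORMS = [
    lambda r: r.replace(r"\w", r"\d"),  # character class restriction: \w -> \d
    lambda r: r.replace("+", "{1,4}"),  # quantifier restriction: + -> {1,4}
    lambda r: r"(?!0)" + r,             # negative lookahead: reject a leading 0
]


def f_measure(pattern, positives, negatives):
    """F-measure of a pattern against labeled positive and negative matches."""
    compiled = re.compile(pattern)
    tp = sum(1 for s in positives if compiled.fullmatch(s))
    fp = sum(1 for s in negatives if compiled.fullmatch(s))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / len(positives)
    return 2 * precision * recall / (precision + recall)


def learn(initial, positives, negatives, rounds=3):
    """Grow a candidate pool by repeatedly applying each transformation,
    then return the candidate with the best F-measure."""
    pool, frontier = {initial}, {initial}
    for _ in range(rounds):
        frontier = {t(r) for r in frontier for t in TRANSFORMS} - pool
        pool |= frontier
    return max(sorted(pool), key=lambda r: f_measure(r, positives, negatives))


positives = ["2024", "1999", "2010"]        # labeled positive matches of \w+
negatives = ["abc", "12345", "x1", "0042"]  # labeled negative matches of \w+
print(learn(r"\w+", positives, negatives))  # (?!0)\d{1,4}
```

    Only a candidate that combines all three restrictions rejects every negative, which is why three rounds are needed in this toy example.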

    Asynchronous Hidden Markov Model Method and System
    5.
    Invention patent application
    Status: Pending (published)

    Publication No.: US20080215299A1

    Publication date: 2008-09-04

    Application No.: US12105430

    Filing date: 2008-04-18

    IPC classification: G06F17/18

    CPC classification: G10L15/144 G06K9/6297

    Abstract: A system, method and program storage device implementing a method for modeling a data generating process, wherein the modeling comprises observing a data sequence comprising irregularly sampled data, obtaining an observation sequence based on the observed data sequence, assigning a time index sequence to the data sequence, obtaining a hidden state sequence of the data sequence, and decoding the data sequence based on a combination of the time index sequence and the hidden state sequence to model the data sequence. The method further comprises assigning a probability distribution over time stamp values of the observation sequence, wherein the decoding comprises using a Hidden Markov Model. The method further comprises using an expectation maximization methodology to learn the Hidden Markov Model.
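    One way to combine a time index sequence with a hidden state sequence is to let the hidden chain advance once per tick of a base clock. Two observations k ticks apart are then linked by the k-th power of the transition matrix. The sketch below computes the sequence likelihood with the standard forward algorithm under that assumption. The base-clock unit, the rounding rule in `assign_time_index` and all parameter values are hypothetical. The expectation-maximization training step is not shown.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_pow(m, power):
    """m raised to a non-negative integer power (identity for power 0)."""
    result = [[1.0 if i == j else 0.0 for j in range(len(m))] for i in range(len(m))]
    for _ in range(power):
        result = mat_mul(result, m)
    return result


def assign_time_index(timestamps, unit):
    """Map irregular timestamps onto integer ticks of a hypothetical base clock."""
    return [round((t - timestamps[0]) / unit) for t in timestamps]


def forward_likelihood(obs, time_index, start, trans, emit):
    """P(obs) when the hidden chain advances (t_i - t_{i-1}) ticks between
    consecutive observations, i.e. it uses trans ** gap instead of trans."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for i in range(1, len(obs)):
        step = mat_pow(trans, time_index[i] - time_index[i - 1])
        alpha = [sum(alpha[r] * step[r][s] for r in range(n)) * emit[s][obs[i]]
                 for s in range(n)]
    return sum(alpha)


start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]  # emit[state][symbol]
times = assign_time_index([0.0, 1.1, 3.9], unit=1.0)  # [0, 1, 4]
print(times, forward_likelihood([0, 1, 1], times, start, trans, emit))
```

    With every gap equal to 1 this reduces to the ordinary forward algorithm.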

    Online analytic processing in the presence of uncertainties
    6.
    Invention patent application
    Status: Pending (published)

    Publication No.: US20070233651A1

    Publication date: 2007-10-04

    Application No.: US11395403

    Filing date: 2006-03-31

    IPC classification: G06F17/30

    CPC classification: G06F16/24556

    Abstract: Disclosed are embodiments of a method for online analytic processing of queries and, more particularly, of a method that extends the online analytic processing (OLAP) data model to represent data ambiguity, such as imprecision and uncertainty, in data values. Specifically, the embodiments of the method incorporate a statistical model that allows uncertain measures to be modeled as conditional probabilities. Additionally, an embodiment of the method identifies natural query properties (e.g., consistency and faithfulness) and uses them to shed light on alternative query semantics. Lastly, an embodiment of the method introduces an allocation-based approach to the semantics of aggregation queries over such data.
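    The allocation-based approach to aggregation can be illustrated with a single SUM query. In this sketch, an imprecise fact (one known only to lie in a region such as "East") is split across its candidate cells, and queries then run over the allocated values. Allocating in proportion to the precise facts already in each cell is one plausible policy, chosen for illustration. The abstract does not specify the allocation weights, and the names below are hypothetical.

```python
from collections import Counter, defaultdict


def allocate(facts):
    """Allocation sketch: facts are (candidate_cells, measure) pairs.
    A precise fact names one cell. An imprecise fact names several and is split
    across them in proportion to the number of precise facts in each cell
    (uniformly when none of its cells holds a precise fact)."""
    precise = Counter(cells[0] for cells, _ in facts if len(cells) == 1)
    totals = defaultdict(float)
    for cells, measure in facts:
        weights = [precise[c] for c in cells]
        if sum(weights) == 0:
            weights = [1] * len(cells)
        norm = sum(weights)
        for cell, w in zip(cells, weights):
            totals[cell] += measure * w / norm
    return dict(totals)


def sum_query(allocated, cells):
    """SUM aggregation over a region, evaluated on the allocated facts."""
    return sum(allocated.get(c, 0.0) for c in cells)


facts = [
    (("MA",), 100),
    (("MA",), 50),
    (("NY",), 80),
    (("MA", "NY"), 30),  # imprecise: known only to be somewhere in the East
]
allocated = allocate(facts)
print(allocated)                           # {'MA': 170.0, 'NY': 90.0}
print(sum_query(allocated, ["MA", "NY"]))  # 260.0
```

    The regional total is unchanged by allocation (260 before and after); only its distribution across cells depends on the chosen weights.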

    System and method for storing text annotations with associated type information in a structured data store
    7.
    Invention patent application
    Status: In force

    Publication No.: US20070168380A1

    Publication date: 2007-07-19

    Application No.: US11334255

    Filing date: 2006-01-17

    IPC classification: G06F7/00

    Abstract: A text annotation structured storage system stores text annotations with associated type information in a structured data store. The present system persists or stores annotations in a structured data store in an indexable and queryable format. Exemplary structured data stores comprise XML databases and relational databases. The system exploits type information in a type system to develop corresponding schemas in a structured data model. The system comprises techniques for mapping annotations to an XML data model and a relational data model. The system captures various features of the type system, such as complex types and inheritance, in the schema for the persistent store. In particular, the repository provides support for path navigation over the hierarchical type system starting at any type.
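    A relational mapping that keeps type inheritance queryable can be sketched with Python's built-in `sqlite3` module. One table stores the type hierarchy and one stores the annotations. A recursive common table expression then navigates from any starting type to all of its subtypes. The two-table schema, column names and sample types are assumptions for illustration, not the schema used by the patented system. The XML mapping is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE types (name TEXT PRIMARY KEY, parent TEXT REFERENCES types(name));
CREATE TABLE annotations (
    id INTEGER PRIMARY KEY,
    type_name TEXT NOT NULL REFERENCES types(name),
    doc TEXT NOT NULL,
    begin_pos INTEGER NOT NULL,
    end_pos INTEGER NOT NULL
);
""")
# Hypothetical type system with inheritance: Person and Organization are Entities.
conn.executemany("INSERT INTO types VALUES (?, ?)", [
    ("Annotation", None), ("Entity", "Annotation"), ("Person", "Entity"),
    ("Organization", "Entity"), ("Token", "Annotation"),
])
# Annotations over the document text "Alice joined IBM" (character offsets).
conn.executemany(
    "INSERT INTO annotations (type_name, doc, begin_pos, end_pos) VALUES (?, ?, ?, ?)",
    [("Person", "d1", 0, 5), ("Organization", "d1", 13, 16), ("Token", "d1", 0, 5)],
)


def annotations_of_type(conn, type_name):
    """Annotations of type_name or any of its subtypes (inheritance-aware)."""
    return conn.execute("""
        WITH RECURSIVE subtypes(name) AS (
            SELECT ?
            UNION
            SELECT t.name FROM types AS t JOIN subtypes AS s ON t.parent = s.name
        )
        SELECT a.type_name, a.doc, a.begin_pos, a.end_pos
        FROM annotations AS a JOIN subtypes AS s ON a.type_name = s.name
        ORDER BY a.doc, a.begin_pos
    """, (type_name,)).fetchall()


print(annotations_of_type(conn, "Entity"))
# [('Person', 'd1', 0, 5), ('Organization', 'd1', 13, 16)]
```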

    Method to hierarchical pooling of opinions from multiple sources
    8.
    Granted invention patent
    Status: In force

    Publication No.: US07130777B2

    Publication date: 2006-10-31

    Application No.: US10723471

    Filing date: 2003-11-26

    IPC classification: G06F17/10

    CPC classification: G06Q30/02 G06Q30/0282

    Abstract: Disclosed is a system, method, and program storage device for aggregating opinions. The method comprises:

    - consolidating a plurality of expressed opinions on various dimensions of topics as discrete probability distributions;
    - generating an aggregate opinion as a single point probability distribution by minimizing a sum of weighted divergences between a plurality of the discrete probability distributions;
    - presenting the aggregate opinion as a Bayesian network.

    The divergences comprise Kullback-Leibler divergences. The expressed opinions are generated by experts and comprise opinions on sentiments about products and services. Moreover, the aggregate opinion predicts the success of the products and services. Furthermore, the experts are arranged in a hierarchy of knowledge, wherein the knowledge comprises the various dimensions of topics on which opinions may be expressed.
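    The abstract does not say in which direction the Kullback-Leibler divergences are taken. Suppose the aggregate q minimizes the sum over experts of w_i * D(p_i || q). Then the minimizer is the weighted arithmetic mean q(x) = sum_i w_i p_i(x) / sum_i w_i (the linear opinion pool). With the reverse direction, D(q || p_i), the minimizer is instead the normalized weighted geometric mean. The sketch below assumes the first direction and uses hypothetical expert distributions and weights. The hierarchy of experts and the Bayesian-network presentation are not modeled.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in nats (requires q > 0 where p > 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def linear_pool(opinions, weights):
    """Minimiser of sum_i w_i * D(p_i || q): the weighted arithmetic mean."""
    total = sum(weights)
    return [sum(w * p[k] for p, w in zip(opinions, weights)) / total
            for k in range(len(opinions[0]))]


def objective(q, opinions, weights):
    """Weighted sum of divergences from each expert's distribution to q."""
    return sum(w * kl(p, q) for p, w in zip(opinions, weights))


# Hypothetical expert opinions over three outcomes (e.g. poor / fair / good).
experts = [[0.7, 0.2, 0.1], [0.4, 0.4, 0.2], [0.1, 0.3, 0.6]]
weights = [0.5, 0.3, 0.2]
q = linear_pool(experts, weights)
print([round(x, 2) for x in q])  # [0.49, 0.28, 0.23]
```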

    Surfaid predictor: web-based system for predicting surfer behavior
    9.
    Granted invention patent
    Status: In force

    Publication No.: US06338066B1

    Publication date: 2002-01-08

    Application No.: US09160828

    Filing date: 1998-09-25

    IPC classification: G06F15163

    Abstract: Given a log of previous web-surfer behavior listing the order in which each surfer downloaded specific items at the web site, and given a meaningful classification of those same items, future surfer behavior is predicted by the present invention. The algorithm utilizes a quantitative model relating items downloaded prior to some specified event to items downloaded after that same event. When the model is applied to a new surfer's session prior to an analogous event, the present invention predicts the likely behavior of the surfer subsequent to that event. The predicted behavior is then further analyzed to derive a quantitative value for the utility of the expected behavior. By randomly selecting sample sessions from a web log, multiple models of surfer behavior can be generated. The multiple models can then be applied to a new surfer's session to produce a predicted behavior/utility distribution and thus a confidence interval for the predicted behavior/utility.
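    The model-averaging idea can be sketched with the standard library: fit many models on random samples of the web log, apply each to a new session, and read a confidence interval off the resulting spread of predictions. The following are all assumptions for illustration:

    - the per-category mean-utility model;
    - bootstrap resampling with replacement;
    - the 90% percentile interval;
    - the sample sessions.

    The abstract does not describe the patent's own quantitative model.

```python
import random
from statistics import mean, quantiles


def fit(sessions):
    """Hypothetical model: mean post-event utility for each pre-event category."""
    by_category = {}
    for before, utility in sessions:
        for category in before:
            by_category.setdefault(category, []).append(utility)
    return {c: mean(u) for c, u in by_category.items()}


def predict(model, before, default=0.0):
    """Predicted utility for a session from the categories it visited before the event."""
    scores = [model[c] for c in before if c in model]
    return mean(scores) if scores else default


def bootstrap_interval(sessions, new_before, n_models=200, seed=0):
    """Fit one model per bootstrap resample of the log; return a 90% interval."""
    rng = random.Random(seed)
    predictions = [
        predict(fit(rng.choices(sessions, k=len(sessions))), new_before)
        for _ in range(n_models)
    ]
    cuts = quantiles(predictions, n=20)  # 5th, 10th, ..., 95th percentiles
    return cuts[0], cuts[-1]


# (pre-event categories, post-event utility); 1.0 = hypothetical high-value download.
sessions = [
    ({"sports", "news"}, 1.0),
    ({"news"}, 0.0),
    ({"sports"}, 1.0),
    ({"weather"}, 0.0),
    ({"sports", "weather"}, 1.0),
    ({"news", "weather"}, 0.0),
]
print(bootstrap_interval(sessions, {"news", "weather"}))
```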

    Method and apparatus for cluster exploration and visualization
    10.
    Granted invention patent
    Status: Lapsed

    Publication No.: US6100901A

    Publication date: 2000-08-08

    Application No.: US102087

    Filing date: 1998-06-22

    IPC classification: G06T5/00 G06T11/20

    Abstract: A method and apparatus for visualizing a multi-dimensional data set in which the multi-dimensional data set is clustered into k clusters, with each cluster having a centroid. Then, either two distinct current centroids or three distinct non-collinear current centroids are selected. A current 2-dimensional cluster projection is generated based on the selected current centroids. In the case when two distinct current centroids are selected, two distinct target centroids are selected, with at least one of the two target centroids being different from the two current centroids. In the case when three distinct current centroids are selected, three distinct non-collinear target centroids are selected, with at least one of the three target centroids being different from the three current centroids. An intermediate 2-dimensional cluster projection is generated based on a set of interpolated centroids, with each interpolated centroid corresponding to a current centroid and to a target centroid associated with the current centroid. Each interpolated centroid is interpolated between the corresponding current centroid and the target centroid associated with the current centroid. Alternatively, the intermediate 2-dimensional cluster projection is generated based on an interpolated 2-dimensional nonlinear cluster projection that is based on the selected current centroids and the selected target centroids.
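    For the three-centroid case, one concrete reading is to project every point onto the plane through the three selected centroids. Intermediate frames then come from moving each current centroid linearly toward its target before re-projecting. The sketch below assumes an orthonormal basis built by Gram-Schmidt, with the first centroid as origin. The function names are hypothetical, and the two-centroid and nonlinear-projection variants are not shown. Interpolated centroids must stay non-collinear for the basis to exist.

```python
import math


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def scale(a, s):
    return [x * s for x in a]


def plane_basis(c1, c2, c3):
    """Orthonormal basis (e1, e2) of the plane through three non-collinear centroids."""
    u, v = sub(c2, c1), sub(c3, c1)
    e1 = scale(u, 1 / math.sqrt(dot(u, u)))
    w = sub(v, scale(e1, dot(v, e1)))  # Gram-Schmidt: remove the e1 component of v
    e2 = scale(w, 1 / math.sqrt(dot(w, w)))
    return e1, e2


def project(points, centroids):
    """2-D coordinates of each point in the centroid plane, origin at the first centroid."""
    c1 = centroids[0]
    e1, e2 = plane_basis(*centroids)
    return [(dot(sub(p, c1), e1), dot(sub(p, c1), e2)) for p in points]


def interpolate(current, target, t):
    """Centroids moved a fraction t (0 <= t <= 1) of the way from current to target."""
    return [[(1 - t) * a + t * b for a, b in zip(c, g)] for c, g in zip(current, target)]


points = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
current = [[0, 0, 0], [1, 0, 0], [0, 1, 0]]
target = [[0, 0, 0], [1, 0, 0], [0, 0, 1]]
for t in (0.0, 0.5, 1.0):  # current, intermediate and target projections
    print(t, project(points, interpolate(current, target, t)))
```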
