System and method for constraint-based rule mining in large, dense data-sets
    4.
    Granted patent
    System and method for constraint-based rule mining in large, dense data-sets (status: in force)

    Publication No.: US06278997B1

    Publication date: 2001-08-21

    Application No.: US09245319

    Filing date: 1999-02-05

    IPC classification: G06F17/30

    Abstract: A dense data-set mining system and method are provided that directly exploit all user-specified constraints, including minimum support, minimum confidence, and a new constraint, known as minimum gap, which prunes any rule having conditions that do not contribute to its predictive accuracy. The method maintains efficiency even at low supports on data that is dense in the sense that many items appear with high frequency (e.g., relational data).

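A minimal Python sketch of how the three constraints named in the abstract above could be checked for one candidate rule. The abstract does not define the measures, so the definitions here are assumptions: support is the number of rows containing both the antecedent and the consequent, confidence is the fraction of antecedent-matching rows that also contain the consequent, and gap is the rule's confidence minus the highest confidence of any rule built from a proper subset of its conditions. All function names and the sample rows are hypothetical. This brute-force check is for illustration only and is not the mining method the abstract describes.

```python
from itertools import combinations


def support(rows, items):
    """Number of rows that contain every item in `items`."""
    return sum(1 for row in rows if items <= row)


def confidence(rows, antecedent, consequent):
    """Fraction of rows matching the antecedent that also contain the consequent."""
    matched = support(rows, antecedent)
    if matched == 0:
        return 0.0
    return support(rows, antecedent | {consequent}) / matched


def gap(rows, antecedent, consequent):
    """Rule confidence minus the best confidence of any proper sub-rule.

    ASSUMPTION: this is one plausible reading of "minimum gap"; a small gap
    means some condition adds little predictive accuracy.
    """
    best_sub = max(
        (
            confidence(rows, frozenset(sub), consequent)
            for size in range(len(antecedent))  # proper subsets only
            for sub in combinations(antecedent, size)
        ),
        default=0.0,  # a rule with no conditions has no sub-rule
    )
    return confidence(rows, antecedent, consequent) - best_sub


def passes(rows, antecedent, consequent, min_sup, min_conf, min_gap):
    """True if the rule satisfies all three user-specified constraints."""
    return (
        support(rows, antecedent | {consequent}) >= min_sup
        and confidence(rows, antecedent, consequent) >= min_conf
        and gap(rows, antecedent, consequent) >= min_gap
    )


# Hypothetical sample data: each row is the set of items it contains.
rows = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "c"}),
    frozenset({"a"}),
    frozenset({"b", "c"}),
    frozenset({"b"}),
    frozenset({"a", "b", "c"}),
]
rule = frozenset({"a", "b"})
print(passes(rows, rule, "c", min_sup=2, min_conf=0.9, min_gap=0.2))
```

In this sample, the rule {a, b} -> c holds in every row that matches it, while {a} -> c and {b} -> c each hold in three of four matching rows, so both conditions contribute to the rule's accuracy.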

    Multilevel taxonomy based on features derived from training documents classification using fisher values as discrimination values
    5.
    Granted patent
    Multilevel taxonomy based on features derived from training documents classification using fisher values as discrimination values (status: lapsed)

    Publication No.: US06233575B1

    Publication date: 2001-05-15

    Application No.: US09102861

    Filing date: 1998-06-23

    IPC classification: G06F17/30

    Abstract: A system, process, and article of manufacture for organizing a large text database into a hierarchy of topics and for maintaining this organization as documents are added and deleted and as the topic hierarchy changes. Given sample documents belonging to various nodes in the topic hierarchy, the tokens (terms, phrases, dates, or other usable features in the document) that are most useful at each internal decision node for routing new documents to the children of that node are automatically detected. Using these feature terms, statistical models are constructed for each topic node. The models are used in an estimation technique to assign topic paths to new unlabeled documents. The hierarchical technique, in which feature terms can be very different at different nodes, leads to an efficient context-sensitive classification technique. The hierarchical technique can handle millions of documents and tens of thousands of topics. A resulting taxonomy and path enhanced retrieval system (TAPER) is used to generate context-dependent document indexing terms. The topic paths are used, in addition to keywords, for better-focused searching and browsing of the text database.

    Determining query intent
    6.
    Granted patent
    Determining query intent (status: in force)

    Publication No.: US08612432B2

    Publication date: 2013-12-17

    Application No.: US12816389

    Filing date: 2010-06-16

    IPC classification: G06F7/00 G06F17/30 G06F15/18

    CPC classification: G06F17/30979

    Abstract: A tree structure has a node associated with each category of a hierarchy of item categories. Child nodes of the tree are associated with sub-categories of the categories associated with their parent nodes. Training data, including received queries and an indicator of the selected item category for each received query, is combined with the tree structure by associating each query with the node corresponding to the selected category of the query. When a query is received, a classifier is applied to the nodes to generate a probability that the query is intended to match an item of the category associated with the node. The classifier is applied until the probability is below a threshold. One or more categories associated with the nodes that are closest to the intent of the received query are selected, and indicators of items of those categories that match the received query are output.

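The descent described in the abstract above can be sketched in Python as follows, under stated assumptions. The abstract does not specify the classifier, so `overlap_probability` is a toy stand-in (the share of query tokens seen in training queries attached to a node's subtree). The `Node` class, the function names, and the sample tree are all hypothetical. The sketch also assumes that descent stops at a node when none of its children reaches the threshold, and that this node's category is the one selected.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    category: str
    queries: List[str] = field(default_factory=list)  # training queries for this category
    children: List["Node"] = field(default_factory=list)


def subtree_queries(node: Node) -> List[str]:
    """Training queries attached to the node or to any of its descendants."""
    collected = list(node.queries)
    for child in node.children:
        collected.extend(subtree_queries(child))
    return collected


def overlap_probability(query: str, node: Node) -> float:
    """Toy stand-in classifier (ASSUMPTION): share of query tokens seen in the subtree."""
    tokens = query.lower().split()
    if not tokens:
        return 0.0
    seen = {t for q in subtree_queries(node) for t in q.lower().split()}
    return sum(t in seen for t in tokens) / len(tokens)


def select_categories(
    root: Node,
    query: str,
    classify: Callable[[str, Node], float],
    threshold: float,
) -> List[str]:
    """Descend while some child scores at or above the threshold; collect the stopping nodes."""
    selected: List[str] = []

    def descend(node: Node) -> None:
        passing = [c for c in node.children if classify(query, c) >= threshold]
        if not passing:
            selected.append(node.category)  # no child is likely enough: stop here
        for child in passing:
            descend(child)

    descend(root)
    return selected


# Hypothetical category tree with training queries on the leaves.
root = Node("all", children=[
    Node("electronics", children=[
        Node("cameras", queries=["digital camera", "camera lens"]),
        Node("phones", queries=["smart phone case"]),
    ]),
    Node("clothing", queries=["winter jacket"]),
])
print(select_categories(root, "camera lens cap", overlap_probability, 0.5))
```

Passing the classifier in as a callable keeps the traversal independent of the model; any function that returns a probability for a (query, node) pair could be substituted.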

    Methods and systems for visually distinguishing user attribute similarities and differences
    7.
    Granted patent
    Methods and systems for visually distinguishing user attribute similarities and differences (status: in force)

    Publication No.: US08413060B1

    Publication date: 2013-04-02

    Application No.: US12000846

    Filing date: 2007-12-18

    Applicant: Rakesh Agrawal

    Inventor: Rakesh Agrawal

    IPC classification: G06F3/00 G06F15/16

    CPC classification: H04L51/04

    Abstract: Methods, computer-readable storage media, and systems are provided to facilitate visually distinguishing common attributes of users of an electronic communication network or messaging service. In particular, user profile attributes are compared between a first and a second user, and similar attributes are visually highlighted by assigning, for example, a distinct font, font size, color, font effect, and/or other visual effect to the user's screen name to designate which attributes are similar. In addition, or alternatively, when the first user views a user profile of the second user, common user attributes are visually highlighted. In one embodiment, the font, font size, color, and/or font effect assigned to the highlighted attribute indicates a degree of similarity of the attribute. Such implementations may allow users to more easily recognize and interact with others who have similar interests and attributes.

    Middleware for query processing across a network of RFID databases
    8.
    Granted patent
    Middleware for query processing across a network of RFID databases (status: lapsed)

    Publication No.: US08244747B2

    Publication date: 2012-08-14

    Application No.: US11566931

    Filing date: 2006-12-05

    IPC classification: G06F17/30

    CPC classification: G06F17/30448 G06F17/30557

    Abstract: An implementation in which RFID data is shared across independent organizations is addressed. RFID data is usually spread across different parties, e.g., enterprises in a supply chain, and thus efficient query processing across all parties is required. Traceability is emerging as one of the key applications of RFID technology. A generic data model is introduced for querying RFID data across a network of independently operated data sources. The model can be used to facilitate traceability query processing and to give a set of representative traceability queries. A newly designed process-and-forward approach is implemented for executing traceability queries.

    OBJECT CLASSIFICATION USING TAXONOMIES
    9.
    Patent application
    OBJECT CLASSIFICATION USING TAXONOMIES (status: in force)

    Publication No.: US20100185577A1

    Publication date: 2010-07-22

    Application No.: US12414065

    Filing date: 2009-03-30

    IPC classification: G06N5/02

    CPC classification: G06N99/005

    Abstract: As provided herein, objects from a source catalog, such as a provider's catalog, can be added to a target catalog, such as an enterprise master catalog, in a scalable manner utilizing catalog taxonomies. A baseline classifier determines probabilities of source objects belonging to target catalog classes. Source objects can be assigned to those classes with probabilities that meet a desired threshold and meet a desired rate. A classification cost for target classes can be determined for respective unassigned source objects, which can comprise determining an assignment cost and a separation cost for the source objects for respective desired target classes. The separation and assignment costs can be combined to determine the classification cost, and the unassigned source objects can be assigned to those classes having a desired classification cost.

    CUSTOMIZED SEARCH
    10.
    Patent application
    CUSTOMIZED SEARCH (status: in force)

    Publication No.: US20100114925A1

    Publication date: 2010-05-06

    Application No.: US12253658

    Filing date: 2008-10-17

    IPC classification: G06F17/30

    CPC classification: G06F17/30864 G06F17/30477

    Abstract: Techniques are disclosed herein for providing a custom search engine. In one aspect, a first search query is received from a requester. First search results containing search result items that match the first search query are obtained. At least one sub-query is generated from the first search results. The generating is based on rules for a particular custom search engine. Second search results that match the sub-query are then obtained. A search result set is formed from a corpus that includes the first search results and the second search results. The generating of the search result set is based on the rules for the particular custom search engine. The search result set is provided to the requester. In one aspect, an interface for designing a custom search engine is provided. The interface allows the designer to specify the layout of a search results page.

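The two-stage flow in the abstract above (first query, rule-generated sub-queries, combined result set) can be sketched in Python as follows. This is an illustration under assumptions, not the disclosed implementation: the in-memory corpus, the substring-matching `search` function, and the two rule functions (`brand_lens_subqueries`, `by_id`) are all hypothetical stand-ins for whatever rules a particular custom search engine defines.

```python
from typing import Callable, Dict, List

Doc = Dict[str, object]


def search(corpus: List[Doc], query: str) -> List[Doc]:
    """Toy matcher (ASSUMPTION): a document matches if it contains every query term."""
    terms = query.lower().split()
    return [d for d in corpus if all(t in str(d["text"]).lower() for t in terms)]


def custom_search(
    corpus: List[Doc],
    query: str,
    make_subqueries: Callable[[List[Doc]], List[str]],
    build_result_set: Callable[[List[Doc]], List[Doc]],
) -> List[Doc]:
    """Run the first query, derive sub-queries by rule, and combine both result lists."""
    first = search(corpus, query)
    second: List[Doc] = []
    for sub_query in make_subqueries(first):  # rule-driven sub-query generation
        second.extend(search(corpus, sub_query))
    # Pool first and second results, dropping duplicates by document id.
    pooled = {d["id"]: d for d in first + second}
    return build_result_set(list(pooled.values()))  # rule-driven final selection


# Hypothetical rules for one custom search engine.
def brand_lens_subqueries(results: List[Doc]) -> List[str]:
    """For each first-stage hit, also look for lenses of the same brand."""
    return [f'{d["brand"]} lens' for d in results]


def by_id(docs: List[Doc]) -> List[Doc]:
    """Order the final result set by document id."""
    return sorted(docs, key=lambda d: d["id"])


corpus = [
    {"id": 1, "text": "Canon EOS camera", "brand": "canon"},
    {"id": 2, "text": "Canon lens 50mm", "brand": "canon"},
    {"id": 3, "text": "Nikon lens", "brand": "nikon"},
]
for doc in custom_search(corpus, "camera", brand_lens_subqueries, by_id):
    print(doc["id"], doc["text"])
```

Keeping the sub-query rule and the result-set rule as separate callables mirrors the abstract's two uses of "rules for the particular custom search engine", so either can be swapped without touching the search flow.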