Incremental Visualization for Structured Data in an Enterprise-level Data Store
    1.
    Patent application
    Incremental Visualization for Structured Data in an Enterprise-level Data Store [In force]

    Publication No.: US20130268520A1

    Publication date: 2013-10-10

    Application No.: US13439563

    Filing date: 2012-04-04

    IPC classification: G06F17/30

    CPC classification: G06F17/30477 G06F17/30554

    Abstract: The subject disclosure is directed towards simulating query execution to provide incremental visualization for a global data set. A data store may be configured for searching at least a portion of a global data set being stored at an enterprise-level data store. In response to a user-issued query, partial query results are provided to a front-end interface for display to the user. The front-end interface also provides statistical information corresponding to the partial query results in relation to the global data set, which may be used to determine when a current set of query results becomes acceptable as a true/accurate estimate.
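
    The idea of partial results accompanied by statistics can be illustrated with a minimal sketch. It assumes the query is a simple mean over rows that arrive in random-order batches, and it uses a normal-approximation confidence interval as the "statistical information"; the abstract does not specify either choice. The names `incremental_mean` and `first_acceptable` are hypothetical.

```python
import math
import random


def incremental_mean(batches, z=1.96):
    """Yield (rows_seen, running_mean, ci_half_width) after each batch.

    Uses Welford's online update, so earlier batches are never revisited.
    Assumption: rows arrive in random order, so each prefix is a fair sample.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for batch in batches:
        for x in batch:
            n += 1
            delta = x - mean
            mean += delta / n
            m2 += delta * (x - mean)
        if n > 1:
            std_err = math.sqrt(m2 / (n - 1) / n)
            half_width = z * std_err
        else:
            half_width = float("inf")
        yield n, mean, half_width


def first_acceptable(batches, tolerance):
    """Return the first partial result whose interval is within tolerance."""
    result = None
    for result in incremental_mean(batches):
        if result[2] <= tolerance:
            return result
    return result  # data exhausted: this is the aggregate over all rows


if __name__ == "__main__":
    rng = random.Random(0)
    rows = [rng.gauss(100.0, 15.0) for _ in range(20_000)]
    batches = [rows[i:i + 1000] for i in range(0, len(rows), 1000)]
    for n, mean, hw in incremental_mean(batches):
        print(f"{n:6d} rows  mean ~ {mean:7.2f} +/- {hw:.2f}")
```

    A front end could redraw its chart on every yielded tuple and stop the query once the interval is narrow enough for the user's purpose.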

    Incremental visualization for structured data in an enterprise-level data store
    2.
    Granted patent
    Incremental visualization for structured data in an enterprise-level data store [In force]

    Publication No.: US08983936B2

    Publication date: 2015-03-17

    Application No.: US13439563

    Filing date: 2012-04-04

    IPC classification: G06F17/30

    CPC classification: G06F17/30477 G06F17/30554

    Abstract: The subject disclosure is directed towards simulating query execution to provide incremental visualization for a global data set. A data store may be configured for searching at least a portion of a global data set being stored at an enterprise-level data store. In response to a user-issued query, partial query results are provided to a front-end interface for display to the user. The front-end interface also provides statistical information corresponding to the partial query results in relation to the global data set, which may be used to determine when a current set of query results becomes acceptable as a true/accurate estimate.

    Visualization of changing confidence intervals
    3.
    Granted patent
    Visualization of changing confidence intervals [In force]

    Publication No.: US09436740B2

    Publication date: 2016-09-06

    Application No.: US13439650

    Filing date: 2012-04-04

    Abstract: Incremental query results and confidence interval values associated with respective incremental query results may be obtained. Visualization shape objects indicating uncertainty values may be determined, based on mapping values of respective incremental query results and confidence interval values to points in the associated visualization shape objects, the uncertainty values visualized based on proportional shapes of the visualization shape objects. At least one visualization comparison object representing a comparison of a plurality of distributions associated with the obtained incremental query results and confidence interval values may be determined. Display of the plurality of visualization shape objects and the at least one visualization comparison object may be initiated.

    Reducing human overhead in text categorization
    4.
    Granted patent
    Reducing human overhead in text categorization [In force]

    Publication No.: US07894677B2

    Publication date: 2011-02-22

    Application No.: US11350701

    Filing date: 2006-02-09

    IPC classification: G06K9/64

    CPC classification: G06K9/6282

    Abstract: A unique multi-stage classification system and method that facilitates reducing human resources or costs associated with text classification while still obtaining a desired level of accuracy is provided. The multi-stage classification system and method involve a pattern-based classifier and a machine learning classifier. The pattern-based classifier is trained on discriminative patterns as identified by humans rather than by machines, which allows a smaller training set to be employed. Given humans' superior abilities to reason over text, discriminative patterns can be more accurately and more readily identified by them. Unlabeled items can be initially processed by the pattern-based classifier and, if no pattern match exists, the unlabeled data can then be processed by the machine learning classifier. By employing the classifiers in this manner, less human involvement is required in the classification process. Moreover, classification accuracy is maintained and/or improved.
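
    The two-stage flow described in this abstract can be sketched as follows. The regular expressions, the labels, and the `WordCountClassifier` stand-in for the machine learning stage are all hypothetical; the abstract does not say which pattern language or learning algorithm is used.

```python
import re
from collections import Counter, defaultdict

# Stage 1: hand-written discriminative patterns (hypothetical examples).
PATTERNS = [
    (re.compile(r"\b(refund|chargeback)\b", re.I), "billing"),
    (re.compile(r"\b(crash(es|ed)?|stack trace)\b", re.I), "bug"),
]


class WordCountClassifier:
    """Toy stand-in for the machine learning stage.

    Predicts the label whose training texts share the most word
    occurrences with the input. Any real learner could replace it.
    """

    def __init__(self):
        self.word_counts = defaultdict(Counter)

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        return self

    def predict(self, text):
        words = text.lower().split()
        return max(
            self.word_counts,
            key=lambda label: sum(self.word_counts[label][w] for w in words),
        )


def classify(text, fallback):
    """Try the pattern stage first; fall back to the learned stage."""
    for pattern, label in PATTERNS:
        if pattern.search(text):
            return label, "pattern"
    return fallback.predict(text), "ml"


if __name__ == "__main__":
    model = WordCountClassifier().fit(
        ["please reset my password", "invoice total is wrong"],
        ["account", "billing"],
    )
    for msg in ["App crashed on launch", "cannot reset password"]:
        print(msg, "->", classify(msg, model))
```

    Returning the stage name alongside the label makes it easy to measure how much traffic the human-authored patterns absorb before the learned classifier is consulted.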

    ESTIMATING DOCUMENT SIMILARITY USING BIT-STRINGS
    6.
    Patent application
    ESTIMATING DOCUMENT SIMILARITY USING BIT-STRINGS [In force]

    Publication No.: US20120213313A1

    Publication date: 2012-08-23

    Application No.: US13031265

    Filing date: 2011-02-21

    IPC classification: H04L27/00

    CPC classification: G06F17/30619 G06K9/00483

    Abstract: Each of a plurality of documents is divided into samples. Small bit-strings are generated for selected samples from each of the documents and used to create a sketch for each document. Because the bit-strings are small (e.g., only one, two, or three bits in length), the generated sketches are smaller than the sketches generated using previous methods for generating sketches, and therefore use less storage space. The generated sketches are compared to determine documents that are near-duplicates of one another.
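
    One way to picture small bit-string sketches is the min-hash variant below, which keeps only the lowest few bits of each minimum hash. This is an illustrative assumption, not necessarily the method claimed: the abstract does not say how samples are selected or hashed. The word-shingle sampling, the BLAKE2b hash, and all function names are choices made for this sketch.

```python
import hashlib


def shingles(text, k=3):
    """Split a document into overlapping k-word samples."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def _hash(sample, seed):
    """64-bit hash of a sample; the seed selects the hash function."""
    digest = hashlib.blake2b(
        sample.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def sketch(text, num_hashes=256, bits=2):
    """One small bit-string per hash function: low `bits` bits of the min hash."""
    samples = shingles(text)
    mask = (1 << bits) - 1
    return [min(_hash(s, seed) for s in samples) & mask for seed in range(num_hashes)]


def estimated_similarity(a, b, bits=2):
    """Approximate Jaccard similarity, corrected for chance bit collisions."""
    match = sum(x == y for x, y in zip(a, b)) / len(a)
    chance = 1.0 / (1 << bits)  # unrelated values agree this often by luck
    return max(0.0, (match - chance) / (1.0 - chance))


if __name__ == "__main__":
    a = "the quick brown fox jumps over the lazy dog near the river bank"
    b = "the quick brown fox jumps over the lazy dog near the old river bank"
    c = "completely unrelated text about quarterly revenue and tax filings"
    print(estimated_similarity(sketch(a), sketch(b)))
    print(estimated_similarity(sketch(a), sketch(c)))
```

    With 256 hash functions and 2 bits each, a sketch occupies 64 bytes, compared with 2 KB if the full 64-bit minimum hashes were stored.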

    Sponsored search data structure
    7.
    Granted patent
    Sponsored search data structure [In force]

    Publication No.: US08606627B2

    Publication date: 2013-12-10

    Application No.: US12137567

    Filing date: 2008-06-12

    IPC classification: G06Q40/00 G07G1/14

    Abstract: A system that facilitates selecting advertisements that match a search query is described herein. The system includes a search query receiver component that receives a search query including keywords. The system also includes a match component that uses an associative data structure to identify in the associative data structure one or more data nodes that are associated in the associative data structure with respective unique keys corresponding to respective one or more hashes of combinations of the keywords in the search query. For each identified data node, the match component selects advertisements associated with bid phrases stored in the identified data node that respectively only include keywords included in the search query.
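
    The lookup described in this abstract can be sketched with a dictionary as the associative data structure. Several details are assumptions made for illustration: each bid phrase is filed under a key derived from at most `max_key_words` of its keywords in sorted order, keys are SHA-1 hashes, and the class name `AdIndex` is hypothetical.

```python
import hashlib
from itertools import combinations


def key_for(keywords):
    """Unique key for a keyword combination: hash of the sorted keywords."""
    joined = " ".join(sorted(keywords))
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


class AdIndex:
    def __init__(self, max_key_words=2):
        self.max_key_words = max_key_words
        self.nodes = {}  # key -> list of (bid phrase keyword set, ad)

    def add(self, bid_phrase, ad):
        words = frozenset(bid_phrase.lower().split())
        # Assumption: file the phrase under its first few keywords (sorted).
        key_words = sorted(words)[: self.max_key_words]
        self.nodes.setdefault(key_for(key_words), []).append((words, ad))

    def match(self, query):
        """Return ads whose bid phrase uses only keywords from the query."""
        query_words = frozenset(query.lower().split())
        ads = []
        for size in range(1, self.max_key_words + 1):
            for combo in combinations(sorted(query_words), size):
                for words, ad in self.nodes.get(key_for(combo), []):
                    if words <= query_words:  # subset check on the node's phrases
                        ads.append(ad)
        return ads


if __name__ == "__main__":
    index = AdIndex()
    index.add("running shoes", "ad-shoes")
    index.add("cheap running shoes", "ad-cheap")
    index.add("shoes", "ad-generic")
    print(index.match("red running shoes"))
```

    Bounding the key size keeps the number of hash probes per query polynomial in the query length, while the subset check inside each node filters out longer bid phrases that share a key but contain keywords missing from the query.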

    Determination of landmarks
    8.
    Granted patent
    Determination of landmarks [In force]

    Publication No.: US09189488B2

    Publication date: 2015-11-17

    Application No.: US13081497

    Filing date: 2011-04-07

    IPC classification: G06F17/30 G06F21/10

    CPC classification: G06F17/30156 G06F21/10

    Abstract: Hash values corresponding to a file are processed in windows to determine a minimum hash value for each window. Each window may begin at a minimum hash value determined for a previous window and end after a fixed number of hash values. If a hash value is less than a threshold hash value, it is added to a buffer that is used to store the hash values in sorted order for a current window. If a hash value is greater than the threshold, it is added to another buffer whose hash values are not stored in sorted order. At the end of the current window, the minimum hash value in the first buffer is selected as the landmark for the window. If the first buffer is empty, then the hash values in the other buffer are sorted and the minimum hash value is selected as the landmark for the window.
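
    The two-buffer selection in this abstract can be sketched as follows. Two simplifications are assumed here: the next window starts just after the previous landmark so the scan always advances, and each window is rescanned from scratch rather than reusing buffer contents from the previous window. The function name `find_landmarks` is hypothetical.

```python
import bisect


def find_landmarks(hashes, window_size, threshold):
    """Return the index of the landmark (minimum hash) chosen for each window."""
    landmarks = []
    start = 0
    while start < len(hashes):
        end = min(start + window_size, len(hashes))
        small = []  # (hash, index) pairs below the threshold, kept sorted
        large = []  # all other pairs, left unsorted unless needed
        for i in range(start, end):
            item = (hashes[i], i)
            if hashes[i] < threshold:
                bisect.insort(small, item)
            else:
                large.append(item)
        if small:
            _, index = small[0]  # minimum is at the front of the sorted buffer
        else:
            large.sort()  # rare path: no hash fell below the threshold
            _, index = large[0]
        landmarks.append(index)
        # Assumption: resume just after the landmark so progress is guaranteed.
        start = index + 1
    return landmarks


if __name__ == "__main__":
    values = [50, 20, 70, 10, 90, 80, 60, 95]
    print(find_landmarks(values, window_size=4, threshold=40))
```

    The threshold keeps the sorted buffer short: with uniformly distributed hashes, only a small fraction of values pay the insertion cost, and the unsorted buffer is sorted only in the uncommon case where no value in the window falls below the threshold.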

    Estimating document similarity using bit-strings
    9.
    Granted patent
    Estimating document similarity using bit-strings [In force]

    Publication No.: US08594239B2

    Publication date: 2013-11-26

    Application No.: US13031265

    Filing date: 2011-02-21

    IPC classification: H04L27/00

    CPC classification: G06F17/30619 G06K9/00483

    Abstract: Each of a plurality of documents is divided into samples. Small bit-strings are generated for selected samples from each of the documents and used to create a sketch for each document. Because the bit-strings are small (e.g., only one, two, or three bits in length), the generated sketches are smaller than the sketches generated using previous methods for generating sketches, and therefore use less storage space. The generated sketches are compared to determine documents that are near-duplicates of one another.

    DETERMINATION OF LANDMARKS
    10.
    Patent application
    DETERMINATION OF LANDMARKS [In force]

    Publication No.: US20120259897A1

    Publication date: 2012-10-11

    Application No.: US13081497

    Filing date: 2011-04-07

    IPC classification: G06F17/30

    CPC classification: G06F17/30156 G06F21/10

    Abstract: Hash values corresponding to a file are processed in windows to determine a minimum hash value for each window. Each window may begin at a minimum hash value determined for a previous window and end after a fixed number of hash values. If a hash value is less than a threshold hash value, it is added to a buffer that is used to store the hash values in sorted order for a current window. If a hash value is greater than the threshold, it is added to another buffer whose hash values are not stored in sorted order. At the end of the current window, the minimum hash value in the first buffer is selected as the landmark for the window. If the first buffer is empty, then the hash values in the other buffer are sorted and the minimum hash value is selected as the landmark for the window.
