MANAGEMENT OF INDEXED DATA TO IMPROVE CONTENT RETRIEVAL PROCESSING

    Publication No.: US20210192281A1

    Publication date: 2021-06-24

    Application No.: US16721652

    Filing date: 2019-12-19

    Abstract: The present disclosure relates to processing operations configured to utilize indexing of content to improve content retrieval processing, particularly when working with large data sets. The techniques described herein enable efficient content retrieval when working with large data sets, such as those that may be associated with a plurality of tenants of a data storage application/service. Among other technical advantages, the present disclosure can be applied to train a classifier using relevant samples identified by text search in tenant-specific scenarios, where accurate searches can be executed concurrently, in milliseconds, for content associated with one or more tenant accounts of an application/service, even in instances where there may be millions of documents to be searched. As an example, data shards may be generated and managed for efficient and scalable content retrieval processing, including training of a classifier (e.g., an artificial intelligence classifier) and real-time (or near real-time) query processing.
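The tenant-specific sharding idea in this abstract can be illustrated with a minimal sketch. It assumes one in-memory inverted index per tenant and simple AND matching on query terms; the class name `TenantShardedIndex`, its methods, and the tokenizer are hypothetical and are not taken from the patent, which does not specify an implementation at this level.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TenantShardedIndex:
    """Hypothetical sketch: one inverted-index shard per tenant."""

    def __init__(self):
        # tenant_id -> term -> set of document ids containing that term
        self._shards = defaultdict(lambda: defaultdict(set))

    def add(self, tenant_id, doc_id, text):
        """Index a document inside the shard that belongs to its tenant."""
        shard = self._shards[tenant_id]
        for term in tokenize(text):
            shard[term].add(doc_id)

    def search(self, tenant_ids, query):
        """Return, per tenant, the documents that contain every query term.

        Each shard is consulted independently, so only the requested
        tenants' data is touched (assumption: AND semantics for terms).
        """
        terms = tokenize(query)
        results = {}
        for tenant_id in tenant_ids:
            shard = self._shards.get(tenant_id)
            if shard is None or not terms:
                results[tenant_id] = set()
                continue
            postings = [shard.get(term, set()) for term in terms]
            results[tenant_id] = set.intersection(*postings)
        return results


index = TenantShardedIndex()
index.add("tenant-a", "doc-1", "Quarterly sales report")
index.add("tenant-a", "doc-2", "Sales meeting notes")
index.add("tenant-b", "doc-3", "Sales report draft")
hits = index.search(["tenant-a", "tenant-b"], "sales report")
```

Because the shards share no state, the per-tenant lookups in `search` could be dispatched in parallel, and the returned document ids could serve as the relevant samples for classifier training that the abstract mentions; both steps are left out of this sketch.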

    Document retrieval/identification using topics
    4.
    Granted invention patent
    Document retrieval/identification using topics (status: in force)

    Publication No.: US09483474B2

    Publication date: 2016-11-01

    Application No.: US14615156

    Filing date: 2015-02-05

    Abstract: A system for retrieving/identifying a document comprising text stored in a document repository is described. A memory stores a graphical structure comprising a first plurality of nodes each representing a person, and a second plurality of nodes each representing a document in the document repository, the nodes being connected by edges according to automatically observed interactions between the represented people and documents. At least some of the nodes have one or more annotations each denoting a topic. A node relatedness calculator computes distances between nodes of the graphical structure using the topic annotations. An input receives an identifier of a user who is represented by one of the first plurality of nodes. An identifier/retriever identifies one or more documents from the document repository by using the identifier and using the computed distances between nodes.
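The graph-based retrieval described in this abstract can be sketched as follows. The abstract does not say how distances are computed from topic annotations, so this example assumes an edge weight of 1 minus the Jaccard similarity of the two endpoints' topic sets, and shortest-path (Dijkstra) distance from the user's node. The function names and the sample data are hypothetical.

```python
import heapq


def topic_distance(topics_a, topics_b):
    """Assumed metric: 1 - Jaccard similarity of two topic sets."""
    union = topics_a | topics_b
    if not union:
        return 1.0  # no annotations on either node: treat as unrelated
    return 1.0 - len(topics_a & topics_b) / len(union)


def nearest_documents(edges, topics, documents, user, k=3):
    """Rank document nodes by shortest-path distance from a user node.

    edges:     iterable of (node, node) pairs from observed interactions
    topics:    dict mapping node -> set of topic annotations
    documents: set of nodes that represent documents
    """
    # Build an undirected adjacency list weighted by topic distance.
    adjacency = {}
    for a, b in edges:
        weight = topic_distance(topics.get(a, set()), topics.get(b, set()))
        adjacency.setdefault(a, []).append((b, weight))
        adjacency.setdefault(b, []).append((a, weight))

    # Dijkstra's algorithm starting at the user's node.
    dist = {user: 0.0}
    heap = [(0.0, user)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, weight in adjacency.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))

    ranked = sorted((d, n) for n, d in dist.items() if n in documents)
    return [n for _, n in ranked[:k]]


topics = {
    "alice": {"ml", "search"},
    "bob": {"finance"},
    "doc1": {"ml", "search"},
    "doc2": {"finance"},
    "doc3": {"ml"},
}
edges = [("alice", "doc1"), ("alice", "doc2"), ("bob", "doc2"), ("bob", "doc3")]
documents = {"doc1", "doc2", "doc3"}
top = nearest_documents(edges, topics, documents, "alice", k=2)
```

With this sample data, `doc1` shares both of Alice's topics and ranks first, while `doc3` is reachable only through Bob and ranks last. Any other relatedness measure over the topic annotations could be substituted for `topic_distance` without changing the retrieval step.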


    DOCUMENT RETRIEVAL/IDENTIFICATION USING TOPICS
    5.
    Invention patent application
    DOCUMENT RETRIEVAL/IDENTIFICATION USING TOPICS (status: in force)

    Publication No.: US20160232157A1

    Publication date: 2016-08-11

    Application No.: US14615156

    Filing date: 2015-02-05

    IPC classification: G06F17/30 G06K9/62 G06K9/00

    Abstract: A system for retrieving/identifying a document comprising text stored in a document repository is described. A memory stores a graphical structure comprising a first plurality of nodes each representing a person, and a second plurality of nodes each representing a document in the document repository, the nodes being connected by edges according to automatically observed interactions between the represented people and documents. At least some of the nodes have one or more annotations each denoting a topic. A node relatedness calculator computes distances between nodes of the graphical structure using the topic annotations. An input receives an identifier of a user who is represented by one of the first plurality of nodes. An identifier/retriever identifies one or more documents from the document repository by using the identifier and using the computed distances between nodes.


    Management of indexed data to improve content retrieval processing

    Publication No.: US11544502B2

    Publication date: 2023-01-03

    Application No.: US16721652

    Filing date: 2019-12-19

    Abstract: The present disclosure relates to processing operations configured to utilize indexing of content to improve content retrieval processing, particularly when working with large data sets. The techniques described herein enable efficient content retrieval when working with large data sets, such as those that may be associated with a plurality of tenants of a data storage application/service. Among other technical advantages, the present disclosure can be applied to train a classifier using relevant samples identified by text search in tenant-specific scenarios, where accurate searches can be executed concurrently, in milliseconds, for content associated with one or more tenant accounts of an application/service, even in instances where there may be millions of documents to be searched. As an example, data shards may be generated and managed for efficient and scalable content retrieval processing, including training of a classifier (e.g., an artificial intelligence classifier) and real-time (or near real-time) query processing.