-
Publication number: US20250094398A1
Publication date: 2025-03-20
Application number: US18885531
Filing date: 2024-09-13
Applicant: Oracle International Corporation
Inventor: Aleksandra Czarlinska, Saurabh Naresh Netravalkar, Denis B. Mukhin, Harichandan Roy, Zhen Hua Liu, Sebastian de la Hoz Luna, Beda Christoph Hammerschmidt, George R. Krupka, Bo Xia, David Chih-Wei Jiang
Abstract: Techniques for a unified relational database framework for hybrid vector search are provided. In one technique, multiple documents are accessed and a vector table and a text table are generated. For each accessed document, data within the document is converted to plaintext, multiple chunks are generated based on the plaintext, an embedding model generates a vector for each of the chunks, the vectors are stored in the vector table along with a document identifier that identifies the accessed document, tokens are generated based on the plaintext, and the tokens are stored in the text table along with the document identifier. Such processing may be performed in a database system in response to a single database statement to create a hybrid index. In response to receiving a hybrid query, a vector query and a text query are generated and executed, and the respective results may be combined.
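The indexing and query flow in this abstract can be illustrated with a minimal in-memory sketch. It is not the patented implementation: the function names, the hashed bag-of-words `toy_embed` (standing in for a real embedding model), the fixed-size chunking, and the reciprocal rank fusion used to combine results are all assumptions made for illustration.

```python
import math
import re
from collections import defaultdict

DIM = 16  # size of the toy embedding; an arbitrary choice for this sketch


def tokenize(text):
    """Lowercase word tokens; a stand-in for a real text-index tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk(text, size=8):
    """Split plaintext into chunks of at most `size` tokens (assumed policy)."""
    words = tokenize(text)
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def toy_embed(text):
    """Hypothetical embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for tok in tokenize(text):
        vec[sum(tok.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def create_hybrid_index(documents):
    """Build a vector table and a text table, both keyed by document id."""
    vector_table = []              # rows of (doc_id, chunk_no, vector)
    text_table = defaultdict(set)  # token -> set of doc_ids
    for doc_id, plaintext in documents.items():
        for chunk_no, piece in enumerate(chunk(plaintext)):
            vector_table.append((doc_id, chunk_no, toy_embed(piece)))
        for tok in tokenize(plaintext):
            text_table[tok].add(doc_id)
    return vector_table, text_table


def hybrid_query(query, vector_table, text_table, k=3):
    """Run a vector query and a text query, then merge the two rankings."""
    # Vector query: best cosine similarity over each document's chunks.
    qvec = toy_embed(query)
    best = {}
    for doc_id, _, vec in vector_table:
        sim = sum(a * b for a, b in zip(qvec, vec))
        best[doc_id] = max(best.get(doc_id, 0.0), sim)
    vector_hits = sorted(best, key=best.get, reverse=True)[:k]

    # Text query: rank documents by the number of query tokens they contain.
    counts = defaultdict(int)
    for tok in set(tokenize(query)):
        for doc_id in text_table.get(tok, ()):
            counts[doc_id] += 1
    text_hits = sorted(counts, key=counts.get, reverse=True)[:k]

    # Reciprocal rank fusion: an assumed rule for combining the two results.
    fused = defaultdict(float)
    for hits in (vector_hits, text_hits):
        for rank, doc_id in enumerate(hits, start=1):
            fused[doc_id] += 1.0 / (60 + rank)
    return sorted(fused, key=fused.get, reverse=True)
```

Both tables carry the document identifier, which is what lets the two result sets be merged per document. In a database system the two tables and both sub-queries would be created and run by the engine in response to a single statement; here they are plain Python structures.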
-
Publication number: US20190243926A1
Publication date: 2019-08-08
Application number: US15891145
Filing date: 2018-02-07
Applicant: Oracle International Corporation
Inventor: Rahul Manohar Kadwe, Saurabh Naresh Netravalkar
IPC: G06F17/30
CPC classification number: G06F16/90344, G06F16/2255, G06F16/243
Abstract: Techniques herein improve computational efficiency for wildcard searches by using numeric string hashes. In an embodiment, a plurality of query K-gram tokens for a term in a query are generated. Using a first index, an intersection of hash tokens is determined, wherein said first index indexes each query K-gram token of said K-gram tokens to a respective subset of hash tokens of a plurality of hash tokens, each hash token of said plurality of hash tokens corresponding to a term found in one or more documents of a corpus of documents. The intersection of hash tokens comprises only hash tokens indexed to all of said plurality of query K-gram tokens by said first index. Using a second index, documents of said corpus of documents that contain said term are determined, said second index indexing said hash tokens to a plurality of terms in said corpus of documents and, for each term of said plurality of terms, a respective subset of documents of said corpus of documents that contain said each term.
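The two-index lookup described in this abstract can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the use of CRC32 as the numeric string hash, the `$` boundary marker, the k-gram length of 2, the final `fnmatch` post-filter, and all function names are hypothetical choices. Only the `*` wildcard is handled.

```python
import fnmatch
import zlib
from collections import defaultdict

K = 2  # k-gram length; chosen arbitrarily for this sketch


def kgrams(fragment):
    """All substrings of length K in a fragment (empty if it is shorter)."""
    return {fragment[i:i + K] for i in range(len(fragment) - K + 1)}


def term_hash(term):
    """Numeric string hash for a term (CRC32 is an assumed choice)."""
    return zlib.crc32(term.encode("utf-8"))


def build_indexes(corpus):
    """Build the first index (k-gram -> hashes) and the second (hash -> term -> docs)."""
    first = defaultdict(set)
    second = defaultdict(lambda: defaultdict(set))
    for doc_id, text in corpus.items():
        for term in text.lower().split():
            h = term_hash(term)
            second[h][term].add(doc_id)
            # "$" marks term boundaries so prefixes and suffixes are indexable.
            for gram in kgrams(f"${term}$"):
                first[gram].add(h)
    return first, second


def wildcard_search(pattern, first, second):
    """Return ids of documents containing a term that matches the pattern."""
    pattern = pattern.lower()
    grams = set()
    for fragment in f"${pattern}$".split("*"):
        grams |= kgrams(fragment)
    if grams:
        # Keep only hash tokens indexed to every query k-gram.
        hashes = set.intersection(*(first.get(g, set()) for g in grams))
    else:
        # No usable k-grams (e.g. the pattern "*"): consider every hash token.
        hashes = set(second)
    docs = set()
    for h in hashes:
        for term, doc_ids in second[h].items():
            # Post-filter: sharing all k-grams does not guarantee a match.
            if fnmatch.fnmatchcase(term, pattern):
                docs |= doc_ids
    return docs
```

Intersecting sets of integers in the first index avoids comparing term strings until the final lookup, which is where the efficiency gain named in the abstract would come from.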
-
Publication number: US12271378B2
Publication date: 2025-04-08
Application number: US18367722
Filing date: 2023-09-13
Applicant: Oracle International Corporation
Inventor: Saurabh Naresh Netravalkar, Aleksandra Czarlinska, Zhen Hua Liu, Beda Christoph Hammerschmidt
IPC: G06F16/20, G06F16/2453
Abstract: Techniques are provided for creating a “ubiquitous search index” which allows for full-text as well as value range-based search across all columns from multiple database tables, multiple user-defined unmaterialized views, and external sources. In one implementation, the data is indexed in a peculiarly constructed schema-based JSON format without duplicating data. The techniques maintain eventual consistency with the normalized source of truth database tables, and do not have a significant impact on the performance of transactional Data Manipulation Language (DML) operations.
-
Publication number: US20240378199A1
Publication date: 2024-11-14
Application number: US18367722
Filing date: 2023-09-13
Applicant: Oracle International Corporation
Inventor: Saurabh Naresh Netravalkar, Aleksandra Czarlinska, Zhen Hua Liu, Beda Christoph Hammerschmidt
IPC: G06F16/2453
Abstract: Techniques are provided for creating a “ubiquitous search index” which allows for full-text as well as value range-based search across all columns from multiple database tables, multiple user-defined unmaterialized views, and external sources. In one implementation, the data is indexed in a peculiarly constructed schema-based JSON format without duplicating data. The techniques maintain eventual consistency with the normalized source of truth database tables, and do not have a significant impact on the performance of transactional Data Manipulation Language (DML) operations.
-
Publication number: US11188594B2
Publication date: 2021-11-30
Application number: US15891145
Filing date: 2018-02-07
Applicant: Oracle International Corporation
Inventor: Rahul Manohar Kadwe, Saurabh Naresh Netravalkar
IPC: G06F16/903, G06F16/22, G06F16/242, G06F16/31
Abstract: Techniques herein improve computational efficiency for wildcard searches by using numeric string hashes. In an embodiment, a plurality of query K-gram tokens for a term in a query are generated. Using a first index, an intersection of hash tokens is determined, wherein said first index indexes each query K-gram token of said K-gram tokens to a respective subset of hash tokens of a plurality of hash tokens, each hash token of said plurality of hash tokens corresponding to a term found in one or more documents of a corpus of documents. The intersection of hash tokens comprises only hash tokens indexed to all of said plurality of query K-gram tokens by said first index. Using a second index, documents of said corpus of documents that contain said term are determined, said second index indexing said hash tokens to a plurality of terms in said corpus of documents and, for each term of said plurality of terms, a respective subset of documents of said corpus of documents that contain said each term.