-
Publication No.: US08527518B2
Publication date: 2013-09-03
Application No.: US12970766
Filing date: 2010-12-16
IPC classification: G06F17/30
CPC classification: G06F17/30622
Abstract: A search query for a collection of electronic documents is parsed to identify one or more terms and such identified terms are associated with one or more languages (i.e., spoken languages such as English, German, Spanish, etc.). A terms inverted index and a language inverted index are accessed to identify documents responsive to the query. Related apparatus, systems, techniques and articles are also described.
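The two-index lookup described in this abstract can be sketched as follows. This is a minimal illustration, not the patented method: the toy corpus, the function names `build_indexes` and `search`, and the choice to pass the query languages in explicitly (the abstract does not say how terms are associated with languages) are all assumptions.

```python
from collections import defaultdict

# Hypothetical toy corpus: doc ID -> (language code, text).
DOCS = {
    1: ("en", "the gift was wrapped"),
    2: ("de", "das gift ist gefährlich"),
    3: ("en", "a dangerous poison"),
}


def build_indexes(docs):
    """Build a term inverted index and a separate language inverted index."""
    term_index = defaultdict(set)  # term -> IDs of documents containing it
    lang_index = defaultdict(set)  # language -> IDs of documents in it
    for doc_id, (lang, text) in docs.items():
        lang_index[lang].add(doc_id)
        for term in text.lower().split():
            term_index[term].add(doc_id)
    return term_index, lang_index


def search(query, languages, term_index, lang_index):
    """Return IDs of documents that contain every query term and are
    written in one of the given languages (assumed to be supplied by
    the caller)."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = set.intersection(*(term_index.get(t, set()) for t in terms))
    in_lang = set().union(*(lang_index.get(l, set()) for l in languages))
    return sorted(hits & in_lang)


TERM_INDEX, LANG_INDEX = build_indexes(DOCS)
```

Keeping the language postings in their own index means that a term spelled identically in two languages ("gift" in English and German) needs only one entry in the term index; the language restriction is applied by intersecting posting sets.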
-
Publication No.: US20120158718A1
Publication date: 2012-06-21
Application No.: US12970766
Filing date: 2010-12-16
IPC classification: G06F17/30
CPC classification: G06F17/30622
Abstract: A search query for a collection of electronic documents is parsed to identify one or more terms and such identified terms are associated with one or more languages (i.e., spoken languages such as English, German, Spanish, etc.). A terms inverted index and a language inverted index are accessed to identify documents responsive to the query. Related apparatus, systems, techniques and articles are also described.
-
Publication No.: US20090089256A1
Publication date: 2009-04-02
Application No.: US12056856
Filing date: 2008-03-27
Applicant: Frederik Transier, Peter Sanders
Inventor: Frederik Transier, Peter Sanders
CPC classification: G06F17/30613
Abstract: A method, in some embodiments, may include mapping, for a collection of documents, each term and each document to an integer value to obtain document identifiers (IDs) and term identifiers (IDs), respectively; storing an indication of each term ID in a document-grained inverted index; storing positional information for each term ID in a separate data structure other than the document-grained inverted index; determining a list of all the term IDs of each document without duplicates and without preserving an original order of the terms; and reconstructing a document from the collection of documents based on the list of all the term IDs of each document and the mapped term IDs and document IDs.
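The structures named in this abstract can be sketched roughly as below. The layout and the names `build` and `reconstruct` are hypothetical. In particular, the abstract does not state that the positional structure is consulted during reconstruction; the sketch assumes it is, as one plausible way to restore word order from an unordered list of term IDs.

```python
def build(docs):
    """Index a list of texts; a document's ID is its position in the list."""
    term_ids = {}    # term -> term ID
    terms = []       # term ID -> term
    doc_index = {}   # document-grained index: term ID -> list of doc IDs
    positions = {}   # separate structure: (term ID, doc ID) -> positions
    bags = []        # doc ID -> unique term IDs, original order not kept
    for doc_id, text in enumerate(docs):
        seen = set()
        for pos, term in enumerate(text.split()):
            tid = term_ids.setdefault(term, len(terms))
            if tid == len(terms):        # first time this term is seen
                terms.append(term)
            if tid not in seen:          # one posting per (term, document)
                seen.add(tid)
                doc_index.setdefault(tid, []).append(doc_id)
            positions.setdefault((tid, doc_id), []).append(pos)
        bags.append(sorted(seen))
    return terms, doc_index, positions, bags


def reconstruct(doc_id, terms, positions, bags):
    """Rebuild a document's text from its bag of term IDs and the
    positional structure (assumption: positions restore the order)."""
    placed = {}
    for tid in bags[doc_id]:
        for pos in positions[(tid, doc_id)]:
            placed[pos] = terms[tid]
    return " ".join(placed[p] for p in range(len(placed)))
```

Because the sketch tokenizes on whitespace, reconstruction returns the text with single spaces between terms rather than the original spacing.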
-
Publication No.: US20130290345A1
Publication date: 2013-10-31
Application No.: US13926917
Filing date: 2013-06-25
Applicant: Frederik Transier, Franz Faerber
Inventor: Frederik Transier, Franz Faerber
IPC classification: G06F17/30
CPC classification: G06F17/30011, G06F17/30017, G06F17/3002, G06F17/30622
Abstract: Inverted indexes for terms and for term separators are separately provided to minimize data redundancy. Search queries are parsed to identify terms and term separators, if any, and the corresponding inverted indexes are searched for responsive documents. Related apparatus, systems, techniques and articles are also described.
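A minimal sketch of keeping separate inverted indexes for terms and for term separators follows. The abstract does not define what counts as a separator; here it is assumed to be any run of non-word, non-whitespace characters. The names `tokenize`, `build`, and `search` are hypothetical, and the search only narrows down candidate documents; it does not verify that terms and separators appear in the queried order.

```python
import re
from collections import defaultdict

# Assumption: a term is a run of word characters; a separator is a run of
# punctuation (anything that is neither a word character nor whitespace).
TOKEN = re.compile(r"(\w+)|([^\w\s]+)")


def tokenize(text):
    """Split text into a list of terms and a list of separators."""
    terms, seps = [], []
    for match in TOKEN.finditer(text.lower()):
        (terms if match.group(1) else seps).append(match.group(0))
    return terms, seps


def build(docs):
    """Build one inverted index for terms and another for separators."""
    term_index, sep_index = defaultdict(set), defaultdict(set)
    for doc_id, text in docs.items():
        terms, seps = tokenize(text)
        for term in terms:
            term_index[term].add(doc_id)
        for sep in seps:
            sep_index[sep].add(doc_id)
    return term_index, sep_index


def search(query, term_index, sep_index):
    """Return IDs of documents containing every term and every separator
    found in the query."""
    terms, seps = tokenize(query)
    postings = [term_index.get(t, set()) for t in terms]
    postings += [sep_index.get(s, set()) for s in seps]
    if not postings:
        return []
    return sorted(set.intersection(*postings))
```

With the two documents "state-of-the-art index" and "state of the art", the query "state-of-the-art" matches only the first, because the hyphen is looked up in the separator index, while "state of the art" matches both.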
-
Publication No.: US09009155B2
Publication date: 2015-04-14
Application No.: US13651718
Filing date: 2012-10-15
IPC classification: G06F17/30
CPC classification: G06F17/30864, G06F17/30286, G06F17/30867
Abstract: A system, method and medium may provide determination of a first plurality of a plurality of data records assigned to a first processing unit, identification of a first record of the first plurality of data records, the first record associated with a first key value, generation of a first dictionary entry of a first dictionary for the first key value, storage of a first identifier of the first record as a tail identifier and as a head identifier in the first dictionary entry, storage of an end flag in a first shared memory location, the first shared memory location associated with the first record, identification of a second record of the first plurality of data records, the second record associated with the first key value, replacement of the tail identifier in the first dictionary entry with a second identifier of the second record, and storage of the first identifier in a second shared memory location, the second shared memory location associated with the second record.
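The head/tail bookkeeping in this abstract can be sketched for a single processing unit as below. A plain Python list stands in for the shared memory locations, `END` is a hypothetical end flag, and the names `build_chains` and `records_for` are illustrative. The abstract covers only the first two records of a key; treating every later record the same way as the second is an assumption, as is the omission of any other processing units. The abstract also does not say how the head identifier is used, so the sketch only stores it.

```python
END = -1  # hypothetical end flag marking the first record of a chain


def build_chains(keys):
    """keys[i] is the key value of record i. Returns the dictionary and
    the per-record link array that stands in for shared memory."""
    dictionary = {}            # key value -> [head ID, tail ID]
    links = [END] * len(keys)  # one location per record
    for rec_id, key in enumerate(keys):
        entry = dictionary.get(key)
        if entry is None:
            # First record for this key: it is both head and tail, and
            # its location holds the end flag.
            dictionary[key] = [rec_id, rec_id]
            links[rec_id] = END
        else:
            # Later record: store the previous tail's ID in this record's
            # location, then make this record the new tail.
            links[rec_id] = entry[1]
            entry[1] = rec_id
    return dictionary, links


def records_for(key, dictionary, links):
    """Follow the chain from the tail back to the end flag and return
    the record IDs for the key in insertion order."""
    out = []
    rec_id = dictionary[key][1]
    while rec_id != END:
        out.append(rec_id)
        rec_id = links[rec_id]
    return out[::-1]
```

Each record occupies exactly one link slot, so all records sharing a key value can be enumerated without storing a separate list per dictionary entry.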
-
Publication No.: US08805808B2
Publication date: 2014-08-12
Application No.: US13926917
Filing date: 2013-06-25
Applicant: Frederik Transier, Franz Faerber
Inventor: Frederik Transier, Franz Faerber
IPC classification: G06F17/30
CPC classification: G06F17/30011, G06F17/30017, G06F17/3002, G06F17/30622
Abstract: Inverted indexes for terms and for term separators are separately provided to minimize data redundancy. Search queries are parsed to identify terms and term separators, if any, and the corresponding inverted indexes are searched for responsive documents. Related apparatus, systems, techniques and articles are also described.
-
Publication No.: US20130290327A1
Publication date: 2013-10-31
Application No.: US13651718
Filing date: 2012-10-15
IPC classification: G06F17/30
CPC classification: G06F17/30864, G06F17/30286, G06F17/30867
Abstract: A system, method and medium may provide determination of a first plurality of a plurality of data records assigned to a first processing unit, identification of a first record of the first plurality of data records, the first record associated with a first key value, generation of a first dictionary entry of a first dictionary for the first key value, storage of a first identifier of the first record as a tail identifier and as a head identifier in the first dictionary entry, storage of an end flag in a first shared memory location, the first shared memory location associated with the first record, identification of a second record of the first plurality of data records, the second record associated with the first key value, replacement of the tail identifier in the first dictionary entry with a second identifier of the second record, and storage of the first identifier in a second shared memory location, the second shared memory location associated with the second record.
-
Publication No.: US08498972B2
Publication date: 2013-07-30
Application No.: US12970780
Filing date: 2010-12-16
Applicant: Frederik Transier, Franz Faerber
Inventor: Frederik Transier, Franz Faerber
IPC classification: G06F17/30
CPC classification: G06F17/30011, G06F17/30017, G06F17/3002, G06F17/30622
Abstract: Inverted indexes for terms and for term separators are separately provided to minimize data redundancy. Search queries are parsed to identify terms and term separators, if any, and the corresponding inverted indexes are searched for responsive documents. Related apparatus, systems, techniques and articles are also described.
-
Publication No.: US20130138628A1
Publication date: 2013-05-30
Application No.: US13742034
Filing date: 2013-01-15
IPC classification: G06F17/30
CPC classification: G06F17/30466, G06F17/3033, G06F17/30445
Abstract: According to some embodiments, a system and method for a parallel join of relational data tables may be provided by calculating, by a plurality of concurrently executing execution threads, hash values for join columns of a first input table and a second input table; storing the calculated hash values in a set of disjoint thread-local hash maps for each of the first input table and the second input table; merging the set of thread-local hash maps of the first input table, by a second plurality of execution threads operating concurrently, to produce a set of merged hash maps; comparing each entry of the merged hash maps to each entry of the set of thread-local hash maps for the second input table to determine whether there is a match, according to a join type; and generating an output table including matches as determined by the comparing.
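The phases listed in this abstract (thread-local hash maps, concurrent merge, probe) can be sketched as below for an inner join only. This is a simplified illustration under several assumptions: rows are tuples, Python dictionaries keyed by the join value stand in for the stored hash values, merged maps are partitioned by `hash(key) % workers`, and the probe runs on the calling thread. All function names are hypothetical. In CPython the global interpreter lock prevents a real speed-up from threads here; the sketch shows the structure, not the performance.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def chunks(rows, n):
    """Split rows into at most n contiguous chunks, one per worker."""
    size = max(1, -(-len(rows) // n))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def local_map(rows, col):
    """Phase 1: one worker builds its own hash map, join key -> rows."""
    result = defaultdict(list)
    for row in rows:
        result[row[col]].append(row)
    return result


def merge_partition(maps, part, n_parts):
    """Phase 2: one worker merges the keys that hash to its partition."""
    merged = defaultdict(list)
    for local in maps:
        for key, rows in local.items():
            if hash(key) % n_parts == part:
                merged[key].extend(rows)
    return merged


def parallel_inner_join(left, right, left_col, right_col, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        left_maps = list(pool.map(lambda c: local_map(c, left_col),
                                  chunks(left, workers)))
        right_maps = list(pool.map(lambda c: local_map(c, right_col),
                                   chunks(right, workers)))
        merged = list(pool.map(
            lambda p: merge_partition(left_maps, p, workers),
            range(workers)))
    # Phase 3: probe the merged maps with each thread-local map of the
    # second table and emit the concatenated matching rows.
    out = []
    for right_map in right_maps:
        for key, right_rows in right_map.items():
            for left_row in merged[hash(key) % workers].get(key, []):
                out.extend(left_row + right_row for right_row in right_rows)
    return out
```

Partitioning the merge by hash value means no two merge workers ever write to the same merged map, so the merge phase needs no locking.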
-
Publication No.: US20120158782A1
Publication date: 2012-06-21
Application No.: US12970780
Filing date: 2010-12-16
Applicant: Frederik Transier, Franz Faerber
Inventor: Frederik Transier, Franz Faerber
IPC classification: G06F17/30
CPC classification: G06F17/30011, G06F17/30017, G06F17/3002, G06F17/30622
Abstract: Inverted indexes for terms and for term separators are separately provided to minimize data redundancy. Search queries are parsed to identify terms and term separators, if any, and the corresponding inverted indexes are searched for responsive documents. Related apparatus, systems, techniques and articles are also described.