Abstract:
Disclosed herein are systems and methods for compressing structured or semi-structured data in a horizontal manner while achieving compression ratios similar to those of vertical compression. Collections of structured or semi-structured data include a number of fields and are described using a schema. Fields include information having semantic similarity and are compressed using methods suited to the type of data. Data of a collection is compressed after fragmentation or may be normalized prior to compression. Data with semantic similarity is compressed using token tables and/or n-gram tables, where values with a higher weight, defined as the product of a value's frequency and its length, may be stored at the lower-numbered indices of the table. Records include record descriptor bytes, field descriptor bytes, zero or more array descriptor bytes, zero or more object descriptor bytes, or bytes representing the data associated with the record. Data is indexed or compressed by a suitable module.
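The weighting rule for token tables can be illustrated with a minimal sketch. The function names and sample data below are hypothetical and are not taken from the disclosure; the sketch only assumes that weight is frequency times length and that higher-weighted values receive lower indices.

```python
from collections import Counter


def build_token_table(values):
    """Order distinct values so that higher-weighted ones get lower indices.

    Weight is frequency * length, as stated in the abstract. Ties are broken
    alphabetically (an assumption made here for determinism).
    """
    freq = Counter(values)
    return sorted(freq, key=lambda tok: (-freq[tok] * len(tok), tok))


def encode(values, table):
    """Replace each value with its index in the token table."""
    index = {tok: i for i, tok in enumerate(table)}
    return [index[v] for v in values]


def decode(codes, table):
    """Recover the original values from their indices."""
    return [table[c] for c in codes]


# Hypothetical field values with semantic similarity (city names).
cities = ["San Francisco", "Rome", "San Francisco", "Rome", "Rome", "Oslo"]
table = build_token_table(cities)
codes = encode(cities, table)
```

In this example "San Francisco" (weight 2 x 13 = 26) is placed ahead of "Rome" (weight 3 x 4 = 12), even though "Rome" occurs more often, because replacing the longer value saves more bytes overall.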
Abstract:
A method for searching for related entities using entity co-occurrence is disclosed. Embodiments of the method may be employed in any search system that may include at least one search engine, at least one entity co-occurrence knowledge base, an entity extraction module, and at least one entity-indexed corpus. The method may extract and disambiguate entities from search queries by using an entity co-occurrence knowledge base, find the extracted entities in an entity-indexed corpus, and finally present search results as related entities of interest.
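A minimal sketch of the co-occurrence idea follows. The corpus, the entity names, and the substring-based extraction are all illustrative assumptions; the disclosure does not specify them, and the disambiguation step is not modeled here.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(corpus):
    """Build a co-occurrence map from an entity-indexed corpus.

    corpus maps a document id to the set of entity names found in it.
    The result maps each entity to a Counter of entities seen with it.
    """
    kb = {}
    for entities in corpus.values():
        for a, b in combinations(sorted(entities), 2):
            kb.setdefault(a, Counter())[b] += 1
            kb.setdefault(b, Counter())[a] += 1
    return kb


def extract_entities(query, known):
    """Naive extraction: case-insensitive substring match on known names."""
    q = query.lower()
    return [e for e in sorted(known) if e.lower() in q]


def related_entities(query, kb, top_n=3):
    """Rank entities that co-occur with the entities found in the query."""
    found = extract_entities(query, kb.keys())
    scores = Counter()
    for entity in found:
        scores.update(kb[entity])
    for entity in found:
        scores.pop(entity, None)  # do not suggest the query's own entities
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


# Hypothetical entity-indexed corpus.
corpus = {
    "d1": {"Ada Lovelace", "Charles Babbage", "Analytical Engine"},
    "d2": {"Ada Lovelace", "Charles Babbage"},
    "d3": {"Charles Babbage", "Difference Engine"},
}
kb = build_cooccurrence(corpus)
results = related_entities("papers about ada lovelace", kb)
```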
Abstract:
A system and method for detecting events based on input data from a plurality of sources are disclosed. The system may receive input from a plurality of sources containing information about possible events. A method for event detection involves pre-processing and normalizing data input from a plurality of sources, extracting and disambiguating events and entities, associating events with entities, correlating the associated events and entities from one data input with results from different data sources to determine whether an event has occurred, and storing the detected events in a data storage.
Abstract:
Disclosed are pluggable, distributed computing-system architectures that allow embedded analytics to be added to or removed from nodes of a system hosting an in-memory database. The disclosed system includes an API that may be used to create customized, application-specific analytics modules. The newly created analytics modules may be easily plugged into the in-memory database. Each user query submitted to the in-memory database may specify different analytics to be applied, with differing parameters. All analytics modules operate on the in-memory image of the data, inside the in-memory database platform. All the analytics modules may be capable of performing on-the-fly analytics, which may allow dynamic and comprehensive processing of search results.
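One way to picture pluggable analytics with per-query parameters is a small registry, sketched below. The class, the module names (`min_score`, `top_k`), and the record layout are hypothetical and only illustrate the plug, unplug, and per-query selection behavior described above.

```python
class AnalyticsRegistry:
    """Holds analytics modules that can be plugged in or removed at runtime."""

    def __init__(self):
        self._modules = {}

    def plug(self, name, func):
        """Register an analytics module under a name."""
        self._modules[name] = func

    def unplug(self, name):
        """Remove a module; removing an unknown name is a no-op."""
        self._modules.pop(name, None)

    def run(self, records, pipeline):
        """Apply the modules named by a query, in order.

        pipeline is a list of (module name, parameter dict) pairs, so each
        query can choose its own analytics and parameters.
        """
        for name, params in pipeline:
            records = self._modules[name](records, **params)
        return records


def min_score(records, threshold):
    """Keep records whose score is at least the threshold."""
    return [r for r in records if r["score"] >= threshold]


def top_k(records, k):
    """Keep the k highest-scoring records."""
    return sorted(records, key=lambda r: r["score"], reverse=True)[:k]


registry = AnalyticsRegistry()
registry.plug("min_score", min_score)
registry.plug("top_k", top_k)

# Hypothetical in-memory search results.
records = [
    {"id": 1, "score": 0.9},
    {"id": 2, "score": 0.4},
    {"id": 3, "score": 0.7},
    {"id": 4, "score": 0.6},
]
results = registry.run(
    records, [("min_score", {"threshold": 0.5}), ("top_k", {"k": 2})]
)
```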
Abstract:
A method for generating search suggestions of related entities based on co-occurrence and/or fuzzy-score matching is disclosed. The method may be employed in a search system that may include a client/server type architecture. The search system may include a user interface for a search engine in communication with one or more server devices over a network connection. The server device may include an entity extraction module, a fuzzy-score matching module, and an entity co-occurrence knowledge base database. In one embodiment, the search system may process a partial search query from a user and present search suggestions to complete the partial query. In another embodiment, the complete search query may be used as a new search query. The search system may process the new search query, run an entity extraction, find related entities in the entity co-occurrence knowledge base, and present the related entities in a drop-down list.
Abstract:
A method for generating search suggestions by using fuzzy-score matching and entity co-occurrence in a knowledge base is disclosed. Embodiments of the method may be employed in any search system that may include an entity extraction computer module that may perform partial entity extractions from provided search queries, and a fuzzy-score matching computer module that may generate algorithms based on the type of entity extracted and perform a search against an entity co-occurrence knowledge base. The entity co-occurrence knowledge base, which may include a repository where entities may be indexed as entities to entities, entities to topics, or entities to facts, among others, may return fast and accurate suggestions to the user to complete the search query. The suggestions may include alternates to the partial query provided by the user, which may improve searches and save time.
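A rough sketch of fuzzy-score suggestion for a partial query is given below. It uses Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure; the disclosure does not name a scoring algorithm, and the entity list, the prefix comparison, and the 0.6 threshold are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def fuzzy_score(partial, candidate):
    """Score a partial query against the same-length prefix of a candidate."""
    prefix = candidate.lower()[: len(partial)]
    return SequenceMatcher(None, partial.lower(), prefix).ratio()


def suggest(partial, entities, threshold=0.6):
    """Return candidate entities ranked by fuzzy score, best first."""
    scored = [(fuzzy_score(partial, name), name) for name in entities]
    kept = [(score, name) for score, name in scored if score >= threshold]
    kept.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in kept]


# Hypothetical entity names from a knowledge base.
entities = ["Barack Obama", "Barbara Walters", "Michelle Obama"]
suggestions = suggest("barak ob", entities)
```

The misspelled partial query still ranks "Barack Obama" first, which is the behavior a fuzzy matcher is meant to provide. A fuller system would then look up the chosen entity in the co-occurrence knowledge base to offer related entities.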
Abstract:
Disclosed are an in-memory database system and a method for administering a distributed in-memory database, comprising one or more nodes having modules configured to store and distribute database partitions of collections partitioned by a partitioner associated with a search conductor. Database collections are partitioned according to a schema. Partitions, collections, and records are updated and removed when requested by a system interface, according to the schema. Supervisors determine a node status based on a heartbeat signal received from each node. Users can send queries through a system interface to search managers. Search managers apply a field processing technique, forward the search query to search conductors, and return a set of result records to the analytics agents. Analytics agents perform analytics processing on candidate result records from a search manager. The search conductors, comprising partitioners associated with a collection, search and score the records in a partition, then return a set of candidate result records after receiving a search query from a search manager.
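The heartbeat-based status check can be sketched as follows. The `Supervisor` class here, its timeout parameter, and the status labels are hypothetical; the abstract only states that node status is derived from heartbeat signals. Time is passed in explicitly so the example is deterministic.

```python
class Supervisor:
    """Tracks the last heartbeat from each node and derives a status."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.last_seen = {}

    def heartbeat(self, node_id, now):
        """Record that a heartbeat arrived from node_id at time now."""
        self.last_seen[node_id] = now

    def status(self, node_id, now):
        """Return 'alive', 'failed', or 'unknown' for a node."""
        seen = self.last_seen.get(node_id)
        if seen is None:
            return "unknown"
        return "alive" if now - seen <= self.timeout else "failed"


supervisor = Supervisor(timeout_seconds=5)
supervisor.heartbeat("node-a", now=100)
supervisor.heartbeat("node-b", now=92)

status_a = supervisor.status("node-a", now=103)  # 3 s since last heartbeat
status_b = supervisor.status("node-b", now=103)  # 11 s since last heartbeat
status_c = supervisor.status("node-c", now=103)  # never reported
```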
Abstract:
A method for entity-driven alerts based on disambiguated features is disclosed. According to an embodiment, the disclosed method may generate entity-driven alerts based on trending or new knowledge of a disambiguated feature. The alerts may be sent to a user when new knowledge is discovered about the disambiguated feature, when a new association with the feature of interest is found (such as new related features, facts, quotations, or topic IDs, among others), and/or when new trending changes are emerging about the feature of interest. According to various embodiments, the method for entity-driven alerts based on disambiguated features may reduce the number of false positives that result from a normal search query, which in turn may increase the efficiency of monitoring and allow for a broadened universe of alerts.
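A simplified sketch of the new-association case is shown below. The feature identifiers, the set-based knowledge store, and the function name are illustrative assumptions; trending detection and disambiguation are outside the scope of this example.

```python
def new_association_alerts(known, incoming, watched):
    """Report associations that are new for each watched feature.

    known and incoming map a disambiguated feature id to a set of associated
    items (facts, topic IDs, and so on). Only features in watched can alert.
    """
    alerts = []
    for feature_id in watched:
        added = incoming.get(feature_id, set()) - known.get(feature_id, set())
        if added:
            alerts.append((feature_id, sorted(added)))
    return alerts


# Hypothetical knowledge keyed by disambiguated feature id, so two people
# who share a name do not trigger each other's alerts.
known = {"person:42": {"topic:finance"}, "person:77": {"topic:sports"}}
incoming = {
    "person:42": {"topic:finance", "topic:merger"},
    "person:77": {"topic:sports", "topic:transfer"},
}
alerts = new_association_alerts(known, incoming, watched=["person:42"])
```

Keying on the disambiguated identifier rather than on a name string is what keeps the unrelated update for "person:77" from producing a false positive.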
Abstract:
Methods for non-exclusionary searching within clustered in-memory databases are disclosed. The non-exclusionary search methods may allow the execution of searches whose results may include records in which fields specified in the query are not populated or defined. The disclosed methods include the application of fuzzy matching and scoring algorithms, which enable the system to search, score, and compare records with different schemata. This may significantly improve the recall of relevant records.
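The non-exclusionary behavior can be sketched as a scoring loop that never filters out a record for lacking a queried field. The scoring rule below (a missing field contributes zero, and the total is averaged over the queried fields) and the use of `difflib.SequenceMatcher` are assumptions for illustration, not the disclosed algorithms.

```python
from difflib import SequenceMatcher


def field_score(query_value, record_value):
    """Fuzzy similarity in [0, 1]; a missing field scores 0 but is allowed."""
    if record_value is None:
        return 0.0
    return SequenceMatcher(None, query_value.lower(), record_value.lower()).ratio()


def search(query, records):
    """Score every record against the query without excluding any.

    Records may follow different schemata; fields absent from a record
    simply add nothing to its score.
    """
    scored = []
    for rec in records:
        total = sum(field_score(v, rec.get(f)) for f, v in query.items())
        scored.append((total / len(query), rec))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Hypothetical records; the second has no "city" field.
records = [
    {"name": "John Smith", "city": "Boston"},
    {"name": "Jon Smith"},
    {"name": "Mary Jones", "city": "Boston"},
]
results = search({"name": "John Smith", "city": "Boston"}, records)
```

The record for "Jon Smith" remains in the result set with a nonzero score even though it has no city, which is how recall improves compared with a strict field filter.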