Abstract:
Approximate substring indexing is accomplished by decomposing each string in a database into overlapping “positional q-grams”: sequences of a predetermined length q, each carrying information about its “position” within the string (e.g., 1st q-gram, 4th q-gram, etc.). An index (such as a B-tree index or a hash index) is then formed over the tuples of positional q-gram data. Each query applied to the database is similarly parsed into a plurality of positional q-grams (of the same length), and a candidate set of matches is found. Position-directed filtering then removes the candidates whose q-grams are in the wrong order and/or too far apart, leaving a “verified” output of matching candidates. If errors are permitted (defined in terms of an edit distance between each candidate and the query), an edit distance calculation can then be performed to produce the final set of matching strings.
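The pipeline described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the in-memory dictionary standing in for a B-tree or hash index, and the particular filter (counting query q-grams that fall within a band of k + 1 adjacent alignment offsets) are assumptions made for illustration. The count bound relies on the standard observation that one edit operation can destroy at most q of the query's q-grams.

```python
from collections import defaultdict

def positional_qgrams(s, q):
    """Return (position, q-gram) pairs for the overlapping q-grams of s (1-based)."""
    return [(i + 1, s[i:i + q]) for i in range(len(s) - q + 1)]

def build_index(strings, q):
    """Map each q-gram to the (string_id, position) tuples where it occurs."""
    index = defaultdict(list)
    for sid, s in enumerate(strings):
        for pos, gram in positional_qgrams(s, q):
            index[gram].append((sid, pos))
    return index

def edit_distance_substring(query, text):
    """Smallest edit distance between query and any substring of text."""
    prev = [0] * (len(text) + 1)  # an empty query matches anywhere at cost 0
    for i, qc in enumerate(query, 1):
        cur = [i] + [0] * len(text)
        for j, tc in enumerate(text, 1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (qc != tc))
        prev = cur
    return min(prev)

def search(strings, index, query, q, k):
    """Ids of strings containing a substring within edit distance k of query."""
    grams = positional_qgrams(query, q)
    needed = len(grams) - k * q  # each edit destroys at most q q-grams
    if needed <= 0:
        candidates = set(range(len(strings)))  # the filter cannot prune anything
    else:
        # hits[sid][offset] = query positions matched at that alignment offset
        hits = defaultdict(lambda: defaultdict(set))
        for qpos, gram in grams:
            for sid, spos in index.get(gram, ()):
                hits[sid][spos - qpos].add(qpos)
        candidates = set()
        for sid, offsets in hits.items():
            for start in offsets:
                matched = set()
                # k edits shift the alignment across at most k + 1 adjacent offsets
                for off in range(start, start + k + 1):
                    matched |= offsets.get(off, set())
                if len(matched) >= needed:
                    candidates.add(sid)
                    break
    # Final verification with an explicit edit distance calculation
    return sorted(sid for sid in candidates
                  if edit_distance_substring(query, strings[sid]) <= k)

strings = ["approximate matching", "exact lookup", "aproximate search"]
index = build_index(strings, q=3)
print(search(strings, index, "approximate", q=3, k=1))
```

With k = 1 the misspelled third string survives both the filter and the verification step, while with k = 0 only the first string does.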
Abstract:
A messaging system in which a core messaging infrastructure stores and manages messaging attributes, but applications external to the core infrastructure define and modify most attributes. Attribute types may be easily defined or modified, the manner in which attribute values are obtained may be easily defined or modified, and the entity types to which attributes are assigned may be easily defined or modified. The messaging system includes a plurality of messaging entities, such as messages, folders, and users, a plurality of attributes associated with the messaging entities, and a plurality of applications. Each application is operable to examine and modify at least some of the messaging entities and attributes. An application selection device is operable to examine at least some of the messaging entities and at least some of the attributes and to select an application to be invoked, from among the plurality of applications, based on values of the examined messaging entities and attributes. An application invocation device invokes the selected application. The applications may define and modify a type of an attribute and/or may define and modify an association of an attribute with a messaging entity.
Abstract:
A messaging system, and method of operation thereof, which supports combinations of directory and mailing list addressing mechanisms. Intended message recipients are specified as declarative addresses, which may include combinations of directory and mailing list information. The messaging system includes a messaging server and an address resolution module. The messaging server receives a message from a sender system and transmits the message to the recipient system. The address resolution module, which is coupled to the messaging server, receives a declarative address associated with the message, resolves the declarative address into at least one messaging address and transmits the at least one messaging address to the messaging server. In one embodiment, a database system may be coupled to the address resolution module to allow address resolution based on information stored in a database. The address resolution module generates a database query based on the declarative address and transmits the generated query to a database system. The database system receives a database query, retrieves at least one messaging address specified by the query and transmits the retrieved at least one messaging address to the address resolution module.
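As a rough illustration of the database-backed embodiment, the sketch below turns a declarative address into a parameterized query and returns the matching messaging addresses. Everything concrete here is hypothetical: the `directory` table, its columns, the dictionary form of the declarative address, and the function names are assumptions chosen for the example, not details taken from the abstract.

```python
import sqlite3

# Hypothetical attributes a declarative address may constrain.
ALLOWED_ATTRIBUTES = ("department", "city", "list_name")

def build_query(declarative_address):
    """Generate a parameterized database query from a declarative address."""
    if not declarative_address:
        raise ValueError("empty declarative address")
    unknown = set(declarative_address) - set(ALLOWED_ATTRIBUTES)
    if unknown:
        raise ValueError(f"unsupported attributes: {sorted(unknown)}")
    columns = sorted(declarative_address)
    where = " AND ".join(f"{col} = ?" for col in columns)
    sql = f"SELECT DISTINCT email FROM directory WHERE {where} ORDER BY email"
    return sql, [declarative_address[col] for col in columns]

def resolve(conn, declarative_address):
    """Resolve a declarative address into a list of messaging addresses."""
    sql, params = build_query(declarative_address)
    return [row[0] for row in conn.execute(sql, params)]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE directory (email TEXT, department TEXT, city TEXT, list_name TEXT);
    INSERT INTO directory VALUES
        ('ann@example.com', 'sales', 'Boston', 'announce'),
        ('bob@example.com', 'sales', 'Austin', 'announce'),
        ('cat@example.com', 'legal', 'Boston', 'announce');
""")

# Combine directory information (department) with a mailing list name.
print(resolve(conn, {"department": "sales", "list_name": "announce"}))
```

Column names are checked against a fixed whitelist and values are passed as bound parameters, so the generated query does not interpolate untrusted text.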
Abstract:
The present invention is a method and system for using materialized views to compute answers to SQL queries with grouping and aggregation. A query is evaluated using a materialized view. The materialized view is semantically analyzed to determine whether the materialized view is usable in evaluating an input query. The semantic analysis includes determining that the materialized view does not project out any columns needed to evaluate the input query and determining that the view does not discard any tuple that satisfies a condition enforced in the input query. If the view is usable, the input query is rewritten to produce an output query that is multi-set equivalent to the input query and that specifies one or more occurrences of the materialized view as a source of information to be returned by the output query. The output query is then evaluated. The semantic analysis and rewriting may be iterated, with the output query of each iteration being the input query of the next iteration. The output query is evaluated after the last iteration.
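A small hand-worked example may help make the idea of a multi-set equivalent rewrite concrete. The sketch below uses Python's `sqlite3` module; the `sales` table, the view name, and both queries are invented for illustration, and the rewrite is written by hand rather than produced by the analysis the abstract describes. Because SQLite has no materialized views, an ordinary table created from the view definition stands in for one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('east', 'pen', 10), ('east', 'pen', 5), ('east', 'ink', 7),
        ('west', 'pen', 3), ('west', 'ink', 4);
    -- Stand-in for a materialized view: a finer-grained aggregate that keeps
    -- the grouping column the input query needs and discards no tuples.
    CREATE TABLE sales_by_region_product AS
        SELECT region, product, SUM(amount) AS total, COUNT(*) AS cnt
        FROM sales GROUP BY region, product;
""")

# Input query over the base table.
input_query = """
    SELECT region, SUM(amount), COUNT(*)
    FROM sales GROUP BY region
"""

# Output query over the view: sums of partial sums and partial counts.
output_query = """
    SELECT region, SUM(total), SUM(cnt)
    FROM sales_by_region_product GROUP BY region
"""

def multiset(sql):
    """Return the query result as a sorted list so duplicates are compared too."""
    return sorted(conn.execute(sql).fetchall())

print(multiset(input_query) == multiset(output_query))
```

The view is usable here because it retains `region` and stores both a partial sum and a partial count; a view that had projected out `region`, or filtered out some rows, would fail the two conditions named in the abstract.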
Abstract:
A method for estimating string-occurrence probability in a database comprises receiving a first probability of occurrence for each maximal substring from a plurality of substrings, each maximal substring in the plurality of substrings belonging to the string; obtaining an overall probability of occurrence; receiving a probability of occurrence for a maximal overlap of each maximal substring in the plurality of maximal substrings; obtaining a normalization factor; and dividing the overall probability of occurrence by the normalization factor to obtain the estimate.
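The arithmetic of the final step can be sketched in a few lines of Python. The abstract does not say how the overall probability or the normalization factor is obtained; the sketch assumes, as one plausible reading, that the overall probability is the product of the maximal-substring probabilities and that the normalization factor is the product of the maximal-overlap probabilities. The function name and the sample numbers are hypothetical.

```python
from math import prod

def estimate_occurrence(substring_probs, overlap_probs):
    """Estimate a string's occurrence probability from its maximal substrings.

    Assumption: overall probability = product of substring probabilities,
    normalization factor = product of overlap probabilities.
    """
    overall = prod(substring_probs)
    normalization = prod(overlap_probs)
    return overall / normalization

# Hypothetical statistics for the string "abcd": maximal substrings "abc"
# and "bcd" are known, and their maximal overlap is "bc".
p_abc, p_bcd, p_bc = 0.02, 0.03, 0.1
print(estimate_occurrence([p_abc, p_bcd], [p_bc]))
```

Dividing by the overlap probability compensates for counting the shared characters "bc" twice when the two substring probabilities are multiplied.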
Abstract:
The present invention is a messaging system, and method of operation thereof, which provides message recipients with control over the delivery of messages and charges the cost of a message to the sender of the message. A message is received at a messaging server from a sender system, the message including an indication of a recipient system. A notification message is transmitted to the recipient system, allowing the message recipient to determine whether they desire the message to be delivered. If so, an activation message is received from the recipient system and the message is transmitted to the recipient system. A charge for the message is assessed to the sender of the message. The message is stored in the messaging server until the activation message is received. At least a portion of the assessed charge may be credited or debited to the recipient of the message. The message may include any type of electronic information, such as text, graphics, video and audio information, and may be encrypted or unencrypted.
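The message flow in this abstract (store, notify, activate, deliver, charge) can be sketched as a small in-memory class. The class and method names, the amounts, and the decision to assess the charge at delivery time are all assumptions made for the example; the abstract does not state when the charge is assessed or how large it is.

```python
from dataclasses import dataclass, field

@dataclass
class MessagingServer:
    charge: int = 5           # hypothetical cost per delivered message, in cents
    recipient_share: int = 2  # hypothetical portion credited to the recipient
    pending: dict = field(default_factory=dict)
    balances: dict = field(default_factory=dict)
    delivered: list = field(default_factory=list)
    next_id: int = 1

    def submit(self, sender, recipient, body):
        """Store the message and return the notification for the recipient."""
        msg_id = self.next_id
        self.next_id += 1
        self.pending[msg_id] = (sender, recipient, body)
        return {"type": "notification", "id": msg_id, "to": recipient, "from": sender}

    def activate(self, msg_id, recipient):
        """Handle an activation message: deliver, then charge the sender."""
        sender, intended, body = self.pending[msg_id]
        if recipient != intended:
            raise PermissionError("only the intended recipient may activate delivery")
        del self.pending[msg_id]
        self.balances[sender] = self.balances.get(sender, 0) - self.charge
        self.balances[recipient] = self.balances.get(recipient, 0) + self.recipient_share
        self.delivered.append((recipient, body))
        return body

server = MessagingServer()
note = server.submit("sender@example.com", "rcpt@example.com", "hello")
print(note["id"] in server.pending)   # message is held until activation
print(server.activate(note["id"], "rcpt@example.com"))
print(server.balances)
```

A message whose notification is never answered simply stays in `pending`, which mirrors the statement that the message is stored in the messaging server until the activation message is received.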
Abstract:
Described is a system and method for receiving a data stream of multi-dimensional items, collecting a sample of the data stream having a predetermined number of items and dividing the sample into a plurality of subsamples, each subsample corresponding to a single dimension of each of the predetermined number of items. A query is then executed on a particular item in at least two of the subsamples to generate data for the corresponding subsample. This data is combined into a single value.
Abstract:
A method including: receiving a plurality of elements of a data stream; storing a multi-dimensional data structure in a memory, said multi-dimensional data structure storing the plurality of elements as a hierarchy of nodes, each node having a frequency count corresponding to the number of elements stored therein; comparing the frequency count of each node to a threshold value based on a total number of the elements stored in the nodes; identifying each node for which the frequency count is at least as great as the threshold value as a hierarchical heavy hitter (HHH) node; and propagating the frequency count of each non-HHH node to its corresponding parent node.
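The compare-and-propagate step can be sketched as follows. This is a simplified, exact, offline version over a single hierarchy (each element is a tuple whose parent is the tuple with its last component removed); the abstract describes a multi-dimensional structure maintained over a stream, which this sketch does not attempt to reproduce. The function name and the threshold fraction `phi` are assumptions.

```python
from collections import Counter

def hierarchical_heavy_hitters(elements, phi):
    """Return {node: count} for nodes whose count reaches phi * total.

    Nodes are processed from the deepest level upward; a node that is not a
    heavy hitter passes its count to its parent (the tuple minus its last part).
    """
    threshold = phi * len(elements)
    counts = Counter(elements)
    hhh = {}
    depth = max((len(e) for e in elements), default=0)
    for level in range(depth, -1, -1):
        # Snapshot the nodes at this level, since parents are added as we go.
        for node in [n for n in counts if len(n) == level]:
            count = counts[node]
            if count >= threshold:
                hhh[node] = count                # HHH node: count is not propagated
            elif level > 0:
                counts[node[:-1]] += count       # non-HHH node: propagate to parent
    return hhh

stream = ([("a", "x")] * 5 + [("a", "y")] * 2 +
          [("a", "z")] * 2 + [("b", "x")] * 1)
print(hierarchical_heavy_hitters(stream, phi=0.3))
```

In this example `("a", "x")` is frequent on its own, while `("a",)` becomes a heavy hitter only through the counts propagated from its infrequent children `("a", "y")` and `("a", "z")`.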