Abstract:
A data service system is described herein which processes raw data assets from at least one network-accessible system (such as a search system), to produce processed data assets. Enterprise applications can then leverage the processed data assets to perform various environment-specific tasks. In one implementation, the data service system can generate any of: synonym resources for use by an enterprise application in providing synonyms for specified terms associated with entities; augmentation resources for use by an enterprise application in providing supplemental information for specified seed information; and spelling-correction resources for use by an enterprise application in providing spelling information for specified terms, and so on.
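To illustrate what a spelling-correction resource could look like, here is a minimal sketch. It assumes the raw data asset is a term-frequency table mined from search queries (the `term_counts` input is hypothetical). It uses a simple edit-distance-1 lookup in the style of Peter Norvig's well-known spelling corrector, which is a swapped-in technique and not the method described in the abstract.

```python
from collections import Counter
from typing import Callable

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def edits1(word: str) -> set[str]:
    """Return every string one deletion, transposition, replacement or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in ALPHABET]
    inserts = [left + c + right for left, right in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def build_spelling_resource(term_counts: Counter) -> Callable[[str], str]:
    """Turn a (hypothetical) term-frequency table into a correction function."""

    def correct(term: str) -> str:
        term = term.lower()
        if term in term_counts:  # already a known term
            return term
        candidates = [w for w in edits1(term) if w in term_counts]
        if not candidates:  # nothing close enough: leave the term unchanged
            return term
        # Prefer the candidate seen most often in the raw data.
        return max(candidates, key=lambda w: term_counts[w])

    return correct


correct = build_spelling_resource(Counter({"seattle": 120, "settle": 15, "battle": 40}))
print(correct("seatle"))  # seattle
```

Both "seattle" and "settle" are one edit away from "seatle". The frequency counts from the raw data break the tie, which is the kind of signal a search system's logs would supply.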
Abstract:
This patent application relates to foreign-key detection. One implementation obtains a set of data tables and automatically determines foreign-key relationships between columns of separate tables in the set.
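A common baseline for detecting foreign keys automatically is an inclusion-dependency test. A column is a foreign-key candidate for a column in another table if every non-null value it holds also appears in that column, and that column is itself duplicate-free. The sketch below implements only this baseline, which is an assumption and not necessarily the approach of the application. The table layout (`{column: [values]}`) and function names are hypothetical.

```python
from itertools import permutations

# A table is modeled as {column_name: [values...]}; None stands for SQL NULL.
Table = dict[str, list]


def is_candidate_key(values: list) -> bool:
    """A referenced column must be non-empty and free of duplicate non-null values."""
    non_null = [v for v in values if v is not None]
    return bool(non_null) and len(non_null) == len(set(non_null))


def fk_candidates(tables: dict[str, Table]) -> list[tuple[str, str, str, str]]:
    """Return (fk_table, fk_column, pk_table, pk_column) tuples that pass an inclusion test."""
    found = []
    # Only compare columns from separate tables, in both directions.
    for (fk_table, fk_cols), (pk_table, pk_cols) in permutations(tables.items(), 2):
        for pk_col, pk_vals in pk_cols.items():
            if not is_candidate_key(pk_vals):
                continue
            pk_set = set(pk_vals)
            for fk_col, fk_vals in fk_cols.items():
                fk_set = {v for v in fk_vals if v is not None}
                # Inclusion dependency: every non-null FK value appears in the key column.
                if fk_set and fk_set <= pk_set:
                    found.append((fk_table, fk_col, pk_table, pk_col))
    return found


tables = {
    "customers": {"id": [1, 2, 3], "name": ["Ann", "Bo", "Cy"]},
    "orders": {"order_id": [10, 11, 12, 13], "customer_id": [1, 1, 3, None]},
}
print(fk_candidates(tables))  # [('orders', 'customer_id', 'customers', 'id')]
```

On real data this test alone over-reports. For example, small integer columns are often contained in some unrelated key column. A practical detector would therefore rank candidates with further evidence such as column names or value-distribution similarity.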
Abstract:
Methods and systems are disclosed for automatically synthesizing product information from multiple data sources into an on-line catalog, in particular based on attribute-value pairs. Information for a product may be obtained, via entity extraction, feed ingestion, and other mechanisms, from a plurality of structured and unstructured data sources having different taxonomies and schemas. Product information may additionally or alternatively be obtained or derived based on popularity data. The product information may be cleansed, segmented and normalized. The product information may then be clustered so that the closest products, attribute names and attribute values are associated. A representative value for an attribute name may be determined, and the on-line catalog may be updated so that entries are comprehensive, meaningful and useful to a catalog user. Updates from at least 500 million different data sources may be scheduled to occur as frequently as several times daily.
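The normalize-then-choose-a-representative-value step can be sketched as follows. The sketch assumes the offers have already been clustered to the same product. It normalizes attribute names through a hypothetical alias table (`ATTRIBUTE_ALIASES`) and picks the most frequently reported value as the representative one. Majority voting is a simple stand-in chosen for illustration; the abstract does not say how the representative value is determined.

```python
import re
from collections import Counter, defaultdict

# Hypothetical alias table mapping normalized source attribute names to catalog names.
ATTRIBUTE_ALIASES = {"colour": "color", "screen size": "display size"}


def normalize_name(name: str) -> str:
    """Lower-case, turn punctuation/underscores into spaces, then apply aliases."""
    name = re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()
    return ATTRIBUTE_ALIASES.get(name, name)


def normalize_value(value: str) -> str:
    """Lower-case and collapse runs of whitespace."""
    return " ".join(value.lower().split())


def synthesize(records: list[dict[str, str]]) -> dict[str, str]:
    """Merge attribute-value pairs for ONE product (records assumed already clustered)."""
    votes: dict[str, Counter] = defaultdict(Counter)
    for record in records:
        for name, value in record.items():
            votes[normalize_name(name)][normalize_value(value)] += 1
    # Representative value = the value reported by the most sources.
    return {name: counts.most_common(1)[0][0] for name, counts in votes.items()}


offers = [
    {"Colour": "Black", "Screen Size": "6.1 in"},
    {"color": "black", "display_size": "6.1  in"},
    {"COLOR": "Blue"},
]
print(synthesize(offers))  # {'color': 'black', 'display size': '6.1 in'}
```

Three different spellings of the color attribute collapse to one catalog attribute, and "black" wins two votes to one. Weighting votes by source reliability or popularity data would be a natural refinement.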
Abstract:
For a data processing system having memory for storing a database, a method, a system and a computer program product are disclosed for directing the data processing system to process a record to be inserted into the database. The database includes a plurality of base tables. The method includes making a record copy that matches the record and then, for each base table selected from the plurality of base tables: providing a base table candidate indication for the selected base table, which indicates whether the selected base table is a candidate base table that may receive the record and is determined based on the outcome of executing before triggers and the outcome of testing constraints against the record copy, the before triggers and the constraints being those associated with the selected base table; and restoring the record copy so that it matches the record before providing the next base table candidate indication for another selected base table.
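The per-table loop above can be sketched in a few lines. The sketch assumes triggers and constraints are plain Python callables on a dict-shaped record. `BaseTable`, `candidate_indications`, and the example tables are hypothetical names for illustration and do not come from the disclosure. The essential point is the restore step: each base table's triggers may rewrite the copy, so the copy is reset to the original record before the next table is evaluated.

```python
from copy import deepcopy
from dataclasses import dataclass, field
from typing import Callable

Record = dict[str, object]
# A before trigger may modify the record in place and returns False to reject it.
Trigger = Callable[[Record], bool]
Constraint = Callable[[Record], bool]


@dataclass
class BaseTable:
    name: str
    before_triggers: list[Trigger] = field(default_factory=list)
    constraints: list[Constraint] = field(default_factory=list)


def candidate_indications(record: Record, tables: list[BaseTable]) -> dict[str, bool]:
    """Return {table name: may this table receive the record?} without touching `record`."""
    record_copy = deepcopy(record)  # the copy that triggers are allowed to modify
    indications = {}
    for table in tables:
        # all() stops at the first trigger or constraint that fails.
        accepted = all(trigger(record_copy) for trigger in table.before_triggers)
        accepted = accepted and all(check(record_copy) for check in table.constraints)
        indications[table.name] = accepted
        # Restore the copy so the next table sees the original record, not this table's edits.
        record_copy.clear()
        record_copy.update(deepcopy(record))
    return indications


def default_region(r: Record) -> bool:
    r.setdefault("region", "EU")  # a before trigger that fills in a default value
    return True


tables = [
    BaseTable("orders_2023", [default_region], [lambda r: r["year"] == 2023]),
    BaseTable("orders_2024", [], [lambda r: r["year"] == 2024]),
]
order = {"year": 2024, "amount": 5}
print(candidate_indications(order, tables))  # {'orders_2023': False, 'orders_2024': True}
```

The caller's `order` dict is never modified. The "region" value added by the first table's trigger does not leak into the evaluation of the second table.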
Abstract:
An architecture is provided for a data profile computation technique that employs key profile computation and data pattern profile computation. Key profile computation over a data table covers both exact keys and approximate keys and is based on key strengths: a key strength of 100% indicates an exact key, and any lower percentage indicates an approximate key. The key strength is estimated from the number of table rows that have duplicated attribute values, and only column sets whose key strength exceeds a threshold value are returned. Pattern profiling identifies a small set of regular expression patterns that best describe the patterns within a given set of attribute values. It proceeds in three phases: a first phase that determines token regular expressions, a second phase that determines candidate regular expressions, and a third phase that identifies, among the candidates, the regular expressions that best match the attribute values.
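Both profiles can be approximated in a short sketch, with two assumptions. First, key strength is taken to be the fraction of rows whose values on a column set are not duplicated by any other row; this is one plausible reading of the abstract, under which an exact key scores 1.0. Second, pattern profiling is cut down to a tokenize-and-count step that maps digit runs and letter runs to character classes; the candidate-generation and selection phases are not reproduced. All function names are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations
from string import ascii_letters, digits

Row = dict[str, object]


def key_strength(rows: list[Row], cols: tuple[str, ...]) -> float:
    """Fraction of rows whose values on `cols` are not duplicated by any other row."""
    keys = Counter(tuple(row[c] for c in cols) for row in rows)
    unique_rows = sum(1 for n in keys.values() if n == 1)
    return unique_rows / len(rows)


def key_profile(rows: list[Row], max_size: int = 2, threshold: float = 0.9) -> dict:
    """Column sets (up to max_size columns) whose key strength exceeds the threshold."""
    columns = list(rows[0])
    profile = {}
    for size in range(1, max_size + 1):
        for cols in combinations(columns, size):
            strength = key_strength(rows, cols)
            if strength > threshold:
                profile[cols] = strength
    return profile


# Token step: ASCII digit runs, ASCII letter runs, or any other single character.
TOKEN = re.compile(r"[0-9]+|[A-Za-z]+|.", re.DOTALL)


def token_pattern(value: str) -> str:
    """Map a value to a regex, e.g. '425-555' -> a pattern like \\d{3}\\-\\d{3}."""
    parts = []
    for tok in TOKEN.findall(value):
        if tok[0] in digits:
            parts.append(rf"\d{{{len(tok)}}}")
        elif tok[0] in ascii_letters:
            parts.append(rf"[A-Za-z]{{{len(tok)}}}")
        else:
            parts.append(re.escape(tok))  # punctuation and spaces stay literal
    return "".join(parts)


def pattern_profile(values: list[str], top: int = 1) -> list[tuple[str, int]]:
    """Most common token patterns and how many values each one describes."""
    return Counter(token_pattern(v) for v in values).most_common(top)


rows = [
    {"id": 1, "zip": "98052", "city": "Redmond"},
    {"id": 2, "zip": "98052", "city": "Redmond"},
    {"id": 3, "zip": "10001", "city": "New York"},
    {"id": 4, "zip": "60601", "city": "Chicago"},
]
print(key_profile(rows))  # {('id',): 1.0, ('id', 'zip'): 1.0, ('id', 'city'): 1.0}
print(pattern_profile(["425-555-0100", "206-555-0199", "(425) 555 0100"]))
```

This sketch does not prune supersets of an exact key, so ('id', 'zip') is reported alongside ('id',). A real profiler would normally keep only minimal column sets and would estimate strengths from a sample rather than scanning every combination.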
Abstract:
Systems and methodologies are provided for computing multiple GROUP BY queries via an optimizer that examines the space of plans in a systematic, cost-based manner. The optimizer includes a merging component that merges pairs of sub-plans to facilitate choosing the plan with the lowest cost. The merging component can take as input two sub-plans (e.g., sub-plan P1 with root node V1 and sub-plan P2 with root node V2, where each sub-plan is a subtree of a logical plan whose root node directly points to a relation “R”) and return as output a set of sub-plans with root node V1∪V2, which is the smallest relation from which both V1 and V2 can be computed.
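A merged sub-plan pays off because R can be scanned once to build the GROUP BY on V1∪V2, and both original GROUP BYs can then be rolled up from that smaller intermediate result. The sketch below shows only this execution idea for a COUNT aggregate. It does no cost estimation and is not the optimizer described above. The function names and the `sales` data are hypothetical.

```python
from collections import Counter

Row = dict[str, object]


def group_by_count(rows: list[Row], cols: tuple[str, ...]) -> Counter:
    """SELECT cols, COUNT(*) ... GROUP BY cols, as {group key tuple: count}."""
    return Counter(tuple(row[c] for c in cols) for row in rows)


def rollup(counts: Counter, from_cols: tuple[str, ...], to_cols: tuple[str, ...]) -> Counter:
    """Re-aggregate a finer GROUP BY result to a coarser one (valid because COUNT is distributive)."""
    positions = [from_cols.index(c) for c in to_cols]
    out = Counter()
    for key, n in counts.items():
        out[tuple(key[i] for i in positions)] += n
    return out


def merged_group_bys(rows: list[Row], v1: tuple[str, ...], v2: tuple[str, ...]):
    """Answer GROUP BY v1 and GROUP BY v2 with one pass over the base relation."""
    merged = tuple(sorted(set(v1) | set(v2)))  # V1 ∪ V2
    base = group_by_count(rows, merged)  # the only scan of R
    return rollup(base, merged, v1), rollup(base, merged, v2)


sales = [
    {"region": "EU", "product": "pen", "year": 2023},
    {"region": "EU", "product": "ink", "year": 2023},
    {"region": "US", "product": "pen", "year": 2024},
]
by_region, by_product = merged_group_bys(sales, ("region",), ("product",))
print(by_region)   # Counter({('EU',): 2, ('US',): 1})
print(by_product)  # Counter({('pen',): 2, ('ink',): 1})
```

The rollup is correct only for aggregates that can be recombined from partial results, such as COUNT, SUM, MIN and MAX. An aggregate like COUNT(DISTINCT x) cannot be rolled up this way unless x is also kept in the merged grouping. An optimizer weighs the cost of the larger intermediate result against the scans of R it saves.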