Abstract:
A system or method consistent with an embodiment of the present invention is useful for analyzing large volumes of different types of data, such as textual, numeric, categorical, or sequential string data, and for identifying relationships among those data types or among different operations that have been performed on the data. Such a system or method determines and displays the relative content and context of related information, helping the user identify relationships among disparate data types. Various data types, such as numerical data, protein and DNA sequence data, categorical information, and textual information (for example, annotations associated with the numerical data, or research papers), may be correlated for visual analysis. A variety of user-selectable views may be correlated for user interaction to identify relationships that exist among the different types of data or the various operations performed on the data. Furthermore, the user may explore the information contained in sets of records and their associated attributes through interactive 2-D line charts and interactive summary miniplots.
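The abstract does not say how the views are correlated. Here is a minimal sketch, assuming that every view displays a subset of the same records and that views are linked only through shared record IDs (the `View` and `LinkedViews` names are hypothetical). It shows how a selection made in one view could be propagated to the others:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class View:
    """One display (e.g. a 2-D line chart or a sequence view) over some records."""
    name: str
    record_ids: set[str]
    highlighted: set[str] = field(default_factory=set)


class LinkedViews:
    """Hypothetical coordinator: views are correlated only via shared record IDs."""

    def __init__(self, views: list[View]) -> None:
        self.views = {v.name: v for v in views}

    def select(self, source: str, ids: set[str]) -> dict[str, set[str]]:
        # Keep only the records that the source view actually displays.
        chosen = ids & self.views[source].record_ids
        for view in self.views.values():
            # Every view highlights the chosen records it contains.
            view.highlighted = chosen & view.record_ids
        return {name: view.highlighted for name, view in self.views.items()}


views = LinkedViews([
    View("expression_chart", {"r1", "r2", "r3"}),
    View("annotations", {"r2", "r3", "r4"}),
    View("sequences", {"r3", "r4"}),
])
# "r9" is not shown in the source view, so it is ignored.
result = views.select("expression_chart", {"r2", "r3", "r9"})
```

Selecting records in the numeric chart then highlights the matching annotations and sequences, which is one simple way disparate data types can be brought into a common visual context.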
Abstract:
Systems and methods provide several enhancements for the viewing, analysis, and generation of landscape views in a data analysis system, including: allowing a user to select from multiple methods to generate a landscape view, providing labels for peaks of a landscape, enabling the user to replace labels displayed on the landscape view, enabling a landscape view to be recalculated based on the replacement labels, and allowing a user to switch or morph between two landscape views generated by different methods. Such methods or systems generate graphical landscape map visualizations from a set of data records.
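The abstract does not specify how a landscape is computed. One common approach is used here purely as an assumed stand-in. It sums a Gaussian bump around each record's 2-D map position and treats grid cells higher than all eight neighbours as peaks. It labels each peak with the most frequent term among nearby records, and it morphs between two landscapes by linearly interpolating their heights. All function names, the grid size, the bandwidth, and the label radius are hypothetical:

```python
from __future__ import annotations

import math
from collections import Counter

Point = tuple[float, float]


def landscape(points: list[Point], size: int = 21, bandwidth: float = 0.1) -> list[list[float]]:
    """Height of each cell on a size x size grid over the unit square.

    Assumption: height = sum of Gaussian bumps centred on the records' 2-D positions.
    """
    grid = []
    for i in range(size):
        y = i / (size - 1)
        row = []
        for j in range(size):
            x = j / (size - 1)
            row.append(sum(
                math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * bandwidth ** 2))
                for px, py in points
            ))
        grid.append(row)
    return grid


def find_peaks(grid: list[list[float]]) -> list[tuple[int, int]]:
    """Return (row, col) of cells strictly higher than all of their neighbours."""
    n = len(grid)
    found = []
    for i in range(n):
        for j in range(n):
            neighbours = [
                grid[a][b]
                for a in range(max(0, i - 1), min(n, i + 2))
                for b in range(max(0, j - 1), min(n, j + 2))
                if (a, b) != (i, j)
            ]
            if all(grid[i][j] > v for v in neighbours):
                found.append((i, j))
    return found


def label_peak(peak, size, points, terms, radius=0.15) -> str:
    """Label a peak with the most frequent term among records within `radius`."""
    i, j = peak
    cx, cy = j / (size - 1), i / (size - 1)
    counts = Counter()
    for (px, py), words in zip(points, terms):
        if math.hypot(px - cx, py - cy) <= radius:
            counts.update(words)
    return counts.most_common(1)[0][0] if counts else ""


def morph(a, b, t: float):
    """Blend two landscapes cell by cell: t=0 gives a, t=1 gives b."""
    return [[(1 - t) * x + t * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


points = [(0.2, 0.2), (0.2, 0.2), (0.8, 0.8), (0.8, 0.8)]
terms = [["kinase", "binding"], ["kinase"], ["membrane"], ["membrane", "transport"]]
grid = landscape(points)
labels = [label_peak(p, len(grid), points, terms) for p in find_peaks(grid)]
```

The bandwidth controls how many peaks survive. A larger value merges nearby clusters into a single peak, while a smaller one splits them. Calling `morph` with `t` stepped from 0 to 1 gives the intermediate frames of an animated transition between two landscapes computed on the same grid.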
Abstract:
Systems for creating high-dimensional vectors that represent sequence strings and biopolymer materials are provided. A first system divides respective sequence strings into blocks of at least three units to create a vocabulary of blocks. A second system selects predefined domains of a plurality of items of biopolymer material. A third system defines each item of biopolymer material in a data set as a surface, using descriptors of its structure, its function, or both. A fourth system compares information regarding each biopolymer material in a plurality of biopolymer materials with information regarding every other biopolymer material in that plurality.
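Here is a minimal sketch of the first system. It assumes that the blocks are overlapping substrings of a fixed length k ≥ 3 (k-mers) and that each sequence is represented by the count of every vocabulary block it contains. The abstract does not say whether blocks overlap or how they are weighted, and the function names are hypothetical:

```python
from collections import Counter


def blocks(seq: str, k: int = 3) -> list[str]:
    """Split a sequence into overlapping blocks (k-mers) of length k >= 3."""
    if k < 3:
        raise ValueError("blocks must be at least three units long")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocabulary(seqs: list[str], k: int = 3) -> list[str]:
    """Sorted list of every distinct block seen in the data set."""
    return sorted({b for s in seqs for b in blocks(s, k)})


def vectorize(seq: str, vocab: list[str], k: int = 3) -> list[int]:
    """One dimension per vocabulary block; the value is the block's count in seq."""
    counts = Counter(blocks(seq, k))
    return [counts[b] for b in vocab]


seqs = ["ACGACG", "TACG"]
vocab = build_vocabulary(seqs)
vectors = [vectorize(s, vocab) for s in seqs]
```

With k = 3, a four-letter DNA alphabet yields at most 4³ = 64 distinct blocks and a 20-letter amino-acid alphabet at most 8,000. This is why such vectors quickly become high-dimensional.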
Abstract:
A data import system enables access to data of multiple types from multiple data sources in different formats and provides an interface for importing that data into a data analysis system. The interface enables a user to customize the formatting of the data as it is being imported. During the import process, the user may select first user-defined options for operating on a first data set, and an intermediate representation of the data set is generated based on those options. Still during the import process, the user may then specify second user-defined options based on the intermediate representation. The second user-defined options are processed to produce a final data representation of the data set to be used for analysis. The intermediate representation may be a data table. Processing a data set may include merging a first and a second data set to produce the final data representation. The second user-defined options may enable the user to select either a basic operation or a non-basic operation for merging the data sets. The basic operation combines the data sets in response to the user's selection of a first graphical interface control, whereas the non-basic operation combines the data sets based on the user's selection of at least two graphical interface controls from a group of graphical interface controls.
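The two-stage import and the basic/non-basic merge could look roughly like the following sketch. It assumes delimited text input and uses a list of row dictionaries as the intermediate data table. It treats "append rows" as the one-control basic merge and a keyed join, where the user picks a key column and a join type, as the two-control non-basic merge. These choices and all names are illustrative assumptions, not the described system:

```python
import csv
import io


def import_table(text: str, delimiter: str = ",") -> list[dict[str, str]]:
    """First user-defined option (the delimiter) -> intermediate data table."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def merge_basic(a: list[dict], b: list[dict]) -> list[dict]:
    """Basic merge (one control): append the rows of b after the rows of a."""
    return a + b


def merge_on_key(a: list[dict], b: list[dict], key: str, how: str = "inner") -> list[dict]:
    """Non-basic merge (two controls): the user picks a key column and a join type.

    Assumes key values are unique in b; if not, the last matching row wins.
    """
    if how not in ("inner", "left"):
        raise ValueError(f"unsupported join type: {how}")
    lookup = {row[key]: row for row in b}
    merged = []
    for row in a:
        match = lookup.get(row[key])
        if match is not None:
            merged.append({**row, **match})
        elif how == "left":
            # Keep unmatched rows from a when a left join is selected.
            merged.append(dict(row))
    return merged


genes = import_table("gene,expr\nG1,2.5\nG2,0.7\n")
notes = import_table("gene;function\nG1;kinase\nG3;transporter\n", delimiter=";")
inner = merge_on_key(genes, notes, key="gene")
left = merge_on_key(genes, notes, key="gene", how="left")
```

Inspecting `genes` and `notes` before choosing the merge mirrors the abstract's idea of letting second-stage options depend on the intermediate representation.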
Abstract:
Methods and systems are provided that enable text in various sections of data records to be separately catalogued, indexed, or vectorized for analysis in a text visualization and mining system. A text processing system receives a plurality of data records, each having one or more associated attribute fields. The attribute fields that contain textual information are identified, and the specific textual content of each such field is identified. An index is generated that associates the textual content of each attribute field with the field that contains it. This index can then be used in text processing. The data records may be located in a data table, with the textual information contained in cells of the table. In another aspect, a plurality of data records is received, at least some of which contain text terms. In response to selection of a first method, the text terms are weighted in a first manner that helps distinguish the records from one another. In response to selection of a second method, they are weighted in a second manner for the same purpose. A vector that distinguishes each data record is then generated from the text terms as weighted by whichever method was selected.
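Here is a minimal sketch of the field-aware index and the two selectable weighting methods. It assumes that string-valued fields are the textual attribute fields. It uses raw term frequency as the first method and TF-IDF as the second; these are stand-ins chosen for illustration, since the abstract names neither. The index is keyed by (field, term) so each term stays tied to the field that contains it. All names are hypothetical:

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def text_fields(records: dict[str, dict]) -> list[str]:
    """Assumption: an attribute field is textual if any record stores a string in it."""
    return sorted({f for rec in records.values() for f, v in rec.items() if isinstance(v, str)})


def field_text(rec: dict, f: str) -> str:
    """The field's value if it is a string, else empty text."""
    value = rec.get(f)
    return value if isinstance(value, str) else ""


def field_index(records, fields):
    """Map (field, term) -> ids of records whose `field` contains `term`."""
    index = defaultdict(set)
    for rid, rec in records.items():
        for f in fields:
            for term in tokenize(field_text(rec, f)):
                index[(f, term)].add(rid)
    return index


def term_counts(records, fields):
    """Raw term counts per record, pooled over the textual fields."""
    return {
        rid: Counter(t for f in fields for t in tokenize(field_text(rec, f)))
        for rid, rec in records.items()
    }


def weight_tf(counts):
    """First method (assumed): weight = raw term frequency."""
    return {rid: {t: float(n) for t, n in c.items()} for rid, c in counts.items()}


def weight_tfidf(counts):
    """Second method (assumed): tf * log(N / df); terms found in every record get 0."""
    n_docs = len(counts)
    df = Counter(t for c in counts.values() for t in c)
    return {
        rid: {t: n * math.log(n_docs / df[t]) for t, n in c.items()}
        for rid, c in counts.items()
    }


METHODS = {"tf": weight_tf, "tfidf": weight_tfidf}


def record_vectors(records, method: str, vocab: list[str]):
    """Weight terms with the user-selected method, then build one vector per record."""
    fields = text_fields(records)
    weights = METHODS[method](term_counts(records, fields))
    return {rid: [w.get(t, 0.0) for t in vocab] for rid, w in weights.items()}


records = {
    "r1": {"title": "Kinase binding", "abstract": "kinase assay", "score": 0.9},
    "r2": {"title": "Membrane transport", "abstract": "kinase", "score": 0.1},
}
index = field_index(records, text_fields(records))
vocab = ["kinase", "binding", "membrane"]
tf = record_vectors(records, "tf", vocab)
tfidf = record_vectors(records, "tfidf", vocab)
```

The example shows why offering a second method can help distinguish records. Under TF-IDF, a term such as `kinase` that appears in every record gets weight 0 and no longer separates them, while rarer terms such as `binding` and `membrane` carry the distinction.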
Abstract:
Methods and apparatus allow a user to explore the information contained in sets of records and their associated attributes through the use of interactive surface maps. The records may contain various types of attributes, including text, numeric, categorical, and sequence data.