摘要:
A data import system enables access to data of multiple types from multiple data sources of different formats and provides an interface for importing data into a data analysis system. The interface enables a user to customize the formatting of the data as the data is being imported into a data analysis system. A user may select first user defined options for operating on a first data set received during a data importation process. An intermediate representation of the data set is generated based on the user first defined options. A user may specify second user defined options based on the intermediate representation during the data importation process. The second user defined options are processed to produce a final data representation of the data set to be used for analysis of the data. The intermediate representation may be a data table. The processing of a data set may include merging a first and second data set to produce the final data representation. The second user defined options may enable a user to select a basic operation for merging the data sets or to select a non-basic operation for merging the data sets. The basic operation may combine data sets in response to a user's selection of a first graphical interface control, and the non-basic operation may combine the data sets based on user selection of at least two graphical interface controls from a group of graphical interface controls.
摘要:
Desired content, metadata, or both can be isolated from the full content of social media websites having content-rich pages. Achieving this can include obtaining from the content-rich pages a language-independent representation having a hierarchical structure of nodes and then generating a node representation for each node. Feature vectors for the nodes are generated and a label is assigned to each node representation according to a schema. Assignment can occur by executing a trained classification algorithm on the feature vectors. The schema has schema elements and each schema element corresponds to a label. For each schema element, all node representations having matching labels are gathered and then one node representation is elected from among those with matching labels to be assigned to a schema element field in a template. The template can be applied to extract desired content, metadata, or both according to the schema from all the content-rich pages.
摘要:
A system or method consistent with an embodiment of the present invention is useful in analyzing large volumes of different types of data, such as textual data, numeric data, categorical data, or sequential string data, for use in identifying relationships among the data types or different operations that have been performed on the data. A system or method consistent with the present invention determines and displays the relative content and context of related information and is operative to aid in identifying relationships among disparate data types. Various data types, such as numerical data, protein and DNA sequence data, categorical information, and textual information, such as annotations associated with the numerical data or research papers may be correlated for visual analysis. A variety of user-selectable views may be correlated for user interaction to identify relationships that exist among the different types of data or various operations performed on the data.Furthermore, the user may explore the information contained in sets of records and their associated attributes through the use of interactive 2-D line charts and interactive summary miniplots.
摘要:
Data visualization methods, data visualization devices, data visualization apparatuses, and articles of manufacture are described according to some aspects. In one aspect, a data visualization method includes accessing a plurality of initial documents at a first moment in time, first processing the initial documents providing processed initial documents, first identifying a plurality of first associations of the initial documents using the processed initial documents, generating a first visualization depicting the first associations, accessing a plurality of additional documents at a second moment in time after the first moment in time, second processing the additional documents providing processed additional documents, second identifying a plurality of second associations of the additional documents and at least some of the initial documents, wherein the second identifying comprises identifying using the processed initial documents and the processed additional documents, and generating a second visualization depicting the second associations.
摘要:
Desired content, metadata, or both can be isolated from the full content of social media websites having content-rich pages. Achieving this can include obtaining from the content-rich pages a language-independent representation having a hierarchical structure of nodes and then generating a node representation for each node. Feature vectors for the nodes are generated and a label is assigned to each node representation according to a schema. Assignment can occur by executing a trained classification algorithm on the feature vectors. The schema has schema elements and each schema element corresponds to a label. For each schema element, all node representations having matching labels are gathered and then one node representation is elected from among those with matching labels to be assigned to a schema element field in a template. The template can be applied to extract desired content, metadata, or both according to the schema from all the content-rich pages.