Abstract:
An abnormality detection apparatus (2000) handles, as processing targets, tasks allocated to a plurality of processing servers (3200) in a distribution system (3000) having the processing servers (3200). A history acquisition unit (2020) acquires progress history information, which is information regarding the progress of the plurality of tasks at a plurality of recording time points. A target range determination unit (2040) determines a target range. A distribution calculation unit (2060) calculates a task speed distribution, which is a probability distribution of the processing speeds of the tasks, using the progress history information regarding the plurality of tasks. An abnormality determination unit (2080) compares the processing speed of a task to be determined with the task speed distribution, thereby determining whether or not the processing speed of that task is abnormal.
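The comparison of a task's speed against the task speed distribution can be illustrated with a minimal Python sketch. It is not the claimed implementation. It assumes that the progress history is a list of (time, progress) records per task, that the target range is a time interval, and that the distribution is summarized by its mean and standard deviation with a z-score threshold. The names `task_speed`, `find_abnormal` and `z_threshold` are hypothetical.

```python
from statistics import mean, stdev

def task_speed(history, t_start, t_end):
    """Average speed of one task over the target range [t_start, t_end].

    history is a list of (time, progress) records sorted by time (assumed format).
    Returns None when the range holds too few records to derive a speed.
    """
    points = [(t, p) for t, p in history if t_start <= t <= t_end]
    if len(points) < 2:
        return None
    (t0, p0), (t1, p1) = points[0], points[-1]
    if t1 == t0:
        return None
    return (p1 - p0) / (t1 - t0)

def find_abnormal(histories, t_start, t_end, z_threshold=2.0):
    """Return ids of tasks whose speed deviates strongly from the other tasks.

    Assumption: the task speed distribution is described by mean and standard
    deviation, and a task is flagged when its z-score exceeds z_threshold.
    """
    speeds = {}
    for task_id, history in histories.items():
        speed = task_speed(history, t_start, t_end)
        if speed is not None:
            speeds[task_id] = speed
    values = list(speeds.values())
    if len(values) < 2:
        return []  # a distribution needs at least two tasks
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # all tasks progress at the same speed
    return [tid for tid, s in speeds.items() if abs(s - mu) / sigma > z_threshold]
```

With few tasks, the outlier itself inflates the standard deviation, so a lower threshold or a robust statistic such as the median may be preferable in practice.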
Abstract:
To reduce the overall computation time of a batch of queries, multiple query optimization in SQL-on-Hadoop systems groups multiple MapReduce jobs converted from the queries into a single job, avoiding redundant computation by exploiting opportunities to share data scans, map functions and map outputs. SQL-on-Hadoop converts a query into a DAG of MapReduce jobs, and each map function is a part of the query plan composed of a sequence of relational operators. Because such a map function is usually complex and heavy, the disclosed method creates a cost model to simulate the computation time, taking into consideration both the I/O cost of reading and writing the input file and intermediate data and the CPU cost of computing the map function. A heuristic algorithm is disclosed to find a near-optimal integrated query plan for each group, based on the observation that each query plan is locally optimal.
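A cost model that combines I/O cost and CPU cost can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the disclosed cost model or heuristic: it models only the shared data scan (not shared map functions or map outputs), and the `MapJob` fields, the throughput constants and the function names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical throughput constants in bytes per second, chosen for illustration.
READ_BPS = 100e6
WRITE_BPS = 50e6

@dataclass(frozen=True)
class MapJob:
    name: str
    input_file: str
    input_bytes: int    # bytes scanned from the input file
    output_bytes: int   # bytes of intermediate map output written
    cpu_seconds: float  # CPU time of the map function

def job_cost(job):
    """Estimated time of one job run alone: scan + write + CPU."""
    return job.input_bytes / READ_BPS + job.output_bytes / WRITE_BPS + job.cpu_seconds

def merged_cost(jobs):
    """Estimated time of one merged job over jobs that read the same file.

    Assumption: the input is scanned once, while write and CPU costs add up.
    """
    scan = max(j.input_bytes for j in jobs) / READ_BPS
    rest = sum(j.output_bytes / WRITE_BPS + j.cpu_seconds for j in jobs)
    return scan + rest

def group_by_shared_scan(jobs):
    """Group jobs by input file, the simplest sharing opportunity."""
    groups = {}
    for job in jobs:
        groups.setdefault(job.input_file, []).append(job)
    return list(groups.values())

def worth_merging(jobs):
    """True when the merged estimate beats running the jobs separately."""
    return merged_cost(jobs) < sum(job_cost(j) for j in jobs)
```

Under this model, merging two jobs over the same file saves one full scan of the input, which is why grouping by input file is a natural first step before a finer search over integrated plans.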
Abstract:
A method of extracting a combination of a drug and an adverse event related to the drug includes: for each of positive example combinations, negative example combinations, and combinations that are neither positive nor negative examples, all of which are combinations of a drug and a disease, extracting medical events from medical information data about a patient and generating attribute data based on time-series information about the medical events; learning a discriminant model based on the attribute data of the positive and negative examples; and inputting the attribute data corresponding to the combinations that are neither positive nor negative examples to the discriminant model to determine scores.
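The three steps (attribute generation, learning, scoring) can be sketched in plain Python. The abstract does not specify the attributes or the model, so both are assumptions here: the attributes count diagnoses of the disease before and after the first prescription of the drug, and the discriminant model is a small logistic regression trained by gradient descent. All function names are hypothetical.

```python
import math

def attributes(events, drug, disease):
    """Attribute data for one (drug, disease) pair from time-ordered events.

    events is a list of (day, kind, name) tuples (assumed format). The two
    attributes count diagnoses of `disease` before and after the first
    prescription of `drug`.
    """
    rx_days = [d for d, kind, name in events if kind == "drug" and name == drug]
    if not rx_days:
        return [0.0, 0.0]
    first_rx = min(rx_days)
    dx_days = [d for d, kind, name in events if kind == "disease" and name == disease]
    before = sum(1 for d in dx_days if d < first_rx)
    after = sum(1 for d in dx_days if d >= first_rx)
    return [float(before), float(after)]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train(X, y, lr=0.1, epochs=500):
    """Fit logistic regression on positive (1) and negative (0) examples."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            g = p - yi  # gradient of the log loss with respect to the logit
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def score(model, x):
    """Score an unlabeled combination; higher suggests an adverse event."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

Combinations that are neither positive nor negative examples are then ranked by `score`, so the highest-scoring drug and disease pairs can be reviewed first.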
Abstract:
The CD method can be used even when the size of the training data exceeds the memory size of the computer. A data management apparatus (101) according to the present invention includes a blocking unit (20) which divides training data representing matrix data into a plurality of blocks and generates meta data indicating, for each block, the columns of the original training data whose values the block holds, and a re-blocking unit (40) which, when a component of a parameter learned from the training data converges to zero, replaces an old block including an unnecessary column, among the plurality of blocks, with a block from which the unnecessary column has been removed, and regenerates the meta data.
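Blocking and re-blocking can be sketched in Python as follows. This is an in-memory illustration only, whereas the apparatus presumably keeps blocks in external storage. It assumes that the matrix is split row-wise, that the meta data is a list of original column indices per block, and that the parameter is indexed by original column. The names `make_blocks`, `reblock` and `tol` are hypothetical.

```python
def make_blocks(matrix, rows_per_block):
    """Split a matrix (list of rows) into row-wise blocks with column meta data."""
    n_cols = len(matrix[0])
    blocks = []
    for start in range(0, len(matrix), rows_per_block):
        blocks.append({
            # meta data: original column index of each value held in a row
            "columns": list(range(n_cols)),
            "rows": [list(r) for r in matrix[start:start + rows_per_block]],
        })
    return blocks

def reblock(blocks, weights, tol=0.0):
    """Replace blocks that hold columns whose learned weight has become zero.

    weights[j] is the parameter component for original column j (assumption).
    Blocks without any unnecessary column are reused unchanged.
    """
    dead = {j for j, w in enumerate(weights) if abs(w) <= tol}
    new_blocks = []
    for block in blocks:
        if not dead.intersection(block["columns"]):
            new_blocks.append(block)
            continue
        keep = [k for k, col in enumerate(block["columns"]) if col not in dead]
        new_blocks.append({
            "columns": [block["columns"][k] for k in keep],  # regenerated meta data
            "rows": [[row[k] for k in keep] for row in block["rows"]],
        })
    return new_blocks
```

Because the meta data keeps the original column indices, a learner can still map each stored value to the right parameter component after columns have been dropped.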