-
Publication number: US20250061472A1
Publication date: 2025-02-20
Application number: US18234577
Filing date: 2023-08-16
Applicant: International Business Machines Corporation
Inventor: Namit Kabra, Sarbajit K. Rakshit, Vijay Ekambaram
IPC: G06Q30/0201, G06N20/00
Abstract: Mechanisms are provided for rendering content in a compacted view. A machine learning computer model is trained by a machine learning process to predict a user attention score for segments of content based on features of the content and historical user attention data. The trained machine learning computer model processes new content to associate with each segment, in a plurality of segments, of the new content, a corresponding user attention score. The segments, in the plurality of segments, of the new content are ranked relative to one another based on the corresponding user attention scores of the segments. A compacted view of the new content is rendered based on the ranking of the segments. A first number of segments are rendered in the compacted view and a second number of segments are not rendered in the compacted view, and are replaced with an inserted user selectable expansion element.
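The ranking and compaction step described in this abstract can be illustrated with a minimal sketch. It assumes the attention scores have already been produced by some trained model; the `Segment` class, the `compact_view` function, the `keep` parameter, and the `[+ expand]` placeholder are hypothetical names chosen for illustration and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    score: float  # predicted user attention score (assumed to come from a model)


def compact_view(segments, keep, placeholder="[+ expand]"):
    """Render the `keep` highest-scoring segments in document order.

    Every run of hidden segments is replaced by a single placeholder that
    stands in for a user-selectable expansion element.
    """
    # Rank segment indices by attention score, highest first.
    ranked = sorted(range(len(segments)),
                    key=lambda i: segments[i].score, reverse=True)
    kept = set(ranked[:keep])

    rendered = []
    for i, seg in enumerate(segments):
        if i in kept:
            rendered.append(seg.text)
        elif not rendered or rendered[-1] != placeholder:
            # Collapse consecutive hidden segments into one placeholder.
            rendered.append(placeholder)
    return rendered
```

Collapsing adjacent hidden segments into one placeholder is a design choice of this sketch; the abstract only states that segments left out of the compacted view are replaced with an inserted expansion element.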
-
Publication number: US11748382B2
Publication date: 2023-09-05
Application number: US16876660
Filing date: 2020-05-18
Applicant: International Business Machines Corporation
Inventor: Yannick Saillet, Namit Kabra, Mike W. Grasselt, Krishna Kishore Bonagiri
IPC: G06F16/28, G06F16/2457, G06F16/22, G06N20/00, G06F16/248, G06F18/214, G06N7/01
CPC classification number: G06F16/285, G06F16/221, G06F16/248, G06F16/24573, G06F18/214, G06N7/01, G06N20/00
Abstract: A method provides for classifying data fields of a dataset. A classifier configured for determining confidence values for a plurality of data classes for the data fields may be applied. Using the confidence values, data class candidates may be identified. Data fields may be determined for which a plurality of data class candidates is identifiable. Using previous user-selected data class assignments, a probability may be determined for the data class candidates that the respective data class candidate is a data class to which the respective data field is to be assigned. The data fields may be classified using the probabilities to select for the data fields a data class from the data class candidates. The dataset may be provided with metadata identifying for the data fields the data classes to which the respective data fields are assigned.
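One way to picture the candidate selection described above is the sketch below. It assumes the classifier output is a plain dictionary of confidence values per field, and it combines each confidence with a Laplace-smoothed share of earlier user selections; that combination rule, the `threshold` parameter, and the function name `classify_fields` are illustrative assumptions, not the formula claimed in the patent.

```python
from collections import Counter


def classify_fields(confidences, past_choices, threshold=0.5):
    """Pick one data class per field.

    confidences:  {field: {data_class: classifier confidence}}
    past_choices: data classes that users selected in earlier assignments
    """
    counts = Counter(past_choices)
    total = sum(counts.values())
    assigned = {}
    for field, scores in confidences.items():
        # Data class candidates: classes whose confidence reaches the threshold.
        candidates = [c for c, v in scores.items() if v >= threshold]
        if not candidates:
            assigned[field] = None
        elif len(candidates) == 1:
            assigned[field] = candidates[0]
        else:
            # Several candidates: weight each confidence by how often users
            # picked that class before (smoothed so unseen classes are not zero).
            def probability(c):
                return scores[c] * (counts[c] + 1) / (total + len(candidates))
            assigned[field] = max(candidates, key=probability)
    return assigned
```

The returned mapping plays the role of the metadata that records which data class each field was assigned to.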
-
Publication number: US11687491B2
Publication date: 2023-06-27
Application number: US16037444
Filing date: 2018-07-17
Applicant: International Business Machines Corporation
Inventor: Namit Kabra, Manish A. Bhide
IPC: G06F16/17, G06F16/174, G06F17/10, G06F16/903, G06F18/2411
CPC classification number: G06F16/1748, G06F16/90335, G06F17/10, G06F18/2411
Abstract: Data-deduplicating includes comparing a first record of a data-store with a second record of the data-store but instead of using a static weight for a field, the present data-deduplicating dynamically assigns a first weight for the first score to generate a first weighted score, wherein the first weight is based on one or both of the first value or the second value; and assigns a second weight for the second score to generate a second weighted score. A composite score is calculated based on the first weighted score and the second weighted score; and it is determined whether or not the first record and the second record are duplicate records, based on the composite score.
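The idea of value-dependent weights can be sketched as follows. The abstract does not say how a weight is derived from the field values, so this example assumes an inverse-frequency heuristic (rare values count more than common ones) and uses `difflib.SequenceMatcher` as the per-field similarity score; both choices, and the names `composite_score` and `is_duplicate`, are assumptions made for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher


def composite_score(rec_a, rec_b, records):
    """Weighted average of per-field similarity scores for two records.

    Each field weight is derived from the compared values themselves:
    the more often a value occurs in the data store, the less it counts.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for field in rec_a:
        freq = Counter(r[field] for r in records)
        score = SequenceMatcher(None, rec_a[field], rec_b[field]).ratio()
        # Dynamic weight based on both values (assumed heuristic).
        weight = 1.0 / max(freq[rec_a[field]], freq[rec_b[field]])
        weighted_sum += weight * score
        weight_total += weight
    return weighted_sum / weight_total


def is_duplicate(rec_a, rec_b, records, threshold=0.85):
    return composite_score(rec_a, rec_b, records) >= threshold
```

Both records are expected to come from `records`, so every compared value has a frequency of at least one and the weight is always defined.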
-
Publication number: US20220222265A1
Publication date: 2022-07-14
Application number: US17145458
Filing date: 2021-01-11
Applicant: International Business Machines Corporation
Inventor: Namit Kabra, Ritesh Kumar Gupta, Ron Reuben, Vijay Ekambaram, Smitkumar Narotambhai Marvaniya
IPC: G06F16/25, G06F16/23, G06F16/951, G06F16/215, G06N20/00
Abstract: A computer-implemented method applies insights from a variety of data sources to each of the data sources. The method includes identifying a set of data sources, wherein each of the data sources is associated with a domain. The method includes analyzing documentation for each of the data sources. The method further includes extracting a set of attributes for each data source, and determining a data schema associated with each data source. The method includes mapping each data schema to a common domain schema. The method also includes linking, based on the mapping and on the set of attributes for each data source, common features across each data source. The method includes generating, in response to the linking, a knowledge graph. The method further includes preparing a visual display for a set of domain insights; and forking the set of domain insights into a first data source.
-
Publication number: US20220028168A1
Publication date: 2022-01-27
Application number: US16934280
Filing date: 2020-07-21
Applicant: International Business Machines Corporation
Inventor: Namit Kabra, Smitkumar Narotambhai Marvaniya, Yannick Saillet, Kunjavihari Madhav Kashalikar
IPC: G06T19/00, G06K9/00, G02B27/01, G06F3/0481, G06F3/0484
Abstract: Aspects of the present disclosure relate to controlling virtual reality (VR) content displayed on a VR head mounted display (HMD). Communication can be established between a computer system, a VR HMD, and a mobile device. A user input configured to control VR content displayed on a display of the VR HMD can be received on the mobile device. The VR content displayed on the VR HMD can then be controlled based on the user input received on the mobile device.
-
Publication number: US10452627B2
Publication date: 2019-10-22
Application number: US15171200
Filing date: 2016-06-02
Applicant: International Business Machines Corporation
Inventor: Namit Kabra, Yannick Saillet
IPC: G06F7/00, G06F16/215, G06F16/21, G06F16/174
Abstract: A computer system with the capability to identify potentially duplicative records in a data set is provided. A computer may collect a data profile for the data set that provides descriptive information with regard to attributes of the data set. Based, at least in part, on the data profile, weights are determined for the attributes. As values of a data record are compared to values of the same respective attributes in other records, the overall likelihood of a match or duplicate, as indicated by the degree of similarity between values, is modified based on the determined weights associated with the respective attributes.
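A small sketch can show how attribute weights might be derived from a data profile. The abstract does not list which profile statistics are used, so the example assumes two common ones, completeness and the ratio of distinct values, and multiplies them to get a raw weight. The function names `profile` and `attribute_weights` are hypothetical.

```python
def profile(records):
    """Collect simple descriptive statistics per attribute of the data set."""
    stats = {}
    for attr in records[0].keys():
        values = [r[attr] for r in records if r[attr] not in (None, "")]
        stats[attr] = {
            # Share of records that have a value for this attribute.
            "completeness": len(values) / len(records),
            # Share of non-empty values that are distinct.
            "distinct_ratio": len(set(values)) / len(values) if values else 0.0,
        }
    return stats


def attribute_weights(stats):
    """Turn the profile into weights that sum to 1 (assumed heuristic)."""
    raw = {a: s["completeness"] * s["distinct_ratio"] for a, s in stats.items()}
    total = sum(raw.values())
    if total == 0:
        # No attribute is informative: fall back to equal weights.
        return {a: 1.0 / len(raw) for a in raw}
    return {a: v / total for a, v in raw.items()}
```

With weights like these, a match on a nearly unique attribute raises the overall duplicate likelihood more than a match on an attribute that holds the same value in most records.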
-
Publication number: US20190179888A1
Publication date: 2019-06-13
Application number: US15838463
Filing date: 2017-12-12
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Yannick Saillet, Martin Oberhofer, Namit Kabra
Abstract: A method for generating data standardization rules includes receiving a training data set containing tokenized and tagged data values. A set of machine mining models is built using different learning algorithms for identifying tags and tag patterns using the training set. For each data value in a further data set: a tokenization is applied on the data value, resulting in a set of tokens. For each token of the set of tokens one or more tag candidates are determined using a lookup dictionary of tags and tokens and/or at least part of the set of machine mining models, resulting for each token of the set of tokens in a list of possible tags. Unique combinations of the sets of tags of the further data set having highest aggregated confidence values are provided for use as standardization rules.
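The pattern-mining step can be sketched roughly as below. The sketch uses only a lookup dictionary of tags with confidence values and leaves out the mined models; the whitespace tokenizer, the product of per-token confidences as the aggregate, and the names `tag_candidates` and `mine_patterns` are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product


def tag_candidates(token, lookup):
    """Return a list of (tag, confidence) candidates for one token."""
    if token.isdigit():
        return [("NUMBER", 1.0)]
    return lookup.get(token.lower(), [("UNKNOWN", 0.1)])


def mine_patterns(values, lookup, top_k=2):
    """Return the tag patterns with the highest aggregated confidence."""
    totals = defaultdict(float)
    for value in values:
        tokens = value.split()  # naive whitespace tokenization
        if not tokens:
            continue
        per_token = [tag_candidates(t, lookup) for t in tokens]
        # Enumerate every combination of candidate tags for this value.
        for combo in product(*per_token):
            pattern = tuple(tag for tag, _ in combo)
            confidence = 1.0
            for _, c in combo:
                confidence *= c
            # Aggregate confidence of the same pattern across all values.
            totals[pattern] += confidence
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```

The top patterns returned here correspond to the tag combinations that the abstract proposes as candidate standardization rules.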
-
Publication number: US20180137189A1
Publication date: 2018-05-17
Application number: US15349421
Filing date: 2016-11-11
Applicant: International Business Machines Corporation
Inventor: Namit Kabra, Yannick Saillet
IPC: G06F17/30
CPC classification number: G06F16/285, G06F16/215, G06F16/24553, G06F16/24578, G06F16/258
Abstract: A method, system and computer program product for finding groups of potential duplicates in attribute values. Each attribute value of the attribute values is converted to a respective set of bigrams. All bigrams present in the attribute values may be determined. Bigrams present in the attribute values may be represented as bits. This may result in a bitmap representing the presence of the bigrams in the attribute values. The attribute values may be grouped using bitwise operations on the bitmap, where each group includes attribute values that are determined based on pairwise bigram-based similarity scores. The pairwise bigram-based similarity score reflects the number of common bigrams between two attribute values.
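The bitmap approach lends itself to a short sketch using Python integers as bit sets. The abstract says the similarity score reflects the number of common bigrams; normalizing that count by the size of the union (a Jaccard ratio), the greedy grouping strategy, and the `threshold` parameter are assumptions of this example.

```python
def bigrams(value):
    """Set of two-character substrings of a lower-cased value."""
    s = value.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def build_bitmap(values):
    """One integer per value; bit i is set if bigram i occurs in the value."""
    vocabulary = sorted(set().union(*(bigrams(v) for v in values)))
    position = {bg: i for i, bg in enumerate(vocabulary)}
    rows = []
    for v in values:
        bits = 0
        for bg in bigrams(v):
            bits |= 1 << position[bg]
        rows.append(bits)
    return rows


def similarity(a, b):
    """Common bigrams divided by all bigrams, computed with bitwise ops."""
    common = bin(a & b).count("1")
    total = bin(a | b).count("1")
    return common / total if total else 0.0


def group_values(values, threshold=0.5):
    """Greedily assign each value to the first group whose seed is similar."""
    rows = build_bitmap(values)
    groups = []
    for i in range(len(values)):
        for g in groups:
            if similarity(rows[i], rows[g[0]]) >= threshold:
                g.append(i)
                break
        else:
            groups.append([i])
    return [[values[i] for i in g] for g in groups]
```

Because each comparison is a bitwise AND and OR followed by a bit count, the pairwise scores avoid repeated string processing once the bitmap is built.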
-
Publication number: US20180137151A1
Publication date: 2018-05-17
Application number: US15831575
Filing date: 2017-12-05
Applicant: International Business Machines Corporation
Inventor: Namit Kabra, Yannick Saillet
IPC: G06F17/30
Abstract: A method, system and computer program product for determining a data standardization score for an attribute of a dataset. A data standardization score is calculated, which reflects whether data quality of attribute values would increase if a standardization rule is applied to the attribute values. Based on attribute metadata, it may be determined whether an indication to carry out or not to carry out standardization is available for at least part of the attribute values of the dataset. In response to finding the indication, a respective value may be set for the data standardization score. In response to not finding the indication, a data standardization score algorithm may be run on the at least part of the attribute values of the dataset. The data standardization score value may be compared to a predefined criterion to determine whether data standardization is to be applied on the attribute.
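The decision flow in this abstract (use a metadata indication if one exists, otherwise compute a score and compare it to a criterion) can be sketched as follows. The score used here, the relative reduction in distinct values after applying a rule, is an assumed stand-in for the patent's scoring algorithm; the metadata key `standardize`, the sample rule `normalize_code`, and the threshold are likewise hypothetical.

```python
def normalize_code(value):
    """Sample standardization rule: drop periods and upper-case the value."""
    return value.replace(".", "").upper()


def standardization_score(values, rule, metadata=None):
    """Score in [0, 1]; higher means standardization looks more worthwhile."""
    metadata = metadata or {}
    # An explicit indication in the attribute metadata fixes the score.
    if "standardize" in metadata:
        return 1.0 if metadata["standardize"] else 0.0
    before = len(set(values))
    if before == 0:
        return 0.0
    after = len({rule(v) for v in values})
    # Share of distinct spellings that the rule would merge away.
    return 1.0 - after / before


def should_standardize(values, rule, metadata=None, threshold=0.2):
    """Compare the score against a predefined criterion."""
    return standardization_score(values, rule, metadata) >= threshold
```

Checking the metadata first avoids running the scoring pass on attributes for which a decision has already been recorded.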
-
Publication number: US20160162507A1
Publication date: 2016-06-09
Application number: US14561927
Filing date: 2014-12-05
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Ritesh K. Gupta, Namit Kabra, Manish Kumar, Srinivas K. Mittapalli
IPC: G06F17/30
CPC classification number: G06F16/215
Abstract: In an approach to identifying duplicates in data, one or more computer processors receive a request from a user to identify duplicates in a data set. The one or more computer processors retrieve the data set utilizing data discovery. The one or more computer processors perform data profiling on the data set. The one or more computer processors determine one or more domain types of the data set, based, at least in part, on the performed data profiling. The one or more computer processors perform data standardization on the data set, based, at least in part, on the one or more determined domain types. Responsive to performing data standardization, the one or more computer processors perform probabilistic matching on the data set. The one or more computer processors identify two or more duplicates in the data set, based, at least in part, on the probabilistic matching.