Dynamic Condensing of Digital Content with Insertion of Expansion Elements

    Publication No.: US20250061472A1

    Publication date: 2025-02-20

    Application No.: US18234577

    Filing date: 2023-08-16

    Abstract: Mechanisms are provided for rendering content in a compacted view. A machine learning computer model is trained by a machine learning process to predict a user attention score for segments of content based on features of the content and historical user attention data. The trained machine learning computer model processes new content to associate with each segment, in a plurality of segments, of the new content, a corresponding user attention score. The segments, in the plurality of segments, of the new content are ranked relative to one another based on the corresponding user attention scores of the segments. A compacted view of the new content is rendered based on the ranking of the segments. A first number of segments are rendered in the compacted view and a second number of segments are not rendered in the compacted view, and are replaced with an inserted user selectable expansion element.
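The ranking and rendering step described above can be sketched roughly as follows. This is a minimal illustration, not the patented method: the attention scores are supplied by hand in place of the trained model, and `compact_view`, `keep`, and the `[+ expand]` marker are hypothetical names chosen for this example.

```python
from typing import List


def compact_view(segments: List[str], scores: List[float], keep: int,
                 marker: str = "[+ expand]") -> List[str]:
    """Keep the `keep` highest-scoring segments in document order and
    collapse each run of dropped segments into one expansion marker."""
    if len(segments) != len(scores):
        raise ValueError("segments and scores must have the same length")
    # Rank segment indices by score, highest first; ties favor earlier segments.
    ranked = sorted(range(len(segments)), key=lambda i: (-scores[i], i))
    kept = set(ranked[:max(keep, 0)])

    view: List[str] = []
    in_gap = False  # True while we are inside a run of dropped segments
    for i, segment in enumerate(segments):
        if i in kept:
            view.append(segment)
            in_gap = False
        elif not in_gap:
            view.append(marker)  # stands in for the user-selectable element
            in_gap = True
    return view


# Assumed scores; a real system would obtain them from the trained model.
segments = ["intro", "aside", "footnote", "key result", "credits"]
scores = [0.9, 0.1, 0.2, 0.8, 0.3]
print(compact_view(segments, scores, keep=2))
```

Collapsing adjacent dropped segments into a single marker is a design choice of this sketch; the abstract only states that unrendered segments are replaced with an expansion element.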

    INSIGHT EXPANSION IN SMART DATA RETENTION SYSTEMS

    Publication No.: US20220222265A1

    Publication date: 2022-07-14

    Application No.: US17145458

    Filing date: 2021-01-11

    Abstract: A computer-implemented method applies insights from a variety of data sources to each of the data sources. The method includes identifying a set of data sources, wherein each of the data sources is associated with a domain. The method includes analyzing documentation for each of the data sources. The method further includes extracting a set of attributes for each data source, and determining a data schema associated with each data source. The method includes mapping each data schema to a common domain schema. The method also includes linking, based on the mapping and on the set of attributes for each data source, common features across each data source. The method includes generating, in response to the linking, a knowledge graph. The method further includes preparing a visual display for a set of domain insights; and forking the set of domain insights into a first data source.
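As a rough illustration of the mapping and linking steps, the sketch below maps source columns onto a common schema and emits graph edges between sources that share a common field. The source names, column names, and `SCHEMA_MAP` are invented for this example; the abstract does not specify how the mapping or the knowledge graph is represented.

```python
from collections import defaultdict

# Hypothetical mapping: source -> {source column: common-schema field}.
SCHEMA_MAP = {
    "crm": {"cust_id": "customer_id", "mail": "email"},
    "billing": {"customer": "customer_id", "amount": "invoice_total"},
    "support": {"email_addr": "email", "ticket": "ticket_id"},
}


def link_sources(schema_map):
    """Return edges (source_a, source_b, shared_field) for every pair of
    sources that map a column onto the same common-schema field."""
    sources_by_field = defaultdict(set)
    for source, columns in schema_map.items():
        for common_field in columns.values():
            sources_by_field[common_field].add(source)

    edges = set()
    for field, sources in sources_by_field.items():
        ordered = sorted(sources)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                edges.add((a, b, field))
    return edges


print(sorted(link_sources(SCHEMA_MAP)))
```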

    Column weight calculation for data deduplication

    Publication No.: US10452627B2

    Publication date: 2019-10-22

    Application No.: US15171200

    Filing date: 2016-06-02

    Abstract: A computer system with the capability to identify potentially duplicative records in a data set is provided. A computer may collect a data profile for the data set that provides descriptive information with regard to attributes of the data set. Based, at least in part, on the data profile, weights are determined for the attributes. As values of a data record are compared to values of the same respective attributes in other records, the overall likelihood of a match or duplicate, as indicated by the degree of similarity between values, is modified based on the determined weights associated with the respective attributes.
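One way to picture profile-driven weighting is sketched below. The weighting rule (more distinct values means more weight) and the use of `difflib.SequenceMatcher` for value similarity are assumptions made for this example; the abstract does not say which profile statistics or similarity measure are used.

```python
from difflib import SequenceMatcher


def profile_weights(records):
    """Hypothetical profile: weight each attribute by its share of the
    distinct values seen across all attributes (weights sum to 1)."""
    columns = list(records[0].keys())
    distinct = {c: len({r[c] for r in records}) for c in columns}
    total = sum(distinct.values())
    return {c: distinct[c] / total for c in columns}


def match_score(a, b, weights):
    """Weighted sum of per-attribute string similarities, in [0, 1]."""
    return sum(
        w * SequenceMatcher(None, str(a[c]), str(b[c])).ratio()
        for c, w in weights.items()
    )


records = [
    {"name": "Ann Lee", "country": "US"},
    {"name": "Anne Lee", "country": "US"},
    {"name": "Bob Ray", "country": "US"},
]
weights = profile_weights(records)
print(weights)
print(match_score(records[0], records[1], weights))
print(match_score(records[0], records[2], weights))
```

Here the `country` column carries little weight because every record shares the same value, so agreement on it says little about whether two records are duplicates.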

    DATA STANDARDIZATION RULES GENERATION

    Publication No.: US20190179888A1

    Publication date: 2019-06-13

    Application No.: US15838463

    Filing date: 2017-12-12

    Abstract: A method for generating data standardization rules includes receiving a training data set containing tokenized and tagged data values. A set of machine mining models is built using different learning algorithms for identifying tags and tag patterns using the training set. For each data value in a further data set: a tokenization is applied on the data value, resulting in a set of tokens. For each token of the set of tokens one or more tag candidates are determined using a lookup dictionary of tags and tokens and/or at least part of the set of machine mining models, resulting for each token of the set of tokens in a list of possible tags. Unique combinations of the sets of tags of the further data set having highest aggregated confidence values are provided for use as standardization rules.
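The tokenize, tag, and aggregate flow can be sketched as follows. This simplified version uses only a lookup dictionary (no mined models), keeps the single best tag per token, and sums per-value mean confidences per tag pattern. The `LOOKUP` table, tag names, and confidence values are hypothetical.

```python
from collections import defaultdict

# Hypothetical lookup dictionary: token -> [(tag, confidence)].
LOOKUP = {
    "st": [("STREET_TYPE", 0.8), ("TITLE", 0.2)],
    "ave": [("STREET_TYPE", 0.9)],
    "main": [("STREET_NAME", 0.7)],
    "oak": [("STREET_NAME", 0.6)],
}


def tag_candidates(token):
    """Return the list of possible (tag, confidence) pairs for a token."""
    if token.isdigit():
        return [("NUMBER", 1.0)]
    return LOOKUP.get(token.lower(), [("UNKNOWN", 0.1)])


def top_patterns(values, n=1):
    """Aggregate confidence per tag pattern and return the n best patterns."""
    totals = defaultdict(float)
    for value in values:
        tokens = value.replace(".", " ").split()  # naive tokenization
        if not tokens:
            continue
        best = [max(tag_candidates(t), key=lambda tc: tc[1]) for t in tokens]
        pattern = tuple(tag for tag, _ in best)
        totals[pattern] += sum(conf for _, conf in best) / len(best)
    return sorted(totals.items(), key=lambda kv: -kv[1])[:n]


values = ["12 Main St.", "7 Oak Ave", "Main"]
print(top_patterns(values, n=2))
```

The winning pattern could then serve as a candidate standardization rule, for example "number, street name, street type" for address lines.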

    EFFICIENTLY FINDING POTENTIAL DUPLICATE VALUES IN DATA

    Publication No.: US20180137189A1

    Publication date: 2018-05-17

    Application No.: US15349421

    Filing date: 2016-11-11

    Abstract: A method, system and computer program product for finding groups of potential duplicates in attribute values. Each attribute value of the attribute values is converted to a respective set of bigrams. All bigrams present in the attribute values may be determined. Bigrams present in the attribute values may be represented as bits. This may result in a bitmap representing the presence of the bigrams in the attribute values. The attribute values may be grouped using bitwise operations on the bitmap, where each group includes attribute values that are determined based on pairwise bigram-based similarity scores. The pairwise bigram-based similarity score reflects the number of common bigrams between two attribute values.
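A minimal sketch of the bigram bitmap idea follows, assuming each attribute value is encoded as a Python integer used as a bitmask, with one bit per distinct bigram. The grouping rule (union-find over pairs sharing at least `min_common` bigrams) and the all-pairs loop are simplifications chosen for this example, not the patented procedure.

```python
def bigrams(value):
    """Set of lowercase character bigrams in a value."""
    v = value.lower()
    return {v[i:i + 2] for i in range(len(v) - 1)}


def build_bitmaps(values):
    """Encode each value as an int whose set bits mark the bigrams it contains."""
    vocab = {}  # bigram -> bit position
    bitmaps = []
    for value in values:
        bits = 0
        for bg in bigrams(value):
            bits |= 1 << vocab.setdefault(bg, len(vocab))
        bitmaps.append(bits)
    return bitmaps


def common_bigrams(a, b):
    """Number of shared bigrams: population count of the bitwise AND."""
    return bin(a & b).count("1")


def group_duplicates(values, min_common=3):
    """Group values that share at least `min_common` bigrams (union-find)."""
    bitmaps = build_bitmaps(values)
    parent = list(range(len(values)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if common_bigrams(bitmaps[i], bitmaps[j]) >= min_common:
                parent[find(j)] = find(i)

    groups = {}
    for i, value in enumerate(values):
        groups.setdefault(find(i), []).append(value)
    return [g for g in groups.values() if len(g) > 1]


print(group_duplicates(["Jonathan", "Jonathon", "Maria"]))
```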

    COMPUTING THE NEED FOR STANDARDIZATION OF A SET OF VALUES

    Publication No.: US20180137151A1

    Publication date: 2018-05-17

    Application No.: US15831575

    Filing date: 2017-12-05

    Abstract: A method, system and computer program product for determining a data standardization score for an attribute of a dataset. A data standardization score is calculated, which reflects whether data quality of attribute values would increase if a standardization rule is applied to the attribute values. Based on attribute metadata, it may be determined whether an indication to carry out or not to carry out standardization is available for at least part of the attribute values of the dataset. In response to finding the indication, a respective value may be set for the data standardization score. In response to not finding the indication, a data standardization score algorithm may be run on the at least part of the attribute values of the dataset. The data standardization score value may be compared to a predefined criterion to determine whether data standardization is to be applied on the attribute.
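The decision flow (metadata indication first, scoring algorithm as fallback, then a threshold check) might look like the sketch below. The `standardize` metadata key, the scoring rule (the share of distinct values that collapse under whitespace and case normalization), and the default threshold are all assumptions made for illustration.

```python
def standardization_score(values, metadata=None):
    """Return a score in [0, 1]; higher means standardization helps more."""
    metadata = metadata or {}
    # An explicit indication in the attribute metadata overrides the algorithm.
    if "standardize" in metadata:
        return 1.0 if metadata["standardize"] else 0.0

    raw = set(values)
    if not raw:
        return 0.0
    # Hypothetical score: fraction of distinct values removed by normalization.
    normalized = {" ".join(v.split()).casefold() for v in raw}
    return 1.0 - len(normalized) / len(raw)


def should_standardize(values, metadata=None, threshold=0.2):
    """Compare the score to a predefined criterion (here a simple threshold)."""
    return standardization_score(values, metadata) >= threshold


countries = ["USA", "usa", " Usa ", "Canada"]
print(standardization_score(countries))
print(should_standardize(countries))
print(should_standardize(countries, metadata={"standardize": False}))
```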

    AUTOMATED DATA DUPLICATE IDENTIFICATION
    Status: under examination (published)

    Publication No.: US20160162507A1

    Publication date: 2016-06-09

    Application No.: US14561927

    Filing date: 2014-12-05

    CPC classification number: G06F16/215

    Abstract: In an approach to identifying duplicates in data, one or more computer processors receive a request from a user to identify duplicates in a data set. The one or more computer processors retrieve the data set utilizing data discovery. The one or more computer processors perform data profiling on the data set. The one or more computer processors determine one or more domain types of the data set, based, at least in part, on the performed data profiling. The one or more computer processors perform data standardization on the data set, based, at least in part, on the one or more determined domain types. Responsive to performing data standardization, the one or more computer processors perform probabilistic matching on the data set. The one or more computer processors identify two or more duplicates in the data set, based, at least in part, on the probabilistic matching.
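The profile, standardize, and match sequence can be sketched as a small pipeline. The abstract does not name a matching model; this example substitutes agreement and disagreement weights in the style of Fellegi-Sunter record linkage, with invented m/u probabilities in `M_U`, a regex-based domain detector, and a hypothetical threshold.

```python
import math
import re

# Hypothetical per-field probabilities: (m, u) = P(agree | match), P(agree | non-match).
M_U = {"name": (0.9, 0.05), "phone": (0.95, 0.01)}


def detect_domain(values):
    """Crude profiling: call a column 'phone' if most values look like phone numbers."""
    phone_like = sum(bool(re.fullmatch(r"[\d\s()+.-]{7,}", v)) for v in values)
    return "phone" if phone_like > len(values) / 2 else "text"


def standardize(value, domain):
    """Domain-specific standardization: digits only for phones, folded text otherwise."""
    if domain == "phone":
        return re.sub(r"\D", "", value)
    return " ".join(value.split()).casefold()


def match_weight(a, b):
    """Sum of log2 likelihood ratios over fields (agreement vs. disagreement)."""
    total = 0.0
    for field, (m, u) in M_U.items():
        if a[field] == b[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def find_duplicates(records, threshold=5.0):
    """Return index pairs whose match weight meets the threshold."""
    domains = {f: detect_domain([r[f] for r in records]) for f in M_U}
    clean = [{f: standardize(r[f], domains[f]) for f in M_U} for r in records]
    return [
        (i, j)
        for i in range(len(clean))
        for j in range(i + 1, len(clean))
        if match_weight(clean[i], clean[j]) >= threshold
    ]


records = [
    {"name": "Ann Lee", "phone": "(555) 123-4567"},
    {"name": "ann  lee", "phone": "555.123.4567"},
    {"name": "Bob Ray", "phone": "555 987 6543"},
]
print(find_duplicates(records))
```

Standardizing before matching is what lets the first two records agree on both fields despite their different formatting.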

