Systems and methods for anonymizing large scale datasets

    Publication No.: US12164673B2

    Publication date: 2024-12-10

    Application No.: US18345657

    Filing date: 2023-06-30

    Applicant: Google LLC

    Abstract: A computer-implemented method for k-anonymizing a dataset to provide privacy guarantees for all columns in the dataset can include obtaining, by a computing system including one or more computing devices, a dataset comprising data indicative of a plurality of entities and at least one data item respective to at least one of the plurality of entities. The computer-implemented method can include clustering, by the computing system, the plurality of entities into at least one entity cluster. The computer-implemented method can include determining, by the computing system, a majority condition for the at least one entity cluster, the majority condition indicating that the at least one data item is respective to at least a majority of the plurality of entities. The computer-implemented method can include assigning, by the computing system, the at least one data item to the plurality of entities in an anonymized dataset based at least in part on the majority condition.
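
The majority condition described in this abstract can be illustrated with a minimal sketch. It assumes the entity clusters are already given (the clustering step is not shown) and reads "majority" as strictly more than half of a cluster. The function name `anonymize_by_majority` and the input shapes are hypothetical and do not come from the patent.

```python
from collections import Counter


def anonymize_by_majority(clusters, items_by_entity):
    """Assign each cluster's majority items to every entity in that cluster.

    clusters: list of lists of entity ids (assumed to be precomputed).
    items_by_entity: dict mapping entity id -> set of data items.
    """
    anonymized = {}
    for cluster in clusters:
        # Count how many entities in the cluster hold each item.
        counts = Counter()
        for entity in cluster:
            counts.update(items_by_entity.get(entity, set()))
        # Assumption: "majority" means strictly more than half of the cluster.
        majority_items = {
            item for item, n in counts.items() if 2 * n > len(cluster)
        }
        # Every entity in the cluster receives the same item set.
        for entity in cluster:
            anonymized[entity] = set(majority_items)
    return anonymized


example = anonymize_by_majority(
    [["a", "b", "c"]],
    {"a": {"x", "y"}, "b": {"x"}, "c": {"z"}},
)
```

In this sketch all entities in a cluster end up with identical rows, so a cluster of at least k entities yields at least k indistinguishable records.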

    Systems and Methods for Anonymizing Large Scale Datasets

    Publication No.: US20250077709A1

    Publication date: 2025-03-06

    Application No.: US18955530

    Filing date: 2024-11-21

    Applicant: Google LLC

    Abstract: A computer-implemented method for k-anonymizing a dataset to provide privacy guarantees for all columns in the dataset can include obtaining, by a computing system including one or more computing devices, a dataset comprising data indicative of a plurality of entities and at least one data item respective to at least one of the plurality of entities. The computer-implemented method can include clustering, by the computing system, the plurality of entities into at least one entity cluster. The computer-implemented method can include determining, by the computing system, a majority condition for the at least one entity cluster, the majority condition indicating that the at least one data item is respective to at least a majority of the plurality of entities. The computer-implemented method can include assigning, by the computing system, the at least one data item to the plurality of entities in an anonymized dataset based at least in part on the majority condition.

    Systems and Methods to Detect Clusters in Graphs

    Publication No.: US20200065333A1

    Publication date: 2020-02-27

    Application No.: US16610000

    Filing date: 2018-02-14

    Applicant: Google LLC

    Abstract: The present disclosure provides a new framework and associated techniques, referred to herein as “ego-splitting,” that enable the detection of clusters in graphs that are descriptive of networks, including highly complex networks. Ego-splitting leverages local structures within a graph known as ego-nets to de-couple overlapping clusters. For example, an ego-net can be the subgraph induced by the neighborhood of each node. Ego-splitting is a highly scalable and flexible framework, with provable theoretical guarantees. Ego-splitting reduces the complex overlapping clustering problem to a simpler and more amenable non-overlapping (also known as partitioning) problem. Ego-splitting enables the scaling of community detection to graphs with tens of billions of edges and outperforms previous solutions.
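
The reduction from overlapping to non-overlapping clustering described in this abstract can be sketched as follows. This is a simplified illustration, not the patented method: it assumes an undirected graph stored as a dict of neighbor sets, and it uses connected components as a stand-in for both the local (ego-net) and the global partitioning steps. The names `connected_components` and `ego_split` are hypothetical.

```python
from collections import defaultdict


def connected_components(nodes, adj):
    """Return the connected components of the subgraph induced by `nodes`."""
    nodes = set(nodes)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        component, stack = set(), [start]
        seen.add(start)
        while stack:
            x = stack.pop()
            component.add(x)
            for y in adj[x]:
                if y in nodes and y not in seen:
                    seen.add(y)
                    stack.append(y)
        components.append(component)
    return components


def ego_split(adj):
    """Return overlapping clusters of an undirected graph (dict of sets)."""
    # Step 1: partition each node's ego-net (the subgraph induced by its
    # neighbors) and create one "persona" of the node per local cluster.
    persona_of = {}
    for u in adj:
        persona_of[u] = {}
        for i, component in enumerate(connected_components(adj[u], adj)):
            for v in component:
                persona_of[u][v] = (u, i)
    # Step 2: build the persona graph. Edge (u, v) connects the persona of u
    # that owns v with the persona of v that owns u.
    persona_adj = defaultdict(set)
    for u in adj:
        for v in adj[u]:
            pu, pv = persona_of[u][v], persona_of[v][u]
            persona_adj[pu].add(pv)
            persona_adj[pv].add(pu)
    # Step 3: partition the persona graph (non-overlapping), then map each
    # persona back to its original node to obtain overlapping clusters.
    parts = connected_components(list(persona_adj), persona_adj)
    return [{persona[0] for persona in part} for part in parts]


# Two triangles that share node 0.
graph = {0: {1, 2, 3, 4}, 1: {0, 2}, 2: {0, 1}, 3: {0, 4}, 4: {0, 3}}
clusters = ego_split(graph)
```

Node 0 gets two personas, one per triangle, so it appears in both resulting clusters. Isolated nodes have no personas in this sketch and are omitted from the output.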

    Systems and methods to detect clusters in graphs

    Publication No.: US11829416B2

    Publication date: 2023-11-28

    Application No.: US16610000

    Filing date: 2018-02-14

    Applicant: Google LLC

    CPC classification number: G06F16/9024 G06F16/906 G06F18/2163 G06F18/24147

    Abstract: The present disclosure provides a new framework and associated techniques, referred to herein as “ego-splitting,” that enable the detection of clusters in graphs that are descriptive of networks, including highly complex networks. Ego-splitting leverages local structures within a graph known as ego-nets to de-couple overlapping clusters. For example, an ego-net can be the subgraph induced by the neighborhood of each node. Ego-splitting is a highly scalable and flexible framework, with provable theoretical guarantees. Ego-splitting reduces the complex overlapping clustering problem to a simpler and more amenable non-overlapping (also known as partitioning) problem. Ego-splitting enables the scaling of community detection to graphs with tens of billions of edges and outperforms previous solutions.

    Systems and methods for anonymizing large scale datasets

    Publication No.: US11727147B2

    Publication date: 2023-08-15

    Application No.: US17016788

    Filing date: 2020-09-10

    Applicant: Google LLC

    CPC classification number: G06F21/6254 G06F16/285 G06N20/00

    Abstract: A computer-implemented method for k-anonymizing a dataset to provide privacy guarantees for all columns in the dataset can include obtaining, by a computing system including one or more computing devices, a dataset comprising data indicative of a plurality of entities and at least one data item respective to at least one of the plurality of entities. The computer-implemented method can include clustering, by the computing system, the plurality of entities into at least one entity cluster. The computer-implemented method can include determining, by the computing system, a majority condition for the at least one entity cluster, the majority condition indicating that the at least one data item is respective to at least a majority of the plurality of entities. The computer-implemented method can include assigning, by the computing system, the at least one data item to the plurality of entities in an anonymized dataset based at least in part on the majority condition.

    Efficient on-device public-private computation

    Publication No.: US11574067B2

    Publication date: 2023-02-07

    Application No.: US16774380

    Filing date: 2020-01-28

    Applicant: Google LLC

    Abstract: Example systems and methods enhance user privacy by performing efficient on-device public-private computation on a combination of public and private data, such as, for example, public and private graph data. In particular, the on-device public-private computation framework described herein can enable a device associated with an entity to efficiently compute a combined output that takes into account and is explicitly based upon a combination of data that is associated with the entity and data that is associated with one or more other entities that are private connections of the entity, all without revealing to a centralized computing system a set of locally stored private data that identifies the one or more other entities that are private connections of the entity.

    Systems and Methods for Anonymizing Large Scale Datasets

    Publication No.: US20230359769A1

    Publication date: 2023-11-09

    Application No.: US18345657

    Filing date: 2023-06-30

    Applicant: Google LLC

    CPC classification number: G06F21/6254 G06N20/00 G06F16/285

    Abstract: A computer-implemented method for k-anonymizing a dataset to provide privacy guarantees for all columns in the dataset can include obtaining, by a computing system including one or more computing devices, a dataset comprising data indicative of a plurality of entities and at least one data item respective to at least one of the plurality of entities. The computer-implemented method can include clustering, by the computing system, the plurality of entities into at least one entity cluster. The computer-implemented method can include determining, by the computing system, a majority condition for the at least one entity cluster, the majority condition indicating that the at least one data item is respective to at least a majority of the plurality of entities. The computer-implemented method can include assigning, by the computing system, the at least one data item to the plurality of entities in an anonymized dataset based at least in part on the majority condition.

    Methods and systems for encoding graphs

    Publication No.: US11100688B2

    Publication date: 2021-08-24

    Application No.: US16523612

    Filing date: 2019-07-26

    Applicant: Google LLC

    Abstract: The present disclosure is directed to encoding graphs. In particular, the methods and systems of the present disclosure can: receive data describing a first graph; and for each node, of one or more nodes, of the first graph, determine, based at least in part on data describing a second graph, and for each of multiple nodes of the second graph corresponding to the node of the first graph, a representation of a role of the node of the multiple nodes in a community to which the node of the multiple nodes belongs.

    Methods and Systems for Encoding Graphs

    Publication No.: US20200035002A1

    Publication date: 2020-01-30

    Application No.: US16523612

    Filing date: 2019-07-26

    Applicant: Google LLC

    Abstract: The present disclosure is directed to encoding graphs. In particular, the methods and systems of the present disclosure can: receive data describing a first graph; and for each node, of one or more nodes, of the first graph, determine, based at least in part on data describing a second graph, and for each of multiple nodes of the second graph corresponding to the node of the first graph, a representation of a role of the node of the multiple nodes in a community to which the node of the multiple nodes belongs.

    Federated Privacy-Preserving Nearest-Neighbor Search (NNS)-Based Label Propagation on Shared Embedding Space

    Publication No.: US20250005149A1

    Publication date: 2025-01-02

    Application No.: US18343132

    Filing date: 2023-06-28

    Applicant: Google LLC

    Abstract: For a plurality of iterations, entity detection information is obtained from one or more client computing devices. The entity detection information includes (a) information that indicates whether an entity detected at the client computing device is malicious, and (b) information that associates the entity with a particular subspace of a plurality of subspaces of an embedding space. The entity detection information received over the plurality of iterations is aggregated to obtain aggregated threat information, wherein the aggregated threat information is descriptive of a number of malicious entities and a total number of entities detected for each subspace of the plurality of subspaces. Based on the entity detection information, subspace classification information is generated that identifies a first subspace of the plurality of subspaces as being a malicious subspace associated with malicious entities.
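
The per-subspace aggregation described in this abstract can be sketched in a few lines. The sketch assumes each client report is a `(subspace_id, is_malicious)` pair. The classification rule (a minimum malicious ratio and a minimum report count) is an illustrative assumption, since the abstract does not state how a subspace is classified. The names `aggregate_reports` and `malicious_subspaces` are hypothetical.

```python
from collections import defaultdict


def aggregate_reports(reports):
    """Return {subspace_id: (malicious_count, total_count)}."""
    counts = defaultdict(lambda: [0, 0])
    for subspace_id, is_malicious in reports:
        counts[subspace_id][1] += 1
        if is_malicious:
            counts[subspace_id][0] += 1
    return {subspace: tuple(pair) for subspace, pair in counts.items()}


def malicious_subspaces(counts, min_ratio=0.5, min_total=2):
    """Flag subspaces whose malicious share meets the assumed thresholds."""
    return {
        subspace
        for subspace, (bad, total) in counts.items()
        if total >= min_total and bad / total >= min_ratio
    }


# Reports pooled over several iterations (hypothetical data).
reports = [(0, True), (0, True), (0, False), (1, False), (1, False), (2, True)]
threat_info = aggregate_reports(reports)
flagged = malicious_subspaces(threat_info)
```

The `min_total` guard keeps a subspace with a single report from being flagged, which is a design choice of this sketch rather than something the abstract specifies.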
