Abstract:
An online system identifies an additional feature to evaluate for inclusion in a machine learned model. The additional feature is based on characteristics of one or more dimensions of information maintained by the online system. To generate data for evaluating the additional feature, the online system generates various partitions of stored data, where each partition includes characteristics associated with one or more dimensions on which the additional feature is based. Using values of characteristics in a partition, the online system generates values for the additional feature and includes the values of the additional feature in the partition. Values for the additional feature are generated for various partitions based on the values of characteristics in each partition. The online system combines multiple partitions that include values for the additional feature to generate a training set for evaluating a machine learned model including the additional feature.
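The partition, enrich, and combine flow described above can be sketched as follows. This is only an illustrative sketch: the row layout, the round-robin partitioning, the helper names, and the `ctr` feature are all assumptions made for the example and are not taken from the abstract.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]


def partition_rows(rows: List[Row], num_partitions: int) -> List[List[Row]]:
    """Split stored rows into partitions (round-robin is an assumption)."""
    return [rows[i::num_partitions] for i in range(num_partitions)]


def add_feature(partition: List[Row], name: str,
                fn: Callable[[Row], float]) -> List[Row]:
    """Compute the additional feature from each row's characteristics
    and store it alongside them in a new copy of the partition."""
    return [{**row, name: fn(row)} for row in partition]


def build_training_set(rows: List[Row], name: str,
                       fn: Callable[[Row], float],
                       num_partitions: int = 2) -> List[Row]:
    """Enrich every partition, then combine the partitions into one set."""
    partitions = partition_rows(rows, num_partitions)
    enriched = [add_feature(p, name, fn) for p in partitions]
    return [row for part in enriched for row in part]


# Hypothetical stored data with two characteristics per row.
rows = [
    {"clicks": 4.0, "impressions": 10.0},
    {"clicks": 1.0, "impressions": 5.0},
    {"clicks": 3.0, "impressions": 6.0},
]

# Hypothetical additional feature: click-through rate.
training_set = build_training_set(
    rows, "ctr", lambda r: r["clicks"] / r["impressions"]
)
```

Because each partition is enriched independently, the per-partition step could in principle run in parallel before the combine step; the sketch runs it sequentially for simplicity.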
Abstract:
A computer system is optimized for implementing a neural network nodal graph that has dense inputs and sparse inputs. The computer system has a local machine that receives user inputs and is optimized for computing power, and has a remote machine that stores embedding matrices and parameters, and is optimized for memory capacity. In accordance with a cost function applied to each node, the neural network nodal graph is divided into graph segments based on its types of inputs and needed computing resources for execution. In accordance with the cost functions, the graph segments are divided between the remote and local machines for execution, and the results of all the graph segments are combined in the local machine.
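A per-node cost function that drives placement could look like the sketch below. The abstract does not give the cost function, so the weights, the `Node` fields, and the node names here are hypothetical; the only idea carried over is that the local machine favors compute-heavy work and the remote machine favors memory-heavy work.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Node:
    name: str
    input_kind: str      # "dense" or "sparse"
    flops: float         # rough compute demand (assumed unit)
    memory_bytes: float  # rough memory demand (assumed unit)


def placement_cost(node: Node, machine: str) -> float:
    """Hypothetical cost: the local machine is penalized for memory use,
    the remote machine is penalized for compute."""
    if machine == "local":
        return node.flops * 1.0 + node.memory_bytes * 10.0
    return node.flops * 10.0 + node.memory_bytes * 1.0


def assign_segments(nodes: List[Node]) -> Dict[str, List[str]]:
    """Place each node on the machine with the lower cost and group the
    node names into one graph segment per machine."""
    segments: Dict[str, List[str]] = {"local": [], "remote": []}
    for node in nodes:
        best = min(("local", "remote"),
                   key=lambda machine: placement_cost(node, machine))
        segments[best].append(node.name)
    return segments


# Hypothetical graph: a sparse embedding lookup and a dense MLP layer.
graph = [
    Node("embedding_lookup", "sparse", flops=1.0, memory_bytes=1000.0),
    Node("dense_mlp", "dense", flops=1000.0, memory_bytes=10.0),
]
segments = assign_segments(graph)
```

In this toy graph the memory-heavy sparse lookup lands in the remote segment and the compute-heavy dense layer in the local segment; combining segment results on the local machine is not modeled here.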
Abstract:
The present disclosure is directed to a machine learning platform for training and prediction that supports high-capacity models (e.g., with 10 billion parameters). The platform generates a model for a metric of interest based on a known training set. The model includes parameters indicating the importance of different features of the model, taken both singly and in pairs. The model may be applied to predict a value of the metric for a given set of objects, such as a pair consisting of a user object and a content item object.
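One way to picture a model with per-feature and per-pair parameters is a second-order linear score, sketched below. The abstract does not state the functional form, so the additive scoring rule, the weight dictionaries, and the feature names are assumptions chosen for illustration.

```python
from itertools import combinations
from typing import Dict, Tuple


def predict(features: Dict[str, float],
            single_weights: Dict[str, float],
            pair_weights: Dict[Tuple[str, str], float],
            bias: float = 0.0) -> float:
    """Score = bias + sum of single-feature terms + sum of pairwise terms.
    Pair keys are (name_a, name_b) with the names in sorted order."""
    score = bias
    for name, value in features.items():
        score += single_weights.get(name, 0.0) * value
    for a, b in combinations(sorted(features), 2):
        score += pair_weights.get((a, b), 0.0) * features[a] * features[b]
    return score


# Hypothetical features drawn from a (user object, content item object) pair.
features = {"user_age_bucket": 1.0, "item_is_video": 1.0}
single_weights = {"user_age_bucket": 0.5, "item_is_video": 0.25}
pair_weights = {("item_is_video", "user_age_bucket"): 1.0}

score = predict(features, single_weights, pair_weights)
```

Storing a weight for every feature pair grows quadratically with the number of features, which is one reason a model of this shape can reach very large parameter counts.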