Machine learning classification on hardware accelerators with stacked memory
Abstract:
A method is provided for processing on an acceleration component a machine learning classification model. The machine learning classification model includes a plurality of decision trees, the decision trees including a first amount of decision tree data. The acceleration component includes an acceleration component die and a memory stack disposed in an integrated circuit package. The memory die includes an acceleration component memory having a second amount of memory less than the first amount of decision tree data. The memory stack includes a memory bandwidth greater than about 50 GB/sec and a power efficiency of greater than about 20 MB/sec/mW. The method includes slicing the model into a plurality of model slices, each of the model slices having a third amount of decision tree data less than or equal to the second amount of memory, storing the plurality of model slices on the memory stack, and for each of the model slices, copying the model slice to the acceleration component memory, and processing the model slice using a set of input data on the acceleration component to produce a slice result.
Information query
Patent Agency Ranking
0/0