Algorithm Speeds GPU-based AI Training 10x on Big Data Sets

Published: 2017-12-06 00:00

Author: Ameya360

Source: R. Colin Johnson

　　IBM Zurich researchers have developed a generic artificial-intelligence preprocessing building block for accelerating Big Data machine learning algorithms by at least 10 times over existing methods. The approach, which IBM presented Monday (Dec. 4) at the Neural Information Processing Systems conference (NIPS 2017) in Long Beach, Calif., uses mathematical duality to cherry-pick the items in a Big Data stream that will make a difference, ignoring the rest.

　　“Our motivation was how to use hardware accelerators, such as GPUs [graphics processing units] and FPGAs [field-programmable gate arrays], when they do not have enough memory to hold all the data points” for Big Data machine learning, IBM Zurich collaborator Celestine Dünner, co-inventor of the algorithm, told EE Times in advance of the announcement.

　　“To the best of our knowledge, we are first to have generic solution with a 10x speedup,” said co-inventor Thomas Parnell, an IBM Zurich mathematician. “Specifically, for traditional, linear machine learning models — which are widely used for data sets that are too big for neural networks to train on — we have implemented the techniques on the best reference schemes and demonstrated a minimum of a 10x speedup.”

　　IBM Zurich researcher Martin Jaggi, at École Polytechnique Fédérale de Lausanne (EPFL), also contributed to the machine learning preprocessing algorithm.

　　For their initial demonstration, the researchers used a single Nvidia Quadro M4000 GPU with 8 gigabytes of memory training on a 30-Gbyte data set of 40,000 photos using a support vector machine (SVM) algorithm that resolves the images into classes for recognition. The SVM algorithm also creates a geometric interpretation of the model learned (unlike neural networks, which cannot justify their conclusions). IBM’s data preprocessing method enabled the algorithm to run in less than one minute, a tenfold speedup over existing methods using limited-memory training.

　　The key to the technique is preprocessing each data point to see if it is the mathematical dual of a point already processed. If it is, then the algorithm just skips it, a process that becomes increasingly frequent as the data set is processed. “We calculate the importance of each data point before it is processed by measuring how big the duality gap is,” Dünner said.
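
　　The idea of scoring each data point by its duality gap can be illustrated with a minimal sketch. It is not IBM's implementation: it assumes an L2-regularized hinge-loss SVM in its standard dual form (dual variables alpha in [0, 1]), and the function names `per_example_gaps` and `select_for_accelerator`, the regularization parameter `lam`, and the `capacity` limit are all hypothetical names chosen for illustration.

```python
import numpy as np


def per_example_gaps(X, y, alpha, lam):
    """Return each example's contribution to the SVM duality gap.

    Assumed setup (not taken from the article):
      primal: lam/2 * ||w||^2 + mean(max(0, 1 - y_i * x_i.w))
      dual:  -lam/2 * ||w||^2 + mean(alpha_i), with 0 <= alpha_i <= 1
      w is tied to alpha by w = X^T (alpha * y) / (lam * n)
    """
    n = X.shape[0]
    w = X.T @ (alpha * y) / (lam * n)
    margins = y * (X @ w)
    hinge = np.maximum(0.0, 1.0 - margins)
    # The total gap (primal - dual) equals the mean of these terms.
    # Each term is non-negative and is zero when the example is
    # already handled optimally by the current model.
    return hinge - alpha * (1.0 - margins)


def select_for_accelerator(X, y, alpha, lam, capacity):
    """Pick the `capacity` examples with the largest gap contribution.

    These are the points assumed to matter most for the next round of
    training, so they are the ones worth copying into limited memory.
    """
    gaps = per_example_gaps(X, y, alpha, lam)
    order = np.argsort(gaps)[::-1]  # largest gap first
    return order[:capacity]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 20))
    y = np.where(rng.normal(size=1000) > 0, 1.0, -1.0)
    alpha = np.zeros(1000)  # untrained model: every point is equally important
    chosen = select_for_accelerator(X, y, alpha, lam=0.1, capacity=100)
    print(len(chosen), "examples selected for the accelerator")
```

　　In this sketch, points whose gap contribution has dropped to zero are never selected, which mirrors the article's description of skipping more and more data as training proceeds. A real system would alternate between training on the selected subset and rescoring the rest.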

　　“If you can fit your problem in the memory space of the accelerator, then running in-memory will achieve even better results,” Parnell told EE Times. “So our results apply only to Big Data problems. Not only will it speed up execution time by 10 times or more, but if you are running in the cloud, you won’t have to pay as much.”

　　As Big Data sets grow, such time- and money-saving preprocessing algorithms will become increasingly important, according to IBM. To show that its duality-based algorithm works with arbitrarily large data sets, the company showed an eight-GPU version at NIPS that handles a billion examples of click-through data for web ads.

　　The researchers are developing the algorithm further for deployment in IBM’s Cloud. It will be recommended for Big Data sets involving social media, online marketing, targeted advertising, finding patterns in telecom data, and fraud detection.

　　For details, read “Efficient Use of Limited-Memory Accelerators for Linear Learning on Heterogeneous Systems,” by Dünner, Parnell, and Jaggi.
