Support vector machines (SVMs) have proved highly successful in many applications that require data classification. However, training an SVM requires solving an optimization problem that is quadratic in the number of training examples. This is increasingly a bottleneck: dataset sizes continue to grow, especially in applications such as bioinformatics, while single-node processing power has leveled off in recent years. One possible solution lies in solving SVMs on multiple computing cores or on computing clusters.
We introduce a new parallel SVM solver based on the Forgetron algorithm and compare it to a previously proposed parallel SVM solver and to a single-node solver. The comparison covers accuracy, speed, and the ability to process large datasets. We show that while no solver performs well on all three metrics, each ranks high on two of them. Based on these findings, we discuss how practitioners should choose the SVM solver most appropriate to their requirements.
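To illustrate the kind of learner the solver builds on: the Forgetron is a kernel perceptron that operates under a fixed memory budget, shrinking its weights and discarding the oldest support vector whenever the budget is exceeded. The sketch below is a simplified illustration, not the paper's implementation; in particular, the fixed shrinking coefficient `phi` is an assumption for clarity, whereas the actual algorithm tunes it per round.

```python
from collections import deque

def forgetron_train(stream, kernel, budget=50, phi=0.9):
    """Simplified Forgetron sketch: a budgeted kernel perceptron.

    stream: iterable of (x, y) pairs with labels y in {-1, +1}
    kernel: function k(x1, x2) -> float
    budget: maximum number of stored support vectors
    phi:    fixed shrinking coefficient (illustrative only; the real
            algorithm chooses it per round to bound the damage of
            forgetting)
    """
    support = deque()  # stores (x_i, y_i, alpha_i), oldest first
    mistakes = 0
    for x, y in stream:
        score = sum(a * yi * kernel(xi, x) for xi, yi, a in support)
        if y * score <= 0:  # prediction mistake: update the hypothesis
            mistakes += 1
            support.append((x, y, 1.0))
            # shrink all weights so removing the oldest support
            # vector perturbs the hypothesis only slightly
            support = deque((xi, yi, a * phi) for xi, yi, a in support)
            if len(support) > budget:
                support.popleft()  # forget the oldest example
    return support, mistakes

def predict(support, kernel, x):
    score = sum(a * yi * kernel(xi, x) for xi, yi, a in support)
    return 1 if score >= 0 else -1
```

Because the support set never exceeds the budget, both memory use and per-example prediction cost stay bounded regardless of stream length, which is what makes this family of algorithms attractive for large datasets.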
By: Haggai Toledano; Elad Yom-Tov; Dan Pelleg; Edwin Pednault; Ramesh Natarajan
Published in: H-0260 in 2008
LIMITED DISTRIBUTION NOTICE:
This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).