Australia
December 1, 2021
1931-1941

Support Vector Machines

A support vector machine (SVM) is a computer algorithm that learns by example to assign labels to objects (1). A simple way to classify points in the plane is to draw a straight line and call the points lying on one side positive and those on the other side negative. If the two sets are well separated, we would intuitively draw the separating line as far as possible from the points in both sets. Instead of the abstract idea of points in space, we can think of each data point as representing an object through a set of features derived from measurements performed on that object (2).
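The idea of labeling points by which side of a line they fall on can be sketched in a few lines. This is a minimal illustration, not a trained SVM: the weight vector, offset, and points below are invented for the example.

```python
# A minimal sketch of linear classification in 2D: a point is labeled
# by which side of the line w . x + b = 0 it falls on.

def classify(point, w, b):
    """Return +1 or -1 depending on which side of the line the point lies."""
    score = w[0] * point[0] + w[1] * point[1] + b
    return 1 if score >= 0 else -1

w = (1.0, -1.0)   # normal vector of the separating line
b = 0.0           # offset (the line passes through the origin here)

points = [(2.0, 1.0), (1.0, 3.0)]
labels = [classify(p, w, b) for p in points]
print(labels)  # [1, -1]: one point on each side of the line
```

A real SVM learns `w` and `b` from labeled training data; here they are fixed by hand to show the decision rule itself.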

SVMs have been successfully applied in a growing variety of biological settings, including automatic classification of microarray gene expression profiles and analysis of data as diverse as protein and DNA sequences and mass spectra, among others.

Linear Classifiers - Hyperplanes

The SVM is an example of a linear two-class classifier. In two dimensions, the decision boundary is a line; in three dimensions, a plane; and in higher dimensions, in general, a hyperplane (1).

The boundary between the regions classified as positive and negative is called the decision boundary of the classifier. A classifier with a linear decision boundary is known as a linear classifier.

Large margin separation: Whenever a data set is linearly separable, a separating hyperplane exists, but there can be many. A natural choice is the hard-margin SVM (fig1): among all classifiers that correctly classify every training example, it is the one with the maximum margin.

Soft margin: In practice, data are usually not linearly separable; and even when they are, a larger margin can often be achieved by allowing the classifier to misclassify some points. The soft margin introduces a user-specified parameter that controls how many examples are allowed to violate the separating hyperplane and how far across it they may go.
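The trade-off the soft margin makes can be seen in the hinge-loss objective that soft-margin SVMs minimize: a margin term plus a penalty, weighted by a parameter (often called C), for points that cross the margin. This is a hedged sketch of the objective only, not a training procedure; the weights and data points are invented.

```python
# The soft-margin objective: ||w||^2 / 2 + C * (sum of hinge losses).
# Hinge loss is zero for points safely beyond the margin and grows
# linearly for points that violate it; C sets how costly violations are.

def hinge_objective(w, b, X, y, C):
    """Soft-margin SVM objective for weights w, offset b, data X, labels y."""
    margin_term = 0.5 * sum(wi * wi for wi in w)
    slack = sum(max(0.0, 1.0 - yi * (sum(wi * xi for wi, xi in zip(w, x)) + b))
                for x, yi in zip(X, y))
    return margin_term + C * slack

X = [(2.0, 2.0), (-2.0, -2.0), (0.5, 0.5)]   # third point violates the margin
y = [1, -1, -1]

w, b = (0.5, 0.5), 0.0
# A small C tolerates the violation; a large C penalizes it heavily.
print(hinge_objective(w, b, X, y, C=0.1))
print(hinge_objective(w, b, X, y, C=10.0))
```

Minimizing this objective over `w` and `b` (rather than evaluating it at fixed values, as here) is what soft-margin SVM training actually does.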

Nonlinear Classifiers - Kernels

Nonlinear classifiers often achieve better accuracy. There is a direct way to turn a linear classifier into a nonlinear one: map the data into a higher-dimensional feature space. A practical example is the mapping that collects all products of pairs of features. If all monomials (products of features raised to whole-number powers) up to a given degree were used, the dimensionality would grow exponentially; if the data are high-dimensional to begin with (like gene expression data), this is computationally infeasible (2). Kernel methods sidestep this complexity by avoiding the explicit mapping of the data into a high-dimensional feature space (fig2).

Using kernels on real-valued data (data that are vectors of a fixed dimensionality) is common in bioinformatics and other fields. The most frequently used are the polynomial kernel and the Gaussian kernel.
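Both kernels are short enough to write down directly. This is a minimal sketch with invented example vectors; `degree`, `c`, and `sigma` are the usual tunable parameters.

```python
import math

# The two most common kernels for real-valued (vector) data.

def polynomial_kernel(x, z, degree=2, c=1.0):
    """(x . z + c)^degree: implicitly uses all monomials up to `degree`."""
    return (sum(xi * zi for xi, zi in zip(x, z)) + c) ** degree

def gaussian_kernel(x, z, sigma=1.0):
    """exp(-||x - z||^2 / (2 sigma^2)): similarity decays with distance."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

x, z = (1.0, 2.0), (2.0, 0.0)
print(polynomial_kernel(x, z))   # (1*2 + 2*0 + 1)^2 = 9.0
print(gaussian_kernel(x, x))     # identical inputs give 1.0
```

The point of the kernel trick is that each call computes an inner product in the high-dimensional feature space without ever constructing that space explicitly.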

Kernels offer additional advantages. First, they can be defined on inputs that are not vectors, allowing the SVM to classify DNA and protein sequences, protein-protein interaction networks, and microscopy images. Second, kernels provide a principled way to combine different types of data: we can merge a kernel computed on microarray data with one computed on mass spectrometry data, and the combined kernel lets us train a single SVM to analyze both types of data simultaneously.
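One standard way to combine kernels is a weighted sum, which is itself a valid kernel. The sketch below illustrates the idea with simple linear kernels over invented feature tuples standing in for, say, expression and mass-spectrometry measurements on the same sample; the names and weights are assumptions for the example.

```python
# Combining heterogeneous data: a weighted sum of kernels computed on
# two different "views" of the same sample is still a valid kernel.

def linear_kernel(x, z):
    return sum(xi * zi for xi, zi in zip(x, z))

def combined_kernel(sample_a, sample_b, weight=0.5):
    """Weighted sum of per-view kernels; weight balances the two data types."""
    expr_a, mass_a = sample_a
    expr_b, mass_b = sample_b
    return (weight * linear_kernel(expr_a, expr_b)
            + (1.0 - weight) * linear_kernel(mass_a, mass_b))

# Each sample is a pair: (expression features, mass-spectrometry features).
s1 = ((1.0, 0.0), (2.0, 1.0))
s2 = ((0.0, 1.0), (1.0, 1.0))
print(combined_kernel(s1, s2))  # 0.5 * 0 + 0.5 * 3 = 1.5
```

An SVM trained with `combined_kernel` would effectively see both data types at once, which is the scenario described above.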

Future Perspectives

An important aspect of medical research is the prediction of health outcomes and the identification of the factors that drive them. Recently, a novel model selection strategy has been proposed: the Stepwise Support Vector Machine (StepSVM) builds on the SVM to conduct a modified stepwise feature selection. Although StepSVM consumes a considerable amount of computing time, it is more consistent and tends to perform better (3).
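To make the stepwise idea concrete, here is a rough sketch of generic forward stepwise selection, not the published StepSVM algorithm: at each step, greedily add the feature that most improves a score. The `score` function below is a trivial placeholder (separation of class means) standing in for the cross-validated SVM accuracy a real implementation would use; the data are invented.

```python
# Forward stepwise feature selection, sketched with a placeholder criterion.

def score(features, X, y):
    """Placeholder criterion: |difference of class means| over chosen features."""
    pos = [x for x, yi in zip(X, y) if yi == 1]
    neg = [x for x, yi in zip(X, y) if yi == -1]
    return sum(abs(sum(p[f] for p in pos) / len(pos)
                   - sum(n[f] for n in neg) / len(neg)) for f in features)

def forward_stepwise(X, y, n_features, n_select):
    """Greedily add the feature that most improves the score."""
    selected = []
    for _ in range(n_select):
        candidates = [f for f in range(n_features) if f not in selected]
        best = max(candidates, key=lambda f: score(selected + [f], X, y))
        selected.append(best)
    return selected

X = [(5.0, 0.1, 1.0), (4.0, 0.2, 1.1), (1.0, 0.1, 1.0), (0.5, 0.3, 0.9)]
y = [1, 1, -1, -1]
print(forward_stepwise(X, y, n_features=3, n_select=2))  # [0, 2]
```

Swapping the placeholder `score` for a cross-validated SVM evaluation is what makes this kind of procedure expensive, which matches the computing-time caveat above.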

There is still a long way to go in developing techniques and algorithms that improve the reliability of clinical data analysis and bring the best available option to each clinical problem.
