## Invited Session Fri.1.H 2013

#### Friday, 10:30 - 12:00 h, Room: H 2013

**Cluster 11: Integer & mixed-integer programming** [...]

### Integer programming in data mining

**Chair: Dolores Romero Morales**

**Friday, 10:30 - 10:55 h, Room: H 2013, Talk 1**

**Yufeng Liu**

Optimization issues on some margin-based classifiers

**Abstract:**

Margin-based classifiers have been popular in both machine learning and statistics for classification problems. Such techniques have a wide range of applications, from computer science to engineering to bioinformatics. Among various margin-based classifiers, the Support Vector Machine is a well-known example. Despite their successes, many margin-based classifiers with unbounded loss functions can be sensitive to outliers. To achieve robustness, nonconvex loss functions can be used instead. However, the corresponding optimization problem involves nonconvex minimization and can be very challenging to solve. In this talk, I will present a connection between such nonconvex optimization problems and integer programming, and illustrate how to solve the problem via mixed-integer programming. Some alternative, more efficient approximation algorithms will be discussed as well.
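The robustness issue can be illustrated with a truncated (ramp) hinge loss, which caps the penalty any single observation can contribute; the truncation is what makes the loss nonconvex and motivates the integer-programming connection. A minimal sketch (the truncation point `s = 0.0` is an illustrative choice, not taken from the talk):

```python
# Hinge loss (unbounded) vs. ramp loss (truncated hinge, bounded).
# u = y * f(x) is the functional margin of an observation.

def hinge(u):
    """Standard SVM hinge loss: grows without bound as u -> -infinity."""
    return max(0.0, 1.0 - u)

def ramp(u, s=0.0):
    """Truncated hinge: capped at 1 - s, so one outlier's influence is bounded."""
    return min(1.0 - s, max(0.0, 1.0 - u))

# A gross outlier (large negative margin) dominates the hinge loss,
# but contributes at most 1 - s to the ramp loss.
margins = [2.0, 0.5, -10.0]
print([hinge(u) for u in margins])  # [0.0, 0.5, 11.0]
print([ramp(u) for u in margins])   # [0.0, 0.5, 1.0]
```

The outlier's hinge loss (11.0) swamps the other terms, while its ramp loss is capped at 1.0, which is the sense in which the truncated loss is robust.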

**Friday, 11:00 - 11:25 h, Room: H 2013, Talk 2**

**James Paul Brooks**

Counting misclassifications: Robust support vector machines via integer programming

**Abstract:**

The support vector machine (SVM) for classification is a method for generating rules for assigning data points to categories. The traditional formulation is commonly expressed as a convex quadratic program where the error for an observation is based on its distance to the separating boundary in the feature space. To enhance robustness to outlier observations, we present two formulations that reflect the notion that errors should be counted.

Convex quadratic integer programming formulations are presented for the ramp loss and hard margin loss SVM. We show that the formulations accommodate the kernel trick for SVM while preserving the original geometric interpretation. Solution methods are presented for the formulations, including facets for the convex hull of integer feasible solutions. The consistency of SVM with the alternative loss functions is established. Computational tests indicate that the proposed formulations produce better classification rules on datasets containing unbalanced outliers.
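A big-M formulation of the kind the abstract describes, in which binary variables count margin violations, can be sketched as follows; the exact formulations presented in the talk may differ, and $M$ is an assumed sufficiently large constant:

```latex
% Hard margin loss SVM as a convex quadratic integer program (sketch).
% z_i = 1 iff observation i is allowed to violate the margin,
% so \sum_i z_i counts errors; C trades margin width against error count.
\begin{align*}
\min_{w,\,b,\,z} \quad & \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} z_i \\
\text{s.t.} \quad & y_i\,(w^\top x_i + b) \ge 1 - M z_i, && i = 1,\dots,n, \\
& z_i \in \{0,1\}, && i = 1,\dots,n.
\end{align*}
```

Relaxing the binaries recovers a convex quadratic program, which is why such formulations remain tractable for branch-and-bound solvers on moderately sized datasets.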

**Friday, 11:30 - 11:55 h, Room: H 2013, Talk 3**

**Amaya Nogales Gómez**

Matheuristics for *Ψ*-learning

**Coauthors: Emilio Carrizosa, Dolores Romero Morales**

**Abstract:**

The *ψ*-learning classifier, which uses the so-called ramp loss function, is an alternative to the Support Vector Machine classifier. It is expected to be more robust, and therefore to perform better, when outliers are present.

A Nonlinear Mixed Integer Programming formulation proposed in the literature is analysed. Solving the problem exactly is only possible for datasets of very small size. For datasets of more realistic size, the state of the art is a recent matheuristic, which attempts to solve the MINLP under a time limit.

In this talk, a new matheuristic, based on the solution of much simpler Convex Quadratic Problems and Linear Integer Problems, is developed. Computational results are given, showing the improvement over the state-of-the-art method.