## Contributed Session Thu.3.H 2013

#### Thursday, 15:15 - 16:45 h, Room: H 2013

**Cluster 11: Integer & mixed-integer programming**

### Topology, clustering and separation

**Chair: Timo Berthold**

**Thursday, 15:15 - 15:40 h, Room: H 2013, Talk 1**

**Marcia Fampa**

MILP formulation for the software clustering problem

**Coauthors: Olinto Araújo, Viviane Kohler**

**Abstract:**

We present a mixed integer linear programming (MILP) formulation for the Software Clustering Problem (SCP), where we divide the modules of a software system into groups or clusters, to facilitate the work of the software maintainers. We discuss a preprocessing step that reduces the size of the instances of the SCP and introduce some valid inequalities that have been shown to be very effective in tightening the MILP formulation. Numerical results on benchmark problems compare the solutions obtained with the proposed formulation against those obtained by the exhaustive algorithm supported by the freely available Bunch clustering tool.

**Thursday, 15:45 - 16:10 h, Room: H 2013, Talk 2**

**Pedro G. Guillén**

Natural languages with the morphosyntactic distance as a topological space

**Coauthors: Juan Castellanos, Alejandro De Santos, Eduardo Villa**

**Abstract:**

The main aim of this paper is to prove the computability of the morphosyntactic distance (M.D.) over an arbitrary set of data. The M.D. (defined in the works of De Santos, Villa and Guillén) can be defined over the elements of this set. The distance d induces a topological space, which we call the morphosyntactic space. Based on these hypotheses, we study the properties of this space from a topological point of view. The associated lexical space has a semigroup structure, but it can be treated as a set, regardless of its algebraic properties. Using the fact that the meaning function is injective, it is possible to define the M.D. d on it.

In the first section, several topological properties of morphosyntactic space are proved: total disconnection, compactness and separability. Then a comparison is proposed between different structures and morphosyntactic space.

Under the latter theorem, algorithms over the morphosyntactic space can be assumed to run in reasonable time. Under these conditions, it is easy to conclude that the model designed to define the morphosyntactic space is computable, and therefore that the problem of computing the M.D. is solvable.

**Thursday, 16:15 - 16:40 h, Room: H 2013, Talk 3**

**Inácio Andruski-Guimarães**

Comparison of techniques based on linear programming to detect separation

**Abstract:**

Separation is a key feature in logistic regression. In fact, it is well known that, in the case of complete separation, iterative methods commonly used to maximize the likelihood, such as Newton's method, do not converge to finite values. This phenomenon is also known as monotone likelihood, or infinite parameters. Linear programming techniques to detect separation have been proposed in the literature for logistic regression with a binary response variable. For a polytomous response variable, however, the time required to run these techniques can exceed the time needed to fit the model with an iterative method. The purpose of this work is to develop and implement an alternative approach to detecting separation for parameter estimation in polytomous logistic regression. The approach uses as covariates a reduced set of optimum principal components of the original covariates. Principal component analysis reduces the number of dimensions and avoids multicollinearity among these variables. Examples on datasets taken from the literature show that the approach is feasible and works better than other techniques in terms of computational effort.
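
The non-convergence under complete separation mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' method: it assumes a binary response, a single covariate, and a model without intercept, and the names `sigmoid`, `log_likelihood`, and `newton_path` are hypothetical helpers introduced only for this illustration. On a completely separated toy sample, each Newton step increases both the coefficient and the log-likelihood, so the iterates never settle at a finite value.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def log_likelihood(beta, xs, ys):
    # Binary logistic log-likelihood for the one-parameter model
    # P(y = 1 | x) = sigmoid(beta * x)  (no intercept: an assumption of this sketch).
    total = 0.0
    for x, y in zip(xs, ys):
        z = beta * x
        # log p = -log(1 + exp(-z)),  log(1 - p) = -log(1 + exp(z))
        total += -math.log1p(math.exp(-z)) if y == 1 else -math.log1p(math.exp(z))
    return total


def newton_path(xs, ys, steps=15):
    # Plain Newton iterations on the log-likelihood, starting from beta = 0.
    # Returns the list of (beta, log-likelihood) pairs visited.
    beta = 0.0
    path = [(beta, log_likelihood(beta, xs, ys))]
    for _ in range(steps):
        probs = [sigmoid(beta * x) for x in xs]
        grad = sum(x * (y - p) for x, y, p in zip(xs, ys, probs))
        hess = -sum(x * x * p * (1.0 - p) for x, p in zip(xs, probs))
        beta -= grad / hess  # Newton update
        path.append((beta, log_likelihood(beta, xs, ys)))
    return path


if __name__ == "__main__":
    # Toy data, completely separated: y = 1 exactly when x > 0.
    xs = [-2.0, -1.0, 1.0, 2.0]
    ys = [0, 0, 1, 1]
    for k, (beta, ll) in enumerate(newton_path(xs, ys)):
        print(f"iter {k:2d}  beta = {beta:8.4f}  log-lik = {ll:.3e}")
```

In this toy run the coefficient keeps growing by a roughly constant amount per iteration while the log-likelihood approaches zero from below, which is the "monotone likelihood" behaviour the abstract refers to. A separation check run before fitting, such as the linear programming techniques the abstract mentions, is meant to catch this situation in advance.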