2009
Authors
Reinaldo, F; Fernandes, C; Rahman, MA; Malucelli, A; Camacho, R;
Publication
MACHINE LEARNING AND DATA MINING IN PATTERN RECOGNITION
Abstract
Organ transplantation is a highly complex decision process that requires expert decisions. The major problem in a transplantation procedure is the possibility that the receiver's immune system will attack and destroy the transplanted tissue. It is therefore of capital importance to find a donor with the highest possible compatibility with the receiver, and thus reduce rejection. Finding a good donor is not a straightforward task because a complex network of relations exists between the immunological and the clinical variables that influence the receiver's acceptance of the transplanted organ. Currently the process of analyzing these variables involves a careful study by the clinical transplant team. The number and complexity of the relations between variables make the manual process very slow. In this paper we propose and compare two Machine Learning algorithms that might help the transplant team in improving and speeding up their decisions. We achieve that objective by analyzing past real cases and constructing models as sets of rules. Such models are accurate and understandable by experts.
2012
Authors
Alves, M; Alves, J; Camacho, R; Soares, P; Pereira, L;
Publication
6TH INTERNATIONAL CONFERENCE ON PRACTICAL APPLICATIONS OF COMPUTATIONAL BIOLOGY & BIOINFORMATICS
Abstract
Phylogenetic networks are a useful way of displaying relationships between nucleotide or protein sequences. They differ from phylogenetic trees in that networks present cycles, representing several possible evolutionary histories of the sequences analysed, while a tree presents a single evolutionary relationship. Networks are especially useful in studying markers with a high level of homoplasy (the same mutation happening more than once during evolution), like the control region of mitochondrial DNA (mtDNA), where the researcher does not need to commit to a single explanation for the evolution suggested by the data. However, in many instances trees are required. One case where this happens is the founder analysis methodology, which aims at estimating migration times of human populations throughout history and prehistory. Currently, the founder analysis methodology involves the creation of networks, from which a probable tree is extracted by hand by the researcher, a time-consuming process that is prone to errors and to ambiguous decisions by the researcher. In order to automate the founder analysis methodology, an algorithm that extracts a single probable tree from a network in a fast, systematic way is presented here.
1999
Authors
Srinivasan, A; Camacho, R;
Publication
JOURNAL OF LOGIC PROGRAMMING
Abstract
Using problem-specific background knowledge, computer programs developed within the framework of Inductive Logic Programming (ILP) have been used to construct restricted first-order logic solutions to scientific problems. However, their approach to the analysis of data with substantial numerical content has been largely limited to constructing clauses that: (a) provide qualitative descriptions ("high", "low" etc.) of the values of response variables; and (b) contain simple inequalities restricting the ranges of predictor variables. This has precluded the application of such techniques to scientific and engineering problems requiring a more sophisticated approach. A number of specialised methods have been suggested to remedy this. In contrast, we have chosen to take advantage of the fact that the existing theoretical framework for ILP places very few restrictions on the nature of the background knowledge. We describe two issues of implementation that make it possible to use background predicates that implement well-established statistical and numerical analysis procedures. Any improvements in analytical sophistication that result are evaluated empirically using artificial and real-life data. Experiments utilising artificial data are concerned with extracting constraints for response variables in the text-book problem of balancing a pole on a cart. They illustrate the use of clausal definitions of arithmetic and trigonometric functions, inequalities, multiple linear regression, and numerical derivatives. A non-trivial problem concerning the prediction of mutagenic activity of nitroaromatic molecules is also examined. In this case, expert chemists have been unable to devise a model for explaining the data. The result demonstrates the combined use by an ILP program of logical and numerical capabilities to achieve an analysis that includes linear modelling, clustering and classification. In all experiments, the predictions obtained compare favourably against benchmarks set by more traditional quantitative methods, namely regression and neural networks.
2006
Authors
Srinivasan, A; Page, D; Camacho, R; King, R;
Publication
MACHINE LEARNING
Abstract
Three-dimensional models, or pharmacophores, describing Euclidean constraints on the location on small molecules of functional groups (like hydrophobic groups, hydrogen acceptors and donors, etc.), are often used in drug design to describe the medicinal activity of potential drugs (or 'ligands'). This medicinal activity is produced by interaction of the functional groups on the ligand with a binding site on a target protein. In identifying structure-activity relations of this kind there are three principal issues: (1) It is often difficult to "align" the ligands in order to identify common structural properties that may be responsible for activity; (2) Ligands in solution can adopt different shapes (or 'conformations') arising from torsional rotations about bonds. The 3-D molecular substructure is typically sought on one or more low-energy conformers; and (3) Pharmacophore models must, ideally, predict medicinal activity on some quantitative scale. It has been shown that the logical representation adopted by Inductive Logic Programming (ILP) naturally resolves many of the difficulties associated with the alignment and multi-conformation issues. However, the predictions of models constructed by ILP have hitherto only been nominal, predicting medicinal activity to be present or absent. In this paper, we investigate the construction of two kinds of quantitative pharmacophoric models with ILP: (a) Models that predict the probability that a ligand is "active"; and (b) Models that predict the actual medicinal activity of a ligand. Quantitative predictions are obtained by utilising the following statistical procedures as background knowledge: logistic regression and naive Bayes, for probability prediction; linear and kernel regression, for activity prediction. The multi-conformation issue and, more generally, the relational representation used by ILP result in some special difficulties in the use of any statistical procedure. We present the principal issues and some solutions. Specifically, using data on the inhibition of the protease Thermolysin, we demonstrate that it is possible for an ILP program to construct good quantitative structure-activity models. We also comment on the relationship of this work to other recent developments in statistical relational learning.
2006
Authors
Camacho, R; King, RD; Srinivasan, A;
Publication
Machine Learning
Abstract
2006
Authors
Camacho, R;
Publication
6th Industrial Conference on Data Mining, Poster Proceedings, ICDM 2006, Leipzig, Germany, July 14-15, 2006
Abstract