Publications by CRACS

2009

Improving the efficiency of inductive logic programming systems

Authors
Fonseca, NA; Costa, VS; Rocha, R; Camacho, R; Silva, F;

Publication
SOFTWARE-PRACTICE & EXPERIENCE

Abstract
Inductive logic programming (ILP) is a sub-field of machine learning that provides an excellent framework for multi-relational data mining applications. The advantages of ILP have been successfully demonstrated in complex and relevant industrial and scientific problems. However, to produce valuable models, ILP systems often require long running times and large amounts of memory. In this paper we address fundamental issues that have a direct impact on the efficiency of ILP systems. Namely, we discuss how improvements in the indexing mechanisms of an underlying logic programming system benefit ILP performance. Furthermore, we propose novel data structures to reduce memory requirements and we suggest a new lazy evaluation technique to search the hypothesis space more efficiently. These proposals have been implemented in the April ILP system and evaluated using several well-known data sets. The results show significant improvements in running time without compromising the accuracy of the generated models. Indeed, the combined techniques achieve speedups of several orders of magnitude on some data sets. Moreover, memory requirements are reduced in nearly half of the data sets. Copyright (C) 2008 John Wiley & Sons, Ltd.
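
The abstract describes the lazy evaluation technique only at a high level. Purely as an illustration of the general idea, not of April's actual implementation, the Python sketch below shows one common form of laziness in ILP coverage testing: stop proving negative examples as soon as a clause is known to exceed its noise bound, instead of scanning the whole example set. The names lazy_negative_coverage, covers and max_allowed are our own.

    from typing import Callable, Iterable

    def lazy_negative_coverage(covers: Callable[[object], bool],
                               negatives: Iterable[object],
                               max_allowed: int) -> bool:
        # Return True if the clause stays within the allowed number of
        # covered negatives, stopping as soon as the bound is exceeded
        # rather than running every (potentially expensive) proof.
        covered = 0
        for example in negatives:
            if covers(example):
                covered += 1
                if covered > max_allowed:
                    return False  # already too general: stop early
        return True

    # Toy demo: the "clause" covers even numbers and at most 2 covered
    # negatives are tolerated, so only 5 of the 100 examples are tested.
    print(lazy_negative_coverage(lambda ex: ex % 2 == 0, range(100), 2))

The saving comes from skipping the remaining coverage proofs, which in a real ILP system are full theorem-proving calls, once a clause is already known to be too general.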

2009

Parallel ILP for distributed-memory architectures

Authors
Fonseca, NA; Srinivasan, A; Silva, F; Camacho, R;

Publication
MACHINE LEARNING

Abstract
The growth of machine-generated relational databases, both in the sciences and in industry, is rapidly outpacing our ability to extract useful information from them by manual means. This has brought into focus machine learning techniques like Inductive Logic Programming (ILP) that are able to extract human-comprehensible models for complex relational data. The price to pay is that ILP techniques are not efficient: they can be seen as performing a form of discrete optimisation, which is known to be computationally hard; and the complexity is usually some super-linear function of the number of examples. While little can be done to alter the theoretical bounds on the worst-case complexity of ILP systems, some practical gains may follow from the use of multiple processors. In this paper we survey the state of the art in parallel ILP. We implement several parallel algorithms and study their performance using some standard benchmarks. The principal findings of interest are these: (1) of the techniques investigated, one that simply constructs models in parallel on each processor using a subset of data and then combines the models into a single one, yields the best results; and (2) sequential (approximate) ILP algorithms based on randomized searches have lower execution times than (exact) parallel algorithms, without sacrificing the quality of the solutions found.
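
The best-performing technique is described only at the level of its strategy: learn on disjoint subsets, then combine. The Python sketch below is our own minimal illustration of that learn-then-combine pattern, with learn_theory reduced to a toy stand-in for a real ILP run; it is not the paper's implementation.

    from multiprocessing import Pool

    def learn_theory(examples):
        # Stand-in for a full ILP run over one partition of the data:
        # here we simply "induce" one clause per distinct example.
        return {f"clause_for({e})" for e in set(examples)}

    def parallel_learn(all_examples, n_workers=4):
        # Give each worker a disjoint subset of the examples.
        chunks = [all_examples[i::n_workers] for i in range(n_workers)]
        with Pool(n_workers) as pool:
            local_theories = pool.map(learn_theory, chunks)
        # Combine the local models into a single theory, here by
        # taking the union of the induced clauses.
        return set().union(*local_theories)

    if __name__ == "__main__":
        print(parallel_learn(["a", "b", "a", "c", "b", "d"]))

A real combiner would also have to resolve redundant or conflicting clauses; the union step above is the simplest possible choice.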

2009

Parallel calculation of multi-electrode array correlation networks

Authors
Ribeiro, P; Simonotto, J; Kaiser, M; Silva, F;

Publication
JOURNAL OF NEUROSCIENCE METHODS

Abstract
When calculating correlation networks from multi-electrode array (MEA) data, the computations involved are extensive. Unfortunately, as MEAs grow bigger, the time needed for the computation grows even faster: calculating pair-wise correlations for current 60-channel systems can take hours on commodity computers, whereas for future 1000-channel systems it would take almost 280 times as long, given that the number of pairs increases with the square of the number of channels. Even taking into account increases in processor speed, it may soon become infeasible to compute correlations on a single computer. Parallel computing is a way to sustain reasonable calculation times in the future. We provide a general tool for rapid computation of correlation networks which was tested on: (a) a single computer cluster with 16 cores, (b) the Newcastle Condor System, utilizing idle processors of university computers, and (c) the inter-cluster, with 192 cores. Our reusable tool provides a simple interface for neuroscientists, automating data partition and job submission, and also allowing coding in any programming language. It is also sufficiently flexible to be used in other high-performance computing environments.
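
The roughly 280-fold factor follows directly from the quadratic growth in the number of channel pairs; the short check below works it out (the timing claim itself is the paper's measurement, only the pair counting is reproduced here):

    from math import comb

    pairs_60 = comb(60, 2)        # 1,770 channel pairs
    pairs_1000 = comb(1000, 2)    # 499,500 channel pairs
    print(pairs_1000 / pairs_60)  # ~282, the roughly 280x factor above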

2009

Responding to questionnaires on the Web using XwQuest

Authors
Leal, JP;

Publication
Proceedings of the IADIS International Conference WWW/Internet 2009, ICWI 2009

Abstract
This paper reports on the design, implementation and evaluation of XwQuest, a tool for responding to questionnaires on the Web. The distinctive feature of this tool is an XML definition of the questionnaire that focuses on questions and admissible answers while avoiding presentation details. This questionnaire definition is processed on the browser side and converted to an Ajax application. Collected responses are periodically sent back to the server and can be retrieved by researchers and processed in a standard spreadsheet program. The paper details the questionnaire language of XwQuest and a generator that converts questionnaire definitions into Ajax applications. Two case studies where XwQuest was used with good results are also presented. © 2009 IADIS.
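
The abstract does not show XwQuest's XML schema, and the real tool performs the conversion in the browser. Purely as an illustration of separating questions and admissible answers from presentation, the Python sketch below renders a hypothetical questionnaire definition (our own invention, not XwQuest's format) as an HTML form:

    import xml.etree.ElementTree as ET

    # Hypothetical questionnaire definition in the spirit of XwQuest's
    # idea: only questions and admissible answers, no presentation.
    SAMPLE = """
    <questionnaire>
      <question id="q1" text="How often do you use the tool?">
        <answer>Daily</answer>
        <answer>Weekly</answer>
        <answer>Never</answer>
      </question>
    </questionnaire>
    """

    def to_html_form(xml_text: str) -> str:
        # Render each question as a radio-button group, leaving all
        # presentation decisions to the generator, not the author.
        root = ET.fromstring(xml_text)
        parts = ["<form>"]
        for q in root.findall("question"):
            parts.append(f"<fieldset><legend>{q.get('text')}</legend>")
            for a in q.findall("answer"):
                parts.append(
                    f'<label><input type="radio" name="{q.get("id")}" '
                    f'value="{a.text}"> {a.text}</label>'
                )
            parts.append("</fieldset>")
        parts.append("</form>")
        return "\n".join(parts)

    print(to_html_form(SAMPLE))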

2009

CrimsonHex: A Service Oriented Repository of Specialised Learning Objects

Authors
Leal, JP; Queiros, R;

Publication
ENTERPRISE INFORMATION SYSTEMS-BK

Abstract
The cornerstone of the interoperability of eLearning systems is the standard definition of learning objects. Nevertheless, for some domains this standard is insufficient to fully describe all the assets, especially when they are used as input for other eLearning services. On the other hand, a standard definition of learning objects is not enough to ensure interoperability among eLearning systems; they must also use a standard API to exchange learning objects. This paper presents the design and implementation of a service oriented repository of learning objects called crimsonHex. This repository is fully compliant with the existing interoperability standards and supports new definitions of learning objects for specialized domains. We illustrate this feature with the definition of programming problems as learning objects and its validation by the repository. This repository is also prepared to store usage data on learning objects to tailor the presentation order and adapt it to learner profiles.
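
The abstract names neither the repository's routes nor its exchange protocol, so the fragment below is only a hedged sketch of what a client of a service-oriented learning object repository might look like; the base URL and path are entirely hypothetical.

    import urllib.request

    # Hypothetical endpoint: crimsonHex's actual API is not given in
    # the abstract, so the URL and route below are illustrative only.
    BASE_URL = "https://repository.example.org"

    def fetch_learning_object(object_id: str) -> bytes:
        # Retrieve a packaged learning object (e.g. an IMS content
        # package) from the repository over plain HTTP.
        url = f"{BASE_URL}/learning-objects/{object_id}"
        with urllib.request.urlopen(url) as resp:
            return resp.read()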

2009

Defining programming problems as learning objects

Authors
Leal, JP; Queiros, R;

Publication
World Academy of Science, Engineering and Technology

Abstract
Standards for learning objects focus primarily on content presentation. They have already been extended to support automatic evaluation, but only for exercises with a predefined set of answers. The existing standards lack the metadata required by specialized evaluators to handle types of exercises with an indefinite set of solutions. To address this issue, existing learning object standards were extended to the particular requirements of a specialized domain. A definition of programming problems as learning objects, compatible both with Learning Management Systems and with systems performing automatic evaluation of programs, is presented in this paper. The proposed definition includes metadata that cannot be conveniently represented using existing standards, such as: the type of automatic evaluation; the requirements of the evaluation engine; and the roles of different assets - test cases, program solutions, etc. The EduJudge project and its main services are also presented as a case study on the use of the proposed definition of programming problems as learning objects.
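
The exact metadata fields are not listed in the abstract, so the sketch below is our own reading of the three kinds of metadata it names (evaluation type, engine requirements, asset roles), expressed as a Python data structure rather than the EduJudge schema:

    from dataclasses import dataclass, field

    # Illustrative only: the field names are our interpretation of the
    # metadata the paper says existing standards lack.
    @dataclass
    class ProgrammingProblemLO:
        title: str
        evaluation_type: str            # e.g. run submissions against tests
        engine_requirements: list[str]  # what the evaluator needs installed
        assets: dict[str, str] = field(default_factory=dict)  # role -> file

    problem = ProgrammingProblemLO(
        title="Sum of two numbers",
        evaluation_type="dynamic",
        engine_requirements=["gcc >= 4.0"],
        assets={
            "statement": "statement.html",
            "solution": "solution.c",      # reference program
            "test-case-1": "tests/t1.txt", # input/expected-output pair
        },
    )
    print(problem)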
