Publications

Publications by Nuno Fonseca

2010

Phylogeny of the Teashirt-related Zinc Finger (tshz) Gene Family and Analysis of the Developmental Expression of tshz2 and tshz3b in the Zebrafish

Authors
Santos, JS; Fonseca, NA; Vieira, CP; Vieira, J; Casares, F;

Publication
DEVELOPMENTAL DYNAMICS

Abstract
The tshz genes comprise a family of evolutionarily conserved transcription factors. However, despite the major role played by Drosophila tsh during the development of the fruit fly, the expression and function of other tshz genes have been analyzed in a very limited set of organisms and, therefore, our current knowledge of these genes is still fragmentary. In this study, we perform detailed phylogenetic analyses of the tshz genes, identify the members of this gene family in zebrafish and describe the developmental expressions of two of them, tshz2 and tshz3b, and compare them with meis1, meis2.1, meis2.2, pax6a, and pax6b expression patterns. The expression patterns of these genes define a complex set of coexpression domains in the developing zebrafish brain where their gene products have the potential to interact. Developmental Dynamics 239:1010-1018, 2010. (C) 2010 Wiley-Liss, Inc.


2011

Predicting Malignancy from Mammography Findings and Surgical Biopsies

Authors
Ferreira, P; Fonseca, NA; Dutra, I; Woods, R; Burnside, E;

Publication
2011 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM 2011)

Abstract
Breast screening is the regular examination of a woman's breasts to find breast cancer earlier. The sole exam approved for this purpose is mammography. Usually, findings are annotated through the Breast Imaging Reporting and Data System (BIRADS) created by the American College of Radiology. The BIRADS system determines a standard lexicon to be used by radiologists when studying each finding. Although the lexicon is standard, the annotation accuracy of the findings depends on the experience of the radiologist. Moreover, the accuracy of the classification of a mammography is also highly dependent on the expertise of the radiologist. A correct classification is paramount for economic and humanitarian reasons. The main goal of this work is to produce machine learning models that predict the outcome of a mammography from a reduced set of annotated mammography findings. In the study we used a data set consisting of 348 consecutive breast masses that underwent image guided or surgical biopsy performed between October 2005 and December 2007 on 328 female subjects. The main conclusions are threefold: (1) automatic classification of a mammography, independent of information about mass density, can reach equal or better results than the classification performed by a physician; (2) mass density seems to be a good indicator of malignancy, as previous studies suggested; (3) a machine learning model can predict mass density with a quality as good as the specialist blind to biopsy, which is one of our main contributions. Our model can predict malignancy in the absence of the mass density attribute, since we can fill in this attribute using our mass density predictor.


2011

STUDYING THE RELEVANCE OF BREAST IMAGING FEATURES

Authors
Ferreira, P; Dutra, I; Fonseca, NA; Woods, R; Burnside, E;

Publication
HEALTHINF 2011: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON HEALTH INFORMATICS

Abstract
Breast screening is the regular examination of a woman's breasts to find breast cancer at an initial stage. The sole exam approved for this purpose is mammography, which, despite the existence of more advanced technologies, is considered the cheapest and most efficient method to detect cancer at a preclinical stage. We investigate, using machine learning techniques, how attributes obtained from mammographies can relate to malignancy. In particular, this study focuses on how mass density can influence malignancy, using a data set of 348 patients containing, among other information, results of biopsies. To this end, we applied different learning algorithms on the data set using the WEKA tools, and performed significance tests on the results. The conclusions are threefold: (1) automatic classification of a mammography can reach equal or better results than the ones annotated by specialists, which can help doctors to quickly concentrate on some specific mammogram for a more thorough study; (2) mass density seems to be a good indicator of malignancy, as previous studies suggested; (3) we can obtain classifiers that can predict mass density with a quality as good as the specialist blind to biopsy.


2009

UbiDis: a Flexible and General top-level Middleware to Manage Applications in Grids and Clusters

Authors
Fonseca, NA; Dutra, I;

Publication
IBERGRID: 3RD IBERIAN GRID INFRASTRUCTURE CONFERENCE PROCEEDINGS

Abstract
From an application point of view, Grid computing, with its substantial processing power and large data storage capacity, offers the possibility to process large quantities of data, to run computationally-intensive operations, or both. For instance, in computational biological pipelines, one often has to process large quantities of data in individually computationally-intensive operations. To process this data in the Grid, hundreds, or even thousands of jobs need to be submitted and their results processed. Obviously, performing these tasks manually is unfeasible. On the other hand, developing software to this end, specifically for a single application, is unproductive because if the application changes, or the Grid submission engine changes, then the code needs to be rewritten. In this paper we present a middleware that facilitates the submission of jobs to grids (or clusters) and helps handle their results. The middleware, which we call UbiDis (Ubiquitous Distribution), copies all files necessary for running the program to the UI or front-end host (in a Grid or cluster), compiles programs on the UI or front-end (if necessary), generates and submits the jobs, and copies the outputs to the local machine. Furthermore, UbiDis transparently generates jobs for different job managers, allowing the user to easily and quickly change the location to which the jobs are submitted. Finally, we illustrate the usefulness of UbiDis using two applications.


2008

LogCHEM: Interactive Discriminative Mining of Chemical Structure

Authors
Costa, VS; Fonseca, NA; Camacho, R;

Publication
2008 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE, PROCEEDINGS

Abstract
One of the most well known successes of Inductive Logic Programming (ILP) is on Structure-Activity Relationship (SAR) problems. In such problems, ILP has proved several times to be capable of constructing expert comprehensible models that help to explain the activity of chemical compounds based on their structure and properties. However, despite its successes on SAR problems, ILP has severe scalability problems that prevent its application on larger datasets. In this paper we present LogCHEM, an ILP based tool for discriminative interactive mining of chemical fragments. LogCHEM tackles ILP's scalability issues in the context of SAR applications. We show that LogCHEM benefits from the flexibility of ILP both by its ability to quickly extend the original mining model, and by its ability to interface with external tools. Furthermore, we demonstrate that LogCHEM can be used to effectively mine large chemoinformatics datasets, namely, several datasets from EPA's DSSTox database and a dataset based on the DTP AIDS anti-viral screen.


2012

Predicting the secondary structure of proteins using Machine Learning algorithms

Authors
Camacho, R; Ferreira, R; Rosa, N; Guimaraes, V; Fonseca, NA; Costa, VS; de Sousa, M; Magalhaes, A;

Publication
INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS

Abstract
The functions of proteins in living organisms are related to their 3D structure, which is known to be ultimately determined by the linear sequence of amino acids that together form these macromolecules. It is, therefore, of great importance to be able to understand and predict how the protein 3D structure arises from a particular linear sequence of amino acids. In this paper we report the application of Machine Learning methods to predict, with high accuracy, the secondary structure of proteins, namely alpha-helices and beta-sheets, which are intermediate levels of the local structure.
