Publications

Publications by CRACS

2011

A Hybrid AIS-SVM Ensemble Approach for Text Classification

Authors
Antunes, M; Silva, C; Ribeiro, B; Correia, M;

Publication
ADAPTIVE AND NATURAL COMPUTING ALGORITHMS, PT II

Abstract
In this paper we propose and analyse methods for expanding state-of-the-art performance on text classification. We put forward an ensemble-based structure that includes Support Vector Machines (SVM) and Artificial Immune Systems (AIS). The underpinning idea is that SVM-like approaches can be enhanced with AIS approaches, which can capture dynamics in models. While having radically different genesis, and probably because of that, SVM and AIS can cooperate in a committee setting, using a heterogeneous ensemble to improve overall performance, with the confidence of each system's classification as the differentiating factor. Results on the well-known Reuters-21578 benchmark are presented, showing promising classification performance gains, resulting in a classification that improves upon all baseline contributors of the ensemble committee.


2011

T-SPPA: Trended Statistical PreProcessing Algorithm

Authors
Silva, T; Dutra, I;

Publication
DIGITAL INFORMATION PROCESSING AND COMMUNICATIONS, PT 1

Abstract
Traditional machine learning systems learn from non-relational data, but in fact most real-world data is relational. Normally the learning task is done using a single flat file, which prevents the discovery of effective relations among records. Inductive logic programming and statistical relational learning partially solve this problem. In this work, we resort to another method to overcome this problem and propose T-SPPA (Trended Statistical PreProcessing Algorithm), a preprocessing method that translates related records into one single record before learning. Using different kinds of data, we compare the results of learning from the transformed data with the results of learning from the original data to demonstrate the efficacy of our method.


2011

Predicting Malignancy from Mammography Findings and Surgical Biopsies

Authors
Ferreira, P; Fonseca, NA; Dutra, I; Woods, R; Burnside, E;

Publication
2011 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM 2011)

Abstract
Breast screening is the regular examination of a woman's breasts to find breast cancer earlier. The sole exam approved for this purpose is mammography. Usually, findings are annotated through the Breast Imaging Reporting and Data System (BIRADS) created by the American College of Radiology. The BIRADS system determines a standard lexicon to be used by radiologists when studying each finding. Although the lexicon is standard, the annotation accuracy of the findings depends on the experience of the radiologist. Moreover, the accuracy of the classification of a mammography is also highly dependent on the expertise of the radiologist. A correct classification is paramount for economic and humanitarian reasons. The main goal of this work is to produce machine learning models that predict the outcome of a mammography from a reduced set of annotated mammography findings. In the study we used a data set consisting of 348 consecutive breast masses that underwent image-guided or surgical biopsy performed between October 2005 and December 2007 on 328 female subjects. The main conclusions are threefold: (1) automatic classification of a mammography, independent of information about mass density, can reach equal or better results than the classification performed by a physician; (2) mass density seems to be a good indicator of malignancy, as previous studies suggested; (3) a machine learning model can predict mass density with a quality as good as the specialist blind to biopsy, which is one of our main contributions. Our model can predict malignancy in the absence of the mass density attribute, since we can fill in this attribute using our mass density predictor.


2011

Integrating machine learning and physician knowledge to improve the accuracy of breast biopsy.

Authors
Dutra, I; Nassif, H; Page, D; Shavlik, J; Strigel, RM; Wu, Y; Elezaby, ME; Burnside, E;

Publication
AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium

Abstract
In this work we show that combining physician rules and machine-learned rules may improve the performance of a classifier that predicts whether a breast cancer is missed on percutaneous, image-guided breast core needle biopsy (subsequently referred to as "breast core biopsy"). Specifically, we show how advice in the form of logical rules, derived by sub-specialty, i.e. fellowship-trained, breast radiologists (subsequently referred to as "our physicians"), can guide the search in an inductive logic programming system and improve the performance of a learned classifier. Our dataset of 890 consecutive benign breast core biopsy results, along with corresponding mammographic findings, contains 94 cases that were deemed non-definitive by a multidisciplinary panel of physicians, of which 15 were upgraded to malignant disease at surgery. Our goal is to predict upgrade prospectively and avoid surgery in women who do not have breast cancer. Our results, some of which trended toward significance, show evidence that inductive logic programming may produce better results for this task than traditional propositional algorithms with default parameters. Moreover, we show that adding knowledge from our physicians into the learning process may improve on the performance of a classifier trained only on data.


2011

STUDYING THE RELEVANCE OF BREAST IMAGING FEATURES

Authors
Ferreira, P; Dutra, I; Fonseca, NA; Woods, R; Burnside, E;

Publication
HEALTHINF 2011: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON HEALTH INFORMATICS

Abstract
Breast screening is the regular examination of a woman's breasts to find breast cancer in an initial stage. The sole exam approved for this purpose is mammography, which, despite the existence of more advanced technologies, is considered the cheapest and most efficient method to detect cancer in a preclinical stage. We investigate, using machine learning techniques, how attributes obtained from mammographies can relate to malignancy. In particular, this study focuses on how mass density can influence malignancy, using a data set of 348 patients containing, among other information, results of biopsies. To this end, we applied different learning algorithms to the data set using the WEKA tools, and performed significance tests on the results. The conclusions are threefold: (1) automatic classification of a mammography can reach equal or better results than the ones annotated by specialists, which can help doctors to quickly concentrate on some specific mammogram for a more thorough study; (2) mass density seems to be a good indicator of malignancy, as previous studies suggested; (3) we can obtain classifiers that can predict mass density with a quality as good as the specialist blind to biopsy.


2011

DigiScope - Unobtrusive Collection and Annotating of Auscultations in Real Hospital Environments

Authors
Pereira, D; Hedayioglu, F; Correia, R; Silva, T; Dutra, I; Almeida, F; Mattos, SS; Coimbra, M;

Publication
2011 ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC)

Abstract
Digital stethoscopes are medical devices that can collect, store and sometimes transmit acoustic auscultation signals in a digital format. These can then be replayed, sent to a colleague for a second opinion, studied in detail after an auscultation, used for training or, as we envision it, used as a cheap, powerful tool for screening cardiac pathologies. In this work, we present the design, development and deployment of a prototype for collecting and annotating auscultation signals within real hospital environments. Our main objective is not only to pave the way for future unobtrusive systems for cardiac pathology screening but, more immediately, to create a repository of annotated auscultation signals for biomedical signal processing and machine learning research. The presented prototype revolves around a digital stethoscope that can stream the collected audio signal to a nearby tablet PC. Interaction with this system is based on two models: a data collection model adequate for the uncontrolled hospital environments of both emergency room and primary care, and a data annotation model for offline metadata input. A specific data model was created for the repository. The prototype has been deployed and is currently being tested in two hospitals, one in Portugal and one in Brazil.
