Publications

Publications by Pavel Brazdil

2012

Factors influencing hospital high length of stay outliers

Authors
Freitas, A; Silva Costa, T; Lopes, F; Garcia Lema, I; Teixeira Pinto, A; Brazdil, P; Costa Pereira, A;

Publication
BMC HEALTH SERVICES RESEARCH

Abstract
Background: The study of length of stay (LOS) outliers is important for the management and financing of hospitals. Our aim was to study variables associated with high LOS outliers and their evolution over time. Methods: We used hospital administrative data from inpatient episodes in public acute care hospitals in the Portuguese National Health Service (NHS), with discharges between 2000 and 2009, together with some hospital characteristics. The dependent variable, LOS outliers, was calculated for each diagnosis related group (DRG) using a trim point defined for each year by the geometric mean plus two standard deviations. Hospitals were classified on the basis of administrative, economic and teaching characteristics. We also studied the influence of comorbidities and readmissions. Logistic regression models, including a multivariable logistic regression, were used in the analysis. All the logistic regressions were fitted using generalized estimating equations (GEE). Results: In nearly nine million inpatient episodes analysed, we found a proportion of 3.9% high LOS outliers, accounting for 19.2% of total inpatient days. The number of hospital patient discharges increased between 2000 and 2005 and slightly decreased after that. The proportion of outliers ranged between the lowest value of 3.6% (in 2001 and 2002) and the highest value of 4.3% in 2009. Teaching hospitals with over 1,000 beds have significantly more outliers than other hospitals, even after adjustment for readmissions and several patient characteristics. Conclusions: In recent years, both average LOS and high LOS outliers have been increasing in Portuguese NHS hospitals. As high LOS outliers represent an important proportion of the total inpatient days, this should be seen as an important alert for the management of hospitals and for national health policies. As expected, age, type of admission, and hospital type were significantly associated with high LOS outliers. The proportion of high outliers does not seem to be related to their financial coverage; they should be studied in order to highlight areas for further investigation. The increasing complexity of both hospitals and patients may be the single most important determinant of high LOS outliers and must therefore be taken into account by health managers when considering hospital costs.
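
The trim point rule named in the Methods (geometric mean plus two standard deviations) can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, the sample standard deviation of the raw LOS values is an assumption (the abstract does not say which standard deviation is used), and every LOS is assumed to be at least one day so that the geometric mean is defined. In the study the rule is applied separately per DRG and per year; the sketch handles a single such group.

```python
from statistics import geometric_mean, stdev

def trim_point(los_values):
    """Geometric mean plus two standard deviations for one DRG/year group.

    Assumption: the sample standard deviation of the raw LOS values is used.
    """
    return geometric_mean(los_values) + 2 * stdev(los_values)

def flag_high_outliers(los_values):
    """Return one boolean per episode: True if its LOS exceeds the trim point."""
    cutoff = trim_point(los_values)
    return [los > cutoff for los in los_values]

# Made-up lengths of stay (in days) for one hypothetical DRG in one year.
example_los = [1, 2, 2, 3, 3, 4, 5, 60]
flags = flag_high_outliers(example_los)
```

With these made-up values only the 60-day episode lies above the trim point.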

2005

Predicting relative performance of classifiers from samples

Authors
Leite, R; Brazdil, P;

Publication
ICML 2005 - Proceedings of the 22nd International Conference on Machine Learning

Abstract
This paper is concerned with the problem of predicting relative performance of classification algorithms. It focusses on methods that use results on small samples and discusses the shortcomings of previous approaches. A new variant is proposed that, like some previous approaches, exploits meta-learning. The method requires that experiments be conducted on a few samples. The information gathered is used to identify the nearest learning curve for which the sampling procedure was carried out fully. This in turn makes it possible to generate a prediction regarding the relative performance of algorithms. Experimental evaluation shows that the method competes well with previous approaches and provides a quite good and practical solution to this problem.
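
The idea of matching a partial learning curve to the nearest fully sampled one can be sketched as follows. This is an illustration only, not the method from the paper: the function name, the squared-difference distance, and the rule of comparing the final points of the nearest stored curves are all assumptions.

```python
def predict_winner(partial_a, partial_b, stored):
    """Predict which of two algorithms, "A" or "B", wins on a new dataset.

    partial_a, partial_b: accuracies of A and B on the first few sample sizes
        of the new dataset.
    stored: list of (curve_a, curve_b) pairs, the full learning curves of the
        same two algorithms on earlier datasets.
    """
    k = len(partial_a)

    def distance(entry):
        # Assumed distance: summed squared differences over the first k points.
        curve_a, curve_b = entry
        d_a = sum((x - y) ** 2 for x, y in zip(partial_a, curve_a[:k]))
        d_b = sum((x - y) ** 2 for x, y in zip(partial_b, curve_b[:k]))
        return d_a + d_b

    nearest_a, nearest_b = min(stored, key=distance)
    # Assumed decision rule: compare accuracies at the end of the nearest curves.
    return "A" if nearest_a[-1] > nearest_b[-1] else "B"

# Made-up curves for two earlier datasets.
stored_curves = [
    ([0.50, 0.60, 0.80], [0.55, 0.60, 0.70]),
    ([0.30, 0.40, 0.50], [0.40, 0.50, 0.90]),
]
prediction = predict_winner([0.32, 0.41], [0.40, 0.52], stored_curves)
```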

2007

A putative gene located at the MHC class I region around the D6S105 marker contributes to the setting of CD8+T-lymphocyte numbers in humans

Authors
Vieira, J; Cardoso, CS; Pinto, J; Patil, K; Brazdil, P; Cruz, E; Mascarenhas, C; Lacerda, R; Gartner, A; Almeida, S; Alves, H; Porto, G;

Publication
INTERNATIONAL JOURNAL OF IMMUNOGENETICS

Abstract
Significant associations between human leucocyte antigen (HLA)-A and -B alleles and CD8+ T-lymphocyte numbers have been reported in the literature in both healthy populations and in HFE-haemochromatosis patients. In order to address whether HLA alleles themselves or alleles at linked genes are responsible for these associations, several genetic markers at the MHC class I region were typed on a population of 147 apparently healthy unrelated subjects phenotypically characterized for their CD8+ and CD4+ T-lymphocyte numbers. By using a machine learning approach, a set of rules was generated that predict CD8+ T-lymphocyte numbers on the basis of the information of the D6S105 microsatellite alleles only. We demonstrate that the previously reported associations with HLA-A and -B alleles are due to the presence of common long (up to 4 megabases long) haplotypes that increased in frequency recently due to positive selection and that encompass a region where a putative gene contributing to the setting of CD8+ T lymphocytes is located, in the neighbourhood of microsatellite locus D6S105, in the 6p21.3 region.

2012

Selecting classification algorithms with active testing

Authors
Leite, R; Brazdil, P; Vanschoren, J;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
Given the large number of data mining algorithms, their combinations (e.g. ensembles) and possible parameter settings, finding the most adequate method to analyze a new dataset becomes an ever more challenging task. This is because in many cases testing all possibly useful alternatives quickly becomes prohibitively expensive. In this paper we propose a novel technique, called active testing, that intelligently selects the most useful cross-validation tests. It proceeds in a tournament-style fashion, in each round selecting and testing the algorithm that is most likely to outperform the best algorithm of the previous round on the new dataset. This 'most promising' competitor is chosen based on a history of prior duels between both algorithms on similar datasets. Each new cross-validation test will contribute information to a better estimate of dataset similarity, and thus better predict which algorithms are most promising on the new dataset. We have evaluated this approach using a set of 292 algorithm-parameter combinations on 76 UCI datasets for classification. The results show that active testing will quickly yield an algorithm whose performance is very close to the optimum, after relatively few tests. It also provides a better solution than previously proposed methods. © 2012 Springer-Verlag.
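
The tournament-style loop described in this abstract can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the published procedure: the function name, the choice of starting algorithm (best total score in the history), the similarity weight (one plus the number of already tested pairs whose duel outcome agrees with the new dataset), and the expected-gain formula are all hypothetical. `evaluate` stands in for a cross-validation test on the new dataset.

```python
from itertools import combinations

def active_testing(history, evaluate, rounds):
    """history: {dataset: {algorithm: score}}; evaluate(algorithm) -> score."""
    algorithms = sorted(next(iter(history.values())))
    # Assumed starting point: the algorithm with the best total score so far.
    best = max(algorithms, key=lambda a: sum(h[a] for h in history.values()))
    tested = {best: evaluate(best)}

    def similarity(past):
        # Assumed weight: 1 plus the number of tested pairs whose duel outcome
        # on the past dataset matches the outcome observed on the new dataset.
        agree = sum(
            (past[a] > past[b]) == (tested[a] > tested[b])
            for a, b in combinations(tested, 2)
        )
        return 1 + agree

    def expected_gain(candidate):
        # Similarity-weighted margin by which the candidate beat the current
        # best algorithm on past datasets (losses count as zero).
        return sum(
            similarity(past) * max(0.0, past[candidate] - past[best])
            for past in history.values()
        )

    for _ in range(rounds):
        candidates = [a for a in algorithms if a not in tested]
        if not candidates:
            break
        challenger = max(candidates, key=expected_gain)
        tested[challenger] = evaluate(challenger)  # one new test per round
        if tested[challenger] > tested[best]:
            best = challenger
    return best, tested

# Made-up history for three hypothetical algorithms on two past datasets.
past_results = {
    "d1": {"A": 0.90, "B": 0.80, "C": 0.50},
    "d2": {"A": 0.70, "B": 0.75, "C": 0.50},
}
new_scores = {"A": 0.60, "B": 0.80, "C": 0.50}
winner, tests_run = active_testing(past_results, new_scores.get, rounds=1)
```

In this toy run the loop starts from A, picks B as the most promising challenger, and stops after two tests without ever evaluating C.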

2010

Determining the best classification algorithm with recourse to sampling and metalearning

Authors
Brazdil, P; Leite, R;

Publication
Studies in Computational Intelligence

Abstract
Currently many classification algorithms exist, and no algorithm outperforms all the others. Therefore it is of interest to determine which classification algorithm is the best one for a given task. Although direct comparisons can be made for any given problem using a cross-validation evaluation, it is desirable to avoid this, as the computational costs are significant. We describe a method which relies on relatively fast pairwise comparisons involving two algorithms. This method is based on previous work and exploits sampling landmarks, that is, information about learning curves, in addition to classical data characteristics. One key feature of this method is an iterative procedure for extending the series of experiments used to gather new information in the form of sampling landmarks. Metalearning also plays a vital role. The comparisons between various pairs of algorithms are repeated and the result is represented in the form of a partially ordered ranking. Evaluation is done by comparing the predicted partial order of algorithms to the partial order representing the supposedly correct result. The results of our analysis show that the method has good performance and could be of help in practical applications. © 2010 Springer-Verlag Berlin Heidelberg.

2010

Active Testing Strategy to Predict the Best Classification Algorithm via Sampling and Metalearning

Authors
Leite, R; Brazdil, P;

Publication
ECAI 2010 - 19TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE

Abstract
Currently many classification algorithms exist, and no algorithm outperforms all the others in all tasks. Therefore it is of interest to determine which classification algorithm is the best one for a given task. Although direct comparisons can be made for any given problem using a cross-validation evaluation, it is desirable to avoid this, as the computational costs are significant. We describe a method which relies on relatively fast pairwise comparisons involving two algorithms. This method exploits sampling landmarks, that is, information about learning curves, in addition to classical data characteristics. One key feature of this method is an iterative procedure for extending the series of experiments used to gather new information in the form of sampling landmarks. Metalearning also plays a vital role. The comparisons between various pairs of algorithms are repeated and the result is represented in the form of a partially ordered ranking. Evaluation is done by comparing the predicted partial order of algorithms to the partial order representing the supposedly correct result. The results of our analysis show that the method has good performance and could be of help in practical applications.
