2011
Authors
Cejudo, JMC; García, MB; Bueno, RM; Gama, J; Bifet, A;
Publication
Proceedings of the Second Workshop on Applications of Pattern Analysis, WAPA 2011, Castro Urdiales, Spain, October 19-21, 2011
Abstract
2011
Authors
Rodrigues, PP; Pechenizkiy, M; Gaber, MM; Gama, J;
Publication
CEUR Workshop Proceedings
Abstract
Clinical practice and research are facing a new challenge created by the rapid growth of health information science and technology, and the complexity and volume of biomedical data. Machine learning from medical data streams is a recent area of research that aims to provide better knowledge extraction and evidence-based clinical decision support in scenarios where data are produced as a continuous flow. This year's edition of AIME, the Conference on Artificial Intelligence in Medicine, enabled the sound discussion of this area of research, mainly by the inclusion of a dedicated workshop. This paper is an introduction to LEMEDS, the Learning from Medical Data Streams workshop, which highlights the contributed papers, the invited talk and expert panel discussion, as well as related papers accepted to the main conference.
2011
Authors
Gama, J; Kosina, P;
Publication
IJCAI International Joint Conference on Artificial Intelligence
Abstract
Decision rules, which can provide good interpretability and flexibility for data mining tasks, have received very little attention in the stream mining community so far. In this work we introduce a new algorithm to learn rule sets, designed for open-ended data streams. The proposed algorithm is able to continuously learn compact ordered and unordered rule sets. The experimental evaluation shows competitive results in comparison with VFDT and C4.5rules.
2011
Authors
Ikonomovska, E; Gama, J; Zenko, B; Dzeroski, S;
Publication
Proceedings of the 28th International Conference on Machine Learning, ICML 2011
Abstract
Data streams are ubiquitous and have in the last two decades become an important research topic. For their predictive non-parametric analysis, Hoeffding-based trees are often a method of choice, offering the possibility of any-time predictions. However, one of their main problems is the delay in learning progress due to the existence of equally discriminative attributes. Options are a natural way to deal with this problem. Option trees build upon regular trees by adding splitting options in the internal nodes. As such, they are known to improve accuracy and stability and to reduce ambiguity. In this paper, we present on-line option trees for faster learning on numerical data streams. Our results show that options improve the any-time performance of ordinary on-line regression trees, while preserving the interpretable structure of trees and without significantly increasing the computational complexity of the algorithm. Copyright 2011 by the author(s)/owner(s).
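The delay caused by equally discriminative attributes can be illustrated with the standard Hoeffding bound used by Hoeffding-based trees. The following is a minimal sketch, not the algorithm from the paper: the function names, the merit values, and the choice of `delta` are assumptions made for illustration only.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound: with probability 1 - delta, the true mean of a
    variable with range `value_range` lies within this distance of the
    mean observed over n examples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def can_split(best_merit, second_merit, value_range, delta, n):
    """A node splits only when the best candidate beats the runner-up
    by more than the bound (hypothetical helper for illustration)."""
    return (best_merit - second_merit) > hoeffding_bound(value_range, delta, n)


def examples_needed(gap, value_range, delta):
    """Approximate number of examples after which the bound shrinks to `gap`."""
    return math.ceil(value_range ** 2 * math.log(1.0 / delta) / (2.0 * gap ** 2))


# Two nearly tied attributes keep the node waiting; a clear winner does not.
print(can_split(0.50, 0.49, 1.0, 1e-7, 1000))  # False
print(can_split(0.50, 0.30, 1.0, 1e-7, 1000))  # True
print(examples_needed(0.01, 1.0, 1e-7))
```

Under these assumptions, the smaller the gap between the two best candidates, the more examples a node must see before it may split, which is the delay that splitting options are meant to reduce.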
2011
Authors
Earl, D; Bradnam, K; St John, J; Darling, A; Lin, DW; Fass, J; Hung, OKY; Buffalo, V; Zerbino, DR; Diekhans, M; Nguyen, N; Ariyaratne, PN; Sung, WK; Ning, ZM; Haimel, M; Simpson, JT; Fonseca, NA; Birol, I; Docking, TR; Ho, IY; Rokhsar, DS; Chikhi, R; Lavenier, D; Chapuis, G; Naquin, D; Maillet, N; Schatz, MC; Kelley, DR; Phillippy, AM; Koren, S; Yang, SP; Wu, W; Chou, WC; Srivastava, A; Shaw, TI; Ruby, JG; Skewes Cox, P; Betegon, M; Dimon, MT; Solovyev, V; Seledtsov, I; Kosarev, P; Vorobyev, D; Ramirez Gonzalez, R; Leggett, R; MacLean, D; Xia, FF; Luo, RB; Li, ZY; Xie, YL; Liu, BH; Gnerre, S; MacCallum, I; Przybylski, D; Ribeiro, FJ; Yin, SY; Sharpe, T; Hall, G; Kersey, PJ; Durbin, R; Jackman, SD; Chapman, JA; Huang, XQ; DeRisi, JL; Caccamo, M; Li, YR; Jaffe, DB; Green, RE; Haussler, D; Korf, I; Paten, B;
Publication
GENOME RESEARCH
Abstract
Low-cost short read sequencing technology has revolutionized genomics, though it is only just becoming practical for the high-quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies. In a collaborative effort, teams were asked to assemble a simulated Illumina HiSeq data set of an unknown, simulated diploid genome. A total of 41 assemblies from 17 different groups were received. Novel haplotype-aware assessments of coverage, contiguity, structure, base calling, and copy number were made. We establish that within this benchmark: (1) it is possible to assemble the genome to a high level of coverage and accuracy, and (2) large differences exist between the assemblies, suggesting room for further improvements in current methods. The simulated benchmark, including the correct answer, the assemblies, and the code that was used to evaluate the assemblies, is now public and freely available from http://www.assemblathon.org/.
2011
Authors
de Sousa, MM; Munteanu, CR; Pazos, A; Fonseca, NA; Camacho, R; Magalhaes, AL;
Publication
JOURNAL OF THEORETICAL BIOLOGY
Abstract
A statistical approach has been applied to analyse primary structure patterns at inner positions of alpha-helices in proteins. A systematic survey was carried out in a recent sample of non-redundant proteins selected from the Protein Data Bank, which were used to analyse alpha-helix structures for amino acid pairing patterns. Only residues more than three positions apart from both termini of the alpha-helix were considered as inner. Amino acid pairings i, i+k (k = 1, 2, 3, 4, 5) were analysed and the corresponding 20 x 20 matrices of relative global propensities were constructed. An analysis of (i, i+4, i+8) and (i, i+3, i+4) triplet patterns was also performed. These analyses yielded information on a series of amino acid patterns (pairings and triplets) showing either high or low preference for alpha-helical motifs and suggested a novel approach to protein alphabet reduction. In addition, it has been shown that the individual amino acid propensities are not enough to define the statistical distribution of these patterns. Global pair propensities also depend on the type of pattern, its composition and orientation in the protein sequence. The data presented should prove useful for obtaining and refining predictive rules which can further the development and fine-tuning of protein structure prediction algorithms and tools.
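The inner-position rule and the i, i+k pairing counts described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not define how relative global propensities are computed, so the observed/expected ratio below is a common stand-in, and the function names, helix coordinates, and toy sequence are hypothetical.

```python
from collections import Counter


def inner_positions(start, end, margin=3):
    """Indices of helix residues more than `margin` positions away from
    both termini; `start` and `end` are inclusive helix boundaries."""
    return range(start + margin + 1, end - margin)


def pair_counts(seq, helices, k, margin=3):
    """Count (residue at i, residue at i+k) pairs where both positions
    are inner positions of the same helix."""
    counts = Counter()
    for start, end in helices:
        inner = inner_positions(start, end, margin)
        for i in inner:
            if i + k in inner:
                counts[(seq[i], seq[i + k])] += 1
    return counts


def pair_propensities(seq, helices, k, margin=3):
    """Observed/expected ratio per pair (an assumed definition, not
    necessarily the one used in the paper)."""
    counts = pair_counts(seq, helices, k, margin)
    total = sum(counts.values())
    first, second = Counter(), Counter()
    for (a, b), n in counts.items():
        first[a] += n
        second[b] += n
    return {(a, b): n * total / (first[a] * second[b])
            for (a, b), n in counts.items()}


# Toy example: one 13-residue helix spanning the whole sequence.
seq = "AAAAEAAAKAAAA"
print(dict(pair_counts(seq, [(0, 12)], k=4)))
print(pair_propensities(seq, [(0, 12)], k=1))
```

Running the same counts for each k from 1 to 5 over a set of helices would fill one 20 x 20 matrix per k, with rows indexed by the residue at i and columns by the residue at i+k.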