2011
Authors
Earl, D; Bradnam, K; St John, J; Darling, A; Lin, DW; Fass, J; Hung, OKY; Buffalo, V; Zerbino, DR; Diekhans, M; Nguyen, N; Ariyaratne, PN; Sung, WK; Ning, ZM; Haimel, M; Simpson, JT; Fonseca, NA; Birol, I; Docking, TR; Ho, IY; Rokhsar, DS; Chikhi, R; Lavenier, D; Chapuis, G; Naquin, D; Maillet, N; Schatz, MC; Kelley, DR; Phillippy, AM; Koren, S; Yang, SP; Wu, W; Chou, WC; Srivastava, A; Shaw, TI; Ruby, JG; Skewes Cox, P; Betegon, M; Dimon, MT; Solovyev, V; Seledtsov, I; Kosarev, P; Vorobyev, D; Ramirez Gonzalez, R; Leggett, R; MacLean, D; Xia, FF; Luo, RB; Li, ZY; Xie, YL; Liu, BH; Gnerre, S; MacCallum, I; Przybylski, D; Ribeiro, FJ; Yin, SY; Sharpe, T; Hall, G; Kersey, PJ; Durbin, R; Jackman, SD; Chapman, JA; Huang, XQ; DeRisi, JL; Caccamo, M; Li, YR; Jaffe, DB; Green, RE; Haussler, D; Korf, I; Paten, B;
Publication
GENOME RESEARCH
Abstract
Low-cost short-read sequencing technology has revolutionized genomics, though it is only just becoming practical for the high-quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies. In a collaborative effort, teams were asked to assemble a simulated Illumina HiSeq data set of an unknown, simulated diploid genome. A total of 41 assemblies from 17 different groups were received. Novel haplotype-aware assessments of coverage, contiguity, structure, base calling, and copy number were made. We establish that within this benchmark: (1) it is possible to assemble the genome to a high level of coverage and accuracy, and (2) large differences exist between the assemblies, suggesting room for further improvements in current methods. The simulated benchmark, including the correct answer, the assemblies, and the code that was used to evaluate the assemblies, is now public and freely available from http://www.assemblathon.org/.
2011
Authors
de Sousa, MM; Munteanu, CR; Pazos, A; Fonseca, NA; Camacho, R; Magalhaes, AL;
Publication
JOURNAL OF THEORETICAL BIOLOGY
Abstract
A statistical approach has been applied to analyse primary structure patterns at inner positions of alpha-helices in proteins. A systematic survey was carried out in a recent sample of non-redundant proteins selected from the Protein Data Bank, which were used to analyse alpha-helix structures for amino acid pairing patterns. Only residues more than three positions apart from both termini of the alpha-helix were considered as inner. Amino acid pairings i, i+k (k = 1, 2, 3, 4, 5) were analysed and the corresponding 20 x 20 matrices of relative global propensities were constructed. An analysis of (i, i+4, i+8) and (i, i+3, i+4) triplet patterns was also performed. These analyses yielded information on a series of amino acid patterns (pairings and triplets) showing either high or low preference for alpha-helical motifs and suggested a novel approach to protein alphabet reduction. In addition, it has been shown that the individual amino acid propensities are not enough to define the statistical distribution of these patterns. Global pair propensities also depend on the type of pattern, its composition and orientation in the protein sequence. The data presented should prove useful for deriving and refining predictive rules which can further the development and fine-tuning of protein structure prediction algorithms and tools.
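As an illustration of the kind of (i, i+k) pairing statistic described in the abstract above, the following is a minimal sketch only. The abstract does not define "relative global propensity", so the definition used here (observed pair frequency divided by the frequency expected if the two positions were independent) is an assumption, as are the function name `pair_propensities` and the toy input. The input sequences are assumed to be already trimmed to inner helix positions.

```python
from collections import Counter


def pair_propensities(helices, k):
    """Relative propensity of residue pairs (a, b) at positions (i, i+k).

    ASSUMED definition (not taken from the paper): observed pair frequency
    divided by the product of the marginal frequencies of `a` in the first
    position and `b` in the second position.
    """
    pair_counts = Counter()
    first_counts = Counter()
    second_counts = Counter()
    for seq in helices:
        # Every position i that has a partner k residues downstream.
        for i in range(len(seq) - k):
            a, b = seq[i], seq[i + k]
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
            second_counts[b] += 1
    total = sum(pair_counts.values())
    return {
        (a, b): (n / total)
        / ((first_counts[a] / total) * (second_counts[b] / total))
        for (a, b), n in pair_counts.items()
    }


# Hypothetical toy input: one short inner-helix segment.
propensities = pair_propensities(["AEAE"], k=1)
```

Under this assumed definition, values above 1 mark pairs seen more often than independence would predict, and values below 1 mark disfavoured pairs. Collecting the result for all 20 x 20 residue pairs gives one matrix per value of k.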
2011
Authors
Pereira, L; Alshamali, F; Andreassen, R; Ballard, R; Chantratita, W; Cho, NS; Coudray, C; Dugoujon, JM; Espinoza, M; Gonzalez Andrade, F; Hadi, S; Immel, UD; Marian, C; Gonzalez Martin, A; Mertens, G; Parson, W; Perone, C; Prieto, L; Takeshita, H; Rangel Villalobos, HR; Zeng, ZS; Zhivotovsky, L; Camacho, R; Fonseca, NA;
Publication
INTERNATIONAL JOURNAL OF LEGAL MEDICINE
Abstract
Because of their sensitivity and high level of discrimination, short tandem repeat (STR) marker systems are currently the method of choice in routine forensic casework and data banking, usually in multiplexes of up to 15-17 loci. Constraints related to sample amount and quality, frequently encountered in forensic casework, will not allow this picture to change in the near future, notwithstanding the technological developments. In this study, we present a free online calculator named PopAffiliator (http://cracs.fc.up.pt/popaffiliator) for individual population affiliation in the three main population groups, Eurasian, East Asian and sub-Saharan African, based on genotype profiles for the common set of STRs used in forensics. This calculator performs affiliation based on a model constructed using machine learning techniques. The model was constructed using a data set of approximately fifteen thousand individuals collected for this work. The accuracy of individual population affiliation is approximately 86%, showing that the common set of STRs routinely used in forensics provides a considerable amount of information for population assignment, in addition to being excellent for individual identification.
2009
Authors
Vieira, J; Fonseca, NA; Vieira, CP;
Publication
JOURNAL OF MOLECULAR EVOLUTION
Abstract
Multiple independent recruitments of the S-pollen component (always an F-box gene) during RNase-based gametophytic self-incompatibility evolution have recently been suggested. Therefore, different mechanisms could be used to achieve the rejection of incompatible pollen in different plant families. This hypothesis is, however, mainly based on the interpretation of phylogenetic analyses, using a small number of divergent nucleotide sequences. In this work we show, based on a large collection of F-box S-like sequences, that the inferred relationship of F-box S-pollen and F-box S-like sequences is dependent on the sequence alignment software and phylogenetic method used. Thus, at present, it is not possible to address the phylogenetic relationship of F-box S-pollen and S-like sequences from different plant families. In Petunia and Malus/Pyrus the putative S-pollen gene(s) show(s) variability patterns different from those expected for an S-pollen gene, raising the question of false identification. Here we show that in Petunia, the unexpected features of the putative S-pollen gene are not incompatible with this gene's being the S-pollen gene. On the other hand, it is very unlikely that the Pyrus SFBB-gamma gene is involved in specificity determination.
2008
Authors
Vieira, J; Fonseca, NA; Vieira, CP;
Publication
JOURNAL OF MOLECULAR EVOLUTION
Abstract
It has been argued that the common ancestor of about 75% of all dicots possessed an S-RNase-based gametophytic self-incompatibility (GSI) system. S-RNase genes should thus be found in most plant families showing GSI. The S-RNase gene (or a duplicate) may also acquire a new function and thus genes belonging to the S-RNase lineage may also persist in plant families without GSI. Nevertheless, sequences that belong to the S-RNase lineage have been found in the Solanaceae, Scrophulariaceae, Rosaceae, Cucurbitaceae, and Fabaceae plant families only. Here we search for new sequences that may belong to the S-RNase lineage, using both a phylogenetic and a much faster and simpler amino acid pattern-based approach. We show that the two methods have an apparently similar false-negative rate of discovery (~10%). The amino acid pattern-based approach produces about 15% false positives. Genes belonging to the S-RNase lineage are found in three new plant families, namely, the Rubiaceae, Euphorbiaceae, and Malvaceae. Acquisition of a new function by genes belonging to the S-RNase lineage is shown to be a frequent event. A putative S-RNase sequence is identified in Lotus, a plant genus for which molecular studies on GSI are lacking. The hypothesis of a single origin for S-RNase-based GSI (before the split of the Asteridae and Rosidae) is further supported by the finding of genes belonging to the S-RNase lineage in some of the oldest lineages of the Asteridae and Rosidae, and by Bayesian constrained tree analyses.
2008
Authors
Fonseca, NA; Camacho, R; Magalhaes, AL;
Publication
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS
Abstract
A systematic survey was carried out in an unbiased sample of 815 protein chains with a maximum of 20% homology selected from the Protein Data Bank, whose structures were solved at a resolution higher than 1.6 angstrom and with an R-factor lower than 25%. A set of 5556 subsequences with alpha-helix or 3(10)-helix motifs was extracted from the protein chains considered. Global and local propensities were then calculated for all possible amino acid pairs of the type (i, i + 1), (i, i + 2), (i, i + 3), and (i, i + 4), starting at the relevant helical positions N1, N2, N3, C3, C2, C1, and N-int (interior positions), and also at the first nonhelical positions in both termini of the helices, namely, N-cap and C-cap. The statistical analysis of the propensity values has shown that pairing is significantly dependent on the type of the amino acids and on the position of the pair. A few sequences of three and four amino acids were selected and their high prevalence in helices is outlined in this work. The Glu-Lys-Tyr-Pro sequence shows a peculiar distribution in proteins, which may suggest a relevant structural role in alpha-helices when Pro is located at the C-cap position. A bioinformatics tool was developed, which automatically and periodically updates the results and makes them available on a web site.