2023
Authors
Sousa, AO; Veloso, DT; Goncalves, HM; Faria, JP; Mendes Moreira, J; Graca, R; Gomes, D; Castro, RN; Henriques, PC;
Publication
IEEE ACCESS
Abstract
Software estimation is a vital yet challenging project management activity. Various methods, from empirical to algorithmic, have been developed to fit different development contexts, from plan-driven to agile. Recently, machine learning techniques have shown potential in this realm but are still underexplored, especially for individual task estimation. We investigate the use of machine learning techniques in predicting task effort and duration in software projects to assess their applicability and effectiveness in production environments, identify the best-performing algorithms, and pinpoint key input variables (features) for predictions. We conducted experiments with datasets of various sizes and structures exported from three project management tools used by partner companies. For each dataset, we trained regression models for predicting the effort and duration of individual tasks using eight machine learning algorithms. The models were validated using k-fold cross-validation and evaluated with several metrics. Ensemble algorithms like Random Forest, Extra Trees Regressor, and XGBoost consistently outperformed non-ensemble ones across the three datasets. However, the estimation accuracy and feature importance varied significantly across datasets, with a Mean Magnitude of Relative Error (MMRE) ranging from 0.11 to 9.45 across the datasets and target variables. Nevertheless, even in the worst-performing dataset, effort estimates aggregated to the project level showed good accuracy, with MMRE = 0.23. Machine learning algorithms, especially ensemble ones, seem to be a viable option for estimating the effort and duration of individual tasks in software projects. However, the quality of the estimates and the relevant features may depend largely on the characteristics of the available datasets and underlying projects. Nevertheless, even when the accuracy of individual estimates is poor, the aggregated estimates at the project level may present a good accuracy due to error compensation.
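A minimal sketch of the evaluation style this abstract describes: training an ensemble regressor on task features and scoring it with the Mean Magnitude of Relative Error (MMRE) under k-fold cross-validation. The synthetic data, feature meanings, and hyperparameters are illustrative assumptions, not the paper's actual setup.

```python
# Sketch: k-fold evaluation of an ensemble regressor with the MMRE metric.
# The synthetic dataset and feature semantics are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(42)
n_tasks = 500
X = rng.random((n_tasks, 4))  # e.g., task type, priority, team size, initial estimate
y = 2.0 + 10.0 * X[:, 0] + rng.normal(0, 0.5, n_tasks)  # synthetic effort in hours

def mmre(y_true, y_pred):
    """Mean Magnitude of Relative Error: mean(|actual - predicted| / |actual|)."""
    return np.mean(np.abs(y_true - y_pred) / np.abs(y_true))

scores = []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    scores.append(mmre(y[test_idx], model.predict(X[test_idx])))

print(f"Mean MMRE across folds: {np.mean(scores):.2f}")
```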
2023
Authors
Pedroto, M; Coelho, T; Jorge, A; Mendes Moreira, J;
Publication
FRONTIERS IN NEUROLOGY
Abstract
Introduction: Hereditary transthyretin amyloidosis (ATTRv amyloidosis) is a rare hereditary neurological disease, clinically characterized as severe, progressive, and life-threatening; the age of onset is the moment in time when the first symptoms are felt. In this study, we present and discuss our results on the study, development, and evaluation of an approach for time-to-event prediction of the age of onset, focusing on genealogical feature construction. Materials and methods: This research was triggered by the need to answer a medical question: when will an asymptomatic ATTRv carrier show symptoms of the disease? To do so, we defined and studied the impact of 77 features (ranging from demographic and genealogical to familial disease history); we studied and compared a pool of prediction algorithms, namely, linear regression (LR), elastic net (EN), lasso (LA), ridge (RI), support vector machines (SV), decision tree (DT), random forest (RF), and XGBoost (XG), in both classification and regression settings; we assembled a baseline (BL) corresponding to current medical knowledge of the disease; we studied the problem of predicting the age of onset of ATTRv patients; we assessed the viability of predicting the age of onset over short-term horizons, with a classification framing, on localized sets of patients (currently symptomatic and asymptomatic carriers, with and without genealogical information); and we compared the results with an out-of-bag evaluation set assembled in a different time frame than the original data, to account for data leakage. Results: Our approach outperforms the BL model, which follows a set of clinical heuristics and represents current medical practice. Overall, our results show the superiority of SV and XG on both prediction tasks, although performance is affected by data characteristics, namely, the existence of missing values, complex data, and small-sized available inputs. Discussion: With this study, we defined a predictive modeling approach that can be well understood by medical professionals, compared it with current practice, namely, the baseline approach (BL), and showed the improvement achieved over current medical knowledge.
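A rough sketch of the model-comparison protocol the abstract outlines: scoring a pool of regressors against a simple clinical-style baseline. The feature semantics, synthetic data, and the mean-age baseline are assumptions for illustration; XGBoost is omitted here to avoid an extra dependency.

```python
# Sketch: comparing a pool of regressors against a simple baseline for
# age-of-onset prediction. Data and the baseline heuristic are illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression, ElasticNet, Lasso, Ridge
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
n = 300
X = rng.random((n, 5))  # e.g., sex, mutation, parent's onset age, family history stats
y = 30 + 20 * X[:, 2] + rng.normal(0, 3, n)  # synthetic age of onset

models = {
    "LR": LinearRegression(), "EN": ElasticNet(), "LA": Lasso(), "RI": Ridge(),
    "SV": SVR(), "DT": DecisionTreeRegressor(random_state=0),
    "RF": RandomForestRegressor(random_state=0),
}
# Baseline (BL): a crude heuristic, here predicting the cohort's mean onset age.
baseline_mae = np.mean(np.abs(y - y.mean()))
for name, model in models.items():
    mae = -cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error").mean()
    print(f"{name}: MAE = {mae:.2f} (baseline {baseline_mae:.2f})")
```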
2023
Authors
Bhanu, M; Roy, S; Priya, S; Mendes Moreira, J; Chandra, J;
Publication
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE
Abstract
Predicting taxi demand in large cities can help in better traffic management and ensure better commuter satisfaction in an intelligent transportation system. However, traffic demands across different locations have varying spatio-temporal correlations that are difficult to model. Despite the ability of existing Deep Neural Network (DNN) models to capture the non-linearity in the spatial and temporal characteristics of the demand time-series, capturing spatio-temporal characteristics in different real-world scenarios (such as varying historic and prediction time frames, or spatio-temporal variations due to noise or missing data) still remains a major challenge for state-of-the-art models. In this paper, we introduce Encoder-ApproXimator (EnAppX), an encoder-decoder DNN-based model that uses Chebyshev function approximation in the decoding stage for taxi demand time-series prediction and can better estimate the time-series in the presence of large spatio-temporal variations. In contrast to existing state-of-the-art models, the proposed model approximates the complete spatio-temporal characteristics in the frequency domain, which in turn enables it to make robust and improved predictions in different scenarios. Validation over two real-world taxi datasets from different cities shows a considerable improvement of around 23% in RMSE scores compared to the state-of-the-art baseline model. Unlike several existing state-of-the-art models, EnAppX also produces improved prediction accuracy across two regions for both inbound and outbound demands.
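To make the core idea concrete, here is a small sketch of Chebyshev function approximation applied to a demand time-series, using NumPy's polynomial module. The series, degree, and time grid are illustrative assumptions; the paper's actual encoder-decoder architecture is not reproduced.

```python
# Sketch: Chebyshev approximation of a demand time-series, the building block
# EnAppX uses in its decoding stage. The data here is synthetic.
import numpy as np

t = np.linspace(-1, 1, 96)  # one day of 15-minute slots, rescaled to [-1, 1]
demand = 50 + 30 * np.sin(4 * np.pi * t) + np.random.default_rng(0).normal(0, 3, t.size)

coeffs = np.polynomial.chebyshev.chebfit(t, demand, deg=12)  # fit coefficients
approx = np.polynomial.chebyshev.chebval(t, coeffs)          # reconstruct series

rmse = np.sqrt(np.mean((demand - approx) ** 2))
print(f"RMSE of degree-12 Chebyshev approximation: {rmse:.2f}")
```

A compact set of Chebyshev coefficients summarizes the whole series, which is what makes this representation robust to local noise and missing points.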
2012
Authors
Abreu, P; Moreira, J; Costa, I; Castelao, D; Reis, L; Garganta, J;
Publication
EUROPEAN JOURNAL OF SPORT SCIENCE
Abstract
Soccer is a team sport in which the performances of all team members are important for the outcome of a match. Even though the analysis of game events can be used to measure a team's performance, perceiving those events, especially during the match, is extremely difficult, even for the agents involved. Soccer has been used as a simulation environment in many studies, mainly in the area of robotics. RoboCup is an international robotics competition with an ambitious goal: by 2050, a team of robots will be capable of defeating the reigning human world champions. In this context, we compared technical similarities between human and robotic soccer. Based on an off-line automatic event detection tool, game statistics for the finals of both human and robotic soccer tournaments were collected and compared using the Wilcoxon test. The results show that the most frequent event in both forms of soccer is the successful pass. Analysing the two types of passes considered (successful and missed), we conclude that there are significant differences between the two forms (W = 2, P = 0.000354), with human soccer presenting a higher percentage of successful passes (77.89% vs. 66.97%). Of the restart events (W = 0, P = 0.00048965), the most frequent in both forms is the throw-in (human 59.91%, robotics 66.4%), and the least frequent is the corner (human 13.7%, robotics 14.09%). Regarding the frequency of shots, in the robotics environment "shots" were the predominant type (43.27%), whereas in human soccer "shots on target" predominated (71.25%; W = 64, P = 0.000085641). Finally, the number of fouls is lower in robotic soccer.
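A minimal sketch of the statistical comparison the abstract describes, using SciPy's Wilcoxon signed-rank test on paired per-match event counts. The counts below are made-up placeholders, not the paper's data.

```python
# Sketch: comparing per-match event counts between human and robotic soccer
# with the Wilcoxon test. The numbers are illustrative, not the study's data.
from scipy.stats import wilcoxon

human_passes = [412, 389, 455, 430, 398, 441, 420, 405]  # successful passes per final
robot_passes = [310, 295, 340, 322, 301, 355, 318, 299]

stat, p = wilcoxon(human_passes, robot_passes)
print(f"W = {stat}, p = {p:.6f}")  # a small p suggests a significant difference
```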
2009
Authors
Mendes Moreira, PMM; Patto, MCV; Mota, M; Mendes Moreira, J; Santos, JPN; Santos, JPP; Andrade, E; Hallauer, AR; Pego, SE;
Publication
MAYDICA
Abstract
Climate change emphasizes the importance of biodiversity maintenance, suggesting that germplasm adapted to organic, low-input, or conventional conditions is needed to face future demands. This study presents: I - The two-step genesis of the synthetic maize population 'Fandango': A) 'NUTICA' creation: in 1975, Miguel Mota and Silas Pego initiated a new type of polycross method involving 77 yellow elite inbred lines (dent and flint; 20% Portuguese and 80% North American germplasm) from the NUMI programme (NUcleo de melhoramento de MIlho, Braga, Portugal). These inbreds were intermated in natural isolation and the progenies were submitted to intensive selection on both parents during continued cycles; B) From 'NUTICA' to 'Fandango': 'Fandango' was composed of all the crosses that resulted from a North Carolina Design I mating design (1 male crossed with 5 females) applied to 'NUTICA'. II - The diversity evolution of 'Fandango' under a participatory breeding project in the Portuguese Sousa Valley region (VASO), initiated in 1985 by Pego with CIMMYT support. Morphological, fasciation expression, and yield trials were conducted in Portugal (3 locations, 3 years) and in the USA (4 locations, 1 year) using seeds obtained from five to seven cycles of mass selection (MS). The selection across cycles was done by the breeder (until cycle 5) and by the farmer (from cycle 11 to the present). ANOVA and regression analysis on the rate of direct response to selection were performed when the assumption of normality was confirmed; otherwise, the non-parametric Multivariate Adaptive Regression Splines (MARS) method was used. Response to mass selection in Iowa showed a significant decrease in yield, while in Portugal a significant increase was observed for time of silking, plant and ear height, ear diameters 2, 3, and 4, kernel number, cob diameters, and rachis. At this location, a significant decrease was also observed for thousand-kernel weight and ear length. These results showed that mass selection was not effective for a significant yield increase, except at Lousada with breeder selection (3.09% gain per cycle per year). Some non-parametric methods (MARS, decision trees, and random forests) were used to get insights into the causes that explain yield in 'Fandango'. Kernel weight and ear weight were the most important traits, although row number, number of kernels per row, ear length, and ear diameter were also of some importance in influencing 'Fandango' yield.
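A small sketch of the exploratory step the abstract mentions: using a random forest to rank the traits that explain yield. The trait list matches the abstract, but the data values and model settings are synthetic placeholders.

```python
# Sketch: ranking yield-explaining traits with random forest feature
# importances, as in the paper's exploratory analysis. Data is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 200
traits = ["kernel weight", "ear weight", "row number",
          "kernels per row", "ear length", "ear diameter"]
X = rng.random((n, len(traits)))
grain_yield = 5 * X[:, 0] + 4 * X[:, 1] + X[:, 2] + rng.normal(0, 0.3, n)

rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, grain_yield)
for trait, imp in sorted(zip(traits, rf.feature_importances_), key=lambda p: -p[1]):
    print(f"{trait}: {imp:.3f}")
```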
2012
Authors
Moreira Matias, L; Ferreira, C; Gama, J; Mendes Moreira, J; De Sousa, JF;
Publication
CEUR Workshop Proceedings
Abstract
Mining public transportation networks is a rapidly growing challenge due to the increasing amount of available information. In highly populated urban zones, vehicles often fail to meet the schedule. Such failures cause headway deviations (HD) between high-frequency bus pairs. In this paper, we propose to identify the systematic HD that usually provoke the phenomenon known as Bus Bunching (BB). We use the PrefixSpan algorithm to accurately mine sequences of bus stops where multiple HD frequently emerge, forcing two or more buses to clump. Our results are promising: 1) we demonstrated that the BB origin can be modeled as a sequence mining problem, where 2) the discovered patterns can easily identify the route schedule points to adjust in order to mitigate such events.
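A minimal sketch of the sequence-mining framing, using the third-party `prefixspan` package (an assumption; the paper does not name an implementation). The stop IDs and sequences are made-up examples.

```python
# Sketch: mining frequent sequences of bus stops where headway deviations (HD)
# occur, via the third-party `prefixspan` package (pip install prefixspan).
# Stop IDs and trip sequences below are illustrative, not real data.
from prefixspan import PrefixSpan

# Each list is one trip's ordered sequence of stops where an HD was observed.
hd_sequences = [
    [3, 5, 8, 12],
    [3, 5, 12],
    [5, 8, 12, 15],
    [3, 8, 12],
]

ps = PrefixSpan(hd_sequences)
# Patterns supported by at least 3 of the 4 trips: likely systematic HD points,
# and therefore candidate schedule points to adjust against Bus Bunching.
for support, pattern in ps.frequent(3):
    print(support, pattern)
```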