2023
Authors
Sousa, AO; Veloso, DT; Goncalves, HM; Faria, JP; Mendes Moreira, J; Graca, R; Gomes, D; Castro, RN; Henriques, PC;
Publication
IEEE ACCESS
Abstract
Software estimation is a vital yet challenging project management activity. Various methods, from empirical to algorithmic, have been developed to fit different development contexts, from plan-driven to agile. Recently, machine learning techniques have shown potential in this realm but are still underexplored, especially for individual task estimation. We investigate the use of machine learning techniques in predicting task effort and duration in software projects to assess their applicability and effectiveness in production environments, identify the best-performing algorithms, and pinpoint key input variables (features) for predictions. We conducted experiments with datasets of various sizes and structures exported from three project management tools used by partner companies. For each dataset, we trained regression models for predicting the effort and duration of individual tasks using eight machine learning algorithms. The models were validated using k-fold cross-validation and evaluated with several metrics. Ensemble algorithms like Random Forest, Extra Trees Regressor, and XGBoost consistently outperformed non-ensemble ones across the three datasets. However, the estimation accuracy and feature importance varied significantly across datasets, with a Mean Magnitude of Relative Error (MMRE) ranging from 0.11 to 9.45 across the datasets and target variables. Nevertheless, even in the worst-performing dataset, effort estimates aggregated to the project level showed good accuracy, with MMRE = 0.23. Machine learning algorithms, especially ensemble ones, seem to be a viable option for estimating the effort and duration of individual tasks in software projects. However, the quality of the estimates and the relevant features may depend largely on the characteristics of the available datasets and underlying projects. Nevertheless, even when the accuracy of individual estimates is poor, the aggregated estimates at the project level may present good accuracy due to error compensation.
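The company datasets and pipeline behind this study are not public; the snippet below is only a minimal sketch of the kind of workflow the abstract describes (k-fold cross-validation of ensemble regressors scored with MMRE). The feature set and the synthetic "effort" target are assumptions for illustration, not the paper's data or exact models.

```python
# Minimal sketch (not the paper's code): cross-validated ensemble regression
# for task-effort estimation, scored with MMRE. Data below is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor
from sklearn.model_selection import cross_val_predict

def mmre(y_true, y_pred):
    """Mean Magnitude of Relative Error: mean(|y - yhat| / y)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs(y_true - y_pred) / np.maximum(y_true, 1e-9))

rng = np.random.default_rng(0)
X = rng.random((500, 6))                              # placeholder task features
y = 2.0 + 10.0 * X[:, 0] + rng.normal(0, 0.5, 500)    # synthetic "effort" target

for model in (RandomForestRegressor(random_state=0),
              ExtraTreesRegressor(random_state=0)):
    pred = cross_val_predict(model, X, y, cv=10)      # 10-fold cross-validation
    print(type(model).__name__, "MMRE =", round(mmre(y, pred), 3))
```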
2023
Authors
Bhanu, M; Roy, S; Priya, S; Mendes Moreira, J; Chandra, J;
Publication
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE
Abstract
Predicting taxi demands in large cities can help in better traffic management as well as ensure better commuter satisfaction for an intelligent transportation system. However, the traffic demands across different locations have varying spatio-temporal correlations that are difficult to model. Despite the ability of existing Deep Neural Network (DNN) models to capture the non-linearity in the spatial and temporal characteristics of the demand time series, capturing spatio-temporal characteristics in different real-world scenarios (e.g., varying historic and prediction time frames, or spatio-temporal variations due to noise or missing data) still remains a major challenge for state-of-the-art models. In this paper, we introduce Encoder-ApproXimator (EnAppX), an encoder-decoder DNN-based model that uses Chebyshev function approximation in the decoding stage for taxi demand time-series prediction and can better estimate the time series in the presence of large spatio-temporal variations. Unlike existing state-of-the-art models, the proposed model approximates the complete spatio-temporal characteristics in the frequency domain, which in turn enables it to make robust and improved predictions in different scenarios. Validation on two real-world taxi datasets from different cities shows a considerable improvement of around 23% in RMSE scores compared to the state-of-the-art baseline model. Unlike several existing state-of-the-art models, EnAppX also produces improved prediction accuracy across two regions for both inbound and outbound demands.
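EnAppX itself (encoder, decoder, and training loop) is not reproduced here. As a hedged illustration of the approximation idea named in the abstract, the sketch below fits a truncated Chebyshev series to a synthetic demand curve with NumPy; the signal shape, sampling, and polynomial degree are assumptions.

```python
# Illustrative sketch only: approximating a noisy demand time series with a
# truncated Chebyshev series, the kind of function approximation EnAppX uses
# in its decoding stage. Signal and degree below are assumptions.
import numpy as np
from numpy.polynomial import chebyshev as C

t = np.linspace(-1, 1, 288)                       # e.g. one day at 5-minute steps (assumed)
demand = 50 + 30 * np.sin(4 * np.pi * t) + np.random.default_rng(1).normal(0, 5, t.size)

coeffs = C.chebfit(t, demand, deg=12)             # fit Chebyshev coefficients
smooth = C.chebval(t, coeffs)                     # reconstruct the approximated series

rmse = np.sqrt(np.mean((demand - smooth) ** 2))
print(f"RMSE of degree-12 Chebyshev approximation: {rmse:.2f}")
```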
2023
Authors
Neves, TM; Meireles, L; Moreira, JM;
Publication
CoRR
Abstract
2023
Authors
Ferreira, PJS; Mendes-Moreira, J; Cardoso, JMP;
Publication
PROCEEDINGS OF THE 8TH INTERNATIONAL WORKSHOP ON SENSOR-BASED ACTIVITY RECOGNITION AND ARTIFICIAL INTELLIGENCE, IWOAR 2023
Abstract
Human Activity Recognition (HAR) has been a popular research field due to the widespread availability of devices with sensors and computational power (e.g., smartphones and smartwatches). Applications for HAR systems have been extensively researched in recent literature, mainly due to the benefits of improving quality of life in areas like health and fitness monitoring. However, since people have different motion patterns when performing physical activities, a HAR system needs to adapt to the characteristics of the user in order to maintain or improve accuracy. Mobile devices such as smartphones, often used to implement HAR systems, have limited resources (e.g., battery life), and HAR systems have difficulty adapting to these constraints while working efficiently for long periods. In this work, we present a kNN-based HAR system and an extensive study of the influence of hyperparameters (window size, overlap, distance function, and the value of k) and parameters (sampling frequency) on the system's accuracy, energy consumption, and response time. We also study how hyperparameter configurations affect the model's performance for different users and activities. Experimental results show that adapting the hyperparameters makes it possible to adjust the system's behavior to the user, the device, and the target service. These results motivate the development of a HAR system capable of automatically adapting its hyperparameters to the user, the device, and the service.
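The paper's datasets and chosen hyperparameter values are not given in the abstract; the following is a minimal sketch of the general pipeline it studies: segment a tri-axial accelerometer stream into overlapping windows, extract simple per-window features, and classify with kNN. The window size, overlap, k, distance metric, and placeholder data are assumptions.

```python
# Minimal sketch (assumed parameters, not the paper's configuration):
# sliding-window segmentation of accelerometer data + kNN classification.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def windows(signal, labels, size=128, overlap=0.5):
    """Cut a (n_samples, 3) accelerometer stream into overlapping windows and
    summarise each window with mean/std per axis and its majority label."""
    step = int(size * (1 - overlap))
    X, y = [], []
    for start in range(0, len(signal) - size + 1, step):
        w = signal[start:start + size]
        X.append(np.concatenate([w.mean(axis=0), w.std(axis=0)]))
        y.append(np.bincount(labels[start:start + size]).argmax())
    return np.array(X), np.array(y)

rng = np.random.default_rng(0)
acc = rng.normal(size=(5000, 3))       # placeholder tri-axial accelerometer stream
act = rng.integers(0, 4, size=5000)    # placeholder activity labels

X, y = windows(acc, act, size=128, overlap=0.5)
clf = KNeighborsClassifier(n_neighbors=5, metric="euclidean").fit(X, y)
print("training accuracy:", clf.score(X, y))
```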
2023
Authors
Rodrigues, EM; Baghoussi, Y; Mendes-Moreira, J;
Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2023, PT I
Abstract
Machine learning models are widely used in time series forecasting. One way to reduce their computational cost and increase their efficiency is to select only the relevant exogenous features to be fed into the model. With this intention, a study of the feature selection methods Pearson correlation coefficient, Boruta, Boruta-Shap, IMV-LSTM, and LIME is performed. A new method focused on interpretability, SHAP-LSTM, is proposed, using a deep learning model's training process as part of the feature selection algorithm. The methods were compared on two different datasets, showing results comparable to using all features at a lower computational cost. On both datasets, SHAP-LSTM showed competitive results, performing comparatively better on the data with a higher presence of sparsely occurring categorical features.
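The proposed SHAP-LSTM method is not reproduced here. As a hedged illustration, the sketch below shows only the simplest baseline from the comparison, Pearson-correlation-based selection of exogenous features for a forecasting target; the data, column names, and the 0.2 threshold are assumptions for demonstration.

```python
# Illustrative sketch of the Pearson correlation baseline mentioned in the
# abstract, NOT the proposed SHAP-LSTM method. Data and threshold are assumed.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "target":      rng.normal(size=n),
    "temperature": rng.normal(size=n),
    "humidity":    rng.normal(size=n),
    "holiday":     rng.integers(0, 2, size=n),
})
df["temperature"] += 0.8 * df["target"]          # inject one truly relevant feature

corr = df.drop(columns="target").corrwith(df["target"]).abs()
selected = corr[corr > 0.2].index.tolist()       # keep features above the threshold
print("selected exogenous features:", selected)
```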
2023
Authors
Ferreira, PJS; Mendes-Moreira, J; Rodrigues, A;
Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2023, PT I
Abstract
Nowadays, all kinds of sensors generate data, and more metrics are being measured. These large quantities of data are stored in large data centers and used to create datasets to train Machine Learning algorithms for many different areas. However, processing that data and training the Machine Learning algorithms require more time, and storing all the data requires more space, creating a Big Data problem. In this paper, we propose simple techniques for reducing large time series datasets into smaller versions without compromising the forecasting capability of the generated model while, simultaneously, reducing the time needed to train the models and the space required to store the reduced sets. We tested the proposed approach on three public datasets and one private dataset containing time series with different characteristics. The results show, for the datasets studied, that it is possible to use reduced sets to train the algorithms without affecting the forecasting capability of their models. This approach is more efficient for datasets with higher frequencies and larger seasonalities. With the reduced sets, we obtained decreases in training time between 40% and 94%, and decreases between 46% and 65% in the memory needed to store the reduced sets.
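The abstract does not detail the exact reduction techniques, so the sketch below only illustrates one plausible reduction of this kind: resampling a high-frequency series to a coarser resolution before training. The sampling frequencies and the mean aggregation are assumptions; the paper's techniques may differ.

```python
# Hedged sketch: shrinking a high-frequency time series by resampling to a
# coarser resolution. Frequencies and aggregation choice are assumptions.
import numpy as np
import pandas as pd

idx = pd.date_range("2023-01-01", periods=7 * 24 * 60, freq="min")  # one week, minutely
series = pd.Series(np.sin(np.arange(idx.size) / 60.0), index=idx, name="sensor")

reduced = series.resample("15min").mean()        # keep only the 15-minute mean

print(f"original points: {series.size}, reduced points: {reduced.size}")
print(f"storage reduction: {1 - reduced.size / series.size:.0%}")
```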