2023
Authors
Rodrigues, EM; Baghoussi, Y; Mendes-Moreira, J
Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2023, PT I
Abstract
Machine learning models are widely used in time series forecasting. One way to reduce their computational cost and increase their efficiency is to feed the model only the relevant exogenous features. With this intention, a study of the feature selection methods Pearson correlation coefficient, Boruta, Boruta-Shap, IMV-LSTM, and LIME is performed. A new method focused on interpretability, SHAP-LSTM, is proposed, which uses the training process of a deep learning model as part of a feature selection algorithm. The methods were compared on two different datasets and showed results comparable to those obtained with all features, at a lower computational cost. On both datasets, SHAP-LSTM showed competitive results, performing comparatively better on the data with a higher presence of rarely occurring categorical features.
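As an illustration of the simplest method named in the abstract above, the following is a minimal sketch of a Pearson-correlation feature filter. It is not the authors' implementation: the function names, the 0.5 threshold, and the toy data are all assumptions made for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A constant series has no linear relationship to report.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(features, target, threshold=0.5):
    """Return the names of features whose |r| with the target meets the threshold.

    The threshold is a hypothetical choice for this sketch, not a value from the paper.
    """
    return [
        name
        for name, values in features.items()
        if abs(pearson(values, target)) >= threshold
    ]


# Toy data (invented for illustration only).
target = [10.0, 12.0, 14.0, 16.0, 18.0]
features = {
    "temperature": [1.0, 2.0, 3.0, 4.0, 5.0],   # moves linearly with the target
    "holiday_flag": [0.0, 0.0, 0.0, 0.0, 0.0],  # constant, so uninformative
    "noise": [3.0, 1.0, 4.0, 1.0, 5.0],         # weakly related
}
selected = select_features(features, target)
print(selected)
```

A filter of this kind only captures linear, pairwise relationships, which is one reason model-based selectors such as those listed in the abstract may behave differently.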
2023
Authors
Ferreira, PJS; Mendes-Moreira, J; Rodrigues, A
Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2023, PT I
Abstract
Nowadays, all kinds of sensors generate data, and more metrics are being measured. These large quantities of data are stored in large data centers and used to create datasets for training Machine Learning algorithms in a wide range of areas. However, processing that data and training the Machine Learning algorithms requires more time, and storing all the data requires more space, creating a Big Data problem. In this paper, we propose simple techniques for reducing large time series datasets into smaller versions without compromising the forecasting capability of the generated model while, simultaneously, reducing the time needed to train the models and the space required to store the reduced sets. We tested the proposed approach on three public datasets and one private dataset containing time series with different characteristics. The results show that, for the datasets studied, it is possible to use reduced sets to train the algorithms without affecting the forecasting capability of their models. This approach is more efficient for datasets with higher frequencies and larger seasonalities. With the reduced sets, we obtain decreases of between 40 and 94% in training time and of between 46 and 65% in the memory needed to store the sets.
2023
Authors
Pedroto, M; Jorge, A; Mendes-Moreira, J; Coelho, T
Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2023, PT II
Abstract
Transthyretin (TTR)-related familial amyloid polyneuropathy (ATTRv) is a life-threatening autosomal dominant disease, and the age of onset represents the moment when the first symptoms are felt. Accurately predicting the age of onset for a given patient is relevant for risk assessment and treatment management. In this work, we evaluate the impact on prediction error of combining prediction models obtained from neighboring time windows. We propose Symmetric (Sym) and Asymmetric (Asym) models, which represent two different averaging approaches. These are combined with a weighting mechanism to create four ensemble models: Symmetric (Sym), Symmetric-weighted (Sym-w), Asymmetric (Asym), and Asymmetric-weighted (Asym-w). These four ensemble models are then compared to the original approach, which is focused on individual regression base learners, namely: Baseline (BL), Decision Tree (DT), Elastic Net (EN), Lasso (LA), Linear Regression (LR), Random Forest (RF), Ridge (RI), Support Vector Regressor (SV) and XGBoost (XG). Our results show that aggregating predictions from neighboring models decreases the average mean absolute error obtained by each base learner. Overall, the best results are achieved by regression-based ensemble tree models as base learners.
2024
Authors
Pedroto, M; Coelho, T; Fernandes, J; Oliveira, A; Jorge, A; Mendes Moreira, J
Publication
AMYLOID-JOURNAL OF PROTEIN FOLDING DISORDERS
Abstract
Background: Hereditary transthyretin amyloidosis (ATTRv amyloidosis) is an inherited disease, where the study of family history holds importance. This study evaluates the changes of age-of-onset (AOO) and other age-related clinical factors within and among families affected by ATTRv amyloidosis. Methods: We analysed information from 934 trees, focusing on family, parents, probands and siblings relationships. We focused on 1494 female and 1712 male symptomatic ATTRV30M patients. Results are presented alongside a comparison of current with historical records. Clinical and genealogical indicators identify major changes. Results: Overall, analysis of familial data shows the existence of families with both early and late patients (1/6). It identifies long familial follow-up times since patient families tend to be diagnosed over several years. Finally, results show a large difference between parent-child and proband-patient relationships (20-30 years). Conclusions: This study reveals that there has been a shift in patient profile, with a recent increase in male elderly cases, especially regarding probands. It shows that symptomatic patients exhibit less variability towards siblings, when compared to other family members, namely the transmitting ancestors' age of onset. This can influence genetic counselling guidelines.
2024
Authors
Tuna, R; Baghoussi, Y; Soares, C; Mendes-Moreira, J
Publication
ADVANCES IN INTELLIGENT DATA ANALYSIS XXII, PT II, IDA 2024
Abstract
Forecasting methods are affected by data quality issues in two ways: (1) such issues are hard to predict, and (2) they may affect the model negatively when it is updated with new data. The latter problem is usually addressed by pre-processing the data to remove those issues. An alternative approach has recently been proposed, Corrector LSTM (cLSTM), which is a Read & Write Machine Learning (RW-ML) algorithm that changes the data while learning in order to improve its predictions. Despite the promising results reported, cLSTM is computationally expensive, as it uses a meta-learner to monitor the hidden states of the LSTM. We propose a new RW-ML algorithm, Kernel Corrector LSTM (KcLSTM), that replaces the meta-learner of cLSTM with a simpler method: Kernel Smoothing. We empirically evaluate the forecasting accuracy and the training time of the new algorithm and compare it with cLSTM and LSTM. Results indicate that it is able to decrease the training time while maintaining a competitive forecasting accuracy.
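For readers unfamiliar with kernel smoothing, the following is a minimal sketch of a generic Nadaraya-Watson smoother with a Gaussian kernel applied to a one-dimensional series. It only illustrates the general technique named in the abstract above; how KcLSTM applies kernel smoothing inside the LSTM is not described here, and the function name, bandwidth, and toy series are assumptions for the example.

```python
import math


def kernel_smooth(values, bandwidth=1.0):
    """Nadaraya-Watson smoother over time indices with a Gaussian kernel.

    Each output point is a weighted average of all inputs, with weights that
    decay with distance in time. The bandwidth is a hypothetical choice.
    """
    n = len(values)
    smoothed = []
    for t in range(n):
        weights = [math.exp(-0.5 * ((t - s) / bandwidth) ** 2) for s in range(n)]
        total = sum(weights)
        smoothed.append(sum(w * v for w, v in zip(weights, values)) / total)
    return smoothed


# Toy series with a single spike (invented for illustration only).
series = [1.0, 1.0, 1.0, 9.0, 1.0, 1.0, 1.0]
smoothed = kernel_smooth(series, bandwidth=1.0)
print([round(v, 3) for v in smoothed])
```

The spike is pulled towards its neighbours while flat regions are left almost unchanged; a larger bandwidth smooths more aggressively.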
2024
Authors
Kumar, R; Mendes-Moreira, J; Chandra, J
Publication
ACM Transactions on Knowledge Discovery from Data
Abstract