Publications

Publications by LIAAD

2021

Embedding Traffic Network Characteristics Using Tensor for Improved Traffic Prediction

Authors
Bhanu, M; Mendes Moreira, J; Chandra, J;

Publication
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS

Abstract
Techniques for using multi-way traffic patterns for traffic prediction are gaining importance. One possible way of representing multi-way traffic patterns is with tensors. Tensor decomposition is used to generate low-rank approximations of the original tensor, which are subsequently used for traffic volume prediction. However, the existing tensor-based approaches do not consider certain important mutual relationships among locations, such as temporal traffic reciprocity, that can improve prediction accuracy. In this paper, we introduce TeDCaN, a "Tensor Decomposition method with Characteristic Network" constraints, which generates low-rank approximations of the original tensor while considering the traffic reciprocity between different pairs of locations. Investigations using large traffic datasets from 2 different cities reveal that the prediction accuracy of TeDCaN considerably outperforms several state-of-the-art baselines, both when complete traffic data is available and when a certain fraction of the data is missing, a likely scenario in many real datasets. We find that TeDCaN achieves around a 20% reduction in RMSE scores compared to the baselines. TeDCaN is applicable to many operations on such a big traffic network where the existing models would be either inapplicable or hard to run. As one of its major outputs, TeDCaN generates a "reduced dimensional network embedding" that captures the similarity of the nodes, considering the traffic volume as well as the reciprocity of traffic between the nodes.

2021

An ensemble of autonomous auto-encoders for human activity recognition

Authors
Garcia, KD; de Sa, CR; Poel, M; Carvalho, T; Mendes Moreira, J; Cardoso, JMP; de Carvalho, ACPLF; Kok, JN;

Publication
NEUROCOMPUTING

Abstract
Human Activity Recognition is focused on the use of sensing technology to classify human activities and to infer human behavior. While traditional machine learning approaches use hand-crafted features to train their models, recent advancements in neural networks allow for automatic feature extraction. Auto-encoders are a type of neural network that can learn complex representations of the data and are commonly used for anomaly detection. In this work we propose a novel multi-class algorithm which consists of an ensemble of auto-encoders where each auto-encoder is associated with a unique class. We compared the proposed approach with other state-of-the-art approaches in the context of human activity recognition. Experimental results show that ensembles of auto-encoders can be efficient, robust and competitive. Moreover, this modular classifier structure allows for more flexible models. For example, the number of classes can be extended by including new auto-encoders, without the need to retrain the whole model. (c) 2021 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
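
The one-auto-encoder-per-class idea in this abstract can be illustrated with a small sketch. It is not the authors' model: as a stand-in for a neural auto-encoder it uses a one-hidden-unit linear auto-encoder with tied weights (equivalent to 1-D PCA, fitted here by power iteration), and the class names, feature values, and the `LinearAutoEncoder` and `AutoEncoderEnsemble` names are hypothetical. A sample is assigned to the class whose auto-encoder reconstructs it with the smallest error.

```python
import math
import random


class LinearAutoEncoder:
    """One-hidden-unit linear auto-encoder with tied weights (a stand-in, equivalent to 1-D PCA)."""

    def fit(self, rows, n_iter=100, seed=0):
        dim = len(rows[0])
        self.mean = [sum(r[j] for r in rows) / len(rows) for j in range(dim)]
        centered = [[r[j] - self.mean[j] for j in range(dim)] for r in rows]
        # Random unit-length starting direction for the power iteration.
        rng = random.Random(seed)
        w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
        # Power iteration on the covariance: converges to the direction that
        # minimises reconstruction error for a single linear hidden unit.
        for _ in range(n_iter):
            new_w = [0.0] * dim
            for c in centered:
                proj = sum(ci * wi for ci, wi in zip(c, w))
                for j in range(dim):
                    new_w[j] += proj * c[j]
            norm = math.sqrt(sum(v * v for v in new_w))
            if norm == 0.0:
                break  # degenerate data (no variance); keep the current direction
            w = [v / norm for v in new_w]
        self.w = w
        return self

    def reconstruction_error(self, x):
        c = [xj - mj for xj, mj in zip(x, self.mean)]
        z = sum(ci * wi for ci, wi in zip(c, self.w))  # encode to one number
        recon = [z * wi for wi in self.w]              # decode back to feature space
        return sum((ci - ri) ** 2 for ci, ri in zip(c, recon))


class AutoEncoderEnsemble:
    """One auto-encoder per class; predict the class with the lowest reconstruction error."""

    def __init__(self):
        self.models = {}

    def add_class(self, label, rows):
        # Only the new class's auto-encoder is trained; existing ones are untouched.
        self.models[label] = LinearAutoEncoder().fit(rows)

    def predict(self, x):
        return min(self.models, key=lambda label: self.models[label].reconstruction_error(x))


# Hypothetical 2-D sensor features per activity.
walking = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9]]
running = [[8.0, 2.0], [9.0, 1.1], [10.0, 0.2], [11.0, -1.0]]
cycling = [[1.0, 9.0], [1.1, 10.0], [0.9, 11.0], [1.0, 12.0]]

ensemble = AutoEncoderEnsemble()
ensemble.add_class("walking", walking)
ensemble.add_class("running", running)
print(ensemble.predict([2.5, 2.5]))

# Extending the classifier with a new class does not retrain the others.
ensemble.add_class("cycling", cycling)
print(ensemble.predict([1.0, 10.5]))
```

The `add_class` method mirrors the flexibility the abstract mentions: a new activity needs only its own auto-encoder.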

2021

An analysis of Monte Carlo simulations for forecasting software projects

Authors
Miranda, P; Faria, JP; Correia, FF; Fares, A; Graça, R; Moreira, JM;

Publication
SAC '21: The 36th ACM/SIGAPP Symposium on Applied Computing, Virtual Event, Republic of Korea, March 22-26, 2021

Abstract
Forecasts of the effort or delivery date can play an important role in managing software projects, but the estimates provided by development teams are often inaccurate and time-consuming to produce. This is not surprising given the uncertainty that underlies this activity. This work studies the use of Monte Carlo simulations for generating forecasts based on project historical data. We have designed and run experiments comparing these forecasts with what happened in practice and with estimates provided by developers, when available. Comparisons were made based on the mean magnitude of relative error (MMRE). We also analyzed how the forecasting accuracy varies with the amount of work to be forecasted and the amount of historical data used. To minimize the requirements on input data, delivery date forecasts for a set of user stories were computed based on the takt time of past stories (time elapsed between the completion of consecutive stories); effort forecasts were computed based on full-time equivalent (FTE) hours allocated to the implementation of past stories. The MMRE of delivery date forecasting was 32% in a set of 10 runs (for different projects) of Monte Carlo simulation based on takt time. The MMRE of effort forecasting was 20% in a set of 5 runs of Monte Carlo simulation based on FTE allocation, much smaller than the MMRE of 134% of developers' estimates. Better forecasting accuracy was obtained when the number of historical data points was 20 or higher. These results suggest that Monte Carlo simulations may be used in practice for delivery date and effort forecasting in agile projects, after a few initial sprints. © 2021 ACM.
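
A minimal sketch of takt-time-based Monte Carlo forecasting and of the MMRE metric follows. The abstract does not give the authors' exact procedure, so the resampling scheme (drawing past takt times with replacement), the function names, and the sample data are all assumptions made for illustration.

```python
import random


def forecast_delivery_days(past_takt_days, n_stories, n_runs=10_000, seed=0):
    """Simulate the total days needed for n_stories by resampling past takt times."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_runs):
        # One simulated future: draw one past takt time (with replacement) per story.
        totals.append(sum(rng.choice(past_takt_days) for _ in range(n_stories)))
    totals.sort()
    return totals


def percentile(sorted_values, q):
    """Approximate percentile of an already sorted list, q in [0, 100]."""
    idx = round(q / 100 * (len(sorted_values) - 1))
    return sorted_values[min(len(sorted_values) - 1, max(0, idx))]


def mmre(actuals, forecasts):
    """Mean magnitude of relative error: mean of |actual - forecast| / actual."""
    return sum(abs(a - f) / a for a, f in zip(actuals, forecasts)) / len(actuals)


# Hypothetical takt times: days elapsed between consecutive story completions.
past_takt_days = [1.0, 2.5, 0.5, 3.0, 1.5, 2.0, 1.0, 4.0, 0.5, 2.0]

totals = forecast_delivery_days(past_takt_days, n_stories=8)
p50 = percentile(totals, 50)
p85 = percentile(totals, 85)
print(f"median forecast: {p50:.1f} days, 85th percentile: {p85:.1f} days")
```

Reporting a percentile rather than a single number lets a team state a forecast together with a confidence level, for example "8 stories within `p85` days in 85% of simulated futures".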

2021

Benchmark of Encoders of Nominal Features for Regression

Authors
Seca, D; Moreira, JM;

Publication
Trends and Applications in Information Systems and Technologies - Volume 1, WorldCIST 2021, Terceira Island, Azores, Portugal, 30 March - 2 April, 2021.

Abstract
Mixed-type data is common in the real world. However, supervised learning algorithms such as support vector machines or neural networks can only process numerical features. One may choose to drop qualitative features, at the expense of possible loss of information. A better alternative is to encode them as new numerical features. Under the constraints of time, budget, and computational resources, we were motivated to search for a general-purpose encoder but found the existing benchmarks to be limited. We review these limitations and present an alternative. Our benchmark tests 16 encoding methods, on 15 regression datasets, using 7 distinct predictive models. The top general-purpose encoders were found to be Catboost, LeaveOneOut, and Target. © 2021, The Author(s), under exclusive license to Springer Nature Switzerland AG.
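
To make the idea of encoding a nominal feature for regression concrete, here is a plain-Python sketch of two of the encoder families named above: mean target encoding and its leave-one-out variant. The abstract does not specify the implementations that were benchmarked, so the function names, the global-mean fallback for categories seen only once, and the example data are assumptions.

```python
from collections import defaultdict


def _category_stats(categories, targets):
    """Per-category sum and count of the target."""
    sums, counts = defaultdict(float), defaultdict(int)
    for c, y in zip(categories, targets):
        sums[c] += y
        counts[c] += 1
    return sums, counts


def target_encode(categories, targets):
    """Replace each category with the mean target over all rows of that category."""
    sums, counts = _category_stats(categories, targets)
    return [sums[c] / counts[c] for c in categories]


def leave_one_out_encode(categories, targets):
    """Like target encoding, but each row's own target is left out of its mean."""
    sums, counts = _category_stats(categories, targets)
    global_mean = sum(targets) / len(targets)
    encoded = []
    for c, y in zip(categories, targets):
        if counts[c] > 1:
            encoded.append((sums[c] - y) / (counts[c] - 1))
        else:
            # A category seen only once has no other rows; fall back to the global mean.
            encoded.append(global_mean)
    return encoded


# Hypothetical nominal feature and regression target.
cities = ["porto", "lisboa", "porto", "braga", "lisboa", "porto"]
prices = [10.0, 20.0, 14.0, 30.0, 24.0, 12.0]

print(target_encode(cities, prices))
print(leave_one_out_encode(cities, prices))
```

Leaving out the row's own target reduces the leakage of the label into the feature, which is the usual motivation for the leave-one-out variant.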

2021

An Analysis of the State of the Art of Machine Learning for Risk Assessment in Software Projects (S)

Authors
Sousa, A; Faria, JP; Moreira, JM;

Publication
The 33rd International Conference on Software Engineering and Knowledge Engineering, SEKE 2021, KSIR Virtual Conference Center, USA, July 1 - July 10, 2021.

Abstract
Risk management is one of the ten knowledge areas discussed in the Project Management Body of Knowledge (PMBOK), which serves as a guide that should be followed to increase the chances of project success. The popularity of research regarding the application of risk management in software projects has been consistently growing in recent years, particularly with the application of machine learning techniques to help identify risk levels or risk factors of a project before the project development begins, with the intent of improving the likelihood of success of software projects. This paper provides an overview of various concepts related to risk and risk management in software projects, including traditional techniques used to identify and control risks in software projects, as well as machine learning techniques and methods which have been applied to provide better estimates and classification of the risk levels and risk factors that can be encountered during the development of a software project. The paper also presents an analysis of machine learning oriented risk management studies and experiments found in the literature as a way of identifying the type of inputs and outputs, as well as frequent algorithms used in this research area.

2021

Transportation Mode Detection from GPS data: A Data Science Benchmark study

Authors
Muhammad, AR; Aguiar, A; Mendes Moreira, J;

Publication
2021 IEEE INTELLIGENT TRANSPORTATION SYSTEMS CONFERENCE (ITSC)

Abstract
Understanding the distribution of people's transportation modes is a crucial facet of today's urban mobility for proper transportation planning. The penetration of smartphones combined with their sensing capability is an enabler for crowdsourcing large mobility data such as commuters' GPS records. In this paper, we leverage the GPS traces of commuters to infer five different transportation modes frequently used in urban areas: foot, bike, bus, car and metro. We compare three different approaches commonly reported in the literature for transportation mode detection, from the families of machine learning algorithms (random forest, RF) and deep learning architectures (convolutional neural network, CNN, and ensemble of autoencoders, EAE). By splitting the dataset into train and test sets by the period of data collection, as well as by the conventional 80-20 split, we evaluate the impact of several data pre-processing decisions on the classifiers' overall performance. Our results show RF and CNN performing better on classification metrics such as the F1 score and the area under the Receiver Operating Characteristic (ROC) curve.
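
The two train-test splitting strategies mentioned in this abstract can be contrasted with a short sketch. The record layout, the cutoff date, and the function names are hypothetical; the paper's actual dataset and split boundaries are not given here.

```python
import random
from datetime import date


def random_split(records, test_fraction=0.2, seed=0):
    """Conventional split: shuffle, then hold out a fixed fraction for testing."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def split_by_period(records, cutoff):
    """Temporal split: train on trips collected before the cutoff, test on the rest."""
    train = [r for r in records if r["day"] < cutoff]
    test = [r for r in records if r["day"] >= cutoff]
    return train, test


# Hypothetical labelled trips: collection day and transportation mode.
trips = [(1, "foot"), (1, "bus"), (2, "bike"), (2, "car"), (3, "metro"),
         (3, "foot"), (4, "bus"), (4, "car"), (5, "bike"), (5, "metro")]
records = [{"day": date(2021, month, 10), "mode": mode} for month, mode in trips]

rand_train, rand_test = random_split(records)
time_train, time_test = split_by_period(records, cutoff=date(2021, 5, 1))
print(len(rand_train), len(rand_test), len(time_train), len(time_test))
```

With the temporal split, every test trip was collected after every training trip, so the evaluation reflects how a model would behave on data from a later collection period.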
