Publications - INESC TEC

Publications

Publications by João Mendes Moreira

2021

Benchmark of Encoders of Nominal Features for Regression

Authors
Seca, D; Moreira, JM;

Publication
Trends and Applications in Information Systems and Technologies - Volume 1, WorldCIST 2021, Terceira Island, Azores, Portugal, 30 March - 2 April, 2021.

Abstract
Mixed-type data is common in the real world. However, supervised learning algorithms such as support vector machines or neural networks can only process numerical features. One may choose to drop qualitative features, at the expense of possible loss of information. A better alternative is to encode them as new numerical features. Under the constraints of time, budget, and computational resources, we were motivated to search for a general-purpose encoder but found the existing benchmarks to be limited. We review these limitations and present an alternative. Our benchmark tests 16 encoding methods, on 15 regression datasets, using 7 distinct predictive models. The top general-purpose encoders were found to be Catboost, LeaveOneOut, and Target. © 2021, The Author(s), under exclusive license to Springer Nature Switzerland AG.
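
As a rough illustration of what encoding a nominal feature for regression involves, the sketch below implements simplified versions of two of the encoder families named in the abstract (Target and LeaveOneOut) in plain Python. It is not the benchmark's implementation: the function names, the toy data, and the fallback used for categories that occur only once are assumptions made for this example, and library implementations typically add smoothing or noise.

```python
from collections import defaultdict


def _category_stats(categories, targets):
    """Return per-category sums and counts of the target."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, y in zip(categories, targets):
        sums[category] += y
        counts[category] += 1
    return sums, counts


def target_encode(categories, targets):
    """Replace each category with the mean target observed for that category."""
    sums, counts = _category_stats(categories, targets)
    return [sums[c] / counts[c] for c in categories]


def leave_one_out_encode(categories, targets):
    """Like target encoding, but each row's own target is left out of its mean."""
    sums, counts = _category_stats(categories, targets)
    global_mean = sum(targets) / len(targets)
    encoded = []
    for category, y in zip(categories, targets):
        if counts[category] > 1:
            encoded.append((sums[category] - y) / (counts[category] - 1))
        else:
            # Assumption: fall back to the global mean for one-off categories.
            encoded.append(global_mean)
    return encoded


# Hypothetical nominal feature and regression target.
colors = ["red", "blue", "red", "green", "blue", "red"]
prices = [10.0, 20.0, 14.0, 30.0, 24.0, 12.0]

target_encoded = target_encode(colors, prices)
loo_encoded = leave_one_out_encode(colors, prices)
```

Leaving the row's own target out of its mean is one way to limit target leakage, which is why the two encoders can give different values for the same row.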


2021

Predicting Predawn Leaf Water Potential up to Seven Days Using Machine Learning

Authors
Fares, AA; Vasconcelos, F; Mendes-Moreira, J; Ferreira, C;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE (EPIA 2021)

Abstract
Sustainable agricultural production requires a controlled usage of water, nutrients, and minerals from the environment. Different strategies of plant irrigation are being studied to control the quantity and quality balance of the fruits. Regarding efficient irrigation, particularly in deficit irrigation strategies, it is essential to act according to the water stress status of the plant. For example, in the vine, to improve the quality of the grapes, the plants are deprived of water until they reach a particular level of water stress before being re-watered at specified phenological stages. The water status inside the plant is estimated by measuring either the leaf water potential at predawn or the soil water potential along the root zones. Measuring soil water potential has the advantage of being independent of diurnal atmospheric variations. However, this method has many logistical problems, making it very hard to apply across a whole vineyard, especially a large one. In this study, the Predawn Leaf Water Potential (PLWP) is predicted daily by Machine Learning models using data such as grape variety, soil characteristics, irrigation schedules, and meteorological data. The benefits of these techniques are the reduction of the manual work of measuring PLWP and the capacity to implement those models on a larger scale by predicting PLWP up to 7 days ahead, which should enhance the ability to optimize the irrigation plan while keeping the quantity and quality of the crop under control.


2021

An Analysis of the State of the Art of Machine Learning for Risk Assessment in Software Projects (S)

Authors
Sousa, A; Faria, JP; Moreira, JM;

Publication
The 33rd International Conference on Software Engineering and Knowledge Engineering, SEKE 2021, KSIR Virtual Conference Center, USA, July 1 - July 10, 2021.

Abstract
Risk management is one of the ten knowledge areas discussed in the Project Management Body of Knowledge (PMBOK), which serves as a guide that should be followed to increase the chances of project success. Research on the application of risk management in software projects has grown consistently in recent years, particularly with the application of machine learning techniques to help identify risk levels or risk factors of a project before development begins, with the intent of improving the likelihood of success of software projects. This paper provides an overview of various concepts related to risk and risk management in software projects, including traditional techniques used to identify and control risks, as well as machine learning techniques and methods that have been applied to provide better estimates and classification of the risk levels and risk factors that can be encountered during the development of a software project. The paper also presents an analysis of machine-learning-oriented risk management studies and experiments found in the literature, as a way of identifying the types of inputs and outputs, as well as the algorithms frequently used in this research area.


2021

Transportation Mode Detection from GPS data: A Data Science Benchmark study

Authors
Muhammad, AR; Aguiar, A; Mendes Moreira, J;

Publication
2021 IEEE INTELLIGENT TRANSPORTATION SYSTEMS CONFERENCE (ITSC)

Abstract
Understanding the distribution of people's transportation mode is a crucial facet of today's urban mobility for proper transportation planning. The penetration of smartphones combined with their sensing capability is an enabler for crowdsourcing large mobility data such as commuters' GPS records. In this paper, we leverage the GPS traces of commuters to infer five different transportation modes frequently used in urban areas: foot, bike, bus, car and metro. We compare three different approaches commonly reported in the literature for transportation mode detection, drawn from machine learning algorithms (random forest, RF) and deep learning architectures (convolutional neural network, CNN, and ensemble of autoencoders, EAE). By splitting the dataset into train and test sets by the period of data collection, as well as by the conventional 80-20 split, we evaluate the impact of several data pre-processing decisions on overall classifier performance. Our results show RF and CNN performing better on classification metrics such as the F1 score and the area under the Receiver Operating Characteristic (ROC) curve.
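
The two evaluation protocols mentioned in the abstract (a split by period of data collection and a conventional 80-20 split) can be sketched as follows. This is only an illustration under stated assumptions: the record layout, the `collected` field, the cutoff date, and both function names are hypothetical and are not taken from the paper.

```python
import random
from datetime import date


def split_by_period(records, cutoff):
    """Train on records collected before the cutoff date, test on the rest."""
    train = [r for r in records if r["collected"] < cutoff]
    test = [r for r in records if r["collected"] >= cutoff]
    return train, test


def split_random(records, train_fraction=0.8, seed=0):
    """Conventional shuffled split that ignores when each record was collected."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical trips: one per day over ten days, with a made-up mode label.
modes = ["foot", "bike", "bus", "car", "metro"]
records = [
    {"collected": date(2021, 1, day), "mode": modes[day % len(modes)]}
    for day in range(1, 11)
]

period_train, period_test = split_by_period(records, cutoff=date(2021, 1, 9))
random_train, random_test = split_random(records)
```

With the period-based split, every test record is collected after every training record, so the evaluation reflects performance on data from a later collection period; the shuffled split mixes periods across both sets.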


2021

A Data-Driven Simulator for Assessing Decision-Making in Soccer

Authors
Mendes-Neves, T; Mendes-Moreira, J; Rossetti, RJF;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE (EPIA 2021)

Abstract
Decision-making is one of the crucial factors in soccer (association football). The current focus is on analyzing data sets rather than posing what-if questions about the game. We propose simulation-based methods that allow us to answer these questions. To avoid simulating complex human physics and ball interactions, we use data to build machine learning models that form the basis of an event-based soccer simulator. This simulator is compatible with the OpenAI Gym API. We introduce tools that allow us to explore and gather insights about soccer, such as (1) calculating the risk/reward ratios for sequences of actions, (2) manually defining playing criteria, and (3) discovering strategies through Reinforcement Learning.
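
To give a feel for what an event-based simulator with a Gym-style interface might look like, here is a toy sketch. It does not import Gym and is not the authors' simulator: the class name, the three pitch zones, the two actions, and the success probabilities are all invented for illustration, with a fixed lookup table standing in for the learned models. It follows the classic Gym convention in which `reset()` returns an observation and `step(action)` returns `(observation, reward, done, info)`; newer Gym and Gymnasium releases use a slightly different return signature.

```python
import random


class ToySoccerEnv:
    """Toy event-based environment with a Gym-style reset/step interface."""

    ACTIONS = ("pass", "shoot")
    # Hypothetical success probabilities per (zone, action). In a data-driven
    # simulator these would come from learned models, not a fixed table.
    SUCCESS = {
        (0, "pass"): 0.9, (0, "shoot"): 0.01,
        (1, "pass"): 0.8, (1, "shoot"): 0.05,
        (2, "pass"): 0.6, (2, "shoot"): 0.30,
    }

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.zone = 0

    def reset(self):
        """Start a new possession in the defensive zone (zone 0)."""
        self.zone = 0
        return self.zone

    def step(self, action_index):
        """Apply one event and return (observation, reward, done, info)."""
        action = self.ACTIONS[action_index]
        success = self.rng.random() < self.SUCCESS[(self.zone, action)]
        if action == "shoot":
            # A shot always ends the possession; reward 1.0 only on a goal.
            return self.zone, (1.0 if success else 0.0), True, {"event": "shot"}
        if not success:
            return self.zone, 0.0, True, {"event": "turnover"}
        self.zone = min(self.zone + 1, 2)
        return self.zone, 0.0, False, {"event": "pass"}


def estimate_goal_rate(actions, episodes=2000, seed=0):
    """Estimate goals per possession for a fixed sequence of action indices."""
    env = ToySoccerEnv(seed=seed)
    goals = 0.0
    for _ in range(episodes):
        env.reset()
        for action_index in actions:
            _, reward, done, _ = env.step(action_index)
            goals += reward
            if done:
                break
    return goals / episodes


early_shot = estimate_goal_rate([1])           # shoot immediately from zone 0
patient_play = estimate_goal_rate([0, 0, 1])   # two passes, then shoot
```

Comparing `early_shot` with `patient_play` is a small-scale version of weighing risk against reward for different action sequences; the same `reset`/`step` loop is what a reinforcement learning agent would interact with.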


2021

Applying Machine Learning to Risk Assessment in Software Projects

Authors
Sousa, A; Faria, JP; Mendes-Moreira, J; Gomes, D; Henriques, PC; Graca, R;

Publication
MACHINE LEARNING AND PRINCIPLES AND PRACTICE OF KNOWLEDGE DISCOVERY IN DATABASES, PT II

Abstract
Risk management is one of the ten knowledge areas discussed in the Project Management Body of Knowledge (PMBOK), which serves as a guide that should be followed to increase the chances of project success. Research on the application of risk management in software projects has grown consistently in recent years, especially with the application of machine learning techniques to help identify risk levels or risk factors of a project before its development begins, with the goal of improving the likelihood of success of these projects. This paper presents the results of applying machine learning techniques to risk assessment in software projects. A Python application was developed and, using Scikit-learn, two machine learning models were created to predict risk impact and likelihood levels on a scale of 1 to 3; both were trained on software project risk data shared by a partner company of this project. Different algorithms were tested to compare the results obtained by high-performance but non-interpretable algorithms (e.g., Support Vector Machine) with those obtained by interpretable algorithms (e.g., Random Forest), whose performance tends to be lower than that of their non-interpretable counterparts. The results showed that Support Vector Machine and Naive Bayes were the best-performing algorithms. Support Vector Machine had an accuracy of 69% in predicting impact levels, and Naive Bayes had an accuracy of 63% in predicting likelihood levels, but the results on other evaluation metrics (e.g., AUC, Precision) show the potential of the approach presented in this use case.
