Publications - INESC TEC

Publications

2024

Document Level Event Extraction from Narratives

Authors
Cunha, LF;

Publication
ADVANCES IN INFORMATION RETRIEVAL, ECIR 2024, PT V

Abstract
One of the fundamental tasks in Information Extraction (IE) is Event Extraction (EE), an extensively studied and challenging task [13,15], which aims to identify and classify events from the text. This involves identifying the event's central word (trigger) and its participants (arguments) [1]. These elements capture the event semantics and structure, which have applications in various fields, including biomedical texts [42], cybersecurity [24], economics [12], literature [32], and history [33]. Structured knowledge derived from EE can also benefit other downstream tasks such as Question Answering [20,30], Natural Language Understanding [21], Knowledge Base Graphs [3,37], summarization [8,10,41] and recommendation systems [9,18]. Despite the existence of several English EE systems [2,22,25,26], they face limited portability to other languages [4] and most of them are designed for closed domains, posing difficulties in generalising. Furthermore, most current EE systems restrict their scope to the sentence level, assuming that all arguments are contained within the same sentence as their corresponding trigger. However, real-world scenarios often involve event arguments spanning multiple sentences, highlighting the need for document-level EE.


2024

New skills in symbolic data analysis for official statistics

Authors
Verde R.; Batagelj V.; Brito P.; Silva A.P.D.; Korenjak-Cerne S.; Dobša J.; Diday E.;

Publication
Statistical Journal of the IAOS

Abstract
The paper draws attention to the use of Symbolic Data Analysis (SDA) in the field of Official Statistics. It is composed of three sections presenting three pilot techniques in the field of SDA. The three contributions are a technique based on the notion of exactly unified summaries for the creation of symbolic objects, a model-based approach for interval data as an innovative parametric strategy in this context, and measures of similarity defined between a class and a collection of classes based on the frequency of the categories which characterize them. The paper shows the effectiveness of the proposed approaches as prototypes of numerous techniques developed within the SDA framework and opens the way to possible further developments.


2024

A case study on phishing detection with a machine learning net

Authors
Bezerra, A; Pereira, I; Rebelo, MA; Coelho, D; de Oliveira, DA; Costa, JFP; Cruz, RPM;

Publication
INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS

Abstract
Phishing attacks aim to steal sensitive information and, unfortunately, are becoming a common practice on the web. Email phishing is one of the most common types of attacks on the web and can have a big impact on individuals and enterprises. There is still a gap in prevention when it comes to detecting phishing emails, as new attacks are usually not detected. The goal of this work was to develop a model capable of identifying phishing emails based on machine learning approaches. The work was performed in collaboration with E-goi, a multi-channel marketing automation company. The data consisted of emails collected from the E-goi servers in the electronic mail format. The task was a classification problem with unbalanced classes, with the minority class corresponding to the phishing emails and accounting for less than 1% of the total emails. Several models were evaluated after careful data selection and feature extraction based on the email content and the literature regarding these types of problems. Due to the imbalance present in the data, several sampling methods based on under-sampling techniques were tested to see their impact on the model's ability to detect phishing emails. The final model consisted of a neural network able to detect more than 80% of phishing emails without compromising the remaining emails sent by E-goi clients.
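As an illustration of the under-sampling idea mentioned in this abstract, the following is a minimal sketch of random under-sampling, which is only one possible technique; the abstract does not say which variants were tested. The function name `random_undersample`, the `ratio` parameter, and the toy data are all hypothetical.

```python
import random

def random_undersample(samples, labels, minority_label, ratio=1.0, seed=0):
    """Keep every minority sample and a random subset of the majority class.

    ratio is the desired majority-to-minority count after sampling
    (an assumption made for this sketch, not a value from the paper).
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    # Never ask for more majority samples than exist.
    n_keep = min(len(majority), int(round(ratio * len(minority))))
    kept = sorted(minority + rng.sample(majority, n_keep))
    return [samples[i] for i in kept], [labels[i] for i in kept]

# Toy data: 1000 emails, 8 of them phishing (under 1% of the total).
labels = [1] * 8 + [0] * 992
samples = [f"email_{i}" for i in range(1000)]
X_bal, y_bal = random_undersample(samples, labels, minority_label=1, ratio=3.0)
```

In this toy setup the minority class is kept intact while the majority class is cut to three times its size. Under-sampling of this kind would normally be applied to the training split only, so that evaluation still reflects the real class balance.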


2024

A Language for Explaining Counterexamples

Authors
Ferreira Moreira, EJV; Campos, JC;

Publication
13th Symposium on Languages, Applications and Technologies, SLATE 2024, July 4-5, 2024, Águeda, Portugal

Abstract
Model checkers can automatically verify a system’s behavior against temporal logic properties. However, analyzing the counterexamples produced in case of failure is still a manual process that requires both technical and domain knowledge, even though this step is crucial to understanding the flaws of the system being verified. This paper presents a language created to support the generation of natural language explanations of counterexamples produced by a model checker. The language supports querying the properties and counterexamples to generate the explanations. The paper explains the language components and how they can be used to produce explanations. © Ezequiel José Veloso Ferreira Moreira and José Creissac Campos.


2024

Explainable Multimodal Deep Learning for Heart Sounds and Electrocardiogram Classification

Authors
Oliveira, B; Lobo, A; Botelho Costa, CIA; Carvalho, RF; Coimbra, MT; Renna, F;

Publication
46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2024, Orlando, FL, USA, July 15-19, 2024

Abstract
We introduce a Gradient-weighted Class Activation Mapping (Grad-CAM) methodology to assess the performance of five distinct models for binary classification (normal/abnormal) of synchronized heart sounds and electrocardiograms. The applied models comprise a one-dimensional convolutional neural network (1D-CNN) using solely ECG signals, a two-dimensional convolutional neural network (2D-CNN) applied separately to PCG and ECG signals, and two multimodal models that employ both signals. In the multimodal models, we implement two fusion approaches: an early fusion and a late fusion. The results indicate a performance improvement in using an early fusion model for the joint classification of both signals, as opposed to using a PCG 2D-CNN or ECG 1D-CNN alone (e.g., ROC-AUC score of 0.81 vs. 0.79 and 0.79, respectively). Although the ECG 2D-CNN demonstrates a higher ROC-AUC score (0.82) compared to the early fusion model, it exhibits a lower F1-score (0.85 vs. 0.86). Grad-CAM unveils that the models tend to yield higher gradients in the QRS complex and T/P-wave of the ECG signal, as well as between the two PCG fundamental sounds (S1 and S2), for discerning normalcy or abnormality, thus showcasing that the models focus on clinically relevant features of the recorded data.


2024

Personalized choice model for forecasting demand under pricing scenarios with observational data: The case of attended home delivery

Authors
Ali, ÖG; Amorim, P;

Publication
INTERNATIONAL JOURNAL OF FORECASTING

Abstract
Discrete choice models can forecast market shares and individual choice probabilities with different price and alternative set scenarios. This work introduces a method to personalize choice models involving causal variables, such as price, using rich observational data. The model provides interpretable customer- and context-specific preferences, and price sensitivity, with an estimation procedure that uses orthogonalization. We caution against the naive use of regularization to deal with the high-dimensional observational data challenge. We experiment with the attended home delivery (AHD) slot choice problem using data from a European online retailer. Our results indicate that while the popular non-personalized multinomial logit (MNL) model does very well at the aggregate (day-slot) level, personalization provides significantly and substantially more accurate predictions at the individual-context level. But the naive personalization approach using regularization without orthogonalization wrongly predicts that the choice probability will increase if the slot price increases, rendering it unfit for forecasting demand with pricing scenarios. The proposed method avoids this problem. Further, we introduce features based on potential consideration sets in the AHD slot choice context that increase accuracy and allow for more realistic substitution patterns than the proportional substitution implied by MNL.
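The "proportional substitution implied by MNL" can be seen in a small numeric sketch. This shows the standard multinomial logit formula only, not the personalized model from the paper; the slot names and utility values are hypothetical.

```python
import math

def mnl_probabilities(utilities):
    """Multinomial logit choice probabilities: a softmax over utilities."""
    m = max(utilities.values())  # subtract the max for numerical stability
    exp_u = {k: math.exp(v - m) for k, v in utilities.items()}
    total = sum(exp_u.values())
    return {k: e / total for k, e in exp_u.items()}

# Hypothetical utilities for three delivery slots (illustrative values only).
utilities = {"morning": 1.0, "afternoon": 0.5, "evening": 0.0}
before = mnl_probabilities(utilities)

# Remove the evening slot from the choice set and recompute.
remaining = {k: v for k, v in utilities.items() if k != "evening"}
after = mnl_probabilities(remaining)

# Under MNL the ratio between the two remaining slots does not change.
ratio_before = before["morning"] / before["afternoon"]
ratio_after = after["morning"] / after["afternoon"]
```

When a slot is removed, its share is redistributed to the remaining slots in proportion to their existing shares, so `ratio_before` and `ratio_after` coincide. This holds regardless of how similar the removed slot is to any of the others, which is the limitation that consideration-set features aim to relax.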

