Publications

Publications by Álvaro Figueira

2024

Clustering source code from automated assessment of programming assignments

Authors
Paiva, JC; Leal, JP; Figueira, A;

Publication
INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS

Abstract
Clustering of source code is a technique that can help improve feedback in automated program assessment. Grouping code submissions that contain similar mistakes can, for instance, facilitate the identification of students' difficulties to provide targeted feedback. Moreover, solutions with similar functionality but possibly different coding styles or progress levels can allow personalized feedback to students stuck at some point based on a more developed source code or even detect potential cases of plagiarism. However, existing clustering approaches for source code are mostly inadequate for automated feedback generation or assessment systems in programming education. They either give too much emphasis to syntactical program features, rely on expensive computations over pairs of programs, or require previously collected data. This paper introduces an online approach and implemented tool, AsanasCluster, to cluster source code submissions to programming assignments. The proposed approach relies on program attributes extracted from semantic graph representations of source code, including control and data flow features. The obtained feature vector values are fed into an incremental k-means model. Such a model aims to determine the closest cluster of solutions, as they enter the system, timely, considering clustering is an intermediate step for feedback generation in automated assessment. We have conducted a twofold evaluation of the tool to assess (1) its runtime performance and (2) its precision in separating different algorithmic strategies. To this end, we have applied our clustering approach on a public dataset of real submissions from undergraduate students to programming assignments, measuring the runtimes for the distinct tasks involved: building a model, identifying the closest cluster to a new observation, and recalculating partitions. As for the precision, we partition two groups of programs collected from GitHub. One group contains implementations of two searching algorithms, while the other has implementations of several sorting algorithms. AsanasCluster matches and, in some cases, improves the state-of-the-art clustering tools in terms of runtime performance and precision in identifying different algorithmic strategies. It does so without requiring the execution of the code. Moreover, it is able to start the clustering process from a dataset with only two submissions and continuously partition the observations as they enter the system.
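The incremental k-means step mentioned in the abstract can be illustrated with a minimal sketch. This is not the AsanasCluster implementation: the class name `IncrementalKMeans`, the running-mean centroid update, and the three-element feature vectors (imagined here as counts of loops, branches, and assignments) are all assumptions made for illustration only.

```python
from math import dist


class IncrementalKMeans:
    """Hypothetical online k-means: each observation updates one centroid."""

    def __init__(self, k):
        self.k = k
        self.centroids = []  # one feature vector per cluster
        self.counts = []     # observations absorbed by each cluster

    def assign(self, x):
        """Return the index of the centroid closest to x (Euclidean)."""
        return min(range(len(self.centroids)),
                   key=lambda i: dist(self.centroids[i], x))

    def learn(self, x):
        """Assign x to a cluster and move that centroid toward x."""
        x = [float(v) for v in x]
        # The first k observations seed the centroids.
        if len(self.centroids) < self.k:
            self.centroids.append(x)
            self.counts.append(1)
            return len(self.centroids) - 1
        i = self.assign(x)
        self.counts[i] += 1
        eta = 1.0 / self.counts[i]  # running-mean step size
        self.centroids[i] = [c + eta * (v - c)
                             for c, v in zip(self.centroids[i], x)]
        return i


# Assumed feature vectors: [loops, branches, assignments] per submission.
model = IncrementalKMeans(k=2)
submissions = [[1, 0, 2], [5, 4, 9], [1, 1, 2], [5, 5, 8]]
labels = [model.learn(v) for v in submissions]
```

In this sketch the model starts partitioning as soon as two submissions have arrived, and each later submission costs only one distance computation per cluster, which is the property that makes an online scheme attractive as an intermediate step before feedback generation.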

2021

Analysing students' interaction sequences on Moodle to predict academic performance

Authors
Cunha, A; Figueira, Á;

Publication
CEUR Workshop Proceedings

Abstract
As e-Learning systems have become gradually prevalent, forcing a (sometimes needed) physical distance between lecturers and their students, new methods need to emerge to fill this enlarging gap. Educators need, more than ever, systems capable of warning them (and the students) of situations that might create future problems for the learning process. The capacity to give and get feedback is naturally the best way to overcome this problem. However, in e-learning contexts, with dozens or hundreds of students, the solution becomes less simple. In this work we propose a system capable of continuously giving feedback on the performance of the students based on the interaction sequences they undertake with the LMS. This work innovates in what concerns the sequences of activity accesses together with the computation of the duration of these online learning activities, which are then encoded and fed into machine learning algorithms. We used a longitudinal experiment from five academic years. From our set of classifiers, the Random Forest obtained the best results for preventing low grades, with an accuracy of nearly 87%.
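One way to picture the encoding of interaction sequences and activity durations is the sketch below. It is an assumption-laden illustration, not the authors' pipeline: the log format (timestamp in seconds plus activity name), the `max_gap` cap on idle time, and the function names `encode_session` and `to_vector` are all hypothetical.

```python
from collections import Counter


def encode_session(events, max_gap=1800):
    """Summarise a time-sorted list of (timestamp_seconds, activity) pairs.

    The duration of an activity is approximated as the time until the next
    access, capped at max_gap seconds (an assumed idle-time threshold).
    """
    time_per_activity = Counter()
    transitions = Counter()
    for (t0, a0), (t1, a1) in zip(events, events[1:]):
        time_per_activity[a0] += min(t1 - t0, max_gap)
        transitions[(a0, a1)] += 1
    return time_per_activity, transitions


def to_vector(time_per_activity, transitions, activities):
    """Flatten the summaries into a fixed-order numeric feature vector."""
    durations = [time_per_activity[a] for a in activities]
    pairs = [transitions[(a, b)] for a in activities for b in activities]
    return durations + pairs


# Hypothetical log for one student.
log = [(0, "quiz"), (60, "forum"), (4000, "quiz")]
times, trans = encode_session(log)
features = to_vector(times, trans, ["forum", "quiz"])
```

A vector like `features`, computed per student, is the kind of input that could then be passed to a classifier such as a Random Forest.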
