Publications

Publications by CRACS

2013

BigYAP: Exo-compilation meets UDI

Authors
Costa, VS; Vaz, D;

Publication
THEORY AND PRACTICE OF LOGIC PROGRAMMING

Abstract
The widespread availability of large data-sets poses both an opportunity and a challenge to logic programming. A first approach is to couple a relational database with logic programming, say, a Prolog system with MySQL. While this approach does pay off in cases where the data cannot reside in main memory, it is known to introduce substantial overheads. Ideally, we would like the Prolog system to deal with large data-sets in an efficient way both in terms of memory and of processing time. Just In Time Indexing (JITI) was mainly motivated by this challenge, and can work quite well in many applications. Exo-compilation, designed to deal with large tables, is a next step that achieves very interesting results, reducing the memory footprint by over two thirds. We show that combining exo-compilation with Just In Time Indexing can have significant advantages both in terms of memory usage and in terms of execution time. An alternative path that is relevant for many applications is User-Defined Indexing (UDI). This allows the use of specialized indexing for specific applications, say the spatial indexing crucial to any spatial system. The UDI sees indexing as pluggable modules, and can naturally be combined with exo-compilation. We do so by using UDI with exo-data, and incorporating ideas from the UDI into high-performance indexers for specific tasks.

2013

Score As You Lift (SAYL): A statistical relational learning approach to uplift modeling

Authors
Nassif, H; Kuusisto, F; Burnside, ES; Page, D; Shavlik, J; Santos Costa, V;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
We introduce Score As You Lift (SAYL), a novel Statistical Relational Learning (SRL) algorithm, and apply it to an important task in the diagnosis of breast cancer. SAYL combines SRL with the marketing concept of uplift modeling, uses the area under the uplift curve to direct clause construction and final theory evaluation, integrates rule learning and probability assignment, and conditions the addition of each new theory rule on existing ones. Breast cancer, the most common type of cancer among women, is categorized into two subtypes: an earlier in situ stage where cancer cells are still confined, and a subsequent invasive stage. Currently, older women with in situ cancer are treated to prevent cancer progression, regardless of the fact that treatment may generate undesirable side-effects, and the woman may die of other causes. Younger women tend to have more aggressive cancers, while older women tend to have more indolent tumors. Therefore, older women whose in situ tumors show significant dissimilarity with in situ cancer in younger women are less likely to progress, and can thus be considered for watchful waiting. Motivated by this important problem, this work makes two main contributions. First, we present the first multi-relational uplift modeling system, and introduce, implement and evaluate a novel method to guide search in an SRL framework. Second, we compare our algorithm to previous approaches, and demonstrate that the system can indeed obtain differential rules of interest to an expert on real data, while significantly improving the data uplift. © 2013 Springer-Verlag.

2013

A preliminary investigation into predictive models for adverse drug events

Authors
Davis, J; Costa, VS; Peissig, P; Caldwell, M; Page, D;

Publication
AAAI Workshop - Technical Report

Abstract
Adverse drug events are a leading cause of danger and cost in health care. We could reduce both the danger and the cost if we had accurate models to predict, at prescription time for each drug, which patients are most at risk for known adverse reactions to that drug, such as myocardial infarction (MI, or "heart attack") if given a Cox2 inhibitor, angioedema if given an ACE inhibitor, or bleeding if given an anticoagulant such as Warfarin. We address this task for the specific case of Cox2 inhibitors, a type of non-steroidal anti-inflammatory drug (NSAID) or pain reliever that is easier on the gastrointestinal system than most NSAIDs. Because of the MI adverse drug reaction, some but not all very effective Cox2 inhibitors were removed from the market. Specifically, we use machine learning to predict which patients on a Cox2 inhibitor would suffer an MI. An important issue for machine learning is that we do not know which of these patients might have suffered an MI even without the drug. To begin to make some headway on this important problem, we compare our predictive model for MI for patients on Cox2 inhibitors against a more general model for predicting MI among a broader population not on Cox2 inhibitors.

2013

CrowdTargeting: Making Crowds More Personal

Authors
Costa, J; Silva, C; Ribeiro, B; Antunes, M;

Publication
2013 8TH INTERNATIONAL WORKSHOP ON SEMANTIC AND SOCIAL MEDIA ADAPTATION AND PERSONALIZATION (SMAP 2013)

Abstract
Crowdsourcing is a bubbling research topic that has the potential to be applied in numerous online and social scenarios. It consists of obtaining services or information by soliciting contributions from a large group of people. However, the question of defining the appropriate scope of a crowd to tackle each scenario is still open. In this work we compare two approaches to define the scope of a crowd in a classification problem, cast as a recommendation system. We propose a similarity measure to determine the closeness of a specific user to each crowd contributor and hence to define the appropriate crowd scope. We compare different levels of customization using crowd-based information, allowing non-expert classification by crowds to be tuned to substitute the user profile definition. Results on a real recommendation data set show the potential of making crowds more personal, i.e. of tuning the crowd to the crowdtarget.

2013

Customized crowds and active learning to improve classification

Authors
Costa, J; Silva, C; Antunes, M; Ribeiro, B;

Publication
EXPERT SYSTEMS WITH APPLICATIONS

Abstract
Traditional classification algorithms can be limited in their performance when a specific user is targeted. User preferences, e.g. in recommendation systems, constitute a challenge for learning algorithms. Additionally, in recent years user interaction through crowdsourcing has drawn significant interest, although it is still underused in learning settings. In this work we focus on an active strategy that uses crowd-based non-expert information to appropriately tackle the problem of capturing the drift between user preferences in a recommendation system. The proposed method combines two main ideas: to apply active strategies for adaptation to each user; to implement crowdsourcing to avoid excessive user feedback. A similitude technique is put forward to optimize the choice of the most appropriate similitude-wise crowd, under the guidance of basic user feedback. The proposed active learning framework allows non-expert classification performed by crowds to be used to define the user profile, mitigating the labeling effort normally requested from the user. The framework is designed to be generic and suitable to be applied to different scenarios, whilst remaining customizable for each specific user. A case study on a humor classification scenario is used to demonstrate experimentally that the approach can improve baseline active results.

2013

Defining Semantic Meta-hashtags for Twitter Classification

Authors
Costa, J; Silva, C; Antunes, M; Ribeiro, B;

Publication
ADAPTIVE AND NATURAL COMPUTING ALGORITHMS, ICANNGA 2013

Abstract
Given the wide spread of social networks, research efforts to retrieve information using tagging from social network communications have increased. In particular, in the Twitter social network, hashtags are widely used to define a shared context for events or topics. While this is a common practice, the hashtags freely introduced by users often become biased. In this paper, we propose to deal with this bias by defining semantic meta-hashtags, clustering similar messages to improve the classification. First, we use the user-defined hashtags as the Twitter message class labels. Then, we apply the meta-hashtag approach to boost the performance of the message classification. The meta-hashtag approach is tested on a Twitter-based dataset constructed by requesting public tweets from the Twitter API. The experimental results yielded by comparing a baseline model based on user-defined hashtags with the clustered meta-hashtag approach show that the overall classification is improved. It is concluded that incorporating semantics in the meta-hashtag model can have an impact in different applications, e.g. recommendation systems, event detection or crowdsourcing.
