Extracting journalistic narratives from text and representing them in a narrative modeling language
Nowadays, journalistic content is distributed in multiple formats, mostly through the web and through internet-based applications running on smartphones and tablets. Readers (or, more accurately, users or information consumers) rely heavily on images, videos, slideshows, charts and infographics, but text is still the main representation for information. Any journalistic subject (e.g. Trump and Russia) is described in one or more texts produced by journalists and possibly commented on by readers. Many of those subjects are followed for days, weeks or months. To grasp a possibly vast and somewhat complex set of interconnected news articles, readers would greatly benefit from tools that summarize those articles by showing the main actors, their interplay, their trajectories in time and space and their motivations, as well as the main events, the causal relations between events, and the outcomes. In the Text2Story project we use Artificial Intelligence, Computer Science and Linguistics to automatically extract those narrative elements using a well-defined semantic framework and re-represent them in formats that convey the essential story but are more efficiently consumed by users. The project is led by INESC TEC in collaboration with researchers from the Center of Linguistics of U. Porto, the Lusa News Agency and Jornal Público.