UNIPA, first place in the 9th International Challenge of Author Profiling

From Unipa (University of Palermo) – A research team from the Engineering Department of the University of Palermo, consisting of Dr. Marco Siino, PhD student in ICT, and Profs. Ilenia Tinnirello and Marco La Cascia, in collaboration with Dr. Elisa Di Nuovo, PhD student in Digital Humanities at the University of Turin, ranked first out of more than 60 participating research groups from around the world in the 9th International Challenge of Author Profiling organized by PAN Lab, a competition on the automatic analysis of texts and natural languages held on the occasion of the CLEF 2021 conference.

“PAN is a series of scientific events and shared tasks on digital text forensics and stylometry – explains the research team – This year, among the proposed tasks, the one on author profiling concerned the automatic recognition of hate speech (HS), defined as any text that expresses hatred towards a person or a group based on some characteristic such as race, color, ethnicity, gender, sexual orientation, nationality, religion or other traits. Given the huge amount of user-generated content on the web, the problem consists in automatically identifying HS, and thereby possibly counteracting its spread, in order to combat phenomena such as misogyny, xenophobia or cyberbullying. To this end, the task in this specific challenge was to identify users who are likely to spread HS on Twitter, as a first step towards preventing the spread of hateful texts among online users. Specifically, the goal was to classify a user as a likely hate speech spreader or not, based on their last 200 tweets.

The developed model – they continue – is a convolutional neural network applied to a non-pre-trained word embedding layer. It belongs to the branch of Artificial Intelligence related to Deep Learning methods that are commonly used in Computer Vision but less frequently applied in Natural Language Processing. The proposed multilingual architecture correctly classified 85% of the user profiles in the Spanish-language dataset and 73% of those in the English-language dataset, for an average of 79% across the entire multilingual dataset.”
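For readers curious about what "a convolutional neural network applied to a non-pre-trained word embedding layer" can look like, the following is a minimal illustrative sketch of a forward pass, not the team's actual model. Everything in it is an assumption made for illustration: the function names, the whitespace tokenization, the concatenation of a user's tweets into one sequence, and all sizes (embedding dimension, number of filters, kernel width). The weights are random and untrained, so the output carries no meaning until the model is trained on labeled profiles.

```python
import math
import random


def build_vocab(tweets):
    """Map each distinct lowercase token to an integer id; id 0 is kept for unknown words."""
    vocab = {}
    for tweet in tweets:
        for token in tweet.lower().split():
            vocab.setdefault(token, len(vocab) + 1)
    return vocab


def encode(vocab, tweets):
    """Concatenate a user's tweets into one sequence of token ids (an assumption)."""
    return [vocab.get(tok, 0) for tweet in tweets for tok in tweet.lower().split()]


def init_model(vocab_size, embed_dim=8, n_filters=4, kernel_size=3, seed=0):
    """Randomly initialize all weights: the embedding is NOT pre-trained."""
    rng = random.Random(seed)

    def rand():
        return rng.uniform(-0.1, 0.1)

    return {
        "embedding": [[rand() for _ in range(embed_dim)]
                      for _ in range(vocab_size + 1)],
        "filters": [[[rand() for _ in range(embed_dim)]
                     for _ in range(kernel_size)]
                    for _ in range(n_filters)],
        "filter_bias": [0.0] * n_filters,
        "dense": [rand() for _ in range(n_filters)],
        "dense_bias": 0.0,
    }


def forward(model, token_ids):
    """Embedding lookup -> 1D convolution + ReLU -> global max pooling -> sigmoid."""
    embed_dim = len(model["embedding"][0])
    kernel_size = len(model["filters"][0])
    emb = [model["embedding"][i] for i in token_ids]
    # Pad very short inputs with zero vectors so at least one window exists.
    while len(emb) < kernel_size:
        emb.append([0.0] * embed_dim)

    pooled = []
    for filt, bias in zip(model["filters"], model["filter_bias"]):
        best = 0.0  # starting at 0 applies ReLU and max pooling in one step
        for start in range(len(emb) - kernel_size + 1):
            window = emb[start:start + kernel_size]
            act = bias + sum(w * x
                             for w_row, x_row in zip(filt, window)
                             for w, x in zip(w_row, x_row))
            best = max(best, act)
        pooled.append(best)

    logit = model["dense_bias"] + sum(w * p for w, p in zip(model["dense"], pooled))
    return 1.0 / (1.0 + math.exp(-logit))  # score in (0, 1) for "likely HS spreader"


if __name__ == "__main__":
    user_tweets = ["an example tweet", "another example tweet here"]  # made-up data
    vocab = build_vocab(user_tweets)
    model = init_model(len(vocab))
    print(forward(model, encode(vocab, user_tweets)))
```

In a real system the embedding, filters and output weights would be learned jointly from the labeled profiles, typically with a deep learning framework rather than hand-written loops. The 79% figure quoted above is simply the mean of the two per-language accuracies, (85% + 73%) / 2.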
