Recent Submissions

  • Cardamom: Comparative deep models for minority and historical languages 

    McCrae, John Philip; Fransen, Theodorus (Language Technologies for All (LT4All), 2019-12-05)
    This paper gives an overview of the Cardamom project, which aims to close the resource gap for minority and under-resourced languages by means of deep-learning-based natural language processing (NLP) and exploiting ...
  • A sentiment analysis dataset for code-mixed Malayalam-English 

    Chakravarthi, Bharathi Raja; Jose, Navya; Suryawanshi, Shardul; Sherly, Elizabeth; McCrae, John P. (European Language Resources Association (ELRA), 2020-05-11)
    There is an increasing demand for sentiment analysis of text from social media which are mostly code-mixed. Systems trained on monolingual data fail for code-mixed data due to the complexity of mixing at different levels ...
  • Corpus creation for sentiment analysis in code-mixed Tamil-English text 

    Chakravarthi, Bharathi Raja; Muralidaran, Vigneshwaran; Priyadharshini, Ruba; McCrae, John P. (European Language Resources Association (ELRA), 2020-05-11)
    Understanding the sentiment of a comment from a video or an image is an essential task in many applications. Sentiment analysis of a text can be useful for various decision-making processes. One such application is to ...
  • Is trust between AI institutions and the public “morally rotten?” 

    Gleifer, Vaz Alves; Dennis, Louise; Fisher, Michael; Behan, Anthony; Babushkina, Dina; Merdes, Christoph; Archer, Ken; Ní Fhaoláin, Labhaoise; Hines, Andrew; Michael, Loizos; Cardoso, C. Rafael; Ene, Daniel; Evans, Tom; Dennis, Louise; Kaur, Satwant; Carter, Sarah; Grancagnolo, Sergio; Greidinger, Steven (Machine Ethics Research Group, School of Computer Science, University College Dublin, 2020)
    Developing artificial intelligence (AI) technology has become a business of power. AI innovation is increasingly centralized in a few large companies – mainly, Google, Facebook, and Apple. Specialized data scientists - ...
  • A survey of current datasets for code-switching research 

    Jose, Navya; Chakravarthi, Bharathi Raja; Suryawanshi, Shardul; Sherly, Elizabeth; McCrae, John P. (IEEE, 2020-03-06)
    Code switching is a prevalent phenomenon in the multilingual community and social media interaction. In the past ten years, we have witnessed an explosion of code switched data in the social media that brings together ...
  • A comparative study of different state-of-the-art hate speech detection methods in Hindi-English code-mixed data 

    Rani, Priya; Suryawanshi, Shardul; Goswami, Koustava; Chakravarthi, Bharathi Raja; Fransen, Theodorus; McCrae, John P. (European Language Resources Association (ELRA), 2020-05-11)
    Hate speech detection in social media communication has become one of the primary concerns to avoid conflicts and curb undesired activities. In an environment where multilingual speakers switch among multiple languages, ...
  • NUIG at TIAD: Combining unsupervised NLP and graph metrics for translation inference 

    McCrae, John P.; Arcan, Mihael (European Language Resources Association (ELRA), 2020-05-11)
    In this paper, we present the NUIG system at the TIAD shared task. This system includes graph-based metrics calculated using novel algorithms, with an unsupervised document embedding tool called ONETA and an unsupervised ...
  • A dataset for troll classification of Tamil memes 

    Chakravarthi, Bharathi Raja; Varma, Pranav; Arcan, Mihael; McCrae, John P.; Buitelaar, Paul; Suryawanshi, Shardul (European Language Resources Association (ELRA), 2020-05-11)
    Social media are interactive platforms that facilitate the creation or sharing of information, ideas or other forms of expression among people. This exchange is not free from offensive, trolling or malicious contents ...
  • Multimodal meme dataset (MultiOFF) for identifying offensive content in image and text 

    Suryawanshi, Shardul; Chakravarthi, Bharathi Raja; Arcan, Mihael; Buitelaar, Paul (European Language Resources Association (ELRA), 2020-05-11)
    A meme is a form of media that spreads an idea or emotion across the internet. As posting memes has become a new form of communication on the web, due to the multimodal nature of memes, postings of hateful memes or related ...
  • A term extraction approach to survey analysis in health care 

    Robin, Cécile; Isazad Mashinchi, Mona; Ahmadi Zeleti, Fatemeh; Ojo, Adegboyega; Buitelaar, Paul (European Language Resources Association, 2020-05)
    The voice of the customer has for a long time been a key focus of businesses in all domains. It has received a lot of attention from the research community in Natural Language Processing (NLP) resulting in many approaches ...
  • Challenges of word sense alignment: Portuguese language resources 

    Salgado, Ana; Ahmadi, Sina; Simões, Alberto; McCrae, John P.; Costa, Rute (National University of Ireland Galway, 2020-05-16)
    This paper reports on an ongoing task of monolingual word sense alignment in which a comparative study between the Portuguese Academy of Sciences Dictionary and the Dicionário Aberto is carried out in the context of the ...
  • A corpus of the Sorani Kurdish folkloric lyrics 

    Ahmadi, Sina; Hassani, Hossein; Abedi, Kamaladdin (National University of Ireland Galway, 2020-05-16)
    Kurdish poetry and prose narratives were historically transmitted orally and less in a written form. Being an essential medium of oral narration and literature, Kurdish lyrics have had a unique attribute in becoming a ...
  • A multilingual evaluation dataset for monolingual word sense alignment 

    Ahmadi, Sina; McCrae, John P.; Nimb, Sanni; Khan, Fahad; Monachini, Monica; Pedersen, Bolette S.; Declerck, Thierry; Wissik, Tanja; Bellandi, Andrea; Pisani, Irene; Troelsgård, Thomas; Olsen, Sussi; Krek, Simon; Lipp, Veronika; Váradi, Tamás; Simon, László; Gyorffy, Andras; Tiberius, Carole; Schoonheim, Tanneke; Moshe, Yifat Ben; Rudich, Maya; Ahmad, Raya Abu; Lonke, Dorielle; Kovalenko, Kira; Langemets, Margit; Kallas, Jelena; Dereza, Oksana; Fransen, Theodorus; Cillessen, David; Lindemann, David; Alonso, Mikel; Salgado, Ana; Sancho, Jose Luis; Urena-Ruiz, Rafael-J.; Zamorano, Jordi Porta; Simov, Kiril; Osenova, Petya; Kancheva, Zara; Radev, Ivaylo; Stankovic, Ranka; Perdih, Andrej; Gabrovsek, Dejan (National University of Ireland Galway, 2020-05-16)
    Aligning senses across resources and languages is a challenging task with beneficial applications in the field of natural language processing and electronic lexicography. In this paper, we describe our efforts in manually ...
  • Defying Wikidata: Validation of terminological relations in the web of data 

    Martín-Chozas, Patricia; Ahmadi, Sina; Montiel-Ponsoda, Elena (National University of Ireland Galway, 2020-05-16)
    In this paper we present an approach to validate terminological data retrieved from open encyclopaedic knowledge bases. This need arises from the enrichment of automatically extracted terms with information from existing ...
  • Biological applications of knowledge graph embedding models 

    Mohamed, Sameh K.; Nounu, Aayah; Nováček, Vít (Oxford University Press (OUP), 2020-02-17)
    Complex biological systems are traditionally modelled as graphs of interconnected biological entities. These graphs, i.e. biological knowledge graphs, are then processed using graph exploratory approaches to perform different ...
  • Taxonomy extraction for customer service knowledge base construction 

    Pereira, Bianca; Robin, Cécile; Daudert, Tobias; McCrae, John P.; Mohanty, Pranab; Buitelaar, Paul (Springer, 2019-11-04)
    Customer service agents play an important role in bridging the gap between customers' vocabulary and business terms. In a scenario where organisations are moving into semi-automatic customer service, semantic technologies ...
  • Detecting bot behaviour in social media using digital DNA compression 

    Pasricha, Nivranshu; Hayes, Conor (AICS (Artificial Intelligence and Cognitive Science) 2019, 2019-12-05)
    A major challenge faced by online social networks such as Facebook and Twitter is the remarkable rise of fake and automated bot accounts over the last few years. Some of these accounts have been reported to engage in ...
  • Back-translation approach for code-switching machine translation: A case study 

    Masoud, Maraim; Torregrosa, Daniel; Buitelaar, Paul; Arčan, Mihael (AICS2019, 2019-12-05)
    Recently, machine translation has demonstrated significant progress in terms of translation quality. However, most of the research has focused on translating with pure monolingual texts in the source and the target side ...
  • Veritas annotator: Discovering the origin of a rumour 

    Azevedo, Lucas; Moustafa, Mohamed (Association for Computational Linguistics (ACL), 2019-11-03)
    Defined as the intentional or unintentional spread of false information (K et al., 2019) through context and/or content manipulation, fake news has become one of the most serious problems associated with online ...
  • Truth or lie: Automatically fact checking news 

    Azevedo, Lucas (ACM, 2018-04-23)
    In the current scenario of ever-growing data consumption speed and quantity, factors like news source decentralization, citizen journalism and democratization of media make the task of manually checking and correcting ...
