[BioPMDS] Pedagogically Motivated Distantly Supervised Relation Extraction dataset for the Biology domain

Download the corpus directly from the GitHub repository. The corpus is stored in JSON format in the corpus/ folder.
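
Once the repository is downloaded, the corpus files could be read along the lines of the minimal sketch below. It rests on assumptions this README does not confirm: that the files in corpus/ carry a `.json` extension and that each file holds a single JSON document (not JSON Lines). The helper name `load_corpus` is hypothetical, and since the schema of the records is not described here, the sketch only loads the files and reports their top-level type.

```python
import json
from pathlib import Path


def load_corpus(folder="corpus"):
    """Load every *.json file directly under `folder`.

    Returns a dict mapping file name -> parsed JSON content.
    Assumption: each file is one complete JSON document.
    """
    data = {}
    for path in sorted(Path(folder).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            data[path.name] = json.load(handle)
    return data


if __name__ == "__main__":
    # Run from the root of the downloaded repository.
    corpus = load_corpus("corpus")
    for name, content in corpus.items():
        # The record schema is not documented here, so only the
        # top-level JSON type of each file is printed.
        print(name, type(content).__name__)
```

Inspecting the top-level type of each file first is a cautious way to find out how the records are organised before writing any code that depends on specific field names.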

Citation

@inproceedings{sainz-etal-2020-domain,
    title = "Domain Adapted Distant Supervision for Pedagogically Motivated Relation Extraction",
    author = "Sainz, Oscar  and
      Lopez de Lacalle, Oier  and
      Aldabe, Itziar  and
      Maritxalar, Montse",
    booktitle = "Proceedings of The 12th Language Resources and Evaluation Conference",
    month = may,
    year = "2020",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://www.aclweb.org/anthology/2020.lrec-1.270",
    pages = "2213--2222",
    abstract = "In this paper we present a relation extraction system that given a text extracts pedagogically motivated relation types, as a previous step to obtaining a semantic representation of the text which will make possible to automatically generate questions for reading comprehension. The system maps pedagogically motivated relations with relations from ConceptNet and deploys Distant Supervision for relation extraction. We run a study on a subset of those relationships in order to analyse the viability of our approach. For that, we build a domain-specific relation extraction system and explore two relation extraction models: a state-of-the-art model based on transfer learning and a discrete feature based machine learning model. Experiments show that the neural model obtains better results in terms of F-score and we yield promising results on the subset of relations suitable for pedagogical purposes. We thus consider that distant supervision for relation extraction is a valid approach in our target domain, i.e. biology.",
    language = "English",
    ISBN = "979-10-95546-34-4",
}