µRaptor: A DOM-based system with appetite for hCard elements
(CEUR-WS.org, 2014)
This paper describes µRaptor, a DOM-based method to extract hCard microformats from HTML pages stripped of microformat markup. µRaptor extracts DOM sub-trees, converts them into rules, and uses them to extract hCard ...
The ACL RD-TEC: Annotation Guideline (Ver 1.0)
(Insight Centre for Data Analytics, 2014)
The annotation guidelines for the ACL RD-TEC (ver 1.0) are set out in this document. The annotator is required to understand the meaning of term, technology term, and invalid term before commencing the annotation task. A de ...
The ACL RD-TEC: A Dataset for Benchmarking Terminology Extraction and Classification in Computational Linguistics
(2014)
This paper introduces ACL RD-TEC: a dataset for evaluating the extraction and classification of terms from literature in the domain of computational linguistics. The dataset is derived from the Association for Computational ...
GenomeSnip: Fragmenting the Genomic Wheel to augment discovery in cancer research
(2014)
Cancer genomics researchers have greatly benefited from high-throughput technologies for the characterization of genomic alterations in patients. These voluminous genomics datasets when supplemented with the appropriate ...
Discovering Domain-Specific Public SPARQL Endpoints: A Life-Sciences Use-Case
(2014)
A significant portion of the LOD cloud consists of Life Sciences data sets. The LOD cloud contains billions of clinical facts linked together forming an interlinked Web of Clinical Data. However, tools for new publishers ...
Using linked data to mine RDF from Wikipedia's tables
(ACM, 2014)
The tables embedded in Wikipedia articles contain rich, semi-structured encyclopaedic content. However, the cumulative content of these tables cannot be queried against. We thus propose methods to recover the semantics of ...
SemanTex: semantic text exploration using document links implied by conceptual networks extracted from the texts
(ACM; CEUR-WS.org, 2014)
Despite advances in digital document processing, exploration of implicit relationships within large amounts of textual resources can still be daunting. This is partly due to the ‘black-box’ nature of most current ...
Learning content patterns from linked data
(CEUR-WS.org, 2014)
Linked Data (LD) datasets (e.g., DBpedia, Freebase) are used in many knowledge extraction tasks due to the high variety of domains they cover. Unfortunately, many of these datasets do not provide a description for their ...
Random Manhattan Integer Indexing: Incremental L1 Normed Vector Space Construction
(2014)
Vector space models (VSMs) are mathematically well-defined frameworks that have been widely used in the distributional approaches to semantics. In VSMs, high-dimensional vectors represent linguistic entities. In an ...
Where is the News Breaking? Towards a Location-based Event Detection Framework for Journalists
(Springer, 2014)
The rise of user-generated content (UGC) as a source of information in the journalistic lifecycle is driving the need for automated methods to detect, filter, contextualise and verify citizen reports of breaking news ...