
dc.contributor.advisor	Rebholz-Schuhmann, Dietrich
dc.contributor.author	Hasnain, Syed Muhammad Ali
dc.date.accessioned	2017-05-15T13:20:49Z
dc.date.available	2017-05-15T13:20:49Z
dc.date.issued	2017-05-12
dc.identifier.uri	http://hdl.handle.net/10379/6518
dc.description.abstract	During recent years, the increasing adoption of Open Data Initiatives and Linked Data principles has led to the creation of a globally distributed space of Linked Data that covers domains such as Government, Libraries, Life Sciences, Media, Geographic and the Social Web. Approaches that treat this data space as one huge distributed data source and enable the execution of declarative queries over it hold enormous potential: they allow users to benefit from a virtually unbounded set of up-to-date data. As a consequence, several research groups have started to study such approaches. The Life Sciences domain was one of the early adopters of Linked Data, and at present a considerable portion of the Linked Open Data cloud consists of Life Sciences Linked Open Data (LS-LOD) datasets. Although publishing datasets as RDF is a necessary step towards unified querying of biological datasets, it is not sufficient to achieve the interoperability needed for a queryable Web of Life Sciences data. This interoperability can be achieved either through “a priori integration”, which ensures that multiple datasets use the same vocabularies and ontologies, or through “a posteriori integration”, which uses mapping rules that change the topology of graphs so that integrated queries become possible. A posteriori integration of Biomedical and Life Science data sources is the topic of this thesis. This dissertation first analyses freely and openly available data sources (public SPARQL endpoints) with respect to two questions: (i) What is the content of a public SPARQL endpoint? and (ii) How self-descriptive are these endpoints? For this analysis, we defined a set of self-descriptive SPARQL queries.
After this analysis, we introduce the notion of Autonomous Resource Discovery and Indexing (ARDI) to facilitate a posteriori integration of Biomedical and Life Science data sources. In particular, we introduce a Cataloguing and Linking mechanism that enables us to formally query Biomedical and Life Sciences Linked Open Data on the World Wide Web (WWW). As of 31 March 2016, ARDI comprises 263,731 triples representing 12,658 distinct classes, 1,792 distinct properties and 13,027 distinct Orphan classes catalogued from 137 public SPARQL endpoints. Building on these Cataloguing and Linking approaches, we propose BioFed, a federated query processing engine for Life Sciences Linked Open Data. BioFed offers a single point of access to distributed Life Science data, enabling scientists to access data from reliable sources without extensive expertise in SPARQL query formulation. BioFed federates SPARQL queries over more than 137 public SPARQL endpoints. After demonstrating ARDI and its practical applications, this dissertation presents the Linked Biomedical Dataspace (LBDS), which enables the semantically enriched representation, exposure, interconnection, querying and browsing of Biomedical data and knowledge in a standardised and homogenised way. We provide three practical scenarios, referred to as workflows, for using the proposed LBDS. We also list Lessons Learned and Recommendations for developing the different components of LBDS, since we believe the insights gained will be useful to Linked Data (LD) practitioners and researchers working on topics similar to those covered in this thesis.	en_IE
dc.subject	SPARQL	en_IE
dc.subject	Semantic web	en_IE
dc.subject	Linked open data	en_IE
dc.subject	Linked biomedical dataspace	en_IE
dc.subject	Federated query processing	en_IE
dc.subject	Biomedical data	en_IE
dc.subject	A posteriori integration	en_IE
dc.title	Cataloguing and linking publicly available biomedical SPARQL endpoints for federation - addressing a posteriori data integration	en_IE
dc.type	Thesis	en_IE
dc.local.note	SPARQL endpoint federation addressing “a posteriori integration” using mapping rules that change the topology of graphs such that integrated queries become possible in Biomedical and Life Science data sources.	en_IE
dc.local.final	Yes	en_IE
nui.item.downloads	10790


Files in this item

Attribution-NonCommercial-NoDerivs 3.0 Ireland
This item is available under the Attribution-NonCommercial-NoDerivs 3.0 Ireland licence. No item may be reproduced for commercial purposes. Please refer to the publisher's URL where this is made available, or to notes contained in the item itself. Other terms may apply.


