Thursday, September 19, 2013

Meet the ontop team!

Babak, Martin, Timi, Mariano, Mindaugas, Guohui

-ontop- v1.9 released!

We are happy to announce the availability of -ontop- v1.9, a Java framework to query RDBMS using SPARQL over RDF(S) and OWL ontologies. Get it at:


HIGHLIGHTS OF CHANGES in v1.9

This release continues the project's refactoring and the cleanup of code, dependencies and internal documentation. We also added two very exciting new features. Finally, there is a batch of implementation improvements covering both performance and bug fixes. Here is a summary:


  • FEATURE: Hybrid RDF graphs! This is a unique new feature in ontop that no other R2RML system has (to the best of my knowledge). Now you can combine data coming from mappings with plain RDF triples. This lets you store most of the data in the DB, but keep some facts in the ontology as triples (ABox facts). For more info see this post:

    http://ontop-obda.blogspot.it/2013/09/hybrid-rdf-graphs-or-hybrid-aboxes-as.html
  • FEATURE: ontop now supports mappings with URI templates in class or property positions! For more info check this link:
     
    http://ontop-obda.blogspot.it/2013/09/uri-templates-for-properties-and-classes.html
       
  • IMPROVEMENT: We cleaned up a lot of dependencies in ontop; our package is now half the size it was before.
  • IMPROVEMENT: Upgraded libraries. We now link to Sesame 2.7.6, OWLAPI 3.4.5 and Protege 4.3.
  • BUG FIXES: Critical bug fixes in CONSTRUCT and DESCRIBE queries, as well as in the code that matches URIs to URI templates. Several bug fixes in ontopPro and the SPARQL endpoint.

   

NEW ONTOP TUTORIAL

We also prepared a new tutorial that guides you through the first steps of using ontop and ontopPro (the Protege 4 plugin), and how it can be used for data access and data integration. Find it here:

http://ontop-obda.blogspot.it/2013/09/new-ontop-tutorial.html
   

Cheers,
The -ontop- team

URI Templates for properties and classes

Since v1.9 ontop supports mappings with URI templates or variables in property or class locations. This means now you can write mappings like:

:person/{ID} rdf:type :Person{OCCUPATION}
SELECT ID, OCCUPATION FROM tbl_person

or like:

:person/{id} <{attribute}> {value}
SELECT id, attribute, value FROM tbl_data

This kind of mapping is very useful when part of the ontology's vocabulary is stored in the DB.

Semantics

The new mappings are just syntactic sugar for normal mappings. Internally, at initialisation time, ontop inspects the values in the DB and transforms these mappings into traditional mappings with fixed predicates/classes. Because this happens only at initialisation time, later changes to the DB columns used in these templates are not taken into account by the system. For example, suppose the table tbl_person is as follows:


ID  OCCUPATION
1   Researcher
2   Researcher
3   Doctor
4   Driver
5   Driver

If your mapping looks like:

:person/{ID} rdf:type :{OCCUPATION}
SELECT ID, OCCUPATION FROM tbl_person

ontop will translate it into the following 3 mappings (one for each distinct value of OCCUPATION):

:person/{ID} rdf:type :Researcher
SELECT ID, OCCUPATION FROM tbl_person WHERE OCCUPATION='Researcher'

:person/{ID} rdf:type :Doctor
SELECT ID, OCCUPATION FROM tbl_person WHERE OCCUPATION='Doctor'

:person/{ID} rdf:type :Driver
SELECT ID, OCCUPATION FROM tbl_person WHERE OCCUPATION='Driver'

If a new row is inserted in tbl_person in which a new occupation is introduced, e.g., (6,"Singer"), the system will not update itself. You need to restart it.
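The expansion described above can be sketched in a few lines of Python. This is only an illustration of the idea, not ontop's actual code: the function name expand_class_template, the in-memory SQLite database and the tuple representation of a mapping are all assumptions made for this example.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory copy of the tbl_person table from the post."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl_person (ID INTEGER, OCCUPATION TEXT)")
    rows = [(1, "Researcher"), (2, "Researcher"), (3, "Doctor"),
            (4, "Driver"), (5, "Driver")]
    conn.executemany("INSERT INTO tbl_person VALUES (?, ?)", rows)
    return conn


def expand_class_template(conn, table, id_col, value_col):
    """Hypothetical helper: one fixed mapping per distinct value of value_col.

    Returns a list of (target, source) pairs, sorted by class name.
    The table and column names are trusted here; a real system would
    have to validate them.
    """
    cur = conn.execute(
        f"SELECT DISTINCT {value_col} FROM {table} ORDER BY {value_col}")
    mappings = []
    for (value,) in cur.fetchall():
        target = f":person/{{{id_col}}} rdf:type :{value}"
        literal = str(value).replace("'", "''")  # escape single quotes
        source = (f"SELECT {id_col}, {value_col} FROM {table} "
                  f"WHERE {value_col}='{literal}'")
        mappings.append((target, source))
    return mappings


if __name__ == "__main__":
    conn = build_demo_db()
    snapshot = expand_class_template(conn, "tbl_person", "ID", "OCCUPATION")
    for target, source in snapshot:
        print(target)
        print(source)
        print()
    # A row added after the expansion is invisible to the snapshot,
    # which mirrors the "you need to restart" behaviour described above.
    conn.execute("INSERT INTO tbl_person VALUES (6, 'Singer')")
    print(len(snapshot), "mappings in the old snapshot")
```

Calling the helper again after the insert would produce a fourth mapping for :Singer, which is roughly what a restart achieves.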

Limitations and Performance

The code that implements this is not very robust and the system may fail to create the real mappings if the original SQL query is not a simple SELECT-PROJECT-JOIN query. 

Also, each mapping of this form requires ontop to query the database to find the distinct values in the template columns. This could be expensive, depending on the database.

Give it a try and let us know how it goes!

Hybrid RDF Graphs (or, Hybrid ABoxes, as you want to see it ;-) )

This is an exciting new feature that lets you combine virtual RDF (mappings) with real RDF (or, virtual ABoxes with ABox assertions). This is a unique feature in ontop: in other systems, either you have mappings and everything is about SPARQL to SQL, or you have triples and you have a triple store.

With hybrid RDF graphs you can have an ontology with axioms and data as follows (in Turtle syntax):

Axiomatic triples
:ceoOf rdfs:domain        :CEO
:CEO   rdfs:subClassOf    :BusinessMan
:ceoOf rdfs:subPropertyOf :worksFor

Data triples
:Bill_Gates :ceoOf :Microsoft 

Mappings
:person/{ID} :knows :Bill_Gates
SELECT ID FROM tbl_microsoft_employees


Note how the mapping states that all people created from IDs in tbl_microsoft_employees know Bill Gates. Bill Gates is a sort of "global" individual. Moreover, we also know some things about Bill Gates, i.e., that he is the CEO of Microsoft. And we know some things about the business world, i.e., that the domain of ceoOf is CEO, that a CEO is a kind of BusinessMan, and that being the CEO of a company is one way of working for that company.

Now we can execute queries like the following and get the answers that we expect:

SELECT ?x ?y WHERE {
   ?x :knows ?y. ?y a :BusinessMan ; :worksFor :Microsoft 
}

As always, ontop will translate this SPARQL query into an SQL query, and in this particular case the query will look something like this:

SELECT ':person/' || ID AS x, ':Bill_Gates' AS y
FROM tbl_microsoft_employees

Notice that there is a lot going on here; this is not just query translation. Reasoning took place, involving all the axioms in the ontology, the data triples and the mappings. In the end, we arrive at the simple, efficient query that we would write manually, and that will give you great performance even in the presence of large volumes of data.
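To make the reasoning step more concrete, here is a small Python sketch that derives the facts about :Bill_Gates needed to answer the query. Note the swapped-in technique: ontop rewrites the query instead of materialising facts, whereas this sketch uses naive forward chaining only to show which entailments are involved. The function names saturate and business_people_at are hypothetical.

```python
AXIOMS = [
    (":ceoOf", "rdfs:domain", ":CEO"),
    (":CEO", "rdfs:subClassOf", ":BusinessMan"),
    (":ceoOf", "rdfs:subPropertyOf", ":worksFor"),
]

DATA = [(":Bill_Gates", ":ceoOf", ":Microsoft")]


def saturate(facts, axioms):
    """Apply the three axiom kinds used in the post until nothing new appears."""
    facts = set(facts)
    while True:
        derived = set()
        for (s, p, o) in facts:
            for (a, rel, b) in axioms:
                if rel == "rdfs:domain" and p == a:
                    derived.add((s, "rdf:type", b))
                elif rel == "rdfs:subPropertyOf" and p == a:
                    derived.add((s, b, o))
                elif rel == "rdfs:subClassOf" and p == "rdf:type" and o == a:
                    derived.add((s, "rdf:type", b))
        if derived <= facts:  # fixed point reached
            return facts
        facts |= derived


def business_people_at(facts, company):
    """Individuals ?y with '?y a :BusinessMan ; :worksFor <company>'."""
    business = {s for (s, p, o) in facts
                if p == "rdf:type" and o == ":BusinessMan"}
    workers = {s for (s, p, o) in facts
               if p == ":worksFor" and o == company}
    return business & workers


if __name__ == "__main__":
    closure = saturate(DATA, AXIOMS)
    print(sorted(closure))
    print(business_people_at(closure, ":Microsoft"))
```

Since :Bill_Gates is the only individual that satisfies the second half of the query, all that remains for the database is the :knows mapping, which is why the generated SQL only touches tbl_microsoft_employees.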

Why use hybrid RDF graphs?

This functionality is useful when you have large volumes of data that would be inefficient to translate into RDF and that you want to keep in the original RDBMS, but at the same time you have a smaller amount of data that you want to use during query answering. The smaller dataset is either too small to be worth inserting into the RDBMS and writing mappings for, or it simply belongs in the ontology, i.e., it is domain knowledge, not application data.

Limitations

This functionality is available only for class and object property assertions. That is, you may not have data triples with literal values like:

:Bill_Gates :age "57"^^xsd:integer
:Bill_Gates :name "William Henry Gates"

Performance

Using hybrid RDF graphs may slow down the query rewriting process. The system deals with RDF triples as if they were mappings that require nothing from the DB. That means that all those facts are considered during SQL generation, and having too many of them may slow down query translation.

Free variables: In particular, query rewriting may become slow for queries that have "free classes" or "free properties" in the graph patterns, for example:

SELECT ?x ?p WHERE { ?x ?p :mariano }

or

SELECT ?x ?c WHERE { ?x rdf:type ?c }

If you are experiencing slow query rewriting because of this, try to avoid having these "free" patterns in isolation. Use them only if there is a "non-free" section of the query with which you can JOIN them. This will restrict the query and will limit the facts that are involved in answering your query, making everything faster. For example:

SELECT ?x ?c 
WHERE {?x :hasFather ?y. ?x :hasAge ?z. ?x rdf:type ?c }

JOIN order: At the moment, make sure that any triple patterns in SPARQL that are related to data triples are at the end of the query, especially those with free predicates. For example, this is not good:

SELECT ?x ?c 
WHERE { ?x rdf:type ?c . ?x :hasFather ?y. ?x :hasAge ?z. }

but this is good:

SELECT ?x ?c 
WHERE {?x :hasFather ?y. ?x :hasAge ?z. ?x rdf:type ?c }


A good join order is one in which the more "restricted" triple patterns come first. For virtual RDF graphs (pure mappings) this doesn't matter, but for hybrid graphs it might matter a lot. In the future we hope to improve this, but for the moment you should take it into account.
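If you build queries programmatically, the ordering advice above can be applied with a small helper like the following Python sketch. It is not part of ontop: the names is_free and restricted_first, and the representation of triple patterns as tuples of strings, are assumptions for illustration. A pattern counts as "free" here when its predicate is a variable, or when it is an rdf:type pattern whose class is a variable.

```python
def is_var(term):
    """SPARQL variables start with '?' in this simplified representation."""
    return term.startswith("?")


def is_free(pattern):
    """True for patterns with a free property or a free class."""
    _subject, predicate, obj = pattern
    if is_var(predicate):
        return True
    return predicate in ("rdf:type", "a") and is_var(obj)


def restricted_first(patterns):
    """Move free patterns to the end, keeping the relative order otherwise.

    sorted() is stable and False sorts before True, so restricted
    patterns stay in their original order at the front.
    """
    return sorted(patterns, key=is_free)


if __name__ == "__main__":
    bad_order = [
        ("?x", "rdf:type", "?c"),
        ("?x", ":hasFather", "?y"),
        ("?x", ":hasAge", "?z"),
    ]
    body = " . ".join(" ".join(p) for p in restricted_first(bad_order))
    print("SELECT ?x ?c WHERE { " + body + " }")
```

This only reorders the patterns of a single basic graph pattern; anything more elaborate (OPTIONAL, UNION, filters) would need a real SPARQL parser.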

Number of data triples: The number of facts (data triples) will affect the performance of query rewriting. How much is "too big" and when query rewriting may become slow depends on your memory, your machine, the SPARQL query and how much the ABox interacts with the TBox. However, the current implementation should allow for a few thousand ABox assertions on normal hardware.


Give this kind of modelling a try and let us know how it goes!

Wednesday, September 18, 2013

New ontop tutorial

A few weeks ago the Protege short course took place in Vienna. We were invited to give a talk on ontopPro and now I want to share the material.

It's a tutorial on how to create mappings using ontop, how inference (OWL 2 QL and RDFS) plays a role in answering SPARQL queries in ontop, and how ontop's support for on-the-fly SQL query translation enables scenarios of semantic data access and data integration.

The material includes all the worked-out mappings, SPARQL queries and SQL databases, and it is full of useful hints on how to use OBDA for different purposes.

Please take a look and send me any feedback you may have :)


Monday, September 2, 2013

Issues in last release

Hi, in the last release we introduced two critical bugs that are affecting DESCRIBE and general SELECT queries. In particular, depending on the URI templates used in mappings, queries might return empty results when they shouldn't.

We are now fixing this and will make a release ASAP to address these issues.