What is your RDF browser’s Accept header?

March 17th, 2008

I was debugging a content negotiation issue the other day and made a little tool that lets me find out what Accept header different RDF-aware HTTP clients send. If you ever need to know the Accept header of a particular RDF-aware HTTP client, just make it show the RDF loaded from this URI:

http://richard.cyganiak.de/2008/03/rdfbugs/accept.php

The RDF contains a dc:description with the browser’s Accept header. If you have the Tabulator Firefox extension installed, you can simply click the link and see the output.
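For readers who want something similar on their own server, here is a minimal sketch in Python of how such a response body could be built. It is not the code behind accept.php; the function name `accept_as_rdf` and the choice of describing the document itself (`rdf:about=""`) are assumptions made for illustration.

```python
from xml.sax.saxutils import escape


def accept_as_rdf(accept_header):
    """Return a small RDF/XML document whose dc:description is the header.

    Assumption: the description is attached to the document itself
    (rdf:about=""); the real tool may well do this differently.
    """
    return (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"\n'
        '         xmlns:dc="http://purl.org/dc/elements/1.1/">\n'
        '  <rdf:Description rdf:about="">\n'
        # escape() protects against &, < and > in the header value
        '    <dc:description>' + escape(accept_header) + '</dc:description>\n'
        '  </rdf:Description>\n'
        '</rdf:RDF>\n'
    )


print(accept_as_rdf("application/rdf+xml, */*; q=0.2"))
```

A server-side script would read the incoming Accept header, pass it to a function like this, and send the result as application/rdf+xml.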

I tried this with a couple of tools and here are the results:

Tabulator Firefox Extension 0.8.2:

application/rdf+xml, application/xhtml+xml;q=0.3, text/xml;q=0.2,
application/xml;q=0.2, text/html;q=0.3, text/plain;q=0.1, text/n3,
text/rdf+n3;q=0.5, application/x-turtle;q=0.2, text/turtle;q=1

Jena’s Model.read(…) method:

application/rdf+xml, application/xml; q=0.8, text/xml; q=0.7,
application/rss+xml; q=0.3, */*; q=0.2

Disco Hyperdata Browser:

application/rdf+xml;q=1,text/xml;q=0.6,text/rdf+n3;q=0.9,
application/octet-stream;q=0.5,application/xml q=0.5,
application/rss+xml;q=0.5,text/plain; q=0.5,application/x-turtle;q=0.5,
application/x-trig;q=0.5,text/html;q=0.5

OpenLink RDF Browser:

application/rdf+xml, text/rdf+n3, application/rdf+turtle,
application/x-turtle, application/turtle, application/xml, */*

SindiceBot:

application/rdf+xml, application/xml;q=0.6, text/xml;q=0.6

Some of these are pretty funny actually, but that’s a post for another day.
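
To compare headers like these, it helps to split them into media types and q-values. The following is a rough Python sketch, not a full HTTP header parser: the function name `parse_accept` is made up, only the q parameter is interpreted, and a missing or unparsable q is assumed to mean the HTTP default of 1.

```python
def parse_accept(header):
    """Split an Accept header into (media_type, q) pairs, best first."""
    entries = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        pieces = [p.strip() for p in part.split(";")]
        media_type = pieces[0]
        q = 1.0  # default quality when no q parameter is given
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    pass  # keep the default on garbage
        entries.append((media_type, q))
    # sorted() is stable, so entries with equal q keep their header order
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


sindice = "application/rdf+xml, application/xml;q=0.6, text/xml;q=0.6"
for media_type, q in parse_accept(sindice):
    print(media_type, q)
```

Fed the Disco header from above, this sketch treats `application/xml q=0.5` as a media type of its own with q=1, because the semicolon is missing.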

Tabulator does N3

March 17th, 2008

In my podcasted chat with Danny Ayers the other day I said that Tabulator doesn’t support N3, the highly readable RDF serialization syntax developed by Tim Berners-Lee.

It turns out I was wrong. I thought Tabulator supported RDF/XML only, but it has excellent support for N3 as well. I’m not sure how I managed to miss this. Seems like the Tab’r team sneaked that feature in while no one (or at least not I) was looking!

To make Tabulator eat your N3, you just have to make sure it’s served with the right content type: text/rdf+n3. If you publish N3, you can test this by installing cURL (see my earlier cURL for SemWebbers tutorial) and running:

curl -I "http://example.com/myfile.n3"

If your server is set up correctly, then the output contains a line like this:

Content-Type: text/rdf+n3; charset=utf-8

If you see e.g. text/plain instead, then your server is misconfigured. If you serve static files with Apache, you can fix this by adding this line to your Apache’s httpd.conf file or to a file called .htaccess in your DocumentRoot:

AddType text/rdf+n3;charset=utf-8 .n3

And then make sure that your filenames end in .n3.
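
The same check can be scripted. This is a small Python sketch under a few assumptions: the helper names are invented, and the comparison only looks at the bare media type, ignoring the charset parameter.

```python
import urllib.request


def media_type(content_type):
    """Return the media type of a Content-Type value, without parameters."""
    return content_type.split(";", 1)[0].strip().lower()


def served_as_n3(content_type):
    """True if the Content-Type value is the one Tabulator expects for N3."""
    return media_type(content_type) == "text/rdf+n3"


def fetch_content_type(url):
    """Send a HEAD request and return the Content-Type header (or '')."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.headers.get("Content-Type", "")


print(served_as_n3("text/rdf+n3; charset=utf-8"))  # True
print(served_as_n3("text/plain"))                  # False
```

To test a live file, call `served_as_n3(fetch_content_type(url))` with the URL of your own N3 document.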

Actually, this is really good news. As far as I’m concerned, the Tabulator Firefox extension, being the most readily accessible Semantic Web client currently out there, defines what is on the Semantic Web and what isn’t. If you can’t browse it in Tabulator, then it isn’t. (Insert caveat about RDFa and SPARQL here, which Tabulator probably will support in the future.)

I really like N3, because it is a much friendlier format than the alternative RDF/XML. It is very easy to read, generate, and even hand-write. More so than, say, JSON. A Semantic Web built on N3 would be a much nicer place than a Semantic Web built on RDF/XML. Tool support for N3 is still not quite as good as for RDF/XML, but we are getting there.

So, what does this mean in practice?

  1. Do you develop or maintain an application that consumes RDF from the Web? If you do, then you should make sure that it understands both RDF/XML and N3.
  2. Do you develop or maintain a library, framework or API that can load and parse RDF from the Web, from a URI? If you do, then you should make sure that it invokes the right parser, depending on the Content-Type header of the response. This should happen completely transparently from your users’ point of view.
  3. Do you author educational material, tutorials or slides that use RDF in examples? Do your audience a favor and do them in N3, not RDF/XML!
  4. If you already produce RDF/XML, as an information publisher, you shouldn’t worry. RDF-aware clients won’t stop supporting RDF/XML.
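
Point 2 boils down to a lookup from media type to parser. A minimal Python sketch, assuming a library with two parsers; the names "rdfxml" and "n3" are placeholders for whatever your library actually calls them:

```python
# Hypothetical mapping from media type to the name of a parser.
PARSERS = {
    "application/rdf+xml": "rdfxml",
    "text/rdf+n3": "n3",
}


def choose_parser(content_type):
    """Pick a parser from the Content-Type header of an HTTP response."""
    # Drop parameters such as "; charset=utf-8" before the lookup.
    media_type = content_type.split(";", 1)[0].strip().lower()
    try:
        return PARSERS[media_type]
    except KeyError:
        raise ValueError("no RDF parser registered for " + repr(media_type))


print(choose_parser("text/rdf+n3; charset=utf-8"))
print(choose_parser("application/rdf+xml"))
```

The caller never sees this dispatch; it just asks the library to load a URI.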

Making these things happen in the tools and documents that I myself maintain will be quite a bit of work. But I think it’s worth it. N3 is a bit like the old days of HTML: you can actually view source and understand what’s going on. N3 is the human-friendly way of writing RDF.

QOTD: timbl is a document

January 24th, 2008

Simon Spero on the SKOS list:

The meaning of “document” in this context is extremely broad; if we follow Otlet’s definition of a document as anything which can convey information to an observer, the term would seem to cover anything which can have a subject.

By this standard, timbl is a document, but only when someone’s looking.

Ah, the Semantic Web community! Please leave your common sense at the door …

I named 61 HTML elements in 5 minutes.

November 26th, 2007

61

(via Dominik Wagner)

A challenge for semantic query wizards

November 13th, 2007

Bernard Vatant:

A challenge for semantic query wizzards: Find the smallest connected graph having at least a node in each LOD data set.

Interesting. The hardest part might be finding paths through the FOAF bubble.

Any takers?

Finding Rahoon

November 13th, 2007

Rahoon is a nondescript area of Galway, the third-largest city in the Republic of Ireland. Unsurprisingly, I had never heard about Rahoon before last week, when I moved there, to join Giovanni Tummarello’s team at DERI Galway.

Moving is stressful. But being an RDF geek, I not only have to drag physical stuff around, but I also have to update a few triples. For example, by joining DERI I got yet another URI:

http://www.deri.ie/fileadmin/scripts/foaf.php?id=313#me

That’s a new owl:sameAs for my FOAF file. I also have to update my foaf:based_near triple. Until now, my FOAF file stated:

<#cygri> foaf:based_near <http://dbpedia.org/resource/Berlin> .

The obvious new value for this property would be http://dbpedia.org/resource/Galway. But I want to be a bit more specific. That’s where this post’s title comes in. What is the URI of Rahoon?

The big datasets: Unfortunately, Wikipedia doesn’t have an article for Rahoon, hence it isn’t in DBpedia.

Then there’s Geonames, the most comprehensive source of geographical information on the Semantic Web. It has an entry for a nearby structure called Rahoon House, but not for the area itself.

The search engines: Next I tried all the Semantic Web search engines from the Linking Open Data project’s list.

Of the seven available services, only three produced any results. I was quite disappointed that the venerable Swoogle, one of the first large-scale Semantic Web indices, didn’t return any results at all.

SWSE (a DERI project) returned two hits, but neither of them was semantic (a web page mentioning Rahoon and a Bollywood-related RSS item).

Sindice (disclosure: it’s another DERI project, and I am now a team member) did much better. It found Geonames’ Rahoon House and a bunch of loosely Rahoon-related things from DBpedia, but it didn’t turn up any URI for the Rahoon area itself.

Finally there is Falcons, a recently announced project developed at Nanjing’s Southeast University. Its results are similar to Sindice’s, except that it missed the Geonames entry.

In summary, only Falcons and Sindice found anything of interest, but neither struck gold.

Mint your own? At this point, I perhaps have to accept that the only existing relationship between RDF and Rahoon is the fact that Giovanni has been living here for a while, and that I’m not going to find a URI for the place.

The usual advice at this point is to mint a new URI for the thing in question. But I don’t want to go down this road, because I simply do not feel sufficiently competent in matters of Irish geography. What “is” Rahoon, really? An administrative area? A geographical region? A postal code? A bus stop? I don’t know.

Based near a blank node: My solution is to ignore the widely accepted wisdom that RDF blank nodes are considered harmful. I will state that I’m living near something, and describe that something as well as I can:

foaf:based_near [
    a pos:Point;
    pos:lat "53.27702";
    pos:long "-9.09019";
    rdfs:label "125 Rosan Glas, Rahoon, Galway, Ireland";
    geo:parentFeature <http://sws.geonames.org/2964180/>
];

The skeleton of this N3 fragment was easily created using my FOAF geolocator (which I recently extended with address search and N3 output — have a look at it if your FOAF file still lacks geolocation!). I added a label and a link to the next-largest Geonames feature (Galway), which I easily found with Sindice.

Rahoon is still without a URI, but I guess I should let it be for now and rather worry about applying for a social security number, registering for taxation, and so forth.

DERI! At any rate, I’m happy to be here and look forward to working with a great team on some very exciting projects.

LazyWeb request

April 11th, 2007

a) A Sudoku solving web service.
b) A Sudoku generating web service.
c) Hook them up to each other.

Objectviewer: Yet another linked data browser

April 3rd, 2007

Via Troy Self’s introduction to the Linking Open Data list I came across the Objectviewer, yet another Semantic Web browser based on the linked data principles. This increases the number of available Semantic Web browser prototypes to four: Tabulator, Disco, the OpenLink Ajax Toolkit browser, and now ObjectViewer.

ObjectViewer has quite a nice visualization of the browsed resource as a simple graph, which isn’t really all that useful in practice, but always makes for stunning demos. A live example on some DBpedia data is here (the browser chrome is missing; I couldn’t figure out how to make a proper direct link), and a screenshot is below.

Objectviewer screenshot

Webby data everywhere!

Neil Bartlett: “StatSVN helps startups get funded”

March 15th, 2007

Neil Bartlett has an interesting take on StatSVN and StatCVS:

One problem that startup companies often have is demonstrating to investors that they’re actually doing something productive rather than just pouring away money on office plants, Herman Miller chairs, and playing foosball all day. … One thing you can do is show the evolution of your code over a period of time using a tool like StatSVN.

Lines of code are certainly not the most meaningful numbers, but they are a nice and simple way of demonstrating activity. Sometimes that’s all you need.

Less code: eRDF templates for RDF-driven web sites

March 15th, 2007

Keith Alexander experiments with using eRDF markup to populate HTML templates:

I was writing a php template, marking it up with eRDF, and I realised that what I was doing was describing variables with triples - which is essentially what I would be doing to write a SPARQL query to retrieve data for the template.

So the core of the idea is: using semantic markup in a template to generate queries, retrieve data and populate the template.

I have started to implement the idea, using eRDF for the semantic markup, SPARQL as the query language I generate to, and Smarty as the templating language. (I use the ARC RDF PHP classes for parsing the eRDF into triples, and for running the SPARQL queries).

Keith has blogged this in much more detail here, including code and template samples.

This is quite a clever idea. Let’s assume you have a web application driven by data from an RDF triple store. You generate HTML pages by querying the triple store and inserting the bits and pieces into an HTML template. Now if you add eRDF or RDFa annotations to the HTML template, in a way that reflects the original RDF data, then by definition the annotations completely specify what data you need to populate the page. And the template itself therefore must be sufficient to extract all the required triples from the store. No coding needed!

So, generalising the approach and glossing over many details: Just take a big ball of RDF data (dump or behind a SPARQL endpoint), and throw a bunch of HTML templates with embedded annotations at it, and you get a dynamic web site without writing any code. And the web site will have complete semantic annotations.
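
To make the core step concrete, here is a rough Python sketch of turning triple patterns, as they might be extracted from an annotated template, into a SPARQL query. This is not Keith’s implementation (he uses PHP, ARC and Smarty); the function name `build_query` and the tuple representation of patterns are assumptions for illustration.

```python
def build_query(patterns, prefixes=None):
    """Build a SPARQL SELECT query from (subject, predicate, object) patterns.

    Terms starting with "?" are variables; everything else is copied into
    the query as written. The sketch assumes at least one variable.
    """
    variables = []
    for triple in patterns:
        for term in triple:
            if term.startswith("?") and term not in variables:
                variables.append(term)
    lines = ["PREFIX %s: <%s>" % (name, uri)
             for name, uri in (prefixes or {}).items()]
    lines.append("SELECT " + " ".join(variables))
    lines.append("WHERE {")
    for subject, predicate, obj in patterns:
        lines.append("  %s %s %s ." % (subject, predicate, obj))
    lines.append("}")
    return "\n".join(lines)


# Patterns a template showing a person's name and homepage might contain.
patterns = [
    ("?person", "foaf:name", "?name"),
    ("?person", "foaf:homepage", "?homepage"),
]
print(build_query(patterns, {"foaf": "http://xmlns.com/foaf/0.1/"}))
```

The query results would then be bound back to the same places in the template that the patterns came from.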

That’s an example of what becomes possible after you’ve paid the RDF tax.

Keith points out that this is similar to what Fresnel is designed to do, but I have to say that I find this template-based approach more appealing.

(Via [simile-general])

Trilingual word mashup

March 13th, 2007

The German readers will appreciate my mix of surprise and horror when I realized I had just typed this word in an email:

folksonomymäßig

A word that is certain to hurt the sensibilities of every lover of either the English, Latin, or German language. Now if I manage to sprinkle a little bit of French into the mix …

SPARUL—SPARQL Update Language

March 9th, 2007

Andy Seaborne announces a first draft of SPARUL, the SPARQL/Update Language:

This document describes SPARQL/Update (nicknamed “SPARUL”), an update language for RDF graphs. It uses a syntax derived form SPARQL. Update operations are performed on a collection of graphs in a Graph Store. Operations are provided to change existing RDF graphs as well as create and remove graphs with the Graph Store. A binding of SPARQL/Update using HTTP POST is described.

Max Völkel and I wrote a very rough proposal for a similar language last year. We received some criticism over this: Tunneling application protocols over HTTP is not an optimal use of the web. Case in point: the WS-* stack. I tried to work out the issues by asking what RESTful SQL would look like, a potentially illuminating analogy. I found the results inconclusive: I understand the concerns raised by REST proponents, but haven’t seen a better alternative.

The main question, I think, is one of scope: Is SPARQL Update intended as an SQL-like language that applications use to communicate with their local or nearby data store? Or is it intended as public web infrastructure, similar to Web 2.0 APIs and HTTP PUT?

The SPARUL proposal doesn’t really take a position here, although this might be interpreted as a nod towards the former:

An update service that is separate from the query service has the advantages that different security mechanisms can be applied and that the query interface remains a legal, SPARQL service.

So, public query service and local update service?

An answer to all (well, some) of your URI questions

March 5th, 2007

  • Aren’t URNs much more elegant than those brittle HTTP URIs?
  • Why is everyone yapping about 303 redirects?
  • Hash vs. slash?
  • What’s the deal with content negotiation and the Semantic Web?
  • Shouldn’t we use blank nodes anyway?

There’s a lot of confusion around URIs on the Semantic Web. You have to do quite a bit of reading and trial-and-error to arrive at effective solutions. Leo, Max and I wrote Cool URIs for the Semantic Web (Leo’s announcement) to take some of the pain out of this process.

A couple of random companion posts from my archives:

And, always worth a link:

URIs for exceptions?

March 4th, 2007

Over in the comments to Henry Story’s bug ontology post, I wrote:

There should be RDF representations of program error reports, such as Java exceptions. Then I could SPARQL for “NullPointerException in class so-and-so of project foobar”, and possibly a solution has been filed, or at least I will find a related bug.

Drew Perttula adds:

As to Richard’s “RDF representations of program error reports”, see http://themongoose.sourceforge.net for one of several projects that hash up the stack trace into an error id. Those seem like they could lead to excellent automatic URLs which can be later associated with the tracking of the bug that makes that stack trace. I’d love to get an error and paste its url directly in my browser to see “this error has [n] frequency in the last few weeks; [these] other users have been experiencing it; [this] developer is working on the bug fix, and the details for that bug are [here]”.

This would be very useful and is entirely doable. Exceptions should have URIs that resolve to the project’s issue tracker or web-based support forum.
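
A minimal Python sketch of the idea, under loud assumptions: the base address http://example.com/errors is hypothetical, the frame format imitates Java stack traces, and the hashing scheme is invented here; it is not how the project Drew mentions computes its ids.

```python
import hashlib
import re


def error_id(exception_class, frames):
    """Hash an exception class and its stack frames into a short id.

    Line numbers are stripped first, so the id survives small edits
    to the source files.
    """
    normalized = [re.sub(r":\d+\)$", ")", frame) for frame in frames]
    text = exception_class + "\n" + "\n".join(normalized)
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:12]


def error_uri(base, exception_class, frames):
    """Build a URI for the error below a (hypothetical) tracker address."""
    return base.rstrip("/") + "/" + error_id(exception_class, frames)


frames = [
    "com.example.foobar.Parser.parse(Parser.java:42)",
    "com.example.foobar.Main.main(Main.java:7)",
]
print(error_uri("http://example.com/errors",
                "java.lang.NullPointerException", frames))
```

The issue tracker would answer requests for such a URI with whatever it knows about bugs that produce the same trace.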

Dr. Chris Bizer

February 15th, 2007

Congrats, Chris!

Getting FOAF files from the desktop to the Web

February 8th, 2007

Henry Story considers design alternatives for a FOAF-enabled personal address book that works as a desktop application. How will it publish the user’s FOAF profile to the Web?

The first scenario considered by Henry is an individual who wants to publish to her own webspace. Here, in my eyes, FTP is king. Henry is right when he says:

[FTP is] a little tricky for the end user as he would have to understand the relation between the directory structure of the ftp server and its mapping to the web server urls.

But FTP is everywhere, and Web geeks are able to figure it out. This 75% user experience will be much better than the current “write RDF/XML by hand or use FOAF-a-matic” approach. (Anyway, it’s what I use to publish my FOAF file.)

The next thing could be WebDAV because it is fairly common and could provide a 90% user experience. As for the other options: scp has not enough users, APP is still too obscure, and HTTP PUT has already failed in the marketplace.

Henry also wonders about server configuration. Servers have to be set up for the correct MIME type, 303 redirects and so on. This has to be done differently depending on the server.

Don’t bother. Put foaf.rdf on the server, take foaf.rdf#me as the person’s URI. When this works, then you can think about adding server type detection code and a “Use cool URIs” checkbox that drops the proper .htaccess file on the server. Keep in mind what’s possible.

The enterprise is Henry’s second scenario:

These companies already have a huge amount of information on their employees in their ldap directory. This is reliable and authoritative information, and should be used to generate foaf files for each employee. … Now the question is: should this foaf file be read only or read/write? If it is read/write then an agent … could overwrite the file with different information from that stored in ldap, which could cause confusion, and be frowned upon.

Both the user’s desktop application and the company’s LDAP server can contribute useful information. How to combine them? Henry suggests two solutions. The first one – the server could compare the client’s file to its own data, and reject any contradictory bits – doesn’t convince me; it puts too much complexity on the server.

The second one is an external link in the read-only company-generated RDF. It points to another RDF file that can be edited by the desktop application just as in the other scenario. I like it. And there’s already a perfect property for the link: the good old rdfs:seeAlso.

Multiple files with links between them are much simpler than files of mixed ownership. That’s why we call it Linked Data.

It may seem easy, but you have to practice before you become good at it

February 8th, 2007

Ze Frank explains the art of procrastination.

Procrastinating may seem easy, but you have to practice a bit before you become good at it. … Beginning procrastinators should start with small, solvable tasks that are related to, but not identical to the thing that’s being put off … If you get very good at procrastinating, you will find that you have many things you want to put off at once! If this happens, you are ready for more generalized procrastination techniques that can be applied to any situation. These are called addictions. …

Insightful and funny, as usual.

My top three favourite procrastination techniques:

  1. Get an RSS reader and build a huge subscription list.
  2. Get an instant messenger and build a huge buddy list.
  3. Start a blog. For bonus points, start a blog on procrastination.

Apple - Thoughts on Music

February 7th, 2007

Steve Jobs: Thoughts on Music

The third alternative is to abolish DRMs entirely. Imagine a world where every online store sells DRM-free music encoded in open licensable formats. In such a world, any player can play music purchased from any store, and any store can sell music which is playable on all players. This is clearly the best alternative for consumers, and Apple would embrace it in a heartbeat. If the big four music companies would license Apple their music without the requirement that it be protected with a DRM, we would switch to selling only DRM-free music on our iTunes store.

(via Doc Searls)

Debugging Semantic Web sites with cURL

February 6th, 2007

Here at our group we spend a lot of time preaching the benefits of dereferenceable URIs. We often want to know if a certain URI supports all the fancy HTTP tricks that are the cornerstones of RDF publishing best practices, like 303 redirects and content negotiation.

My tool of choice for this is cURL, a command-line HTTP client that makes a useful addition to any Semantic Web developer’s toolbox. This tutorial shows how to use cURL to test Semantic Web URIs and to diagnose some common problems.

Getting cURL: Windows users can get cURL binaries from here; the first “non-SSL binary” version will work. Find curl.exe in the archive and drop it somewhere on the path, e.g. in C:\Windows. On Mac OS X and most Linux versions cURL is pre-installed.

To test cURL, open a command prompt and invoke

curl http://example.com/

You should see the HTML source code of the Example Web Page.

So let’s see some of the things we can do with cURL.

Checking content types: On the Web, content types are used to distinguish between content in different formats, e.g. human-readable HTML (Content-Type: text/html) and machine-readable RDF/XML data (Content-Type: application/rdf+xml). When you request a URI, the server sends the content type and other HTTP headers along with the response. Many Semantic Web clients don’t work properly unless RDF content is served with the correct content type.

To check this with cURL, use the -I parameter. This will show the HTTP headers sent by the server.

curl -I http://sites.wiwiss.fu-berlin.de/suhl/bizer/foaf.rdf

The URL is the FOAF file of Chris Bizer. Result:

HTTP/1.1 200 OK
Content-Length: 13746
Content-Type: application/rdf+xml
Last-Modified: Thu, 18 Jan 2007 10:27:22 GMT
Accept-Ranges: bytes
ETag: "bf3d723deb3ac71:54d"
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Date: Tue, 06 Feb 2007 10:52:51 GMT

The important line is the Content-Type header. We see that the file is served as application/rdf+xml, just as it should be. If we saw text/plain here, or if the Content-Type header were missing, then the server configuration would need fixing.

Checking for 303 redirects: RDF publishers often use 303 redirects to distinguish between URLs for Web documents and URIs for Semantic Web resources. The idea is that when I fetch the URI of a non-document thing (e.g. a person or country or OWL class), then the response will send me to the location of a document describing the thing. Let’s see if the FOAF vocabulary correctly implements 303 redirects. What happens if I fetch foaf:knows?

curl -I http://xmlns.com/foaf/0.1/knows

Response:

HTTP/1.1 303 See Other
Date: Mon, 05 Feb 2007 19:09:55 GMT
Server: Apache/1.3.37 (Unix)
Location: http://xmlns.com/foaf/0.1/
Content-Type: text/html; charset=iso-8859-1

There’s the 303 status code, and the Location header gives the URL of the document that describes the foaf:knows property, in this case the FOAF specification.

If we got a 200 OK status code instead, then the URI would need fixing because foaf:knows is an RDF property and not a document.
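
If you check many URIs, the reading of the `curl -I` output can be automated. The following Python sketch makes some assumptions: the function names are made up, and it handles a single response head only, so it won’t cope with the several heads that cURL prints when it is told to follow redirects.

```python
def parse_response_head(text):
    """Parse the output of `curl -I` into (status_code, headers)."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    status = int(lines[0].split()[1])  # "HTTP/1.1 303 See Other" -> 303
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers


def check_non_document_uri(text):
    """Judge the response for a URI that names a non-document thing."""
    status, headers = parse_response_head(text)
    if status == 303 and "location" in headers:
        return "ok: 303 redirect to " + headers["location"]
    if status == 200:
        return "needs fixing: 200 OK for a non-document URI"
    return "unexpected status %d" % status


sample = """HTTP/1.1 303 See Other
Date: Mon, 05 Feb 2007 19:09:55 GMT
Server: Apache/1.3.37 (Unix)
Location: http://xmlns.com/foaf/0.1/
Content-Type: text/html; charset=iso-8859-1
"""
print(check_non_document_uri(sample))
```

The sample is the foaf:knows response from above; in practice you would pipe cURL’s output into the script.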

Content negotiation: Good Semantic Web servers are configured to do another trick: They will redirect Semantic Web browsers to RDF documents, while plain old Web browsers are sent to HTML documents. To simulate a Semantic Web browser, we have to send an HTTP header Accept: application/rdf+xml along with the request. This is done using cURL’s -H parameter:

curl -I -H "Accept: application/rdf+xml" http://www4.wiwiss.fu-berlin.de/dblp/resource/person/103481

Response:

HTTP/1.1 303 See Other
Date: Tue, 06 Feb 2007 11:23:55 GMT
Server: Jetty/5.1.10 (Windows 2003/5.2 x86 java/1.5.0_09
Location: http://www4.wiwiss.fu-berlin.de/dblp/sparql?query=DESCRIBE+%3Chttp%3A%2F%2Fwww4.wiwiss.fu-berlin.de%2Fdblp%2Fresource%2Fperson%2F103481%3E
Content-Type: text/plain

If we send the same request without the header, we get:

HTTP/1.1 303 See Other
Date: Tue, 06 Feb 2007 11:25:20 GMT
Server: Jetty/5.1.10 (Windows 2003/5.2 x86 java/1.5.0_09
Location: http://www4.wiwiss.fu-berlin.de/dblp/page/person/103481
Content-Type: text/plain

And checking the two locations we will find that the first one serves RDF/XML, while the second one serves HTML.
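
The two requests can also be compared in a script. This is a sketch in Python using only the standard library; the function names are invented, and the rule “both answers are 303s with different Location headers” is a simplification of what counts as working content negotiation.

```python
import http.client
from urllib.parse import urlsplit


def redirect_target(uri, accept=None):
    """HEAD the URI without following redirects; return (status, Location)."""
    parts = urlsplit(uri)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    headers = {"Accept": accept} if accept else {}
    try:
        conn.request("HEAD", path, headers=headers)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def negotiates(with_accept, without_accept):
    """True if both answers are 303s that point at different documents."""
    (status_a, location_a), (status_b, location_b) = with_accept, without_accept
    return (status_a == 303 and status_b == 303
            and location_a is not None and location_b is not None
            and location_a != location_b)
```

For a URI of your own, `negotiates(redirect_target(uri, "application/rdf+xml"), redirect_target(uri))` compares the two answers. You still have to fetch both locations to confirm that one really serves RDF/XML and the other HTML.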

Summary: So here’s how to examine URIs with cURL.

Check the contents that a normal web browser will see:

curl <uri>

Check the response headers that a normal web browser will see:

curl -I <uri>

Check the contents that a Semantic Web browser will see:

curl -H "Accept: application/rdf+xml" <uri>

Check the response headers that a Semantic Web browser will see:

curl -I -H "Accept: application/rdf+xml" <uri>

You can’t tell if a URI will work on the Semantic Web just by opening it in a Web browser. But you can tell with cURL.

The Web in five minutes

February 4th, 2007

This excellent (in content and style) short video by Michael Wesch is making the rounds. It perfectly captures the essence of the Web in its current state, circa 2007.

I’d love to see this extended with another 30 seconds devoted to RDF. Or maybe not, because RDF on the Web is still more a vision than a reality, but we are getting there …

(And a whacky prediction: This kind of fast, visual propaganda flick will be the PowerPoint of the future.)

(via Christian Katzenbach)

(Oops, I got Michael’s name wrong, now fixed.)