This started as a comment on Danny Ayers’ Crystal post, in which he defends RDF against Les Orchard’s punch:
Why don’t we have a Xanadu web run on Lisp serving up perfect, crystalline RDF?
Danny:
RDF is pretty much at the point of the minimum structure needed to express data – the simple 3-part statement.
I somewhere have a hefty printout that vehemently disagrees with this assertion. It’s called “RDF Semantics.”
If all you want is the minimum structure needed to express data, then that’s about 45 pages of semantic conditions too much. True, I don’t have to know all of them to use RDF. But think of the implementers!
And don’t even get me started on RDF/XML.
RDF is in the Xanadu-Lisp-SGML-ISO/OSI camp. Let’s figure out which parts of it are fit for the purpose and which are not.
[Update:] Danny answers, and I continue the thread. But Bill de hÓra says it best:
“The model itself is conceptually simple”
Yeah, but if what you said was true, we’d be awash in RDF backed systems. We’re not. Crap like RSS and microformats can come along and wipe the floor with RDF in terms of deployments. RDF is not getting adopted en masse and RDF is as old as XML as makes no difference. Something’s fundamentally wrong.
And invoking the benefits of the open world assumption won’t change these facts. RDF is not getting adopted en masse. Something’s fundamentally wrong.
Actually, it’s not the complexity of *understanding* the SemWeb that I think will kill it.
Although I do agree it’s pretty damned hard to understand. Ultimately I think the SemWeb people have a point when they say “relational database theory is hard to understand too”.
The complexity of understanding *how* to do the SemWeb can be hidden by technology and practices. A good triple-manipulation tool, and some easily understood conventions (like the first-three “normal forms” in database design) can make that part fairly straightforward.
As a comparison, how different is doing the data-model for a SemWeb application and doing a UML model for some object oriented system? (Aside : I wonder whether anyone considers adapting Poseidon etc. for SemWeb modelling.)
What I think is the more serious SemWeb problem is the complexity of understanding *what* to do. Actually doing the data-modelling itself.
The SemWeb only makes sense when we consider using it for data-modelling “in the abstract”, i.e. independently of any particular application. Built into the ideal of the SemWeb is that you will sit down with all the people interested in your application area and thrash out the appropriate data-model between you. (Remember Shirky’s comment about politics masquerading as technology.) And, of course, a general model for all applications (including future ones yet to be written) is much harder than a model for your more-or-less-understood application here-and-now.
Now, of course I know that you *can* use RDF for application-specific modelling, by yourself, with no standards body involved, but then it offers no practical advantage over just using plain-old-XML for your file-format. (Try to name a practical advantage and you’ll notice that they all assume someone else is sharing the same data-model as you.)
This is why it’s not getting adopted en masse. The majority of people are building data-models for *their* applications. They have zero incentive to do any *extra* work required by the SemWeb. They don’t need it. (This is what’s different from the RDBMS case where data-modelling *is* focused on the application itself.)
As there’s no practical advantage to using the SemWeb for its own sake, tools would have to be substantially better and easier than your existing ones to persuade you to switch to using them.
Phil, thanks for the comment! You’re right, modelling with RDF and OWL isn’t all that much harder than with UML for OOP, and UML is indeed often used by semantic web types in documentation and tutorials, although I don’t know if any UML editor emits OWL.
But that was not what I meant. I, for example, do understand RDF just fine. But I still find working with it really hard. And I think that’s because RDF has plenty of weird idiosyncrasies that make simple data integration problems way harder than necessary.
Your point about shared or private data models doesn’t convince me. RDF should make sense whenever you want to share your data. And there’s plenty of evidence that lots of folks want to do that. Just have a look at REST, XML-RPC and all the Web 2.0 APIs. Publishing data with RDF doesn’t mean you have to sit down with anybody to work out a shared model. Just make up your own, reusing existing vocabulary where possible. If your data is interesting then folks will use it. You don’t have to buy into someone else’s data model to consume some RDF. So let’s talk about data, not data models.
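A rough sketch of what that could look like, assuming triples are modelled as plain Python tuples instead of going through a real RDF library. The two sites, their example.org URIs and their private properties are made up for illustration; only the Dublin Core title URI is an existing vocabulary term.

```python
# Each publisher invents its own properties but reuses one existing term.
DC_TITLE = "http://purl.org/dc/elements/1.1/title"  # Dublin Core "title"

# Hypothetical data from two unrelated publishers, as (subject, predicate, object).
site_a = {
    ("http://a.example.org/post/1", DC_TITLE, "Crystal"),
    ("http://a.example.org/post/1", "http://a.example.org/vocab/mood", "grumpy"),
}
site_b = {
    ("http://b.example.org/item/7", DC_TITLE, "RDF Semantics"),
    ("http://b.example.org/item/7", "http://b.example.org/vocab/pages", "45"),
}

# Merging two sets of triples is a plain set union; no shared model was negotiated.
merged = site_a | site_b

def titles(graph):
    """Return all titles, ignoring every property the consumer does not know."""
    return sorted(o for s, p, o in graph if p == DC_TITLE)

found = titles(merged)
```

The consumer here only understands one property and simply skips the rest, which is the sense in which it does not have to buy into either publisher's full data model.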
Your argument sounds very much like someone dissing the nascent WWW in 1993 by saying: “The majority of people are writing documents for their use. They have zero incentive to do any extra work required by the WWW. They don’t need it.” Which, of course, was a reasonable thing to say in 1993, but was spectacularly wrong with hindsight.
(Possibly you refer to the “upper case” Semantic Web involving ontology engineering and reasoning. There’s a whole bunch of reasons why this doesn’t take off on a large scale, and I don’t care at all. My post was about plain RDF.)
OK, I half accept your criticism.
If there were literally no difference in using RDF vs. using Plain Old XML then I’d expect choosing one or the other to be pretty much random for internal projects. So, I agree, it is the extra difficulty of using RDF over POX in practice which holds it back.
I think the difficulty in RDF comes from three sources :
<fribble>kdfdksh</fribble>
<boonk>2</boonk>
”
which took me all of about 30 seconds to think up. I do have to think a little bit more to use RDF. So this is another difficulty.
Good tools and documentation for RDF would solve some of these difficulties. But it’s really hard to see how they could solve them entirely. I’d say all are at least flavoured by the upper-case SemWeb culture.
I’m not sure I see the difference. I suspect what you mean by data, not data-models, is thinking of your data as little meaningful atoms. “Hey, this string is marked up as a ‘dc:name’ tag, I know what that is” without needing to worry about the shape of the container it occurred in.
If so, I couldn’t disagree more. My major scepticism of the SemWeb project is that it suggests you can apply meanings to the individual atoms outside the context of their containers.
Tosh! :-p
No one would say that about the nascent web, because it was solving an existing problem : how physicists could share documents. People nearly always write documents for other people to read. And physicists already had an existing need to communicate with each other.
Secondly, how much expectation was there for it to “take off” in other application fields or become a mainstream phenomenon?
The SemWeb is playing for different stakes in a different environment. Its advocates claim it will create a revolution as profound as (if not more so than) the original web. And it has a direct rival in the form of “naive” / “funky” / POXy data-formats which seem to be more successful in creating that revolution.
Good points about why RDF is difficult. It’s true that the grandiose promises from the “upper case” SemWeb folks can distract from the more practical uses for RDF.
I don’t see why it shouldn’t be possible to treat many (not all) pieces of data as isolated atoms (or triples, as we would say). We identify things with URIs, and that’s expensive in terms of complexity, but it buys us some cool things. The ability to just dangle any new property off an existing URI is one of them.
(I would agree that the URI-as-identifier approach in RDF is half-baked, and you need an awful lot of experience or best practices to make it work properly.)
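To make the “dangle a new property off an existing URI” point concrete, here is a minimal sketch. It assumes a graph is nothing more than a Python set of (subject, predicate, object) tuples; the example.org URIs, the property names and the `properties_of` helper are all hypothetical and not part of any RDF toolkit.

```python
# A graph as a set of (subject, predicate, object) tuples.
# Every example.org URI below is invented for illustration.
ALICE = "http://example.org/people/alice"

graph = {
    (ALICE, "http://example.org/vocab/name", "Alice"),
}

def properties_of(graph, subject):
    """Collect {predicate: set of objects} for one subject URI."""
    result = {}
    for s, p, o in graph:
        if s == subject:
            result.setdefault(p, set()).add(o)
    return result

# Later, anyone can attach a new property to the same URI.
# The existing statement is untouched and there is no schema to migrate.
graph.add((ALICE, "http://example.org/vocab/blog", "http://example.org/blog"))

props = properties_of(graph, ALICE)
```

The price, as noted above, is that everything hinges on both parties using the same URI for the same thing.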
I have put a 30-minute introductory video online. It could be shortened quite a lot. What will really make the difference is when someone opens up a valuable database to SPARQL. That will immediately get a lot of people to play with and learn RDF.