No, it’s not simple

This started as a comment on Danny Ayers’ Crystal post, in which he defends RDF against Les Orchard’s punch:

Why don’t we have a Xanadu web run on Lisp serving up perfect, crystalline RDF?

Danny:

RDF is pretty much at the point of the minimum structure needed to express data – the simple 3-part statement.

Somewhere I have a hefty printout that vehemently disagrees with this assertion. It’s called “RDF Semantics.”

If all you want is the minimum structure needed to express data, then that’s about 45 pages of semantic conditions too much. True, I don’t have to know all of them to use RDF. But think of the implementers!

And don’t even get me started on RDF/XML.

RDF is in the Xanadu-Lisp-SGML-ISO/OSI camp. Let’s figure out which parts of it are fit for the purpose and which are not.

[Update:] Danny answers, and I continue the thread. But Bill de hÓra says it best:

“The model itself is conceptually simple”

Yeah, but if what you said was true, we’d be awash in RDF backed systems. We’re not. Crap like RSS and microformats can come along and wipe the floor with RDF in terms of deployments. RDF is not getting adopted en masse and RDF is as old as XML as makes no difference. Something’s fundamentally wrong.

And invoking the benefits of the open world assumption won’t change these facts. RDF is not getting adopted en masse. Something’s fundamentally wrong.


5 Responses to No, it’s not simple

  1. phil jones says:

    Actually, it’s not the complexity of *understanding* the SemWeb that I think will kill it.

    Although I do agree it’s pretty damned hard to understand. Ultimately I think the SemWeb people have a point when they say “relational database theory is hard to understand too”.

    The complexity of understanding *how* to do the SemWeb can be hidden by technology and practices. A good triple-manipulation tool, and some easily understood conventions (like the first-three “normal forms” in database design) can make that part fairly straightforward.

    As a comparison, how different is doing the data-model for a SemWeb application and doing a UML model for some object oriented system? (Aside : I wonder whether anyone considers adapting Poseidon etc. for SemWeb modelling.)

    What I think is the more serious SemWeb problem is the complexity of understanding *what* to do. Actually doing the data-modelling itself.

    The SemWeb only makes sense when we consider using it for data-modelling “in the abstract”, i.e. independently of any particular application. Built into the ideal of the SemWeb is that you will sit down with all the people interested in your application area and thrash out the appropriate data-model between you. (Remember Shirky’s comment about politics masquerading as technology.) And, of course, a general model for all applications (including future ones yet to be written) is much harder than a model for your more-or-less-understood application here-and-now.

    Now, of course I know that you *can* use RDF for application-specific modelling, by yourself, with no standards-body involved, but then it offers no practical advantage over just using plain-old-XML for your file-format. (Try to name a practical advantage and you’ll notice that they all assume someone else is sharing the same data-model as you.)

    This is why it’s not getting adopted en masse. The majority of people are building data-models for *their* applications. They have zero incentive to do any *extra* work required by the SemWeb. They don’t need it. (This is what’s different from the RDBMS case where data-modelling *is* focused on the application itself.)

    As there’s no practical advantage to using the SemWeb for its own sake, tools would have to be substantially better and easier than your existing ones to persuade you to switch to using them.

  2. Phil, thanks for the comment! You’re right, modelling with RDF and OWL isn’t all that much harder than with UML for OOP, and UML is indeed often used by semantic web types in documentation and tutorials, although I don’t know if any UML editor emits OWL.

    But that was not what I meant. I, for example, do understand RDF just fine. But I still find working with it really hard. And I think that’s because RDF has plenty of weird idiosyncrasies that make simple data integration problems way harder than necessary.

    Your point about shared or private data models doesn’t convince me. RDF should make sense whenever you want to share your data. And there’s plenty of evidence that lots of folks want to do that. Just have a look at REST, XML-RPC and all the Web 2.0 APIs. Publishing data with RDF doesn’t mean you have to sit down with anybody to work out a shared model. Just make up your own, reusing existing vocabulary where possible. If your data is interesting then folks will use it. You don’t have to buy into someone else’s data model to consume some RDF. So let’s talk about data, not data models.
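    As a rough illustration of “make up your own, reusing existing vocabulary where possible”, here is a toy sketch that treats data as bare (subject, predicate, object) tuples. It is not an RDF implementation and ignores everything in “RDF Semantics”. The `example.org` URIs, the `moodWhenWriting` property and the `values` helper are made up for the example; only the Dublin Core namespace is real.

```python
# Toy sketch: triples as plain Python tuples, no RDF library involved.
DC = "http://purl.org/dc/elements/1.1/"   # existing vocabulary (Dublin Core elements)
MY = "http://example.org/my-vocab#"       # hypothetical private vocabulary

post = "http://example.org/posts/1"       # hypothetical identifier for the thing described

# The publisher mixes reused terms with terms invented on the spot.
published = {
    (post, DC + "title", "No, it's not simple"),
    (post, DC + "creator", "an example author"),
    (post, MY + "moodWhenWriting", "grumpy"),
}


def values(triples, subject, predicate):
    """Return the sorted objects of all triples matching subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


# A consumer that only knows Dublin Core picks out what it understands
# and simply never asks about the private property.
titles = values(published, post, DC + "title")
unknown = values(published, post, "http://example.org/other#whatever")
```

    The consumer here never needs the publisher’s private terms; whether that holds up for real-world data is exactly what the rest of this thread argues about.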

    Your argument sounds very much like someone dissing the nascent WWW in 1993 by saying: “The majority of people are writing documents for their use. They have zero incentive to do any extra work required by the WWW. They don’t need it.” Which, of course, was a reasonable thing to say in 1993, but was spectacularly wrong with hindsight.

    (Possibly you refer to the “upper case” Semantic Web involving ontology engineering and reasoning. There’s a whole bunch of reasons why this doesn’t take off on a large scale, and I don’t care at all. My post was about plain RDF.)

  3. phil jones says:

    OK, I half accept your criticism.

    If there were literally no difference in using RDF vs. using Plain Old XML then I’d expect choosing one or the other to be pretty much random for internal projects. So, I agree, it is the extra difficulty of using RDF over POX in practice which holds it back.

    I think the difficulty in RDF comes from three sources :

    1. Confusion with the aims of upper-case SemWeb means that it’s hard to find documentation that doesn’t throw you in at the deep end. (Maybe someone should produce a “how to hack RDF in 10 minutes without thinking about what it really means” tutorial.)

    2. But one of the intentions of the SemWeb is to make you do more thinking about what your data “really means” up-front. I don’t see how to avoid this. I can’t just say “hey, my data contains a fribble and a boonk, so my data-model will be:”

        <fribble>kdfdksh</fribble>
        <boonk>2</boonk>

       which took me all of about 30 seconds to think up. I do have to think a little bit more to use RDF. So this is another difficulty.

    3. Re-using existing vocabulary where possible means going and researching what that existing vocabulary is. This is time-consuming: I have to find and read the documentation for the other vocabulary (which might be non-existent, too technical or badly written), evaluate whether it is suitable for my application, decide how to represent my application’s data using it (including translating my terminology into that of the pre-existing vocabulary), etc.

    Good tools and documentation for RDF would solve some of these difficulties. But it’s really hard to see how they could solve them entirely. I’d say all are at least flavoured by the upper-case SemWeb culture.
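    To make the fribble/boonk contrast concrete, here is a minimal sketch of the same data as plain XML and as triples. It assumes a wrapper element `<thing>` (added only so the snippet parses as one document), and the names `VOCAB` and `SUBJECT` with their `example.org` URIs are invented for illustration. The triples are bare tuples, not real RDF.

```python
import xml.etree.ElementTree as ET

# Plain-old-XML version: the wrapper <thing> is an assumption of this sketch.
pox = "<thing><fribble>kdfdksh</fribble><boonk>2</boonk></thing>"
root = ET.fromstring(pox)
pox_data = {child.tag: child.text for child in root}

# Triple version: two extra decisions are needed before any data exists.
VOCAB = "http://example.org/vocab#"       # hypothetical namespace for the property names
SUBJECT = "http://example.org/things/1"   # hypothetical identifier for the thing described

triples = {(SUBJECT, VOCAB + tag, text) for tag, text in pox_data.items()}

for s, p, o in sorted(triples):
    print(s, p, repr(o))
```

    The data is the same two values either way; the extra thinking goes into naming the subject and choosing where the property names live, which is the kind of up-front work described in the second point above.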

    You don’t have to buy into someone else’s data model to consume some RDF. So let’s talk about data, not data models.

    I’m not sure I see the difference. I suspect what you mean by “data, not data-models” is thinking of your data as little meaningful atoms: “Hey, this string is marked up as a ‘dc:name’ tag, I know what that is”, without needing to worry about the shape of the container it occurred in.

    If so, I couldn’t disagree more. My major scepticism of the SemWeb project is that it suggests you can apply meanings to the individual atoms outside the context of their containers.

    Your argument sounds very much like someone dissing the nascent WWW in 1993 by saying: “The majority of people are writing documents for their use. They have zero incentive to do any extra work required by the WWW. They don’t need it.” Which, of course, was a reasonable thing to say in 1993, but was spectacularly wrong with hindsight.

    Tosh! :-p

    No one would say that about the nascent web, because it was solving an existing problem : how physicists could share documents. People nearly always write documents for other people to read. And physicists already had an existing need to communicate with each other.

    Secondly, how much expectation was there for it to “take off” in other application fields or become a mainstream phenomenon?

    The SemWeb is playing for different stakes in a different environment. Its advocates claim it will create a revolution as profound as (if not more so than) the original web. And it has a direct rival in the form of “naive” / “funky” / POXy data-formats which seem to be more successful in creating that revolution.