Nodalities

From Semantic Web to Web of Data

License

Creative Commons License

Archive for the 'Conference Reports' Category

Lessons for Ontology Writers

I’m in a session called Taming the Open World, being run by Tim Swanson of Semantic Arts. I’m particularly interested in understanding how we can develop open world applications, and issue 2 of Nodalities contains an article by Nadeem Shabir on exactly this topic. Since I have power and wifi, both of which are in scarce supply, I thought I’d take the opportunity to liveblog the session. It looks like it’s going to be a contrast to the Metaweb/Freebase tutorial earlier, since we’re straight into OWL ontologies, running through the notation he’s going to be using. There is no standard notation for diagramming ontologies, which is pretty surprising considering the wealth of research activity in this area. It’s looking rather interesting… must see if I can find some examples on the web.

The first example has two classes, Contractor and Employee, and a single instance, “Joe”, who is an Employee. A reasoner will actually create two interpretations of these three facts: one where Joe is an Employee but not a Contractor, and one where Joe is both. As more assertions are added, the reasoner generates exponentially more interpretations covering all possible combinations. The reasoner looks at all possible solutions and assumes that any fact is potentially true unless it has been explicitly told otherwise. This is the open world assumption: anything unknown could be true or false, and a reasoner has to consider both possibilities.
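To make the enumeration concrete, here is a minimal Python sketch that treats each class membership as a plain true/false fact and lists every assignment consistent with what has been asserted. It is a propositional toy under that simplifying assumption, not how an OWL reasoner works internally; the function name `interpretations` and the extra individual "Ann" in the usage note are illustrative.

```python
from itertools import product

# Toy knowledge base from the session: two classes and one asserted fact.
CLASSES = ["Contractor", "Employee"]
ASSERTED = {("Joe", "Employee")}


def interpretations(individuals, classes, asserted):
    """Yield every set of class memberships consistent with the asserted facts.

    Under the open world assumption an unasserted membership may be true or
    false, so each unknown (individual, class) pair doubles the candidates.
    """
    pairs = [(i, c) for i in individuals for c in classes]
    for values in product([True, False], repeat=len(pairs)):
        model = {pair for pair, holds in zip(pairs, values) if holds}
        if asserted <= model:  # every asserted fact must hold
            yield model


models = list(interpretations(["Joe"], CLASSES, ASSERTED))
for m in models:
    print(sorted(m))
print(len(models))  # 2
```

Adding a second individual about whom nothing has been asserted, `interpretations(["Joe", "Ann"], CLASSES, ASSERTED)`, yields 8 interpretations (2³): each unknown membership doubles the count.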

A fact is provable if it is true in every possible interpretation, and satisfiable if it is true in at least one model. These are the two main uses of a reasoner: to prove a statement or to discover whether a statement is possible. However, the huge number of possible interpretations massively complicates the problem. To make reasoning problems tractable we have to clean up the open world by removing facts that cannot possibly be true.
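Given a set of interpretations, the two checks are a universal and an existential test. A sketch continuing the toy Joe example, with the two interpretations written out by hand (representing facts as tuples is an assumption of the sketch):

```python
# The two interpretations of the Joe example, each as the set of facts that hold.
MODELS = [
    {("Joe", "Employee")},
    {("Joe", "Employee"), ("Joe", "Contractor")},
]


def provable(fact, models):
    """A fact is provable (entailed) if it holds in every interpretation."""
    return all(fact in m for m in models)


def satisfiable(fact, models):
    """A fact is satisfiable if it holds in at least one interpretation."""
    return any(fact in m for m in models)


print(provable(("Joe", "Employee"), MODELS))       # True
print(provable(("Joe", "Contractor"), MODELS))     # False
print(satisfiable(("Joe", "Contractor"), MODELS))  # True
```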

Some techniques that can be used when writing ontologies:

  • Class disjointness - saying that two classes have no members in common such as a Living Thing and a Scheduled Event. This is especially useful with deep ontologies where the root classes are declared disjoint. This disjointness then cascades down the ontology tree to the more specific classes at the bottom eliminating many possible interpretations.
  • Domain and range - this makes the disjointness more effective by adding more information about instance types.
  • Individual differentness - OWL provides a differentFrom predicate, but you have to state, one pair at a time, that every individual is different from every other. Some of this can be inferred using functional properties: if two individuals have different values for a functional property, they can be inferred to be different individuals. We can also use inverse functional properties, but this is not possible with datatype properties such as social security numbers. A workaround is to create a URI scheme for the value.
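Extending the toy enumeration, a disjointness axiom simply discards interpretations that put one individual in both classes, and a functional property lets us derive differentness from distinct values. Both parts are sketches under simplifying assumptions: the class names come from the session example, while the employee-number values and the names "Ann" and "Bob" are hypothetical, and the values are literals, so they are known to be distinct.

```python
from itertools import combinations, product

CLASSES = ["Contractor", "Employee"]
ASSERTED = {("Joe", "Employee")}
DISJOINT = [("Contractor", "Employee")]  # axiom: no individual is in both


def interpretations(individuals, classes, asserted, disjoint=()):
    """Enumerate open world interpretations, pruning those that break disjointness."""
    pairs = [(i, c) for i in individuals for c in classes]
    for values in product([True, False], repeat=len(pairs)):
        model = {pair for pair, holds in zip(pairs, values) if holds}
        if not asserted <= model:
            continue
        # Drop any interpretation that puts an individual in two disjoint classes.
        if any((i, a) in model and (i, b) in model
               for i in individuals for a, b in disjoint):
            continue
        yield model


def inferred_different(functional_values):
    """Return pairs of individuals that must be distinct.

    functional_values maps each individual to its single value for a functional
    property. If x and y were the same individual they would need the same
    value, so distinct values (assumed to be literals, hence known to differ)
    imply x differentFrom y.
    """
    return {
        (x, y)
        for x, y in combinations(sorted(functional_values), 2)
        if functional_values[x] != functional_values[y]
    }


print(len(list(interpretations(["Joe"], CLASSES, ASSERTED))))            # 2
print(len(list(interpretations(["Joe"], CLASSES, ASSERTED, DISJOINT))))  # 1
print(inferred_different({"Joe": "E100", "Ann": "E200", "Bob": "E100"}))
```

With the disjointness axiom, adding an unknown second individual gives 3 interpretations instead of 8. Joe and Bob share a value, so a functional property alone says nothing about them; concluding they are the same individual would need an inverse functional property.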

Some more advanced techniques include stating that an individual does not have a particular property. To do this you have to create a class for the individual resource and define that class as the complement of things that have the property in question. You have to do that for every individual, a massive explosion of triples, but a corresponding reduction in possible interpretations.
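The effect of the complement-class trick (in OWL, a class built with owl:complementOf over a restriction on the property) can be sketched the same way. The candidate `hasManager` triples and the names "Ann" and "Bob" are hypothetical; marking Joe as a member of the complement of "things that have a manager" rules out every interpretation in which he has one.

```python
from itertools import product

# Hypothetical hasManager triples that might hold under the open world.
CANDIDATES = [("Joe", "hasManager", "Ann"), ("Joe", "hasManager", "Bob")]


def interpretations(candidates, has_no_manager=()):
    """Enumerate subsets of candidate triples.

    has_no_manager lists individuals declared members of the complement of
    "things that have a hasManager value"; any interpretation giving one of
    them a manager is inconsistent and is dropped.
    """
    for values in product([True, False], repeat=len(candidates)):
        model = {t for t, holds in zip(candidates, values) if holds}
        if any(subject in has_no_manager for subject, _, _ in model):
            continue
        yield model


print(len(list(interpretations(CANDIDATES))))           # 4
print(len(list(interpretations(CANDIDATES, {"Joe"}))))  # 1
```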

In the discussion after the session a few reasoner implementors were discussing some of these ideas. I learnt that a tableau reasoner will take all the URIs in a graph and combine them all to create all possible triples and then start eliminating them using the OWL constraints! I wonder what implications that has for Linked Data’s assignment of URIs to everything?
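Taking that description at face value (this is not a claim about how any particular tableau implementation works), the arithmetic is sobering: with n URIs, each usable as subject, predicate or object, naive combination produces n³ candidate triples.

```python
def candidate_triples(n_uris):
    """Naive combination: any URI in subject, predicate and object position."""
    return n_uris ** 3


for n in (10, 1_000, 1_000_000):
    print(f"{n:>9,} URIs -> {candidate_triples(n):,} candidate triples")
```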

Working in the open world enforces a different kind of discipline in data modelling. You need to define what is not true as well as what is true. It’s best to work at the highest level possible which ends up being a supporting case for upper ontologies.

Excited

Here I am in the self-proclaimed “Capital of Silicon Valley”, surrounded by corporate folks, VCs and geeks of various persuasions, discussing the Semantic Web. Yesterday Ivan Herman of the W3C did an intro talk on the State of the Semantic Web. Good concrete material, but I couldn’t help thinking he could have saved a lot of effort by pointing to the audience: clear, visible evidence that the Web model of independent innovation is alive and kicking. There’s a good selection of people who were early creators/drivers of this tech, along with current implementers and seriously interested parties. Noses meet food trough.

On a personal level it’s great. Plenty of friends of old, quite a lot of colleagues in this big adventure I’ve never met in person - golly gosh: Uche Ogbuji, David Booth, John Breslin, Paul Gearon, Jim Hendler, Pat Hayes… (Hmm, Pat claimed only his mother calls him Patrick, but Google does too). Every single one of them is giving positive messages on how things are going webwise.

It’s really nice being here with a team - Talisians Ian, Paul and Ceri have been doing a grand job of keeping me out of trouble (sorry, on message). My commitments have been mounting by the hour, though fortunately my ego has now swelled enough that it can divide, split and become an entity in its own right (blogging gets you known, kids!) so I’ll let that do the chatting while I go look for a restaurant that does refried beans.

There is a lot I should be reporting here, but that’ll have to wait - I’m diving into the scariest session this afternoon, on Common Logic. Or maybe I’ll get a cab to the garlic theme park. California rocks. Literally.

XTech 2008 - The Web on the Move

I’ve just been to XTech 2008 in Dublin, a great conference covering web development, open source, Web 2.0 and open standards.

One of the main themes that emerged was that the semantic web is already here if you know where to look - it’s appearing in parallel with the existing web, with RDFa and microformats like hCard and XFN embedded in pages. Ontologies like SIOC and FOAF are enabling semantically-rich data to be moved from one system to another, and standards like OpenID and OAuth are making it possible to provide secure access to data across API boundaries.
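As an illustration of how such embedded data can be picked out of an ordinary page, here is a minimal sketch using Python's standard `html.parser` to collect XFN relationships from the `rel` attribute of links. The sample page and its URLs are made up, and only a subset of XFN values is listed.

```python
from html.parser import HTMLParser

# A subset of XFN relationship values (the full profile defines more).
XFN_VALUES = {"me", "friend", "acquaintance", "contact", "colleague", "co-worker"}


class XFNLinkParser(HTMLParser):
    """Collect (href, XFN values) for <a> elements whose rel contains XFN values."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)  # attribute values may be None for bare attributes
        rels = set((attrs.get("rel") or "").split())
        xfn = rels & XFN_VALUES
        if xfn and attrs.get("href"):
            self.links.append((attrs["href"], sorted(xfn)))


page = """<p><a href="http://example.org/ann" rel="friend colleague">Ann</a>
<a href="http://example.org/about">About</a>
<a href="http://example.net/me" rel="me">My other profile</a></p>"""

parser = XFNLinkParser()
parser.feed(page)
parser.close()
print(parser.links)
```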

There are new tools that help visualise and navigate the parallel web that increasingly exists alongside older document-based data, like the Tabulator plugin for Firefox, which displays RDF hidden in pages, and Google’s Social Graph API, which makes it easier to navigate linked data programmatically.

In terms of existing content sites, data can immediately be made more open by redesigning the site to have consistent URIs and embedded RDFa (London Gazette), or, in a more dynamic site like the Guardian Online, pages can be constructed automatically to pull in disparate sources of data (sports results, 3rd party content) and then referenced as persistent URIs.

Data can increasingly be mixed and enriched. One interesting project used socially authored content to augment content on the BBC archive site: Yahoo for term extraction, Wikipedia for further information on the related topics, and DBpedia for disambiguation, examining each of several potentially matching pages for one that contains additional terms from the original page, to confirm the context.
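The disambiguation step can be sketched as simple term overlap: for an ambiguous term, pick the candidate page that shares the most other terms with the original page. This is a simplified reconstruction, not the project's actual code; the candidate term sets are invented, and the DBpedia-style URIs are used purely as labels.

```python
def disambiguate(term, page_terms, candidates):
    """Pick the candidate whose page shares the most *other* terms with the source.

    term: the ambiguous term. page_terms: terms extracted from the original page.
    candidates: maps each candidate resource to the set of terms on its page.
    Returns None when no candidate shares any additional term.
    """
    context = set(page_terms) - {term}  # the "additional" terms that confirm context
    best, best_score = None, 0
    for resource, terms in candidates.items():
        score = len(context & terms)
        if score > best_score:
            best, best_score = resource, score
    return best


page = {"mercury", "orbit", "sun", "probe"}
candidates = {
    "http://dbpedia.org/resource/Mercury_(planet)": {"mercury", "planet", "orbit", "sun"},
    "http://dbpedia.org/resource/Mercury_(element)": {"mercury", "metal", "thermometer"},
    "http://dbpedia.org/resource/Freddie_Mercury": {"mercury", "queen", "singer"},
}
print(disambiguate("mercury", page, candidates))  # http://dbpedia.org/resource/Mercury_(planet)
```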

Other types of site are appearing, like FireEagle, which are not really “sites” in the conventional sense. FireEagle is a location broker - somewhere that you can maintain and update a record of your location (determined by mobile mast locations, by wifi access point, or by setting it programmatically), which can then be used by any number of location-based services. OAuth underpins this, as a mechanism for allowing APIs to be given permissions to query each other. One interesting point was that OAuth may lead to difficulties with building a strictly RESTful site - for example, a site may want URIs that represent the location of a specific user, which would typically include the user’s id as part of the URI. However, using OAuth means that there may only be a token that represents the user for a given session, rather than a user name, so forming URIs based on that token may not be appropriate.

One of the most exciting demonstrations of how new applications can be built from reworking data with semantic techniques was Andrew Walkingshaw’s session on Golem and CrystalEye, in which he showed how you can apply an ontology to existing data, then mine that data for relationships and re-visualise it as geographical distributions or an academic social graph.

Overall, a very impressive conference, with plenty of food for thought, and plenty of examples of how the semantic web is here around us already.

Talking about WWW2008 on the BBC

The Talis contingent (sans Tom Heath, who is now wandering around China for a couple of weeks) returned to the UK from Beijing yesterday, and today begins the long job of documenting and following up on connections made during last week’s WWW2008 conference.

A few of my own impressions were captured on Friday in a conversation with the BBC World Service for their weekly programme, Digital Planet, which went out on air in various places around the world today. Have a listen to the podcast version, which I was pleasantly surprised to note adds up to about half of our recorded conversation in a number of segments during the show.

Beijing was well worth the trip, and we received plenty of validation for what we’re doing, as well as a lot of new ideas and contacts to help move things forward. More on some of that later, no doubt.

We took along some copies of the first issue of Nodalities Magazine, and they seemed well received; it’s been great to see the names of people I handed a copy to appearing on the subscription list for subsequent issues.

Anyone who’s watching our ‘Talisians’ aggregator will have seen the flood of Talis-tagged photographs from the event and our trips over the weekend, as well as initial blog posts from myself, Nadeem Shabir and Rob Styles. Chris was clearly too busy talking to people to blog, and Tom was a victim of the Chinese firewall, but will no doubt have plenty to say upon his return. I’ve also been blogging on ZDNet, and have several almost-finished posts to publish there over the next couple of days.

Alongside the Linked Data on the Web workshop on Tuesday and Sir Tim Berners-Lee’s keynote in Beijing’s Great Hall of the People, the programme highlights for me were undoubtedly dipping in and out of the ‘Web in China’ track, Carsten Ullrich’s paper on Friday and (although I moderated it, and am therefore biased) the Commercialising the Semantic Web panel in the final session. I must admit, though, to being vaguely disappointed that more papers didn’t leap out as (to me) excellent or noteworthy.

Corridor, bar and restaurant conversation did their usual job of enriching the good sessions yet further… and shedding light on topics and people that maybe didn’t come across as well as they should have in the formal proceedings… Beijing had the advantage - outside the conference hotels, at least - of cheap, plentiful food and beer to grease the wheels of conversation.

A Semantic Multimedia Web - Create, Annotate, Present and Share your Media

I attended the second half of Raphael Troncy and Lynda Hardman’s excellent tutorial on the importance of using metadata to semantically describe multimedia content. The technique they presented attaches metadata to the multimedia content it describes, and thus enables users to reuse and repurpose previously published content for their own needs.

 

Raphael introduced the audience to the Core Ontology for Multimedia (COMM), which they have been using to annotate different types of multimedia. Both Raphael and Lynda stressed the importance of linking these annotations to other data sets, which can only be achieved by using resource URIs rather than literals wherever possible. This certainly resonates with the research that Rob, Danny and I documented in our Semantic MARC Paper, where we came to the same conclusion.

Raphael and Lynda also introduced the audience to the K-Space Annotation Tool which has been implemented to allow people to easily annotate multimedia content. Once multimedia is annotated you can build some impressive applications that can navigate this graph of linked data.

Web 3G: The Third Generation of the Web

I’m at the BlogTalk conference in Cork where I’m meeting an eclectic mix of bloggers, technologists and “Interesting People” gathering to share a common interest in the social web. There’s also a good representation from the Semantic Web folks including a group from DERI Galway.

Paul gave a talk on the potential of the web of relationships, alluding to the possibilities we’re seeing as more things become connected. It’s not just about connecting pages together with hyperlinks; using Semantic Web technologies, we can also connect people with the things they produce, need and use. Tomorrow Nova Spivack is giving a talk on semantic social software, hopefully giving us a new view of his company’s application, Twine.

Twine and our own Talis Engage are the first in a new breed of applications founded on Semantic Web technologies that expose large parts of their data for reuse by other similar applications. We were discussing all this over dinner tonight and I suggested that a good label for this would be Web 3G, since these applications were part of what we were calling the third generation of the Web.

Web 3G is what happens when you fuse the social participation of Web 2.0 with the decentralized structured information of the Semantic Web. The result is a smarter way of organising information in a network of interwoven semantic links and content, enhanced with feedback from usage and participation. We’re coming up to the end of two decades of the Web, the first of which was spent seeding the bare essentials of the web of documents. The second decade saw widespread broadband adoption enable mass participation and creation of content by millions. The next decade is going to radically change how we find, create, use and relate to that information.

Three generations of the Web

The Web right now is built from the generic hyperlink, which says nothing more than “look over here”. But even this weak semantic was enough to enable Google’s PageRank to organise and score the Web. Imagine how much more powerful the hyperlink could be if it were possible to express sentiment or meaning in the link. Even if that were limited to positive or negative endorsement of the target of the link, the value to the relevance ranking of search engines and applications would be huge. And the possibilities for expressing the intention of a link between two pages are endless. For example, writers could say whether they support or reject the views expressed in the target of the link, or whether they are linking to conflicting evidence or alternative versions of the same information. These simple expressions of intention could provide an entirely new dimension of metadata. The links between things are fundamental to the existence of the Web, and the value of understanding why things are related is huge.
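To sketch what typed links could mean for ranking, here is the standard power-iteration form of PageRank with one hypothetical twist: each link carries a type, and only "supports" links pass on rank. The link types, their weights and the four-page graph are all assumptions for illustration; this is not how Google's ranking works in practice.

```python
# Hypothetical typed links: (source page, target page, link type).
LINKS = [
    ("a", "c", "supports"),
    ("b", "c", "supports"),
    ("a", "d", "rejects"),
    ("b", "d", "rejects"),
    ("c", "a", "supports"),
    ("d", "b", "supports"),
]
NODES = ["a", "b", "c", "d"]


def pagerank(nodes, links, weights, damping=0.85, iterations=50):
    """Power-iteration PageRank where each link's share of rank is scaled by its type weight.

    A page splits its rank among its positively weighted out-links in proportion
    to their weights; a page with none spreads its rank evenly over all pages.
    """
    n = len(nodes)
    out = {v: [] for v in nodes}
    for source, target, kind in links:
        w = weights.get(kind, 1.0)
        if w > 0:
            out[source].append((target, w))
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for s in nodes:
            total = sum(w for _, w in out[s])
            if total == 0:  # dangling page: share its rank with everyone
                for v in nodes:
                    new[v] += damping * rank[s] / n
            else:
                for t, w in out[s]:
                    new[t] += damping * rank[s] * w / total
        rank = new
    return rank


untyped = pagerank(NODES, LINKS, {"supports": 1.0, "rejects": 1.0})
typed = pagerank(NODES, LINKS, {"supports": 1.0, "rejects": 0.0})  # assumed weights
print({p: round(r, 3) for p, r in untyped.items()})
print({p: round(r, 3) for p, r in typed.items()})
```

With untyped links, pages c and d end up with equal rank; once the "rejects" links stop passing on rank, d falls to the minimum share it gets from the damping term alone.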

Web 3G is an evolution of Web 2.0, enhancing it through the appropriate use of light semantics. Links between things become more clearly typed, and embedded data on pages becomes more easily understood by machines, all the while retaining the ability for people to connect, link and critique the quality and relevance of the data. It becomes the semantic graph, open to participation by everyone without having to ask anyone’s permission. It is not Artificial Intelligence; there are no formal ontologies or logical reasoning, but some of the tools and techniques of AI are needed: neural networks, classifiers, heuristics, Bayesian networks and statistical analysis.

A whole new generation of applications is emerging that feature huge levels of interconnections, and we hope to enable many of those to be built using the Talis Platform. Many of these connections will be internal to the application, but by exposing raw data in the ways suggested by the Linking Open Data project, every application can link to and reuse information managed by every other application. This is a step beyond data portability: rather than copying data from one application to another, the norm will be to reuse data in situ. That way the data never gets out of date because it’s shared, and we can use the best application to manage each piece of our data, depending on our situation. This is what Tim Berners-Lee meant by the Giant Global Graph: a world-wide network of links with meaning.

I like this generational view of the evolution of the Web. It makes it clear that there is no big bang switchover from one type of application to another. Even now we can see many Web applications being created and used that aren’t socially enabled, but they look hollow when compared to their Web 2.0 peers. The same is likely to be true of the third decade, when we’ll see new applications being created that can’t talk to their peers, and they too will feel shallow and unexciting when compared to their Web 3G counterparts. This isn’t an increment to Web 2.0, it’s a radical step forward!

Cindy Ché and other interesting people

Last Friday a few of us went down to HP Labs offices in Bristol for a great free event hosted by Andy Seaborne - including free lunch. Free food always seems to make a difference; or is that just me?

Nadeem’s blogged some of the sessions in detail over at Virtual Chaos and Ian Davis over at Internet Alchemy gives his own perspective. Other people covering off much of the detail is one of the great benefits of leaving the blogging for a few days after the event ;-)

A few of us are down at HP Labs in Bristol. Andy Seaborne is hosting a great free event for those interested in Semantic Web developments in the UK.

After Andy’s welcome, Ian opened the day with a presentation on where we’ve got to with the Talis Platform - over the past 3 years we’ve come a very long way, as you can see from Ian’s slides and Nadeem’s summary. Our platform is an example of PaaS (Platform as a Service): we hope to do the heavy lifting of managing large volumes of data, indexing it, making sure it’s backed up and so on, so you can concentrate on building applications. That’s a message that seemed to go down really well, with lots of people grabbing us for more information during breaks and lunch.

For the rest of the day there were a good handful of very interesting sessions from a whole host of people trying to do real, practical things with semantic web technologies.

There were a few threads that seemed to stand out through the day. A lot of people were using Jena; Redland got a couple of mentions, but mostly it was Jena. I had a great chat with Chris Dollin and it’s obvious that they take great pride in Jena, not only in the codebase and what it can do but also in the developer and user community that has formed around it. There was also a lot of interest in ontologies, with people focussing on the use of ontologies to assist in user interactions, and various people mapping overlapping ontologies to allow semantic relationships to be recognised between disparate datasets.

In essence this was about people starting to do very real things, a point emphasised by Alberto Reggiori of @semantics when in one slide he announced that RDF is dead, only to have it resurrected 3 days later, complete with a slide featuring the risen Christ - award for best laugh of the day goes to Alberto. To hear more from Alberto, listen to the podcast he just did with us.

The most worthy project of the day has to go to Health-e-Child, a project that is helping paediatric medical research by providing federated search services across medical data at several participating European hospitals. The hospitals have to keep their own data due to confidentiality concerns, and this data is in any number of different schemas with varying vocabularies. Ontologies feature heavily in what Peter Bloodsworth has been doing, and it will be interesting to see how this project progresses. You can hear more from him in his podcast with us.

The back-channel chat on IRC (#swig on irc.freenode.net) was, as usual, a light-hearted and useful tool, with people sharing the links from the presentations in almost real-time. It even resulted in Sir Tim pitching in with a correction for Ordnance Survey’s site:

14:40:24 <timbl> ooops http://www.ordnancesurvey.co.uk/oswebsite/ontology/SpatialRelations.owl is Content-Type: application/octet-stream
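The fix Sir Tim was pointing at is server configuration: an OWL file in RDF/XML syntax should be served as `application/rdf+xml` rather than a generic binary type. As a small sketch, Python's standard `mimetypes` module may not know the `.owl` extension out of the box, but it can be taught it; assuming the file is RDF/XML rather than OWL/XML, that looks like this:

```python
import mimetypes

# Register the RDF/XML media type for .owl files (assumes RDF/XML syntax;
# an OWL/XML file would instead be application/owl+xml).
mimetypes.add_type("application/rdf+xml", ".owl")

url = "http://www.ordnancesurvey.co.uk/oswebsite/ontology/SpatialRelations.owl"
content_type, encoding = mimetypes.guess_type(url)
print(content_type)  # application/rdf+xml
```

A web server would need the equivalent mapping in its own configuration for the header to change.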

Catherine Dolbear did a great job of describing the ways in which OS are playing with small RDF datasets. With Ordnance Survey’s current business model (which she stated was unlikely to change without Government changing it for them), the data is their crown jewels, so you don’t get to play with new technologies using it, especially when they’d be talking about more than ten billion triples. They have published some ontologies (see timbl’s quote above), but unfortunately these have been released under a cc-nc-sa license, making it hard for them to be widely adopted. In questions later she told me that wasn’t something they would change. Unwittingly, Catherine also provided us with the great slide on the right.

The point of the slide was to indicate the complexity of some of the queries that geo-data requires to be useful, things like “inside” - it just made me laugh inside that “Every island is a kind of land that is surrounded by water” constituted a complex statement. That little laugh, of course, belies much of the problem we have developing the semantic web - stating the bleeding obvious in ways that are complete and unambiguous.

As a nice counter-point to Catherine’s presentation, Richard Cyganiak presented on Sindice (or Cindy Ché as it came up on the #swig back-channel). Sindice is pulling in data from Linked Data sources such as dbpedia, geonames and everyone’s foaf files and indexing them in a semantic search engine. What makes the nice contrast with Catherine’s presentation is the scale: 20M+ documents, 80M+ URIs, 4M+ IFPs, 2B+ triples - that’s 2 billion triples… Indexing uses Solr, and there’s some Hadoop in there for parallel data processing.

It’s great to see the UK semweb community thriving like this. Get-togethers are so important in allowing people the time to do show ‘n’ tell to their friends and peers. Perhaps we should organise another one soon, making sure to find a good caterer for lunch of course.

Web 2.0 Summit - reflections after a trans-Atlantic flight and a day off!

Last week’s Web 2.0 Summit in San Francisco was pretty intense, all things considered. It’s therefore lucky that this week is the Half Term school holiday in this particular corner of the UK, and peppered with days off to do various non-work things.

During the conference (sorry, ‘summit’) I managed to live-blog most of the sessions I attended, and the corpus can be found here. O’Reilly/CMP are also doing a great job of getting session videos up.

Now I’ve had time to reflect without the need to type and listen and keep an eye on the office, what were the trends and highlights for me?

I noticed two big switches since 2005 when I last attended this particular gathering. Firstly, although I didn’t see much evidence of a credible alternative, there was far less of an assumption that Google AdWords were the business model of choice. And secondly the lobby conversations just seemed much less desperate than last time, when everyone and everything was frenetically for sale.

The iPhone was everywhere. I saw lots of people using Apple’s latest, but don’t think I saw anyone actually talking into the thing, which means that Nokia’s phone-less (?) alternative may do well. We get iPhones in the UK in a couple of weeks, and Talis will be raffling one at our conference the week before that launch. Something tells me that my chances of winning that iPhone are about as high as those for Nokia to send me an N810.

There seemed less of an emphasis upon scheduled evening entertainment than previously. Richard MacManus comments on this, too. From my perspective it was a good thing, as it made my packed schedule of dinner engagements (and a trip to a real San Francisco home) so much easier to manage. In many ways, these (including one with Mr MacManus) were the highlight of the trip.

The main auditorium was a truly unpleasant place to spend time; way too crowded. The overflow room upstairs was a far better bet, complete with comfy sofas, power, wifi (which you could also get downstairs, if your battery was up to the job), and easy access to food and drink. It would have been nice if the video link to the sweatshop (sorry, auditorium) downstairs had been bi-directional, though, so that we could ask questions. A second display showing the whole stage would also have been good. The main monitor kept zooming in to provide detail on faces, slides and so on; it wasn’t always focussed on the thing I considered important.

So what about the meat?

Well, in case you hadn’t noticed, Facebook is going to be big. I don’t just mean suggestions that Zuckerberg may be ‘selling himself short’ at a mere $15bn, or evidence that Facebook’s platform is delivering profit for third party developers. More than both of those, there was an underlying (often implicit) recognition that growth opportunities lie in pushing content and functionality off our individual websites and into the cloud. Although I’ve argued before that Facebook is a very long way from being open, its ‘Platform’ remains a compelling example of ways in which external content can be aggregated and consumed elsewhere. Imagine what would be possible in a more open ecosystem, an ecosystem of which Facebook could be a part. Were others (MySpace, anyone?) to seed such an ecosystem whilst Facebook remained off to one side, would the rate of fall in Facebook numbers equal or exceed their recent growth?

‘Semantic’ has arrived; the Metaweb/ Radar Networks/ Powerset pow wow with Tim O’Reilly (pictured) on the final afternoon was great, and was just beginning to go places when they ran out of time. More debate and analysis would have been nice, with (a lot) less demo. This was followed up by John Doerr recognising the whole space as a compelling investment opportunity, echoing trends that Brad Feld highlighted in his recent podcast with me. I found Danny Hillis’ explicit distancing of himself from the Semantic Web odd (Shelley just found it funny…); I’ll admit that I’ve done a little of the same, but more to demonstrate that there is plenty that the Semantic Web’s building blocks (RDF, GRDDL, etc) can do right now, without needing to await the arrival of The Semantic Web. We do need to find better ways to describe this space, though; ‘Web 3.0’ can be unnecessarily confrontational/epochal, and ‘Semantic Web’ carries way too much baggage…

Jonathan Zittrain had some interesting things to say, and they’re not nearly as contrarian as they might at first have appeared.

Mary Meeker was good value, as always… although impossible to blog! I was surprised by the lack of reaction to her figures illustrating the fall in US growth, relative to competitors to the east.

The Launch Pad, that gathering of exemplary startups, was hugely disappointing. I can’t believe that was the cream of the crop.

Gene sequencing needs to be watched… very closely.

Real people don’t think (quite) like geeks and venture capitalists! Craigslist, rejoice…

(Almost) everyone had a Platform, with some more black hole sucking-ish than others. It does appear, all too often, that the web is actually becoming less open than it has been of late. All these Platforms are sucking data and users and developers to themselves, and letting very little flow back out. It certainly fulfils short-term goals around eyeballs, advertisers, and the like. But it’s bad for the web and, in the long term, it’s got to be bad for (most of?) the guilty.

(Almost) everyone was recognising the power of intention/attention, and seeking ways to implicitly or explicitly harness both. Social and semantic graphs have something to say, here.

Photograph © James Duncan Davidson/O’Reilly Media

Web 2.0 Summit - John Doerr

John Doerr is up for the last session of the conference (except the drinks).

John B - “when you invested in Google, did you have any idea of where this would go?”

John D - “No”

John B - “do you worry that the company is trying to do too many things?”

John D - “No.” Google is about ads, and about applications. [what, no search?] 70/10/20 - ads, search, applications. Over 100 million downloads of Google toolbar.

John B - “what do you worry about?”

John D - “keeping the quality of the culture” as the company grows rapidly.

John B - “why didn’t you invest in Facebook?”

John D - loyalty. We’d backed Friendster, and we don’t back competitors. Friendster is the number 13 web site by traffic, and it’s growing. Number 3 in China.

Talking about Green issues… and the scale of the problem. Can’t wait to see what innovative disruptors in the room do about it.

Web 2.0 Summit - the Semantic Edge

And now, the panel I’ve been waiting for. Tim O’Reilly takes to the stage with three early examples of real companies betting the farm on semantic technologies. Danny Hillis of Metaweb [listen to the podcast with his Minister of Information, with whom Ian and I had a great dinner last night], Barney Pell of Powerset and Nova Spivack of Radar Networks [who also did a podcast with us, is a member of our Advisory Group, and launched twine last night…]

Tim - “Web 2.0 is all about collective intelligence. The Semantic Web is all about collective intelligence too”.

“There’s something really interesting cooking - platforms for building intelligent applications.”

Before discussion, demo…

Visualising the data in Freebase… “An extreme example of opening up access to data”.

“Because this is an application based on this open database [Freebase], if I find a piece of information that’s missing, then I can add that information”. “All applications that use Freebase are smarter because of that one correction”.

“Freebase is not just about people; it has geographical stuff, media stuff, sports…”

Freebase offers a free api so that people can use this data. Eg open source code on Google Code that allows any text box on any website to look up Freebase for text completion…
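Text completion of this kind boils down to prefix search over a sorted list of names. A minimal local sketch using `bisect`, assuming an in-memory list of names; it does not call Freebase or reproduce its API, and the `Completer` class is hypothetical:

```python
from bisect import bisect_left


class Completer:
    """Case-insensitive prefix completion over an in-memory list of names."""

    def __init__(self, names):
        self._names = sorted(names, key=str.lower)
        self._keys = [name.lower() for name in self._names]

    def complete(self, prefix, limit=5):
        """Return up to `limit` names that start with `prefix`."""
        p = prefix.lower()
        i = bisect_left(self._keys, p)  # first key >= prefix
        matches = []
        while i < len(self._keys) and self._keys[i].startswith(p) and len(matches) < limit:
            matches.append(self._names[i])
            i += 1
        return matches


places = Completer(["Dublin", "Dubai", "Durham", "Beijing", "Bristol", "Berlin"])
print(places.complete("du"))  # ['Dubai', 'Dublin', 'Durham']
```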

Powerset building a natural language search. Reads every document, page by page, and builds a more powerful search. eg “Politicians who were killed by disease” - document may not mention a person as a politician, and may not say ‘disease’; look-ups into (their copy of?) Freebase etc able to work out that ‘Edward Heath’ was a politician, and ‘pneumonia’ was a disease.

Similar example here on Flickr… uploaded by Powerset.
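The point of the Powerset example is that the type information comes from outside the document. A minimal sketch, in which both the extracted (person, cause of death) pairs and the type table are hypothetical stand-ins for text extraction and a knowledge base such as Freebase:

```python
# Hypothetical (person, cause of death) pairs extracted from text.
EXTRACTED = [
    ("Edward Heath", "pneumonia"),
    ("Jane Doe", "car accident"),
]
# Hypothetical type lookups standing in for an external knowledge base.
TYPES = {
    "Edward Heath": {"person", "politician"},
    "Jane Doe": {"person"},
    "pneumonia": {"disease"},
    "car accident": {"event"},
}


def killed_by(subject_type, cause_type):
    """Answer 'Xs who were killed by Ys' using types from the knowledge base,
    even though the source text never says 'politician' or 'disease'."""
    return [
        person for person, cause in EXTRACTED
        if subject_type in TYPES.get(person, set())
        and cause_type in TYPES.get(cause, set())
    ]


print(killed_by("politician", "disease"))  # ['Edward Heath']
```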

And now Nova is up to show twine. Information is out of control. Information overload. Things are not connected. Collaboration is more complex.

There must be a better way, and there is; twine. For the end-user side of the Semantic Web. Lets you share, organise, and find information.

Now showing twine…

As you use twine, it builds a profile of you and your interests, identifying people, places, etc. Created by you, and by your friends and peers.

A twine is a place. Everyone gets their own private twine. You can also make twines for groups, teams, networks.

Semantic graph powers twine; the social graph is people and relationships. The semantic graph is everything.

Create a note. Twine reads it, and recognises people, places, organisations etc in the text… tagging the content with semantic tags automatically.
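A crude stand-in for this kind of entity recognition is a gazetteer lookup that returns a concept URI alongside each match, which is also what distinguishes a semantic tag from a plain string tag. This is a hypothetical sketch, not how Twine works; the gazetteer entries and the DBpedia URIs are illustrative choices.

```python
import re

# Hypothetical gazetteer: surface form -> (entity type, concept URI).
GAZETTEER = {
    "Tim Berners-Lee": ("Person", "http://dbpedia.org/resource/Tim_Berners-Lee"),
    "Bristol": ("Place", "http://dbpedia.org/resource/Bristol"),
    "HP Labs": ("Organisation", "http://dbpedia.org/resource/HP_Labs"),
}


def semantic_tags(text):
    """Return (surface form, type, URI) for each gazetteer entry found in text.

    Matching is whole-word and case-sensitive; unlike a plain string tag,
    each result carries a URI identifying the concept it refers to.
    """
    tags = []
    for surface, (kind, uri) in GAZETTEER.items():
        if re.search(r"\b" + re.escape(surface) + r"\b", text):
            tags.append((surface, kind, uri))
    return tags


note = "Met Andy at HP Labs in Bristol to talk about Tim Berners-Lee's keynote."
for tag in semantic_tags(note):
    print(tag)
```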

Twine provides a simple browser bookmarklet to add objects into twine as you browse the web. Tries to extract structure from the page… Also capability to ‘twine’ emails.

“Databases don’t have to be in one place” anymore.

When twine sees a URL, it mines and crawls the site and infers which things are the most useful to recommend; “a user-generated bottom-up crawl of the web”

Google is about organising the world’s information. twine is about organising your information.

Everything in twine is shareable and editable. All permissions-based.

Important concept in twine is using tags to search and find things. It’s not just a tag, it’s a semantic tag; linked to concepts.

We analyse properties of the semantic graph to find things that are most interesting for you, based upon your background, your connections, etc.

Peter Rip [Radar investor] doing his due diligence in twine.

Marketing team using twine to leverage the collective knowledge of the group.

Now questions from Tim. “Can we get our hands on this stuff today? What’s the real state of availability of semantic technologies today?”

Danny Hillis - “Don’t necessarily characterise our stuff as Semantic Web. The Semantic Web was a particular case of a way to try to do things a few years ago”

Tim - “semantics ties these things together; there’s meaning in this data. I’d always thought that was one of the things about Web 2.0 too… eg Pagerank - links having an additional level of meaning… Flickr ‘interestingness’, etc. The difference is that in Web 2.0 startups it’s tacit, and hidden and proprietary. What you’re doing is making this structure of meaning explicit, and portable, and usable by other applications; you’re all platform plays.”

Danny - “about connections between things”… leads to “network effect in value”. Silos don’t make sense.

Tim - you’re all extracting entities. Do you all have to do it yourselves?

Nova - open standards, a la W3C. The semantic web as a concept means a certain set of open standards. There are semantic technologies that don’t necessarily support them. If you want an open network effect you have to support open standards.

Danny - we’re not all reinventing things. We’re not all re-extracting it.

Barney - Freebase is generally about people creating content. Powerset is about machines making implicit meaning already in text explicit.

Danny - Freebase is explicitly designed to be used through other applications. Barney’s demonstration, for example, was another application using Freebase data.

Nova - people can create everything, or machines can create everything. Or there’s something in the middle harnessing the wisdom of machines to the wisdom of crowds. That’s what’s interesting.

Nova - With the semantic web, the data becomes portable and connectable - the web of data. This notion of a database, and this notion of a Platform is changing. The web is the platform, the web is the database.

Danny - I’m saying WE are the Platform…

And it’s a wrap.

Tim - “I think there’s really something here”

Tongue-in-cheek reinterpretation of the new semantic web logo from W3C by Laurian Gridinoc
