Nodalities

From Semantic Web to Web of Data

License

Creative Commons License

Author Archive

Getting Connected

This post will feature in Nodalities Magazine, Issue 5

Web 2.0, social networking, cloud computing, SaaS, PaaS, Web 3.0, the Semantic Web, Smart Phones, 3G, wifi, convergence… the list of buzzwords, or memes, goes on (meme being the buzzword for buzzwords).

There is nothing new in a long list of industry buzzwords. However, I think this list is different. It comprises a set of ideas, each of which is huge and transformative in its own right. The fact that they are all happening more or less at once and are all interconnected should give us serious pause for thought.

Maybe they are better considered as symptoms of some deeper, more fundamental change. It is tempting to focus in on a single symptom and try to understand what that will mean for the future—perhaps even take a risk and build a new business around it. But to focus on a single aspect is to miss the bigger picture. The interaction of several different trends tends to produce serious game-changing disruption. In this climate, it is dangerous to become myopic.

Here is how I would describe the fundamental shift:

“Everything is getting connected.”

Obvious? Just to be sure, let me put it another way: EVERYTHING IS GETTING CONNECTED! And I mean everything. I don’t mean every blog, every piece of software, every web page, every database—those are just pieces of which software people think everything is made. I mean everything in the world outside the computer screen.

Since the birth of the computer we have begun to build open, generalised infrastructures. The PC is an open and generalised infrastructure for digitising, processing and materialising data. We use the keyboard to digitise text, a mouse to digitise a set of hand gestures, monitors and printers to turn the data back into physical reality; and software organises all of these processes. After all, software is nothing more than a set of instructions which affects data. But the key word here is generalised. We have built machines for thousands of years but they have tended to address specific needs. The PC is a generalised infrastructure for interacting with digital representations. We might use it to manage content such as pictures, music or video. We might use it to write a novel or a business plan. We might use it to organise a supply chain between people and organisations, track financial information, and assess and analyse inventories.

A generalised infrastructure can reduce or eliminate huge costs involved in getting a job done: factoring out some fixed costs and affecting the residual marginal costs of the project. Another way of saying this is that generalised, open infrastructures have huge spill-over effects. If I buy a computer equipped with MS Office in order to organise my personal accounts, my accounts have cost me maybe £1,000. But, of course, I can now word-process a business plan at a marginal rate (i.e. my time). I can also play a game, listen to music and  surf the web. That £1,000 actually buys me a generalised piece of infrastructure for a huge range of tasks and functions. I’ll leave further discussion of the economics of the spill-over effects created by generalised infrastructures for another time.

Due to the complex nature of these infrastructures, they work much better together when we can agree on some standards. MS Windows and Intel formed a de facto standard which allowed hardware and software to work well together. This partnership has factored out huge complexity by delivering a set of software instructions and processing power to an end user which enables them to manage their data and content. You may argue how much better the world would have been if this standard had been open rather than proprietary, but the point is that the use is generalised and part of the user’s infrastructure.

The internet was the birth of an open, generalised infrastructure for connecting computers. Following this, standards have made these networks work much better and the World Wide Web has provided a set of open standards which made the job of connecting human-readable documents much easier. So, the web has provided a generalised infrastructure for connecting documents.

Yet the web isn’t limited to connecting human-readable documents. Although it may have come to be thought of as an extension of the PC, it is actually a generalised infrastructure for connecting data: html, mp3, streaming video, xml, rdf—anything in fact. To date, it has mostly been used for html and media content but that is changing rapidly. Connected data is the next logical step and with that we must think of devices and standards.

Take a look at the list of buzzwords again. We are in the process of building a generalised infrastructure for connecting anything to everything. Wifi, 3G and Bluetooth allow any electronic device to join the conversation. Smart phones ensure the human being is always connected. Thinking about all the digital devices in your life, I expect most of them are currently disconnected. They have to solve all the problems themselves: user interaction is achieved through some obscure buttons and a tiny display with odd symbols. They are conceived to be isolated.

But wouldn’t it be much better to program your central heating timer with a nice iPhone app that can react to the fact you have left the office? 10 years ago, it would have been impossibly expensive for a heating manufacturer to build a proprietary system allowing customers remotely to programme and adjust their heating. Now, the generalised infrastructure to connect anything is being built, and the huge fixed-cost barrier is being removed. Adding a wifi connection to the central heating controller and exposing the sensor and input data for third party control is already economically doable.

To illustrate this further, imagine how much more valuable it would be for a mass manufacturer to be a bit more connected with their customers. Why, for example, isn’t there a big red help button on my washing machine I can press to talk directly to customer support? The washing machine would know its own model number and any error codes it may be displaying and how to contact help. This morning, I was looking at my washing machine and wondering how to control the temperature. You have to select a specific programme, and each has a certain temperature; but it also boasts a separate temperature dial. Does this override the temperature of the programme? Does it add this temperature to the programme, and I’ll end up with washing soup? Why can’t I literally press a button on the washing machine and immediately ask someone that question and get an answer?

For a products company, that kind of intimate connection with the actual users of its products could be very valuable. For a start, after getting the same question from many users, the company would undoubtedly redesign the temperature control. If the system malfunctioned, the customer service person could give specific advice for an error code, saving time, complaints and dissatisfied customers. This kind of direct relationship has been impossible up to now. With a generalised infrastructure for connecting everything, however, it becomes practicable.

It would cost very little to put a wifi connection on a washing machine and a little Skype-like piece of software which also relays machine status data. I am sure the question is answered in the printed manual, but I dutifully lost that 5 minutes after opening the box. Further, I would not want to go through the hassle of finding the customer services number, finding the model number then reading all that out to the service staff when the machine itself should know all of that. The difference between the effort of simply pressing a button and having to find and relay all that information is functionally massive.

When you think about it, data and devices are everywhere. But they are not connected and they are dumb: they don’t know each other; they don’t know me; and many can’t even recognise themselves.

Almost by definition, there is vastly more localised personal data in the world than generally useful data. Wikipedia is generally useful, and it’s helpful to be able to access a postcode-to-location database. But when I think about the data that is really important to me, it is practically all localised to me.

I would dearly like to have a record of my blood pressure and heart rate from the past year. I would share it with my doctor and maybe my pharmacy. But as useful as it would be, there is no way I would spend the effort of taking my blood pressure and writing it down to put in a computer. However, I do take my blood pressure at home with a digital gauge. It is a device that knows something about me, but it isn’t connected. My blood pressure, the temperature of my house, the location of my children, miles before a service on my car, the error code on my washing machine, the channels I watch on television, all the local restaurants I have been to in the last year—these are all more valuable to me, my family or my friends than they are generally useful to the world. Just think of the number of people in the world times the personal data relevant to them. I would hazard a guess that it is vastly larger than the generic data in the world.

You can see this effect on Facebook, if you look at most of what is “published”. It is easy to dismiss Facebook as just being full of useless rubbish. If you do, I expect you have fallen into the trap of thinking that just because something is “published” it is meant for you. In fact, most social network content is not a publication, but a conversation that happens to have been digitised. It is intended for, and meaningful to, only a few. It is the same with data. Most data about what I, my family and friends are doing, the things we have, the places we go and the things we need is localised and personal. It is relevant to the few people, organisations and companies I interact with, and I want to choose who gets to know what about me.

Up to this point, connecting the devices and data closest to us has been prohibitively expensive, and only a very few people have ever bothered to gather and use this kind of data in a useful way. As the costs of this technology are driven ever further down, it is becoming increasingly feasible for anyone to have access to these bits and pieces of their lives in a form which can interact with and benefit from the infrastructure we build with our devices, houses, cars and companies.

As EVERYTHING gets connected, it is time to get up close and personal.

Network Effects

Last week in my Utility Cloud Computing post, I promised to write a follow-up exploring network effects for platforms as a service (PaaS). This was sparked by an interesting exchange between Nick Carr and Tim O’Reilly with commentary on the Smoothspan blog.
I think the important point to explore can be summarised as “software versus data” for the next big network-effect-driven opportunity. Smoothspan argues for the software angle and Tim for the data angle.
For me this turns on a critical distinction between platforms for direct versus indirect network effects. (Take a few minutes to read http://en.wikipedia.org/wiki/Network_effect if you want a good introduction to network effects.)

The software industry is well aware of the defensibility of network effects, and no one can possibly miss the incredible defensibility of the Windows empire. It is tempting to take a pattern that has succeeded before and copy it. But it is wise to consider whether certain critical factors may be different now. Microsoft built a platform business model that harnessed indirect network effects. One way to think of an indirect network effect is as value built by the availability of complementors rather than by the activity of individual users. The more Windows applications there are, the more reason a user has to choose Windows; the more users there are, the more reason developers have to build Windows-only applications. But my usage of MS Word does not affect yours.

A direct network effect arises when there is some form of shared system, which means that one user’s activity directly affects the experience of another. Communications networks naturally create direct network effects, and the internet is the greatest communications network ever built. MS Windows, on the other hand, is not a shared system. No matter how many people are writing documents in MS Word, it doesn’t affect me in the least. This contrasts with people writing on Wikipedia, the most extreme counter-example to MS Word. With MS Word, each user has an isolated copy of the same software. With Wikipedia, each user is sharing both the same software and content. Nearly the entire value of Wikipedia to me is the trail of activity other people have left on it. User activity on a shared system can affect other users because each user leaves a trail of new data as they interact with the existing data. Be that data a text message, a blog post, a web page, a video on YouTube or a user’s clickstream on Google, it is data generated by user activity on a shared system which creates direct network effects. Web 2.0, collective intelligence and telecoms all create direct network effects based on data.
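To make the distinction concrete, here is a toy model of the two effects. It is only a sketch: the function names, the linear form and the numbers are my own assumptions for illustration, not measurements of any real platform.

```python
# Toy model of indirect versus direct network effects.
# All names and numbers are hypothetical, chosen only for illustration.

def indirect_value(base_value, complementary_apps, value_per_app):
    """Value of a platform to one user under an indirect network effect.

    The value grows with the number of complementary applications,
    not with what other users are doing on their own isolated copies.
    """
    return base_value + value_per_app * complementary_apps


def direct_value(base_value, contributions_by_others, value_per_contribution):
    """Value of a shared system to one user under a direct network effect.

    The value grows with the trail of data other users leave behind.
    """
    return base_value + value_per_contribution * sum(contributions_by_others)


# A desktop platform: the same value to me however busy other users are.
platform = indirect_value(base_value=10, complementary_apps=200, value_per_app=1)

# A shared system: its value to me rises as other users contribute.
quiet_wiki = direct_value(10, [0, 0, 0], value_per_contribution=2)
busy_wiki = direct_value(10, [30, 45, 25], value_per_contribution=2)
```

In this sketch the only way to raise the platform's value is to add applications, whereas the shared system's value moves with every contribution made by someone else.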

The Windows model worked because Windows was a single-homed platform. If I as a customer put Windows on my computer, I could not easily also have OS/2 or Mac OS. That means that my choice of OS was heavily reliant on the world of applications that were available and going to be available on that platform. That was fair enough; it really was hard to have multiple operating systems, even down to having to learn different user interaction patterns.

MS understood that their indirect network effects hinged on the developer community because these were the complementors.

But there are two huge differences now compared to when Microsoft started building its software ecosystem back in the day.

The first is open source. Software can be built in a modular fashion, which makes it very easy to build more software through recombination of pieces (e.g. the LAMP stack, Apache etc.). Open source software is a massive toolkit that, through recombination, just keeps growing. By taking a shared approach to solving software problems, it becomes easier to solve the next by drawing on the pool. This is a huge indirect network effect not limited by the friction of license cost. Because a relatively small number of people can solve general problems whose solutions everyone can then share, it is very hard to make money from any generally applicable software problem: it is getting easier and easier for the solution to be assembled from open source components. When you add utility cloud computing into the mix, it becomes easy for SaaS offerings to be attacked by open source too.

There is no general solution to the data problem. Your best guys in Silicon Valley can easily solve the problem of how to graph a data series, but they can’t create for me a graph of my personal calorie intake for the last week. That data is specific and localised. If I want that specific graph, my problem is not the software required to generate a graph; it is the data problem of my calorie intake. Some data is general, but the vast majority of data is specific. This makes the data problem very different from the software problem.

The data is where the users are, the software is where the developers are.

Secondly, the important thing to note is that PaaS offerings based on open standards are NOT single-homed for the consumer of applications, especially with the rise of standards for interoperability like web services, REST, XML and RDF. If I use BaseCamp, it doesn’t make it any harder for me to also use SalesForce.com or EC2.

This fundamentally makes API-level lock-in combined with a developer community model very unstable compared to the good old days of deployed software (e.g. Oracle, SAP, and similar models). In fact for SalesForce.com, I would say the greatest thing that draws people in is that their central business data is in the platform, rather than the wide choice of complementors.

I don’t think software and developer communities provide the same kind of strong platform network effects today. Look around at the successful network effect players. It isn’t the developer community that has made them defensible (e.g. eBay) but the direct network effects between users. This is a very different game and one I think MS will have a very hard time really understanding. The paradox is that software is essential for data-centric direct network effects, but isn’t where the defensible value really is.

Putting the approaches another way:
Developers are where the software is.
Users are where the data is.

So with a direct network effect, it is the usage of the shared system by users which makes the system itself more valuable to each user through the trail of data they leave. Software, just like hardware, is required to enable data but it is the data which generate the value. You can see this exact effect with Wikipedia, Facebook, Skype, IM, Twitter and Google.

Let me expand on Google a bit. I don’t pretend to fully understand Google, maybe no one does (including Larry and Sergey). It is easy to see that the web is an open network with very strong direct and indirect network effects, but understanding how Google turns it into a profitable business is less so. Google harnesses the content and collective intelligence in the links to create relevance data but it also watches the activity of its users to derive further collective intelligence and improve the experience. The first is an open system which anyone with enough compute power and bandwidth could copy; the second is based on the large volume of users coming to Google and is closed. Only those who can generate that kind of search traffic can have access to that. So, those are two direct network effects: one based on an open network of users (people who create web content), the other on a closed network of users (people who search on Google).

Google’s great fortune was to keep (by luck or judgement) their software to themselves. Instead of selling search technology to power other search services and becoming a technology supplier, they used their best-in-class technology to take huge market share in search and created a global consumer BRAND. Let me say it again: global BRAND. There are no magic network effects that keep people glued to using Google for search. Maybe Google is a bit better than the others but I wouldn’t know because I haven’t tried them. It’s mass market: Want a cold drink? Coke please. Want to search the web? Google please. Google uses its infrastructure scale and its talent scale to keep wowing the public and to keep Google top of mind for amazing things like search. I switched on a 24-hour news channel the other day and noticed that the map behind the presenter’s head had a big Google logo on it: it was Google Earth. You can’t buy that kind of brand visibility. The Google brand is everywhere. With by far the largest search traffic, Google is naturally where advertisers want presence because that is where the eyeballs are. Yes, Google has used network effects again to maximise its advertising revenue; but that is only possible because of the huge search traffic its infrastructure scale, talent scale and BRAND give it.

So, Google creates huge value from network effects but doesn’t actually tie users in through a closed network. Google harnessed the value created by the open platform of the web; its defence is scale of infrastructure and talent, which maintains it as THE BRAND in the public mind for search on the web and so drives ad sales. With scale, talent and brand built on an open platform as its key levers, it is no surprise that Google wants to play the same game by ensuring the mobile platform (Android) is open and the web platform stays open (Chrome). Apple, on the other hand, wants to own a closed mobile platform. I can’t help but think Google will be the long-run winner.

Which do you need more: the community of complementors or the users, their activity and data? Be under no illusion: the software game has changed for ever. SaaS and PaaS platforms are not naturally single-homed, and that was a crucial part of the old game. How are you going to create software-based network effects without hurting your customers through lock-in? And how do you stop your community being outgrown by the open source ecosystem on open utility computing infrastructure?

My advice: it isn’t about the software anymore, it’s about the data.

Utility computing in the Cloud

It is usually more interesting and educational to see a good heart-felt debate than complete agreement so you are in for a treat if you take the time to read the following from Nick Carr, Tim O’Reilly and the Smoothspan blog.

You can see from the debate that economics is at the heart of the discussion yet not understood in the same way by the three. I find myself pretty much in agreement with Tim, but it might be worth pulling out some of the strands to clarify. I think there is real confusion between economies of scale, direct and indirect network effects.

In this post I will focus on the utility computing layer in the cloud. I think the economics of platform as a service (PaaS), especially the crucial distinction between direct and indirect network effects for defensibility, needs its own post.

It’s pretty clear that utility cloud computing is highly capital intensive so it should come as no surprise that there are powerful economies of scale to be had. But the bottom line is that you are talking about plant and power. These are rival goods, scarce resources that are created and consumed. This is no different from many utility industries with one exception: the distribution network has global reach, already exists and is very cheap compared to existing utility distribution networks. It is a lot cheaper to access a computing resource on the other side of the planet than it is to send electricity or gas across the globe. So maybe Hugh McLeod is right. What is to stop economies of scale turning this into a global natural monopoly?

Actually, unless there are some large network effects, quite a lot stops single companies ruling entire industries. For a start, without network effects, economies of scale tend to run out: the average cost curve is usually U-shaped (take a look at http://en.wikipedia.org/wiki/Economies_of_scale). Telecoms, gas and rail companies have strong network effects from their infrastructure; it makes little sense to have duplicate rail networks or gas networks in a country. Utility computing does not have this advantage because the providers do not own the distribution network.
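A minimal sketch of what a U-shaped cost curve looks like in numbers. The cost function and every figure in it are assumptions of mine, picked only to show the shape: a fixed cost spread over more units pulls the average down, while a hypothetical coordination overhead that grows with size eventually pushes it back up.

```python
# Illustrative U-shaped average cost curve.
# The functional form and all figures are assumptions, not real data.

def average_cost(units, fixed_cost, unit_cost, overhead_per_unit):
    """Average cost per unit at a given scale of output.

    fixed_cost / units      falls as scale grows (economy of scale)
    unit_cost               constant cost of producing each unit
    overhead_per_unit*units rises as scale grows (diseconomy of scale)
    """
    return fixed_cost / units + unit_cost + overhead_per_unit * units


small = average_cost(100, fixed_cost=10_000, unit_cost=1, overhead_per_unit=0.01)
medium = average_cost(1_000, fixed_cost=10_000, unit_cost=1, overhead_per_unit=0.01)
large = average_cost(10_000, fixed_cost=10_000, unit_cost=1, overhead_per_unit=0.01)
```

With these made-up numbers the medium-sized provider has the lowest average cost; both the small and the very large provider are more expensive per unit, which is the sense in which the economies of scale "run out".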

Smoothspan argues there are two potential network effects that could cause a single winner.

1) Lower costs of data exchange between apps in the same cloud

2) Elasticity

There is a network effect based on increased costs for cross-cloud interoperability, exactly as we have with mobile phone networks today. I don’t think this is a significant, long-term issue because we are talking about a relatively small number of cloud providers thanks to capital costs. Ironically, that means the cost of providing massive high speed bandwidth BETWEEN different cloud providers is actually very small; especially when compared with the cost of providing large bandwidth to every single home and mobile phone in the world. And, of course, the backbone telecoms providers are already geared up to provide exactly this kind of point to point, high capacity infrastructure.

If a cloud provider artificially inflated their cross cloud costs, they would directly cut the available data-sharing applications for a customer and would suffer a big negative network effect compared to providers that ensured their cloud was as open to cross cloud use as possible. Would you choose the walled garden?

Regarding the second point: I think Smoothspan is confusing economies of scale with network effects. A larger provider can more easily deal with variation of demand, but this is an economy of scale (the cost of meeting a variable demand of size X for a customer is lower for a bigger player) and is in fact a negative network effect, just like your Internet connection at home. If every other customer stopped using the service there would be more capacity available for you. If everyone is using the service there is less capacity available: a negative network effect. Just as with the power grid, variation of demand is more easily managed with multiple providers that can be called on when required. A single supplier has no one to share demand peaks with and must over-provide capacity far in excess of a shared model.
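A rough numeric sketch of the two points above: sharing demand peaks, and the congestion that makes this a negative network effect. The demand figures and function names are invented for illustration.

```python
# Illustrative sketch only: the demand profiles below are invented.

def capacity_if_isolated(demand_profiles):
    """Each provider covers its own peak alone, so total capacity is the sum of the peaks."""
    return sum(max(profile) for profile in demand_profiles)


def capacity_if_shared(demand_profiles):
    """Providers can call on each other, so capacity only has to cover the peak of combined demand."""
    combined = [sum(slot) for slot in zip(*demand_profiles)]
    return max(combined)


def capacity_per_active_user(total_capacity, active_users):
    """Congestion: a fixed pool split between more active users leaves less for each one."""
    return total_capacity / active_users


# Demand per time slot for two hypothetical providers whose peaks do not coincide.
provider_a = [10, 50, 20]
provider_b = [40, 15, 30]

isolated = capacity_if_isolated([provider_a, provider_b])  # 50 + 40
shared = capacity_if_shared([provider_a, provider_b])      # max(50, 65, 50)
```

Because the peaks fall in different time slots, the shared arrangement needs less total capacity than two isolated ones. The saving comes from pooling variable demand, not from any one user's activity making the service more valuable to another; as `capacity_per_active_user` shows, more active users on a fixed pool means less for each.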

For me the bottom line on utility computing is that it is very much like the provision of telecoms and power but without the network effect of owning the network. I would not be surprised to see backwards integration along the supply chain in this industry (i.e. a power generator and a bulk telecoms provider might have the infrastructure and capital structure to build data centres more cost effectively than Google, Amazon or MS as the market matures).

This market is nowhere near mature. I expect that Google, Amazon and MS are still their own biggest cloud customers.

With the rise of utility computing in the cloud, it will soon become very easy to create a PaaS offering because the utility computing provider absorbs the large fixed costs and rents the infrastructure to the PaaS provider on an incremental marginal cost basis. This is very similar to the virtual mobile network operators (like Virgin) which ride on the back of the network providers. The difference here is that the PaaS has the chance to create powerful network effects.

So to summarise, utility cloud computing is firmly built on economies of scale, whereas I think cloud-based platforms (PaaS) need to be firmly built on the economics of network effects to be defensible. An interesting battleground for PaaS seems to be centred around the difference between software-centric and data-centric network effects, but more on that in a later post.

Web 2.0 and the Innovator’s Dilemma

Web 2.0 is a vision of the web where content and functions can be remixed and reused to create new content or new applications. Web services and the semantic web are two of the key enablers for this vision, but there appear to be two approaches to web services emerging. Why is that?

SOAP & WSDL - opens up a new vista of possibilities by solving some of the really hard problems (WS-this, that and the other), but requires expertise and new infrastructure (e.g. toolkits and app servers) to manage the complexity. Unsurprisingly, the app server vendors are driving the new standards in enterprise software.

REST - opens up a new vista of possibilities by making it very easy to use web application APIs, so new audiences can get involved, and doesn’t require much in the way of changes to the existing software stack. This is largely being driven by a very different community from the enterprise web services lot.
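To give a feel for the gap in effort, here is a minimal sketch of the same lookup expressed both ways. The endpoint `api.example.com`, the `books` resource, the `FindBooks` operation and the namespace are all hypothetical, nothing is sent over the network, and the SOAP envelope is deliberately stripped down (no WSDL, no headers, no XML escaping).

```python
from urllib.parse import urlencode

# Hypothetical service details, assumed purely for illustration.
BASE_URL = "https://api.example.com"
SERVICE_NAMESPACE = "http://example.com/books"


def rest_request_url(base_url, resource, params):
    """A REST-style request is just a URL: a resource plus query parameters, fetched with HTTP GET."""
    return f"{base_url}/{resource}?{urlencode(params)}"


def soap_envelope(operation, params, namespace):
    """A bare-bones SOAP 1.1 envelope for the same call, which would be POSTed as XML.

    Simplified: values are not XML-escaped and no WS-* headers are included.
    """
    fields = "".join(f"<m:{name}>{value}</m:{name}>" for name, value in params.items())
    return (
        '<?xml version="1.0"?>'
        '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
        f'<soap:Body><m:{operation} xmlns:m="{namespace}">{fields}</m:{operation}></soap:Body>'
        "</soap:Envelope>"
    )


query = {"author": "Christensen"}
rest_call = rest_request_url(BASE_URL, "books", query)
soap_call = soap_envelope("FindBooks", query, SERVICE_NAMESPACE)
```

The REST version can be typed straight into a browser's address bar; even this cut-down SOAP version needs an XML document and, in practice, a toolkit to build and parse it.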

It seems to me that the difference in complexity and cost between the approaches is actually a symptom of something deeper.

SOAP Webservices are trying to go beyond what expert developers could already do with RMI, DCOM etc.

By its nature it must compete with what is already possible: mission-critical software systems that are trusted, secure, reliable and accountable, and that typically have a high cost of failure. Most of these developers could not buy into a new way of working if it meant going backwards in any of those critical areas.

If you are familiar with “The Innovator’s Dilemma” by Clayton M. Christensen, you may recognise this as the classic description of sustaining innovation. It must be better than what went before because it competes along the same dimensions with the same audience.

Christensen also describes what he terms “Disruptive Innovation”, of which one type is the low-end disruption. This is where a technically inferior innovation radically reduces the barrier to entry (be that skill, cost or location), thereby allowing an audience that was previously excluded to participate. This competes on new dimensions with a new audience.

This massive new audience is currently excluded from the traditional solution, so for this audience the disruptive innovation only has to be better than nothing.

So disruptive innovation allows a new, less skilled community to participate and do new kinds of things. Almost by definition this community is larger than the community of experts i.e. it is the long tail.

If we consider REST we see that it is not technically as advanced as SOAP-based Web Services. But it is significantly easier, with lower skill and cost barriers for both producer and consumer. And sure enough, Amazon and others are finding that the vast majority of the users of their platform are using the REST APIs.

Software standards have always had a massive network effect. What good is a standard if nobody else uses it? This makes the size of the community around any standard or approach hugely important. The pace of innovation is also deeply linked to the size of the community that can innovate. Consider the number of web authors (including bloggers) who can probably get their heads around REST. It is vastly larger than the community of hard-core software developers on the planet.

Christensen describes, with many examples, how low-end disruptions rapidly become better and better until the complex high-end solutions are pushed off the map.

It wouldn’t be the first time that an innovation became the de facto standard outside the corporate firewall and eventually became good enough to be adopted by the enterprise.

Am I saying that SOAP Webservices are doomed? I’m sure they have their niche as Ian suggests here. But, on the other hand, vastly more innovation is likely when ordinary people gain the ability to do what before they had to employ an “expert” to do. Sound familiar? It should. Disintermediation is one of the hallmarks of disruptive innovation: disintermediation of “developers”, or a redefinition of what it means to be a developer. Either way I would put money on Web 2.0 emerging first from the innovative web user rather than the software industry establishment.