Connecting the Dots

By Mark Frauenfelder
October 19, 2004

Tim Berners-Lee…worries that poorly conceived changes to the web’s organisation and governance could compromise its functionality and “universality”. Photo: AP

Creating the World Wide Web didn’t make Tim Berners-Lee instantly rich or famous. That’s partly because the web sprang from relatively humble technologies.

Berners-Lee’s invention was based on an information retrieval program called Enquire (named after a Victorian book, Enquire Within Upon Everything), which he wrote in 1980 while working as a programmer at the European Organisation for Nuclear Research (CERN) in Geneva, Switzerland. The lack of riches is also because Berners-Lee did the unthinkable when, more than 10 years later, he finished writing the tools that defined the web’s basic structure: he gave them away, with CERN’s blessing, no strings attached.

While others made millions from his invention, the soft-spoken programmer went on to found the World Wide Web Consortium (W3C), which he still directs, at MIT in Cambridge, Massachusetts, to promote global web standards and development.

Berners-Lee is finally getting his reward: in July he was knighted and the previous month he received Finland’s million-euro Millennium Technology Prize.

Now the 49-year-old is busy overseeing hundreds of projects at W3C. He is also personally engaged in developing his second big idea: the Semantic Web.

The Semantic Web adds definition tags to information in webpages and links them in such a way that computers can discover data more efficiently and form new associations between pieces of information, in effect creating a globally distributed database. Though it was part of Berners-Lee’s original intention for the web, the Semantic Web has been 15 years in the making and has met its share of scepticism. But Berners-Lee believes it will soon win acceptance, enabling computers to extract meaning from far-flung information as easily as today’s web links individual documents.

The Semantic Web, coupled with other specifications and tools being developed at W3C, including accessibility standards for disabled people and software for mobile devices, is part of Berners-Lee’s grand vision of “a single web of meaning, about everything and for everyone”. But is it a tangled web we weave?

Despite his excitement about the future, Berners-Lee worries that poorly conceived changes to the web’s organisation and governance could compromise its functionality and “universality”.

For several years he has been promoting the concept of the Semantic Web. But it is proving a hard notion to sell.

“It’s not the first time I’ve had this paradigm-shift problem,” he says. “Early on, people really didn’t understand why the web was interesting. They saw it in the smaller scale, and it’s not interesting in the smaller scale. Same thing with the Semantic Web.

“Right now we are just starting by putting applications onto the Semantic Web one by one and linking them up where it seems useful. But what’s exciting is the network effect. The vision is that we will get to a critical mass where everything starts getting linked into an unimaginably large whole. Then the incentive to add more to it rises exponentially as the value of what is out there also does. Because few people initially get this great ‘aha!’ of connecting to a huge mass of Semantic Web data, it all has to be done by people who are convinced, who understand that it’s worth putting the effort into getting the thing off the ground.”

As Berners-Lee describes it, the common thread of the Semantic Web is that a lot of financial, weather, corporate and other information sits in databases, spreadsheets and websites that you can read but not manipulate. The key thing is that this data exists, but computers don’t know what it is or how it interrelates. Once there is a web of interesting global semantic data, users should ideally be able to combine the data they know about with other data they don’t know about.

“Suppose you’re browsing the web and you find a seminar advertised and you decide to go,” Berners-Lee says. “There is all sorts of information on that page, which is accessible to you as a human being, but your computer doesn’t know what it means. So you must open a new calendar entry and paste the information in there. Then get your address book and add new entries for the people involved in the seminar. And then, if you wanted to be complete, find the latitude and the longitude of the seminar and program that into your GPS (Global Positioning System) device so you could find it.

“It’s very laborious to do all this by hand. What you would like to be able to do is just tell the computer ‘I’m going to this seminar’. If there were a Semantic Web version of the page, it would have labelled information on it that would tell the computer ‘this is an event,’ and what time and date it is. And it would automatically add your travel to your event book. It would add the people to your address book and it would program your GPS to give you directions. It would have the relationships between the event and the various people chairing it. And those people would have Semantic Web personal pages, which contained information about how you could contact them.

“Your address book can now grow from a closed repository of private data to a view on the people-related data in the world.”
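To make the seminar scenario concrete, the Python sketch below assumes the labelled page data has already been extracted as subject, predicate, object triples. Every vocabulary term (`ex:Event`, `ex:startsAt`, `ex:chair` and so on), the sample values and the `attend` helper are hypothetical, invented for illustration; no real ontology or calendar API is implied.

```python
from dataclasses import dataclass

# Hypothetical labelled data from a "Semantic Web version" of the seminar page,
# as (subject, predicate, object) triples. Names and values are made up.
PAGE = [
    ("ex:seminar42", "rdf:type", "ex:Event"),
    ("ex:seminar42", "ex:title", "Linked Data Seminar"),
    ("ex:seminar42", "ex:startsAt", "2004-11-03T14:00"),
    ("ex:seminar42", "ex:latitude", "46.2330"),
    ("ex:seminar42", "ex:longitude", "6.0557"),
    ("ex:seminar42", "ex:chair", "ex:alice"),
    ("ex:alice", "ex:name", "Alice Example"),
    ("ex:alice", "ex:email", "alice@example.org"),
]

@dataclass
class CalendarEntry:
    title: str
    starts_at: str

@dataclass
class Contact:
    name: str
    email: str

def objects(triples, subject, predicate):
    """All values v such that (subject, predicate, v) appears in the data."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def one(triples, subject, predicate):
    """The single value of a property; raises ValueError if it is missing or repeated."""
    (value,) = objects(triples, subject, predicate)
    return value

def attend(triples, event):
    """Act on "I'm going to this seminar" using only the labels, never the page's prose."""
    if "ex:Event" not in objects(triples, event, "rdf:type"):
        raise ValueError(f"{event} is not labelled as an event")
    entry = CalendarEntry(one(triples, event, "ex:title"), one(triples, event, "ex:startsAt"))
    contacts = [
        Contact(one(triples, person, "ex:name"), one(triples, person, "ex:email"))
        for person in objects(triples, event, "ex:chair")
    ]
    # (latitude, longitude) pair that could be handed to a GPS device.
    waypoint = (float(one(triples, event, "ex:latitude")),
                float(one(triples, event, "ex:longitude")))
    return entry, contacts, waypoint

entry, contacts, waypoint = attend(PAGE, "ex:seminar42")
print(entry, contacts, waypoint, sep="\n")
```

The program never reads the page’s text: it only follows labels, so the same code would work for any page that used the same (here, invented) vocabulary.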

It sounds a bit like something a personal assistant might do. Not according to Berners-Lee, who says a human assistant would have the human mind’s ability to suddenly think of correlations across the whole spectrum of his or her experience. The Semantic Web, by contrast, gives you a program that can do all the things your MIS department could write programs to do if it had the time. It is still a program, just as the World Wide Web is still a collection of documents. Berners-Lee believes that in the future the Semantic Web will be a great place to develop artificial intelligence in the strong sense. But right now he is building something quite mechanical, even if it uses bits and pieces of the machinery developed by the AI community over the years.

The Semantic Web technology tackles the problem in two stages. “The more mundane is a common data format,” he says. “You can take a database or a calendar or an address book or a bank statement or a weather reading – basically anything with hard data in it – and make the machine write it in the basic Semantic Web language instead of some proprietary or application-specific format. This solves the ‘syntactic’ problem.

“It still doesn’t solve the ‘semantic’ one, though. For that, the Semantic Web first gives names to the basic concepts involved in the data: date and time, an event, a check, a transaction, temperature and pressure, and location. These are all defined just to mean whatever they mean in the system that produces the data: for example, ‘Transaction date as I get on a bank statement’, and so on.

“This set of concepts is called an ontology. Then, where there are connections between ontologies, such as when the date and time on a photograph is the same concept as the time on a weather report, we write rules to take advantage of these connections. This allows one to query the Semantic Web agent for photos taken on sunny days, for example. Bit by bit, link by link, the data becomes connected, interwoven. The exciting thing is serendipitous reuse of data: one person puts data up there for one thing, and another person uses it another way.”
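Berners-Lee’s photo-and-weather example can be sketched in a few lines of Python. This is a simplified illustration, assuming two hypothetical data sources that each use their own predicate names, plus a hand-written mapping declaring that both timestamps are the same concept. Real Semantic Web systems express such links in ontology and rule languages rather than a Python dictionary; every name below is invented.

```python
# Two hypothetical sources, each with its own vocabulary ("ontology").
PHOTO_DATA = [
    ("photo:1", "rdf:type", "ex:Photo"),
    ("photo:1", "cam:takenAt", "2004-06-01T15:30"),
    ("photo:2", "rdf:type", "ex:Photo"),
    ("photo:2", "cam:takenAt", "2004-06-02T09:10"),
]
WEATHER_DATA = [
    ("obs:a", "wx:observedAt", "2004-06-01T12:00"),
    ("obs:a", "wx:sky", "sunny"),
    ("obs:b", "wx:observedAt", "2004-06-02T12:00"),
    ("obs:b", "wx:sky", "overcast"),
]

# Hand-written link between the two ontologies: both predicates
# mean "the time at which this thing happened".
SAME_CONCEPT = {"cam:takenAt": "ex:time", "wx:observedAt": "ex:time"}

def normalise(triples):
    """Rewrite linked predicates to the shared concept; leave everything else untouched."""
    return {(s, SAME_CONCEPT.get(p, p), o) for s, p, o in triples}

def photos_on_sunny_days(*sources):
    data = set().union(*(normalise(src) for src in sources))
    # Calendar dates (the YYYY-MM-DD prefix of an ISO 8601 timestamp) with a sunny observation.
    sunny_days = {
        o[:10] for s, p, o in data
        if p == "ex:time" and (s, "wx:sky", "sunny") in data
    }
    # Photos whose own timestamp falls on one of those dates.
    return sorted(
        s for s, p, o in data
        if p == "ex:time" and (s, "rdf:type", "ex:Photo") in data and o[:10] in sunny_days
    )

print(photos_on_sunny_days(PHOTO_DATA, WEATHER_DATA))  # ['photo:1']
```

Matching on the calendar date alone is a deliberate simplification; a real rule might also compare locations or narrower time windows.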

The Semantic Web works by defining new languages that computers use to exchange information. Phase one, now complete, involved getting the first of those languages, for both syntax and semantics, to the point where they became standards supported by W3C’s members. Interoperability is the key.

“Now there is this foundation,” says Berners-Lee, “and anybody who wants to make a new application and publish data can do that, and everybody else’s program will be able to read the data.”
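The interoperability claim rests on every program reading and writing the same format. As a rough illustration, here is a Python sketch of a simplified, N-Triples-like line syntax in which one program writes triples and another parses them back. It is not a conforming N-Triples implementation (it supports no escapes, blank nodes or language tags), and the example.org IRIs are placeholders.

```python
import re
from typing import NamedTuple

class Literal(NamedTuple):
    """A plain text value; a bare str object is treated as an IRI naming a thing."""
    value: str

# Simplified syntax: <subject> <predicate> <iri> .   or   <subject> <predicate> "text" .
LINE = re.compile(r'^<([^<>\s]+)> <([^<>\s]+)> (?:<([^<>\s]+)>|"([^"\\\n]*)") \.$')

def serialise(triples):
    """Write one triple per line so that any other program can read them."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, Literal):
            if any(c in o.value for c in '"\\\n'):
                raise ValueError("this sketch does not escape quotes, backslashes or newlines")
            obj = f'"{o.value}"'
        else:
            obj = f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines) + "\n"

def parse(text):
    """Read the same syntax back into (subject, predicate, object) tuples."""
    triples = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        match = LINE.match(line)
        if match is None:
            raise ValueError(f"line {number}: not a triple: {line!r}")
        s, p, iri, text_value = match.groups()
        triples.append((s, p, iri if iri is not None else Literal(text_value)))
    return triples

# One application publishes...
data = [
    ("http://example.org/seminar42", "http://example.org/vocab#title", Literal("Linked Data Seminar")),
    ("http://example.org/seminar42", "http://example.org/vocab#chair", "http://example.org/people/alice"),
]
published = serialise(data)
print(published)
# ...and another reads it back unchanged.
print(parse(published) == data)
```

Keeping a separate `Literal` type means a reader can always tell a name that points to something else (which it may follow) from plain text (which it may only display).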

What kinds of Semantic Web applications are people making for the next phase? “Exciting things are happening in the life sciences. The big challenges such as cancer, AIDS, and drug discoveries for new viruses require the interplay of vast amounts of data from many fields that overlap – genomics, epidemiology, and so on. Some of this data is public, some very proprietary to drug companies, and some very private to a patient. The Semantic Web challenge of getting interoperability across these fields is great but has huge potential benefits.”

There are also challenges around maintaining privacy and intellectual property. There is clearly a great deal to gain, which is why a lot of people are getting very fired up about working on the life sciences with Semantic Web applications.

But other industries stand to benefit enormously too. Toolkits from Hewlett-Packard and IBM, authoring applications from Adobe, smart content management products from Profium and Brandsoft, and search engines from Network Inference are all helping to build the Semantic Web at various scales. These and other technologies are being adopted by communities, and in turn are revolutionising how those groups collaborate and communicate.

“In Britain the Semantic Web Environmental Directory is a prototype of a new kind of directory of environmental organisations and projects,” Berners-Lee says. “Rather than centralising the storage, management, and ownership of the information, SWED simply harvests data and uses it to create the directory.”

From a social perspective, there’s a Semantic Web application nicknamed Fatcats, from FoafCorp, that allows users to pick a company and see who is on its board via a graph of connected people. When you click on one of the people, you see all the boards they sit on. You can start exploring the spheres of influence in American corporate culture.
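A Fatcats-style view boils down to indexing a graph of board memberships in both directions. The sketch below is a hypothetical Python illustration with invented people and companies, not FoafCorp’s actual data or code; `sphere_of_influence` is an assumed extension that follows chains of shared directors breadth-first.

```python
from collections import defaultdict, deque

# Hypothetical (person, "boardMemberOf", company) facts; all names are invented.
MEMBERSHIPS = [
    ("Ann", "boardMemberOf", "Acme Corp"),
    ("Ann", "boardMemberOf", "Globex"),
    ("Bob", "boardMemberOf", "Globex"),
    ("Bob", "boardMemberOf", "Initech"),
    ("Cy", "boardMemberOf", "Umbrella"),
]

def index(triples):
    """Build both directions of the bipartite person/company graph."""
    board = defaultdict(set)  # company -> its directors ("who is on the board?")
    seats = defaultdict(set)  # person -> companies ("which boards do they sit on?")
    for person, predicate, company in triples:
        if predicate == "boardMemberOf":
            board[company].add(person)
            seats[person].add(company)
    return board, seats

def sphere_of_influence(company, board, seats):
    """Companies reachable from `company` through chains of shared directors."""
    seen, queue = {company}, deque([company])
    while queue:
        current = queue.popleft()
        for person in board[current]:
            for other in seats[person]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
    return seen - {company}

board, seats = index(MEMBERSHIPS)
print(sorted(board["Globex"]))  # ['Ann', 'Bob']
print(sorted(seats["Ann"]))     # ['Acme Corp', 'Globex']
print(sorted(sphere_of_influence("Acme Corp", board, seats)))  # ['Globex', 'Initech']
```

Clicking a company corresponds to a lookup in `board`, clicking a person to a lookup in `seats`; the breadth-first walk simply repeats those two clicks until nothing new turns up.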

In all this talk of a Semantic Web, what Berners-Lee calls “web universality” remains central.

“One of the fundamental properties of the web is the fact that it is just one space, and it’s a consensual space,” he says. “It should be independent of the hardware you use. It should be independent of the software you use or the operating system it’s running on. It should also be independent of what culture you’re in, or whether you’re writing a wonderful, carefully edited document, or whether you’re scribbling something on the back of the proverbial envelope. And it should be independent of what language you’re using, what character set, whether your letters go up and down, left to right, or right to left. Also, people should be able to access that information even if they have disabilities. At W3C we call this concept ‘one web for anyone, everywhere, on anything’.”

A short history of the World Wide Web

1945: In The Atlantic Monthly, Vannevar Bush, director of the US Office of Scientific Research and Development, describes the Memex, a hypothetical device for storing and linking documents on microfilm.
1968: The Stanford Research Institute’s Douglas Engelbart demonstrates the “oNLine System” (NLS), whose features include hypertext browsing, editing and email. He navigates it with the mouse, his own invention.
1980: Tim Berners-Lee, a consultant at CERN, the European Organisation for Nuclear Research, writes Enquire, software that allows electronic documents to link to each other.
1990: Berners-Lee dubs his global hypertext program “WorldWideWeb”. The number of websites in existence: one.
1993: Marc Andreessen releases the Mosaic web browser, which becomes the basis for Netscape.
1994: The World Wide Web Consortium (W3C) is founded. The number of websites reaches 10,000. Berners-Lee presents the idea of the Semantic Web.
1998: W3C releases the eXtensible Markup Language (XML) specification. It allows webpage text to be tagged with descriptive labels, which is critical for the Semantic Web.
2000: By year’s end, 25,675,581 websites have been identified.
2004: W3C finalises RDF and OWL, the standards that allow computers to exchange Semantic Web information. Berners-Lee is knighted.

– MIT Technology Review