Week 7: Approaches to Web Services

Earliest approaches to networked applications

The earliest approaches did not involve XML or HTTP at all. Instead, client and server applications exchanged binary (non-XML) data using custom (non-HTTP) protocols. Examples include CORBA, RMI (Sun) and DCOM (Microsoft).

The philosophy was very much to develop distributed applications in the same manner as standard programs. The clients for these applications were typically standalone programs rather than web applications: for example, a Visual Basic or Java desktop application which needed to communicate with a remote database over the network. As a result, programmers without networking experience felt relatively "at home" with these technologies.

The arrival of the true Web Service (XML over HTTP)

The arrival of XML as a data format led to the development of distributed approaches which exchanged XML rather than binary data. Furthermore, the XML was sent using the standard protocol of the web, HTTP, normally delivered over port 80.

So why XML over HTTP?

Remote Procedure Call Methods (XML-RPC and SOAP)

The first XML/HTTP-based approach we will look at is the Remote Procedure Call (RPC) approach. RPC-based web services share similarities with CORBA and the like in that they seek to allow the developer to develop the application in the same manner as a non-networked application. The basic unit is the function (also known as a method or procedure); when you want to call the web service, you call a function much like you would in a regular program.

The Web Service specifies a list of functions which clients may call over the net.

Example of the RPC approach

If HitTastic! developed an RPC-based Web Service, it might offer a list of functions such as searchForSongByTitle, searchForSongByArtist, getWeekOfRelease and getBiggestHit.

Each client could then call any one of these functions, depending on what they wanted to do.

Here is an example of code which calls an RPC web service. Note that this is a 'pseudolanguage' based on PHP: the code would not look exactly like this, but I've simplified it to illustrate the point.


$artist = $_POST["artist"];

// WebService() is a made-up helper standing in for a real RPC client library
$h = WebService('hittastic.com')->getBiggestHit($artist);
echo "Biggest hit of $artist is: ";
echo $h->title . " released in " . $h->year;

$songs = WebService('hittastic.com')->searchForSongByArtist($artist);

// Each result is an object with a title and a year
foreach ($songs as $song)
{
    echo $song->title . ' came out in ' . $song->year;
}

Note how in the code we are calling the functions of the Web Service.
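As a rough, runnable counterpart to the pseudocode above, here is a minimal sketch in Python using the standard library's xmlrpc modules. The song data, the loopback address and the decision to run the server and the client in one process are all assumptions made purely for illustration; only the function name searchForSongByArtist comes from the example above.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Illustrative sample data standing in for the HitTastic! database.
SONGS = [
    {"artist": "Oasis", "title": "Wonderwall", "year": 1995},
    {"artist": "Oasis", "title": "Live Forever", "year": 1994},
]


def search_for_song_by_artist(artist):
    """Return every song by the given artist."""
    return [song for song in SONGS if song["artist"] == artist]


# Server side: publish the function under the name clients will call.
# Port 0 asks the operating system for any free port.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(search_for_song_by_artist, "searchForSongByArtist")
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: the remote function is called much like a local one.
port = server.server_address[1]
with ServerProxy(f"http://127.0.0.1:{port}/") as service:
    songs = service.searchForSongByArtist("Oasis")

for song in songs:
    print(f"{song['title']} came out in {song['year']}")

server.shutdown()
server.server_close()
```

The client never builds a URL or parses XML itself; the library hides both, which is exactly the "call a function" style that RPC aims for.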

Note also how the code illustrates the guiding principles of RPC-based approaches:

How is the data transported using RPC?

In RPC, not only the data itself (e.g. the artist), but also the type of the data (e.g. number or string) and the function we're calling are encoded as XML and sent across the web. This makes RPC messages both:

Here is an example of some data which would be transferred across the web using XML-RPC:


Note how the XML includes not only the data itself, but also its structure and the type (integer, string, etc.) of the data.
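To see what such a message looks like, the following sketch asks Python's standard xmlrpc.client module to encode a call and a reply without sending anything over the network. The function name getBiggestHit is taken from the earlier example; the argument and the returned title and year are assumed sample values.

```python
import xmlrpc.client

# Encode a call to getBiggestHit("Oasis") as an XML-RPC request body.
request_xml = xmlrpc.client.dumps(("Oasis",), methodname="getBiggestHit")
print(request_xml)

# Encode a possible reply: a struct holding a string and an integer.
reply = {"title": "Wonderwall", "year": 1995}
response_xml = xmlrpc.client.dumps((reply,), methodresponse=True)
print(response_xml)

# Decoding recovers the values with their original types.
params, method_name = xmlrpc.client.loads(request_xml)
(decoded_reply,), _ = xmlrpc.client.loads(response_xml)
print(method_name, params, decoded_reply)
```

In the printed XML, the function name appears in a methodName element and each value is wrapped in a tag naming its type, such as string or int.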

URL-based approaches

RPC-based approaches distinguish different operations of the Web Service with different function calls. However, this ignores the fact that an existing Web feature can already distinguish between different operations on the same server, namely the URL.

Why URL based approaches?

Plain old XML ("POX") over HTTP

POX is the most basic, and simplest, URL-based approach. Each web service operation is specified by a URL; you call the URL, using cURL for example, and get the XML back. For example, for HitTastic! :

hittastic.com/webservice.php?action=searchForSongByTitle&searchterm=Rock DJ
hittastic.com/webservice.php?action=searchForSongByArtist&searchterm=Beatles
hittastic.com/webservice.php?action=getWeekOfRelease&searchterm=Wonderwall,Oasis
hittastic.com/webservice.php?action=getBiggestHit&searchterm=Beatles

Notice how one script (webservice.php) is performing all the different Web Service operations. We tell the script which operation to perform using the action parameter in the query string. This is a common method of implementing "POX" web services, rather than having a separate script for each action.
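The single-script idea can be sketched in Python as one function that reads the action parameter and dispatches to the matching operation. This is only an illustration under stated assumptions: the sample songs, the XML layout of the reply and the error element are invented here, and a real webservice.php would query a database instead.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs

# Illustrative sample data standing in for the HitTastic! database.
SONGS = [
    {"artist": "Oasis", "title": "Wonderwall", "year": 1995},
    {"artist": "Robbie Williams", "title": "Rock DJ", "year": 2000},
]


def search_for_song_by_title(term):
    return [song for song in SONGS if song["title"] == term]


def search_for_song_by_artist(term):
    return [song for song in SONGS if song["artist"] == term]


# One entry per operation the service offers.
ACTIONS = {
    "searchForSongByTitle": search_for_song_by_title,
    "searchForSongByArtist": search_for_song_by_artist,
}


def handle_query(query_string):
    """Play the role of webservice.php: choose the operation from 'action'."""
    params = parse_qs(query_string)
    action = params.get("action", [""])[0]
    term = params.get("searchterm", [""])[0]
    operation = ACTIONS.get(action)
    if operation is None:
        root = ET.Element("error", message="unknown action")
    else:
        root = ET.Element("songs")
        for song in operation(term):
            ET.SubElement(root, "song", artist=song["artist"],
                          title=song["title"], year=str(song["year"]))
    return ET.tostring(root, encoding="unicode")


reply = handle_query("action=searchForSongByTitle&searchterm=Rock+DJ")
print(reply)
```

Adding a new operation means adding one function and one entry in ACTIONS; the URL of the script itself never changes.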


Representational State Transfer: REST

REST is a more formal and structured extension to the basic URL-based web service idea. The basic idea is that each item you might wish to retrieve from the web (e.g. a song, a list of all songs by a given artist, a flight, a biography of an actor, etc) can be represented by a single and highly-descriptive resource or URL. For example:

http://www.hittastic.com/artist/Oasis
http://www.hittastic.com/song/Snow_Patrol/Run
http://www.hittastic.com/biography/Madonna
http://www.solentairways.com/flight/SA101
http://www.solentairways.com/flights/June_1/Southampton/New_York
The idea is that we can manipulate the resource in different ways depending on the type of message that we send. For example, if we send a "GET" message we can retrieve information (e.g. the details of flight SA101), while if we send a "PUT" message we can update data (e.g. we could send a "PUT" message to http://www.solentairways.com/flight/SA101 to change the departure time of the flight).
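From the client's side, the difference between the two messages is just the HTTP method attached to the same URL. The sketch below builds both requests with Python's urllib.request but does not send them; the XML body of the PUT and its departure time are assumed for illustration, since the notes do not define the format the service expects.

```python
import urllib.request

FLIGHT_URL = "http://www.solentairways.com/flight/SA101"

# GET: ask for the current details of the flight.
get_request = urllib.request.Request(FLIGHT_URL, method="GET")

# PUT: send replacement details (hypothetical XML body) to the same URL.
new_details = b"<flight><departure>10:30</departure></flight>"
put_request = urllib.request.Request(
    FLIGHT_URL,
    data=new_details,
    method="PUT",
    headers={"Content-Type": "application/xml"},
)

for request in (get_request, put_request):
    print(request.get_method(), request.full_url)

# Passing either object to urllib.request.urlopen() would actually send it.
```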

REST takes the view that web services can be fully designed using the standard architecture of the web. What do we mean by that? The standard architecture of the web consists of:

HTTP - revision

Recall from last year that HTTP is a set of instructions which allow clients and servers to communicate with each other. Recall also that HTTP requests and responses consist of two sections: a header and a body.

HTTP methods

HTTP comes with a set of standard methods to retrieve and manipulate URLs, which are specified in the HTTP request header. The two you've probably met are GET and POST.

However, there are other methods which are part of the specification but which are not normally used for standard web transactions. These include PUT and DELETE.

HTTP status codes - revision

Recall that the first line of the HTTP response is a status code which indicates whether the request was successful or not. There are a large number of HTTP status codes including:

REST and HTTP

REST takes the view that HTTP methods and status codes are under-used and can be exploited in web services. The idea is that one single web resource (URL) can be used for retrieving, adding, and deleting data associated with a particular item, e.g. a particular song in the HitTastic! database. We do different things with the song depending on the HTTP method we use to communicate with the URL. A number of examples are shown below.

REST example 1

Imagine we have the URL:

http://hittastic.com/song/1009
to represent the song with the ID of 1009 in the HitTastic! database. (Note - see "Clean and unchanging URLs", below, for more details on why we use a URL like this, rather than something like http://hittastic.com/song.php?id=1009)

We could send:

This example underlines the key REST philosophy - One URL per item. The idea of REST is that we represent each item (e.g. a song, an artist) with one URL. We then send that URL a GET, PUT or DELETE request to retrieve, modify or delete that item.
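On the server side, "one URL per item" means a single handler that looks at the HTTP method to decide what to do with the item. The following is a minimal in-memory sketch, not a real web server: the handle function, the stored song and the choice of 405 for unsupported methods are assumptions made for illustration, and a PUT to an unknown ID is simply treated as not found to keep the sketch short.

```python
# In-memory stand-in for the HitTastic! database (illustrative sample row).
songs = {1009: {"title": "Wonderwall", "artist": "Oasis"}}


def handle(method, path, body=None):
    """Return (status code, body) for a request to /song/<id>."""
    song_id = int(path.rsplit("/", 1)[-1])
    if song_id not in songs:
        return 404, None              # Not Found
    if method == "GET":
        return 200, songs[song_id]    # retrieve the item
    if method == "PUT":
        songs[song_id] = body         # replace the item
        return 200, songs[song_id]
    if method == "DELETE":
        del songs[song_id]            # remove the item
        return 200, None
    return 405, None                  # Method Not Allowed


# The same URL, four requests, three different operations.
responses = [
    handle("GET", "/song/1009"),
    handle("PUT", "/song/1009",
           {"title": "Wonderwall", "artist": "Oasis", "year": 1995}),
    handle("DELETE", "/song/1009"),
    handle("GET", "/song/1009"),
]
for status, body in responses:
    print(status, body)
```

The final GET reports 404 because the preceding DELETE removed the song, so the status code alone tells the client what happened.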

The REST web service would also make use of HTTP status codes to indicate whether our transaction was successful. For instance:

REST example 2

Another example: our URL could represent a particular artist in the HitTastic! database e.g.

http://hittastic.com/artist/Oasis
We could send:

The server then sends back an HTTP status code to indicate success, or to indicate the type of error that occurred, e.g. a 404 Not Found if the artist does not exist.

REST example 3

Our URL could represent a particular song and artist e.g.

http://hittastic.com/track/Oasis/Wonderwall
We could send:

The server then sends back an HTTP status code to indicate success, or to indicate the type of error that occurred, e.g. a 404 Not Found if Wonderwall by Oasis was not in the database.

Difference in usage of HTTP codes in REST versus normal usage

Imagine we had a URL to look up a given song:

http://hittastic.com/song/1009
With REST, the server could return "404 Not Found" to the client if the song with that ID was not in the HitTastic! database or, if an invalid ID (0 or less) was supplied, "400 Bad Request", another standard HTTP error code.

Note that this use of error codes differs from the normal usage: on an ordinary website, a 404 normally means that the requested page or script does not exist on the server, whereas here the web service exists but the item it was asked for does not.

This illustrates a key principle of REST: reuse the existing standards of the web, rather than invent new ones.
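The decision between the two error codes can be written as a small check. This sketch uses Python's http.HTTPStatus for the standard code numbers and phrases; the lookup_status function and the one-song database are hypothetical names introduced for the example.

```python
from http import HTTPStatus

# Illustrative stand-in for the HitTastic! database: song ID -> title.
SONGS = {1009: "Wonderwall"}


def lookup_status(song_id):
    """Choose the status code a REST service might return for a song ID."""
    if song_id <= 0:
        return HTTPStatus.BAD_REQUEST   # the ID itself is invalid
    if song_id not in SONGS:
        return HTTPStatus.NOT_FOUND     # valid ID, but no such song
    return HTTPStatus.OK


for song_id in (1009, 2000, 0):
    status = lookup_status(song_id)
    print(song_id, status.value, status.phrase)
```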

Clean and unchanging URLs

Another key principle of REST, illustrated by the examples above, is that of clean, unchanging URLs. URLs which show implementation details (e.g. the fact that it's a PHP script) are prone to continuous change, causing problems in bookmarking them and linking to them; what if we decide to change from PHP to ASP for example, or even just change the location of the script on the server?

Also, such URLs are long-winded to type out and difficult to remember. With REST, we hide the implementation details behind a simple, clean and easily remembered URL, e.g. rather than

http://www.hittastic.com/track.php?title=Wonderwall&artist=Oasis
we could use:
http://www.hittastic.com/Oasis/Wonderwall
If we changed the underlying URL, all we'd need to do is change the mapping of our clean, easily remembered URL, and clients of the web service could continue to use our web service unchanged; they wouldn't have to alter their code to reflect the new underlying URL.

How do we set up a clean REST-style URL?

Apache comes with a module called mod_rewrite. mod_rewrite allows you to map one URL to another, e.g. a REST-style URL to the real URL. For example, you can tell mod_rewrite to map the REST-style URL (where T is the title and A is the artist)

http://www.hittastic.com/T/A
to the real URL:
http://www.hittastic.com/search.php?title=T&artist=A
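A mapping like this might be configured as follows. This is a sketch rather than a tested configuration: it assumes the rules live in a .htaccess file in the site's document root, that mod_rewrite is enabled, and that the server allows .htaccess overrides.

```apache
RewriteEngine On

# Leave requests for real files and directories (e.g. search.php) alone
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

# Map /T/A to /search.php?title=T&artist=A
RewriteRule ^([^/]+)/([^/]+)/?$ search.php?title=$1&artist=$2 [L,QSA]
```

Each parenthesised group in the pattern captures one path segment, which is then substituted as $1 and $2 in the real URL. The client only ever sees the clean form.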

Summary - advantages and disadvantages of REST

Advantages

Versus RPC:

Versus both RPC and POX:

Disadvantages

Versus RPC:

Versus POX:

Some resources

For the log book...