Week 7: Approaches to Web Services
- Remote Procedure Call (RPC) approaches
- URL based approaches
- Plain Old XML: POX
- Representational State Transfer: REST
Earliest approaches to networked applications
The earliest approaches did not involve XML or HTTP at all.
Instead, client and server applications exchanged binary (non-XML)
data using custom (non-HTTP) protocols: examples include CORBA, RMI (Sun)
and DCOM (Microsoft).
The philosophy was very much to develop distributed applications in the
same manner as standard programs. The clients for these applications were
typically standalone programs rather than web applications: for example,
a Visual Basic or Java desktop application which needed to
communicate with a remote database over the network.
As a result, non-network programmers felt relatively "at home"
with these technologies.
The arrival of the true Web Service (XML over HTTP)
The arrival of XML as a data format led to the development of
distributed approaches which exchanged XML rather than binary data.
Furthermore, the XML was sent using the standard protocol of the web,
HTTP, normally delivered over port 80.
So why XML over HTTP?
- We know that XML will follow certain syntax rules, so parsers
(e.g. DOM or SAX) can be used to interpret the data
- By contrast, each older approach (CORBA, RMI, DCOM, etc.) used its own
data format, incompatible with the others, and typically exchanged
binary (not human-readable) data
- The use of HTTP over port 80 means that web service traffic is unlikely
to be blocked by corporate firewalls
- Traditional distributed approaches such as CORBA use their own
protocols and ports, so firewall blocking is more likely
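To illustrate the first point: because all well-formed XML obeys the same syntax rules, a client can read a service's response with a standard, general-purpose parser rather than one written for a particular format. The Python sketch below reads the same small document with a DOM parser (which builds the whole tree in memory) and a SAX parser (which reports each element as an event). The XML is a hypothetical response invented for illustration.

```python
import xml.dom.minidom
import xml.sax

# Hypothetical XML response, invented for illustration
SONGS_XML = """<songs>
  <song><title>Wonderwall</title><artist>Oasis</artist></song>
  <song><title>Rock DJ</title><artist>Robbie Williams</artist></song>
</songs>"""


def titles_with_dom(xml_text):
    """DOM: parse the whole document into a tree, then query the tree."""
    doc = xml.dom.minidom.parseString(xml_text)
    return [node.firstChild.data for node in doc.getElementsByTagName("title")]


class TitleHandler(xml.sax.ContentHandler):
    """SAX: the parser calls these methods as it meets each part of the document."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # collects text only while inside a <title> element

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._buffer = None


def titles_with_sax(xml_text):
    handler = TitleHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.titles


print(titles_with_dom(SONGS_XML))
print(titles_with_sax(SONGS_XML))
```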
Remote Procedure Call Methods (XML-RPC and SOAP)
The first XML/HTTP-based approach we will look at is the Remote Procedure Call
(RPC) approach. RPC-based web services are similar to
CORBA and the like in that they seek to let developers write the
application in the same manner as a non-networked application.
The basic unit is the function (also known as a
method or procedure): when you
want to call the web service, you call a function much like you
would in a regular program.
- Reminder: a function is a reusable module of code, such
as the JavaScript functions you wrote when we did AJAX
The Web Service specifies a list of functions
which clients may call over the
net
Example of the RPC approach
If HitTastic! developed an RPC-based Web Service, it might
offer a list of functions such as:
- searchForSongByTitle(title)
- searchForSongByArtist(artist)
- searchForSongByAll(title,artist,year)
- getWeekOfRelease(title,artist)
- getBiggestHit(artist)
Each client could then call any one of these functions, depending on what
they wanted to do.
Here is an example of code which calls an RPC web service.
(note that this is a 'pseudolanguage' based on PHP: the code would not
look exactly like this, but I've simplified it to illustrate the point)
$artist = $_POST["artist"];
// Call the web service's getBiggestHit() function as if it were local
$h = WebService('hittastic.com')->getBiggestHit($artist);
echo "Biggest hit of $artist is: ";
echo $h->title . " released in " . $h->year . "<br />";
// searchForSongByArtist() returns an array of song objects
$songs = WebService('hittastic.com')->searchForSongByArtist($artist);
for($i=0; $i < count($songs); $i++)
{
    echo $songs[$i]->title . " came out in " . $songs[$i]->year . "<br />";
}
Note how in the code we are calling the functions of the
Web Service
- Firstly we are looking up the biggest hit of that artist, then
looking up all hits by that artist
Note also how the code illustrates the guiding principles of
RPC-based approaches:
- Individual operations of the Web Service are defined by
functions
- Complex data structures or objects
(such as the data structure
representing a hit, returned by both method calls) can be transferred
intact across the web from application to application
- The goal is to enable the development of
seamless web applications
where the fact that you are calling a web service is invisible to
the developer
- ... thus making non-network programmers feel "at home"
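A runnable counterpart to the pseudocode above can be sketched with Python's standard-library XML-RPC support. The sketch assumes a tiny in-memory song list in place of the HitTastic! database and exposes two of the hypothetical functions listed earlier; for simplicity the server runs in a background thread of the same program, on a free local port. Note how the client calls proxy.getBiggestHit(...) exactly as if it were a local function, which is the RPC goal described above.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical in-memory data standing in for the HitTastic! database
HITS = {
    "Oasis": [
        {"title": "Wonderwall", "year": 1995},
        {"title": "Don't Look Back in Anger", "year": 1996},
    ],
}


def get_biggest_hit(artist):
    # Assumption for this sketch: the first song listed is the artist's biggest hit
    return HITS[artist][0]


def search_for_song_by_artist(artist):
    return HITS.get(artist, [])


# Server side: register ordinary functions under the names clients will call.
# Port 0 asks the operating system for any free port.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(get_biggest_hit, "getBiggestHit")
server.register_function(search_for_song_by_artist, "searchForSongByArtist")
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: each call below is sent as an XML message in an HTTP POST request
host, port = server.server_address
with xmlrpc.client.ServerProxy(f"http://{host}:{port}/") as proxy:
    artist = "Oasis"
    hit = proxy.getBiggestHit(artist)
    print(f"Biggest hit of {artist} is: {hit['title']} released in {hit['year']}")

    songs = proxy.searchForSongByArtist(artist)
    for song in songs:
        print(f"{song['title']} came out in {song['year']}")

server.shutdown()
server.server_close()
```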
How is the data transported using RPC?
In RPC, not only the data itself (e.g. the artist), but also
the type of the data (e.g. number or string) and the
function we're calling are encoded as XML and sent across the web.
This makes RPC messages both:
- Very structured, which means RPC web services can be very
robust as they can reject data of incorrect type (e.g. if you supply a
string when the web service is expecting a number)
- but also very bulky, as a lot of extra information is sent alongside
the actual data
Here is an example of some data which would be transferred across the web
using XML-RPC:
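The exact message depends on the library used. As a sketch, Python's standard xmlrpc.client module can generate an XML-RPC request; here it encodes a call to searchForSongByAll, one of the hypothetical HitTastic! functions listed earlier, with two string arguments and an integer year (the argument values are illustrative). Decoding the message, as a server would, recovers the function name and the typed values.

```python
import xmlrpc.client

# Encode a call to the hypothetical searchForSongByAll(title, artist, year) function
request_xml = xmlrpc.client.dumps(("Wonderwall", "Oasis", 1995),
                                  methodname="searchForSongByAll")
print(request_xml)

# The receiving server decodes the XML back into typed values plus the function name
params, method = xmlrpc.client.loads(request_xml)
print(method, params)
```

The printed message is a <methodCall> element holding a <methodName> and a <params> list, in which each argument is wrapped in a type element such as <string> or <int>.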
Note how the XML includes not only the data itself, but also its structure
and the type (integer, string, etc) of the data
URL-based approaches
RPC-based approaches distinguish the
different operations of a Web Service by using different function calls.
However, this ignores the fact that there is an existing Web feature
that can be used to distinguish between different operations on the same
server: the URL.
- Rather than specifying different operations through functions, we
can specify different operations by varying the URL
- Each URL typically points to a server-side script which generates XML
in response to the user's request
- The web services you've seen so far take this approach
Why URL based approaches?
- More in the spirit of the Web: you can use the Web's existing addressing
mechanism (the URL) without having to deal with an extra layer
- Less difficult to implement: to build an RPC-based application, you have to
install RPC libraries on both the client (for formatting the message to send)
and the server (for interpreting the message and formatting the response).
By contrast, with URL-based approaches, all you need to do on the server side
is provide appropriate XML in response to that URL, and on the client
side use standard libraries (e.g. cURL) to send data to that URL.
- There are no new standards to learn, so easier for beginners to get
started: the only difference between developing a standard web application and
a URL-based web service is that the script specified by the URL generates
XML data, not HTML content
- Typically we send just the data itself, without the type and function
information that RPC adds (see above), so data sent using URL-based approaches is less bulky
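The difference in bulk can be sketched by encoding the same request both ways: as a query string in the action/searchterm style used by the HitTastic! POX examples, and as an XML-RPC call built with Python's xmlrpc.client module. Only the request is compared in this sketch.

```python
import xmlrpc.client
from urllib.parse import urlencode

# The same hypothetical operation expressed two ways
pox_query = urlencode({"action": "searchForSongByArtist", "searchterm": "Beatles"})
rpc_message = xmlrpc.client.dumps(("Beatles",), methodname="searchForSongByArtist")

print(len(pox_query), "characters:", pox_query)
print(len(rpc_message), "characters of XML-RPC:")
print(rpc_message)
```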
Plain old XML ("POX") over HTTP
POX is the most basic, and simplest, URL-based approach.
Each web service operation is specified by a URL; you call the URL,
using cURL for example, and get the XML back. For example,
for HitTastic! :
hittastic.com/webservice.php?action=searchForSongByTitle&searchterm=Rock DJ
hittastic.com/webservice.php?action=searchForSongByArtist&searchterm=Beatles
hittastic.com/webservice.php?action=getWeekOfRelease&searchterm=Wonderwall,Oasis
hittastic.com/webservice.php?action=getBiggestHit&searchterm=Beatles
Notice how one script (webservice.php) is performing all the
different Web Service operations.
We tell the script which operation to perform using the action
parameter in the query string. This is a common way of implementing "POX" web services,
rather than having a separate script for each action.
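The single-script design can be sketched in Python: one function plays the part of webservice.php, reads the action parameter from the query string, and dispatches to the matching operation, returning XML. The song data and XML element names are hypothetical, and only two of the four actions are implemented. Note that urlencode() also escapes the space in "Rock DJ", since a space cannot appear directly in a URL.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical in-memory data standing in for the HitTastic! database
SONGS = [
    {"title": "Rock DJ", "artist": "Robbie Williams"},
    {"title": "Wonderwall", "artist": "Oasis"},
    {"title": "Hey Jude", "artist": "Beatles"},
]


def songs_to_xml(songs):
    """Build the XML document returned to the client."""
    root = ET.Element("songs")
    for s in songs:
        song = ET.SubElement(root, "song")
        ET.SubElement(song, "title").text = s["title"]
        ET.SubElement(song, "artist").text = s["artist"]
    return ET.tostring(root, encoding="unicode")


# Each supported action maps to an ordinary function
ACTIONS = {
    "searchForSongByTitle": lambda term: [s for s in SONGS if s["title"] == term],
    "searchForSongByArtist": lambda term: [s for s in SONGS if s["artist"] == term],
}


def webservice(url):
    """Play the role of webservice.php: read ?action=...&searchterm=... and reply with XML."""
    query = parse_qs(urlsplit(url).query)
    action = query.get("action", [""])[0]
    term = query.get("searchterm", [""])[0]
    operation = ACTIONS.get(action)
    if operation is None:
        return "<error>Unknown action</error>"
    return songs_to_xml(operation(term))


# Client side: build the URL for one operation
url = "http://hittastic.com/webservice.php?" + urlencode(
    {"action": "searchForSongByTitle", "searchterm": "Rock DJ"})
print(url)
print(webservice(url))
```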
Representational State Transfer: REST
REST is a more formal and structured extension to the basic URL-based
web service idea. The basic idea is that each item you might wish to
retrieve from the web (e.g. a song, a list of all songs by a given artist,
a flight, a biography of an actor, etc.) is a resource, identified by a single,
highly descriptive URL. For example:
http://www.hittastic.com/artist/Oasis
http://www.hittastic.com/song/Snow_Patrol/Run
http://www.hittastic.com/biography/Madonna
http://www.solentairways.com/flight/SA101
http://www.solentairways.com/flights/June_1/Southampton/New_York
The idea is that we can manipulate the resource in different ways depending
on the type of message that we send. For example, if we send a "GET" message
we can retrieve information (e.g. the details of flight SA101), while if we
send a "PUT" message we can update data (e.g. we could send a "PUT" message
to http://www.solentairways.com/flight/SA101 to change the
departure time of the flight).
REST takes the view that web services can be fully designed
using the standard architecture of the web. What do we mean by that?
The standard architecture of the web consists of :
- URLs, or resources
- A standard method to retrieve and manipulate those resources
(HTTP requests and responses)
HTTP - revision
Recall from last year that HTTP is a set of instructions which allow
clients and servers to communicate with each other
- e.g. when you enter a URL in a web browser, the browser constructs an HTTP
request for that page, and the server sends back a response
Recall also that HTTP requests and responses consist of two
sections:
- the header, which describes the content
- the content itself, such as form POST data in the
request, or the requested web page in the response
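The two sections can be seen by taking a raw HTTP response apart: a blank line (CR LF CR LF) separates the header from the content. The response below is a hypothetical example built for illustration, and the parsing is deliberately simplified (real HTTP parsers also treat header names case-insensitively and handle repeated headers).

```python
# A hypothetical raw HTTP response, as it would travel over the network
body = "<song><title>Run</title><artist>Snow Patrol</artist></song>"
raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/xml\r\n"
    f"Content-Length: {len(body)}\r\n"
    "\r\n"  # blank line: end of the header section
    + body
)


def split_response(raw):
    """Separate a raw HTTP response into its status line, headers and content."""
    head, _, content = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    headers = dict(line.split(": ", 1) for line in header_lines)
    return status_line, headers, content


status_line, headers, content = split_response(raw_response)
print(status_line)
print(headers)
print(content)
```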
HTTP methods
HTTP comes with a set of standard methods to retrieve and
manipulate URLs, which are specified in the HTTP request header.
The two you've probably met are:
- GET - retrieve data from a URL. Form data sent with GET is appended to the URL as a query string.
- POST - send data to a URL (typically form data)
However, there are other methods which are part of the specification but which
are not normally used for standard web transactions (HTML forms, for instance,
can only send GET and POST). These include:
- PUT - send data to a URL, creating or replacing the item stored there
- DELETE - remove the item at a URL (not used in standard web page retrieval)
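On the client side, the library you use must let you choose the method. As a sketch, Python's urllib.request.Request accepts a method argument; the snippet below only builds the requests (it does not contact any server) and uses the Solent Airways flight URL from earlier as an example. The request bodies are made-up illustrations.

```python
from urllib.request import Request

# Example resource URL from the notes; nothing is sent in this sketch
url = "http://www.solentairways.com/flight/SA101"

get_req = Request(url)                          # no data: defaults to GET
post_req = Request(url, data=b"passengers=2")   # data but no method: defaults to POST
put_req = Request(url, data=b"<flight><departure>10:30</departure></flight>",
                  method="PUT")                 # hypothetical XML body
delete_req = Request(url, method="DELETE")

for req in (get_req, post_req, put_req, delete_req):
    print(req.get_method(), req.full_url)
```

Passing any of these objects to urllib.request.urlopen() would actually send the request.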
HTTP status codes - revision
Recall that the first line of the HTTP response (the status line) contains a status code
which indicates whether the request was successful or not.
There are a large number of HTTP status codes including:
- 200 OK - the page was retrieved successfully
- 400 Bad Request - the request is in an invalid format
- 401 Unauthorized - we try to access a page which we are not authorised
to view (e.g. without supplying a valid username and password)
- 404 Not Found - the page could not be found
- 500 Internal Server Error - there was some sort of
internal error on the server
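Python's standard http module lists the standard codes and their reason phrases in its HTTPStatus enumeration. The sketch below splits an illustrative status line (the first line of a response) into its three parts and looks the code up.

```python
from http import HTTPStatus


def parse_status_line(line):
    """Split a status line such as 'HTTP/1.1 404 Not Found' into its three parts."""
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason


version, code, reason = parse_status_line("HTTP/1.1 404 Not Found")
status = HTTPStatus(code)
print(version, code, status.phrase)

# The codes listed above, with their standard reason phrases
for c in (200, 400, 401, 404, 500):
    print(c, HTTPStatus(c).phrase)
```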
REST and HTTP
REST takes the view that HTTP methods and status codes are under-used
and can be exploited in web services.
The idea is that one single web resource (URL) can be used for retrieving,
adding, modifying and deleting data associated with a particular item, e.g. a
particular song in the HitTastic! database. We do different
things with the song depending on which HTTP method we use to
communicate with the URL. A number of examples are shown below.
REST example 1
Imagine we have the URL:
http://hittastic.com/song/1009
to represent the song with the ID of 1009 in the HitTastic! database.
(Note - see "Clean and unchanging URLs", below, for more details
on why we use a URL like this, rather than something like
http://hittastic.com/song.php?id=1009)
We could send:
- a GET request to the URL if we wish to retrieve the data about
song 1009
- a PUT request to the URL if we wish to add or change data
relating to song 1009
- a DELETE request to the URL if we wish to remove song 1009
This example underlines the key REST philosophy: one URL per item.
The idea of REST is that we represent each item (e.g. a song,
an artist) with one URL. We then send that URL a GET, PUT or DELETE
request to retrieve, modify or delete that item.
The REST web service would also make use of HTTP status codes to indicate
whether our transaction was successful. For instance:
- The REST web service would send back 200 OK if all went well;
- The web service might send back 404 Not Found if the song with the
ID of 1009 did not exist;
- The web service might send back 400 Bad Request if we sent invalid
data to the URL, e.g. an ID of 0 (assuming the lowest ID is 1) or, if we
were updating or adding data, a release year in the future.
- The web service might send back 401 Unauthorized if we try to update
data but do not have the rights to do so (e.g. we are not the site
administrator, or we supply an invalid password)
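The behaviour described in this example can be sketched with Python's standard http.server module. Several details are assumptions made for illustration: songs are held in an in-memory dictionary of XML strings, and the "rights" check is a hypothetical X-Password request header compared against a hard-coded password (a real service would use proper authentication). The server runs in a background thread on a free local port, and http.client sends it GET, PUT and DELETE requests.

```python
import http.client
import threading
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory store standing in for the HitTastic! database
SONGS = {1009: "<song><id>1009</id><title>Wonderwall</title><artist>Oasis</artist></song>"}
ADMIN_PASSWORD = "secret"  # hypothetical; never hard-code credentials in practice


class SongHandler(BaseHTTPRequestHandler):
    def _song_id(self):
        """Return the ID from a /song/<id> path, or None if the path or ID is invalid."""
        parts = self.path.strip("/").split("/")
        if len(parts) != 2 or parts[0] != "song" or not parts[1].isdigit():
            return None
        song_id = int(parts[1])
        return song_id if song_id >= 1 else None  # assume IDs start at 1

    def _authorised(self):
        return self.headers.get("X-Password") == ADMIN_PASSWORD  # hypothetical header

    def _reply(self, status, body=""):
        data = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/xml")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self):  # retrieve the song
        song_id = self._song_id()
        if song_id is None:
            self._reply(HTTPStatus.BAD_REQUEST)
        elif song_id not in SONGS:
            self._reply(HTTPStatus.NOT_FOUND)
        else:
            self._reply(HTTPStatus.OK, SONGS[song_id])

    def do_PUT(self):  # add or change the song
        song_id = self._song_id()
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8")
        if song_id is None:
            self._reply(HTTPStatus.BAD_REQUEST)
        elif not self._authorised():
            self._reply(HTTPStatus.UNAUTHORIZED)
        else:
            SONGS[song_id] = body
            self._reply(HTTPStatus.OK)

    def do_DELETE(self):  # remove the song
        song_id = self._song_id()
        if song_id is None:
            self._reply(HTTPStatus.BAD_REQUEST)
        elif not self._authorised():
            self._reply(HTTPStatus.UNAUTHORIZED)
        elif song_id not in SONGS:
            self._reply(HTTPStatus.NOT_FOUND)
        else:
            del SONGS[song_id]
            self._reply(HTTPStatus.OK)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def send(port, method, path, body=None, password=None):
    """Send one request and return the HTTP status code of the response."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    headers = {"X-Password": password} if password else {}
    conn.request(method, path, body=body, headers=headers)
    response = conn.getresponse()
    response.read()
    conn.close()
    return response.status


server = HTTPServer(("127.0.0.1", 0), SongHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

results = {
    "get existing": send(port, "GET", "/song/1009"),
    "get missing": send(port, "GET", "/song/2000"),
    "get invalid id": send(port, "GET", "/song/0"),
    "put without password": send(port, "PUT", "/song/1009", b"<song/>"),
    "delete with password": send(port, "DELETE", "/song/1009", password="secret"),
    "get after delete": send(port, "GET", "/song/1009"),
}
print(results)
server.shutdown()
server.server_close()
```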
REST example 2
Another example: our URL could represent a particular artist in the
HitTastic! database, e.g.
http://hittastic.com/artist/Oasis
We could send:
- a GET request to the URL if we wish to retrieve all Oasis hits
- a PUT request to the URL if we wish to add a new Oasis hit
- a DELETE request to the URL if we wish to remove all Oasis hits
The web service then sends back an HTTP status code to indicate
success, or the type of error that occurred, e.g.
a 404 Not Found if the artist does not exist.
REST example 3
Our URL could represent a particular song and artist
e.g.
http://hittastic.com/track/Oasis/Wonderwall
We could send:
- a GET request to the URL if we wish to retrieve
all the details about Wonderwall by Oasis
- a PUT request to the URL if we wish to modify its details, e.g.
change the quantity in stock
- a DELETE request to the URL if we wish to remove
Wonderwall by Oasis from the database
The web service then sends back an HTTP status code to indicate
success, or the type of error that occurred, e.g.
a 404 Not Found if Wonderwall by Oasis was not in the
database.
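Examples 1 to 3 share one pattern: the URL identifies the item and the HTTP method says what to do with it. A small routing table makes this explicit. The sketch below is hypothetical: the path patterns follow the example URLs above, and each route returns a description of the action instead of touching a real database.

```python
import re

# Hypothetical routing table: (URL pattern, {HTTP method: description of the action})
ROUTES = [
    (re.compile(r"^/song/(?P<id>\d+)$"),
     {"GET": "retrieve song {id}", "PUT": "add or change song {id}",
      "DELETE": "remove song {id}"}),
    (re.compile(r"^/artist/(?P<artist>[^/]+)$"),
     {"GET": "retrieve all {artist} hits", "PUT": "add a new {artist} hit",
      "DELETE": "remove all {artist} hits"}),
    (re.compile(r"^/track/(?P<artist>[^/]+)/(?P<title>[^/]+)$"),
     {"GET": "retrieve details of {title} by {artist}",
      "PUT": "modify {title} by {artist}",
      "DELETE": "remove {title} by {artist}"}),
]


def route(method, path):
    """Describe what a request would do, or return None if no route matches."""
    for pattern, actions in ROUTES:
        match = pattern.match(path)
        if match and method in actions:
            return actions[method].format(**match.groupdict())
    return None


print(route("GET", "/artist/Oasis"))
print(route("DELETE", "/track/Oasis/Wonderwall"))
print(route("GET", "/album/Oasis"))
```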
Difference in usage of HTTP codes in REST versus normal usage
Imagine we had a URL to look up a given song:
http://hittastic.com/song/1009
With REST, the URL could return "404 Not Found"
to the client if the song with that ID was not on the HitTastic! database,
or, if an invalid ID (0 or less) was supplied, the URL could return
"400 Bad Request", another standard HTTP error code.
Note that this use of error codes differs from the normal usage:
- Normally, "404 Not Found" is used to indicate that a URL
does not exist
- With REST, the URL does exist,
but the script could not find the song the user asked for
- The error codes are being re-used by REST to indicate an error
generated by the script rather than the web server
This illustrates a key principle of REST: reuse the existing standards
of the web, rather than invent new ones
Clean and unchanging URLs
Another key principle of REST, illustrated by the examples above,
is that of clean, unchanging URLs.
URLs which expose implementation details (e.g. the fact that the page is
generated by a PHP script) are liable to change, which breaks bookmarks
and links: what if we decide to change from PHP to ASP, for example,
or even just move the script to a different location on the server?
Such URLs are also long-winded to type out and difficult to remember.
With REST, we hide the implementation details with a simple, clean and
easily-remembered URL. e.g. rather than
http://www.hittastic.com/track.php?title=Wonderwall&artist=Oasis
we could use:
http://www.hittastic.com/Oasis/Wonderwall
If we changed the underlying URL, all we'd need to do is change the mapping
of our clean, easily remembered URL, and clients of the web service could
continue to use our web service unchanged; they wouldn't have to alter their
code to reflect the new underlying URL.
How do we set up a clean REST-style URL?
Apache comes with a module called mod_rewrite.
mod_rewrite allows you to map one URL to another, e.g. a
REST-style URL to the real URL.
For example, you can tell mod_rewrite to map the REST-style URL
(where A is the artist and T is the title)
http://www.hittastic.com/A/T
to the real URL:
http://www.hittastic.com/search.php?title=T&artist=A
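mod_rewrite rules are written as regular expressions. Roughly, a rule in an .htaccess file (after RewriteEngine On) along the lines of
RewriteRule ^([^/]+)/([^/]+)$ search.php?title=$2&artist=$1 [L]
captures the two path segments and passes them to the script; exact details depend on the server configuration. The Python function below is a sketch of the same translation, assuming clean URLs of the /Oasis/Wonderwall form used earlier (artist first, then title).

```python
import re

# Assumed clean-URL form: /<artist>/<title>, e.g. /Oasis/Wonderwall
CLEAN_URL = re.compile(r"^/([^/]+)/([^/]+)$")


def rewrite(clean_path):
    """Map a clean REST-style path to the underlying script URL, or None if it doesn't match."""
    match = CLEAN_URL.match(clean_path)
    if match is None:
        return None
    artist, title = match.groups()
    # Segments arrive already URL-encoded, so they are passed through unchanged
    return f"/search.php?title={title}&artist={artist}"


print(rewrite("/Oasis/Wonderwall"))
print(rewrite("/Snow_Patrol/Run"))
```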
Summary - advantages and disadvantages of REST
Advantages
Versus RPC:
- Use of existing web standards (URLs, HTTP) rather than
"re-inventing the wheel"
- The "everything is a resource" philosophy enables bookmarking,
caching, linking
- Less development time than RPC approaches
- More lightweight messages
Versus both RPC and POX:
- Clean, easily-remembered URLs with well-defined
HTTP methods (GET, PUT, DELETE) to operate on them
and well-defined HTTP error codes to indicate success (or otherwise)
- Security:
Firewalls can be set up to block certain HTTP methods, so if you wanted
the public to be able to GET an item but not PUT or DELETE it, you could
easily set your firewall up to do this
Disadvantages
Versus RPC:
- Requires a fully "web-orientated" approach to development, so
people coming from non-web programming backgrounds will find it harder to
pick up than RPC
- Some claim it is less suitable to large scale, "enterprise"
environments than RPC
Versus POX:
- It can be a pain to write the client code to a REST web service,
as PUT and DELETE are little used outside of REST and consequently
poorly-supported by client libraries
- For this reason a "hybrid" approach is sometimes used, where
HTTP error codes are used (easy to implement), but all requests are GET or
POST requests (GET to retrieve data, POST to add/modify/delete)
- However, this makes the firewalling advantage of REST described
above less flexible (what if we wanted to allow
modification but not deletion?)
Some resources
For the log book...
- Research further into REST and RPC to get more of a feel for each
approach
- Examine, and ideally critically evaluate, articles which compare and
contrast the two approaches
- Beware of opinionated articles!
- Which do you think is best (REST, RPC or POX), and why?