Session 5. Writing Web Service Clients

So far, we have looked at XML, Web Services and AJAX, and have connected to our web service from Week 1 from a JavaScript-based AJAX application. This week we are looking at how we can connect to a remote web service (provided by someone else) from our own website. A real-world example of this might be an airline booking site such as Expedia contacting web services provided by different airlines, parsing the XML returned and integrating the data in its own website.

There are two questions we need to ask here:

  1. How do we connect to another website from a server-side script?
  2. How do we parse (interpret) the XML returned by the web service?

Network Programming with cURL

Most contemporary programming languages provide a library which allows a client application to connect to a remote server over the network. Such a client application might be a Java program connecting to a remote server, or a server-side script running on one server which needs to connect to another server.

cURL is one such library, and it is a widely used way of communicating with remote servers from PHP (through PHP's cURL extension). Here is an example.

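$connection = curl_init();
curl_setopt($connection, CURLOPT_URL, "http://remoteserver/script.php");
curl_setopt($connection, CURLOPT_RETURNTRANSFER, true); // return the response as a string instead of printing it
curl_setopt($connection, CURLOPT_HEADER, false);        // do not include the HTTP headers in the response
$response = curl_exec($connection);                     // false if the request failed
curl_close($connection);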
This code makes a connection to the given remote server (here, http://remoteserver/script.php), and the response sent back is stored in $response. If the remote script sends back XML, $response will contain XML; if it sends back an HTML page, $response will contain HTML. If the request fails, curl_exec returns false instead. This is standard code which you can copy and paste every time you want to make a remote connection; just change the URL.
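In practice it is worth checking whether the request actually worked before trying to parse the response. The following is a minimal sketch, not the only way to do it, which wraps the cURL calls above in a reusable function. The function name fetchUrl, the 10-second timeout and the decision to treat any HTTP status other than 200 as a failure are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: fetch a URL with cURL and return the response body
// as a string, or null if the request failed.
function fetchUrl($url)
{
    $connection = curl_init();
    curl_setopt($connection, CURLOPT_URL, $url);
    curl_setopt($connection, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($connection, CURLOPT_HEADER, false);        // leave the HTTP headers out of the body
    curl_setopt($connection, CURLOPT_TIMEOUT, 10);          // give up after 10 seconds (arbitrary choice)

    $response = curl_exec($connection);

    if ($response === false) {
        // curl_error() describes what went wrong, e.g. an unknown host
        error_log("cURL request failed: " . curl_error($connection));
        $response = null;
    } elseif (curl_getinfo($connection, CURLINFO_HTTP_CODE) != 200) {
        // The server replied, but not with "200 OK" (treated as a failure in this sketch)
        error_log("Unexpected HTTP status: " . curl_getinfo($connection, CURLINFO_HTTP_CODE));
        $response = null;
    }

    curl_close($connection);
    return $response;
}
```

A script would then call fetchUrl("http://remoteserver/script.php") and only go on to parse the result if it is not null.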

Parsing the XML returned

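Once the XML returned from the server is stored in $response, the next thing we need to do is parse (interpret) it. There are two general approaches to parsing XML:

  1. Tree-based approaches, such as DOM (Document Object Model), which read the whole document into memory as a tree of nodes that the program can then navigate.
  2. Event-based approaches, such as SAX (Simple API for XML), which read through the document from start to finish and notify the program of events (such as finding an opening tag) as they occur.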

We looked at DOM last time when parsing XML from JavaScript, so today we will look at SAX. (PHP's DOM support was also somewhat unreliable in older versions of PHP.)

Simple API for XML (SAX) based approaches

SAX (Simple API for XML) was originally a Java API for parsing XML, but similar approaches have been adopted in other languages. The key feature of SAX-like approaches is that they are event-driven parsers.

What do we mean by that? The parser reads through the XML sequentially, from beginning to end, without building a tree of the whole document. As it reads, events occur: for example, encountering an opening tag, encountering a closing tag, or encountering the text between an opening and a closing tag. The idea is to write a piece of code to react to each type of event. So we could write one section of code to react to an opening tag, another to react to a closing tag, and yet another to react to the text between an opening and a closing tag.

One thing the code reacting to opening and closing tags must do is determine which tag has been encountered, since we are likely to want to treat different tags differently.

This is illustrated by the diagram below.

SAX parsing
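To see the events for yourself, here is a minimal sketch using PHP's xml_parser functions (the same functions used in the full example later in these notes). Each handler simply records the event it was called for, so printing the list shows the order in which the parser reports opening tags, text and closing tags. The handler names onOpen, onClose and onText are arbitrary.

```php
<?php
// Record every event the parser reports, in order.
$events = array();

// Called for each opening tag.
function onOpen($parser, $tag, $attributes)
{
    global $events;
    $events[] = "open $tag";
}

// Called for each closing tag.
function onClose($parser, $tag)
{
    global $events;
    $events[] = "close $tag";
}

// Called for each piece of text between tags.
function onText($parser, $text)
{
    global $events;
    $events[] = "text $text";
}

$parser = xml_parser_create();
xml_set_element_handler($parser, "onOpen", "onClose");
xml_set_character_data_handler($parser, "onText");
xml_parse($parser, "<student><name>Rob</name><course>Computing</course></student>", true);
xml_parser_free($parser);

// Prints: open STUDENT, open NAME, text Rob, close NAME,
//         open COURSE, text Computing, close COURSE, close STUDENT (one per line)
foreach ($events as $event) {
    echo $event, "\n";
}
```

Note that the parser reports tag names in uppercase by default.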

The data is read into arrays

What would this code actually do? Remember our aim is to read in the data from the XML. Typically, each item of data would be read into an array. So if we had some XML containing data about students, we could read the student names into one array and the student addresses into another.

The need for a series of Boolean variables

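Typically we also need some Boolean (true/false) variables to indicate whether we are inside a particular tag. This is because the code which reacts to text is given only the text itself, not the name of the tag it belongs to. This works as follows:

  1. When an opening tag such as <name> is encountered, set the corresponding Boolean variable (e.g. $inName) to true.
  2. When the matching closing tag </name> is encountered, set it back to false.
  3. When some text is encountered, check which Boolean variable is true. This tells us which tag the text is inside, and therefore which array to store it in.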

Example of using the PHP SAX parser

Here is an example which reads student data from some XML. In this example the XML is simply contained within a string; however, it could equally well be read in from the web.

First we set up our arrays to store the student names and courses. We also set up Boolean (true or false) variables $inName and $inCourse. The reason for these was discussed above.

<?php
$inName=false; 
$inCourse=false;
$names=array();
$courses=array();

Then we initialise our XML and store it in the variable $xml. Normally this XML would be read in from the web using cURL (see above), or from a file; here, for illustration purposes, it is simply stored in a string.

$xml = "<students>" .
       "<student><name>Rob Stevenson</name>" .
       "<course>Computer Network Management</course></student>" .
       "<student><name>Jamie Bailey</name>" .
       "<course>Computer Studies</course></student>" .
       "</students>";

Next we initialise the parser.

$parser=xml_parser_create();
We then tell the parser the names of the functions which handle opening and closing tags, and the text between tags:

xml_set_element_handler($parser, "foundAnOpeningTag","foundAClosingTag");
xml_set_character_data_handler($parser, "foundSomeText");
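The code above tells the parser to call foundAnOpeningTag whenever it finds an opening tag, foundAClosingTag whenever it finds a closing tag, and foundSomeText whenever it finds text between tags.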

We then start the parsing with xml_parse, which reads through the XML and calls these three functions as it goes. When parsing is finished, xml_parser_free releases the parser.

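if (!xml_parse($parser, $xml, true)) {   // true: this is the last (here, the only) chunk of XML
    echo "XML error: " . xml_error_string(xml_get_error_code($parser));
}
xml_parser_free($parser);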
Once the XML has been parsed, the arrays contain the data, so we loop through them and write the data out.

for($count=0; $count < count($names); $count++)
{
    echo $names[$count]. " ". $courses[$count]. "<br/>";
}
Now for the actual event-handling code. First, here is foundAnOpeningTag, the function which runs when an opening tag is encountered.

function foundAnOpeningTag($parser,$tag,$attributes)
{
    global $inName, $inCourse;
    if($tag=="NAME")
    {
        $inName=true;
    }
    elseif($tag=="COURSE")
    {
        $inCourse=true;
    }
}
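How does this work? The parameter $tag contains the name of the tag that was encountered. By default, PHP's XML parser converts tag names to uppercase (this is called case folding), which is why we compare $tag with "NAME" and "COURSE" rather than "name" and "course". We test which tag was encountered and set the appropriate Boolean variable to true, to indicate that we are now inside that tag. We will need this later (see below).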

Note also the global declaration of $inName and $inCourse. These two variables are created outside the function, in the "global" area of the script. Without the global declaration, PHP would treat $inName and $inCourse inside the function as new local variables, and any changes to them would be lost when the function returned.
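PHP's XML parser converts tag names to uppercase by default, which is why the code compares $tag with "NAME". If you would rather work with tag names in their original case, case folding can be switched off with xml_parser_set_option and the XML_OPTION_CASE_FOLDING option. The sketch below parses a small document with case folding on and off and returns the opening-tag names reported; collectTagNames is a hypothetical helper. It passes anonymous functions as handlers, which PHP accepts in place of function names, and this also avoids the need for global variables.

```php
<?php
// Hypothetical helper: return the opening-tag names the parser reports
// for $xml, with case folding switched on or off.
function collectTagNames($xml, $caseFolding)
{
    $tags = array();
    $parser = xml_parser_create();
    // Case folding is on by default; false switches it off.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $caseFolding);
    xml_set_element_handler(
        $parser,
        function ($parser, $tag, $attributes) use (&$tags) {
            $tags[] = $tag;   // record each opening tag
        },
        function ($parser, $tag) {
            // closing tags are ignored in this sketch
        }
    );
    xml_parse($parser, $xml, true);
    xml_parser_free($parser);
    return $tags;
}

$xml = "<student><name>Rob</name></student>";
print_r(collectTagNames($xml, true));    // STUDENT, NAME
print_r(collectTagNames($xml, false));   // student, name
```

If you switch case folding off, the comparisons in the handlers must use the same case as the XML, e.g. "name" instead of "NAME".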


Here is foundAClosingTag, the code which reacts to closing tags:

function foundAClosingTag($parser,$tag)
{
    global $inName, $inCourse;
    if($tag=="NAME")
    {
        $inName=false;
    }
    elseif($tag=="COURSE")
    {
        $inCourse=false;
    }
}
foundAClosingTag works in the opposite way. The appropriate Boolean variable is set to false, to indicate we are no longer inside that tag.


Finally, here is foundSomeText, the function which reacts to text within a tag, e.g. the text Rob Stevenson in <name>Rob Stevenson</name>.
function foundSomeText($parser,$characters)
{
    global $inName, $inCourse, $names, $courses;
    if($inName==true)
    {
        $names[] = $characters;
    }
    elseif($inCourse==true)
    {
        $courses[] = $characters;
    }
}
?>
The first thing we need to do is discover which tag we are within, so that we know what to do with the text: is it a name or a course? Unlike the last two functions, we do not have a $tag variable available to us. So we must make use of the Boolean variables $inName and $inCourse, which will be true if we are inside the corresponding tag (remember that we set these in foundAnOpeningTag).

So if $inName is true, we add the text ($characters) to the array of names (the code $names[]=$characters; does this). Similarly, if $inCourse is true, we add the text ($characters) to the array of courses.

That's our complete parsing code! Here's the entire code for reference.

<?php
// Set up our variables.
$inName=false; 
$inCourse=false;
$names=array();
$courses=array();

// Initialise the XML to parse. In a real application, this would be read
// from the web or a file.
$xml = "<students>" .
       "<student><name>Rob Stevenson</name>" .
       "<course>Computer Network Management</course></student>" .
       "<student><name>Jamie Bailey</name>" .
       "<course>Computer Studies</course></student>" .
       "</students>";

// Parse the XML.
$parser=xml_parser_create();
xml_set_element_handler($parser, "foundAnOpeningTag","foundAClosingTag");
xml_set_character_data_handler($parser, "foundSomeText");
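if (!xml_parse($parser, $xml, true)) {   // true: this is the last (here, the only) chunk of XML
    echo "XML error: " . xml_error_string(xml_get_error_code($parser));
}
xml_parser_free($parser);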

// Print out the data read in.
for($count=0; $count < count($names); $count++)
{
    echo $names[$count]. " ". $courses[$count]. "<br/>";
}

// Function to handle opening tags.
function foundAnOpeningTag($parser,$tag,$attributes)
{
    global $inName, $inCourse;
    if($tag=="NAME")
    {
        $inName=true;
    }
    elseif($tag=="COURSE")
    {
        $inCourse=true;
    }
}

// Function to handle closing tags.
function foundAClosingTag($parser,$tag)
{
    global $inName, $inCourse;
    if($tag=="NAME")
    {
        $inName=false;
    }
    elseif($tag=="COURSE")
    {
        $inCourse=false;
    }
}

// Function to handle characters within tags.
function foundSomeText($parser,$characters)
{
    global $inName, $inCourse, $names, $courses;
    if($inName)
    {
        $names[] = $characters;
    }
    elseif($inCourse)
    {
        $courses[] = $characters;
    }
}
?>
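One limitation of the code above is that the parser does not promise to deliver all the text inside an element in a single call to the character data handler: it may split the text into several pieces, for example around an entity reference such as &amp;. With the Boolean-flag approach, each piece would then be added to the array as a separate entry. A minimal sketch of one common alternative (not the method used above) is to collect the text in a buffer and only store it when the closing tag is reached. The variable $currentText, the handler names and the course "Computing &amp; Networks" are made up for this example.

```php
<?php
// Alternative sketch: accumulate each element's text in a buffer and store it
// when the closing tag arrives, so text delivered in several pieces is kept whole.
$names = array();
$courses = array();
$currentText = "";

// Opening tag: start collecting text afresh.
function startTag($parser, $tag, $attributes)
{
    global $currentText;
    $currentText = "";
}

// Closing tag: the buffer now holds all the text of this element.
function endTag($parser, $tag)
{
    global $currentText, $names, $courses;
    if ($tag == "NAME") {
        $names[] = $currentText;
    } elseif ($tag == "COURSE") {
        $courses[] = $currentText;
    }
    $currentText = "";
}

// Text: append rather than store, because this may be called several times per element.
function textFound($parser, $characters)
{
    global $currentText;
    $currentText .= $characters;
}

$xml = "<students>" .
       "<student><name>Rob Stevenson</name>" .
       "<course>Computing &amp; Networks</course></student>" .
       "<student><name>Jamie Bailey</name>" .
       "<course>Computer Studies</course></student>" .
       "</students>";

$parser = xml_parser_create();
xml_set_element_handler($parser, "startTag", "endTag");
xml_set_character_data_handler($parser, "textFound");
if (!xml_parse($parser, $xml, true)) {
    echo "XML error: " . xml_error_string(xml_get_error_code($parser));
}
xml_parser_free($parser);

for ($count = 0; $count < count($names); $count++) {
    echo $names[$count] . " " . $courses[$count] . "<br/>";
}
```

Because the text is stored on the closing tag, this version no longer needs the $inName and $inCourse flags; it does assume that name and course elements contain only text, not further tags.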

Packaging all the data together with associative arrays

In the example above we read the names and the courses into two separate arrays. However, it would make sense to package the whole lot into one array, i.e. an array of arrays. In PHP we can do this using associative arrays. An associative array is an array whose keys need not be numbers; they can be strings, for example. Here is an example of code which could be used to package the names and courses into one big associative array:

$studentdata = array();
$studentdata["names"] = $names;
$studentdata["courses"] = $courses;
Here we create an associative array $studentdata and give it two members, with the keys "names" and "courses", set to the array of names and the array of courses respectively. We could then loop through all the values like this:
for($count=0; $count < count($studentdata["names"]); $count++)
{
    echo "Name: " . $studentdata["names"][$count] . "<br />"; 
    echo "Course: " . $studentdata["courses"][$count] . "<br />"; 
}
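The structure above keeps two parallel arrays inside $studentdata, so a name and its course are matched up only by sharing the same index. Another way to package the same data, offered here as an alternative rather than as the method above, is an array containing one associative array per student. The keys "name" and "course" are chosen for illustration, and the sample data is the two students from the earlier XML.

```php
<?php
// The two parallel arrays produced by the parser in the earlier example.
$names = array("Rob Stevenson", "Jamie Bailey");
$courses = array("Computer Network Management", "Computer Studies");

// Build one associative array (record) per student.
$students = array();
for ($count = 0; $count < count($names); $count++) {
    $students[] = array(
        "name"   => $names[$count],
        "course" => $courses[$count],
    );
}

// Each record can now be handled as a unit.
foreach ($students as $student) {
    echo "Name: " . $student["name"] . "<br />";
    echo "Course: " . $student["course"] . "<br />";
}
```

Keeping each student's data together in one record means that adding another field later, such as an email address, does not require yet another parallel array.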


Exercises

The idea of this exercise is to write a remote website, IndieWorld, which specialises in indie/alternative music. IndieWorld wants to offer its users the opportunity to view all the songs released by its artists and their chart positions. However, rather than providing this information itself, it wishes to reuse the web service provided by HitTastic! (which you wrote in Week 1).

At the moment IndieWorld has pages on Oasis, Radiohead and the Kaiser Chiefs. A user of IndieWorld should be able to select one of these three artists and view all the singles released by that artist.

Part 1 - Requesting the data using cURL

Write a script which reads in the user's chosen artist from a form (see below), and sends a cURL request to your HitTastic! web service to search for all hits by that artist. Try it out and simply display the XML returned from the cURL call, with:

echo htmlentities($response);
(htmlentities converts characters such as < and > into HTML entities, so that the browser displays the XML tags instead of trying to interpret them.)

When you are done, upload the code to Aquarius (not Edward) using the username and password that I gave you by email for Aquarius. The idea of uploading to Aquarius is to emphasise the fact that the web service client and web service are on different machines. Make sure you create your own folder on Aquarius so you don't overwrite anyone else's work!

Here is the form you can use for IndieWorld.

<html>
<head>
<style type='text/css'>
body { background-color: black; color: white }
</style>
</head>
<body>
<h1>IndieWorld - your top indie music site</h1>
<form method="post" action="week5.php">
Artist:
<select name="artist">
<option>Oasis</option>
<option>Radiohead</option>
<option>Kaiser Chiefs</option>
</select>
<input type='submit' value='Go!' />
</form>
</body>
</html>
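The form above posts the chosen artist to week5.php in a field called artist. Below is a minimal sketch of how week5.php might build the request from it. The address of the HitTastic! service and the name of its query parameter are placeholders (assumptions); replace them with whatever your own Week 1 web service expects. The sketch also checks the posted value against the three artists on the form, since a form submission can contain any value, not just the listed options.

```php
<?php
// Placeholder address: replace with the URL of your own HitTastic! service.
const HITTASTIC_URL = "http://example.com/hittastic/search.php";
// The three artists offered by the IndieWorld form.
const ALLOWED_ARTISTS = array("Oasis", "Radiohead", "Kaiser Chiefs");

// Return the search URL for $artist, or null if it is not one of the listed artists.
function buildSearchUrl($artist)
{
    if (!in_array($artist, ALLOWED_ARTISTS, true)) {
        return null;
    }
    // http_build_query() URL-encodes the value, e.g. the space in "Kaiser Chiefs".
    // The parameter name "artist" is an assumption about the web service.
    return HITTASTIC_URL . "?" . http_build_query(array("artist" => $artist));
}

$artist = isset($_POST["artist"]) ? $_POST["artist"] : "";
$url = buildSearchUrl($artist);

if ($url === null) {
    echo "Please choose one of the listed artists.";
} else {
    $connection = curl_init();
    curl_setopt($connection, CURLOPT_URL, $url);
    curl_setopt($connection, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($connection, CURLOPT_HEADER, false);
    $response = curl_exec($connection);
    curl_close($connection);

    if ($response === false) {
        echo "Could not contact the HitTastic! web service.";
    } else {
        echo htmlentities($response);   // Part 1: just show the raw XML
    }
}
```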

Part 2 - Parsing the XML returned

You have a choice of ways to parse the XML returned.

Log book work

  1. Answer this question. Why is XML an ideal format for a web service to provide from the point of view of reuse of the data on other websites?
  2. Can you see a problem with the use of web services in this way? Suggest possible solutions.
  3. Look up articles which compare and contrast the different ways of parsing XML, such as SAX and DOM. Try to find advantages and disadvantages of both, and ideally, offer your own opinion on the articles.