Topic 6. Writing Web Service Clients

So far, we have looked at XML, web services and AJAX, and have connected to our own web service from a JavaScript-based AJAX application. In this topic we look at how to connect to a remote web service (provided by someone else) from our own website. A real-world example might be a travel booking site such as Expedia contacting web services provided by different airlines, parsing the XML they return and integrating the data into its own pages.

There are two questions we need to ask here:

  1. How do we connect to another website from a server-side script?
  2. How do we parse (interpret) the XML returned by the web service?

Network Programming with cURL

Most contemporary programming languages provide a library for connecting to a remote server over the network from a client application. The client might be a Java program connecting to a remote server, or a server-side script running on one server which needs to connect to another server.

cURL is one such library. PHP's cURL extension lets a script connect to remote servers, and it is a common way of communicating with another server from PHP. Here is an example:

$connection = curl_init();
// The address of the remote script to contact.
curl_setopt($connection, CURLOPT_URL, "http://remoteserver/script.php");
// Return the response as a string rather than printing it directly.
curl_setopt($connection, CURLOPT_RETURNTRANSFER, true);
// Leave the HTTP headers out of the returned string.
curl_setopt($connection, CURLOPT_HEADER, false);
$response = curl_exec($connection);

This code makes a connection to the given remote script (here, http://remoteserver/script.php), and the response it sends back is stored in $response. If the remote script sends back XML, $response will contain XML; if it sends back an HTML page, $response will contain HTML. If the request fails, curl_exec returns false instead, so check for this before using $response. This is standard code that you can reuse every time you want to make a remote connection; just change the URL.
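The basic snippet does not check whether the request actually worked. Below is a minimal sketch of a more defensive wrapper. The function names buildServiceUrl and fetchFromService, and the assumption that the service takes its input as query-string parameters, are illustrative only and not part of any particular web service. It relies on the standard PHP functions http_build_query, curl_error and curl_getinfo.

```php
<?php
// Hypothetical helper: add query-string parameters to a service URL.
// http_build_query URL-encodes the values (a space becomes "+").
function buildServiceUrl(string $baseUrl, array $parameters): string
{
    return $baseUrl . "?" . http_build_query($parameters);
}

// Hypothetical helper: fetch a URL with cURL and return the response body,
// throwing an exception if the request fails or the status is not 200.
function fetchFromService(string $url): string
{
    $connection = curl_init();
    curl_setopt($connection, CURLOPT_URL, $url);
    curl_setopt($connection, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($connection, CURLOPT_HEADER, false);
    // Give up if the remote server has not responded within 10 seconds.
    curl_setopt($connection, CURLOPT_TIMEOUT, 10);

    $response = curl_exec($connection);
    if ($response === false) {
        // curl_error describes the failure, e.g. a DNS error or a timeout.
        throw new RuntimeException("Request failed: " . curl_error($connection));
    }
    // Treating anything other than 200 as an error is a simplification.
    $status = curl_getinfo($connection, CURLINFO_HTTP_CODE);
    if ($status !== 200) {
        throw new RuntimeException("Server returned HTTP status " . $status);
    }
    return $response;
}

$url = buildServiceUrl("http://remoteserver/script.php", ["name" => "Mark Gill"]);
echo $url;  // http://remoteserver/script.php?name=Mark+Gill
```

You would then call $response = fetchFromService($url); inside a try/catch block, so that a failed request can be reported to the user instead of passing false on to the XML parser.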

Parsing the XML returned

The XML returned from the server is now stored in $response, and the next thing we need to do is parse (interpret) it. In PHP, we can use the DOM, just as in JavaScript. One disadvantage of the DOM is that it loads the whole XML document into memory as a tree, so if the server sends back a lot of data, this may not be the best approach. An alternative is SAX, the Simple API for XML. With SAX, the parser reads through the document from start to finish and reports each tag and each piece of text as it reaches it; your code processes each of these as it arrives, so the whole document never has to be held in memory as a tree. This makes SAX more complex to program than the DOM, but much more memory-efficient for large documents.
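For comparison, here is a minimal sketch of the DOM approach in PHP, using the built-in DOMDocument class. The helper name namesViaDom and the sample XML string are illustrative; getElementsByTagName and textContent behave much as they do in JavaScript.

```php
<?php
// Hypothetical helper: collect the text of every <name> element using the DOM.
// The whole document is parsed into a tree in memory first.
function namesViaDom(string $xmlText): array
{
    $document = new DOMDocument();
    if ($document->loadXML($xmlText) === false) {
        return [];  // the text was not well-formed XML
    }
    $names = [];
    foreach ($document->getElementsByTagName("name") as $element) {
        $names[] = trim($element->textContent);
    }
    return $names;
}

$response = "<students><student><name>Rob Stevenson</name></student>"
    . "<student><name>Jamie Bailey</name></student></students>";
print_r(namesViaDom($response));
```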

Here, we will be considering SimpleXML and SAX. SimpleXML is a simple XML parsing library which has been part of PHP since version 5. Like the DOM, it loads the whole document into memory, but it offers a simpler programming interface. However, it is PHP-specific, whereas the DOM is a general programming interface available in a range of programming languages, and SAX-like approaches are also used by many languages. For the exercises, I would recommend that you use SAX if you consider your programming ability intermediate or strong, and SimpleXML if you find programming difficult: SAX is more memory-efficient but trickier to program, while SimpleXML is easy to program but less memory-efficient and PHP-specific.

Example of SimpleXML

Imagine we have this XML below, stored in a file called students.xml:

<students>

<student>
<name> Mark Gill </name>
<username>9gillm69</username>
<phone>07111 111111</phone>
</student>

<student>
<name> Steve Mills </name>
<username>1mills63</username>
<phone>07222 222222</phone>
</student>

<student>
<name> Rob Price </name>
<username>5pricr67</username>
<phone>07333 333333</phone>
</student>

</students>
We could parse the XML as follows:
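<?php
// Load the whole document into memory as a SimpleXMLElement.
$xml = simplexml_load_file("students.xml");
if ($xml === false)
{
    die("Could not load students.xml");
}
// $xml->student is the collection of all <student> elements.
for($index=0; $index < count($xml->student); $index++)
{
    echo $xml->student[$index]->name . "<br />";
    echo $xml->student[$index]->username . "<br />";
    echo $xml->student[$index]->phone . "<br />";
}
?>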

Note how we read in the XML file with simplexml_load_file. We reference individual tags with the -> symbol; for example, $xml->student is a collection of all the student tags within the XML. We then use a "for" loop to go through each student in turn, using $xml->student[$index] to pick out one student and ->name, ->username and ->phone to read its child tags.
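When the XML comes from a web service rather than a file, it is already in a string such as $response, and simplexml_load_string can be used in place of simplexml_load_file. The sketch below assumes that situation; parseStudents is a hypothetical helper name, and it uses a foreach loop, which is an alternative to the for loop above.

```php
<?php
// Hypothetical helper: turn an XML string (for example, the $response
// returned by curl_exec) into an array of student records.
function parseStudents(string $xmlText): array
{
    $xml = simplexml_load_string($xmlText);
    if ($xml === false) {
        return [];  // not well-formed XML
    }
    $students = [];
    foreach ($xml->student as $student) {
        // Cast each element to a string and trim surrounding spaces.
        $students[] = [
            "name"     => trim((string) $student->name),
            "username" => trim((string) $student->username),
            "phone"    => trim((string) $student->phone),
        ];
    }
    return $students;
}

$response = "<students><student><name> Mark Gill </name>"
    . "<username>9gillm69</username><phone>07111 111111</phone></student>"
    . "</students>";
$students = parseStudents($response);
echo $students[0]["name"];  // prints "Mark Gill"
```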

SAX parsing

Please note that this is a more advanced topic, intended only for those of you who are comfortable with programming. If you are less so, please look at SimpleXML instead.

SAX (Simple API for XML, which is different from SimpleXML) was originally developed as a Java API for parsing XML, but similar approaches have been adopted in other languages, including PHP. The key feature of SAX-like approaches is that they are event-driven parsers.

What do we mean by that? The parser reads through the XML from beginning to end. As it does so, it reports events such as encountering an opening tag, encountering a closing tag, and encountering the text between an opening and a closing tag. The idea is to write a piece of code to react to each of these events. So we could write one piece of code to react to an opening tag, another to react to a closing tag, and yet another to react to the text between an opening and closing tag.

One of the things the code reacting to opening and closing tags must do is keep track of which tag we are currently inside, as we are likely to want to do different things with the data depending on that tag.

This is illustrated by the diagram below.

[Figure: SAX parsing]
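To make the idea of events concrete, the following sketch records each event PHP's parser reports for a tiny piece of XML. The function name listEvents is illustrative, and it registers anonymous functions as handlers rather than named functions. Case folding (covered later) is switched off so that tag names keep their original case.

```php
<?php
// Record every event the parser reports, in the order it reports them.
function listEvents(string $xml): array
{
    $events = [];
    $parser = xml_parser_create();
    // Keep tag names exactly as written in the XML.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $parser,
        function ($parser, $tag, $attributes) use (&$events) {
            $events[] = "open $tag";
        },
        function ($parser, $tag) use (&$events) {
            $events[] = "close $tag";
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($parser, $text) use (&$events) {
            $events[] = "text '$text'";
        }
    );
    // true tells the parser this is the whole document.
    xml_parse($parser, $xml, true);
    return $events;
}

print_r(listEvents("<student><name>Rob</name></student>"));
```

For this input the parser reports: open student, open name, text 'Rob', close name, close student.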

The data is read into arrays

How would the code to react to opening and closing tags actually work? Remember our aim is to read in the data from the XML. So, a good approach to take would be to read each item of data into an array. So if we had some XML containing data about students, we could read the student names into one array and the student addresses into another.

Example of using the PHP SAX parser

Here is an example which reads student data from some XML. In this example the XML is simply stored in a string, but it could equally well be read in from the web.

<?php
// Set up our variables.
$data = array();
$currentTag = null;

// Initialise the XML to parse. In a real application, this would be read
// from the web or a file.
$xml = "<students>" .
    "<student><name>Rob Stevenson</name>" .
    "<course>Computer Network Management</course></student>" .
    "<student><name>Jamie Bailey</name>" .
    "<course>Computer Studies</course></student>" .
    "</students>";

// Parse the XML.
$parser=xml_parser_create();
xml_set_element_handler($parser, "foundAnOpeningTag","foundAClosingTag");
xml_set_character_data_handler($parser, "foundSomeText");
xml_parse($parser,$xml);
xml_parser_free($parser);

// Print out the data read in.
for($count=0; $count < count($data["name"]); $count++)
{
    echo "Name ". $data["name"][$count]. " Course ". $data["course"][$count]. "<br/>";
}

// Function to handle opening tags.
function foundAnOpeningTag($parser,$tag,$attributes)
{
    global $currentTag;
    $currentTag = strtolower($tag);
}

// Function to handle closing tags.
function foundAClosingTag($parser,$tag)
{
    global $currentTag;
    $currentTag = null;
}

// Function to handle characters within tags.
function foundSomeText($parser,$characters)
{
    global $data, $currentTag;
    $data[$currentTag][] = $characters;
}
?>

This is one possible way of using SAX to extract data from XML. In this example we extract all the data, but we could alter the code to extract only selected data.

First we set up an array ($data) to store all the data read in from the XML. We also set up a variable $currentTag to represent the current tag we are inside, and set it initially to null to indicate that we are currently not inside any tag.

<?php
$data=array();
$currentTag=null;

Then we initialise our XML and store it in the variable $xml. This XML would normally be read in from the web using cURL (see above), or from a file; here, for illustration purposes, I have simply stored it in the variable.

$xml = "<students>" .
    "<student><name>Rob Stevenson</name>" .
    "<course>Computer Network Management</course></student>" .
    "<student><name>Jamie Bailey</name>" .
    "<course>Computer Studies</course></student>" .
    "</students>";

Next we initialise the parser.

$parser=xml_parser_create();
We then tell the parser the names of the functions which handle opening and closing tags, and the text between tags, so that when the parser encounters a tag or some text, it knows where in the code to jump to:
xml_set_element_handler($parser, "foundAnOpeningTag","foundAClosingTag");
xml_set_character_data_handler($parser, "foundSomeText");
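The code above says that foundAnOpeningTag should be called whenever the parser meets an opening tag, foundAClosingTag whenever it meets a closing tag, and foundSomeText whenever it meets text between tags.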

We then start the parsing with xml_parse. This process will call the three functions foundAnOpeningTag, foundAClosingTag and foundSomeText whenever it encounters an opening tag, a closing tag and some text between tags respectively. These functions are explained in more detail below.

xml_parse($parser,$xml);
xml_parser_free($parser);
When we get to this point, our XML will have been parsed and our variable $data will contain the data. So we can loop through the data array and write the data out; I will explain how this works below, after considering the three event-handling functions.

Now I'll explain the actual event-handling code. First, here is foundAnOpeningTag, the function which runs when an opening tag is encountered.

function foundAnOpeningTag($parser,$tag,$attributes)
{
    global $currentTag;
    $currentTag = strtolower($tag);
}
How does this work? The variable $tag contains the name of the tag just encountered, so we store it in the global variable $currentTag, converting it to lower case first. By default, PHP's SAX-style parser converts tag names to upper case, even if they are written in lower case in the XML; this "case folding" can be switched off with xml_parser_set_option and the XML_OPTION_CASE_FOLDING option. We will use the variable $currentTag in the character-handling function below.

Note also the global declaration of $currentTag. The variable is declared outside the function, in the "global" area of the script, so for the function to use it we have to declare it as global.
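The upper-case conversion of tag names is PHP's case-folding option, which is on by default. The sketch below (firstTagName is a hypothetical name) shows the tag name an opening-tag handler receives with case folding on and off. With it switched off through xml_parser_set_option, the strtolower call in foundAnOpeningTag would no longer be needed for lower-case XML.

```php
<?php
// Return the name of the first opening tag the parser reports,
// with case folding either left on (PHP's default) or switched off.
function firstTagName(string $xml, bool $caseFolding): string
{
    $firstTag = "";
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $caseFolding ? 1 : 0);
    xml_set_element_handler(
        $parser,
        function ($parser, $tag, $attributes) use (&$firstTag) {
            if ($firstTag === "") {
                $firstTag = $tag;
            }
        },
        function ($parser, $tag) {
            // Closing tags are ignored in this sketch.
        }
    );
    xml_parse($parser, $xml, true);
    return $firstTag;
}

echo firstTagName("<students></students>", true), "<br />";   // STUDENTS
echo firstTagName("<students></students>", false), "<br />";  // students
```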


Here is foundAClosingTag, the code which reacts to closing tags:

function foundAClosingTag($parser,$tag)
{
    global $currentTag;
    $currentTag = null;
}

foundAClosingTag sets the $currentTag variable to null, to indicate that we are no longer inside that tag.


Finally, here is foundSomeText, the function which reacts to text within a tag, e.g. the text Rob Stevenson in <name>Rob Stevenson</name>.
function foundSomeText($parser,$characters)
{
    global $data, $currentTag;
    $data[$currentTag][] = $characters;
}

Remember from foundAnOpeningTag(), above, that the global variable $currentTag contains the current tag we are inside. Our aim is to add the text that we have found to the appropriate array. The variable $data will contain all the data parsed from the XML. $data is an associative array of arrays. (An associative array is an array which can be indexed using non-numerical indices, such as strings). So $data["name"] will be an array of all the values contained within the <name> tags, and $data["course"] will be an array of all the values contained within the <course> tags.

So, we add the text ($characters) to the appropriate array, using the value of $currentTag. This is done with the line:
$data[$currentTag][] = $characters;
The [] after $data[$currentTag] means "add a new element on to the end of the array", so in this case we will add $characters on to the end of the appropriate sub-array of $data. That's our complete parsing code!
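One caveat: the character-data handler is not guaranteed to receive an element's text in a single call. It can be called several times for one element, for example around an entity such as &amp; or when the XML is fed to the parser in chunks, and with the approach above each piece would become a separate array entry. Whitespace between tags is also reported as text. The sketch below is a variation, not the method described above: it buffers the text and stores it only when the closing tag arrives, keeping only the tags asked for. parseInChunks is a hypothetical name, and splitting the string with str_split stands in for reading a large file piece by piece (for example with fread), which is where SAX's memory advantage really shows.

```php
<?php
// Variation on the parser above: text is collected in a buffer and stored
// only when the element's closing tag arrives, so text delivered in
// several pieces is joined back together.
function parseInChunks(array $chunks, array $wantedTags): array
{
    $data = [];
    $buffer = "";
    $parser = xml_parser_create();
    // Keep tag names in their original case instead of upper-casing them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $parser,
        function ($parser, $tag, $attributes) use (&$buffer) {
            $buffer = "";  // start collecting text for this element
        },
        function ($parser, $tag) use (&$data, &$buffer, $wantedTags) {
            // Only keep the text of the tags we asked for.
            if (in_array($tag, $wantedTags, true)) {
                $data[$tag][] = trim($buffer);
            }
            $buffer = "";
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($parser, $characters) use (&$buffer) {
            $buffer .= $characters;  // may run several times per element
        }
    );
    // Feed the chunks in order; the last call tells the parser the input is complete.
    $last = count($chunks) - 1;
    foreach ($chunks as $index => $chunk) {
        xml_parse($parser, $chunk, $index === $last);
    }
    return $data;
}

$xml = "<students>\n"
    . "  <student><name>Rob Stevenson</name>"
    . "<course>Computing &amp; IT</course></student>\n"
    . "</students>";
// Split the XML into 8-character chunks to imitate reading it piece by piece.
print_r(parseInChunks(str_split($xml, 8), ["name", "course"]));
```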

Displaying the results

Now you understand how the parsing works, we can return to the code which actually prints out the results. This is here:

for($count=0; $count < count($data["name"]); $count++)
{
    echo "Name ". $data["name"][$count]. " Course ". $data["course"][$count]. "<br/>";
}
As already discussed, $data is an array of arrays, each sub-array containing one aspect of the data: for example, $data["name"] is an array of all the names and $data["course"] is an array of all the courses extracted from the XML. So we loop from 0 up to the number of elements in one of the sub-arrays and display the corresponding element of each sub-array in turn. We have picked $data["name"], but any sub-array would do here, because every student has exactly one name and one course, so the sub-arrays all have the same length.
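Because the names and courses come from someone else's server, it is safer to escape them before writing them into a web page. A minimal sketch, assuming the same $data layout: formatStudents is a hypothetical helper that builds the same listing as the loop above, but passes each value through PHP's htmlspecialchars and stops at the shorter sub-array in case they differ in length.

```php
<?php
// Build the listing as a string, escaping each value so that any HTML
// in the remote data is displayed as text rather than interpreted.
function formatStudents(array $data): string
{
    $html = "";
    // Only loop as far as both sub-arrays have entries.
    $total = min(count($data["name"] ?? []), count($data["course"] ?? []));
    for ($count = 0; $count < $total; $count++) {
        $html .= "Name " . htmlspecialchars($data["name"][$count])
            . " Course " . htmlspecialchars($data["course"][$count]) . "<br/>";
    }
    return $html;
}

echo formatStudents([
    "name"   => ["Rob Stevenson", "Jamie Bailey"],
    "course" => ["Computer Network Management", "Computer Studies"],
]);
```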