CMSC 621 Term Project: BranDon

Authors: Don Miner and Brandon Wilson
Class: CMSC621 with Dr. Joshi
This project was the term project for the course CMSC621: Advance Operating Systems at UMBC taught by Dr. Anupam Joshi during Fall 2006. The task was to design and create a distributed web services system. Project Description.
Term Paper
Abstract:
Web services are web-based applications that use an XML-formatted communication protocol, such as SOAP, to communicate and service requests made by clients. The service providers face several weaknesses, including a limitation on the number of clients they can serve at once, and making their service available to a large set of clients. The goal of our system is to provide a distributed foundation of web service providers so as to provide fault tolerance, load balancing of client requests through request distribution and dynamic replication, efficient provider and service discovery, and high service availability.
We utilize various techniques to achieve all of the mentioned attributes of distributed web service architecture. An eventually consistent cache is utilized at every server to maintain a cache of other providers in the network and perform node and service discovery through random synchronization periods. Also, in any distributed system, messages are passed among nodes for communication purposes, which can become a very bandwidth expensive task. We use a multicast method that minimizes redundant messages while messaging a list of recipients. Also, to ensure confidential communication, all servers have an RSA key pair that is used to secure the communication channel during incoming connections using SSL.

Final Term Paper
Components/Tools Used:

Load Balancing Technique:
When a get_wsdl request is received, the node servicing this request has the choice to give the wsdl location of any node in the system that has the service. These nodes that have the web service include nodes that already have the service or nodes which replicate the service in order to have it.
    The steps gone through when a node receives a get_wsdl:
  1. A get_wsdl request is received for service X.jws and a handler process is forked from the listening process.
  2. The handler process sends a "WILL YOU EXECUTE X.jws?" request to all nodes with our multicast method in which it believes have the service according to the local cache. This process will stop and return the location of the node when one of the sites is at a low enough load and returns "YES". In the event of a "YES", the location of this node is returned to the one who initiated the query.
  3. If no site that had the web service returned "YES", the handler processes sends a "WILL YOU REPLICATE X.jws?" request to all nodes (with our multicast) in which it believes does not have the service according to the local cache. This process will stop and return the location of the node when one of the sites is low enough load, successfully replicates the web service from another node and then returns "YES". In the event of a "YES, the location of this node is returned to the one who initiated the query.
  4. Finally, if no site that did not have the web service returned "YES", the handler tells one of the nodes that does have it that it will have to execute the web service request even if it is overloaded.

Cache Synchronization:
A major part of our system is the maintaining of the eventually consistent local caches which are on every single node. These local caches store crucial information about where web services are. At each node, this node's cache has an entry for every other node it knows about. Inside of each of these entries is a list of web services this node has. Also, the WS-Policy of each node is stored in this cache as well.

These caches must be synchronized with each other frequently in order to keep up-to-date information in the system. In order to do this, pair-wise synchronization sessions are done between nodes where the caches are traded and both are brought up to date with each other. For example, if node A received an update from node P at 3:30pm and node B received an update from node P at 3:29pm, both node A and B will use the information that A retrieved at 3:30pm. Also, the most current information from a node is given to the other node during these sessions. For example, when nodes A and B enter a session, they will both have the most current information about the other.

Caches can also be updated outside of these synchronization sessions. Most of these instances are very similar to a cache miss. For example, when a site is queried to execute a web service but that site just recently removed it, the node will inform the querying node of the change so that it may make the update in its cache. Another example is when a foreign node replicates a web service so it may execute it (in response to a WILL YOU REPLICATE request), this change is noted in the originating node's cache.

This method turned out to be very successful and the system state was only ever a few seconds out of date in a system of 15 nodes. All the different ways for caches to be updated worked together to provide each node with a very good view of the system state.
Multicasting:
Our system has a home-made multicast method which works to distribute the load of sending a single message to many nodes as well as to provide higher availability in a network where several nodes are disconnected from one another. This multicast is used for all broadcast communication in the system: job forwarding, web service replication requests and actual web service replication.

Our algorithm to plan the multicast is very similar to an approximation of the Steiner Tree Problem. We try to find the fewest number of "hops" to get a message to all the nodes that need it. Once this path is found, the multicast plan is executed, working in a very recursive fashion, sending the message through the network and responses being percolated back up to the originator of the multicast.
WS-Policy support:
This system has limited WS-Policy support. In order for a client to communicate with a node as well as retrieve the WSDL from another node, it must have a WS-Policy intersection with these nodes. Also, the nodes must have a WS-Policy intersection with another node in order to communicate with it. In effect, in order to perform any communication from one place to another, these two must have a WS-Policy agreement.

This has an interesting side-effect of disconnecting two nodes from talking to each other, which creates a disconnected network in some ways. Even with this present in the network, a node servicing a get_wsdl request should return the location of a node if it can service the request, even if the two nodes can not communicate. This is done with our multicast method, which will use intermediaries acting as third parties to transfer communications between two nodes.

When forwarding a request or requesting for another node to replicate, requests are only sent to nodes which have a matching WS-Policy with the client. This is efficient because the WS-Policy of each node is stored in the local caches and the intersection can be calculated locally opposed to querying over the network.
Network topology graphs:

Load distribution graphs:

Files and examples:
Term Paper
Node configuration file
WS-Policy file example
Complete code