This document discusses the WSMX Discovery component, which performs Semantic Web service discovery in WSMX [Zaremba et al, 2005]. It describes the steps taken in creating a software component capable of automatic Semantic Web service discovery.
Semantic Web services are paving the way towards the automation and dynamic composition of Web service technologies, thus reducing the effort required to integrate applications, businesses and customers. A pivotal concept that enables the location of Web services is Discovery: "The act of locating a semantic description of a Web service-related resource that may have been previously unknown and that meets certain functional criteria. It involves matching a set of criteria with a set of resource descriptions. The goal is to find an appropriate Web service-related resource" [Web Services Glossary].
WSMO Discovery [Keller et al., 2004a] defines a conceptual model for locating services that (totally or partially) fulfil a requester's goal.
Semantic Web service discovery is based on matching abstracted descriptions of formalized goals against semantic annotations of formalized Web services, in order to find services that can potentially be used to obtain the desired service.
The purpose of this document is to describe the Discovery component developed for use in WSMX. The document describes the querying of the WSMX Resource Manager, which is populated with the Web services that have been registered with WSMX. This approach was previously investigated in D10 WSMX Discovery, D5.1 WSMO Web Service Discovery and D5.2 WSMO Discovery Engine. This document, however, describes the actual implementation within WSMX.
Section 2 gives a brief introduction to related approaches to Web service discovery. Section 3 discusses the current state of Discovery, together with the terminology and concepts used in Web service discovery. Sections 4 and 5 provide details of the component architecture and implementation, respectively.
There are three main approaches to discovery covered in this document. This section discusses the concept behind each approach and presents any implementations of these approaches that are deemed relevant to this document.
Keyword-based discovery can be used to filter and quickly rank large numbers of goal and service descriptions [Keller et al., 2004a]. Although it is limited by the ambiguity of natural language, it is fast and appropriate for determining a subset of services, or goals, on which to perform semantic-based discovery.
It must be decided over which properties of the formalized goal/service description the keyword-based discovery is performed. Each service description in WSMO [Roman et al., 2004] has nonFunctionalProperties. However, should the author of the goal/service not provide any description in the nonFunctionalProperties, the goal/service discovery would not be able to use this content.
Keyword-based discovery can also be applied to the logical formulae used in goal and service descriptions, by matching the keywords specified by the user against the names of the predicates.
Keyword-based discovery could also be used to find commonality between keywords provided by the user and concepts from the ontologies that are used to formalize goals/services. This involves matching strings that correspond to concepts. Concepts will always be present in any ontological description registered with WSMX. It is this final type of keyword-based discovery that is used for this implementation, as it is the most reliable.
A problem arises however if two strings with the same 'conceptual' meaning are spelt differently. For example, 'FlightBooking' and 'FlugBuchung' may have exactly the same meaning, but as strings, they would be impossible to match. This is why 'Semantic' discovery is needed. This is further described in section 2.2.
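The limitation can be illustrated with a minimal sketch of string-based concept matching. The class and method names below are hypothetical and are not part of WSMX; the normalisation step (trimming and lower-casing) is likewise an assumption made for illustration only.

```java
import java.util.Locale;

// Hypothetical helper, not part of WSMX: compares concept names as plain strings.
final class ConceptNameMatcher {
    private ConceptNameMatcher() {}

    // Normalise a concept name so that case and surrounding whitespace are ignored.
    static String normalise(String conceptName) {
        return conceptName.trim().toLowerCase(Locale.ROOT);
    }

    // Two concept names match only if their normalised strings are identical.
    static boolean matches(String goalConcept, String serviceConcept) {
        return normalise(goalConcept).equals(normalise(serviceConcept));
    }
}
```

Under these assumptions, "FlightBooking" matches "flightbooking", but "FlightBooking" and "FlugBuchung" never match, whatever normalisation is applied, because the comparison has no access to the meaning of the names.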
In this section we consider the approaches for semantics-based discovery for simple semantic annotations of web services that are discussed in [Keller et al. 2004a].
While simple keyword-based discovery may be useful for matching simple terms, it is not accurate when trying to match terms that are similar but not exactly the same. For example, the terms "travel ticket" and "transportation voucher" would not be recognised as a match by keyword-based discovery. It is for this reason that a data mediator is used to create mappings between the ontologies in which a match is sought.
The idea behind concept-based discovery is best described with an example usage scenario. A goal is sent as a request for a service. To find a matching service, we must first reduce the number of Web services to be semantically checked by confirming which Web services offer a service in the correct subject area. For this example, we will use "FlightBooking". The goal ontology would specify the concept "FlightBooking", and the Web service description ontology would specify the concept "FlugBuchung". By using the WSMX Mediation component to mediate these two concepts, we learn that the goal and the Web service description have similar concepts.
In this section we consider the approaches for semantics-based discovery for rich semantic annotations of web services that are discussed in [Keller et al. 2004a].
Web service discovery is essentially about finding the correct service that meets the needs of a specific goal. A goal has a number of facts stated about its requirements in the form of Capabilities, Pre-conditions, Post-conditions and Non-Functional Properties [Bussler, Fensel et al, 2005]. Each Web service description has similar facts stated about the service provided. Thus it is possible to match a service description to a goal. This is the fundamental aim of Discovery within WSMX. The scope of service discovery within WSMX is confined to services that have been registered with WSMX. All other services are ignored.
Within WSMX there is a repository that holds the Web service description of every Web service registered with WSMX. Because this repository is large, owing to the number of service descriptions stored there, a time-saving method of searching through these service descriptions is required. The method employed by WSMX Discovery is to filter down the list of 'candidate' service descriptions until only those fulfilling the requirements of the goal are left.
Firstly, only those descriptions that are accepted on the basis of simple semantic description are given the opportunity to be 'rich-semantically' discovered by means of reasoning. This dramatically reduces the number of service descriptions to be reasoned over, thus reducing the time needed to discover appropriate services.
This section will discuss Web service discovery, the terminology used and the mechanisms employed to perform it. This section is intended to be a grounding for the architecture discussion in section 5.
The internal components of the Discovery component architecture are the Crawler, the Discoverer and the Filter (see Figure 1).
Figure 1: WSMX Discovery Component Architecture
The architecture envisaged for the WSMX Discovery component consists of three independent applications. The Discoverer will search the WSMX Triplespace (or Web service Repository) for the Web service descriptions required to complete a specific Goal. The Discoverer may also search the Triplespace or Web service Repository of other WSMX implementations for a required Web service description. These descriptions would then be written to the local WSMX Triplespace or Repository, or not, depending on their availability. The Discovery Filter will be used to initiate the "actual" Discovery as described by the discovery definition from the Web Services Glossary. The functionality of these components will be described later. A brief diagram of the components is shown above.
The Internal components of the Discovery component Architecture will be integrated into WSMX. This section will detail how this integration will work.
This section details the implementation of each application within the WSMX Discovery component. It describes the development of each application's functionality as the WSMX Discovery component becomes more mature.
The Discoverer is conceived as the central component of WSMX discovery. The aim is for it to provide rigorous, semantically enabled discovery. [quote] The Discoverer consists mainly of a discovery manager, which controls the process of discovery. For the purposes of this document, however, the discovery manager and the Discoverer component are one and the same. The process is as follows:
The Communication Manager submits a discovery request to the Discoverer. The Discoverer retrieves the requested Goal from the Goal Repository of WSMX. The Discoverer submits the request to the parser. The parser returns the elements of the Goal as a Java Set, allowing the elements to be retrieved according to their type (a concept can be retrieved as a concept, and so on).
The Discoverer sends the concepts of the Goal to the Filter, which returns a list of all the Web service descriptions whose concepts match those of the Goal. This matching is done through pure keyword-based discovery and is described in more detail in sections 5.2.1 and 5.2.2.
Once the Filter has returned the keyword-matched services to the Discoverer, the Discoverer invokes the Reasoner to perform simple semantic reasoning over the axioms contained within each Web service and the goal. The response time for this reasoning is critical, as it is estimated that a large number of Web services will be returned from the Filter. Should the Reasoner response times be high, the Discovery component would experience problems when scaled up. Once the Reasoner has found Web services that match the required goal, they are returned to the Discoverer. The Discoverer now has a number of "Discovered" services which it can return to the Communication Manager. A diagram of how the Discoverer interoperates is shown in Figure 2.
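The two-stage process described above, filtering first and reasoning second, can be sketched as follows. This is a minimal illustration under stated assumptions, not the WSMX implementation: the interfaces ServiceFilter and GoalReasoner, the class DiscovererSketch, and the use of plain strings as goal and service identifiers are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the Filter: keyword-based pre-selection.
interface ServiceFilter {
    // Returns identifiers of services whose concepts match the goal concepts.
    List<String> filter(Set<String> goalConcepts);
}

// Hypothetical stand-in for the Reasoner.
interface GoalReasoner {
    // Returns true if the service's axioms satisfy the goal's axioms.
    boolean satisfies(String goalId, String serviceId);
}

// Hypothetical Discoverer: only services that pass the Filter reach the Reasoner.
final class DiscovererSketch {
    private final ServiceFilter filter;
    private final GoalReasoner reasoner;

    DiscovererSketch(ServiceFilter filter, GoalReasoner reasoner) {
        this.filter = filter;
        this.reasoner = reasoner;
    }

    List<String> discover(String goalId, Set<String> goalConcepts) {
        List<String> discovered = new ArrayList<>();
        // Stage 1: cheap keyword-based filtering reduces the candidate list.
        for (String serviceId : filter.filter(goalConcepts)) {
            // Stage 2: the comparatively expensive reasoning runs per candidate.
            if (reasoner.satisfies(goalId, serviceId)) {
                discovered.add(serviceId);
            }
        }
        return discovered;
    }
}
```

Because the Reasoner is called once per candidate in this sketch, the total reasoning time grows with the size of the list returned by the Filter, which is why the response time per call matters when the component is scaled up.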
Figure 2: Discoverer Component within the WSMX Discovery Architecture
Data mediation occurs before the Reasoner is invoked. The Mediation component is invoked by the Filter; a full description of this action can be found in section 5.2.2.
Once the concept matching has been done, and the number of Web service descriptions which are deemed relevant has been reduced, then the Discoverer will begin rich semantic discovery. This is done by sending the axioms of the Goal, and of a Web service description to a reasoner within WSMX. This allows for the correct service to be discovered based on the functionality it can provide.
The Filter is essentially the keyword-based discovery component. Its aim is to search the Web service repository and return to the Discoverer component a list of the Web services whose concept names match those specified in the goal given by the Discoverer.
Firstly, the Discoverer submits a list of concepts parsed from the Goal. This list of concepts is in the form of a Java Set object. Each element of this object can then be converted to a string for simple string matching. Once a match is found, the Web service is added to a list of 'matched' services. This list is returned to the Discoverer component so that semantic discovery can be performed. It is not the task of the Filter to 'find' services that can fulfil the goal; its task is merely to sift out all Web services that provide a completely different service from the one requested in the goal. This filtering process greatly reduces the amount of work for the Reasoner, and thus the amount of time required for Web service discovery.
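A minimal sketch of this filtering step is given below. It is not the WSMX implementation: the class name KeywordFilterSketch is hypothetical, the repository is modelled as an in-memory map from a service identifier to its concept names, and a service is assumed to be 'matched' as soon as it shares at least one concept name with the goal.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the Filter; the repository model is an assumption.
final class KeywordFilterSketch {
    // Service identifier -> concept names used in its description.
    private final Map<String, Set<String>> repository;

    KeywordFilterSketch(Map<String, Set<String>> repository) {
        // Copy the map so that later changes by the caller do not affect the filter.
        this.repository = new LinkedHashMap<>(repository);
    }

    // Returns the services that share at least one concept name with the goal.
    List<String> filter(Set<String> goalConcepts) {
        List<String> matched = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : repository.entrySet()) {
            for (String concept : goalConcepts) {
                if (entry.getValue().contains(concept)) {
                    matched.add(entry.getKey());
                    break; // one shared concept is enough to keep the service
                }
            }
        }
        return matched;
    }
}
```

The sketch deliberately errs on the side of keeping candidates: a service is only discarded when none of its concept names appears in the goal, and the decision on whether it can actually fulfil the goal is left to the Reasoner.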
Concept mediation is the next logical discovery step after keyword-based discovery. It is possible that two ontologies describing a travel agency service use similar concepts without giving the concept names as exactly the same strings. For example, one may specify a concept called "Travel ticket", while the other specifies the same concept as "Voyage voucher". In this instance the two concepts are seemingly the same, but pure keyword-based discovery based on string matching would not recognise their similarity.
The WSMX Mediation component [Mocan, Cimpian. 2005] is passed the concepts of both the Goal ontology and a Web service description ontology. The Mediation component attempts to match these concepts by mediating between them to check for similarity. Once the Mediator has decided whether or not the concepts match, the results are returned to the Filter. All results returned from the Mediator are stored in a table. This table allows future concept matching to be done much faster. A diagram of how this Pre-Mediated Look-Up Table would look is shown in Figure 3.
Figure 3: Pre-Mediated Look-Up Table
How to write and read from look-up table?
The database table to be used as a look-up table will be completely independent. The look-up table consists of five columns. Each row contains both of the ontology concepts to be mediated and the Uniform Resource Identifier (URI) of each concept's ontology. The namespace of each ontology is a URI. The final column states whether the concepts were meaningfully matched.
The WSMX Mediation component [Mocan, Cimpian. 2005] is able to mediate between any two concepts, provided that it has the required mappings between both ontologies. Creating these mappings requires a user with knowledge of both ontologies and a mapping tool with which to create the associations. For the WSMX Mediation component, a view-based mapping tool was created. The WSMX Data Mediation Mapping Tool [Mocan, Cimpian. 2005a] allows associations to be created between two ontologies using a visual interface. Once these associations have been created, the WSMX Mediation component can mediate between the ontologies without user interaction.
Via comms manager or resource manager?
Do I also need to save ontology specific info in each line of the table to identify the ontologies of each match?
It is only when a pair of concepts has no entry in this look-up table that the Mediation component is invoked. As WSMX usage scales upwards, this look-up table will dramatically reduce the time needed to confirm concept matches.
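The look-up behaviour can be sketched as a cache placed in front of the mediator. The sketch below rests on several assumptions: the document describes a database table, whereas an in-memory map is used here for brevity; the interface ConceptMediator stands in for the WSMX Mediation component and is not its real API; and the column order (goal ontology URI, goal concept, service ontology URI, service concept, match result) is illustrative, since it is not specified in the text.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for the WSMX Mediation component (not its real API).
interface ConceptMediator {
    boolean mediate(String goalOntologyUri, String goalConcept,
                    String serviceOntologyUri, String serviceConcept);
}

final class PreMediatedLookupTable {

    // The four identifying columns of a row; the fifth column is the map value.
    private static final class Key {
        private final String goalOntologyUri;
        private final String goalConcept;
        private final String serviceOntologyUri;
        private final String serviceConcept;

        Key(String goalOntologyUri, String goalConcept,
            String serviceOntologyUri, String serviceConcept) {
            this.goalOntologyUri = goalOntologyUri;
            this.goalConcept = goalConcept;
            this.serviceOntologyUri = serviceOntologyUri;
            this.serviceConcept = serviceConcept;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof Key)) return false;
            Key k = (Key) other;
            return goalOntologyUri.equals(k.goalOntologyUri)
                && goalConcept.equals(k.goalConcept)
                && serviceOntologyUri.equals(k.serviceOntologyUri)
                && serviceConcept.equals(k.serviceConcept);
        }

        @Override
        public int hashCode() {
            return Objects.hash(goalOntologyUri, goalConcept,
                                serviceOntologyUri, serviceConcept);
        }
    }

    private final Map<Key, Boolean> rows = new HashMap<>();
    private final ConceptMediator mediator;
    private int mediatorCalls = 0;

    PreMediatedLookupTable(ConceptMediator mediator) {
        this.mediator = mediator;
    }

    // Consults the table first; the mediator is invoked only on a miss.
    boolean conceptsMatch(String goalOntologyUri, String goalConcept,
                          String serviceOntologyUri, String serviceConcept) {
        Key key = new Key(goalOntologyUri, goalConcept,
                          serviceOntologyUri, serviceConcept);
        Boolean cached = rows.get(key);
        if (cached != null) {
            return cached;
        }
        mediatorCalls++;
        boolean matched = mediator.mediate(goalOntologyUri, goalConcept,
                                           serviceOntologyUri, serviceConcept);
        rows.put(key, matched); // all results are stored, including non-matches
        return matched;
    }

    // Number of times the mediator was actually invoked.
    int mediatorCalls() {
        return mediatorCalls;
    }
}
```

Storing negative results as well as positive ones means that a pair of concepts already known not to match is never sent to the mediator a second time, which is where most of the saving would be expected as the number of registered services grows.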
[Keller et al., 2004a] Keller, U., Lara, R., and Polleres, A., editors (2004). WSMO Discovery. WSMO Working Draft D5.1v0.1. Available from http://www.wsmo.org/2004/d5/d5.1/v0.1/.
[Roman et al., 2004] Roman, D., Lausen, H., and Keller, U. (2004). Web service modeling ontology standard (WSMO-standard). Working Draft D2v1.0, WSMO. Available from http://www.wsmo.org/2004/d2/v1.0/.
[Web Services Glossary] Web Services Glossary. Available from http://www.w3.org/TR/ws-gloss/#defs.
[Zaremba et al, 2005] Zaremba, M., Moran, M., and Haselwanter, T. (2005). WSMX - Web Service Execution Environment. Available from http://www.wsmo.org/TR/d13/d13.4/v0.2/
[Mocan, Cimpian. 2005] Mocan, A. and Cimpian, E. (2005). WSMX Data Mediation. Available from http://www.wsmo.org/TR/d13/d13.3/v0.2/
[Mocan, Cimpian. 2005a] Mocan, A. and Cimpian, E. (2005). Mappings Creation Using a View Based Approach. Available from http://www.deri.at/events/workshops/mediate2005/
[Bussler, Fensel et al, 2005] Bussler, C., Fensel, D., et al. (2005). Web Service Execution Environment (WSMX). Available from http://www.w3.org/Submission/WSMX/
The work is funded by the European Commission under the projects DIP, Knowledge Web, Ontoweb, SEKT, and SWWS; by Science Foundation Ireland under the DERI-Lion project; and by the Austrian government under the CoOperate program.
The editors would like to thank all the members of the WSMO working group for their advice and inputs to this document.
Discovery: The act of locating a machine-processable description of a Web service-related resource that may have been previously unknown and that meets certain functional criteria. It involves matching a set of functional and other criteria with a set of resource descriptions. The goal is to find an appropriate Web service-related resource.
Web service: A Web service is a software system designed to support interoperable machine-to-machine interaction over a network. It has an interface described in a machine-processable format (specifically WSDL). Other systems interact with the Web service in a manner prescribed by its description using SOAP-messages, typically conveyed using HTTP with an XML serialization in conjunction with other Web-related standards.
Registry: Authoritative, centrally controlled store of information.
Component: A component is a software object, meant to interact with other components, encapsulating certain functionality or a set of functionalities. A component has a clearly defined interface and conforms to a prescribed behavior common to all components within an architecture.