Semantic Web Services are emerging as a promising technology for the effective automation of services discovery, combination, and management [Fensel et al, 2002]. Semantic Web Services aim at leveraging two major trends in Web technologies, namely Web Services and Semantic web. While Web Services technologies have positively influenced the potential of the Web infrastructure by providing programmatic access to information and services, they are hindered by the lack of rich and machine-processable abstractions to describe service properties, capabilities, and behaviour. As a result of these limitations, very little automation support can be provided to facilitate effective discovery, combination and management of services. Instead, current service discovery relies on simple keyword matching against natural language descriptions. Automation support is considered as the cornerstone to provide effective and efficient access to services in large, heterogeneous and dynamic environments.
Semantic web services are currently paving the way to provide the automation and dynamic composition of web service technologies thus reducing the effort that is taken when attempting to integrate applications, businesses and customers. A pivotal concept that enables the location of web services is that known as Discovery-"The act of locating a machine-processable description of a Web service-related resource that may have been previously unknown and that meets certain functional criteria. It involves matching a set of functional and other criteria with a set of resource descriptions. The goal is to find an appropriate Web service-related resource" [Web Services Glossary].
WSMO discovery [Keller et al, 2004] defines a conceptual model for locating services that (totally or partially) fulfill a requestors goal. Three major steps are described in this conceptual model;
Goal Discovery this about abstracting a concrete user goal to a pre-defined, reusable, and formalized goal. The user has to describe, with different levels of accuracy, his requirements and desires. This then enables a previously very individualistic goal become more generic and thus easier to match to the corresponding service.
Web service Discovery is based on matching abstracted descriptions of formailzed goals with semantic annotations of formalized web services and then selecting web services that can potentially be used to get the desired service.
Service Discovery uses the web services matched in the previous step to access the actual services that lay behind web service interfaces. A strong mediation step is required to meet the specific needs of a choreography of a Web service. A choreography defines the sequence and conditions under which multiple cooperating independent agents exchange messages in order to perform a task to achieve a goal state [Web Services Glossary].
This document is concerned only with the second step, Web service Discovery. Once potential web services have been located, the final functionality required by the Focused Crawler is to return a list of URI's. How these URI's are then used is outside the scope of this document. The fundamental requirements of the Focused Crawler is to locate potential web services and return their URI's.
The purpose of this document is to open up and examine the potential for actively searching the network for sematically annotated web services, be they in WSDL descriptions or in UDDI/ebXML registries, rather than the in-house effect of querying WSMX relational databases populated with web services that have been registered with WSMX. The latter approach is currently being investigated in D10 WSMX Discovery, D5.1 WSMO Web Service Discovery, D5.2 WSMO Discovery Engine
This document will provide some brief introductions to related approaches to web service discovery in Section 2. The mechanism upon which the Focused Crawler uses to discover web services, and the terminology and concepts surrounding this venture will be discussed in Section 4, with details of Architecture and Implementation in Section 5 and 6, respectively. Section 3 will provide some insight on how registries, such as UDDI and ebXML, could be enriched to provide WSMO ontologies for efficient web service discovery.
As described in [Keller et al. 2004a], keyword-based discovery is one possible approach to do web service discovery. Although it does not use well-defined semantics and is limited because of the ambiguity of natural language, keyword-based discovery can be used to filter and rank quickly the huge amount of available goal and service descriptions.
In order to design a good keyword-based discovery mechanism some questions must be answered. For example, over what properties of goal/service formalized description the keyword-based discovery must be performed? It is known that each component description in WSMO [Roman et al., 2004] has a nonFunctionalProperties section. A basic and intuitive approach is to match the keywords provided by the requester against the keywords from dc:subject or against extracted keywords from natural language descriptions provided in dc:description elements of nonFunctionalProperties. The same approach can be performed by considering other parts of goal/service description, for example dc:description elements of axioms that describe goal postconditions or service capabilities and postconditions. One aspect that must be taken into account is that nonFunctionalProperties descriptions are not mandatory. So in the case where the author of a goal/service does not provide any nonFunctionalProperties description, the goal/service can not be discovered using this approach.
Alternatively keyword-based discovery can be used to find relations between keywords provided by the user and concepts from the ontologies that are used to formalize goals/services. This basically implies to match keywords from the user request against the concept names from the ontologies.
Finally, another approach that can be considered is a keyword-based discovery over the logical formulae that are used in goal and service descriptions. A keyword-based on the predicates used is as well possible because usually their names are chosen having in mind the semantics of what these predicates actually do. In essence, a keyword-based search can be performed matching the keywords that the user specified on the one hand and the names of the predicates on the other hand. A pure keyword-based matching as described above can be easily improved using different techniques like: stemming, part-of-speech tagging, phrase recognition, synonym detection etc.
In this section we consider the approaches for semantics-based discovery for simple semantic annotations of web services that are discussed in [Keller et al. 2004a].
In [Keller et al., 2004a] we discussed a set based modeling approach for web services and outlined its formalization in First-order languages. There we did not impose any restrictions on the language to be used for defining web services and goal. Thus, it is in general required to use a fully-fledged theorem prover for First-order languages in order to check the proof obligations that have been given in that document. Hence, the candidate WSML variants that we consider for this approach basically are WSML Core, WSML DL as well as the monotonic part of WSML Full.
There are a bunch of candidate reasoners available, that could be used for a prototype implementation. We will have a closer look at two specific systems, namely Vampire and SNARK, and the features they support. Future versions of this document will elaborate on the selection of a concrete theorem prover for a prototype and specific aspects of how to implement the checks for the proof obligations given in [Keller et al., 2004a].
In the previous section, no restriction is posed on the allowed expressivity to describe goals and web services and, therefore, a first-order theorem prover is in principle required.
However, if we restrict the expressivity to WSML DL i.e. to a syntactical variant of OWL DL, we can e±ciently exploit subsumption reasoning. Available DL reasoners such as RACER , FACT++ or Pellet provide support for different DL language. RACER provides e±cient subsumption reasoning for SHIQ with incomplete reasoning for nominals. FACT++ supports SHIF(D).
Pellet provides sound and complete reasoning for OWL DL without nominal (SHIN(D)) and OWL DL without inverse properties (SHON(D)), and sound but incomplete for full OWL DL (SHOIN(D)). In [Li and Horrocks, 2003], RACER is used to classify web services in the T-Box, which is time-consuming but can be done off-line when publishing them. Once the T-Box is classified, experimental results show that checking the subsumption relation between the user request and the web services in the T-Box can be done within 20 milliseconds.
In addition, and as it was shown in the Semantic Web Fred (SWF) project (see [Keller et al., 2004b]), using more expressive logics (First Order Logic) for set based modelling has consequences on the computational complexity on the one hand, but on the other hand the use cases tested did not show the need for additional expressivity with respect to the expressivity supported by current DL reasoners. In addition, the testing that was done in the context of the SWF project show that using a theorem prover for set-based discovery in principle implies checking every available service individually. For a reduced test set of four available services, the theorem prover required between one and two seconds to determine the existence of a matching service. Under the assumption that eventually a big number of services will be available to the discovery engine, a faster filtering of relevant services before evaluating them one by one is essential to make the discovery scalable. These results clearly suggest that using DL reasoners to index web services and/or pre-defined goals can be a useful and efficient mechanism to restrict more detailed (and probably more expensive) checking to a (ideally small) subset of all the available web services.
The use of pre-defined, formalized goals is expected in WSMO discovery. If we restrict such goals to be described in WSML DL, we can classify them online in a DL reasoner T-Box. When publishing a web service, its subsumption relationship with the classified pre-defined goals can be checked, and a wgMediator [Roman et al., 2004] can be generated to link the web service to a pre-defined goal it (totally or partially, depending on the subsumption relation computed) fulfills. In this way, and as goal discovery should result on a (refined) pre-defined goal, and web service discovery is expected to require matching these refined pre-defined goals against published web services, web service discovery is reduced to exploring the web services linked via wgMediators to the used pre-defined goal.
In [Keller et al., 2004a] we discussed the extension of the set based modeling approach for web services to rich semantic descriptions which capture the actual relationship between inputs and outputs/effects of web service executions as well and gave the formalization in first-order languages.
In principle, the language expressivity needed for expressing the discussed proof obligations is the same as in the case of simple semantic annotations (without restrictions on the service and goal descriptions), that means we have to do reasoning over First-order formulae. Thus, it is in general required to use a fully-fledged theorem prover for First-order languages in order to check the proof obligations that have been given in that document. Basically, the very same techniques and systems that we discussed in Section 2.2.1 can be used for an implementation here. The candidate WSML variant that we consider for this approach basically is the monotonic part of WSML Full.
The use of transaction logic [Bonner and Kifer, 1998] for web service discovery has been described in [Keller et al., 2004a]. An implementation of a slightly different proof obligation for web service discovery using transaction logic has been presented in [Kifer et al., 2004].
At the moment, is not completely clear what WSML variant will be required. Based on the work done in [Kifer et al., 2004], we envision that built-in equality and meta-modeling will be required. However it is not clear whether non- stratified negation will be needed. In addition, rules reification will be most likely required for the description of pre-defined goals and web services. This is not explicitly considered in any of the WSML variants, but it could be added to WSML-Rule.
Therefore, we estimate that WSML-Flight or, more likely, WSML-Rule will be used. Additionally, transaction logic will be obviously used to check the proof obligation defined in [Keller et al., 2004a]. However, this does not pose any extra requirement on the language needed to describe goals and web services, but only on the language supported by the prover. Currently, FLORA-2 is the only prover which supports transaction logic.
This section describes the reasoning behind the approach this implementation took to discover Web services and introduces some concepts and terminology that surround Focused Crawling and Web service Discovery.
Firstly, the definitions of a few terms that are used consistently throughout this paper must be defined as per the W3C [Web Service Glossary].
Discovery The act of locating a machine-processable description of a Web service-related resource that may have been previously unknown and that meets certain functional criteria. It involves matching a set of functional and other criteria with a set of resource descriptions. The goal is to find an appropriate Web service-related resource.
Web service A Web service is a software system designed to support interoperable machine-to-machine interaction over a network. It has an interface described in a machine-processable format (specifically WSDL). Other systems interact with the Web service in a manner prescribed by its description using SOAP-messages, typically conveyed using HTTP with an XML serialization in conjunction with other Web-related standards.
Registry Authoritative, centrally controlled store of information.
Component A component is a software object, meant to interact with other components, encapsulating certain functionality or a set of functionalities. A component has a clearly defined interface and conforms to a prescribed behavior common to all components within an architecture.
The conceptual mechanism for the Focused Crawler could be compared to a human trying to find a particular service, for example, a book shop. The human in this scenario would most likely search through available repositories of services provided by businesses such as the local white and yellow page directories. These directories can be compared to online service registries such as UDDI and ebXML.
It is this simple act of a human searching through known registries for a service that this Focused Crawler wishes to replicate in the semantic Web services world. The reason for this is that “…it turns out that the neurologically plausible autonomous robots/agents achieve higher behavioural flexibility and goal attainment and are superior to the ones that count on complex but exact computational heuristics” [Dimitrova et al, 2004].
The initial step for the Focused Crawler would be to string search for Web services whether they be in WSDL interfaces, UDDI or ebXML registries then introduce mechanisms for logic based approach to discovery. The phases completed will be discussed in detail in the proceeding sections. The phases that are uncompleted will be examined in the final chapter.
This section details the internal architecture of the Focused Crawler and where this fits into the WSMX Discovery architecture that has been designed in [Keller et al. 2005].
The Internal components of the Focused Crawler consist of a Focused Manager which invokes the next tier of components; WSDL Crawler, UDDI Crawler and ebXML Crawler (see Figure 1).
Figure 1: Focused Crawler Architecture
Each component has as input the Web service to be discovered and gives as output a list of URI’s that potentially match the request. The input can be in two forms, a string or a WSML message. The list of URI’s is then stored in the [YARS] database in RDF triple format. Reasons for choosing the YARS database will be discussed in the next chapter.
The Focused Manager firstly sends the requested Web service message to the WSDL Crawler which if no Web services are returned then the UDDI Crawler is invoked. If no Web services are returned by the UDDI component then the ebXML Crawler is invoked.
The WSMX discovery component is currently designed to do undertake a logic-based approach to discovery by querying relational databases that contain registered Web services. The motivation for the Focused Crawler is due to the fact that if the Web service is not located within the WSMX Web service repository then it will obviously not be discovered. So the Focused Crawler materialized to actively seek Web services that are located outside of WSMX repositories.
The reason a Crawler materialized and not a Peer-to-Peer mechanism is due to the fact that presently none of the mentioned Web service registries (UDDI, ebXML) nor the languages (WSDL) are linked to each other. In the last chapter we will discuss some future directions to alleviate this problem and make some suggestions on how to link these technologies.
Currently, the WSMX discovery approach consists of a Communication Manager (see Figure 2) which handles incoming requests which is then analyzed by the Message Parser. The Message Parser extracts the message components such as post conditions and passes them on to the Proof Generator. Depending on the requested type of match the Proof Generator constructs logical formulas e.g. a subsumption check between two class description for a plug-in match. This step already requires descriptions of Web services. Both the incoming WSML message and Web services will be translated into the native format required by the target reasoner.
Figure 2: WSMX Discovery.
The Reasoning Manager abstracts from the different interfaces of the reasoners and offers a common API to the Proof Generator. It is also responsible for balancing the work load for the reasoners and network protocol issues.
The amalgamation of the Focused Crawler and the Discovery component within WSMX would then be as follows; if the Web service is not located within the WSRepository then the Communication Manager would then invoke the Focused Manager by sending it the original message. The YARS database would be queried for this particular Web service and if none is present the respective Crawlers would each be invoked until the Web service, if it exists, is discovered.
The implementation can be divided up into 9 development stages, 1-3 being a keyword based search for WSDL interfaces of a particular Web service and 2, 3 querying UDDI and ebXML registries respectively. Development stages 4-6 would be the decomposition of incoming messages as simple semantic logic expressions in WSML and matching these against simple semantic descriptions of Web services. Stages 7-9 would be lifting the previous stage to deal with rich semantics.
So far, development stages 1-3 of the Focused Crawler have been completed. This has been achieved by using the [Google SOAP API] which given a keyword will discover WSDL interfaces based on matching the keyword with the name of the WSDL file. The WSDL files are retrieved from the web and stored in YARS in the triple format <URI><WSDL name><content of WSDL>. The database is then available for querying based on the keyword.
The reason YARS is preferred is that it is a small lightweight package only 350kb, it is twice as fast as other retrieval systems namely, Sesame MySQL, Native and Redland, and it also supports contexts [Harth and Decker, 2005]. Also, the coding has been done in Python which works well with YARS. YARS is queried using the N3 query format
Some issues that have already arisen are the fact that WSDL’s are not semantically annotated to allow efficient keyword based discovery. For example, if the crawler is given a keyword ‘books’ we would hope to retrieve all WSDLs that sell books but in actual fact Web services like ]Amazon] don’t appear in the retrieved WSDL’s as the service is called ‘amazon’ and not bookseller or something similar.
This deliverable has discussed one possible approach for locating Web services that are located outside of WSMX. The architecture is in place to dynamically discover Web services posted in WSDL files or registries such as UDDI and ebXML. Some work has already begun for the retrieval of WSDL interfaces and the problems that arose when attempting this have been reviewed.
The next stages of development would be to access the UDDI registry via their [UDDI API] and then begin to query public online registries for Web services, firstly with keywords then progressing and eventually to rich semantic descriptions of Web services. Work has already been done with ebXML registries to enrich them with OWL ontologies [Dogac et al2004], which could also possibly be carried out for WSMO ontologies. If it possible to do in one registry then it would be possible to enrich UDDI registries with WSMO ontologies also.
[Amazon] http://www.amazon.com/exec/obidos/tg/browse/-/283155/103-4123548-1979824
[Bentallah et al. 2003] Request Rewriting-based Web Service Discovery, 2003.
[Bonner and Kifer, 1998] Bonner, A. and Kifer, M. (1998). A logic for programming database transactions. In Chomicki, J. and Saake, G., editors, Logics for Databases and Information Systems, chapter 5, pages 117{166.Kluwer Academic Publishers.
Dogac et al2004] Dogac, A., Kabak, Y., Laleci, B.G., Enriching ebXML Registries with OWL Ontologies for efficient service discovery, METU, Turkey.
[Dimitrova et al, 2004] Dimitrova, M., Barakova, E., Lourens, T., Radeva, P. The Web as an Autobiographical Agent at AIMSA 2004.
[Fensel et al, 2002] Fensel, D., Bussler, C., The Web Service Modeling Framework WSMF. Electronic Commerce Research and Applications 1 (2002).
[Google SOAP API] http://www.google.com/apis/
[Harth and Decker, 2005] Harth, A., Decker, S. Optimized index structures for Querying RDF from the Web. Available from http://sw.deri.org/2005/02/dexa/yars.pdf
[Keller et al., 2004a] Keller, U., Lara, R., and Polleres, A., editors (2004). WSMO Discovery. WSMO Working Draft D5.1v0.1. Available from http://www.wsmo.org/2004/d5/d5.1/v0.1/.
[Keller et al., 2004b] Keller, U., Stollberg, M., and Fensel, D. (2004b). Woogle meets semantic web Fred. In Proceedings of the Workshop on WSMO Implementations (WIW 2004), CEUR-WS.org/Vol-113/.
[Keller et al. 2005] Keller, U., Lara, R., and Polleres, A., editors. WSMX Discovery, (2005) Available from http://www.wsmo.org/TR/d10/v0.2/
[Kifer et al., 2004] Kifer, M., Lara, R., Polleres, A., Zhao, C., Keller, U., Lausen, H., and Fensel, D. (2004). A logical framework for web service discovery. In Workshop on Semantic Web Services at ISWC 2004.
[Li and Horrocks, 2003] Li, L. and Horrocks, I. (2003). A software framework for matchmaking based on semantic web technology. In Proceedings of the 12th International Conference on the World Wide Web, Budapest, Hungary.
[Roman et al., 2004] Roman, D., Lausen, H., and Keller, U. (2004). Web service modeling ontology standard (WSMO-standard). Working Draft D2v1.0, WSMO. Available from http://www.wsmo.org/2004/d2/v1.0/ .
[UDDI API] Accessing UDDI API and Microsoft API extensions interfaces. https://uddi.microsoft.com/publish.
[Web Services Gloassry]http://www.w3.org/TR/ws-gloss/#defs .
[YARS] Yet Another RDF Store, http://sw.deri.org/2004/06/yars/yars.html
[Zaremba et al, 2005] Zaremba, M., Moran, M., Haselwanter, T. WSMX - Web Service Execution Environment, http://www.wsmo.org/TR/d13/d13.4/v0.2/
The work is funded by the European Commission under the projects DIP, Knowledge Web, Ontoweb, SEKT, and SWWS; by Science Foundation Ireland under the DERI-Lion project; and by the Austrian government under the CoOperate program.
The editors would like to thank all the members of the WSMO working group for their advice and inputs to this document.
The following major updates have been done since the Previous version of this deliverable:
Discovery The act of locating a machine-processable description of a Web service-related resource that may have been previously unknown and that meets certain functional criteria. It involves matching a set of functional and other criteria with a set of resource descriptions. The goal is to find an appropriate Web service-related resource.
Web service A Web service is a software system designed to support interoperable machine-to-machine interaction over a network. It has an interface described in a machine-processable format (specifically WSDL). Other systems interact with the Web service in a manner prescribed by its description using SOAP-messages, typically conveyed using HTTP with an XML serialization in conjunction with other Web-related standards.
Registry Authoritative, centrally controlled store of information.
Component A component is a software object, meant to interact with other components, encapsulating certain functionality or a set of functionalities. A component has a clearly defined interface and conforms to a prescribed behavior common to all components within an architecture.