Final
by
G. Wade Johnson
Advisor
Dr. Stephen Huang
I would like to express my sincere gratitude to my advisor, Dr. Stephen Huang, for his guidance and patience during this research. I would also like to thank my co-workers at Telescan, Inc., in particular Dr. Richard Carlin, Rick Hoselton, Tuyen Tran, John Gallagher, Julie Liu, Kathy Hoang, and Julie Carroll, all of whom provided some insight into this problem and its solution. Most importantly, I'd like to thank my wife, Debbie Campbell. Without her urging, I would not have started this degree, and without her support and encouragement, I would not have completed it.
As web sites become larger and more dynamic, the difficulty of developing and maintaining them becomes more apparent. This thesis explores using higher-level abstractions to design web sites. In particular, a new Web Site Description Language (WSDL) is defined for describing the structure of a web site at a high level of abstraction. A Page Layout Language (PLL) is used to describe the general presentation of individual pages. Both of these languages are defined as Extensible Markup Language (XML) applications, allowing them to benefit from tools and libraries designed to support XML. These two high-level descriptions are combined with the content of the pages in a compilation phase that creates the entire web site. This extra phase allows for good separation of web site structure, page presentation, and content, with little cost at request time. In a commercial environment, earlier versions of the system have shown improvements in the areas of web site design, implementation, and maintenance. Preliminary testing shows that WSDL may generate even further improvements.
When the World Wide Web first began, most web sites were small. They could be built and maintained by a small number of people. But the Web has been growing at an ever increasing rate. The number of large web sites is also growing. Many of these web sites are constantly being revised. As web sites grow larger and more time and resources are applied to them, it becomes obvious that the old, ad hoc method of design is not working.
Some of the problem may be caused by an incorrect focus. Much of the development on the World Wide Web treats web sites simply as collections of web pages. In this context, a web page is a single HyperText Markup Language (HTML) document with its included images and style sheet information. But a web site is more than a collection of pages. The interconnections and navigation among the web pages and consistency of presentation and design of the pages determine the user experience and usability of the web site.
The Web Site Description Language (WSDL) uses XML to address page presentation and site structure. This web site description can be used on top of systems that use external content sources, such as databases, to build dynamic pages. The result is a system for generating flexible, maintainable, realistic web sites. Most importantly, the output from this system can be displayed by current browser technology, because it is pure HTML.
Many other systems have been designed to generate a dynamic, robust web experience. These solutions range from fully dynamic pages using CGI scripts, through various forms of include technology and server-side scripting. Almost every one of these systems focuses on page generation. The only site-level support most of these systems provide is the ability to include common code or HTML in all pages.
WSDL, on the other hand, focuses on the web site as a whole. The output from the WSDL processor may even be used as input into one of the other systems. WSDL is not a complete replacement for any of these other technologies. It is an enhancement to any form of web site development. In addition, WSDL is designed to scale smoothly to larger, more complicated web sites. Finally, WSDL is designed to allow easy replication of site structure and page presentation. This feature is very useful for developers who must maintain or develop multiple semi-independent web sites.
Common Gateway Interface (CGI) scripts are a solution to a different problem. The CGI specification was developed at a time when most of the content on most web sites was static. The CGI script would then provide a limited amount of dynamic content on a fraction of the site.
The advantage of CGI was the relative ease of adding dynamic content, compared to altering the web server itself. The two main disadvantages of CGI are speed and the need for a separate program. Many current web developers are not programmers and, therefore, are wary of approaches that require programming knowledge.
CGI scripts are still useful and will probably continue to be useful for years to come. However, they are not suitable for the construction of a large production site.
Server Side Includes (SSIs) were created as a way to include boilerplate text in web pages. Early in the evolution of the web, the need for common sections of HTML on many pages in a site was recognized. This was an early attempt to begin creating web sites instead of web pages.
However, the focus of SSI is still web pages. Later, as other functionality was added, SSI evolved into the various forms of Server Side Scripting.
Most of the major approaches to dynamic content in use today are based on Server Side Scripting. Examples of this approach include ASP, JSP, PHP, XSP, Cold Fusion, and many others. The basic idea is simple: place a scripting language inside special markers in the HTML content. The web server recognizes these markers and interprets the content of these markers to generate the dynamic portion of the page.
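The marker idea can be illustrated with a minimal sketch. The `<% name %>` marker syntax, the `render` function, and the sample values are assumptions made up for this illustration; ASP, JSP, PHP, and the other systems each define their own markers and embed a full scripting language rather than simple name lookup.

```python
import re

# Hypothetical marker syntax for this sketch: <% name %>
MARKER = re.compile(r"<%\s*(\w+)\s*%>")


def render(page, values):
    """Replace each marker with the value computed for this request."""
    return MARKER.sub(lambda match: str(values[match.group(1)]), page)


page = "<html><body><p>Hello, <% user %>. You have <% count %> messages.</p></body></html>"

# The server would run this step on every request, which is the
# interpretation cost paid at request time.
print(render(page, {"user": "guest", "count": 3}))
```

Everything outside the markers passes through unchanged, so the static portion of the page stays ordinary HTML.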
Most scripting languages are chosen to be simplified forms of a general purpose programming language. The hope is that a mere web developer can understand these, without the need for advanced programming skill.
In practice, the developer does need to be able to program in order to do anything significant. The interpretation phase slows the response of the server as the number of page requests in a given time goes up. And last, but not least, this approach is still page-centric.
The Cocoon system is being developed as part of the Apache project. It has a few things in common with the WSDL system[12]. There is a strong focus on separation of XML generation, processing, and presentation. As described in Example: Cocoon, Cocoon relies mostly on convention to keep consistency across generated pages. Like most web site production systems today, Cocoon is definitely page-centric.
One interesting possibility for further research is the use of WSDL and the skeleton presentation system to generate the templates that some of the Cocoon layers use as input. This would help to enforce consistency where possible, yet still gain the impressive run-time benefits of the Cocoon system.
From the very beginnings of the Web, there was always an alternative when functionality beyond the capability of normal web servers was needed. A web server is not a terribly difficult program to create. Some people used this fact to create special-purpose web servers to support the functionality they needed.
This approach gives all of the flexibility that anyone could require. However, the cost is pretty high. The developer must maintain the server himself. He does not get the benefit of upgrades provided by the server vendor. He must build any needed support, unless a standard server can be used as a starting point.
In spite of this disadvantage, some companies have built their own web servers connected to proprietary databases and processing solutions on the back end. If designed well, these systems can outperform a standard server on some kinds of requests. They can also perform functions that a normal web server cannot match.
These kinds of systems do not compare with WSDL at all, because they are solving a very different problem. However, if a proprietary system supports some form of template system that reads HTML from disk to format its output, WSDL can simplify the customization of the output of this proprietary web server.
Quite a bit of background material is necessary to present WSDL. Small web sites of a dozen or so pages might not benefit as much from a tool like WSDL. An Overview of Web Development describes some of the problems inherent in large, production web sites. These problems are the reason WSDL was created. WSDL is built on a relatively new technology, the Extensible Markup Language (XML). A large amount of hype and misinformation surrounds XML at this time. An Overview of XML attempts to explain the issues surrounding this language. XML in Current Use describes some of the ways that XML is being used at present.
A High-Level Approach to Web Site Design describes the design goals and assumptions that are part of the WSDL system. Any complicated system goes through multiple iterations of design and testing. WSDL is no exception. The Evolution of WSDL describes the various early implementations of the system that would eventually become WSDL.
The appendices fully document the WSDL and PLL languages, along with complete Document Type Definitions (DTDs) for each. The Source for the Example Site gives the source for an example site which shows WSDL and PLL in use on a small, but functional, web site.
In order to see the need for WSDL, it is necessary to examine the problem more closely. Some of the difficulties with developing large web sites can be related to a lack of software development experience, misunderstanding of scale, and misestimation of maintenance cost.
Most people designing and constructing web sites today have little or no experience in software engineering or information architecture[7]. Quite a few have no background in presentation or graphic design either. In many cases, a web developer's sole qualification for the job is access to Microsoft FrontPage or some other WYSIWYG tool.
This lack of experience often shows in the developer's focus. A beginning web developer tends to focus on individual pages. Depending on the developer's background, most of the emphasis is placed either on content or presentation. At this stage, navigation or structure of the site is almost always an afterthought. Consistency, if it happens at all, is usually an accident.
As most web developers gain experience, they become more adept at the various display tricks used to deliver more interesting-looking pages. Fortunately, a few begin to realize that site structure is also important. However, even at this stage, most web developers consider the site structure and page layout to be entwined as the concept of presentation. As a side effect, the developer usually spends too little time on the overall site structure and too much time on the details of the site's look[4].
Most people, including most web developers, tend to think of a web site as a collection of pages[7]. This viewpoint is a natural one, considering that all they see and build are the pages. This approach to web site design is fine when designing sites of fewer than a dozen pages. However, this approach does not scale to web sites containing hundreds or even thousands of pages. As a web site grows in size, the designer must focus more on the structure and consistency of the overall web site. Users are not as impressed with an individual page if they cannot find what they need on the web site.
When designing a web site, the developer must focus on the intended audience, the information to be presented, and the relationships between various pieces of information. These issues give rise to the overall look of the site, as well as its navigation model[4].
In Web Navigation: Designing the User Experience, Jennifer Fleming describes several different kinds of web sites, including shopping sites, community sites, entertainment sites, and information sites[4]. Each of these general kinds of site has a different goal and approach that should be supported by a different design. This level of design is difficult to maintain across all of the pages of a web site that is changing over time. Without some way of capturing this high-level design, developers usually do not have a chance of maintaining consistency in the face of change.
In many production web sites, the change requests begin the moment the web site launches. Some may be issues that were deferred until Phase Two. Others may involve perceived "cool" features seen on other sites. As the developer makes these changes, the design of the web site, which was so firm only a few days before, can begin to blur. Maybe good business reasons or politics obscure the clarity of the original vision. In any case, changes can soon override the original design.
If no real solid reference for the design exists, it is very hard to adapt the design to change. It is even harder to adapt the design to requested changes if no one remembers the design itself. In effect, if no one can point to the design, one can argue that it does not exist. Unfortunately in many instances, if there is a design document, it is not part of the code. Over time, the code diverges from the document, back to the state where no design exists.
When a web site is maintained without a concrete design, any changes made during maintenance are likely to cause inconsistencies in the site. Part of this effect is the entropy affecting any complex system. But the more important portion of the problem is the difficulty of recognizing the inconsistencies as changes are made. With the details that describe the overall design spread out in a thousand files, it is extremely hard to see the design of the web site breaking down, but the symptoms are there. Bug fixes or corrections do not show up everywhere. Different portions of the site are not consistent. Lastly, changes that should take ten minutes consume hours of development time.
A high-level Web Site Description Language (WSDL) could help to alleviate these issues. If the overall design of the site is described in one place, a single senior developer or a small group of senior developers can provide the insight and experience to design the web site. This approach allows the experience of the senior developers to be spread across more projects and still use more junior developers for most of the development work.
By describing the web site at a higher level, WSDL encourages the developers to focus on the web site as well as the individual pages. By reviewing the WSDL description of the web site, the developers can get an overview of the entire system. Issues such as structure and site consistency are easier to judge at this higher level of abstraction.
WSDL helps to document the overall design of the site in the best of all possible locations, the actual source code for the site. The advantage of this approach is the fact that the source and documentation cannot diverge, because there is only one copy of the design. Also, by keeping the design in this fashion, it is possible to go back to read the source and determine what the designer of the site intended. The ability to read the overall design in the source helps the maintainer of the web site to retain consistency when possible, and to adapt the design in other cases.
A language designed for this purpose should be declarative: it should define what is needed, not how to implement it. It should be easy to write, parse, and verify. The next chapter describes the Extensible Markup Language (XML), which offers an appropriate syntax for defining such a language.
The Extensible Markup Language (XML) is not a language for marking up text. It is not a replacement for the HyperText Markup Language (HTML). XML is a standard syntax for in-line markup for use in text documents. XML also includes facilities for defining a set of markup elements that are used together as an application. To really explain XML, however, requires a little history.
In order to discuss the origins of XML, it is necessary to review two other markup standards, the Standard Generalized Markup Language (SGML) and HTML. Both of these standards influenced the design of XML in many ways.
SGML was created in an effort to standardize markup systems[25]. In order to handle all of the features of the markup systems of the time, the SGML design was comprehensive, with many optional features and shortcuts. These features made for a very powerful system. These same features and shortcuts make SGML tools difficult to program.
SGML is not actually a markup language; it is a meta-markup language[3]. SGML provides a language to support the definition of new markup languages that are called applications. All SGML applications have a similar structure. Unlike many proprietary systems, all SGML markup uses normal printable characters. Markup is defined in terms of elements and text. Elements are made up of tags and attributes. All elements, tags, and attributes must follow a well-defined format[32]. This simplifies validation and translation of the documents.
Separate from the issue of format is the validity of the tags used in a particular document. SGML describes the use of a Document Type Definition (DTD) to specify which SGML application pertains to the document. A processing system could then use information from the DTD to validate a particular document.
The HyperText Markup Language (HTML) has been an important factor in the success of the World Wide Web. HTML was based on SGML but was not as formalized in the beginning. Contrary to popular usage, HTML was designed to describe the structure of a document, not its presentation. To quote the HTML Home Page:
For most people the look of a document - the color, the font, the margins - are as important as the textual content of the document itself. But make no mistake! HTML is not designed to be used to control these aspects of document layout.[31]
However, the first browsers defined a default presentation for many of the structural elements. As a result, people focused on the presentation aspects of HTML and the structural meaning of most tags was forgotten. When the primitive presentation support in HTML was found to be inadequate, vendors, like Netscape and Microsoft, extended HTML to support more presentation control.
In addition, the original browsers were defined to be extremely forgiving in their dealings with HTML[39]. This tended to support a large number of people being able to create HTML with little or no training. Many people began to rely on the side effects of the browser's interpretation of invalid HTML. This scenario has led to very complex software that tries to recover from almost any mistake the HTML author may make.
Another disadvantage of this forgiving approach to HTML interpretation is the difficulty of creating and maintaining an HTML parser and display system. When the World Wide Web first became popular, many online companies had their own browsers. Unfortunately, these browsers were all inconsistent in their handling of HTML. As HTML evolved, each company had to make changes in their proprietary code to deal with the new functionality. One by one, these vendors dropped out of the market, leaving only a handful. Today, that handful is mostly competing in how nonstandard they can be. If HTML had been more standard, it would have been possible to build standard parsing libraries that could be used by multiple browsers.
The design of XML tries to take the best features of both SGML and HTML while leaving behind their worst disadvantages. Like SGML, XML is a language for defining new markup languages. But XML uses a much simpler feature set than SGML, making it easier to parse. Unlike HTML, all XML-based documents must be well-formed. This means that XML parsers and viewers must report any errors detected when reading a document. They are not allowed to guess what the XML author meant and then continue. This makes parsers and other tools much easier to build[21]. Moreover, the presentation aspects of an XML document are defined in a separate file. This reduces the temptation to modify an XML document for the sake of presentation.
One of the major roadblocks to SGML spreading across the Web is the difficulty of implementation of tools that fully support SGML[21]. Unlike SGML, XML was designed with simplicity and implementation in mind. Many of the optional features of SGML have been dropped. This has already resulted in a much simpler job for developers who wish to build XML tools.
The structure of an XML document is similar to that of an HTML or SGML document. The following is a fragment of an XML document:
<para type="example" align="none">This is a <em>short</em> example paragraph, containing three elements.<xref ref="examples"/></para>
An element is defined by start and end tags and the content they surround. The whole example above is one para element. The content of an element can be text, markup, or both. In the example, the content of the para element is two pieces of text, an em element, and an xref element. The content of the em element is the text short. The xref element has no content.
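The structure described above can be inspected with a standard parser. The following is a minimal sketch using the ElementTree module from the Python standard library; the choice of Python and of this particular parser is an assumption of the sketch, not something the WSDL system depends on.

```python
import xml.etree.ElementTree as ET

fragment = (
    '<para type="example" align="none">This is a <em>short</em> example '
    'paragraph, containing three elements.<xref ref="examples"/></para>'
)

para = ET.fromstring(fragment)

print(para.tag)     # name of the outermost element
print(para.attrib)  # attributes given in the start tag
print(repr(para.text))  # text that appears before the first child element

for child in para:
    # child.text is the child's own content; child.tail is the text that
    # follows the child's end tag inside the parent element.
    print(child.tag, child.attrib, repr(child.text), repr(child.tail))
```

ElementTree stores the second piece of text as the `tail` of the em element, and reports the content of the empty xref element as `None`.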
The start tag is distinguished from the rest of the text by starting with the character < and ending with the character >. The start tag begins with a name and may have optional attributes before the closing >. In the example, the para element has two attributes, type and align.
The end tag for an element starts with the two character combination </ and ends with the > character. Nothing is allowed in the end tag except the name used in the start tag.
An element with no content can be denoted by an empty-element tag, like the xref element in the example. An empty-element tag is just like a start tag except that it ends in the two characters /> and is not followed by any content or end tag[21].
Attributes can only appear in start tags or empty-element tags. All attributes take the form of a name followed by an equal sign (=) followed by a value in either double or single quotes. An attribute may only appear once in a given tag. Unlike HTML, the quotes are required in XML attributes[16].
Elements are also required to nest properly. If an element begins inside another element, it must also end inside the same element. This requirement for well formed markup is probably one of the most commonly violated rules in HTML.
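These rules can be checked mechanically. The sketch below, again assuming Python's standard ElementTree parser, feeds it several fragments written in the loose style that HTML browsers tolerate. The helper name `is_well_formed` and the sample fragments are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


samples = {
    "properly nested": "<p>some <b>bold <i>and italic</i></b> text</p>",
    "overlapping elements": "<p>some <b>bold <i>and italic</b></i> text</p>",
    "unquoted attribute": "<td nowrap=nowrap>cell</td>",
    "repeated attribute": '<td align="left" align="right">cell</td>',
    "missing end tag": "<p>first<br>second</p>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first sample is accepted. The parser rejects the others outright instead of guessing what the author meant.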
One suggested approach to the use of XML involves serving XML in exactly the way web sites currently serve HTML. However, unlike HTML, an XML document probably contains elements that the browser does not know how to display. In fact, many XML documents will consist of markup that the browser does not know how to interpret. In order to display XML as a web document, some form of style sheet is needed to explain formatting information to the browser[17].
There are several standards for style sheet languages available on the Web. The two most commonly associated with XML are Cascading Style Sheets (CSS) and Extensible Stylesheet Language (XSL) style sheets[24]. CSS style sheets are currently in widespread use on the Web for HTML. Although not consistently supported by the major browsers, CSS is the most standard way of separating formatting information from HTML content.
XSL, on the other hand, was developed exclusively for use with XML. XSL is a very ambitious system including support for extensive rewriting of the XML input using the XSL Transformations (XSLT)[19] subset, as well as a comprehensive formatting model. The biggest problem with XSL at the moment is the lack of browser support.
XML can be used to define markup languages for specific kinds of documents. A good example is Extensible HyperText Markup Language (XHTML). XHTML is a new version of HTML rewritten to conform to the rules of XML. In general, XHTML documents can be displayed by HTML browsers except for a few minor issues. These include the proper termination of empty elements (<br/> instead of <br>) and non-minimized attributes (nowrap="nowrap" instead of nowrap).
Any group can define an XML application that describes markup for the particular kinds of documents they use. This markup describes the structure of the documents in a way that makes sense for them. For example, instead of forcing their widgets catalog to fit HTML, which is a format defined for internal documentation at CERN,[15] they can use a reasonable catalog format specified in XML.
Another good example of XML used for document markup is DocBook[11]. DocBook is an SGML application for writing many kinds of documents. After the XML recommendation was released, the maintainers of DocBook began a conversion process to make DocBook fully XML-compliant.
In some cases, it is useful to include portions of one XML application in another XML application to reuse a working design. For example, several XML applications include XHTML (or just HTML) as content for some elements in places where they need simple presentation markup. People who are familiar with HTML will recognize the tags and have little problem formatting text for those areas.
One of the more interesting uses of XML has absolutely nothing to do with documents. XML is actually a very good format for describing complex data structures[5]. Nested data is often difficult to transfer between two programs, systems, or even separate runs of the same program. An application does not need to be extremely data intensive to have this problem. For example, manipulating the configuration information for a complex program can be a relatively major undertaking on its own. Unfortunately, that time is often better spent on the main purpose for the program.
XML supports a natural format for hierarchical data. Because it follows a standardized format, many people have written parsers and libraries for reading and parsing XML[21]. As a result, a programmer can use this format without writing all of the code needed to read it. As more code is written to support both reading and writing XML, it will become easier for programmers to use XML as a data interchange and storage format.
Since XML is mostly human-readable, it is easier to verify and transform data in XML than in some proprietary binary formats[5]. Moreover, XML is inherently extensible. It is relatively easy to write code that ignores unrecognized elements and attributes. This makes it easier to write code that can deal with newer versions of the data than they were written to support. Newer versions of the code can also deal with older files without a huge amount of code being devoted to transforming the data.
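The extensibility argument can be sketched as follows. The configuration format, the element names, and the `read_config` function are hypothetical, invented only to show an older reader coping with a file written by a newer version of a program.

```python
import xml.etree.ElementTree as ET

# Settings understood by this (older) version of the hypothetical program.
KNOWN_SETTINGS = {"host": str, "port": int}


def read_config(text):
    """Read the settings this version understands and skip everything else."""
    config = {}
    for element in ET.fromstring(text):
        convert = KNOWN_SETTINGS.get(element.tag)
        if convert is None:
            continue  # element added by a newer version; ignore it
        config[element.tag] = convert((element.text or "").strip())
    return config


old_file = "<config><host>localhost</host><port>8080</port></config>"
new_file = (
    '<config version="2"><host>localhost</host><port>8080</port>'
    "<cache size='64'>on</cache></config>"
)

print(read_config(old_file))
print(read_config(new_file))
```

Both files yield the same settings, because the reader never looks at the unrecognized version attribute or cache element.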
Probably one of the most exciting twists on this idea is using XML as a serialization format for objects. Serialization is a process in which objects are converted to a form that survives the termination of the current process. This often involves writing the object to disk. Each element can have not only a value, but also type and context information that can help in reconstructing objects. Some have even suggested the possibility of serializing an object from one language and reconstructing an equivalent object in another language[10].
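A minimal sketch of this idea follows. The `Point` class, the object and field element names, and the class and type attributes are all assumptions chosen for illustration. They do not follow any published serialization standard, and only integer and string fields are handled.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields


@dataclass
class Point:
    x: int
    y: int
    label: str


# Type names recorded in the XML, mapped back to converters when reading.
CONVERTERS = {"int": int, "str": str}


def to_xml(obj):
    """Write each field as an element carrying its name and type."""
    root = ET.Element("object", {"class": type(obj).__name__})
    for field in fields(obj):
        value = getattr(obj, field.name)
        item = ET.SubElement(
            root, "field", {"name": field.name, "type": type(value).__name__}
        )
        item.text = str(value)
    return ET.tostring(root, encoding="unicode")


def from_xml(text, classes):
    """Rebuild an object using the class and type information in the XML."""
    root = ET.fromstring(text)
    cls = classes[root.get("class")]
    values = {
        item.get("name"): CONVERTERS[item.get("type")](item.text or "")
        for item in root.findall("field")
    }
    return cls(**values)


original = Point(3, 4, "corner")
document = to_xml(original)
restored = from_xml(document, {"Point": Point})

print(document)
print(restored == original)
```

Because the type names travel with the data, a reader written in another language could in principle map them to its own types, which is the cross-language possibility mentioned above.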
XML as a data-interchange format has become such an important idea that several efforts are underway to develop a reasonable data-typing system in XML. These schema languages were conceived to overcome a fundamental shortcoming in XML. XML has only one real data type: text. Since it was designed to mark up text documents, this is not surprising. However, now that XML deals with other kinds of data, a more complete typing system is required.
XML defines languages in which documents and data are written. In order to make use of this information, a program uses a parser to convert the text input into a form more suitable for processing.
Several standard XML parsers have been built in different languages, including C, Java, and Perl[21]. In general, these implementations include full Unicode support and may support validation against a Document Type Definition (DTD). Since many of these parsers are freely available, there is very little need to write another XML parser.
Standard parsers can be divided into four types. A parser can be validating or non-validating. This term refers to whether or not the parser compares a document against its DTD to ensure that it conforms to the definition. In addition, the programming interface to the parser can be either tree-based or event-based[9]. Tree-based parsers parse XML and return a tree of objects that corresponds to the XML. Event-based parsers, on the other hand, call event handlers when certain important events occur in the processing of the XML, such as when a start tag is encountered or when text content has been identified.
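The two programming interfaces can be compared on the same input. The sketch below assumes the Python standard library, using ElementTree for the tree-based style and the SAX module for the event-based style. Both are non-validating as used here, and the sample document and function names are made up.

```python
import xml.etree.ElementTree as ET
import xml.sax

DOCUMENT = "<list><item>alpha</item><item>beta</item></list>"


def items_from_tree(text):
    """Tree-based: parse everything first, then walk the resulting tree."""
    root = ET.fromstring(text)
    return [item.text for item in root.findall("item")]


class ItemHandler(xml.sax.ContentHandler):
    """Event-based: the parser calls these methods as it reads the input."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._buffer = None  # collects text while inside an item element

    def startElement(self, name, attrs):
        if name == "item":
            self._buffer = []

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "item":
            self.items.append("".join(self._buffer))
            self._buffer = None


def items_from_events(text):
    handler = ItemHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.items


print(items_from_tree(DOCUMENT))
print(items_from_events(DOCUMENT))
```

The tree-based version is shorter, but it holds the whole document in memory. The event-based version keeps only the state the handler chooses to keep, at the cost of more bookkeeping.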
Since all XML must be well-formed, it is not difficult to build a quick parser that supports a subset of XML. In particular, if a programmer only plans to deal with straight ASCII text and work with relatively small XML files, he may decide to construct a quick parser on his own. Unlike HTML, XML makes this relatively easy. This means that even small applications that do not need the overhead of Unicode and validation can benefit from XML[42].
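To give a sense of scale, the following is a sketch of such a quick parser. It is an illustration written for this discussion, not the parser used by the WSDL system, and the tuple representation is an arbitrary choice. It handles only start tags, end tags, empty-element tags, quoted attributes, and text.

```python
import re

# A token is either a complete tag or a run of text between tags.
TOKEN = re.compile(r"<[^<>]+>|[^<>]+")
# name="value" or name='value'
ATTRIBUTE = re.compile(r"""([\w:.-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def parse(text):
    """Parse a small XML subset into (name, attributes, children) tuples.

    Deliberately not handled: comments, processing instructions, DTDs,
    CDATA sections, entity references, and '<' or '>' inside attribute values.
    """
    tokens = TOKEN.findall(text)
    if sum(len(token) for token in tokens) != len(text):
        raise ValueError("stray '<' or '>' in input")

    document = ("#document", {}, [])
    stack = [document]
    for token in tokens:
        if not token.startswith("<"):
            if token.strip():  # drop whitespace-only text between elements
                stack[-1][2].append(token)
        elif token.startswith("</"):
            name = token[2:-1].strip()
            if len(stack) == 1 or stack[-1][0] != name:
                raise ValueError(f"unexpected end tag </{name}>")
            stack.pop()
        else:
            empty = token.endswith("/>")
            parts = (token[1:-2] if empty else token[1:-1]).split(None, 1)
            if not parts:
                raise ValueError("tag without a name")
            attributes = {
                match.group(1): match.group(2) if match.group(2) is not None else match.group(3)
                for match in ATTRIBUTE.finditer(parts[1] if len(parts) > 1 else "")
            }
            element = (parts[0], attributes, [])
            stack[-1][2].append(element)
            if not empty:
                stack.append(element)

    if len(stack) != 1:
        raise ValueError(f"missing end tag for <{stack[-1][0]}>")
    roots = document[2]
    if len(roots) != 1 or isinstance(roots[0], str):
        raise ValueError("expected exactly one root element")
    return roots[0]


fragment = (
    '<para type="example" align="none">This is a <em>short</em> example '
    'paragraph, containing three elements.<xref ref="examples"/></para>'
)
name, attributes, children = parse(fragment)
print(name, attributes)
for child in children:
    print(repr(child))
```

The sketch rejects improperly nested input instead of guessing, which is what keeps it this small. A parser expected to recover from errors the way an HTML browser does would need far more code.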
Although writing an XML parser from scratch is usually not necessary, some programmers will probably do it for small applications. In some cases, the programmer may reason that the overhead of learning and integrating an available parser is not worth the benefit. In other cases, the programmer may not be aware of the available offerings. Whatever the reason for this decision, it was one of the fundamental design criteria of XML. Design goal number 4 from the XML specification states: It shall be easy to write programs which process XML documents.[3]
XML editors are available from many companies. Examples include Merlot by ChannelPoint, Swish by Zveno, Epic Editor by Epic, XML Pro by Vervet Logic, and WordPerfect by Corel. There are three categories of XML editors: text editors with XML features, structure editors, and editors that support both.
Text editors with XML features allow editing of XML as a text file. This gives the maximum amount of control over the layout of the XML. This kind of editor may validate against a DTD and may support other features such as element completion. Another very useful feature supported by some of these editors is well-formedness checking. This is similar to grammar or spell checking in a word processor. It does not guarantee that the output makes sense, but at least it looks like it makes sense.
Structure editors usually display a tree-like structure of elements and attributes. The editor provides the ability to change the values of attributes on an element or add or modify elements and text. These kinds of editors do not need a check for well-formedness, because there is no way to enter data that is not well formed.
XML editors that support both text and structure editing usually contain multiple views that allow editing in either style. Some of the more advanced editors support both views open at the same time, with the two views kept synchronized during an edit session.
XML has certainly been a magnet for hype lately. The pitch normally goes something like this:
In reality, XML does give the potential for better definition of Web content. Instead of pages devoted to nothing but display tricks, XML could potentially give Web content that is much more searchable and customizable[22]. However, the key word here is potential. Much careful work is needed for even a fraction of this potential to be realized.
In order to realize the potential of XML, people need to develop useful application definitions. Many of the applications written today are either huge standards that attempt to do everything or small special-purpose languages used inside a single company or group. As XML matures, more people should develop general purpose applications that are small enough to implement and work with. In addition, the markup is not particularly useful if no one can understand it. As XML applications become more commonplace and well-specified, search engines may be able to tailor a search based on the content of particular pieces of markup. Until then, the engines will probably continue to search all of the text, regardless of markup.
XML does provide a promising format for dealing with database records that need to be placed online. The ability to create specialized document formats is definitely an advantage to those who are using the Web for online catalogs and commerce. The web sites that do provide their output in well-designed XML will definitely be easier to search. These web sites will also be able to provide much more usable and useful content to their users.
The benefits of XML to businesses that wish to distribute data to other businesses on the web are beginning to be recognized. XML can be used to describe and transfer data nicely. If rendering is not an issue, XML can still be used as an effective trade language between many different kinds of systems.
The major disadvantage of XML is its verbosity. The text-based format specified by XML is much more verbose than a binary format for the same data[10]. Most XML documentation goes on to suggest that this disadvantage can be reduced by data compression of some kind.
XML is technically superior to HTML and easier to process. Unfortunately, as has been shown in many parts of the computer industry, technical superiority does not ensure automatic acceptance and widespread use. If most web sites continue to use HTML exclusively, XML's technical superiority does not matter.
Ironically, the biggest roadblock to XML's acceptance may be its processibility. A large portion of the driving force of the Web today is commercial. Many sites either promote a particular company or display ads that are sold to generate revenue to finance the site. With XML's ability to describe content, it would be easy to build more applications for filtering out the ads and only returning the meat of the site. Moreover, it will become even easier to parse content off of a company's web site and display it on a different site. For these reasons, companies might shy away from XML, or they would need to explore new business models and ways to protect their data. This is likely to slow the acceptance of XML by the larger commercial web sites.
Another roadblock to XML's widespread use is the relative inexperience of Web developers. Many developers that have experience with finessing HTML to generate the perfect output will be reluctant to move to this new format where their tricks are irrelevant. They also may rebel against giving up any fancy editing tools they have become accustomed to. This effect will certainly put a damper on many web sites' change to XML.
Even if developers do begin to switch to XML, there is another potential problem. Not everyone will design reasonable XML applications. As is apparent with HTML, some web sites will be designed using the worst XML possible. This usually happens for one of two reasons: either the designer used an inappropriate XML application for the job, or the designer created a new XML application without knowing what he was doing. For example, in Building XML Applications[10], the authors try to show how to use XML to lay out a web site. They choose a standard XML application to demonstrate reuse. However, the application they choose, CDF, is not really appropriate for this use. They end up ignoring required elements in the language and using other elements in ways that do not quite match the language design. This example misuses XML in the same way that many people, including the authors, complain that HTML is misused.
One of the discouraging predictions made by skeptics of XML is the Tower of Babel scenario. The basic form of this prediction is simple:
The result is that all XML applications become proprietary and we are back where we started from. However, now things are worse. Not only is it difficult to convert between different people's formats, but all of the files are now bigger because they are not using compact binary formats.
Many of the sites that currently use XML are compiling an XML description of web pages into HTML or Active Server Pages. Most of these web sites are using XML as a richer version of HTML that can be rendered in different ways. For example, the XML source for a help site may be converted to HTML for browsing and also converted to a format more convenient for print distribution.
Although some sites convert XML to HTML at request time, many others convert the XML into HTML and serve the HTML directly. Most of these use special purpose code to determine the format of the HTML output. Since this is a new field, many different implementations and approaches are being tried.
By using a compiled XML approach, a developer can easily develop and maintain a consistent look on a web site. Unlike equivalent solutions using server-side includes, ASP, or CGI scripts, there is no run-time penalty when a page on the site is accessed. Unlike these other technologies that are often used for content reuse, this approach combines each of the pieces of the page off-line before they are accessed.
This approach actually works very well with dynamic pages such as ASP. Only the parts of each page that really must be dynamic are left as ASP scripting code. Any static text that would have been included is inserted using the compiled XML approach. In practice, this can significantly improve the performance of a site.
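The off-line combination step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the page skeleton, the compile_site function, the sample paths, and the GetQuote call inside the ASP fragment are all hypothetical and are not taken from the system described in this thesis.

```python
from string import Template

# Hypothetical page skeleton; the slot names are assumptions for illustration.
PAGE = Template(
    "<html><head><title>$title</title></head>"
    "<body>$header$body$footer</body></html>"
)


def compile_site(pages, header, footer):
    """Combine the shared pieces with each page body once, before any request.

    pages maps an output path to a (title, body) pair. Each body is copied
    through unchanged, so any scripting code it contains is still evaluated
    by the server at request time.
    """
    return {
        path: PAGE.substitute(title=title, header=header, body=body, footer=footer)
        for path, (title, body) in pages.items()
    }


site = compile_site(
    {
        "/index.html": ("Home", "<p>Welcome</p>"),
        "/quote.asp": ("Quote", "<p><%= GetQuote() %></p>"),  # dynamic part kept
    },
    header='<div id="nav">Home | Quotes</div>',
    footer='<div id="footer">Contact us</div>',
)
```

The header and footer are merged into every page at compile time, so the server does no inclusion work per request. Only the scripting fragment in /quote.asp remains to be evaluated when the page is accessed.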
The major disadvantage of compiled XML involves retraining HTML authors. Unlike most programmers, HTML authors are not usually comfortable with a compile stage. They usually want to make a few changes and see them instantly. Moreover, it is quite difficult to convince them not to change the output HTML. Changes to the HTML output will of course be lost any time the HTML is regenerated from XML.
The other major disadvantage is caused by the fact that the authors are no longer editing straight HTML. This means that the HTML authors usually cannot use the slick WYSIWYG editing tools that they may prefer, like Microsoft FrontPage. This often generates resistance to change. XML editing tools do exist, but they are not in general WYSIWYG. This is because the presentation of the XML is not specified by the document being edited.
The generic XML to HTML conversion tool (gxml2html) is a relatively simple system for compiling XML[18]. In many ways, its simplicity is its greatest strength. Josh Carter, the author of gxml2html, makes the argument that content and presentation should be separate. This is not new -- that sentiment is the whole reason behind style sheets. Unlike other systems, the author does not attempt an all-powerful solution that can take any XML and convert it into arbitrary HTML. Instead, gxml2html uses HTML templates, which are snippets of HTML used to replace a given XML element.
Carter goes on to argue that systems like XSL, while much more powerful, are too difficult to learn. There is also the suggestion that XSL may be overkill for many applications. As a proof of concept, the entire site describing gxml2html is built using the tool[18]. Like most XML compiling systems, gxml2html focuses on single page conversions, although it does support converting an entire working directory at a time.
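The idea of replacing each XML element with an HTML snippet can be sketched as follows. This is not gxml2html's actual template format or code, which is not reproduced here. The TEMPLATES table, the {content} slot, the element names, and the render function are all assumptions made to illustrate the general technique.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical templates: XML element name -> HTML snippet with a {content} slot.
TEMPLATES = {
    "article": '<div class="article">{content}</div>',
    "heading": "<h1>{content}</h1>",
    "para": "<p>{content}</p>",
}


def render(elem):
    """Replace elem with its HTML template, rendering child elements first."""
    parts = [escape(elem.text or "")]
    for child in elem:
        parts.append(render(child))
        parts.append(escape(child.tail or ""))
    # Elements without a template contribute only their content.
    template = TEMPLATES.get(elem.tag, "{content}")
    return template.format(content="".join(parts))


source = "<article><heading>Hi</heading><para>A &amp; B</para></article>"
html_page = render(ET.fromstring(source))
print(html_page)
```

Because the mapping is a simple lookup table, changing the presentation of every heading on a site means editing one template rather than every page.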
Another compiled XML approach has been developed by the Apache Software Foundation. The Cocoon project seeks to separate creation, rendering, and serving of Web content[12]. This is based on a three stage approach:
Cocoon's approach to separating these stages involves multiple stages of XSLT processing. A dynamic caching system is used to reduce the run-time overhead of this approach. The documentation also suggests that Cocoon can be run in an off-line mode to compile the HTML to disk for later delivery by a web server[13].
Preliminary study of the Cocoon system shows that it is possible to perform good separation between the content and presentation. In general, separating presentation and content is a good idea. Unfortunately, there does not appear to be any particular mechanism in place to enforce this separation. Additionally, Cocoon is still very page-centric in its design.
There are already several XML applications available on the Web. Some of these applications have been developed by committees attempting to standardize some form of document. Others have been developed by companies attempting to make a name for themselves in this new field. Still others have been developed by one, or a handful, of developers trying to solve a particular problem. Because this field is so new, it is quite easy for a lone programmer working on a personal project to contribute as much as a group of multinational standards organizations.
Current XML applications cover fields ranging from molecular description and modeling (the Chemical Markup Language, CML) and mathematical formulae (MathML) to multimedia applications (Synchronized Multimedia Integration Language, SMIL) and graphics (Vector Markup Language, VML and Precision Graphics Markup Language, PGML).
Many XML applications are connected directly to Web development (The Channel Definition Format, CDF[1] and The Wireless Application Protocol, WAP[23]). Others were developed for different fields completely, such as GedML for genealogy, and RELML for Real Estate Listings, as well as a host of others.
One promising area of current development is a hybrid of the above-mentioned approaches. Using this hybrid approach, XML and HTML are used together, possibly even mixed in the same file.
Using Microsoft's Internet Explorer, developers can embed XML data directly into their HTML[10]. The embedded XML is then processed by scripts or applets for presentation to the user. The compiled XML tool gxml2html, described above, uses HTML templates to help in the conversion from XML to HTML.
The possibility explored in this thesis uses the XML-compiler approach to generate templates that define the overall look of a site. The main data of the site is provided through other mechanisms. This could be static HTML or from a database accessed using XML. A run-time process then combines this data with the templates to produce output pages.
These approaches allocate resources well. A large portion of each HTML page on a web site is devoted to common elements such as navigation, logos, and footer information. There is no need to generate this text from the raw XML on every page request. On the other hand, some portion of many pages on a web site contains dynamic information. This portion does benefit from evaluation at the time of the user's request.
XML can be used as a storage format. Although not much progress has been made in establishing XML as a standard storage format, there are two approaches currently being researched. These involve using XML to implement either an entire file system or a small database.
The average file system consists of directories (equivalent to elements) and files (equivalent to text content or unparsed entities). It is possible that a file system could be constructed on top of XML[10]. With suitable mechanisms defined for returning parts of an XML document, this would give all of the functionality of current file systems. In addition, XML attributes would allow much richer metadata to be stored with the files and directories. One drawback to this approach is XML's verbosity.
Classic Database Management Systems (DBMSs) are similar to XML in that both systems are designed to structure data in some way to supply a context. One group at Stanford University has developed a full DBMS based on XML called Lore[47]. Lore supports a query language (Lorel) and other features that take advantage of XML's unique capabilities.
Although XML's verbosity will probably prevent it from replacing most database applications, various efforts are underway to produce a standardized query language for extracting data from XML documents. This would allow XML to replace some proprietary flat-file databases and most text-based configuration or preference databases[10] [46].
One of the current initiatives that is being considered by W3C is the XPath specification. The XPath specification is intended to provide a standardized way to reference data inside an XML document[20]. The notation used by XPath looks very much like the directory structure notation used under UNIX.
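The directory-like flavor of the notation can be seen in a short example. The sketch below uses the limited XPath subset supported by Python's xml.etree.ElementTree. The site, section, and page elements and their attributes are invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the element and attribute names are assumptions.
doc = ET.fromstring(
    "<site>"
    '<section name="help">'
    '<page id="faq" title="FAQ"/>'
    '<page id="contact" title="Contact"/>'
    "</section>"
    '<section name="news">'
    '<page id="latest" title="Latest"/>'
    "</section>"
    "</site>"
)

# Steps are separated by slashes, much like directories in a UNIX path.
all_titles = [page.get("title") for page in doc.findall("./section/page")]

# Predicates in square brackets narrow a step by attribute value.
faq = doc.find("./section[@name='help']/page[@id='faq']")

print(all_titles)
print(faq.get("title"))
```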
One of the most useful features of XML is the ease with which it can describe complex data structures. For this reason, XML has found great popularity in the description of interfaces.
In the realm of distributed computing, people found that there was much repetitive work involved in building procedures that called code on other machines. In order to simplify this process, standard Interface Definition Languages (IDLs) were created.
The Web Interface Definition Language (WIDL) is an XML format for describing an API for dealing with Web pages[45]. One of the unfulfilled dreams of the Web was the concept of Intelligent Agents. An Intelligent Agent is a program that roams the Web on the behalf of a user, looking for information and web sites relevant to that user. Unlike search engines, this program would perform its task automatically, without intervention from the user.
The main reason that Intelligent Agents never materialized in the mainstream is the difficulty of extracting usable information from an HTML page. In general, the agent must specify the URL request, retrieve the page, parse the HTML, extract the data, and return it for processing. Then the agent can begin the useful part. The first portion of this process has been extremely hard to automate due to the nature of most of the HTML on the Web. WIDL is a method for specifying the mechanical part of this transaction.
XML has also been used to replace the IDL normally used in CORBA applications. Although the normal IDL used with CORBA covers the input and output specifications required, XML also provides a useful format for serializing data sent and received remotely[43].
Another very interesting use of XML is XML-RPC[35]. XML-RPC is a markup language designed to aid in marshaling Remote Procedure Calls (RPCs). The idea behind RPC is simple enough: treat a call to a remote computer exactly the same as a procedure call in the current process. Unfortunately, the parameters passed to this remote procedure may include addresses and other parameters that would not survive the transfer. The marshaling process converts the parameters of a procedure call into a message to be sent to a remote server. Software on that server converts this message back into a normal procedure call. Finally, the return values from the called procedure are made into a return message that goes back to the original machine.
The advantage of RPC is simplicity of use. The major downside is the marshaling process. Fortunately, the creation of the marshaling code can be automated. XML-RPC uses XML in two different capacities. First, an XML application is defined which serves as an IDL for the RPC calls that XML-RPC implements. Second, XML is used to encode the data that is transferred to the remote process and back again[35].
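The marshaling round trip described above can be observed with the xmlrpc.client module in Python's standard library. The method name example.add and the argument values are assumptions chosen for this sketch. No network connection is made; the code only encodes and decodes the messages.

```python
import xmlrpc.client

# Client side: marshal a call to a hypothetical remote procedure into XML.
request = xmlrpc.client.dumps((2, 3), methodname="example.add")

# Server side: turn the XML message back into a method name and parameters.
params, method = xmlrpc.client.loads(request)
answer = sum(params)  # stands in for the real remote procedure

# Server side: marshal the return value into a response message.
response = xmlrpc.client.dumps((answer,), methodresponse=True)

# Client side: recover the return value from the response.
(result,), _ = xmlrpc.client.loads(response)

print(request)
print(result)
```

Printing the request shows that both the method name and each parameter are carried as ordinary XML elements, which is what makes the message readable on any platform.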
XML is currently being used as a format for transporting data from one system, say a SQL database, to a program that needs the data. The goal of XML as a conversion format is helping to decouple the programs from the specific database and data format used for storage.
An advantage of using XML as a conversion format is the same advantage compiler builders get from intermediate code. If there are $m$ different database formats to read from and $n$ different programs that need the data, doing direct translation would result in $m \times n$ different translators. When using an intermediate format, such as XML, the number of translators is reduced to $m + n$. This benefit is not exclusive to XML. Any intermediate format would provide this benefit.
An additional benefit of using XML as a conversion format is readability. Unlike binary interchange formats, the developer needs no special tools to read XML. A well-designed XML interchange format is self-describing, making recovery of information easier, even if the original program is lost[5].
The text-based format makes manipulation easier as well. A text editor provides the minimum functionality needed to view XML. There are also XML editors that give a more structured view of the XML document. Additionally, text manipulation tools can be used to perform maintenance on the XML in between phases, possibly resulting in a much more powerful translation system with relatively little work. Since XML is a text-based format, machine dependencies such as byte order and floating point formats would not be an issue.
Standard tools are available for reading and writing XML. Therefore, the developer can spend more time on the specifics of the problem instead of developing the translation tools.
The main disadvantage of XML as a conversion format is its verbosity. A straightforward binary translation of the data could be substantially smaller, though harder to read.
If all that is needed is a one-time conversion from one format to another, the overhead of the conversion to XML may not be worth the time. It might be possible to put together a one-shot program in less time. However, the debugging time involved should not be underestimated.
In addition, a potential for loss of information exists in the conversion from a binary representation to a text-based XML representation and back again. Any conversion program should be carefully designed not to lose any information in the conversion process.
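A small Python experiment shows how such a loss can occur with floating point data. The value and the six-decimal text format are assumptions picked for illustration; the point is only that the choice of text representation determines whether the round trip is exact.

```python
import struct

value = 0.1 + 0.2                    # held in memory as a 64-bit binary float
raw = struct.pack("<d", value)       # its 8-byte binary representation

lossy_text = "%.6f" % value          # fixed six decimal places
exact_text = repr(value)             # shortest text that round-trips in Python 3

# Convert each text form back to binary and compare with the original bytes.
lossy_back = struct.pack("<d", float(lossy_text))
exact_back = struct.pack("<d", float(exact_text))

print(lossy_text, lossy_back == raw)
print(exact_text, exact_back == raw)
```

A fixed-width format silently drops the low-order digits, while a representation chosen to round-trip preserves every bit.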
The time needed to convert binary data into an appropriate text form and back was once considered a potential problem. However, given the speed of modern computers, this issue is not as important as it once was. In general, accessing a hard disk or another computer across a network takes much more time than this conversion.
If we were to attempt to describe a web site using the three orthogonal concepts of content, page presentation, and site structure, we would need a good notation for describing each part. This thesis defines a system which uses separate XML applications for the three separate parts. A program can then combine the pieces and compile them into a web site.
HTML can do a relatively good job of describing content and page presentation. SSI[37], ASP, PHP, HTML::Mason, and others do a good job of factoring out common portions of pages to reduce maintenance. Tools like Cold Fusion and Apache::DBI simplify access to databases. ASP, PHP, Cold Fusion, mod_perl[8] and many others allow run-time changes to web pages at the point of request. However, none of these technologies are geared toward overall site design. Any overall design and consistency work is performed directly by the developer.
All of these technologies can be used to generate consistent, fast and well-designed web sites. However, the consistency and overall design must be maintained through the direct efforts of the developer. The tools do not work at a high enough level of abstraction to reduce the developer's burden. In many ways, this is similar to the contrast between assembly language and high-level computer languages. It is possible to write very well-designed, structured, readable code in assembly language. However, high-level languages supply constructs that make writing code with these attributes much easier.
How then can we gain the benefits of a high-level abstraction when building our web sites? One way is to take the assembly vs. high-level language analogy one step further and create a high-level language for site construction. This language could supply the higher-level constructs needed to simplify a site-centric view of web sites.
Another important benefit of this approach is that it focuses the developer's attention on the higher-level constructs. This change in focus could result in better web site design. A developer who is free to concentrate on the higher-level issues has an incentive to think about and experiment with them without being bogged down in the details of implementation.
In order to be useful, a web site description system must meet certain minimum requirements. To be used in the construction and maintenance of a production web site, the system must meet even more requirements. As with any design, there are certain issues that are not intended to be addressed, and these non-requirements must be spelled out to reduce confusion.
The minimum requirements are simply those that describe functionality that cannot be ignored and still have a functional system. Most of these requirements are not a part of other site-building tools.
Most web site construction tools ignore this requirement. Those tools are based around creating pages. In a medium to large site, navigation and overall site structure are actually much more important and more difficult than page design.
Most web site construction tools address this issue as their primary focus.
Although most web site building tools focus on implementation of web pages, they normally focus on single pages. Most tools that support some form of common subpage factoring rely on run-time inclusion of the common components. Although this method does work, it does not scale particularly well on high-traffic sites.
This is the primary message of this thesis.
A tool that is only useful on one, or a handful of kinds of web sites, is a curiosity at best. In order to be truly useful, the tool must help design the web site the developer wants to build.
In the normal approach to web site design, the details of site structure and overall look are spread throughout all of the files in the site. In order to replicate this structure, large amounts of the site must be copied to the new site and edited to add the new content. This cut and paste style of development has been found to be a problem in software construction[2].
For large, high-traffic sites, this may be the most important requirement of the system. Common site components should be easily maintained with little or no run-time cost.
In a production environment, there are often parallel sets of changes in development at any one time. This feature reduces the impact of releasing some changes without releasing them all. Improperly used, this feature can generate a maintenance nightmare.
This requirement is probably the most controversial. However, when designing large or heavy traffic web sites, this may be one of the most important requirements.
In timing tests, an earlier design similar to this system generated around 150 pages in under 5 seconds. A fair portion of that time was spent starting Perl and reading configuration information with a primitive XML parser.
This system is designed for use by people trying to solve problems where the normal drag-drop approach fails. Many systems created over the years for use without programming do not scale well to difficult problems.
If this system proves useful and someone decides to build WYSIWYG tools to support it, that would be great. However, the goal of this system is to make large, complex sites possible to build and maintain.
Even without a complete architecture, it is possible to develop a consistent, usable web site through sheer sweat and determination. However, like any other form of software development, the initial development is only a small fraction of the total time spent working on the web site. Shortly after the initial launch, the first change requests come in. As more and more changes accumulate, the design of the site can begin to break down, as described in Maintaining the Web Site.
By collecting the high-level description into a single file, WSDL makes measuring the consistency of the site easier. Given reasonable consistency measures, one can tell if the web site is getting less consistent. This can be a major advantage when a single developer is in charge of several sites, or if she must change a web site she has not looked at in six months. Some of these measures are simple enough that one can gauge them by looking at the WSDL description. Others can be calculated using a fairly simple script.
Some of the metrics that one might find useful in evaluating a web site are
These metrics and the numbers derived from them can help to evaluate a web site. Unfortunately, these numbers do not directly indicate whether a web site is in trouble. However, the change in these numbers over time can give an indication of where problems may arise.
For example, if the web site was originally defined to have five major sections and all of them have the same presentation except one, one would expect the number of group elements under the main navigation to be five. The expected number of layout elements would be two. After several changes the number of group elements under the main navigation is six and the number of layout elements is ten. The developer probably already knows that there was one more new section, but he may not have realized that the overall consistency of the sections appears to be getting worse. There are now more different styles of page presentation than there are major sections. It would be a good idea to investigate this issue and either correct the problem or find an acceptable reason for the divergence.
Without this indicator, the first clue that the presentations have diverged is the difficulty in making a single change to a section. A request comes in for a change to the header on the third section. The change is made and it appears on half the pages in the section. The developer tracks down the pages that do not match and makes a change there. That change affects one of the pages in the fourth section, and so on. By having all of this information in one place, it is easier to recognize the decline before the developers are spending all of their time correcting the effects of the last change.
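The element counts discussed above are easy to gather with a short script. The sketch below is written in Python and assumes a made-up site description: only the group and layout element names come from the text, while the site and navigation elements, the id values, and the element_counts helper are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def element_counts(description):
    """Count how many times each element name occurs in a site description."""
    root = ET.fromstring(description)
    return Counter(elem.tag for elem in root.iter())


# Hypothetical description with five sections and two page presentations.
ORIGINAL = (
    "<site>"
    '<layout id="standard"/><layout id="special"/>'
    "<navigation>"
    '<group id="home"/><group id="news"/><group id="products"/>'
    '<group id="support"/><group id="about"/>'
    "</navigation>"
    "</site>"
)

counts = element_counts(ORIGINAL)
print(counts["group"], counts["layout"])
```

Running the same count after each release and comparing the numbers gives the trend over time, which is the useful signal.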
At the highest level, the purpose of the Web Site Description Language (WSDL) is the description of a whole web site in enough detail for a program to generate it. This language must be able to describe the structure of a web site and its navigation in a clear, concise format. Another issue to consider is what makes this language different from other similar languages. Other web content creation languages focus on page creation. The site is then defined as a collection of pages with appropriate navigation built onto the pages.
In order to describe a web site, the pieces that need to be described must first be identified. Any web site can be partitioned into several kinds of data. Some of it is explicit, such as the individual pages and images that make up the site. Some of the data is implicit.
What is meant by implicit data? Implicit data includes data about the site (meta-data) and common pieces of information that occur throughout the site. Some pieces of meta-data are as follows:
What common pieces of information are associated with the web site? This is an area that people have explored with a view to reducing page maintenance. Some of these items include the following:
Some of the above information is factored out of various pages and put into separate files that are included somehow to reduce maintenance. For example, the footer of all of the pages in a web site may have the same look and text. This would often be located in one file and then included as needed. Other parts of this data are maintained only by conventions enforced by the developers. For example, the presentation of side-bar text is expected to be consistent throughout the site, but the actual side-bar must be different on every page. The developers may then establish conventions for how this piece of HTML is constructed when it is needed. In many cases, none of this information is documented or really understood as an important part of the web site.
The explicit data that makes up a web site may, on the surface, seem easier to define. Obviously, the pages that make up the site are all necessary parts of the site, as are all of the images. However, there are many other resources on a web site that are not part of the navigable HTML. These resources include:
Obviously, this set of data is as varied as any other portion of the web site. In order to define a high-level description of the web site, these pieces must be classified. These classifications not only support the construction of the web site. They also improve the ability to analyze the web site for inconsistencies and maintenance problems.
Some people might ask why they need to care about this level of detail. After all, people have been building web sites for years without making these distinctions, so how are these distinctions helpful? In addition to the standard software design arguments for this level of detail, there are some serious practical, real-world considerations as well.
Assume someone is building web sites for external customers instead of a site for their personal use or their company. How should that developer respond to the following client questions?
If the developer happens to have put the web site together in just the right way and the clients ask for just the right modifications, it's no problem. But other seemingly innocent suggestions become a nightmare to implement. Unfortunately, similar issues can arise with internal customers as well.
As with any other form of software design, web site design is a study in managing change and risk. The risks have to do with browser incompatibilities, download times, and platform inconsistencies. Managing change involves Internet time, which used to be called Rapid Application Development. The clients want faster changes and newer technologies, without sacrificing download speed, user experience, or web site stability. Unfortunately, if the major design components of a given web site are spread among 500 pages, simple changes are no longer simple.
Even using some form of inclusion technology does not solve the problem. Run-time cost is only part of the issue. The consistency of the site depends on including the same boilerplate HTML in every file in the web site. This approach is far from perfect. If this boilerplate HTML is not included in a file, that file is not consistent with the rest of the web site. If the boilerplate HTML is changed in one file or if the included HTML is similar to, but not the same as, the standard boilerplate, that file will also be inconsistent with the rest of the web site.
The inconsistencies that may arise through errors in the inclusion technique are obviously a problem. But there is a more important problem hidden by the obvious problem. There is no way to find any of these problems without looking at every file in the web site. This is potentially a large maintenance problem.
The design of WSDL is geared toward providing tools to solve these problems. It is not a panacea, however. A flexible, usable web site still requires careful thought and design. WSDL just helps collect the necessary information in a way that simplifies certain kinds of changes. It can also be used as a guide to direct the thought process.
WSDL provides a way to keep common page components in one place without the run-time costs of server-based include systems. WSDL also provides a single place to describe the structure of a site, instead of the standard practice of scattering that information through every file on the web site. In some environments, the structure is described in an external document that developers can reference when there is a question. However, external documentation is rarely synchronized with the code.
The goal of the conditional elements is to allow multiple versions of a configuration to exist at one time without the need for separate versions of the file. Since the implementation language, at present, is Perl, the simplest solution is to use the Boolean expressions that Perl supports. The Boolean expression can be any Perl expression.
Conditional expressions are modeled on the ones supported by XSLT[19]. The only difference is that the WSDL conditionals use Perl syntax for the test attribute and Perl's definition of true and false. Remember also that the < symbol is illegal in an attribute value. You must use &lt; to encode it; the > symbol may likewise be written as &gt;. The & character must also be encoded as &amp;. The if is replaced by its contents if the test attribute evaluates to true. Otherwise, the element and its contents are removed. In the example below, if the $debug variable is true, the debug page is defined, otherwise it is not.
<if test="$debug">
  <page id="debug" title="Debug Page" href="/debug.html"/>
</if>
Example of an if element
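The evaluation of a test attribute can be illustrated with a small Perl sketch. This is not the code of the WSDL processor: the function name test_is_true and the use of a string eval are assumptions made for illustration. The sketch shows only how Perl's own definition of true and false would apply to the text of the attribute.

```perl
use strict;
use warnings;

# Hypothetical flag that a WSDL file might test with <if test="$debug">.
our $debug = 1;

# Evaluate the text of a test attribute as a Perl expression and reduce
# the result to 1 (true) or 0 (false) using Perl's own rules.
# test_is_true is an illustrative name, not part of the WSDL processor.
sub test_is_true {
    my ($expr) = @_;
    my $value = eval "no strict 'vars'; $expr";
    die "Cannot evaluate test '$expr': $@" if $@;
    return $value ? 1 : 0;
}

print "debug page defined\n" if test_is_true('$debug');
```

Under Perl's rules, the empty string, the number 0, the string "0", and undef are false; everything else, including the string "0.0", is true.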
The choose element allows for multiple tests and a default. The content of the first when element whose test succeeds replaces the entire choose element. If none of the when tests succeeds and there is an otherwise element, the content of the otherwise element replaces the choose element. If no otherwise exists, the choose and its contents are removed. In the example below, the value of the variable $feature determines the level of functionality provided.
<choose>
  <when test="1==$feature">
    <page id="cool" title="Cool Page" href="/feature1.html"/>
  </when>
  <when test="2==$feature">
    <page id="cool" title="Cool Page" href="/feature2.html"/>
  </when>
  <when test="3==$feature">
    <page id="cool" title="Cool Page" href="/feature3.html"/>
  </when>
  <otherwise>
    <page id="cool" title="Cool Page" href="/unavailable.html"/>
  </otherwise>
</choose>
Example of a choose element
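The selection rule for choose can be sketched in Perl as follows. The data layout (a list of hashes with test and content keys) and the name resolve_choose are hypothetical; the real processor works on parsed XML elements. The sketch is meant only to make the "first successful when wins, otherwise is the fallback" rule concrete.

```perl
use strict;
use warnings;

# Hypothetical value that the when tests compare against.
our $feature = 2;

# Return the content of the first "when" whose test is true, else the
# "otherwise" content, else undef (the choose element is removed).
# The data layout and the name resolve_choose are illustrative only.
sub resolve_choose {
    my ($whens, $otherwise) = @_;
    for my $when (@$whens) {
        my $expr  = $when->{test};
        my $value = eval "no strict 'vars'; $expr";
        die "Cannot evaluate test '$expr': $@" if $@;
        return $when->{content} if $value;
    }
    return $otherwise;    # undef when no otherwise element exists
}

my @whens = (
    { test => '1==$feature', content => '/feature1.html' },
    { test => '2==$feature', content => '/feature2.html' },
    { test => '3==$feature', content => '/feature3.html' },
);
print resolve_choose(\@whens, '/unavailable.html'), "\n";    # /feature2.html
```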
A good use for this feature is changing some of the capabilities of the WSDL file based on command line parameters. Any parameter of the form name=value sets an entry in the hash %CmdLine. As expected, the value is stored in the hash keyed by name. For example, if the command line contains the item PLL=1, the value of $CmdLine{PLL} would be 1.
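A minimal sketch of this convention is shown below. The hash name %CmdLine comes from the text; the helper parse_cmdline, the restriction of names to word characters, and the file name site.wsdl are assumptions made for illustration and may differ from what MakeWebsite.pl actually does.

```perl
use strict;
use warnings;

# Global hash named in the text; parse_cmdline is an illustrative helper.
our %CmdLine;

# Store every name=value argument in %CmdLine and return the remaining
# arguments (for example, the name of the input file) unchanged.
# Assumption: a name consists of word characters only.
sub parse_cmdline {
    my @rest;
    for my $arg (@_) {
        if ($arg =~ /^(\w+)=(.*)$/s) {
            $CmdLine{$1} = $2;
        }
        else {
            push @rest, $arg;
        }
    }
    return @rest;
}

my @files = parse_cmdline('PLL=1', 'site.wsdl');
print "PLL is $CmdLine{PLL}\n";    # PLL is 1
```

A test attribute such as test="$CmdLine{PLL}" could then select between alternative definitions without editing the WSDL file.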
The first version of the program does not prevent a Boolean expression from referencing or changing any value in the program. This behavior may change in later versions of the program.
Several elements have attributes that reference files or directories on the local disk, or URLs located in the output web site. The rules for dealing with these directories are fairly straightforward. However, they sometimes result in surprises. The purpose of the rules is to reduce the amount of redundant data entered into the WSDL file.
Any directory or URL that is absolute after macro expansion is used as is. For example, if the value of the src attribute of a file element is {{root}}/stuff.pdf, which resolves to /docs/stuff.pdf, that file path is used as is. If a directory or URL is relative, it is resolved relative to its parent element's equivalent attribute. In the case of URLs, the root attribute of the parent is considered to be equivalent to the href attribute of the child. For example, given the WSDL fragment below:
<group root="/news">
  <page href="today.html" title="Today's News"/>
</group>
Attribute inheritance
The URL for the Today's News page is /news/today.html, combining information from the group and the page.
If the src or dest attribute of an element does not contain a filename, but the href attribute does, the filename in the href is used.
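The absolute-versus-relative rule can be sketched as a small Perl function. The name resolve_location is hypothetical, and the sketch assumes that only values beginning with "/" count as absolute; the real processor may recognize other forms, and the filename fallback from href is not modeled here.

```perl
use strict;
use warnings;

# Resolve a child's directory or URL against the parent's equivalent
# attribute. Absolute values are used as is; relative values are
# appended to the parent value. resolve_location is an illustrative
# name, and only values starting with "/" are treated as absolute.
sub resolve_location {
    my ($parent, $child) = @_;
    return $child if $child =~ m{^/};
    return $child unless defined $parent && length $parent;
    (my $base = $parent) =~ s{/+$}{};    # avoid a doubled slash
    return "$base/$child";
}

print resolve_location('/news', 'today.html'), "\n";         # /news/today.html
print resolve_location('/news', '/docs/stuff.pdf'), "\n";    # /docs/stuff.pdf
```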
There is one portion of the design of WSDL that is very different from most XML applications. The normal XML approach to inclusion of data from another file is through the use of general entities. To include a file or other resource using a general entity is a two step process. First, the entity referencing the resource must be created in the DTD. Next, the entity must be referenced in the appropriate place in the code.
One major advantage of this approach is the ability to include any kind of resource, not just files. Also, because it is standardized, it may be supported directly by an XML parser. Major disadvantages include the learning curve of this separate mechanism and the fact that not all non-validating parsers support the feature.
In order to allow an easy path towards reusing pieces of a web site description, some of the larger WSDL elements support a special attribute named include. This attribute names a file that is to be read for the content of the current element. This approach was chosen primarily because this method seemed easier to understand than an approach relying on general entities.
Nothing in the WSDL design prevents using general entities to include external data in a WSDL file. The include attribute mechanism is just simpler to learn for someone with less XML knowledge and experience.
Any element that uses the include attribute cannot have any content. This is most easily accomplished by making it an empty element when the include attribute is used.
There are a few restrictions on the attributes of including and included elements. In general, the attributes of the resulting element are the union of the attributes of the including and included elements. If an attribute exists in both the including and included element, the value in the including element takes precedence over the value in the included element. The one exception to this model is the id attribute (see The WSDL Element Reference). If the id attribute exists in both the including and included element, it must have the same value in both.
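These merging rules can be sketched in Perl, modeling each element's attributes as a plain hash reference. The function name merge_attributes and the sample attribute values are hypothetical, and the sketch makes no claim about how the processor treats the include attribute itself after merging.

```perl
use strict;
use warnings;

# Combine the attributes of an including element with those of the
# element read from the included file. merge_attributes is an
# illustrative name; attributes are modeled as plain hash references.
sub merge_attributes {
    my ($including, $included) = @_;
    if (   exists $including->{id}
        && exists $included->{id}
        && $including->{id} ne $included->{id})
    {
        die "id mismatch: '$including->{id}' vs '$included->{id}'\n";
    }
    # Later keys win, so the including element takes precedence.
    my %merged = (%$included, %$including);
    return \%merged;
}

my $merged = merge_attributes(
    { id => 'news', title => 'Latest News' },                  # including
    { id => 'news', title => 'News', root => '/news' },        # included
);
print "$merged->{title} at $merged->{root}\n";    # Latest News at /news
```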
In an earlier version of our system, any extra data to be associated with the page was added as new attributes on the page element. Since that system was not intended to be validated and had no DTD, this was not a major problem.
However, this approach does not scale well to multiple, different kinds of web sites. In a large project, the ability to validate the WSDL file would become critical to the maintenance of the site. Unfortunately, there is no way to develop a single generic page description that covers all possible scenarios. Even if there were, it would be too large and complicated to use.
In order to correct this deficiency in the earlier design, Page Properties were added. These properties are supported through the prop element. Using these properties, the developer can add application-specific data to each page element in the web site. Just as importantly, a parser can still validate the resulting WSDL file using the WSDL DTD.
In the earlier version of this system, these properties were used in many different ways. Some ideas for use might include
Including this feature allows much more flexibility when creating the presentation files. The properties approach allows a particular application to add specific features in a standard way. This gives us the advantages of the original approach without any of the disadvantages.
One possible enhancement to this system would be to add prop elements to groups and websites. At present, the definition of WSDL does not include prop elements as children of these other elements. Only a few of the implications of this change have been considered, without any solid conclusions.
Let's say a group of developers is asked to build a web site that has a consistent look across all of the pages. After careful consideration, they design the web site and manage to get all of the pages based on the same presentation file. A week later, a new requirement comes in. Some of the pages need a different banner image at the top of the page. Because they are in a hurry, they duplicate the presentation file into two versions, one with the first banner and one with the second. A few days later, another request comes in for a third banner to be placed on other pages.
At this point, the developers should realize that splitting the presentation file again is likely to create maintenance problems. What they really need is a single presentation file and a parameter that specifies which banner image to insert. First, they go back to a single presentation file. They replace the banner image name with a reference to a page property, such as prop[banner]. In each page in the WSDL file, they now add the prop element with a name of banner and a value of the appropriate banner image name.
After this change, any changes to the banner image names result in changes to the WSDL file and nothing else. This applies whether the client decides to go with a single image again, a different image on each page, or anything in between. Now they have a more flexible design and they have arrested the slide into maintenance chaos, this time.
The WSDL file describes a web site at a high level, but it does not contain enough information to generate actual files. Most of the missing information is a description of the presentation of the individual pages. The presentation of individual pages is described by special presentation files that are referenced by the layout elements in the WSDL file. These presentation files contain information on laying out individual pages described in one of two formats: the Page Layout Language or skeleton-based presentation.
The Page Layout Language is a high-level markup language designed to help structure the individual pieces of a set of HTML pages. The language describes components of the pages at a high level to abstract the presentation knowledge from the pages themselves. For a list of the elements supported by PLL, see The PLL Element Reference.
The PLL presentation approach is based on a very high-level description of the presentation of a page. The WSDL processor determines how each of these high-level elements is converted into actual HTML. Since the PLL elements must be well-defined, the processor can guarantee valid HTML is generated. Moreover, the processor can sanity check the PLL description to verify that all of the proper elements are in the presentation, in the right contexts. The processor can also verify attributes.
The skeleton-based presentation files contain the actual HTML to be used to lay out the pages. Macro commands are embedded in this HTML that supply the content specific to this page. This system gives the complete flexibility needed to generate any desired HTML output. Unfortunately, the WSDL processor cannot verify the HTML.
This system does give the largest amount of flexibility. It also carries the danger that content may creep into the presentation files, defeating some of the purpose of the WSDL system. However, it is easier to learn for people with a background in ASP, PHP, or plain HTML.
As described above, much of the benefit of the PLL format comes from the high-level nature of the format. This format can generate better HTML over time as the WSDL processor is refined. However, this format is not as flexible as the complete control over the HTML afforded by the skeleton-based presentation system.
Although the skeleton-based presentation files are more flexible, they suffer from a relatively low level of abstraction. The developer must control every aspect of the presentation explicitly. The processor cannot provide default presentation behavior or validate the HTML.
This tradeoff is very much like the one between high-level languages and assembly language. There is nothing that can be done in a high level language that cannot be done in assembly language. Moreover, assembly language is much more flexible than any high level language. But, in spite of these facts, high level languages allow better productivity because they hide many details that most programmers do not need (or want) to know.
Just like the tradeoff between high level languages and assembly, it is useful to have both the high level and low level options. In many cases, the high level representation is good enough to do what is needed. But every now and then, only assembly code will do. For this reason, PLL is provided for regular pages requiring relatively straightforward presentation. Skeleton files are available for pages that require some really strange format that does not conform to the simple model provided by PLL.
If the presentation files were nothing but static text, they would not serve any purpose in WSDL. However, both presentation file formats support macro commands that allow programmable functionality at the time the presentation file is used to build a page. The full list of macro commands available is described in The WSDL Macro Reference.
Most pieces of text in the WSDL system that are used to generate output pages are subject to macro expansion. The process is relatively straightforward. The output string is searched for a string of the form {{some_string}}, where some_string does not contain the string }}. Then, some_string is evaluated as described in The WSDL Macro Reference. The result of this evaluation replaces the {{some_string}} in the output and the expansion continues.
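The search-and-replace loop described above can be sketched in Perl. The evaluation step is replaced here by a simple lookup in a stand-in table, since the actual evaluation rules belong to The WSDL Macro Reference; the names expand_macros, %macros, and $lookup are hypothetical. The sketch makes a single left-to-right pass and does not rescan replacement text, which may or may not match the real processor.

```perl
use strict;
use warnings;

# Stand-in macro table; the real evaluation rules are given in
# The WSDL Macro Reference and are not reproduced here.
my %macros = (root => '/docs');

# Replace each {{some_string}} with the result of evaluating
# some_string. The non-greedy match stops at the first "}}", so
# some_string itself never contains "}}".
sub expand_macros {
    my ($text, $evaluate) = @_;
    $text =~ s/\{\{(.*?)\}\}/$evaluate->($1)/gse;
    return $text;
}

# Assumption: an unknown macro expands to the empty string.
my $lookup = sub {
    my ($name) = @_;
    return exists $macros{$name} ? $macros{$name} : '';
};

print expand_macros('{{root}}/stuff.pdf', $lookup), "\n";    # /docs/stuff.pdf
```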
These macro commands can be embedded in the content of text elements and any content files that are included into output pages, including skeleton files. Used carefully, this macro system can reduce the amount of content that creeps into the skeleton files.
PLL files handle macro commands slightly differently from the rest of the WSDL system. First of all, the text and code elements may be referenced directly using the appropriate PLL elements. The macro commands may only be placed in the content of the prelayout element, in the results of the text and content elements, or in the arguments to a code routine call. The last of these gives a feature that is not yet possible anywhere else in the system. A macro command can be used to specify the value of an argument to a code routine. Supporting this feature in general is a topic for further research.
The code element provides a large portion of the power of the WSDL system. This feature allows arbitrarily powerful custom functionality to be added to the system and run at the time the site is compiled. This means that it incurs no run-time penalty.
The code element contains Perl code that is executed as part of the WSDL process at times specified in the WSDL description. The begin and end classes are relatively simple. All begin class code elements are evaluated at the time they are encountered in processing the WSDL file. All end class code elements are evaluated in the order they were encountered just before the WSDL processor shuts down.
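The timing of the begin and end classes can be sketched as follows. In the real system a code element holds Perl source text; in this sketch, code references stand in for that text, and the names handle_code, run_end_code, @end_queue, and @log are hypothetical.

```perl
use strict;
use warnings;

my @end_queue;    # end-class code, kept in the order encountered
my @log;          # records what ran, for demonstration only

# Called once for each code element as the WSDL file is processed.
sub handle_code {
    my ($class, $code) = @_;
    if ($class eq 'begin') {
        $code->();                 # run immediately
    }
    elsif ($class eq 'end') {
        push @end_queue, $code;    # defer until shutdown
    }
}

# Called just before the processor shuts down.
sub run_end_code {
    $_->() for @end_queue;
    @end_queue = ();
}

handle_code(end   => sub { push @log, 'end 1' });
handle_code(begin => sub { push @log, 'begin 1' });
handle_code(end   => sub { push @log, 'end 2' });
run_end_code();
print join(', ', @log), "\n";    # begin 1, end 1, end 2
```

Even though the first end element appears before the begin element, the begin code runs first because it executes as soon as it is encountered.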
The routine class code elements, or code routines, are only executed when necessary during the construction of specific pages. In order to simplify this process, several variables are passed to the code routine, including
$curr
$ancestors
$ancestor->[0].
%args
Additionally, these global variables are available to the code routine:
$wsdl
$website
%CmdLine
The WSDL object contains a DataDoc object as well as an XML::Parser object. The member functions that are most likely to be useful in code routines are described below:
$wsdl->website
Return the outermost website element.
$wsdl->byID( "id" )
Return the element identified by the supplied id string or undef if none matches.
$wsdl->IDedElementsByType( "type" )
Return a list of all elements that have ids and match the specified type.
See The DataDoc Object Reference for complete documentation of the DataDoc object model. When the DataDoc sub-object in the WSDL object is created, several portions of the input file are filtered out for efficiency. The DataDoc will not include any processing instructions or comments. In addition, any ignorable whitespace is removed from elements that do not support text content.
In order to simplify working with these internal objects in code routines, several utility functions are provided.
NthContent
Content
returns the single content item indexed from the list.
resolve_macros
resolve_macro
child_of
XMLelement objects.
descendent_of
XMLelement objects.
ancestor_of
parent_of
Because of the many mechanisms for delivery of HTML output, page content will be different for each site. In order to make this discussion somewhat more concrete, this thesis provides a particular application for an example site. The classic bookstore example is too rigid to show the flexibility of this system. Instead, this thesis discusses a college web site, including a simulated course registration system, at the mythical Internet College. The presentation and structure of the site are built in WSDL. In general, the content of the web site is not supplied, unless it is necessary to prove a point. (See The Source for the Example Site.)
Unlike most toy sites, this one contains a few of the quirks encountered when building a real site. The lifetimes of pages in a real site are a continuum ranging from static pages, that only change if the entire site is redesigned, to dynamic pages, that are different on every request. Unlike many example sites, this one contains a wide range of page lifetimes.
The disclaimer page is static -- it probably will not change in the lifetime of the site. At the other end of the scale, the pages relating to billing and the student schedules must be dynamically generated for each request. The calendar and schedule pages change on a regular schedule, once per semester or per year. The faculty pages change infrequently, but not on a fixed schedule. It is easier, when building an example site, to focus on one of these at the expense of all of the others. Real web sites tend to be more complex and interesting.
This example should show how the page content may need to be modeled in an arbitrarily complex way. More importantly, using this model, we can separate this portion of the problem, which is the part most specific to each site, from the presentation and site structure issues. Although each is important, it is the content that determines the difference between one web site and the next.
The initial idea for WSDL predated my introduction to XML. The WSDL language did not appear fully-formed to be used for web site development. It was actually the next stage in a series of languages developed for similar purposes.
The company I work for provides financial content through a financial web site. The company also develops customized financial web sites for external customers. I was requested to help with a problem where multiple clients wanted sites that were exactly the same as ours, with some minor differences. The initial attempts to solve this problem involved extensive use of ASP and JavaScript tricks.
This solution was fragile, and we predicted it would not support more than three to five sites. I suggested a system that would allow us to compile a web site and reserve the run-time behavior for items that changed at run time. It was decided that this sounded too risky to apply company resources to.
A couple of years, and a half-dozen projects, later, a project came along that required an extremely ambitious delivery schedule. We realized that we should just about have enough time to complete the major look and feel components of the site by the deadline. But, there would not be any time or resources available for rework if the client changed the requirements. Clients always change the requirements. This time, the web site compiler idea was approved, because we knew the project would not succeed without it.
The first version of the template generator, as it came to be called, took about two days to complete and could generate 70 web page templates in under three seconds. The system used a Perl script and an XML-based configuration file to turn a couple of templates containing special markers into 70 pages of HTML that would wrap our content. Within a week, an HTML developer was not only making changes to the templates, but also editing the configuration file and making minor tweaks to the Perl code.
By the end of the project, approximately 20 skeleton files, as we now called the input templates, were being used to create almost 200 output templates that would be combined with run-time data to generate thousands of result pages. The template generator was definitely a success. However, this version suffered from some major drawbacks. Making some structural changes required changes to the Perl code, resulting in less flexibility than we wanted. Much of the Perl code was written for this particular client and could not easily be reused. The design of the configuration file assumed we could add attributes as needed to some of the major elements. This made validation a difficult task. Last of all, the hand-written XML parser was not as flexible as some of the ones available on the Internet.
This version of the template generator needed changes to be useful on any other project. The company handed the system to a new programmer, who spent several weeks enhancing and simplifying the design. She and I spent quite a bit of time exploring the initial design and discussing what I saw as the new direction to go. Her version incorporated some of my ideas and a lot of novel work to create the next generation of the template generator. The results of her efforts were used to generate the next version of our company web site. She was also granted permission to use this system as the topic of her master's thesis[6].
Later, I reworked the template generator to deal with a set of similar web sites that only differed in the colors and defined structural changes. This system could create any number of web sites that followed one of four basic designs.
Each of the major versions of the template generator had different strengths and weaknesses, but there were several things they had in common. Their major strengths were the XML-based configuration files and the skeleton files that gave pages their basic form. The major weakness was the fact that each site needed its own Perl code and configuration file format. In order to solve this problem, WSDL would need to be designed more flexibly from the beginning.
We began by researching web site design and implementation. We found that quite a bit had been written about web site design from an information architecture[7] and structure[4] standpoint. The books we found on the subject discussed the things a web site designer needs to keep up with and what decisions to consider. But the implementation was assumed to follow the normal approach of building individual pages with some included components for reusability.
Next, we reviewed the earlier attempts I had been part of. We saw that most of the shortcomings could be traced to attempting to solve the current situation, without looking at the real, underlying problem. With that in mind, we began by enumerating all of the components WSDL would need to describe a web site. The basics of the current structure including the website element, siteinfo element, and most of the navigational elements fell into place early. The common pieces that applied to the whole web site went into the siteinfo element. Everything else went in the website element.
We experimented by writing small sites in WSDL, to see how they would look. In general, they seemed fairly reasonable. One early design decision involved non-HTML files. We wanted to be able to list the non-page files in WSDL as well. We believed this might be particularly useful during site reorganizations.
Another important design decision was the ability to nest website elements to support the concept of sub-sites. Some complicated web sites use the concept of sub-sites to enhance the user experience. The user is not confronted with a massive site containing a large number of pages. Instead, the user sees a set of sub-sites all accessible from one umbrella web site. In general, all of the sites have major presentation elements in common. However, each will have some distinctive feature that allows the user to recognize which site he is visiting.
The first major reality check came when we built a program (MakeWSDL.pl) to walk an existing web site and write out its description in WSDL. We decided to start with my own personal web site. This site is simple, although not particularly well designed. The idea was to see what a simple web site description would look like. We expected the structure to look bad, but the actual output was a surprise. It was obvious that the navigable content of the site must be separated from the other files. It was almost impossible to find the structure in amongst the other files. The entire navigational structure, one of the fundamental reasons for this language, was hidden.
To fix this problem, the content of the website element was modified to contain a list of resources and the actual navigation. This decision also changed the location for the stylesheet elements. Before that time, they had been part of the siteinfo. But, that did not really fit. Moving the stylesheet elements to the resources made much more sense.
During this reorganization, we began thinking about the way we encoded people's names. We realized that the original approach had been simplistic. The original elements that made up a person's name were honorific, given, and family. We realized that many names would not fit this form. This approach could not deal with people with only one name. It also did not deal with middle names, or numbers, or initials. We could add more elements to deal with middle names, initials, nicknames, and more, but this seemed inelegant. Finally, we realized that a better approach was to take all of the name portions of a person's name and make them name elements with a class which further defines the role of that portion of the name. We left the decorations honorific and degree separate because we did not actually consider them to be part of the name.
Another limitation that became obvious during the reorganization was a loss in flexibility from previous designs. The earliest versions of the template generator allowed the addition of site-specific attributes to the page element. We realized that by defining a DTD for WSDL, we had made this feature impossible in its earlier implementation. We did understand that there is no way WSDL can define all of the possible page attributes that everyone will ever need. Thus, the prop element was born. This element allows us to associate an unlimited number of named properties with a page.
With some testing, the need for the set element under resources became obvious. There needed to be some way to group the resources. Shortly after that decision, fpages and subpages became part of the content for resources and sets. Some sites, including mine, have pages devoted to links leading to other web sites. These pages seem naturally implemented with sets of fpages.
As we began to develop the output logic, the problem of presentation descriptions became much more obvious. In earlier projects, we had used skeleton files that contain raw HTML with embedded macro commands. We had originally intended to use a high-level presentation language instead. Both approaches have advantages and disadvantages. The skeleton approach is much easier to implement, but it is much easier to abuse. The presentation language approach is easier to verify, but it requires more development and reduces the control of the developer.
On the advice of a wise programmer I know, "When attempting to choose between two good implementations, try to choose both," we implemented both systems. The PLL approach works very well for well-structured presentations. If the presentation is too complicated, the skeleton approach can always be used. Implementing this feature required the addition of the layout element.
We had originally required the src and dest attributes on all file elements. In working with a real web site, we discovered that this led to a lot of duplication. Allowing the file elements to inherit these attributes from their enclosing set element resulted in a much cleaner design. A side effect of this decision was the need to add src and dest attributes to the website element in order to reduce redundancy at lower levels.
The current system for handling the inheritance of directories seems more complicated than necessary. Initial testing generated several surprising results. Other than a small amount of cleanup that solved some of the surprises, no redesign of this functionality has been attempted.
Another surprise that happened during this stage of the implementation was the realization that the stylesheet element needed src and dest attributes as well. Since the design had been moving toward a system allowing the entire site to be built through generation of pages and copying of non-page files, it was necessary to define where the style sheet would come from and go to.
The completed system consists of several programs implemented in Perl. Perl was chosen for this task because of its extensive support for text manipulation, as well as its rapid prototyping capabilities. Libraries for parsing and manipulating XML in Perl are also available.
The initial version of the WSDL processor is named MakeWebsite.pl. The program uses the XML::Parser module to convert an XML file into a memory object called a DataDoc. The program then walks this DataDoc object and creates the appropriate output files for the web site under construction.
This object was designed specifically to simplify access to attributes and still give good performance on access to content. The ability to find an element's ancestors or siblings was not a design goal. As a result, the data structure is quite a bit simpler and lighter than the equivalent Document Object Model (DOM) structure.
Once a WSDL description of a web site has been created, the WSDL processor is invoked on the WSDL file. This program parses the WSDL file, uses the presentation files and content, along with any external files, and creates a full copy of the web site in the target directory.
In addition to the name of the input file, MakeWebsite.pl has five command line options:
index.html, if none is given.
In addition to MakeWebsite.pl, there are several other scripts developed as part of this research.
The program MakeWSDL.pl takes the URL of a web site on the command line and prints a starting WSDL file for that site to the standard output. Because MakeWSDL.pl cannot know about the actual structure of the site, except through its links, the program makes guesses about how to group the various pages and other content. The program also extracts style sheet information and any contact information it can find. The output from this program is not a usable WSDL file, because it lacks presentation information and much of the site's actual structure. The program's purpose is to automate some of the tedious portion of creating a new WSDL file.
The program TestPLL.pl validates the PLL files that are passed as arguments. Called with no arguments, it runs a regression test on the PLL validation code. This program serves as a quick check on any PLL presentation files needed for a project.
There is an equivalent validation program for WSDL files called WSDLvalid.pl. Called with file names on the command line, it validates each of the WSDL files. With no arguments, it also performs a regression test on the WSDL validation code.
Some of the benefit of using WSDL comes from the ability to analyze the WSDL description of a web site. The program WSDLmetrics.pl provides the data for this analysis. This program reads a WSDL description and generates the metrics described in the Payoff section.
These programs are available at http://www.anomaly.org/wade/wsdl/, along with the source to the example site.
Although speed of web site compilation was not a major focus in the design, it is an issue worth measuring.
The example site is a small, relatively simple site. The code routines used for the site navigation are of medium complexity. The test for faculty web sites adds some time to the generation of the site. The WSDLmetrics.pl program returned the following results.
WSDL Summary:
------------
total navigation elements: 34
group elements: 4
page elements: 17
pageref elements: 0
fpage elements: 0
subpage elements: 0
website elements: 0
max elements per navigation: 3
max elements per group: 8
layout elements: 2
header elements: 1
footer elements: 1
code elements: 7
text elements: 1
stylesheet elements: 1
max depth of nesting: 5
MakeWebsite.pl created this site in approximately 3 seconds on a 200 MHz Pentium Pro running Linux.
As a completely different kind of example, a WSDL file was created for the University of Houston Computer Science web site using MakeWSDL.pl. This description was modified to use a minimal presentation file and no content, in order to measure raw speed of WSDL processing. The WSDLmetrics.pl program returned the following results.
WSDL Summary:
------------
total navigation elements: 3702
group elements: 79
page elements: 796
pageref elements: 1246
fpage elements: 157
subpage elements: 139
website elements: 0
max elements per navigation: 389
max elements per group: 516
layout elements: 1
header elements: 0
footer elements: 0
code elements: 0
text elements: 0
stylesheet elements: 9
max depth of nesting: 5
MakeWebsite.pl
created this site in approximately 10 seconds on
a 200 MHz Pentium Pro running Linux.
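The kind of counting reported above can be sketched in a few lines. The following Python is an illustration only and is not the implementation of WSDLmetrics.pl, which is a Perl program. The sample document is a hypothetical, heavily simplified WSDL fragment, and the convention that the root element sits at depth 1 is an assumption.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, heavily simplified WSDL fragment used only for illustration.
SAMPLE = """
<website id="site" main="home">
  <navigation>
    <page id="home" title="Home"/>
    <group title="Research">
      <page id="labs" title="Labs"/>
      <pageref ref="home"/>
    </group>
  </navigation>
</website>
"""

def element_metrics(root):
    """Return (count of each element name, maximum nesting depth)."""
    counts = Counter()
    max_depth = 0
    stack = [(root, 1)]  # assumption: the root element is at depth 1
    while stack:
        elem, depth = stack.pop()
        counts[elem.tag] += 1
        max_depth = max(max_depth, depth)
        stack.extend((child, depth + 1) for child in elem)
    return counts, max_depth

counts, depth = element_metrics(ET.fromstring(SAMPLE))
for name in ("group", "page", "pageref"):
    print(f"{name} elements: {counts[name]}")
print(f"max depth of nesting: {depth}")
```

A single pass over the element tree is enough for these figures, which is consistent with metrics being cheap to compute even for a description with several thousand elements.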
The WSDL system is still a work in progress. During the development of this version, several enhancements or areas for further research presented themselves. Some of these ideas were not explored because the full implications of the features are not yet apparent. Others were put aside in the interest of actually completing a working system.
One idea for possible enhancement of the system involves more
code classes. The current set of begin, end, and routine handles the current
system's requirements nicely. However, code that is executed at the beginning
and end of each page could be a useful enhancement. Extending
that idea a little further, code that executes at the beginning and end
of each group and website could also be useful.
Extending this idea in a different direction, we could define classes of pages so that only some of the page begin and end code applies to certain pages. Obviously, there is room for quite a bit of experimentation in this area.
The ability to add prop elements to groups and websites opens up many new possibilities. However, issues of semantics of these features are still open to question. There's also the question of accessibility through macro commands and inside code routines.
Another possible area for further research is the ability to define multiple output formats for the PLL presentation system. Although the skeleton-based presentation system can be used for any form of output, the WSDL processor only generates HTML from PLL presentation files. The ability to define multiple formats or the ability to specify the conversion from PLL into an output format could be very useful.
The ability to nest macro commands in the parameters for other macro commands could be very powerful. For instance, the value of a page property could be passed as a parameter to a code routine. Another possibility is using a page property to select which contact to use for email questions. Unfortunately, the parsing for arbitrary levels of this nesting cannot be done with regular expressions. An actual context-free grammar would be required.
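The difference between the two approaches can be made concrete with a small recursive-descent parser. This is a sketch under stated assumptions, not part of the WSDL processor: the brace syntax `{name:parameters}` is invented here purely for illustration and is not the system's macro syntax, and the macro names `contact` and `prop` in the example are likewise hypothetical.

```python
def parse(text):
    """Parse text into a list of literal strings and (name, params) macro nodes."""
    parts, _ = _parse_parts(text, 0, top_level=True)
    return parts

def _parse_parts(text, pos, top_level):
    """Collect literals and macros until end of text or an enclosing '}'."""
    parts, literal = [], []
    while pos < len(text):
        ch = text[pos]
        if ch == "{":
            if literal:
                parts.append("".join(literal))
                literal = []
            node, pos = _parse_macro(text, pos)
            parts.append(node)
        elif ch == "}":
            if top_level:
                raise ValueError(f"unmatched '}}' at offset {pos}")
            break  # the caller consumes this closing brace
        else:
            literal.append(ch)
            pos += 1
    if literal:
        parts.append("".join(literal))
    return parts, pos

def _parse_macro(text, pos):
    """Parse one macro starting at text[pos] == '{'; return (node, next_pos)."""
    start = pos
    pos += 1  # skip '{'
    name_end = pos
    while name_end < len(text) and text[name_end] not in ":{}":
        name_end += 1
    name = text[pos:name_end]
    params, pos = [], name_end
    if pos < len(text) and text[pos] == ":":
        # Recursion is what a regular expression cannot express:
        # parameters may themselves contain macros, to any depth.
        params, pos = _parse_parts(text, pos + 1, top_level=False)
    if pos >= len(text) or text[pos] != "}":
        raise ValueError(f"malformed macro starting at offset {start}")
    return (name, params), pos + 1

tree = parse("mailto:{contact:{prop:owner}}")
print(tree)
```

Each level of nesting corresponds to one recursive call, so the parser handles arbitrary depth with no change to the code.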
Another direction for future research involves more tools to support WSDL development. A WSDL-specific editor would simplify construction of WSDL files. Various validation, analysis, and optimization tools are possible that would increase the usefulness of this higher-level description of a web site. Lastly, the command-line tools could be developed into friendlier, graphical versions, reducing the learning curve.
The purpose of the WSDL system is reducing maintenance costs and inconsistencies in large web sites. The system defines two high-level languages that describe a web site in enough detail that the WSDL processor can generate the web site. This approach is very different from other systems that have been implemented for the creation of web-based content.
Testing with previous versions of this system showed a definite increase in developer productivity during initial development. In addition, maintenance costs were reduced dramatically. The current WSDL design includes all of the main features of the earlier versions. The WSDL system also addresses some of the earlier shortcomings by adding the ability to have separate code bases for different clients and automatic verification of the WSDL files.
This section lists and describes all of the elements and attributes in the WSDL language.
The body element defines a set of attributes for the body element of any HTML page that references it.
The body element has no content.
The body element has one required attribute, described below, as well as all of the attributes of the HTML 4.0 body tag:
The required id attribute supplies a unique name for this body element.
The choose element provides the ability to choose among multiple options. This feat is accomplished with the help of the when and otherwise elements.
The choose element contains one or more when elements followed by an optional otherwise element.
The choose element has no attributes.
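The structural rules for choose are simple enough to check mechanically. The following Python sketch shows one way to do so. It is not the validation code used by WSDLvalid.pl, and the test values in the sample data are placeholders, since the expression syntax is described under Conditional Elements rather than here.

```python
import xml.etree.ElementTree as ET

def check_choose(choose):
    """Return a list of problems found in one choose element (empty if valid)."""
    problems = []
    if choose.attrib:
        problems.append("choose must not have attributes")
    tags = [child.tag for child in choose]
    if tags.count("when") < 1:
        problems.append("choose needs at least one when element")
    if tags.count("otherwise") > 1:
        problems.append("choose allows at most one otherwise element")
    if "otherwise" in tags and tags[-1] != "otherwise":
        problems.append("otherwise must be the last child of choose")
    unknown = sorted(set(tags) - {"when", "otherwise"})
    if unknown:
        problems.append("unexpected children: " + ", ".join(unknown))
    for when in choose.findall("when"):
        if "test" not in when.attrib:
            problems.append("when requires a test attribute")
    return problems

# Placeholder test expressions; the real expression syntax is not shown here.
good = ET.fromstring(
    '<choose><when test="a"><page title="A"/></when>'
    '<otherwise><page title="B"/></otherwise></choose>'
)
bad = ET.fromstring('<choose><otherwise/><when test="a"/></choose>')
print(check_choose(good))
print(check_choose(bad))
```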
The code element contains sections of scripting code that are evaluated at compile time to generate portions of output pages. The obvious example would be code used to generate navigable material that is different for each page.
At present, the only code language supported is Perl. The Perl code is
evaluated to create a subroutine reference that is called when the
appropriate code macro is called in the presentation file. The parameters for a
code reference are expected to be a list of name=value pairs.
The code element contains the source code that is evaluated whenever this code reference is called. Since the content of the code element often contains characters that are not valid in character content, it is usually contained in a CDATA section. This element must be empty if the include attribute is specified.
The code element has one required attribute and two optional attributes:
The required id attribute supplies a unique name for this code element.
The class attribute defines which of three types of code
reference this element contains. The default class is routine, which
specifies a section of code to be treated as a single function. The begin
class specifies code that should be evaluated before any other code is
executed. This may be used for initializing global variables and declaring
utility functions. The end class specifies code that should be evaluated
after all other code has been executed.
The include attribute supplies a filename from which to read the content of the code element. If this attribute exists, the code element must contain no content. For more information, see Element Inclusion.
The contact element contains information about a person or organization associated with the web site.
The content of the contact element comes in one of two forms. The simplest is an optional descr element followed by a single institution element. The more complex form is an optional descr element followed by a set of elements that describe a person. This set consists of an optional honorific element, followed by one or more name elements, and zero or more degree elements.
The contact element has one required attribute and two optional attributes:
The required id attribute supplies a unique name for this contact element.
The href attribute provides a URL link that is associated with this contact. This would often be a home page for the person or organization.
The email attribute provides an email address associated with this contact.
The copyright element specifies copyright information for use on the web site.
The copyright element contains raw text without markup. This content is expected to be the terms of the copyright, although there is no way to validate that.
The copyright element has one required attribute and two optional attributes:
The required id attribute supplies a unique name for this copyright element.
This optional attribute supplies the year or range of years for which this copyright applies.
The owner attribute contains a reference to a contact element that supplies the copyright owner's name.
The degree element contains the degrees applied to the contact's name.
The degree element contains raw character data without markup.
The degree element has no attributes.
The descr element contains a small amount of text that is a description of the current contact. The descr is useful for adding small amounts of explanatory text to a contact, above and beyond the contact's name and contact information.
The content of the descr element can be any combination of text and markup. If the markup is not defined by the WSDL markup language, it is usually contained in a CDATA section.
The descr element has no attributes.
The directory element defines a relative or absolute directory in the web site.
The directory element has no content.
The directory element has two required and one optional attribute:
The required id attribute supplies a unique name for this directory element.
The required name attribute supplies the actual directory name for this directory element.
The class attribute supplies a broad category for the directory to allow grouping for various validation and reporting features. Suggested classes might include image, database, stylesheet, include, applet, and script.
The file element designates a non-HTML file linked on the system. This element describes files such as downloadables, PDF files, and images. Any resource that is to be managed in this system but is not created by the system is described by a file element.
The file element has no content.
The file element has one required and six optional attributes:
The id attribute supplies a unique identifier for this file element.
The src attribute gives the location in the source file system where this file is located.
The dest attribute lists the location where this file is to be written.
The required href attribute specifies the URL for this file on the completed site.
The class attribute allows partitioning the possible files into broad groups. These may be used for further processing or reporting. The defined groups are
The title attribute specifies a title to be used for the link that points to this file.
The type attribute allows us to associate a mime-type with this file.
The footer element specifies the content to be used as a footer on pages in the web site. The footer element can be thought of as a special case of the text element.
The content of the footer element can be any combination of text and markup. If the markup is not defined by the WSDL markup language, it is usually contained in a CDATA section. This element must be empty if the include attribute is specified.
The footer element has one required attribute and one optional attribute:
The required id attribute supplies a unique name for this footer element.
The include attribute supplies a filename from which to read the content of the footer element. If this attribute exists, the footer element must contain no content. For more information, see Element Inclusion.
An fpage element references a foreign page. A foreign page is either a page on another web site, or a page on the current web site that is not generated by the WSDL processor.
The fpage element has no content.
The fpage element has two required and two optional attributes:
The required title attribute specifies a title to be used for the link that points to this fpage.
The required href attribute supplies the URL of the page that the fpage reference points into.
The fragment attribute contains the name of the page fragment if this element references part of a foreign page instead of the whole page. The page fragment is the part of the URL after the '#'.
The id attribute supplies a unique identifier for this fpage element.
The group element collects a set of pages together into one logical navigation piece. This is how sections of a site would normally be defined.
The group element can contain the conditional elements if and choose, plus zero or more of the navigational elements listed below:
This element must be empty if the include attribute is specified. For more information, see Element Inclusion.
The group element has no required attributes and several optional attributes:
The id attribute supplies a unique identifier by which this group can be referenced.
The include attribute supplies a filename from which to read the content of the group. If this attribute exists, the group element must contain no content. For more information, see Element Inclusion.
The dest attribute lists the directory into which the pages in this group are written.
The root attribute defines the root of the directory tree for this group.
The main attribute specifies the id of the page that is the main page for this group.
The href attribute specifies the URL for this group. If not supplied, the URL for the page specified by main is used.
The title attribute specifies a title to be used for the link that points to this group. If not specified, the title from the page referenced by main is used. If neither title nor main is supplied, the title of the first navigational element in the group is used.
The keywords attribute lists the keywords to be added to each page of this group that does not have keywords of its own.
The description attribute contains a short description of the page, sometimes used by search engines. This description may be applied to any page in this group that does not have a description of its own.
The layoutref attribute contains a reference to the default layout for pages in the group.
The bodyref attribute contains a reference to the default body attributes for pages in the group.
The copyref attribute contains a reference to the default copyright for the group.
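The fallback order for a group's title can be sketched as follows. This Python is illustrative only and is not taken from the WSDL processor. As a simplification, it treats the first child that carries its own title attribute as "the first navigational element" and ignores conditional elements. The sample data is hypothetical.

```python
import xml.etree.ElementTree as ET

def group_title(group, pages_by_id):
    """Resolve a group's link title: title, then main page, then first child."""
    title = group.get("title")
    if title:
        return title
    main = group.get("main")
    if main and main in pages_by_id:
        return pages_by_id[main].get("title")
    for child in group:  # document order
        if child.get("title"):  # simplification: skip children without a title
            return child.get("title")
    return None

# Hypothetical sample navigation with one group per fallback case.
nav = ET.fromstring(
    '<navigation>'
    '<group id="g1" title="Research"><page id="a" title="A"/></group>'
    '<group id="g2" main="b"><page id="a2" title="First"/>'
    '<page id="b" title="Main B"/></group>'
    '<group id="g3"><page id="c" title="Only"/></group>'
    '</navigation>'
)
pages = {p.get("id"): p for p in nav.iter("page")}
groups = {g.get("id"): g for g in nav.iter("group")}
for gid in ("g1", "g2", "g3"):
    print(gid, "->", group_title(groups[gid], pages))
```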
The header element specifies the content to be used as a header on pages in the web site. The header element can be thought of as a special case of the text element.
The content of the header element can be any combination of text and markup. If the markup is not defined by the WSDL markup language, it is usually contained in a CDATA section. This element must be empty if the include attribute is specified.
The header element has one required attribute and one optional attribute:
The required id attribute supplies a unique name for this header element.
The include attribute supplies a filename from which to read the content of the header element. If this attribute exists, the header element must contain no content. For more information, see Element Inclusion.
The honorific element contains part of the contact name (e.g., Mr., Ms., Dr.).
The honorific element contains raw text without markup.
The honorific element has no attributes.
The if element provides simple if-then conditional functionality. If the test evaluates to true, include the content of the element. Otherwise, discard the content of this element.
The if element can contain any text or markup. The only requirement on this data is that it remain valid WSDL if the content of the if element replaced the element itself.
The if element has one required attribute.
The required test attribute contains a boolean expression that determines whether or not the content of the if element is used. For more information, see Conditional Elements.
The institution element contains the name of an institution used as part of a contact.
The institution element contains raw text without markup.
The institution element has no attributes.
The layout element gives a name and format to a presentation file used to format HTML pages.
The layout element has no content.
The layout element has two required attributes and one optional attribute:
The required id attribute supplies a unique name for this layout element.
The required file attribute specifies the file containing the presentation information.
The class attribute specifies the format of the presentation file.
Currently, two presentation formats are supported: pll and skeleton.
See Presentation Files for more information.
The name element contains one of the parts of the contact name.
The name element contains raw text without markup.
The name element has one optional attribute:
The class attribute clarifies which part of the name of a contact this element refers to. The class may have one of the following values: initial, given, family, middle, number, or nickname.
The navigation element serves as a container for all of the navigable items on the web site. These items include anything that a user might navigate to in the course of viewing a site.
The navigation element can contain the conditional elements if and choose, plus zero or more of the navigational elements listed below:
This element must be empty if the include attribute is specified. For more information, see Element Inclusion.
The navigation element has two optional attributes, described below.
The id attribute supplies a unique identifier by which this navigation can be referenced.
The include attribute supplies a filename from which to read the content of the navigation. If this attribute exists, the navigation element must contain no content. For more information, see Element Inclusion.
The otherwise element provides the default behavior of the choose element. If none of the when tests evaluate to true, use the content of this element. If any of the when tests are true, discard the content of this element.
The otherwise element can contain any text or markup. The only requirement on this data is that it remain valid WSDL if the content of the otherwise element replaced the element itself.
The otherwise element has no attributes.
The page element describes an individual page in the web site. It is used both to define meta-data about the page and also locate the page within the navigation. The page element should only be used to describe pages that are created with the WSDL tool. If a web page is referenced in the navigation, but is not created through WSDL, use the fpage element instead.
The page element contains zero or more prop elements and, optionally, any of the conditional elements (if and choose).
The page element has one required attribute and several optional attributes:
The required title attribute specifies a title to be used for the link that points to this page.
The src attribute lists the directory in which the source file for this page can be found.
The dest attribute lists the file into which the completed page is written.
The href attribute specifies the URL for this page after the site is built.
The id attribute supplies a unique identifier by which this page can be referenced.
The keywords attribute lists the keywords to be added to this page.
The description attribute contains a short description of the page, sometimes used by search engines.
The content attribute contains the name of the file containing the main content of this page.
The layoutref attribute contains a reference to the layout for this page.
The bodyref attribute contains a reference to the body attributes for this page.
The styleref attribute contains references to any stylesheets for this page.
The copyref attribute contains a reference to the copyright for this page.
The pageref element is a reference to a page elsewhere in the web site. This element is needed because many sites are not trees; they are actually graphs. The pageref element allows a page to be referenced from multiple places in a web site.
The pageref element has no content.
The pageref element has one required attribute and two optional attributes:
The required ref attribute contains the unique identifier of the page to be referenced.
The fragment attribute contains the name of the page fragment if this reference points into a page instead of to the whole page. The page fragment is the part of the URL after the '#'.
The title attribute specifies a title to be used for the link that points to this pageref. If no title is supplied, the title from the referenced page is used.
The prop element is a project-specific property attached to the current page. For more information, see Page Properties.
The prop element has no content.
The prop element has two required attributes:
The required name attribute supplies a name to be used when this property must be referenced.
The required value attribute contains the actual value of this property.
The resources element is a container for all of the site resources that do not necessarily participate in site navigation. Examples include applets, images, and downloadable files. The design goal for this element is to allow one WSDL file to contain all of the information about the web site, whether or not it will be generated by the WSDL processor.
The resources element can contain the conditional elements if and choose, plus zero or more of the resource elements listed below:
This element must be empty if the include attribute is specified. For more information, see Element Inclusion.
The resources element has one optional attribute:
The include attribute supplies a filename from which to read the content of the resources. If this attribute exists, the resources element must contain no content. For more information, see Element Inclusion.
The server element defines a logical name for a server or machine on the Web.
The server element has no content.
The server element has two required and one optional attribute:
The required id attribute supplies a unique name for this server element.
The required name attribute supplies the actual server name for this server element.
The class attribute supplies a broad category for the server to allow grouping for various validation and reporting features. Suggested classes might include: main, image, database, search, and ad.
The set element collects a set of resources together into one logical unit. The purpose of a set is to group several resources into a single logical entity. Often this grouping is used to apply common attributes to several resources at once. Usually, the attribute that is applied is the root directory.
The set element can contain the conditional elements if and choose, plus zero or more of the elements listed below:
This element must be empty if the include attribute is specified. For more information, see Element Inclusion.
The set element has no required attributes and three optional attributes:
The id attribute supplies a unique identifier by which this set can be referenced.
The include attribute supplies a filename from which to read the content of the set. If this attribute exists, the set element must contain no content. For more information, see Element Inclusion.
The src attribute lists the directory in which source files for this set can be found.
The dest attribute lists the directory into which the files from this set are written.
The root attribute defines the root of the directory tree for this set.
The siteinfo element contains meta-data that is used in the description of a web site. This includes all of the common elements and information about the site.
The siteinfo element serves as a container for various pieces of common data for a website. In addition to the conditional elements, if and choose, a siteinfo element can contain zero or more of the following elements:
After evaluation of any conditional elements, the siteinfo should contain no elements except those on the above list. This element must be empty if the include attribute is specified. For more information, see Element Inclusion.
The siteinfo element has three optional attributes:
The id attribute supplies a unique identifier for this siteinfo element.
The include attribute supplies a filename from which to read the content of the siteinfo. If this attribute exists, the siteinfo element must contain no content. For more information, see Element Inclusion.
The ref attribute references another siteinfo element to be used in this website. If this attribute exists, the siteinfo element must contain no content.
It is illegal for a siteinfo element to have both an include and a ref attribute. It is also illegal for it to have either an include or a ref attribute and to contain content.
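These restrictions translate directly into a small check. The following Python sketch is an illustration, not the actual WSDL validation code. The file name and identifiers in the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

def check_siteinfo(siteinfo):
    """Return a list of problems with include/ref/content on a siteinfo element."""
    problems = []
    has_include = "include" in siteinfo.attrib
    has_ref = "ref" in siteinfo.attrib
    # Content means child elements or non-whitespace text.
    has_content = len(siteinfo) > 0 or bool((siteinfo.text or "").strip())
    if has_include and has_ref:
        problems.append("siteinfo may not have both include and ref")
    if (has_include or has_ref) and has_content:
        problems.append("siteinfo with include or ref must be empty")
    return problems

# Hypothetical file name and ids, for illustration only.
ok = ET.fromstring('<siteinfo include="common.xml"/>')
both = ET.fromstring('<siteinfo include="common.xml" ref="shared"/>')
filled = ET.fromstring(
    '<siteinfo ref="shared"><server id="s" name="www.example.com"/></siteinfo>'
)
for elem in (ok, both, filled):
    print(check_siteinfo(elem))
```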
The stylesheet element defines style information for use on the web site.
The content of the stylesheet element is either empty or the internal stylesheet information to be applied to pages on the web site. Remember to place this data in a CDATA section if it contains markup not defined by WSDL.
The stylesheet element has two required attributes and three optional attributes:
The required id attribute supplies a unique name for this stylesheet element.
The required type attribute allows us to associate a mime-type with this stylesheet.
The href attribute specifies the URL to use when referencing an external stylesheet.
The src attribute gives the location in the source file system where this stylesheet is located. This attribute is only useful if the href attribute is also supplied.
The dest attribute lists the location where this stylesheet is to be written. This attribute is only useful if the href attribute is also supplied.
The subpage element serves as a placeholder for a navigational item that points into a page, not to the page as a whole. This distinction is important to prevent the attempted generation of subpages by the WSDL processor.
The subpage element has no content.
The subpage element has two required and two optional attributes:
The required title attribute specifies a title to be used for the link that points to this subpage.
The href attribute supplies the URL of the page that the subpage reference points into. This may be empty if the subpage reference is internal to the current page.
The required fragment attribute contains the name of the page fragment. The page fragment is the part of the URL after the '#'.
The id attribute supplies a unique identifier for this subpage element.
The text element is used to designate boilerplate text that appears on pages in the web site.
The content of the text element can be any combination of text and markup. If the markup is not defined by the WSDL markup language, it is usually contained in a CDATA section. This element must be empty if the include attribute is specified.
The text element has one required attribute and two optional attributes:
The required id attribute supplies a unique name for this text element.
The class attribute supplies a broad category for the text element to allow grouping for various validation and reporting features. Suggested classes might include: disclaimer, message, warning, etc.
The include attribute supplies a filename from which to read the content of the text element. If this attribute exists, the text element must contain no content. For more information, see Element Inclusion.
The website element serves as a container for all elements in a web site description. It contains a description of the entire web site. This element can also be used to describe a sub-site within another web site.
A website element contains an optional siteinfo followed by one or more navigation elements, an optional resources element, and zero or more code elements. This element must be empty if the include attribute is specified. For more information, see Element Inclusion.
The website element has two required attributes and a large number of optional attributes:
The required id attribute supplies a unique identifier by which this website can be referenced.
The include attribute supplies a filename from which to read the content of the website. If this attribute exists, the website element must contain no content. For more information, see Element Inclusion.
The src attribute lists the base directory from which to obtain any source files for the creation of the website.
The dest attribute lists the directory into which the completed website is written.
The root attribute defines the root of the directory tree for this website.
The required main attribute specifies the id of the page which is the main page for this website.
The title attribute specifies a title to be used for the link that points to this website. If not specified, the title from the page referenced by main is used. This attribute is most useful on website elements which designate subsites in a main website.
The keywords attribute lists the keywords to be added to each page of the site that does not have keywords of its own.
The description attribute contains a short description of the page, sometimes used by search engines. This description may be applied to any page which does not have a description of its own.
The layoutref attribute contains a reference to the default layout for pages in the website.
The bodyref attribute contains a reference to the default body attributes for pages in the website.
The styleref attribute contains references to the default stylesheets for pages in the website.
The linktypes attribute defines the format of links between pages on the website. This attribute takes one of two values: absolute and relative. A value of absolute makes all links absolute URLs. To make all links relative URLs, use a value of relative. If no value is specified, the default is relative.
The copyref attribute contains a reference to the default copyright for the website.
The layout designates the default layout for the pages in this website.
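The effect of the linktypes attribute can be illustrated with a short sketch. This Python is not the WSDL processor's code. It assumes that page locations are given as paths relative to the site root, that an absolute link is a full URL including the server name, and that the base URL `http://www.example.com/` stands in for the real server; all three are assumptions made for the example.

```python
import posixpath
from urllib.parse import urljoin

def make_link(from_href, to_href, linktypes="relative",
              base="http://www.example.com/"):
    """Return the link to put on the page at from_href that points to to_href.

    Both hrefs are paths relative to the site root, e.g. 'research/labs.html'.
    """
    if linktypes == "absolute":
        return urljoin(base, to_href)
    if linktypes == "relative":
        # Anchor both paths at '/' so the result does not depend on the
        # current working directory of the process.
        start = posixpath.dirname("/" + from_href)
        return posixpath.relpath("/" + to_href, start)
    raise ValueError(f"unknown linktypes value: {linktypes!r}")

print(make_link("index.html", "research/labs.html"))
print(make_link("research/labs.html", "index.html"))
print(make_link("research/labs.html", "index.html", "absolute"))
```

Relative links keep the generated site portable between servers and directories, which may be one reason relative is the default.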
The when element provides the tests for a choose element. If the test evaluates to true, include the content of the element. Otherwise, discard the content of this element.
The when element can contain any text or markup. The only requirement on this data is that it remain valid WSDL if the content of the when element replaced the element itself.
The when element has one required attribute.
The required test attribute contains a boolean expression that determines whether the content of the when element is used. For more information, see Conditional Elements.
This section lists and describes all of the elements and attributes in the Page Layout Language (PLL). The purpose of PLL is to describe the presentation of an HTML page without getting bogged down in the actual coding details. The elements of the language are, therefore, exclusively related to page presentation. There is no direct support for images, fonts, colors, or other implementation details.
The body element contains the main presentation description of the output page.
The body element has two forms. The first form is a list of one or more row elements. The second form consists of a list of one or more of the following elements:
The body element has two optional attributes. These attributes are required if the body content consists of row elements, otherwise they are disallowed.
The cols attribute supplies the number of columns expected in each row in the page.
The rows attribute supplies the number of rows expected in the page.
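The relationship between the two forms of body and the cols and rows attributes can be checked mechanically. The sketch below is a Python illustration and not the code used by TestPLL.pl. It assumes that row elements may not be mixed with other elements, which is inferred from the description of the two forms rather than stated outright.

```python
import xml.etree.ElementTree as ET

def check_body(body):
    """Return a list of problems with the cols/rows attributes of a PLL body."""
    problems = []
    tags = [child.tag for child in body]
    if not tags:
        problems.append("body must contain at least one element")
    grid = bool(tags) and all(tag == "row" for tag in tags)
    if "row" in tags and not grid:
        # Assumption: the two forms of body are mutually exclusive.
        problems.append("row elements may not be mixed with other elements")
    for attr in ("cols", "rows"):
        present = attr in body.attrib
        if grid and not present:
            problems.append(f"{attr} is required when body contains row elements")
        if not grid and present:
            problems.append(f"{attr} is not allowed unless body contains row elements")
    return problems

grid_body = ET.fromstring(
    '<body cols="2" rows="1"><row><cell><content/></cell>'
    '<cell><footer/></cell></row></body>'
)
flat_body = ET.fromstring('<body cols="2"><content/></body>')
print(check_body(grid_body))
print(check_body(flat_body))
```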
The cell element describes an individual piece of content in a row.
The cell element contains one or more of the following elements:
The cell element has six optional attributes:
The align attribute specifies the alignment to apply to the cell presentation element. Its values are the same as those of the equivalent attribute of HTML's td: left, center, and right.
The class attribute specifies a stylesheet class to apply to the cell presentation element.
The colspan attribute allows a cell to occupy two or more columns in the presentation.
The rowspan attribute allows a cell to occupy two or more rows in the presentation.
The valign attribute specifies the vertical alignment to apply to the cell presentation element. Its values follow the equivalent attribute of HTML's td: top, center, bottom, and baseline.
The width attribute supplies a pixel or percentage width for the underlying table cell.
The code element references a code defined in the WSDL file for this web site.
The code element has no content.
The code element only has one optional attribute:
The ref attribute contains the id of a code element defined in the WSDL file. If no ref attribute is supplied, the first code defined in the WSDL file is used.
The content element specifies the place in the page presentation where the content for this page is to be placed. Optionally, this element can specify where the content of a file should be included.
The content element has no content.
The content element only has one optional attribute:
The file attribute specifies the name of a file to read and insert in place of this element. This allows multiple pieces of content to be blended together into a single page.
The footer element references a footer defined in the WSDL file for this web site.
The footer element has no content.
The footer element only has one optional attribute:
The ref attribute contains the id of a footer element defined in the WSDL file. If no ref attribute is supplied, the first footer defined in the WSDL file is used.
In the presentation of some pages, it is useful to add space between columns of information. The gutter presentation element gives this capability.
The gutter element has no content.
The gutter element has three optional attributes:
The width attribute supplies a pixel or percentage width for the underlying table cell.
The rowspan attribute allows a gutter to occupy two or more rows in the presentation.
The class attribute specifies a stylesheet class to apply to the gutter presentation element.
The head element describes the meta-information that normally goes in the head of the HTML document. When the WSDL processor evaluates the head element, it automatically supplies the title element, the meta tags for keywords and description, and the link tag for the stylesheet, based on the page description.
The head element can contain zero or more text or code elements.
The head element has no attributes.
The header element references a header defined in the WSDL file for this web site.
The header element has no content.
The header element only has one optional attribute:
The ref attribute contains the id of a header element defined in the WSDL file. If no ref attribute is supplied, the first header defined in the WSDL file is used.
All PLL files must have a layout element as the root element. This element contains all of the rest of the elements in the presentation description.
The layout element can contain an optional prelayout element, an optional head element, and a required body element.
The layout element has one optional attribute:
The class attribute specifies a particular form of output. Currently, only one form is supported: html4. This attribute will eventually be used to tailor the output produced by the WSDL processor.
Some variations on output format require special code to appear before the root element of the markup. The prelayout element allows the specification of this information. Information that may go here includes the language declaration for ASP or the DOCTYPE declaration for HTML.
The prelayout element contains the text that should occur before the root element of the output. Since the content of the prelayout element often contains characters that are not valid in character content, it is usually contained in a CDATA section. This element must be empty if the ref attribute is specified.
The prelayout element has one optional attribute:
The ref attribute supplies a reference to a text element from the WSDL file. The content of that element is placed before the root element in the output. If this attribute exists, the prelayout element must contain no content.
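The two forms of the prelayout element might look like the following sketch. The DOCTYPE shown is the standard HTML 4.01 Transitional declaration; the id doctype in the comment is a hypothetical text element that would have to be defined in the WSDL file.

```xml
<layout class="html4">
  <!-- Literal form: the CDATA section protects the markup characters. -->
  <prelayout><![CDATA[<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">]]></prelayout>
  <!-- Reference form (alternative, must be empty): <prelayout ref="doctype"/> -->
  <body>
    <content/>
  </body>
</layout>
```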
The row element specifies the vertical positioning of pieces of the output page. All of the elements in a given row appear side-by-side in the output.
The row element can contain zero or more cell or gutter elements.
The row element has one optional attribute:
The class attribute specifies a stylesheet class to apply to the row presentation element.
The text element references a text defined in the WSDL file for this web site.
The text element has no content.
The text element only has one optional attribute:
The ref attribute contains the id of a text element defined in the WSDL file. If no ref attribute is supplied, the first text defined in the WSDL file is used.
This section lists and describes all of the macro commands that are available in the WSDL system. These macro commands may be applied in many different contexts.
Macro commands may be used in attributes of WSDL elements to retrieve data from other portions of the WSDL document. For example, if a developer wanted all of the images on a web site to be referenced from a consistent directory structure, he could define a directory element named images that contains this information. Any file element defined for an image can then use a value like {{directory[images]}}/my_image.png for the href attribute. The image directory structure of the entire site can now be changed relatively painlessly.
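The directory example can be sketched as the pair of WSDL fragments below. The path /static/images and the title Logo are hypothetical, and the sketch assumes that {{directory[images]}} expands to the name of the directory element.

```xml
<!-- Fragment of siteinfo: the image directory is defined in exactly one place. -->
<siteinfo>
  <directory id="images" name="/static/images" class="image"/>
</siteinfo>

<!-- Fragment of resources: each image refers to the directory by id. -->
<resources>
  <file class="image" title="Logo" href="{{directory[images]}}/my_image.png"/>
</resources>
```

Moving the images then requires changing only the name attribute of the directory element.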
Macro commands are also useful for generating text inside text elements, presentation files, and content. This could be used for direct references to images as in the example above. Macro commands can also be used for inserting boilerplate text like disclaimers, headers, and footers. A more powerful use of macros involves code routines. This allows WSDL to execute arbitrary pieces of Perl code to construct text to place in the output. Using code routines, relatively complex navigation is quite easy. More importantly, if structural changes are made in the WSDL file, this code can regenerate the new navigation automatically.
Builtin macros are called by placing the name of the macro between double curly braces at the point where it should be evaluated (e.g., {{root}}). If the macro has parameters, the parameters are placed in parentheses after the macro name, inside the double curly braces (e.g., {{timestamp(gmt)}}).
Return the currently defined body attributes in a format suitable for adding to an element.
Return the value of the command line parameter specified by the supplied argument.
Return the content defined for this page.
Return the appropriate meta tag to add a description to the current page.
Returns the value of the dest attribute on the outermost website.
Content of the first footer element defined in the WSDL file. Often useful as a default if none is set by the current page.
Return the fully-qualified URL for the current page.
Content of the first header element defined in the WSDL file. Often useful as a default if none is set by the current page.
Return the appropriate meta tag to add keywords to the current page.
Returns the value of the root attribute on the outermost website.
Returns the value of the src attribute on the outermost website.
Return the appropriate link or style code to include this page's stylesheet in the HTML.
Return the current timestamp in Perl localtime format. If the parameter gmt is supplied, the time is reported in GMT.
Return the URL for the current page, minus the server name and protocol.
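A footer built from builtin macros might look like the following sketch. It uses only macro names that appear elsewhere in this document (root, full_url, and timestamp); the id page_footer and the surrounding HTML are hypothetical.

```xml
<footer id="page_footer"><![CDATA[
  <hr>
  <!-- root supplies the site root, so the link works from any page. -->
  <p><a href="{{root}}index.html">Home</a></p>
  <!-- full_url is replaced by the fully-qualified URL of the current page. -->
  <p>{{full_url}}</p>
  <!-- The gmt parameter requests GMT instead of local time. -->
  <p>Generated: {{timestamp(gmt)}}</p>
]]></footer>
```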
Code element macros are called using the form {{code[routine_name]( args )}}. The routine_name is the id specified for one of the code elements in the WSDL file. The args must be supplied as whitespace-separated name=value pairs, e.g., {{code[nav_bar]( color=blue selected=white )}}. These arguments are passed as a hash to the code routine when it is called.
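The following sketch pairs a code element with a macro call that supplies its arguments. It assumes, following the left_nav routine of the Internet College example, that the arguments arrive in a hash named %args, that $curr refers to the current page, and that the value of the last expression becomes the macro's output. The routine body and the id sidebar are hypothetical.

```xml
<code id="nav_bar"><![CDATA[
  # color and selected come from the name=value pairs in the macro call.
  my $title  = $curr->Attrib('title');
  my $output = qq{<table bgcolor="$args{color}">\n};
  $output .= qq{<tr><td bgcolor="$args{selected}">$title</td></tr>\n};
  $output .= qq{</table>\n};
  $output;
]]></code>

<text id="sidebar"><![CDATA[
  {{code[nav_bar]( color=blue selected=white )}}
]]></text>
```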
Attributes of the current element can be referenced using {{@attr}}, where attr is the name of the attribute, e.g., {{@title}}. A specific element in the WSDL file can be referenced by its id using {{type[id]}}, where type is the type of the element and id is its unique id, e.g., {{page[main]}}. These two can be combined with the syntax {{type[id]/@attr}}, e.g., {{page[main]/@href}}.
Page properties can be accessed using the syntax {{prop[name]}}. The value of the property in this page with the supplied name is returned.
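The three reference forms can be combined in a single piece of boilerplate text, as in the sketch below. The page id main matches the examples above; the property name author, its value, and the id byline are hypothetical. The sketch assumes that, when the text is used on a page, {{@title}} and {{prop[author]}} are resolved against that page.

```xml
<page id="main" title="Welcome" href="index.html" content="main.html">
  <prop name="author" value="webmaster"/>
</page>

<text id="byline"><![CDATA[
  <!-- Attribute of the current element. -->
  <h1>{{@title}}</h1>
  <!-- Property of the current page. -->
  <p>Maintained by {{prop[author]}}.</p>
  <!-- Attribute of a specific element, selected by type and id. -->
  <p><a href="{{page[main]/@href}}">Back to the main page</a></p>
]]></text>
```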
I created the ancestor of the DataDoc model before I knew that the Document Object Model (DOM) existed. When I discovered the standard DOM interface, I found that it was somewhat heavy-weight for my application. The DOM supports a large number of navigational and manipulation functions that are not required in this application. The data structure for the DOM is also complicated by the need to support this powerful navigation mechanism.
Additionally, much of the information in WSDL is stored in attributes of the individual elements. Although DOM does support attributes, gaining access to those attributes is a little awkward. This would have been a disadvantage in working with WSDL.
The DataDoc model consists of several classes:
XML::Data::DataDoc
The DataDoc class is an abstraction for an XML entity. It contains all of the elements and information from the XML file.
XML::Data::DataDoc::element
The element class abstracts all elements of the XML document.
XML::Data::DataDoc::text
The text class abstracts all character data of the XML document.
XML::Data::DataDoc::comment
The comment class abstracts any comment from the XML document.
XML::Data::DataDoc::pi
The pi class abstracts any processing instruction from the XML document.
XML::Data::DataDoc::cdata
The cdata class abstracts CDATA sections from the XML document.
The DataDoc object encapsulates the entire XML document. This object supports the following member functions:
Content
Encoding
Root
Standalone
Version
The following member functions should not be needed in any code routine used in WSDL. They are provided here for completeness.
new
AddContent
Print
MakeElement
Creates an element object with the given type.
MakeText
Creates a text object containing the given text.
MakeComment
Creates a comment object containing the given text.
MakePI
Creates a pi object with the given target and text.
MakeCdata
Creates a cdata object containing the given text.
Any element data is stored in objects of type element. This includes the $curr object passed to the code routine. An element contains all of the attributes and content of the XML element it was read from.
The element object interface includes the following member functions:
AttribNames
The AttribNames member function returns a list of all of the attribute names used on this element. The order of the returned attribute names is not defined.
Attrib
The Attrib member function gets the value of an attribute if the name of the attribute is the only parameter. If more than one parameter is passed to Attrib, the parameters are treated as name, value pairs and the appropriate attributes are set.
HasAttrib
The HasAttrib member function returns true if the element has the attribute specified by the parameter.
DelAttrib
The DelAttrib member function deletes the attribute specified by the parameter.
Content
The Content member function returns a list of the items contained in the element. Elements are returned as element objects, text is returned as text objects, and CDATA sections are returned as cdata objects.
ElementContent
The ElementContent member function returns a list of all elements that are contained by the object.
IsEmpty
The IsEmpty member function returns true if the element has no content.
TextContent
The TextContent member function returns all of the text and CDATA sections included in the element as one string.
Type
The Type member function returns the type of this element as a string.
Print
The Print member function prints the current element object and all of its contents to the standard output as an XML element.
The following member functions should not be needed in any code routine used in WSDL. They are provided here for completeness.
new
Creates element objects.
AddContent
Adds content to the element object.
DelContent
Deletes from the element object all content items specified by the string argument. The form of this argument is identical to that of the Content member function described below.
DelContentItem
Deletes from the element object the content item whose reference is passed as an argument.
ReplaceContent
The most general of the content member functions is Content. When called with no arguments, it returns a list of all of the items contained in the element. Elements are returned as element objects, text is returned as text objects, and CDATA sections are returned as cdata objects.
The Content member function can be called with various parameters to restrict the portions of the content returned. The different arguments are listed below:
When Content is called with a tag name as a string, it returns a list of the child elements of that type.
'*'
When '*' is passed to Content, the function returns a list containing all of the child elements contained in the element, just like ElementContent.
'#text'
When passed '#text', Content returns a list of all the text pieces contained in the element.
'#comment'
When passed '#comment', Content returns a list of all the comments contained in the element.
'#pi'
When passed '#pi', Content returns a list of all processing instructions contained in the element.
'#cdata'
When passed '#cdata', Content returns a list of all the CDATA sections contained in the element.
The text object is a thin wrapper over the actual text in an element's content. It supports three useful member functions:
Content
The Content member function returns the string that this object contains.
Print
The Print member function prints the contents of the current text object to the standard output as legal XML text.
Type
The Type member function always returns the string '#text'. This is useful for distinguishing text objects from element objects.
The following member function should not be needed in any code routine used in WSDL. It is provided here for completeness.
new
Creates text objects.
The comment object is a thin wrapper over an XML comment. It supports three useful member functions:
Content
The Content member function returns the string that this object contains.
Print
The Print member function prints the current object to the standard output as a legal XML comment.
Type
The Type member function always returns the string '#comment'. This is useful for distinguishing comment objects from element objects.
The following member function should not be needed in any code routine used in WSDL. It is provided here for completeness.
new
Creates comment objects.
The pi object is a thin wrapper over an XML processing instruction. The processing instruction target is accessible using the syntax $pi->{target} for a pi object stored in $pi. From the same object, the processing instruction data is accessible through $pi->{data}. The pi object supports two useful member functions:
Print
The Print member function prints the current object to the standard output as a legal XML processing instruction.
Type
The Type member function always returns the string '#pi'. This is useful for distinguishing pi objects from element objects.
The following member function should not be needed in any code routine used in WSDL. It is provided here for completeness.
new
Creates pi objects.
The cdata object is a thin wrapper over the actual text of a CDATA section. It supports three useful member functions:
Content
The Content member function returns the string that this object contains.
Print
The Print member function prints the current object to the standard output as a legal XML CDATA section.
Type
The Type member function always returns the string '#cdata'. This is useful for distinguishing cdata objects from element objects.
The following member functions should not be needed in any code routine used in WSDL. They are provided here for completeness.
new
Creates cdata objects.
AddContent
Adds content to the cdata object.
<!--
  DTD for the Web Site Description Language
  Version: 0.7
  Author: G. Wade Johnson
  Copyright 2000, by G. Wade Johnson
  Released under the Perl Artistic License.
-->

<!-- Entities: attribute values -->
<!ENTITY % fileref "CDATA">
<!ENTITY % url "CDATA">
<!ENTITY % rngnumber "CDATA">
<!ENTITY % email "CDATA">
<!ENTITY % mimetype "CDATA">
<!ENTITY % linktypes "(absolute|relative)">
<!ENTITY % fileclasses "(other|applet|archive|audio|binary|
                         document|image|plugin|script|source|
                         text|video)">
<!ENTITY % nameclasses "(initial|given|family|middle|number|
                         nickname)">
<!ENTITY % loclasses "(pll|skeleton)">
<!ENTITY % codeclasses "(routine|begin|end)">
<!ENTITY % boolexpr "CDATA">

<!-- Entities: attributes -->
<!ENTITY % root "root %url; #REQUIRED">
<!ENTITY % copyref "copyref IDREF #IMPLIED">
<!ENTITY % styleref "styleref IDREFS #IMPLIED">
<!ENTITY % include "include %fileref; #IMPLIED">
<!ENTITY % codeclass "class %codeclasses; 'routine'">
<!ENTITY % layoutref "layoutref IDREF #IMPLIED">
<!ENTITY % bodyref "bodyref IDREF #IMPLIED">
<!ENTITY % dest "dest %fileref; #IMPLIED">
<!ENTITY % src "src %fileref; #IMPLIED">

<!-- Entities: content values -->
<!ENTITY % rawtext "#PCDATA">
<!ENTITY % styledtext "#PCDATA">
<!ENTITY % source.code "#PCDATA">
<!ENTITY % person "honorific?, name+, degree*">
<!ENTITY % contact.content "descr?,((%person;)|institution)">
<!ENTITY % nav.item "group|page|pageref|subpage|fpage|
                     file|website">
<!ENTITY % cond "(if|choose)">
<!ENTITY % siteinfo.content
  "(server|directory|contact|copyright|text|header|footer|
    layout|body|%cond;)*">
<!ENTITY % page.content "(%cond;|prop)*">
<!ENTITY % stylesheet.content "#PCDATA">

<!-- Element definitions -->

<!-- website: describes an entire site. -->
<!ELEMENT website (siteinfo?,navigation+,resources?,code*)>
<!ATTLIST website
  id          ID           #REQUIRED
  %include;
  %src;
  %dest;
  %root;
  main        IDREF        #REQUIRED
  title       CDATA        #IMPLIED
  keywords    CDATA        #IMPLIED
  description CDATA        #IMPLIED
  %styleref;
  linktype    %linktypes;  "relative"
  %layoutref;
  %bodyref;
  %copyref;>

<!-- navigation: describes the navigation for a website -->
<!ELEMENT navigation ((%nav.item;|%cond;)*)>
<!ATTLIST navigation
  id ID #IMPLIED
  %include;>

<!-- resources: list of non-navigational items on a website -->
<!ELEMENT resources ((stylesheet|set|page|file|fpage|subpage|
                      %cond;)*)>
<!ATTLIST resources
  %include;>

<!-- set: a grouping of resources that share common characteristics
     for instance a directory -->
<!ELEMENT set ((stylesheet|set|page|file|fpage|subpage|
                %cond;)*)>
<!ATTLIST set
  id   ID    #IMPLIED
  %include;
  %dest;
  %src;
  root %url; #IMPLIED>

<!-- siteinfo: meta-data for the site -->
<!ELEMENT siteinfo %siteinfo.content;>
<!ATTLIST siteinfo
  id  ID    #IMPLIED
  %include;
  ref IDREF #IMPLIED>

<!-- group: a group of pages. equivalent to a section in many sites. -->
<!ELEMENT group ((%nav.item;|%cond;)+)>
<!ATTLIST group
  id          ID    #IMPLIED
  %include;
  %dest;
  root        %url; #IMPLIED
  main        IDREF #IMPLIED
  href        %url; #IMPLIED
  title       CDATA #IMPLIED
  keywords    CDATA #IMPLIED
  description CDATA #IMPLIED
  %layoutref;
  %bodyref;
  %styleref;
  %copyref;>

<!-- page: a page to be built by this system. -->
<!ELEMENT page %page.content;>
<!ATTLIST page
  title       CDATA      #REQUIRED
  %src;
  %dest;
  href        %url;      #IMPLIED
  id          ID         #IMPLIED
  keywords    CDATA      #IMPLIED
  description CDATA      #IMPLIED
  content     %fileref;  #IMPLIED
  %layoutref;
  %bodyref;
  %styleref;
  %copyref;>

<!-- prop: a property of the page. These properties contain small
     pieces of information that may be used in the construction
     of a page. -->
<!ELEMENT prop EMPTY>
<!ATTLIST prop
  name  NMTOKEN #REQUIRED
  value CDATA   #REQUIRED>

<!-- pageref: a reference to a page elsewhere in the web site.
     This element is needed because many sites are not trees,
     they are actually graphs. -->
<!ELEMENT pageref EMPTY>
<!ATTLIST pageref
  ref      IDREF   #REQUIRED
  fragment NMTOKEN #IMPLIED
  title    CDATA   #IMPLIED>

<!-- subpage: a reference to a location/fragment of another page. -->
<!ELEMENT subpage EMPTY>
<!ATTLIST subpage
  title    CDATA   #REQUIRED
  href     %url;   #IMPLIED
  fragment NMTOKEN #REQUIRED
  id       ID      #IMPLIED>

<!-- fpage: a "foreign page" which includes offsite links as well as
     local links to pages not built with this system. -->
<!ELEMENT fpage EMPTY>
<!ATTLIST fpage
  title    CDATA   #REQUIRED
  href     %url;   #REQUIRED
  fragment NMTOKEN #IMPLIED
  id       ID      #IMPLIED>

<!-- file: a non-html file linked on the system. This item describes
     files like downloadables, PDF files, images, etc. Anything that
     we may wish to manage in this system but is not created by the
     system. -->
<!ELEMENT file EMPTY>
<!ATTLIST file
  id    ID             #IMPLIED
  title CDATA          #IMPLIED
  %src;
  %dest;
  href  %url;          #REQUIRED
  class %fileclasses;  #IMPLIED
  type  %mimetype;     #IMPLIED
  nav   (yes|no)       "yes">

<!-- server: define a logical name for a server/machine on the web. -->
<!ELEMENT server EMPTY>
<!ATTLIST server
  id    ID      #REQUIRED
  name  NMTOKEN #REQUIRED
  class CDATA   #IMPLIED>
<!-- some server classes: main, image, database, search, ad -->

<!-- directory: define a relative or absolute directory in the site. -->
<!ELEMENT directory EMPTY>
<!ATTLIST directory
  id    ID         #REQUIRED
  name  %fileref;  #REQUIRED
  class CDATA      #IMPLIED>
<!-- some directory classes: image, database, stylesheet, include,
     applet, script -->

<!-- contact: information about a person or organization associated
     with the site. -->
<!ELEMENT contact (%contact.content;)>
<!ATTLIST contact
  id    ID       #REQUIRED
  href  %url;    #IMPLIED
  email %email;  #IMPLIED>

<!-- descr: description of a contact -->
<!ELEMENT descr (%styledtext;)>

<!-- portions of a name -->
<!ELEMENT honorific (%rawtext;)>
<!ELEMENT name (%rawtext;)>
<!ATTLIST name
  class %nameclasses; #IMPLIED>
<!ELEMENT degree (%rawtext;)>
<!ELEMENT institution (%rawtext;)>

<!-- copyright: copyright information for a site or page -->
<!ELEMENT copyright (%rawtext;)>
<!ATTLIST copyright
  id    ID           #REQUIRED
  year  %rngnumber;  #IMPLIED
  owner IDREF        #IMPLIED>

<!-- text: definition of boilerplate text that may be used on the
     site. -->
<!ELEMENT text (%styledtext;)>
<!ATTLIST text
  id    ID    #REQUIRED
  class CDATA #IMPLIED
  %include;>
<!-- some text classes: disclaimer, message, -->
<!-- if "include" is specified, read from there and ignore content.
     Should having content with src specified be an error?? -->

<!-- header: definition of header text that may be used on the site. -->
<!ELEMENT header (%styledtext;)>
<!ATTLIST header
  id ID #REQUIRED
  %include;>
<!-- if "include" is specified, read from there and ignore content.
     Should having content with src specified be an error?? -->

<!-- footer: definition of footer text that may be used on the site. -->
<!ELEMENT footer (%styledtext;)>
<!ATTLIST footer
  id ID #REQUIRED
  %include;>
<!-- if "include" is specified, read from there and ignore content.
     Should having content with src specified be an error?? -->

<!-- stylesheet: definition of style information for a site. -->
<!ELEMENT stylesheet (%stylesheet.content;)>
<!ATTLIST stylesheet
  id   ID          #REQUIRED
  type %mimetype;  #REQUIRED
  %src;
  %dest;
  href %url;       #IMPLIED>
<!-- if "href" specified, build a link. if content, build a style
     element and include inline. (What do we do about both?) -->

<!-- layout: associates a name with a layout file. This file
     describes the layout of particular web pages. -->
<!ELEMENT layout EMPTY>
<!ATTLIST layout
  id    ID           #REQUIRED
  file  %fileref;    #REQUIRED
  class %loclasses;  "pll">

<!-- code: container for code to be used in the construction of the
     website. -->
<!ELEMENT code (%source.code;)>
<!ATTLIST code
  id ID #REQUIRED
  %codeclass;
  %include;>

<!-- cover the full list of attributes from the HTML body tag -->
<!ELEMENT body EMPTY>
<!ATTLIST body
  id          ID        #REQUIRED
  class       CDATA     #IMPLIED
  style       CDATA     #IMPLIED
  title       CDATA     #IMPLIED
  lang        NMTOKEN   #IMPLIED
  dir         (ltr|rtl) #IMPLIED
  onclick     CDATA     #IMPLIED
  ondblclick  CDATA     #IMPLIED
  onmousedown CDATA     #IMPLIED
  onmouseup   CDATA     #IMPLIED
  onmouseover CDATA     #IMPLIED
  onmousemove CDATA     #IMPLIED
  onmouseout  CDATA     #IMPLIED
  onkeypress  CDATA     #IMPLIED
  onkeydown   CDATA     #IMPLIED
  onkeyup     CDATA     #IMPLIED
  onload      CDATA     #IMPLIED
  onunload    CDATA     #IMPLIED
  background  CDATA     #IMPLIED
  bgcolor     CDATA     #IMPLIED
  text        CDATA     #IMPLIED
  link        CDATA     #IMPLIED
  vlink       CDATA     #IMPLIED
  alink       CDATA     #IMPLIED>

<!-- Conditionals
     the names and basic functionality are copied from XSL, but the
     tests are considerably simpler. -->

<!-- if: provides simple if-then functionality. If the test evaluates
     to true include the content of the element. Otherwise, discard
     the content of this element. -->
<!ELEMENT if ANY>
<!ATTLIST if
  test %boolexpr; #REQUIRED>

<!-- choose: provides the ability to choose among multiple options.
     This feat is accomplished with the help of the when and
     otherwise elements. -->
<!ELEMENT choose (when+,otherwise?)>

<!-- when: provides the tests for a choose element. If the test
     evaluates to true include the content of the element.
     Otherwise, prune this subtree. -->
<!ELEMENT when ANY>
<!ATTLIST when
  test %boolexpr; #REQUIRED>

<!-- otherwise: provides the default behavior of the choose element.
     If none of the when tests evaluate to true, use the content of
     this element. Otherwise, discard the content of this element. -->
<!ELEMENT otherwise ANY>
<!--
  DTD for the Page Layout Language
  Version: 0.7.1
  Author: G. Wade Johnson
  Copyright 2000, by G. Wade Johnson
  Released under the Perl Artistic License.
-->

<!-- Entities: attribute values -->
<!ENTITY % fileref "CDATA">
<!ENTITY % number "CDATA">
<!ENTITY % length "CDATA">
<!ENTITY % styleclass "CDATA">
<!ENTITY % valignvals "(top|center|bottom|baseline)">
<!ENTITY % alignvals "(left|center|right)">

<!-- Entities: content values -->
<!ENTITY % rawtext "#PCDATA">

<!-- Element definitions -->
<!ELEMENT layout (prelayout?, head?, body)>
<!ATTLIST layout
  class (html4) "html4">

<!ELEMENT prelayout (%rawtext;)>
<!ATTLIST prelayout
  ref NMTOKEN #IMPLIED> <!-- ref to a text element -->

<!ELEMENT head ((text|code)*)>
<!ATTLIST head >

<!ELEMENT body ((row+)|(content|header|footer|text|code)+)>
<!ATTLIST body
  cols %number; #IMPLIED
  rows %number; #IMPLIED>

<!ELEMENT row ((cell|gutter)*)>
<!ATTLIST row
  class %styleclass; #IMPLIED>

<!ELEMENT cell (header|footer|text|code|content)+>
<!ATTLIST cell
  width   %length;      #IMPLIED
  colspan %number;      #IMPLIED
  rowspan %number;      #IMPLIED
  valign  %valignvals;  #IMPLIED
  align   %alignvals;   #IMPLIED
  class   %styleclass;  #IMPLIED>

<!ELEMENT gutter EMPTY>
<!ATTLIST gutter
  width   %number;      #IMPLIED
  rowspan %number;      #IMPLIED
  class   %styleclass;  #IMPLIED>

<!ELEMENT header EMPTY>
<!ATTLIST header
  ref NMTOKEN #IMPLIED>

<!ELEMENT footer EMPTY>
<!ATTLIST footer
  ref NMTOKEN #IMPLIED>

<!ELEMENT code EMPTY>
<!ATTLIST code
  ref NMTOKEN #IMPLIED>

<!ELEMENT text EMPTY>
<!ATTLIST text
  ref NMTOKEN #IMPLIED>

<!ELEMENT content EMPTY>
<!ATTLIST content
  file %fileref; #IMPLIED>
This is some of the source for the Internet College example site. The only portion of the source reproduced here is that which relates to WSDL. Most of the support files are not included.
This is the complete WSDL description of the Internet College web site. This example shows some of the more interesting features of WSDL. The first such feature is the conditional compilation based on the PLL command line argument. If this argument exists and is non-zero, the web site is built using the PLL-based presentation. Otherwise, the web site is built using the skeleton-based presentation.
This web site uses a standard header and footer with a dynamic navigation bar on the left of the page. In actuality, the navigation is static HTML. There is a similar navigation bar on each page; the left_nav code routine generates the appropriate code for each page. The nav_utils code element contains utility code needed to build the navigation bar.
In the resources element, there is a pair of PDF files that are intended to be referenced off the Registration and Billing Welcome page. These files are not part of the main structure for the site, but they are still maintained as part of the WSDL description.
The faculty pages are a good example of something that is awkward in most Server Side Scripting systems. The design constraint on the faculty pages is a standard page presentation with consistent information. If the faculty member has a college home page, the faculty page should reference that location. The interesting part of this design is that most of the page content is retrieved from a database at the time the web site is compiled. The idea is that this data does not change very rapidly, so retrieving it at the time of each request wastes resources. In addition, we check a mounted directory in the file system to see if the faculty member has set up a home page.
The faculty pages are generated by the faculty_content text element containing several code routine references. The related code elements are database and enddatabase, for maintaining access to the faculty database, and stats and cv, for generating the appropriate content for the pages.
The Registration and Billing system consists of two CGI scripts, schedule.cgi and bill.cgi. These scripts are not built by the WSDL processor. However, the design calls for those scripts to wrap their output in templates that are generated from WSDL. This allows the CGI scripts to maintain a consistent look with the rest of the site.
<!-- Internet College Website --> <website id="wwwic" main="front" root="{{server[icmain]}}/" layoutref="standard" bodyref="defbody" src="icsrc" styleref="stylesheet1" linktype="absolute"> <siteinfo> <contact email="webmaster@inet.edu" id="webmaster"> <name>Webmaster</name> </contact> <choose> <when test="$CmdLine{TestSite}"> <server id="icmain" name="{{cmdline(TestSite)}}"/> <server id="homes" name="{{cmdline(TestSite)}}/faculty"/> </when> <otherwise> <server id="icmain" name="http://www.inet.edu"/> <server id="homes" name="http://homes.inet.edu"/> </otherwise> </choose> <choose> <when test="$CmdLine{PLL}"> <layout id="standard" file="icsrc/layout/standard.pll" class="pll"/> <layout id="fac" file="icsrc/layout/faculty.pll" class="pll"/> </when> <otherwise> <layout id="standard" file="icsrc/layout/standard.html" class="skeleton"/> <layout id="fac" file="icsrc/layout/faculty.html" class="skeleton"/> </otherwise> </choose> <body id="defbody" bgcolor="ivory"/> <footer id="def_footer"><![CDATA[<p> <hr size="1" noshade> <address> Email: <a href="mailto:{{contact[webmaster]/@email}}"> {{contact[webmaster]}}</a>. 
</address></p> <a href="{{page[disclaimer]/@href}}">Disclaimer and Legal Information</a> <h5> © 2000 Internet Univeristy.<br> All rights reserved.<br> {{full_url}} </h5> ]]></footer> <header id="def_header"><![CDATA[ <img src="{{root}}images/banner.png" alt="Inet College" width="600" height="80"> <h1>{{@title}}</h1> ]]></header> <text id="faculty_content"><![CDATA[ <table border="0" cellpadding="0" cellspacing="0"> <tr><td valign="top" width="100"> <img src="{{root}}images/faculty/{{prop[homedir]}}.png" alt="id picture" height="150" width="100"> </td> <td valign="top" align="left">{{code[stats]()}}</td> </tr> <tr><td colspan="2">{{code[cv]()}}</td></tr> </table> ]]></text> </siteinfo> <!-- Start of Site Navigation --> <navigation> <page href="{{root}}index.html" id="front" dest="index.html" title="Internet College Home" content="home.html"/> <page href="{{root}}disclaimer.html" title="Internet College Disclaimer" id="disclaimer" dest="disclaimer.html" content="disclaimer.html"/> <group main="front" id="mainsections"> <group title="General Information" main="geninfo" root="{{root}}info/" dest="info/"> <page id="geninfo" title="Information" content="geninfo.html" href="index.html" dest="index.html"/> <page id="directions" title="Directions" content="directions.html" href="directions.html"/> <page id="cal" dest="calendar.html" href="calendar.html" title="Inet College Calendar" content="cal2000-1.html"/> </group> <group title="Faculty" main="stafflist" dest="faculty/" root="{{root}}faculty/" layoutref="fac"> <page id="stafflist" title="Our Faculty" href="Faculty.html" layoutref="standard" content="faculty.html"/> <page id="fac1" title="Dr. G. Brown" dest="{{prop[homedir]}}.html" href="{{prop[homedir]}}.html"> <prop name="homedir" value="brown"/> </page> <page id="fac2" title="Dr. J. Bashir" dest="{{prop[homedir]}}.html" href="{{prop[homedir]}}.html"> <prop name="homedir" value="bashir"/> </page> <page id="fac3" title="Dr. B. 
Crusher" dest="{{prop[homedir]}}.html" href="{{prop[homedir]}}.html"> <prop name="homedir" value="crusher"/> </page> <page id="fac4" title="Dr. S. Franklin" dest="{{prop[homedir]}}.html" href="{{prop[homedir]}}.html"> <prop name="homedir" value="sfranklin"/> </page> <page id="fac5" title="Dr. D. Sculley" dest="{{prop[homedir]}}.html" href="{{prop[homedir]}}.html"> <prop name="homedir" value="sculley"/> </page> <page id="fac6" title="Professor C. Xavier" dest="{{prop[homedir]}}.html" href="{{prop[homedir]}}.html"> <prop name="homedir" value="profx"/> </page> <page id="fac7" title="Dr. L. Zimmerman" dest="{{prop[homedir]}}.html" href="{{prop[homedir]}}.html"> <prop name="homedir" value="zimmer"/> </page> </group> <group title="Registration and Billing" main="billinfo" root="{{root}}billing/" dest="billing/"> <page id="billinfo" title="Welcome" href="index.html" content="billinfo.html"/> <!-- Create a template used by schedule.cgi for look&feel --> <page title="Your Schedule" id="yrsched" dest="schedule_tmpl.html" href="schedule.cgi" content="cgi_template.html"/> <!-- Create a template used by bill.cgi for look&feel --> <page title="Your Bill" id="yrbill" dest="bill_tmpl.html" href="bill.cgi" content="cgi_template.html"/> <page title="Financial Aid" id="finaid" href="finaid.html" content="financial.html"/> </group> </group> </navigation> <resources> <stylesheet href="{{root}}inet.css" id="stylesheet1" src="inet.css" type="text/css"/> <set id="schedules" root="{{root}}info/" dest="info/"> <file title="Class schedule, Fall 2000" href="fall2000.pdf" src="schedules/fall2000.pdf" class="document"/> <file title="Class schedule, Spring 2001" href="spring2001.pdf" src="schedules/spring2001.pdf" class="document"/> </set> <set id="imageset" src="images/" dest="images/"> <file title="Building" class="image" href="s18.gif" dest="s18.gif"/> <file title="Inet College" class="image" href="banner.png" dest="banner.png"/> </set> <set id="billscripts" src="billing/" dest="billing/"> 
      <file class="script" href="schedule.cgi"/>
      <file class="script" href="bill.cgi"/>
    </set>
    <set id="facultyphotos" root="{{root}}images/faculty/" src="images/faculty/" dest="images/faculty/">
      <file class="image" href="brown.png" dest="brown.png"/>
      <file class="image" href="bashir.png" dest="bashir.png"/>
      <file class="image" href="crusher.png" dest="crusher.png"/>
      <file class="image" href="sfranklin.png" dest="sfranklin.png"/>
      <file class="image" href="sculley.png" dest="sculley.png"/>
      <file class="image" href="profx.png" dest="profx.png"/>
      <file class="image" href="zimmer.png" dest="zimmer.png"/>
    </set>
  </resources>

  <!-- Set up interface to the Faculty database. -->
  <code id="database" class="begin"><![CDATA[
    use lib 'icsrc';
    use FacultyDatabase;
    open_faculty_database() or die "Unable to access faculty data.\n";
  ]]></code>

  <!-- Shut down interface to the Faculty database. -->
  <code id="enddatabase" class="end"><![CDATA[
    close_faculty_database() or die "Unable to close faculty data.\n";
  ]]></code>

  <code id="lwp" class="begin"><![CDATA[
    use LWP::Simple;
  ]]></code>

  <!-- These utility functions support the left-hand navigation functionality. -->
  <code id="nav_utils" class="begin"><![CDATA[
    sub leftnav_expand {
        my $curr = shift;
        my $ancestors = shift;
        my $top = shift;
        my $indent = shift || "";
        my $output = "";

        foreach my $c ($top->Content( '*' )) {
            my $p = $c;
            my $a = $ancestors;
            $p = $wsdl->byID( $c->Attrib('ref') ) if 'pageref' eq $c->Type();
            if('group' eq $c->Type()) {
                $p = $wsdl->byID( $c->Attrib('main') );
                $a = [ $c, @{$ancestors} ];
                $output .= nav_row( 0, $p, $c->Attrib('title'), $a, $indent );
            }
            else {
                $output .= nav_row( $curr, $p, $c->Attrib('title'), $a, $indent );
            }
            if('group' eq $c->Type() and descendent_of( $c, $curr )) {
                $output .= leftnav_expand( $curr, [ $c, @{$ancestors}], $c, $indent . " " );
            }
        }
        $output;
    }

    sub nav_row {
        my $curr = shift;
        my $p = shift;
        my $title = shift || $p->Attrib('title');
        my $ancestors = shift;
        my $indent = shift;
        my $output = "";

        local $Context[0]->{prop};
        # expand properties in navbar
        set_page_properties( $p );
        my $href = make_href( $p, @{$ancestors} );

        my ($rstyle, $tstyle) = ("navrow", "navtext");
        ($rstyle, $tstyle) = ("currnavrow", "currnavtext") if($p == $curr);

        $output .= qq{<tr><td class="$rstyle">$indent};
        $output .= qq{<a href="$href" class="$tstyle">$title</a>};
        $output .= qq{</td></tr>\n};
        $output;
    }
  ]]></code>

  <!-- Build left-hand navigation. -->
  <code id="left_nav"><![CDATA[
    my $output = qq{<table border="0"};
    $output .= qq{ width="$args{width}"} if exists $args{width};
    $output .= qq{ cellpadding="$args{cellpadding}"} if exists $args{cellpadding};
    $output .= qq{ cellspacing="$args{cellspacing}"} if exists $args{cellspacing};
    $output .= qq{>\n};

    my $top = ($website->Content('navigation'))[0];
    $top = $wsdl->byID( $args{top} ) if exists $args{top};

    my $home = $wsdl->byID( $website->Attrib('main') );
    $output .= nav_row( $curr, $home, $home->Attrib('title'), $ancestors, '' );
    $output .= leftnav_expand( $curr, $ancestors, $top );
    $output .= qq{</table>\n};
    $output;
  ]]></code>

  <!-- Retrieve the faculty statistics and display statistics. -->
  <code id="stats"><![CDATA[
    my $homedir = $Context[0]->{prop}->{homedir};
    my $rec = get_faculty_data( $homedir );
    my $output = '';

    $output .= "$rec->{fullname}<br>\n";
    $output .= "$rec->{position}<br>\n<hr size='1' noshade>\n";
    $output .= "Office: $rec->{office}<br>\n";
    $output .= "Phone: $rec->{phone}<br>\n";
    $output .= "email: $rec->{email}<br>\n";

    my $url = $rec->{website} || resolve_macros( "{{server[homes]}}/~$homedir/index.html" );
    $output .= qq{<a href="$url">Home Page</a>} if get( $url );
    $output;
  ]]></code>

  <!-- Retrieve the faculty biographical information and display statistics. -->
  <code id="cv"><![CDATA[
    my $rec = get_faculty_data( $Context[0]->{prop}->{homedir} );
    my $output = '';

    if($rec->{bio}) {
        $output .= "Current Biographical Information:<br>\n";
        $output .= $rec->{bio};
    }
    else {
        $output .= "No biographical information available.<br>\n";
    }

    if(@{$rec->{courses}}) {
        $output .= "<p>Courses:</p>\n";
        $output .= "<ul>\n";
        foreach my $c (@{$rec->{courses}}) {
            $output .= " <li>$c</li>\n";
        }
        $output .= "</ul>\n";
    }
    else {
        $output .= "<br>Not scheduled for any classes at this time.<br>\n";
    }
  ]]></code>
</website>
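As an illustration of how the navigation section of this description might be traversed, the following Python sketch (not the Perl used by the thesis tools) walks a trimmed copy of the navigation element and derives an output path for each page. Two rules in it are assumptions made only for this example: that the dest values of enclosing groups are concatenated, and that a page without a dest attribute is written to its href. The function name output_paths is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the navigation element from the site description.
NAV = """
<navigation>
  <page id="front" href="{{root}}index.html" dest="index.html" title="Internet College Home"/>
  <group id="mainsections" main="front">
    <group title="General Information" main="geninfo" dest="info/">
      <page id="geninfo" title="Information" href="index.html" dest="index.html"/>
      <page id="directions" title="Directions" href="directions.html"/>
    </group>
    <group title="Faculty" main="stafflist" dest="faculty/">
      <page id="fac1" title="Dr. G. Brown" dest="{{prop[homedir]}}.html" href="{{prop[homedir]}}.html">
        <prop name="homedir" value="brown"/>
      </page>
    </group>
  </group>
</navigation>
"""

def output_paths(node, prefix=""):
    """Yield (page id, output path) for every page below node."""
    for child in node:
        if child.tag == "group":
            # Assumption: nested group dest values are concatenated.
            yield from output_paths(child, prefix + child.get("dest", ""))
        elif child.tag == "page":
            # Assumption: a page without dest is written to its href.
            name = child.get("dest") or child.get("href")
            # Substitute the page's own properties into the file name.
            for prop in child.findall("prop"):
                name = name.replace("{{prop[%s]}}" % prop.get("name"), prop.get("value"))
            yield child.get("id"), prefix + name

paths = dict(output_paths(ET.fromstring(NAV)))
for page_id, path in paths.items():
    print(page_id, "->", path)
```

Because each faculty page carries its own homedir property, the same dest pattern yields a different file for each page, which is the reason the description above can repeat one attribute value for all seven faculty entries.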
The following skeleton-based presentation files provide the basic structure of the pages on the Internet College example web site.
This is the presentation file for all of the pages on the web site except the faculty pages. It includes the standard header and footer, as well as the left-hand navigation used throughout the site.
<html>
<head>
  <title>{{@title}}</title>
  {{stylesheet[stylesheet1]}}
</head>
<body {{body}}>
  <table border="0" cellspacing="0" cellpadding="0" width="100%">
    <tr>
      <td colspan="2">{{header}}</td>
    </tr>
    <tr><td class="navbg" width="120" valign="top">
      {{code[left_nav](width=100% cellspacing=0 top=mainsections)}}
    </td>
    <td width="660" valign="top">{{content}}</td>
    </tr>
    <tr>
      <td colspan="2">{{footer}}</td>
    </tr>
  </table>
</body>
</html>
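The code macro in this skeleton passes width, cellspacing, and top to the left_nav code element, which reads them as $args{width}, $args{cellspacing}, and $args{top} in the site description. The Python sketch below shows one way such a macro could be split into a name and an argument table. It is an illustration only: the parsing rules (whitespace-separated name=value pairs) are inferred from the example, and parse_code_macro is a hypothetical name, not part of the thesis tools.

```python
import re

# Matches the text between {{ and }} of a code macro: code[name](arguments)
CALL = re.compile(r"^code\[(\w+)\]\((.*)\)$")

def parse_code_macro(macro):
    """Split 'code[name](k=v k=v ...)' into the name and a dict of arguments."""
    match = CALL.match(macro)
    if match is None:
        raise ValueError("not a code macro: %r" % macro)
    name, arg_text = match.groups()
    # Assumption: arguments are whitespace-separated name=value pairs.
    args = dict(pair.split("=", 1) for pair in arg_text.split())
    return name, args

name, args = parse_code_macro("code[left_nav](width=100% cellspacing=0 top=mainsections)")
print(name, args)
```

A macro with an empty argument list, such as code[stats](), yields an empty dictionary under the same rules.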
This is the presentation file for the faculty pages. It includes the standard header and footer, as well as the left-hand navigation used throughout the site. In addition, the content for the page is generated from the faculty_content text element.
<html>
<head>
  <title>{{@title}}</title>
  {{stylesheet[stylesheet1]}}
</head>
<body {{body}}>
  <table border="0" cellspacing="0" cellpadding="0" width="100%">
    <tr>
      <td colspan="2">{{header}}</td>
    </tr>
    <tr><td class="navbg" width="120" valign="top">
      {{code[left_nav](width=100% cellspacing=0 top=mainsections)}}
    </td>
    <td width="660" valign="top">{{text[faculty_content]}}</td>
    </tr>
    <tr>
      <td colspan="2">{{footer}}</td>
    </tr>
  </table>
</body>
</html>
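Both skeletons rely on double-brace macros that are replaced when the site is compiled. As a rough illustration of that substitution step, the Python sketch below replaces each macro whose name appears in a lookup table and leaves all others untouched. It is a simplification written for this appendix, not the thesis implementation: the real system also evaluates code, text, and stylesheet macros, and the names expand and SKELETON are hypothetical.

```python
import re

# A macro is any text enclosed in double braces, e.g. {{header}}.
MACRO = re.compile(r"\{\{(.+?)\}\}")

def expand(template, values):
    """Replace each {{name}} found in values; leave unknown macros as they are."""
    def lookup(match):
        return values.get(match.group(1), match.group(0))
    return MACRO.sub(lookup, template)

# Shortened fragment of a skeleton file.
SKELETON = (
    "<title>{{@title}}</title> "
    "<td>{{header}}</td> "
    "<td>{{content}}</td> "
    "<td>{{code[left_nav](width=100%)}}</td>"
)

page = expand(SKELETON, {
    "@title": "Directions",
    "header": "<h1>Directions</h1>",
    "content": "<p>Page body</p>",
})
print(page)
```

Leaving unknown macros in place keeps the sketch safe to run in several passes, for example one pass for simple values and a later one for code macros.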
The following PLL presentation files provide the basic structure of the pages on the Internet College example web site.
This is the PLL file for all of the pages on the web site except the faculty pages. It includes the standard header and footer, as well as the left-hand navigation used throughout the site.
<layout>
  <head/>
  <body cols="2" rows="3">
    <row><cell colspan="2"><header/></cell></row>
    <row><cell class="navbg" width="120" valign="top">
      <code ref="left_nav(width=100% cellspacing=0 top=mainsections)"/>
    </cell>
    <cell width="660" valign="top"><content/></cell></row>
    <row><cell colspan="2"><footer/></cell></row>
  </body>
</layout>
This is the PLL file for the faculty pages. It includes the standard header and footer, as well as the left-hand navigation used throughout the site. In addition, the content for the page is generated from the faculty_content text element.
<layout>
  <head/>
  <body cols="2" rows="3">
    <row><cell colspan="2"><header/></cell></row>
    <row><cell class="navbg" width="120" valign="top">
      <code ref="left_nav(width=100% cellspacing=0 top=mainsections)"/>
    </cell>
    <cell width="660" valign="top"><text ref="faculty_content"/>
    </cell>
    </row>
    <row><cell colspan="2"><footer/></cell></row>
  </body>
</layout>
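The PLL files correspond closely to the skeleton files shown earlier: each row and cell maps onto a table row and table cell, and each child element maps onto a macro. The Python sketch below makes that correspondence concrete for the faculty layout. It is an illustration under stated assumptions, not the thesis's PLL processor: the fixed table attributes are copied from the skeleton files, the head element is ignored, and the function names macro and to_skeleton are hypothetical.

```python
import xml.etree.ElementTree as ET

PLL = """
<layout>
  <head/>
  <body cols="2" rows="3">
    <row><cell colspan="2"><header/></cell></row>
    <row>
      <cell class="navbg" width="120" valign="top">
        <code ref="left_nav(width=100% cellspacing=0 top=mainsections)"/>
      </cell>
      <cell width="660" valign="top"><text ref="faculty_content"/></cell>
    </row>
    <row><cell colspan="2"><footer/></cell></row>
  </body>
</layout>
"""

def macro(elem):
    """Turn <header/> into {{header}} and <text ref="x"/> into {{text[x]}}."""
    ref = elem.get("ref")
    if ref is None:
        return "{{%s}}" % elem.tag
    # A code ref may carry an argument list: name(arguments)
    name, paren, args = ref.partition("(")
    return "{{%s[%s]%s}}" % (elem.tag, name, paren + args)

def to_skeleton(layout):
    """Render the body of a PLL layout as an HTML table containing macros."""
    # Assumption: table attributes are taken from the skeleton files.
    lines = ['<table border="0" cellspacing="0" cellpadding="0" width="100%">']
    for row in layout.find("body").findall("row"):
        lines.append("<tr>")
        for cell in row.findall("cell"):
            attrs = "".join(' %s="%s"' % item for item in cell.attrib.items())
            body = "".join(macro(child) for child in cell)
            lines.append("<td%s>%s</td>" % (attrs, body))
        lines.append("</tr>")
    lines.append("</table>")
    return "\n".join(lines)

skeleton = to_skeleton(ET.fromstring(PLL))
print(skeleton)
```

The output of this sketch still contains macros, so it would feed into the same substitution step used for hand-written skeleton files.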
[1] Broumphrey, Frank et al. XML Applications. Wrox Press Ltd., 1998.
[2] Brown, William J. et al. AntiPatterns: Refactoring Software, Architectures, and Projects in Crisis. John Wiley & Sons, Inc., 1998.
[3] Connolly, Dan, ed. XML: Principles, Tools, and Techniques. World Wide Web Journal. O'Reilly & Associates, Winter 1997.
[4] Fleming, Jennifer. Web Navigation: Designing the User Experience. O'Reilly & Associates, 1998.
[5] Harold, Elliotte Rusty. XML Bible. IDG Books Worldwide, 1998.
[6] Liu, Bowen. "Object-oriented Templates for WWW Development and Management: A Model and Its Implementation," Master of Science thesis, University of Houston, December 1999.
[7] Rosenfield, Louis, and Peter Morville. Information Architecture for the World Wide Web. O'Reilly & Associates, 1998.
[8] Stein, Lincoln, and Doug MacEachern. Writing Apache Modules with Perl and C. O'Reilly & Associates, 1999.
[9] St. Laurent, Simon. XML Elements of Style. McGraw-Hill, 2000.
[10] St. Laurent, Simon, and Ethan Cerami. Building XML Applications. McGraw-Hill, 1999.
[11] Walsh, Norman, and Leonard Muellner. DocBook: The Definitive Guide. O'Reilly & Associates, 1999.
[12] The Apache Software Foundation. Cocoon. http://xml.apache.org/cocoon/index.html, August, 2000.
[13] The Apache Software Foundation. Cocoon: A Publishing Infrastructure. http://xml.apache.org/cocoon/infrastructure.html, August, 2000.
[14] Le Hégaret, Philippe. Document Object Model (DOM). http://www.w3.org/DOM/, August 11, 2000.
[15] Berners-Lee, Tim. The original proposal of the WWW, HTMLized. http://www.w3.org/History/1989/proposal.html, May, 1990.
[16] Bray, Tim, Jean Paoli, and C. M. Sperberg-McQueen, ed. Extensible Markup Language (XML) 1.0. http://www.w3.org/TR/1998/REC-xml-19980210, February 10, 1998.
[17] Bos, Bert. Web Style Sheets. http://www.w3.org/Style/, November 27, 1999.
[18] Carter, Josh. gxml2html: generic XML to HTML conversion tool. http://multipart-mixed.com/xml/, November 14, 1999.
[19] Clark, James, ed. XSL Transformations (XSLT). http://www.w3.org/TR/xslt, November 16, 1999.
[20] Clark, James, and Steve DeRose, ed. XML Path Language (XPath). http://www.w3.org/TR/xpath, November 16, 1999.
[21] Connolly, Dan. Extensible Markup Language (XML). http://www.w3.org/XML/, November 26, 1999.
[22] Connolly, Dan. The XML Revolution. http://helix.nature.com/webmatters/xml/xml.html, October 1, 1998.
[23] Cover, Robin. WAP Wireless Markup Language Specification (WML). http://www.oasis-open.org/cover/wap-wml.html, November 23, 1999.
[24] Deach, Stephen, ed. Extensible Stylesheet Language (XSL). http://www.w3.org/TR/WD-xsl, April 21, 1999.
[25] Goldfarb, Charles F. The Roots of SGML -- A Personal Recollection. http://www.sgmlsource.com/history/roots.htm, October 11, 1997.
[26] Lie, Håkon Wium, and Bert Bos. Cascading Style Sheets, level 1. http://www.w3.org/TR/REC-CSS1, January 11, 1999.
[27] Bos, Bert et al. Cascading Style Sheets, Level 2. http://www.w3.org/TR/REC-CSS2/, May 12, 1998.
[28] Michel, Thierry. Synchronized Multimedia. http://www.w3.org/AudioVideo/, May 3, 2000.
[29] NCSA. Common Gateway Interface. http://hoohoo.ncsa.uiuc.edu/cgi/intro.html, December 6, 1995.
[30] NCSA HTTPd Development Team. NCSA HTTPd Tutorial: Server Side Includes (SSI). http://hoohoo.ncsa.uiuc.edu/docs/tutorials/includes.html, September 28, 1995.
[31] Raggett, Dave, and Ian Jacobs. HTML Home Page. http://www.w3.org/MarkUp/, November 28, 1999.
[32] Sperberg-McQueen, C. M., and Lou Burnard. A Gentle Introduction to SGML. http://www.uic.edu/orgs/tei/sgml/teip3sg/index.html, March 5, 1996.
[33] Tauber, James. Web, Internet, Networks at SCHEMA.NET. http://www.schema.net/web/#cdf, June 25, 2000.
[34] The Unicode Consortium. Unicode Home Page. http://www.unicode.org/, September 29, 2000.
[35] UserLand Software, Inc. XML-RPC Home Page. http://www.xmlrpc.com/, June 02, 2000.
[36] Webb, Martin. irt.org Knowledge Base: Q5800 What is ASP?. http://developer.irt/org/script/5800.htm, June 3, 2000.
[37] West, Mark. The Server Side Includes Tutorial. http://www.carleton.ca/~dmcfet/html/ssi1.html#yeah, February 19, 1995.
[38] The World Wide Web Consortium. The World Wide Web Consortium. http://www.w3.org/, November 24, 1999.
[39] The World Wide Web Consortium. HyperText Mark-up Language. http://www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/MarkUp.html, November 3, 1992.
[40] The World Wide Web Consortium. W3C's Math Home Page. http://www.w3.org/Math/, June 2, 2000.
[41] Zara, Steve. XML-CML.ORG - The Site for Chemical Markup Language. http://www.xml-cml.org/, August 31, 2000.
[42] Andrivet, Sebastien. "A Simple XML Parser," C/C++ Users Journal 17, no. 7 (July 1999): 22-32.
[43] Hamstra, Dirk. "XML and CORBA," Dr. Dobb's Journal, no. 305 (November 1999): 98-100.
[44] Mann, Steve. "The Wireless Application Protocol," Dr. Dobb's Journal, no. 304 (October 1999): 56-66.
[45] Monson, Lynn. "The WIDL Specification," Dr. Dobb's Journal, no. 291 (November 1998): 92-96.
[46] Sintes, Tony. "XML and Software Configuration," Dr. Dobb's Journal, no. 314 (July 2000): 56-62.
[47] Goldman, Roy, Jason McHugh, and Jennifer Widom. "Lore: A Database Management System for XML," Dr. Dobb's Journal, no. 311 (April 2000): 76-80.