
Web Page Authoring Overview


BASIS & SCOPE

The practice of composing web pages and sites requires a variety of technical skills as well as many of the abilities and talents normally associated with any type of creative endeavor. A successful web author should have knowledge and skill in all of the following areas:

This document was developed under the assumption that its students already have at least a basic ability in all of these areas. The focus of this document is on introducing one computer language known as HyperText Markup Language or HTML and showing students how to bring all of the skills mentioned above together in order to create web pages and web sites.


WEB TERMINOLOGY

Before reading the remainder of this handout, you may find it useful to scan through the Web Page Authoring Glossary web page located at:

http://www.gibsonr.com/classes/internet/wpa_glos.html

It contains a cross-referenced list of many of the terms used in discussing web page authoring and the HyperText Markup Language.


HOW WEB PAGES WORK

Web pages are not written - they are constructed. The visual object that the public calls a "web page" is actually manufactured by each web browsing program (such as Microsoft Internet Explorer® or Netscape® Navigator) each time the page is displayed. A browser starts its work by interpreting a command that is entered by the user or hidden in another web page. This command is called a URL (Uniform Resource Locator). The browser uses this instruction to locate the file that you want displayed and then retrieves the file from its storage place (perhaps on a distant web server, or maybe from a file on your PC).

The browser must then try to recognize what type of data it has received in order to determine how to handle it. As a multimedia program, a browser is capable of handling many different forms of data including text, graphics, audio and video (just to name a few). However, no browser is capable of displaying all forms of data. In fact, the ability of browsers in that regard is really quite limited. Computers use thousands of different coded languages to represent the myriad data forms they process. Even an ordinary PC may use hundreds of different languages during a month of operation. Each form of data (text, graphics, etc.) typically has many different languages that could be used to represent it in computer storage. For example, most PCs are programmed to recognize as many as ten different languages just for storing graphic data such as icons and pictures.

The specific language used to store a data file is usually indicated by the extension (suffix) appended to the end of the file's name when it is stored. For example, the extension GIF indicates that a file is stored in the popular "Graphics Interchange Format" language. Most browsers are written to recognize this language and would be able to display (or "render") the data without any assistance from another program. For this reason, the GIF language is referred to as a "native data format" for browsers. On the other hand, many architectural drawings are stored using a "drawing" file format identified by the extension DWG. This is not a native data format to a web browser and so it cannot render these files without help from another program. Browsers can be enhanced to use additional programs called "viewers" (or "helper applications") to render some non-native data formats, but not all of them.
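
The extension-based decision described above can be sketched in a few lines of Python. This is only an illustration: the set of "native" formats below is an assumption made for the example (it is not the list used by any particular browser), and the function name needs_helper is hypothetical.

```python
import os

# Assumed set of formats a browser can render unaided (illustrative only).
NATIVE_EXTENSIONS = {"html", "htm", "txt", "gif", "jpg", "jpeg", "png"}


def needs_helper(filename):
    """Return True if the file's extension is not in the assumed native set."""
    # splitext("plan.DWG") gives ("plan", ".DWG"); drop the dot and ignore case.
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    return extension not in NATIVE_EXTENSIONS


print(needs_helper("logo.gif"))  # GIF is treated as native in this sketch
print(needs_helper("plan.dwg"))  # DWG is not, so a viewer would be needed
```

A real browser also looks at information sent by the server, not just the filename, so treat this as a model of the idea rather than of any actual program.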


WEB CLIENTS (BROWSERS) VS. WEB SERVERS

All Internet software functions as one part of a pair of programs called clients and servers. A "client" is one of a pair of network programs that work in unison to retrieve data from a host and present it to a user. The client is the program that interacts with the user. Web clients are called browsers. A "server" is the other of the pair of network programs that work in unison to retrieve data from a host and present it to a user. The server is the program that runs on the target host and responds to requests by "client" programs for data. Although web pages are usually stored on servers, browsers render them; and each browser has different abilities and may result in a slightly different web page being rendered.

Regardless of which browser is used, the process involved in rendering a web page is basically the same. The steps are:

  1. The user either clicks on a link or enters a URL in the browser's address bar.
  2. The browser parses the URL to determine the target protocol and device and then sends the request to the target device (server or local machine).
  3. (Assuming an http request) the target device responds to the request. If the target file is an ordinary web file (.html or .htm), then a copy of that file is sent back to the web browser. If the target file is a server-side script (e.g. an .asp or .php file), then the script runs on the server and typically produces and returns a web page as its output.
  4. When the browser receives the returned html file, it is typically stored in a "cache" (temporary storage folder) to speed up any future requests to read the same page. The use of a cache can be disabled by the user; but most browsers use one by default. After a preset period of time (typically a few weeks), older files will be dropped from the cache and replaced by more recent ones. The browser also will record the URL for the file in a "history list" to make it easy to find and reload the page again if desired. This list is maintained automatically, but can be managed by the user.
  5. Before reading and rendering the html file, a modern browser will analyze it and construct a hierarchical representation of it in main memory. This representation is called the DOM (Document Object Model). This model can be used by scripts (small text-based programs) within the web page file to alter the structure and look of the web page, making it dynamic (reactive to user actions such as button presses or rollovers).
  6. Finally, the browser reads the DOM and renders the web page based on it. If the file contains references to other objects (such as images, music or videos), then the browser sends requests for those to the target device and inserts them into the finished web page.
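
The hierarchical representation mentioned in step 5 can be illustrated with Python's standard html.parser module. The sketch below builds a greatly simplified tree of elements. It is not the real DOM and it ignores attributes. The names Node, TreeBuilder, build_tree and outline are hypothetical ones chosen for this example.

```python
from html.parser import HTMLParser


class Node:
    """One element in the simplified tree."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    # Elements that never have a closing tag (a partial list, for illustration).
    VOID = {"img", "br", "hr", "meta", "link", "input"}

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in self.VOID:
            self.current = node  # later content becomes a child of this node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this name, then step out of it.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def build_tree(source):
    builder = TreeBuilder()
    builder.feed(source)
    builder.close()
    return builder.root


def outline(node, depth=0):
    """Return the tree as a list of indented tag names."""
    lines = ["  " * depth + node.tag]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


page = "<html><body><h1>Hi</h1><p>Text<br>more</p></body></html>"
print("\n".join(outline(build_tree(page))))
```

The indentation of the printed outline shows which elements are nested inside which, and that nesting is the structure that scripts work with.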

If any errors occur during these actions, the browser must contain instructions defining how to resolve them. Well written pages (free from syntax errors) will transfer and render more quickly and efficiently than poorly written pages. Browsers can compensate for a variety of common errors, but this requires the browser program to be larger and slower than necessary. Web authors now try to write valid source code. The better web pages are written, the smaller and more efficient the browsers can be. The proliferation of web technology embedded in small devices such as cell phones and wrist watches makes the production of valid source code more and more important.

WEB CLIENT HARDWARE ISSUES

Hardware will also affect how a browser can render a web page. For example, many computer screens have a "resolution" (dot density) of 800 pixels (dots of light) across by 600 pixels high. Other screens have a higher resolution of 1280 by 1024 pixels. Each image has a fixed size in pixels. Thus, an image that is 400 pixels wide will use half of the page width on an 800x600 screen as opposed to less than a third of the width of a 1280x1024 screen. If the reader's screen has a smaller 640x480 resolution, then the image will take up more of the screen and may also affect the layout of surrounding text. Although the author of a web page defines its content and layout, the browser and the hardware that it is running on have full control over how the page will be rendered including such details as font and window size. It is wise to view your web pages using more than one type of computer with more than one browser.
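
The arithmetic behind these figures is simple division, shown here as a short Python sketch (the function name width_fraction is a hypothetical one used only for this example):

```python
def width_fraction(image_width, screen_width):
    """Fraction of the screen width taken up by an image of fixed pixel width."""
    return image_width / screen_width


# The same 400-pixel image on the three screen widths discussed above.
for screen_width in (640, 800, 1280):
    share = width_fraction(400, screen_width)
    print(f"{screen_width} pixels across: {share:.1%} of the width")
```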


HTML - HYPERTEXT MARKUP LANGUAGE

The HyperText Markup Language (or HTML) is the primary language used to create documents for the World Wide Web. HTML is used to define the structure of a document and to a lesser degree its format or appearance. The language was designed to define documents in a simple, portable way that could be interpreted by any kind of computer system, regardless of size or manufacturer. HyperText is a system of text that is cross-referenced, usually by storing it in separate files in separate locations, sometimes on very distant machines. The text contains hidden embedded instructions known as "tags" that are used to enhance its appearance or to place a non-text object within it when it is displayed. The tags are not displayed by web browsers, but rather are interpreted by them as instructions about how the page should be displayed. Readers of web pages actually do not see the HTML code that is retrieved from web servers, but rather see their browser's interpretation of that language. The tags can be used to provide cross-referencing information known as "hyperlinks" to other documents or to indicate positions within the text where enhancements such as boldfacing or changes in character size should be applied to the text when it is being displayed by the browser. Note that most browsers provide some menu choice that will display the true contents of an HTML file (including the tags) if desired. In Internet Explorer, just right-click on the background of the page being viewed and then select "View Source" from the context menu that appears.

HTML files are indicated on the web by the extension html (or just htm) that is appended to the end of their filenames. Multimedia objects such as pictures are not actually stored inside of HTML files, but are referenced in them using tags that contain information called "hypertext references". Web browsers interpret these references to determine the location of and then retrieve multimedia objects and finally include them with the text in the file to create the resulting web page. Thus, web pages are not stored as a whole; rather they are rendered (visually constructed) each time they are viewed.
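
To see what a browser has to look for, the following Python sketch scans a fragment of HTML and lists the references it contains. It is a minimal illustration that uses the standard html.parser module. The class name ReferenceCollector, the function find_references and the sample file names are all hypothetical.

```python
from html.parser import HTMLParser


class ReferenceCollector(HTMLParser):
    """Collect the values of href and src attributes in document order."""

    def __init__(self):
        super().__init__()
        self.references = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.references.append((tag, value))


def find_references(source):
    collector = ReferenceCollector()
    collector.feed(source)
    collector.close()
    return collector.references


page = '<p>See <a href="more.html">more</a>.</p><img src="images/logo.gif" alt="Logo">'
print(find_references(page))
```

Each reference found this way is something the browser must retrieve separately, or offer as a link, when it constructs the page.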


RENDERING OF HTML BY BROWSERS & VIEWERS

Web pages containing multimedia objects may require additional hardware to be rendered depending on how many different forms of data (media) you plan to use and how many your system can currently manipulate. Additional programs called "viewers" may be needed to display or play back some forms of data such as non-standard graphics, music or movies. Each browser has a limited ability to render data. Some data formats will be "native" to a browser, others will not. Non-native data formats are not an insurmountable problem though. Additional software (often free) can be used to view them. Any program that helps to render data is called a "viewer" regardless of the form of the data (even programs that play audio data are called viewers). Viewers that can be used to render or manipulate multimedia data independently from web browsers are called "helper applications". Many software companies have developed program modules (or "applets") that can be added to their browsers to increase their data literacy. Such modules are called "add-ins" or "plug-ins". They are not full stand-alone programs and cannot run without their parent program. You can determine which data formats are native to your browser by retrieving the WWW Viewer Test Page from the University of Wisconsin - Madison. It provides sample data files in many different data formats that you can attempt to view using your browser. If your program is unable to display any of these data formats, then that sample page offers links to information about them, including where to find viewers.


WWW PROTOCOLS & URL'S

A "protocol" is a standardized set of rules under which programs are developed to promote uniformity of a network service or resource such as e-mail or the World Wide Web. For example, all web software is written to conform to a protocol named HTTP or HyperText Transfer Protocol. An older protocol known as FTP (or "File Transfer Protocol") has long been used to transfer files between computers on the Internet regardless of their type or the brand of software being used. FTP clients allow users to upload and download files to and from web servers and other computers. E-mail is transmitted and routed following protocols known as SMTP (Simple Mail Transfer Protocol) and POP (Post Office Protocol). Another older protocol known as "telnet" allows users to remotely connect to and control distant machines. These are just a few of the many protocols that allow a wide variety of services to take place on the Internet. The World Wide Web is just one part of those Internet services.

Web software is relatively new to the Internet. Its authors are fully aware of all of the earlier protocols. For this reason, web browsers are written to use multiple protocols and are able to interact with servers other than just web servers. Most modern web browsers can:

* - You must have access to the server and configure the web browser to know its address before e-mail or news hierarchies can be accessed. Most browsers recognize the special "mailto" instruction (see below) and thus can post articles back to a News server if you have set up the browser to know the address of an Internet mail server.

Most web browsers allow you to directly retrieve any of the resources above by entering a command called a "Uniform Resource Locator" (URL). Each highlighted "anchor" (often underlined and displayed in a different color such as blue) in a web page relates to one of these special linking instructions. There are currently seven standard protocols used in URL's, although some newer clients will recognize more. Each one designates the type of Internet resource being used. A table showing the syntax for each appears below, followed by examples of the use of each one.

Protocol        Uniform Resource Locator (URL) Syntax
Hypermedia      http://hostname:port#/directory_path/document_name.html
Secured HTTP    https://hostname:port#/directory_path/filename.html
Remote FTP      ftp://hostname:port#/directory_path/filename
Local File      file:///directory_path/document_name
   -or-         directory_path/document_name.html
Remote Login    telnet://hostname:port#
Send E-mail     mailto:email_address
Newsgroup       news:newsgroup.hierarchy.name
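
The parts named in the syntax table can be pulled out of a URL with Python's standard urllib.parse module. The sketch below is a minimal illustration. The function name describe_url is hypothetical, the host names are the placeholder hosts used in this handout, and port 8080 is an assumed value added only to show where a port number appears.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into the parts named in the syntax table."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,  # None when the URL names no host
        "port": parts.port,      # None when no port# is given
        "path": parts.path,
    }


for example in (
    "http://www.anynet.net:8080/Documents/my_file.html",  # 8080 is an assumed port
    "telnet://main.anynet.net:3000",
    "mailto:jsmith@irsc.edu",
    "news:rec.sport.tennis",
):
    print(describe_url(example))
```

Notice that the mailto and news forms have no host or port at all. Everything after the colon is treated as the path.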

URL EXAMPLES

Note that the use of upper and lowercase in the commands below may be critical. Some web servers are case sensitive; others are not. It is best practice to always type the URL just as you saw it written.

Remote Web Pages (Hypermedia):

If you want to retrieve a hypermedia document named my_file.html in a directory named Documents from a host named www.anynet.net, enter the URL:

     http://www.anynet.net/Documents/my_file.html

Telnet - Remote Logins:

If you want to remotely login to port number 3000 on a host named main.anynet.net, enter the URL:

     telnet://main.anynet.net:3000

Newsgroup Articles:

If you want to view the newsgroup hierarchy rec.sport.tennis, enter the URL:

     news:rec.sport.tennis

FTP Remote File Retrievals:

If you want to retrieve a file named "readme.txt" in a directory named Documents from a host named ftp.anynet.net on which you do not have an account, enter the URL:

     ftp://ftp.anynet.net/Documents/readme.txt

If you want to retrieve a file named "readme.txt" in a directory named Documents from a host named ftp.anynet.net on which you have an account named myacct and a password of pword, enter the URL:

     ftp://myacct:pword@ftp.anynet.net/Documents/readme.txt

"Absolute Reference" to a Local File:

If you want to retrieve a hypermedia document named my_file.html from a specific directory named Documents on Drive C: of a local computer, enter the URL:

     file:///C|/Documents/my_file.html

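Notice the use of the vertical bar ( | ) in place of the colon following the drive letter. Older browsers required this form; most current browsers also accept the colon itself, as in file:///C:/Documents/my_file.html.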

"Relative Reference" to a Local File:

If you want to retrieve a hypermedia document named my_file.html from the same directory as the last document, enter the URL:

     my_file.html

Notice the lack of any protocol specifier (such as http:) in front of the command.
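
A browser completes a relative reference by combining it with the URL of the document it came from. Python's standard urllib.parse.urljoin function performs the same kind of combination, so it can be used to sketch the idea. The starting URL and the file names below are hypothetical examples built on the placeholder host used in this handout.

```python
from urllib.parse import urljoin

# Assume this is the URL of the last document retrieved (a hypothetical example).
last_document = "http://www.anynet.net/Documents/index.html"

# A bare filename is looked up in the same directory as the last document.
print(urljoin(last_document, "my_file.html"))

# A relative path may also lead down into a sub-folder or up to the parent.
print(urljoin(last_document, "images/logo.gif"))
print(urljoin(last_document, "../index.html"))
```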

E-mail from a Browser:

If your browser can send e-mail, you could send a message to a user named jsmith on a host named irsc.edu with the URL:

     mailto:jsmith@irsc.edu

WEB PAGE STORAGE & HOME PAGES

HTML files and the other multimedia objects that are used to construct a web page must be stored on a device that is accessible to the browser program. This device is typically a magnetic disk that is part of a PC or a web server attached to the Internet. A "web site" is a collection of HTML files and related data objects (such as images, movies, or programs) that are meant to be used as a group. Such sites often involve the use of multiple files and storage folders (also known as "directories") that are organized in a storage hierarchy known as a "directory tree". The folder that is used as the starting point for the group of files is called the "parent folder". Subordinate folders are referred to as "children" of that parent.

One or more web sites are typically stored on a dedicated web server, but a web site can be stored on a simple PC and viewed using a web browser located on the same machine. Rather than allowing users to view the entire contents of a folder at will, an author often creates a special HTML file called an "index" file in the folder. This file serves as a table of contents or index to the specific files that the author wants to offer to the reader. Most web servers will not send readers a directory listing of a folder's contents, but rather send its index page whenever a user provides a URL that stops with the folder's name. Some servers will simply return an error message if the user types an incomplete URL. This gives the author control over exactly what files can be viewed.
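
The index-page behavior can be modeled with a small Python function. This is a sketch under stated assumptions and not the logic of any particular server: the function name resolve_request is hypothetical, the index file is assumed to be named index.html (a common choice that servers let administrators change), and only paths ending in a slash are treated as folder requests.

```python
def resolve_request(path, index_name="index.html"):
    """Map a requested URL path to the file a server might send back."""
    if not path:
        path = "/"  # a URL with nothing after the host asks for the top folder
    if path.endswith("/"):
        return path + index_name  # folder request: send its index page
    return path  # a specific file was named, so send that file


print(resolve_request("/classes/internet/"))
print(resolve_request("/classes/internet/search.html"))
```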

The index file is often referred to as a folder's "home page", but this term actually has many meanings. In the most general sense, a home page is a web page that is meant to be viewed as a starting point when viewing a web site. The term is used in three different senses:

  1. A server's "home page" is the one that will be sent to a browser whenever an incomplete URL (i.e. one that does not specify the full path to a specific HTML file) is used to retrieve a web page.
  2. A browser's "home page" is the one that the browser will attempt to retrieve each time it is started or whenever the user presses its "Home" button. The user of the browser can configure this setting.
  3. A personal "home page" is a biographical web page about a person.

The home page of the IRSC web server can be retrieved with the URL:

     http://www.irsc.edu/

My Internet Fundamentals Page is reached with the URL:

     http://www.gibsonr.com/classes/internet/index.html

Notice that most sites name the machine that acts as their web server "www", although this is not a rule.


HTML EDITING

Because HTML is a text-based language, it is easy to write and edit. Most programs can read and write text (although that may not be their primary choice for a language). HTML authors have five editing approaches to choose from and ultimately may use a combination of them to produce a web page or site.

  1. Pure Text Editors: Programs such as Windows Notepad, Mac's TextEdit, or the UNIX/Linux vi or emacs utilities can be used to enter or edit HTML code directly. Such programs often expect to load or save only files that have a filename extension of txt, so you may have to take extra steps to place the proper extension of html on the file. If you are using Notepad, you may have to place quotes around the entire filename when saving it to prevent the addition of the extension txt to the end of the filename. Pure text editors offer the following advantages:
    However, because these programs are simply text editors, they cannot interpret HTML tags and cannot be used to render your web pages. This is not a major weakness, since you should view your pages with a variety of browsers anyway. But you will need to "toggle" (switch) between two or more programs to do this. The other major weakness is that these programs are completely ignorant of HTML, so if you need help, you will have to get it elsewhere. "HTML Help" programs can be run concurrently with text editors to perform this service.
  2. Tag Editors: These programs are enhanced text editors that were written with the sole purpose of editing HTML files. They offer controls (buttons and menu items) that act as shorthand for entering HTML tags and also provide lists of tags and settings that may be otherwise difficult to remember. An example of such a program is HTMLed®. These programs manage the html extension for you and also save only text. They usually do not have the ability to render web pages, but can launch one or more browsers to do so.
  3. Graphic (WYSIWYG) Editors: Sophisticated editors such as Dreamweaver® or Microsoft® Expression Web now exist that allow you to edit a web page in a "What You See Is What You Get" (WYSIWYG) environment. In other words, you edit the finished view of the page rather than the HTML code. Such an editor works in the background composing and storing the HTML code necessary to render your page. These programs are very powerful, but may employ HTML coding practices and techniques that you do not like. Fortunately, most also offer the ability to edit the HTML code directly, allowing you to compensate for that weakness. Graphic editors are typically larger and slower than the other types of editors and are usually more expensive. But they normally offer many enhancements related to organizing and managing large web sites and web servers.
  4. Foreign Editors: Many programs that were created for reasons completely unrelated to the Internet are now offering the option of saving their data as HTML code. For example, you can create an electronic spreadsheet (a.k.a. "worksheet") using the Microsoft® Excel program and save it as an HTML file. In such a case, the Excel program will convert the spreadsheet to an HTML table when it saves the data. Many such programs also have the ability to render HTML files as well. Microsoft® Word can render and edit HTML files, although its ability as a browser is quite limited and slow. The important thing to remember when using these editors to write web pages is that you must specify that the file is an HTML file when you save it to avoid having the program save any of its primary data language in the HTML file.
  5. Web Authoring Wizards: These are user friendly programs that lay people can use to author web pages by answering a series of simple questions that define their prospective page. The program then uses these answers as design specifications and writes the HTML code for the user. Many sites that offer free hosting of web pages require their users to compose their web pages in this manner.
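
The kind of conversion described under Foreign Editors, in which rows of spreadsheet data become an HTML table, can be sketched in Python. This is only an illustration of the idea and not how Excel or any other product actually does it. The function name rows_to_table and the sample data are hypothetical.

```python
from html import escape


def rows_to_table(rows):
    """Turn a list of rows (each a list of cell values) into HTML table code."""
    lines = ["<table>"]
    for row in rows:
        # escape() protects characters such as & and < that have special meaning in HTML
        cells = "".join(f"<td>{escape(str(cell))}</td>" for cell in row)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


print(rows_to_table([["Item", "Cost"], ["Pens & pencils", 4.5]]))
```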

Many brands and versions of each of the types of editors above can be obtained for free or at extremely low cost. See the web page entitled HTML Editor Software and Review Sites for more information.


WEB PAGE AUTHORING - STEPS

Web publishing is a multi-part process:

  1. Select your topic and define your target audience considering their background, technical experience and access to web resources such as browsers, viewers and multimedia hardware.
  2. Locate reference material (text, images, video, etc.) being careful to observe the laws and spirit of intellectual property. When in doubt, err on the side of caution - ask permission.
  3. Define the organizational structure of your web site and a navigational plan. Decide whether you want your pages read in sequence or organized in a hierarchy, etc. Then plan navigational aids such as buttons, navigation bars, menus, and site maps to aid the reader.
  4. Compose narrative text about the topic and define its layout and enhancement (italics, etc.). The text can be authored with a text editor or word processor or it can be copied from other software using "Copy & Paste" techniques.
  5. Decide on the locations of any graphic images (either inline or hot links) on each page. Graphics that are stored in files in the same storage folders as your HTML files can be referenced in a "relative" way with just their filenames (and possibly a path to their folder). Graphic files that are located on remote machines must be referenced using a URL in the HTML image tag.
  6. Identify and position anchors within your text that will serve as hot links to other resources. Determine the proper URL for the resource and then place it in an HTML anchor tag that surrounds the text or object that you want to be "hot" on the page.
  7. Generate any multimedia objects that you do not have yet using multimedia editors such as Adobe PhotoShop®, Paint Shop Pro®, or Microsoft® Sound Recorder.
  8. Create the HTML files necessary to render the assembled text and multimedia objects. See the section above entitled HTML Editing for options.
  9. Store all of the files in the appropriate folders and sub-folders on your computer or network.
  10. Test all of your links beginning with your index page. Clear the "History List" in your browser to make it easier to track which links you have tested (most browsers change the color of links that have been followed based on the contents of the History List).
  11. Validate that your source code meets the applicable published web language standards using sites such as the W3C Markup Validation Service.
  12. Upload the entire web site to its final destination (typically a web server). If you have "write access" (permission to write) to the web server, then use FTP software to upload all of the files that make up your site to the appropriate folders on the server. Some of the advanced graphic editors offer "server extensions" that can be added to servers to allow your editor to directly interact with the web server and transfer your pages using your editor.
  13. Retrieve the home page for your site and test the links again using a variety of clients. Remember that every brand and version of browser will render HTML code slightly differently. It is advisable to view your files with a variety of browsers to get a feel for these differences.
  14. Finally, list the starting URL for your site with Internet directories and search engines such as Google®, Yahoo®, and Infoseek®. See the class web page on Internet Search Tools & Starting Points at:
         http://www.gibsonr.com/classes/internet/search.html
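
Part of the link testing in step 10 can be automated. The Python sketch below checks a small site held in memory and reports links that point to local files that do not exist. It is a minimal illustration under simplifying assumptions: all files are assumed to sit in one folder, remote links are skipped and not tested, and the names LinkCollector, broken_local_links and the sample pages are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collect every href and src value found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def broken_local_links(pages):
    """pages maps filename -> HTML source; return (page, target) pairs for missing files."""
    missing = []
    for filename, source in pages.items():
        collector = LinkCollector()
        collector.feed(source)
        collector.close()
        for target in collector.links:
            parts = urlsplit(target)
            # Skip remote links (http:, mailto:, ...) and links within the same page.
            if parts.scheme or parts.netloc or not parts.path:
                continue
            if parts.path not in pages:
                missing.append((filename, parts.path))
    return sorted(missing)


site = {
    "index.html": '<a href="about.html">About</a> <a href="contact.html">Contact</a>',
    "about.html": '<a href="index.html#top">Home</a> <a href="mailto:jsmith@irsc.edu">Mail</a>',
}
print(broken_local_links(site))
```

A check like this does not replace viewing the pages yourself, since it says nothing about how they render or whether remote links still work.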

REMEMBER

