Excerpts from NCSA's Beginner's Guide to HTML

Some Terms

WWW
  World Wide Web
Web
  World Wide Web
SGML
  Standard Generalized Markup Language--a standard for describing markup languages
DTD
  Document Type Definition--this is the formal specification of a markup language, written using SGML
HTML
  HyperText Markup Language--HTML is an SGML DTD
In practical terms, HTML is a collection of platform-independent styles (indicated by markup tags) that define the various components of a World Wide Web document. HTML was invented by Tim Berners-Lee while at CERN, the European Laboratory for Particle Physics in Geneva.

HTML Documents

HTML documents are plain-text (also known as ASCII) files that can be created using any text editor (e.g., Emacs or vi on UNIX machines; SimpleText on a Macintosh; Notepad on a Windows machine). You can also use word-processing software if you remember to save your document as "text only with line breaks".

Tags

An element is a fundamental component of the structure of a text document. Some examples of elements are heads, tables, paragraphs, and lists. Think of it this way: you use HTML tags to mark the elements of a file for your browser. Elements can contain plain text, other elements, or both.

To denote the various elements in an HTML document, you use tags. HTML tags consist of a left angle bracket (<), a tag name, and a right angle bracket (>). Tags are usually paired (e.g., <H1> and </H1>) to start and end the tag instruction. The end tag looks just like the start tag except a slash (/) precedes the text within the brackets.

Some elements may include an attribute, which is additional information that is included inside the start tag. For example, you can specify the alignment of images (top, middle, or bottom) by including the appropriate attribute with the image source HTML code.
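
As a rough illustration of an attribute, the sketch below aligns an image with the top of the surrounding line of text. The IMG tag and its SRC attribute are not covered in this excerpt, and the file name logo.gif is a hypothetical placeholder.

```html
<!-- Hypothetical example: logo.gif is a placeholder file name -->
<P>
<IMG SRC="logo.gif" ALIGN=TOP> This text starts level with the top of the image.
</P>
```

Changing ALIGN=TOP to ALIGN=MIDDLE or ALIGN=BOTTOM would select one of the other two alignments mentioned above.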

NOTE: HTML is not case sensitive. <title> is equivalent to <TITLE> or <TiTlE>. There are a few exceptions, such as escape sequences (e.g., &lt;), which are case sensitive.

Not all tags are supported by all World Wide Web browsers. If a browser does not support a tag, it will simply ignore it. Any text placed between a pair of unknown tags will still be displayed, however.
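
To see this behavior for yourself, you could try a fragment like the one below. SPARKLE is a made-up tag name used only for illustration; it is assumed that no browser supports it.

```html
<!-- SPARKLE is a made-up tag that no browser is expected to support -->
<P>Most of this sentence is plain, but <SPARKLE>these words</SPARKLE>
sit inside an unknown tag.</P>
```

A browser that ignores the unknown pair of tags should show the whole sentence as ordinary paragraph text.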

The Minimal HTML Document

Every HTML document should contain certain standard HTML tags. Each document consists of head and body text. The head contains the title, and the body contains the actual text that is made up of paragraphs, lists, and other elements. Browsers expect specific information because they are programmed according to HTML and SGML specifications.

Required elements are shown in this sample bare-bones document:

    <html>
    <head>
    <TITLE>A Simple HTML Example</TITLE>
    </head>
    <body>
    <H1>HTML is Easy To Learn</H1>
    <P>Welcome to the world of HTML.
    This is the first paragraph. While short it is  
    still a paragraph!</P>
    <P>And this is the second paragraph.</P>
    </body>
    </html>

The required elements are the <html>, <head>, <title>, and <body> tags (and their corresponding end tags). Because you should include these tags in each file, you might want to create a template file with them. (Some browsers will format your HTML file correctly even if these tags are not included. But some browsers won't! So make sure to include them.)

Viewing the Source

To see a copy of the file that your browser reads to generate the information in your current window, select View Source (or the equivalent) from the browser menu. (Most browsers have a "View" menu under which this command is listed.) The file contents, with all the HTML tags, are displayed in a new window.

This is an excellent way to see how HTML is used and to learn tips and constructs. Of course, the HTML might not be technically correct. Once you become familiar with HTML and check the many online and hard-copy references on the subject, you will learn to distinguish between "good" and "bad" HTML.

Remember that you can save a source file with the HTML codes and use it as a template for one of your Web pages or modify the format to suit your purposes.

Markup Tags

HTML

This element tells your browser that the file contains HTML-coded information. The file extension .html also indicates that this is an HTML document and must be used. (If you are restricted to 8.3 filenames, e.g., LeeHome.htm, use only .htm for your extension.)

HEAD

The head element identifies the first part of your HTML-coded document that contains the title. The title is shown as part of your browser's window (see below).

TITLE

The title element contains your document title and identifies its content in a global context. The title is typically displayed in the title bar at the top of the browser window, but not inside the window itself. The title is also what is displayed on someone's hotlist or bookmark list, so choose something descriptive, unique, and relatively short. A title is also used to identify your page for search engines.

For example, you might include a shortened title of a book along with the chapter contents: NCSA Mosaic Guide (Windows): Installation. This tells the software name, the platform, and the chapter contents, which is more useful than simply calling the document Installation. Generally you should keep your titles to 64 characters or fewer.
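
A minimal sketch of how the example title above might be coded in the head of a document:

```html
<HEAD>
<!-- Descriptive: names the software, the platform, and the chapter -->
<TITLE>NCSA Mosaic Guide (Windows): Installation</TITLE>
</HEAD>
```

This title stays well under the 64-character guideline.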

BODY

The second--and largest--part of your HTML document is the body, which contains the content of your document (displayed within the text area of your browser window). The tags explained below are used within the body of your HTML document.

Headings

HTML has six levels of headings, numbered 1 through 6, with 1 being the largest. Headings are typically displayed in larger and/or bolder fonts than normal body text. The first heading in each document should be tagged <H1>.

The syntax of the heading element is:

    <Hy>Text of heading</Hy>

where y is a number between 1 and 6 specifying the level of the heading.

Do not skip levels of headings in your document. For example, don't start with a level-one heading (<H1>) and then next use a level-three (<H3>) heading.
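
As a small sketch, a document outline that descends one level at a time might look like this. The heading text is invented for illustration.

```html
<!-- Heading text is invented for illustration -->
<H1>Installation Guide</H1>
<H2>Before You Begin</H2>
<H3>System Requirements</H3>
<H2>Running the Installer</H2>
```

Each heading is at most one level deeper than the one before it; moving back up from <H3> to <H2> is fine.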

Paragraphs

Unlike documents in most word processors, carriage returns in HTML files aren't significant. In fact, any amount of whitespace -- including spaces, linefeeds, and carriage returns -- is automatically compressed into a single space when your HTML document is displayed in a browser. So you don't have to worry about how long your lines of text are. Word wrapping can occur at any point in your source file without affecting how the page will be displayed.

In the bare-bones example shown in the Minimal HTML Document section, the first paragraph is coded as

    <P>Welcome to the world of HTML.  
    This is the first paragraph.
    While short it is
    still a paragraph!</P>

In the source file there is a line break between the sentences. A Web browser ignores this line break and starts a new paragraph only when it encounters another <P> tag.

Important: You must indicate paragraphs with <P> elements. A browser ignores any indentations or blank lines in the source text. Without <P> elements, the document becomes one large paragraph. (One exception is text tagged as "preformatted," which is explained below.) For example, the following would format its paragraphs exactly as the first bare-bones HTML example does:

    <H1>Level-one heading</H1>
    <P>Welcome to the world of HTML. This is the  
    first paragraph. While short it is still a
    paragraph! </P> <P>And this is the second paragraph.</P>

To preserve readability in HTML files, put headings on separate lines, use a blank line or two where it helps identify the start of a new section, and separate paragraphs with blank lines (in addition to the <P> tags). These extra spaces will help you when you edit your files (but your browser will ignore the extra spaces because it has its own set of rules on spacing that do not depend on the spaces you put in your source file).

NOTE: The </P> closing tag may be omitted. This is because browsers understand that when they encounter a <P> tag, it means that the previous paragraph has ended. However, since HTML now allows certain attributes to be assigned to the <P> tag, it's generally a good idea to include it.

Using the <P> and </P> as a paragraph container means that you can center a paragraph by including the ALIGN=alignment attribute in your source file.

    <P ALIGN=CENTER>
    This is a centered paragraph.
    [See the formatted version below.]
    </P>

This is a centered paragraph.

It is also possible to align a paragraph to the right instead, by including the ALIGN=RIGHT attribute. ALIGN=LEFT is the default alignment; if no ALIGN attribute is included, the paragraph will be left-aligned.
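
For comparison, here is a short sketch of a right-aligned paragraph followed by one that relies on the default alignment. The paragraph text is illustrative only.

```html
<P ALIGN=RIGHT>
This paragraph is aligned with the right margin.
</P>
<P>
This paragraph has no ALIGN attribute, so it is left-aligned.
</P>
```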

Lists

HTML supports unnumbered, numbered, and definition lists. You can nest lists too, but use this feature sparingly because too many nested items can get difficult to follow.

Unnumbered Lists

To make an unnumbered, bulleted list,

  1. start with an opening list <UL> (for unnumbered list) tag
  2. enter the <LI> (list item) tag followed by the individual item; no closing </LI> tag is needed
  3. end the entire list with a closing list </UL> tag

Below is a sample three-item list:

    <UL>
    <LI> apples
    <LI> bananas
    <LI> grapefruit
    </UL>

The output is:

  * apples
  * bananas
  * grapefruit

The <LI> items can contain multiple paragraphs. Indicate the paragraphs with the <P> paragraph tags.
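
A minimal sketch of a list item that holds two paragraphs might look like the following. The sentence about storing apples is invented filler text.

```html
<UL>
<!-- The first item contains two paragraphs; the second is a plain item -->
<LI> <P>apples</P>
     <P>Apples keep well in a cool, dry place.</P>
<LI> bananas
</UL>
```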

Numbered Lists

A numbered list (also called an ordered list, from which the tag name derives) is identical to an unnumbered list, except it uses <OL> instead of <UL>. The items are tagged using the same <LI> tag. The following HTML code:

    <OL>
    <LI> oranges
    <LI> peaches
    <LI> grapes
    </OL>

produces this formatted output:

  1. oranges
  2. peaches
  3. grapes

Definition Lists

A definition list (coded as <DL>) usually consists of alternating definition terms (coded as <DT>) and definition descriptions (coded as <DD>). Web browsers generally format the definition on a new line and indent it.

The following is an example of a definition list:

    <DL>
    <DT> NCSA
    <DD> NCSA, the National Center for Supercomputing
      Applications, is located on the campus of the
      University of Illinois at Urbana-Champaign.
    <DT> Cornell Theory Center
    <DD> CTC is located on the campus of Cornell
      University in Ithaca, New York.
    </DL>

The output looks like:

NCSA
  NCSA, the National Center for Supercomputing Applications, is located on the campus of the University of Illinois at Urbana-Champaign.
Cornell Theory Center
  CTC is located on the campus of Cornell University in Ithaca, New York.

The <DT> and <DD> entries can contain multiple paragraphs (indicated by <P> paragraph tags), lists, or other definition information.
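
As a rough sketch, a single definition that contains two paragraphs and a nested list could be coded as shown below. The term and the text are placeholders.

```html
<DL>
<!-- Placeholder term and text, for illustration only -->
<DT> Fruit
<DD> <P>First paragraph of the definition.</P>
     <P>Second paragraph of the same definition.</P>
     <UL>
     <LI> apples
     <LI> bananas
     </UL>
</DL>
```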

The COMPACT attribute can be used when your definition terms are very short. If, for example, you are showing some computer options, the options may fit on the same line as the start of the definition.

    <DL COMPACT>
    <DT> -i
    <DD>invokes NCSA Mosaic for Microsoft Windows
      using the initialization file defined in the path
    <DT> -k
    <DD>invokes NCSA Mosaic for Microsoft Windows in
      kiosk mode
    </DL>

The output looks like:

-i  invokes NCSA Mosaic for Microsoft Windows using the initialization file defined in the path
-k  invokes NCSA Mosaic for Microsoft Windows in kiosk mode

Nested Lists

Lists can be nested. You can also have a number of paragraphs, each containing a nested list, in a single list item.

Here is a sample nested list:

    <UL>
    <LI> A few New England states:
        <UL>
        <LI> Vermont
        <LI> New Hampshire
        <LI> Maine
        </UL>
    <LI> Two Midwestern states:
        <UL>
        <LI> Michigan
        <LI> Indiana
        </UL>
    </UL>

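The nested list is displayed as:

  * A few New England states:
      - Vermont
      - New Hampshire
      - Maine
  * Two Midwestern states:
      - Michigan
      - Indiana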

Preformatted Text

Use the <PRE> tag (which stands for "preformatted") to generate text in a fixed-width font. This tag also makes spaces, new lines, and tabs significant -- multiple spaces are displayed as multiple spaces, and lines break in the same locations as in the source HTML file. This is useful for program listings, among other things. For example, the following lines:

    <PRE>
      #!/bin/csh
      cd $SCR
      cfs get mysrc.f:mycfsdir/mysrc.f  
      cfs get myinfile:mycfsdir/myinfile    
      fc -02 -o mya.out mysrc.f     
      mya.out   
      cfs save myoutfile:mycfsdir/myoutfile 
      rm *  
    </PRE>

display as:

      #!/bin/csh
      cd $SCR
      cfs get mysrc.f:mycfsdir/mysrc.f  
      cfs get myinfile:mycfsdir/myinfile    
      fc -02 -o mya.out mysrc.f     
      mya.out   
      cfs save myoutfile:mycfsdir/myoutfile 
      rm *

The <PRE> tag can be used with an optional WIDTH attribute that specifies the maximum number of characters for a line. WIDTH also signals your browser to choose an appropriate font and indentation for the text.
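
A small sketch of the WIDTH attribute in use follows. The value 40 and the table contents are arbitrary choices made for illustration.

```html
<!-- WIDTH=40 is an arbitrary value chosen for illustration -->
<PRE WIDTH=40>
Item         Quantity
apples              3
bananas            12
</PRE>
```

Because spaces are significant inside <PRE>, the two columns stay lined up in a fixed-width font.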

Hyperlinks can be used within <PRE> sections. You should avoid using other HTML tags within <PRE> sections, however.
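
For instance, a link inside a preformatted listing might be sketched as follows. The A (anchor) tag is not covered in this excerpt, and the file name notes.html is a hypothetical placeholder.

```html
<!-- The A (anchor) tag and the file name notes.html are assumptions -->
<PRE>
  cd $SCR
  mya.out     see the <A HREF="notes.html">run notes</A>
</PRE>
```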

Note that because <, >, and & have special meanings in HTML, you must use their escape sequences (&lt;, &gt;, and &amp;, respectively) to enter these characters.
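
As a brief sketch, the fragment below uses all three escape sequences inside a preformatted block. The line of program text is invented for illustration.

```html
<!-- &lt; &amp; and &gt; stand in for the three special characters -->
<PRE>
  if (a &lt; b &amp;&amp; b &gt; c)
</PRE>
```

A browser should display the line as: if (a < b && b > c)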

Extended Quotations

Use the <BLOCKQUOTE> tag to include lengthy quotations in a separate block on the screen. Most browsers change the margins for the quotation to separate it from surrounding text.

In the example:

    <P>Omit needless words.</P>
    <BLOCKQUOTE>
    <P>Vigorous writing is concise. A sentence should
    contain no unnecessary words, a paragraph no unnecessary
    sentences, for the same reason that a drawing should have
    no unnecessary lines and a machine no unnecessary parts.
    </P>
    <P>--William Strunk, Jr., 1918 </P>
    </BLOCKQUOTE>

the result is:

Omit needless words.

Vigorous writing is concise. A sentence should contain no unnecessary words, a paragraph no unnecessary sentences, for the same reason that a drawing should have no unnecessary lines and a machine no unnecessary parts.

--William Strunk, Jr., 1918

Forced Line Breaks

The <BR> tag forces a line break with no extra (white) space between lines.
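
Postal addresses are a typical use. In the sketch below, each line of an invented address ends with a forced line break.

```html
<!-- The address is invented for illustration -->
<P>
Example Computing Center<BR>
123 Main Street<BR>
Anytown, Illinois
</P>
```

The three lines should appear one directly under the other, without the extra space that separate paragraphs would add.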

Horizontal Rules

The <HR> tag produces a horizontal line the width of the browser window. A horizontal rule is useful to separate major sections of your document.
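
As a simple sketch, a rule placed between two sections might be coded like this. The headings and text are placeholders.

```html
<!-- Placeholder headings and text, for illustration only -->
<H2>Part One</H2>
<P>Text of the first section.</P>
<HR>
<H2>Part Two</H2>
<P>Text of the second section.</P>
```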

You can vary a rule's size (thickness) and width (the percentage of the window covered by the rule). For example:

    <HR SIZE=4 WIDTH="50%">

displays as a thicker rule that spans half the width of the browser window.