What is Semantic HTML?

Semantics is the study of meaning: how meaning is created and applied to signs. "Why does X mean X?" is a question of semantics.

HTML is the markup language that we use to write web pages. It's understood by standard web browsers, as well as dozens of other types of "user agents" (including mobile phones, search engine spiders, aural browsers, etc.).

HTML consists of two types of things: content, and the tags that mark it up.

A few tags can be content of their own (like images, Flash movies, or metadata), but most HTML tags are used to apply structure to content.

Semantic HTML, or "semantically-correct HTML", is HTML where the tags used to structure content are selected and applied appropriately to the meaning of the content.

So, for example, if you want your HTML to be semantically correct...

  • A <p></p> paragraph tag pair should only be used to indicate a paragraph (which is a structural concept). It should never be used to apply space to a web page. Never, ever, use a series of <p> tags to create space!
  • The HTML tags <b></b> (for bold) and <i></i> (for italic) should be avoided, because they describe formatting, not the meaning or structure of the content. Instead, use <strong></strong> (strong emphasis) and <em></em> (emphasis). By default these turn text bold and italic, although a browser or style sheet is free to render them differently, and they add meaning to the structure of the content.
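
The two rules above might look like this in practice. This is only a minimal sketch: the class name "intro" and the sample text are made up for illustration.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Paragraphs and emphasis (illustrative sketch)</title>
  <style>
    /* Spacing belongs in the style sheet. "intro" is a made-up class name. */
    p.intro { margin-bottom: 3em; }
  </style>
</head>
<body>
  <!-- Avoid: empty paragraphs used purely as spacers.
       <p>First paragraph.</p><p></p><p></p><p>Second paragraph.</p> -->

  <!-- Prefer: each <p> wraps a real paragraph; CSS supplies the gap. -->
  <p class="intro">First paragraph.</p>
  <p>Second paragraph.</p>

  <!-- Avoid: <b>Warning:</b> do <i>not</i> unplug the drive. -->
  <!-- Prefer: tags that say why the text stands out. -->
  <p><strong>Warning:</strong> do <em>not</em> unplug the drive.</p>
</body>
</html>
```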

Always separate style from content

HTML tags should never be used to apply presentation - that's the job of CSS (Cascading Style Sheets). See http://webdesignfromscratch.com/how-html-css-js-work-together.cfm to learn more about how HTML, CSS and JavaScript fit together in web pages. (Note, perfect production practice also removes all JavaScript functions and event handlers from the markup as well!)
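
As a rough illustration of that separation, the sketch below keeps presentation in a style rule and behaviour in a script, leaving the markup with only content and meaning. The class name "notice", the id "details-link", the file name details.html and the function showDetails() are all hypothetical. In production the style and script would normally sit in external files; they are embedded here only so the example is complete on its own.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Separating content, style and behaviour (illustrative sketch)</title>
  <style>
    /* Presentation lives here, not in the markup. */
    .notice { color: #900; font-weight: bold; }
  </style>
</head>
<body>
  <!-- Avoid: presentation and behaviour mixed into the markup.
       <p><font color="red"><b>Sale ends Friday</b></font></p>
       <a href="#" onclick="showDetails()">Details</a> -->

  <!-- Prefer: markup carries only content and meaning. -->
  <p class="notice">Sale ends Friday</p>
  <a id="details-link" href="details.html">Details</a>

  <script>
    // Behaviour is attached from script rather than an onclick attribute.
    // The id and the log message are hypothetical.
    document.getElementById("details-link")
      .addEventListener("click", function () {
        console.log("Details link clicked");
      });
  </script>
</body>
</html>
```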

Why semantically correct HTML is better

Writing semantic HTML brings a wide range of benefits:

Ease of use

First of all, semantic HTML is clean HTML. It's much easier to read and edit markup that's not littered with extra tags and inline styling. Clean markup also saves time and money when other people have to interact with it - say, a web developer who has to implement your page template in a content management system or any other web application.

A corollary benefit is that your HTML files are also smaller, so they load more quickly.

Accessibility

Unless you've had to work with HTML markup through something other than a web browser, it may not be obvious that your web pages have a life outside the browser window, but they very often do. Web pages can be consumed by humans and machines in lots of different ways!

When you separate visual aspects (i.e. style) from the actual meaning of a document, you end up with a document that always means the same thing. The way it's presented or consumed can vary. One common technique web designers use is to apply different style sheets for different media. For example, you can apply a certain stylesheet only when a document is printed to paper, another one when it's viewed on screen, and yet another when it's accessed by a text-to-speech aural browser.
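
Here is a small sketch of the media technique, covering just the screen and print cases. The particular rules and the sample content are assumptions chosen for illustration. A speech style sheet is left out because support for it is limited in practice.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>One document, two media (illustrative sketch)</title>
  <style>
    /* Applied only when the page is shown on a screen. */
    @media screen {
      body { font-family: sans-serif; background: #f4f4f4; }
      nav  { float: left; width: 12em; }
    }
    /* Applied only when the page is printed. */
    @media print {
      body { font-family: serif; background: none; color: #000; }
      nav  { display: none; } /* navigation is no use on paper */
    }
  </style>
</head>
<body>
  <nav><a href="/">Home</a></nav>
  <h1>Article title</h1>
  <p>The same content, whatever the medium.</p>
</body>
</html>
```

The same split can be made with separate files by giving each link element a media attribute, for example media="print" on the link to a print style sheet.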

A text-to-speech (TTS) reader also understands the tags <strong> and <em>, but it treats text marked with those tags very differently from the way a visual browser does. The TTS reader can adjust vocal tone or volume, rather than contrast or text style, which conveys the same meaning through a different medium.

Search Engine Optimisation

Search engine spiders and crawlers, like Googlebot, represent another genus of user agents. They also consume web page content, in an attempt to discern the meaning within.

When a crawler finds a web page, it stores its assessment of what the page is about in an indexed database to use when matching people's search queries. The big question is - how do search engines match search terms to known pages to create a prioritised list?

Of course, they all do it a bit differently, but one of the keys to Search Engine Optimisation is to use plain old common sense. If you were a search engine, how would you do it? If you work through the problems a search engine faces, a few principles soon become clear, most of which are easily expressed when prefixed with “all other things being equal...”.

Let's say you have two web pages, each with exactly the same text content (10 kilobytes).

One of the pages has an additional 5KB of HTML markup, neatly annotating the semantic meaning in the content.

The second page has 30KB of additional markup, with inline styles, lots of nested <div> tags, and decorative imagery.

Now, the more graphically intense page might look better to human visitors (might!), but if each page contains the search term "bluebottle" 5 times, which would you (pretending to be a search engine) judge to be more relevant to someone searching for “bluebottle”?

Clearly, it's the first, more lightweight page, for a few possible reasons:

  1. The keyword density of the lightweight page is greater. It features the search term five times in 15KB of page (10KB of text plus 5KB of markup), whereas the second page features it five times in 40KB (10KB of text plus 30KB of markup). That is roughly one occurrence per 3KB against one per 8KB. Whatever the additional markup is for (the search engine might not be able to tell), it doesn't seem to be about “bluebottle”.
  2. Each occurrence of the search term is likely to be closer to the start of the document in the lightweight page than it is in the 40KB page. All other things being equal, the earlier you find a search term within a document, the more likely it is that the document is about that term, or that the term is prominent in the document's content.
  3. Assuming that the first document is neatly marked up with semantically correct HTML, it's more likely that the search term will be placed inside a higher-value tag (such as a heading, or link) than in a more graphical page (which might use an image as a link, perhaps without a proper alt attribute).
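
To make the third point concrete, here is a sketch of how the lightweight page might carry the search term. The file names and wording are invented, and the sketch makes no claim about how any particular search engine actually weighs these tags.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Bluebottle facts</title>
</head>
<body>
  <!-- Avoid: the term sits in a styled generic container, and the
       link is an image with no text alternative.
       <div style="font-size:24px;font-weight:bold">Bluebottle facts</div>
       <a href="bluebottle.html"><img src="btn.gif"></a> -->

  <!-- Prefer: the term appears in a heading, in link text and in alt text. -->
  <h1>Bluebottle facts</h1>
  <p>The <strong>bluebottle</strong> is a common fly.</p>
  <a href="bluebottle-lifecycle.html">Bluebottle life cycle</a>
  <a href="bluebottle-gallery.html">
    <img src="bluebottle.jpg" alt="A bluebottle resting on a leaf">
  </a>
</body>
</html>
```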

Repurposing

When your markup (content, with meaning) is separated from your styles (style sheets for different media), obviously the content can be understood more easily by all user agents. That means not only user agents you already know about, but ones you don't yet know about (like automated crawlers that create custom RSS news feeds on a certain topic, or image- or video-specific search engines), as well as others that have not yet been invented!

The last couple of years have seen mixing and mashing content emerge as a major feature of new web sites and applications. This can happen without the knowledge of the original site owner, but in most cases this freedom of content to move around the web, adapting to various media, is beneficial to the original creator.

Often in these situations, the content taken from a web page is formatted differently on the new remixed page, which makes it all the more important to remove any style content from the markup itself. (Note that inline styles, applied directly within HTML tags, normally override styles implemented through separate stylesheets, and so they would have to be stripped off programmatically.)
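
The sketch below shows why inline styles get in the way of reuse. The class name "quote" and the colours are made up, and the style block stands in for the style sheet of whichever site is reusing the content. One caveat: a stylesheet declaration marked !important would still beat a normal inline declaration, but a remixing site can't sensibly rely on that everywhere.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Inline styles and reuse (illustrative sketch)</title>
  <style>
    /* Stands in for the style sheet of the site that reuses the content. */
    .quote { color: navy; font-style: normal; }
  </style>
</head>
<body>
  <!-- Hard to reuse: the inline declarations win over the .quote rule,
       so this text stays red and italic wherever it is pasted. -->
  <p class="quote" style="color: red; font-style: italic;">
    Hard-coded presentation travels with the content.
  </p>

  <!-- Easy to reuse: nothing to strip out; the host page's style sheet
       decides how it looks. -->
  <blockquote class="quote">
    <p>Clean content adapts to its new home.</p>
  </blockquote>
</body>
</html>
```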

Clearly, it's easier to grab and re-use content from any source, and apply it to any medium, when it does not contain any hard-coded style information, and also when it does contain semantic markup that can help a computer program understand the meaning and structure of the content.

© Scratchmedia Limited, 2009