The Definitive Guide to Semantic Web Markup for Blogs

You’d think that as a result of open-source development practices, blog architectures would be pretty close to perfection in areas like Web standards and maximum SEO impact.

You’d be wrong.

Unbelievably, nearly every WordPress, Movable Type, or TypePad theme that I’ve come across in the past year fails a simple test for truly semantic (and Google-recommended) XHTML markup. Now, I’ll be the first to admit that these failures are by no means fatal flaws. At the same time, though, I find it extremely unsettling that an inferior markup structure is prevailing in the face of an absolutely correct way of doing things.

After having this revelation, I thought I’d champion the cause and start changing sites one by one, all the while evangelizing the benefits of perfect markup. Then I realized that there are only 24 hours in the day, and I wanted at least three of those to go towards playing Guitar Hero.

So to compromise, I decided to publish the essential guide to semantic Web markup for blogs. Learn it, live it, and benefit from it—it can mean the difference between a good site and one that will blow you away.

Proper XHTML structure for blogs and for Google!

One key principle governs the markup on every page of your site:

Your goal is to describe each page to the search engines through the use of hierarchical XHTML tags (<title>, <h1>, <h2>, <h3> etc.) and to present them in a logical, meaningful order.

Regarding blog architectures, there are five areas that we’re going to focus on, as seen in the picture below.

[Image: Semantic XHTML image guide]

Figure 1. We’re going to cover these 5 areas of semantic XHTML markup.

1. Title your pages the right way!

Page titles are the most important link between pure SEO and your human readers. Although their impact on your site’s pages may appear minimal, their impact in the search engines is undeniable. As you can see in the image below, Google pulls the contents of your <title> tag and links it as the most prominent piece of information in your search result.

[Image: Search engine results page]

Figure 2. Your <title> tags are served in the SERPs, so you’d better make them count!

Although some WordPress themes handle page titles gracefully, many are constructed in a way that doesn’t make sense when viewed within the context of the SERPs. For instance, the ubiquitous Kubrick theme, which comes pre-installed with WordPress, has page titles that are constructed like so:

Blog name » Post title

With this structure, all of the search engine results for your site’s pages would be prefaced by the title of your blog. This may not seem so bad, but you need to view this from the perspective of the average search engine user: do they care what your site’s name is when they’re searching for something that interests them? Absolutely not.

Keep in mind, too, that users scan content rather than reading it (especially true for the SERPs), so you need to provide them with as much value and as little fluff as possible.
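As a rough illustration of the difference, the two alternative <title> tags below contrast the Kubrick-style ordering with one that leads with the post title. "Blog name" and "Post title" are placeholders, and keeping the blog name at the end (rather than dropping it) is an assumption made for the sake of the example, not a rule. A page would use only one of these.

```html
<!-- Kubrick-style: the blog name leads every search result -->
<title>Blog name &raquo; Post title</title>

<!-- Alternative sketch: the post title leads; the trailing blog name is optional -->
<title>Post title &raquo; Blog name</title>
```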

Want to fix your titles? Check out my article on how to add dynamic, search engine friendly titles to your WordPress blog.

2. How to code up your logo and tagline

This is the second most common problem that I see in WordPress themes and Web sites in general (I’m even guilty of this one). All too often, site logos are served inside <h1> tags. Countless WordPress themes are guilty of this markup misdemeanor, so odds are extremely good that your site is currently suffering from a bad case of logo egomania. Here’s why it’s a problem.

Besides the <title> tag, the <h1> tag is supposed to tell both Google and users exactly what they can expect to find on the current Web page. In addition, search engines assign a hierarchical rank to the different headline markup tags, and except for the <title>, the <h1> tag is the most powerful piece of information you can serve to the search engines about a particular page.

Let’s look at this very site as an example. For months, I’ve served “Pearsonified” within <h1> tags, so this means that every page of my site appears to be primarily about Pearsonified, and secondarily about whatever topic the page is truly about.

How bass ackwards is that?

The cardinal rule here is that your blog title is not nearly as important as it’s marked up to be (I know, I’m clever), and those <h1> tags ought to be reserved for more specific information about the individual pages of your site.

The solution? Try serving your blog’s title inside a <div> instead.

Oh, and what about your tagline? Ideally, your tagline should be laser-focused on your unique value proposition, the primary subject of your Web site. This is a classic case of “do as I say, not as I do” because my own ridiculous tagline is “Best Damn Blog on the Planet.” The irony here is that if this were actually true, then that wouldn’t be my tagline! Ah well… live and learn.

So, back to you—what to do with that tagline of yours? I recommend serving your laser-focused tagline inside <h1> tags on your home page, and on interior pages, you should serve it inside <h2> or <h3> tags so that it doesn’t appear more important than the actual page/post title.
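Here is one way the header markup might look under these recommendations. The id values (header, logo, tagline) and the placeholder text are hypothetical; only the choice of <div>, <h1>, and <h2> elements comes from the advice above. The two fragments belong to different pages.

```html
<!-- Home page: blog title in a <div>, tagline as the page's single <h1> -->
<div id="header">
  <div id="logo"><a href="/">Blog name</a></div>
  <h1 id="tagline">Your laser-focused tagline</h1>
</div>

<!-- Interior page: tagline drops to <h2> (or <h3>) so the post title can take the <h1> -->
<div id="header">
  <div id="logo"><a href="/">Blog name</a></div>
  <h2 id="tagline">Your laser-focused tagline</h2>
</div>
```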

3. Serve your post titles inside <h1> tags!

If logo egomania is the second most common problem I’ve seen in WordPress themes, then post titles being served inside <h2> tags (or worse) is far and away the biggest markup mistake.

I’ve hinted at it already, but it bears repeating here: the post title is the single most important piece of information you can serve to the search engines about an individual page. Ideally, your post title should give a clear indication of what people can expect to find within the content of a particular Web page, and as a result, it should be featured as prominently as possible within your markup.

Of course, the best way to do this is to serve your post title inside <h1> tags. Oh, and to be completely clear, you should only have one set of <h1> tags on any given Web page, so make them count!
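A minimal sketch of a single-post page body under this rule might look like the following. The content id, the post class, and the text are placeholders chosen for illustration; the point is simply that the post title is the only <h1> on the page.

```html
<div id="content">
  <div class="post">
    <!-- The only <h1> on the page: the post title -->
    <h1>Post title</h1>
    <p>Post content goes here.</p>
  </div>
</div>
```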

4. Use sub-headlines within posts to your advantage

Breaking up your posts into sub-sections is a great idea both stylistically and for reader comprehension. The most common way to delineate these sub-sections is through the use of sub-headlines, but the problem is that there are many different ways you could go about doing this.

Fortunately, many WordPress themes come with pre-formatted styles for sub-headlines, and if you look, you’ll find that <h3> and <h4> tags are the most popular choices. Personally, I’ve been using <h3> tags for well over a year, but I hadn’t ever given it much thought until I decided to write this guide.

Really, if you serve your post title within <h1> tags, then it stands to reason that your sub-headlines ought to be highly focused, relevant, and served inside <h2> tags. Under this setup, your sub-headlines support your post title in the most powerful way possible while still maintaining the hierarchy of semantic markup.
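As a sketch, a post marked up this way could look like the fragment below. The post class and all of the text are placeholders; the only thing being illustrated is that the sub-headlines sit exactly one level below the post title.

```html
<div class="post">
  <h1>Post title</h1>
  <p>Introductory paragraph.</p>

  <!-- Sub-headlines sit one level below the post title -->
  <h2>First sub-headline</h2>
  <p>Content for the first sub-section.</p>

  <h2>Second sub-headline</h2>
  <p>Content for the second sub-section.</p>
</div>
```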

If you’ve been in the habit of using <h3> tags for sub-headlines, it may be too much trouble to change at this point, and in all honesty, you probably wouldn’t see much difference anyway. The only time I would ever “highly recommend” a change like this is if you were trying to rank for a term that is ridiculously competitive.

Then again, I’m a big fan of doing everything you possibly can to position yourself for future success…

5. Sidebar headlines? A la carte

Unfortunately, the WordPress-recommended sidebar architecture has sidebar headlines served inside <h2> tags. Semantically, this is ridiculous.

Take a look at this site, for instance. My sidebar headlines are “The Latest Articles,” “Must Reads,” and “Improve Your Blog.” While that third headline carries a bit of meaning, the other two are useless, at least as far as search engines are concerned. The bottom line here is that while sidebar items can add some value to a page, they can’t (and shouldn’t) touch the main content area with regard to overall value on a page.

Therefore, you shouldn’t serve sidebar headlines inside high and mighty <h2> tags. Based on everything we’ve covered so far, you should serve them inside <h3> or <h4> tags at the most.

For the record, if your sidebar headlines are tightly focused around your primary subject matter, then serving them inside <h3> tags is a great idea because they will carry as much weight and add as much value as possible. If you’ve got sidebar headlines like mine, though, relegate them to <h4> tags or a comparable element that won’t give them so much weight.
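To make the distinction concrete, a sidebar following this advice might be sketched as below. The sidebar id and the link text are hypothetical, and treating “Improve Your Blog” as the tightly focused headline is an assumption for the example, based on it being the one that carries some meaning.

```html
<div id="sidebar">
  <!-- Headline tied to the site's primary subject: <h3> -->
  <h3>Improve Your Blog</h3>
  <ul>
    <li><a href="#">Article link</a></li>
  </ul>

  <!-- Generic navigation label with little meaning for search engines: <h4> -->
  <h4>The Latest Articles</h4>
  <ul>
    <li><a href="#">Article link</a></li>
  </ul>
</div>
```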

The bottom line

Before I began Celebrity Hack in February of 2007, I was in the habit of “loosely following” semantic markup principles on my sites. I had always met with reasonable success in the search engines, so I had no reason to suspect that things could improve if I tightened my markup belt, so to speak.

Operating in a highly competitive niche like celebrity gossip forced me to take a strict look at things that truly work and afford me a competitive advantage, and as a result, my thoughts on semantic markup have changed entirely.

If you’re truly interested in running your Web site at full throttle, then it will serve you well to understand the principles of semantic markup and apply them as best you can.

This article has been translated into Belorussian—thanks, Patricia!