Overview of the Canonical Tag

By now you’ve probably heard of a new “canonical tag” supported by the major search engines. You may have also heard that it’s an important new tool for developers and SEO experts. What you may not know is what it does and why it might be important to your site. This article will attempt to answer those questions.

First and foremost, let’s look at what the canonical tag actually is. The first thing to realize is that it isn’t actually a tag. It’s a new value for an attribute of a tag that’s been in use for quite some time: <link>. You’re probably familiar with this tag as the way to attach an external stylesheet, so that several pages can share the same CSS styles. In fact, the purpose of the link tag is to define relationships between documents in just that way. The exact type of relationship is specified by the “rel” attribute of the tag. Up to now, the only really practical use for the tag was to link stylesheets to documents.
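To make the role of the “rel” attribute concrete, here is a minimal Python sketch that builds link tags as strings. The helper name link_tag and the stylesheet URL are assumptions made up for illustration; only the attribute values “stylesheet” and “canonical” come from the discussion above.

```python
from html import escape


def link_tag(rel: str, href: str) -> str:
    """Return a <link> element declaring a `rel` relationship to `href`.

    Hypothetical helper for illustration; attribute values are escaped so
    the result is safe to place in a page's <head>.
    """
    return f'<link rel="{escape(rel, quote=True)}" href="{escape(href, quote=True)}" />'


# The familiar use: attach a shared stylesheet (this URL is made up).
stylesheet = link_tag("stylesheet", "http://myplace.com/styles.css")

# The new use: name the preferred URL for this page's content.
canonical = link_tag("canonical", "http://myplace.com/index.html")

print(stylesheet)
print(canonical)
```

The two tags differ only in the “rel” value and the target, which is the whole point: the same element expresses a different relationship.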

So, now we have a shiny new value for that attribute: canonical. What good is it? Well, it lets us tell the search engines that a specific URL is a duplicate of another page, and which of the two is the preferred one. That’s important because a search engine gains nothing from indexing pages with duplicate content. Having several pages indexed with identical or very similar content will probably harm your search engine ranking, and it’s extremely annoying for users to click on multiple search results only to end up on pages with the same content. By stating the canonical relationship in the link tag of the duplicate page, we ask the search engines to index the other page instead of this one, because the two are really the same. We’re also informing the search engine that the other page is the one to use for calculating rank. Keep in mind that the search engines treat this as a strong hint rather than a command.
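As a rough illustration of what a crawler sees, the following Python sketch reads a page and reports the URL named by its canonical link tag, if any. It uses only the standard library’s html.parser; the class and function names and the sample page are assumptions for this example, and this is not how any particular search engine is implemented.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link> whose rel includes 'canonical'."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Only <link> elements matter, and only the first canonical one.
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated values and may be missing.
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr.get("href")


def find_canonical(html):
    """Return the canonical URL declared in `html`, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# A made-up duplicate page that points at the preferred URL.
page = (
    '<html><head>'
    '<link rel="stylesheet" href="styles.css">'
    '<link rel="canonical" href="http://myplace.com/index.html">'
    '</head><body>Welcome</body></html>'
)
print(find_canonical(page))
```

A page with no canonical link tag simply yields None, which corresponds to the search engine falling back on its own judgment.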

There are instances where it may be to your advantage to use duplicate pages. For instance, some sites track where their visitors come from by creating and linking multiple index pages, often referred to as “gateway” pages, such as: “http://myplace.com/google_index.html” or “http://myplace.com/index.php?se=google”. In this case, the content of the pages will probably be identical to the index.html or index.php page, which is the page we would point to in the link tag. In other instances, the duplicate pages may simply be unavoidable, as with a dynamically generated menu that passes options in a query string, for example: “http://myplace.com/tshirt?size=L”. There may also be a need for pages with the same content in different formats, as in a “printer-friendly” version of a page. A session ID added to the query string can likewise produce many URLs for the same page. In all these cases, adding the canonical relationship will let the search engine know which URL to index.
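For the query-string cases above, the canonical URL can often be worked out mechanically by dropping the parameters that don’t change the content. Here is a minimal sketch, assuming the parameter names “se”, “size” and “sessionid” are the ones to ignore; that list, like the function name, is an assumption for this example, and every site would need its own.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed for illustration: query parameters that do not change page content.
IGNORED_PARAMS = {"se", "size", "sessionid"}


def canonical_url(url):
    """Return `url` with the ignored query parameters and any fragment removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


for variant in (
    "http://myplace.com/index.php?se=google",
    "http://myplace.com/tshirt?size=L",
    "http://myplace.com/tshirt?color=red&sessionid=abc123",
):
    print(variant, "->", canonical_url(variant))
```

The result is the value that would go in the href of the canonical link tag on each variant. A gateway page with its own file name, such as google_index.html, can’t be handled this way and would need an explicit mapping to its canonical page.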

The canonical relationship should prove to be a very useful tool for developers optimizing their sites. Google, Yahoo and Live have not only embraced it, but have also published statements of support and thorough explanations of how their engines will index pages that use it. All in all, it should bring leaner, cleaner search results and more accurate ranking to the Web.
