Sitemap.xml generator engine for multi-language EPiServer 7.5 websites

A sitemap.xml file is a useful tool for helping search engines find relevant pages on your website. This sitemap generator engine for EPiServer is built on loosely coupled strategies and supports globalized, multilingual content; in other words, mysite.se/sitemap.xml and mysite.no/sitemap.xml will return Swedish and Norwegian URLs respectively. The strategy-based pattern makes it easy to add custom entries of any kind to the XML file: apart from the supplied sample EPiServer PageTreeStrategy, you may also want a strategy for virtual pages that are not part of the page tree, or for other types of pages. The source code for this sitemap engine is available at GitHub (note that this link leads to the first version of the sitemap generator).

NOTE: An updated version supporting sitemapindex, bundling and batching is available here: Updated sitemap.xml generator functionality with bundle and batch support using sitemapindex

Originally, I got the ideas behind this sitemap.xml generator engine from a solution developed a couple of years back. According to the Git history, it was mostly written by a former and a current colleague of mine, Patrik Akselsson and Karl Ahlin. Later it became a hobby project of mine to create a new engine that would support globalized content on multi-language EPiServer 7.5 websites. The project was put on ice until last week, when I finally got a chance to complete it while implementing it on a customer’s website together with my colleague Robin Helly.

Accessing the sitemap.xml file in different languages

The /sitemap.xml route is set up in the SitemapController using MVC 5 attribute routing (so yes, you will need MVC 5). For this to work, you need to call routes.MapMvcAttributeRoutes() in the RegisterRoutes override in your Global.asax.cs file, if you haven’t already.

Global.asax.cs

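using System.Web.Mvc;     // MapMvcAttributeRoutes extension method (MVC 5)
using System.Web.Routing; // RouteCollection

protected override void RegisterRoutes(RouteCollection routes)
{
  base.RegisterRoutes(routes);
  routes.MapMvcAttributeRoutes(); // enables [Route] attributes, e.g. on SitemapController
}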

The IETF language tag is determined by comparing the incoming request host with the host name/culture mappings defined in EPiServer admin mode. So for requests to mysite.no/sitemap.xml, the engine finds the matching culture mapping and returns all URLs connected to the no IETF language tag.

The host name culture mapping in EPiServer 7.5 admin mode.

If there is no mapping for the incoming request host, you will get an empty sitemap with no URL entries; there is deliberately no fallback built in, as an empty sitemap is the most logical result. Hosts with non-default ports, such as localhost:1234, are also supported.
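To make the lookup concrete, here is a minimal C# sketch of resolving the language from the request host. It assumes the host name/culture bindings have already been loaded into a plain dictionary; the HostCultureResolver class and its dictionary are hypothetical stand-ins, not the engine's actual code or an EPiServer API. Using Uri.Authority keeps a non-default port such as localhost:1234 in the lookup key, and a missing mapping yields null, matching the no-fallback behavior described above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical resolver: the dictionary stands in for the host name/culture
// bindings from EPiServer admin mode, which the real engine reads from EPiServer.
public class HostCultureResolver
{
    private readonly Dictionary<string, string> _hostToLanguage;

    public HostCultureResolver(IDictionary<string, string> hostToLanguage)
    {
        // Host names are case-insensitive, so compare them that way.
        _hostToLanguage = new Dictionary<string, string>(hostToLanguage, StringComparer.OrdinalIgnoreCase);
    }

    // Returns the IETF language tag for the request host, or null when no
    // mapping exists (the caller then renders an empty sitemap; no fallback).
    public string ResolveLanguage(Uri requestUrl)
    {
        // Uri.Authority keeps non-default ports ("localhost:1234") and
        // omits default ones ("mysite.no" for http on port 80).
        string language;
        return _hostToLanguage.TryGetValue(requestUrl.Authority, out language) ? language : null;
    }
}
```

For example, with bindings for mysite.se ("sv") and mysite.no ("no"), a request to http://mysite.dk/sitemap.xml resolves to null and produces an empty sitemap.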

Extending the sitemap.xml using the strategy-based pattern

The strategy-based pattern is really straightforward. It involves implementing concrete classes of an interface that declares the methods to be used; in this case, only one. For the sitemap generator engine, StructureMap then finds all of these classes and returns them in an IEnumerable<ISitemapStrategy> collection for the repository class to handle.

ISitemapStrategy.cs

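using System;

// Implemented by every class that contributes entries to sitemap.xml.
public interface ISitemapStrategy
{
  void ForEach(Action<SitemapEntry> add); // calls add once per entry to include
}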
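The following sketch shows how the pieces fit together, assuming a simplified SitemapEntry with only Location and Language properties (the real class's members are not shown in this post). It repeats the interface so the example compiles on its own; VirtualPagesStrategy and SitemapCollector are hypothetical names, the latter standing in for the repository class that receives the IEnumerable<ISitemapStrategy>.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified entry; the real SitemapEntry may carry more fields.
public class SitemapEntry
{
    public string Location { get; set; }
    public string Language { get; set; }
}

public interface ISitemapStrategy
{
    void ForEach(Action<SitemapEntry> add);
}

// Hypothetical custom strategy for virtual pages outside the page tree.
public class VirtualPagesStrategy : ISitemapStrategy
{
    public void ForEach(Action<SitemapEntry> add)
    {
        add(new SitemapEntry { Location = "http://mysite.se/search", Language = "sv" });
        add(new SitemapEntry { Location = "http://mysite.no/search", Language = "no" });
    }
}

// Hypothetical stand-in for the repository: collects entries from every strategy.
// In the engine, StructureMap supplies the strategies; here they are passed in by hand.
public class SitemapCollector
{
    private readonly IEnumerable<ISitemapStrategy> _strategies;

    public SitemapCollector(IEnumerable<ISitemapStrategy> strategies)
    {
        _strategies = strategies;
    }

    public List<SitemapEntry> CollectAll()
    {
        var entries = new List<SitemapEntry>();
        foreach (var strategy in _strategies)
        {
            // Each strategy pushes its entries through the callback.
            strategy.ForEach(entries.Add);
        }
        return entries;
    }
}
```

With StructureMap, the strategy list is typically built by an assembly scan that calls AddAllTypesOf<ISitemapStrategy>(), so, assuming the scan covers your assembly, a new strategy class is picked up without touching the collector.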

The included sample strategy, PageTreeStrategy, uses EPiServer’s IContentLoader to recursively walk the page tree for each language defined in the host name/culture mappings in EPiServer’s admin mode. (Generating the sitemap entries for approximately 5,000 pages took about 8 to 9 seconds on my machine; see the section on the EPiServer job below.) For each page, the strategy examines a number of conditions to determine whether its link should be included in the sitemap. For instance, if you decorate a PageType class with the [ExcludeFromSitemap] attribute, this strategy will exclude pages of that type from the sitemap.xml. It will also ignore pages that are not published or not accessible by everyone.

Should you want to, you could also add an EPiServer dynamic property to be able to exclude specific page instances from the sitemap; see the commented section in PageTreeStrategy.cs. If you go with a checkbox, you may want to use something like the checkbox wrapper described in Custom property checkbox wrapper as dynamic property in EPiServer 7.5, as there are some issues with overriding the values of a normal checkbox on subpages when it is used as a dynamic property.
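The page-level inclusion rules can be sketched without EPiServer as a depth-first walk. PageNode, ArticlePage, SearchResultPage and PageTreeWalker are hypothetical; IsPublished and IsPublic stand in for EPiServer's publishing status and access-rights checks, which this sketch does not reproduce. The post does not say whether the real PageTreeStrategy still visits the children of an excluded page; this sketch assumes it does.

```csharp
using System;
using System.Collections.Generic;

[AttributeUsage(AttributeTargets.Class, Inherited = false)]
public class ExcludeFromSitemapAttribute : Attribute { }

// Hypothetical, simplified page model standing in for EPiServer PageData.
public class PageNode
{
    public string Url { get; set; }
    public bool IsPublished { get; set; }
    public bool IsPublic { get; set; } // assumption: "accessible by everyone"
    public List<PageNode> Children { get; } = new List<PageNode>();
}

public class ArticlePage : PageNode { }

[ExcludeFromSitemap]
public class SearchResultPage : PageNode { }

public static class PageTreeWalker
{
    // Depth-first walk; add plays the role of the ISitemapStrategy callback.
    public static void Walk(PageNode page, Action<string> add)
    {
        if (ShouldInclude(page))
            add(page.Url);

        // Assumption: children of an excluded page are still visited.
        foreach (var child in page.Children)
            Walk(child, add);
    }

    private static bool ShouldInclude(PageNode page)
    {
        // Page types decorated with [ExcludeFromSitemap] are skipped.
        bool excludedType = page.GetType().IsDefined(typeof(ExcludeFromSitemapAttribute), false);
        return !excludedType && page.IsPublished && page.IsPublic;
    }
}
```

In a tree with a published root, an excluded SearchResultPage and an unpublished draft, only the root and the published pages below the search page end up in the output.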

Refreshing the sitemap.xml with a scheduled EPiServer generator job

The SitemapGeneratorJob basically does two things. First, it rebuilds the sitemap and persists all entries in EPiServer’s DynamicDataStore (you could probably use a custom database instead should you want to; at my current client’s we use Fluent NHibernate). Then it refreshes the cache for each of the languages defined in the host name/culture bindings in EPiServer admin mode. The cache uses EPiServer’s cache manager together with a temporary storage, meaning that cache handling will be the same as EPiServer’s own in a multi-server, load-balanced environment. In addition, the temporary storage previously implemented by Robin Helly ensures that the old cache is used until the new one has been generated and is ready to seamlessly take its place. Since there will always be a cached sitemap ready for search engine crawlers, it does not really matter if the generator job takes 9 seconds to run.
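The swap behavior of the temporary storage can be illustrated with a minimal, single-process sketch: the new sitemap is built completely before it replaces the cached one, so readers keep getting the old version during a rebuild. SitemapCache is a hypothetical class built on ConcurrentDictionary; it is not EPiServer's cache manager and does not synchronize anything between load-balanced servers.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical cache illustrating "serve the old sitemap until the new one is ready".
// Not EPiServer's cache manager; single process only.
public class SitemapCache
{
    private readonly ConcurrentDictionary<string, string> _byLanguage =
        new ConcurrentDictionary<string, string>(StringComparer.OrdinalIgnoreCase);

    // Builds the new document first and only then replaces the cached one,
    // so concurrent readers never see a missing or half-built sitemap.
    public void Refresh(string language, Func<string, string> buildSitemap)
    {
        string fresh = buildSitemap(language); // may take several seconds
        _byLanguage[language] = fresh;         // thread-safe replacement
    }

    // Returns the cached sitemap for the language, or null if none exists yet.
    public string Get(string language)
    {
        string xml;
        return _byLanguage.TryGetValue(language, out xml) ? xml : null;
    }
}
```

A scheduled job would call Refresh once per configured language, while requests for /sitemap.xml only ever call Get.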

All the language-specific sitemaps are streamed from the cache rather than served from physical files, which makes this solution a lot faster.
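As an illustration of streaming instead of serving a physical file, this sketch writes a sitemaps.org urlset directly to any Stream, such as an HTTP response stream or a MemoryStream whose bytes are then cached. SitemapXmlWriter is a hypothetical helper, not part of the engine; the element names and namespace follow the sitemaps.org protocol, and optional elements such as lastmod are left out.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper: writes a sitemaps.org <urlset> straight to a stream.
public static class SitemapXmlWriter
{
    private const string Ns = "http://www.sitemaps.org/schemas/sitemap/0.9";

    public static void Write(Stream output, IEnumerable<string> urls)
    {
        // UTF-8 without BOM; CloseOutput defaults to false, so the caller's stream stays open.
        var settings = new XmlWriterSettings { Encoding = new UTF8Encoding(false), Indent = false };
        using (var writer = XmlWriter.Create(output, settings))
        {
            writer.WriteStartDocument();
            writer.WriteStartElement("urlset", Ns); // declares the default namespace
            foreach (var url in urls)
            {
                writer.WriteStartElement("url", Ns);
                writer.WriteElementString("loc", Ns, url); // characters like & are escaped
                writer.WriteEndElement();
            }
            writer.WriteEndElement();
            writer.WriteEndDocument();
        }
    }
}
```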