2013

Sitecore preview mode and loading jQuery

October 7, 2013

Sitecore

It seems like most folks these days are using jQuery on their sites. Sitecore, in fact, uses jQuery for the page editor UI. I discovered an interesting case where it can conflict with your own jQuery version that I thought I’d share.

The scenario where it happens is this:

You are in preview, page editor, or debugger mode (any mode where the webedit toolbar renders on the site)
You are loading your jQuery library in the page header and not the footer (note: you should load it in the footer if it’s at all possible)

jQuery has a feature called noConflict that is designed to allow you to isolate it from other libraries, such as Prototype, that also claim the “$” global variable. The feature can also be used to run two versions of jQuery on the same page, and Sitecore in fact does exactly this. However, the case that noConflict does not prevent is one where a second copy of jQuery is loaded on top of an existing one.

What happens here is that your jQuery loads in the head, then Sitecore loads its jQuery and calls noConflict on it. Unfortunately, loading Sitecore’s jQuery overwrites both globals, and a plain noConflict() call restores only $, resulting in this mess:

jQuery = Sitecore jQuery (1.5.5 for SC7 Update-1, pretty old)
$ = your jQuery

Now load a jQuery plugin that uses an IIFE to load itself like:

(function($) {
    // plugin load code here
})(jQuery);

Yeah that’s right, the plugin invokes its function passing in the global jQuery, which is aliased as $ inside the function. That means the plugin just attached itself to Sitecore’s jQuery, not yours! Now if you invoke the plugin through your own jQuery, you’ll see that it’s undefined!

Fortunately it’s relatively easy to manually fix this issue. You simply need to:

Save your jQuery variable immediately after you load it:
<script>var my_jquery = jQuery;</script>
Define the webedit placeholder explicitly on your layout, so you can control where Sitecore will load its version of jQuery:
<sc:Placeholder runat="server" Key="webedit"/>
Set the global jQuery variable back to your saved jQuery immediately after the webedit placeholder definition:
<script>jQuery = my_jquery;</script>
It’s safe to let this code run even when not in preview mode, as it will simply have no effect if Sitecore does not load its jQuery. Of course you can also render it only when !Sitecore.Context.PageMode.IsNormal as well if you want clean output.

If you can put your copy of jQuery in the footer, you don’t need to resort to any of this hackery. Hope this helps someone :)

Upgrading Sitecore's Password Security with PBKDF2

September 1, 2013

Sitecore, Crypto

The problem

A few months ago I read a very interesting article on Ars Technica titled How I became a password cracker. Seriously, go read it. These days it’s astonishing how simple it is to brute force even long, random-looking passwords. That 1337-5p33k p@55w0rd? You might as well use dictionary words because there are rules that check for that. Using a passphrase of English words (like correct horse battery staple)? Well it’s easy to test for combinations of dictionary words too, so those are a lot less secure than they look.

Mitigation

The core problem is that most passwords are stored using hash algorithms such as SHA and MD5 that were originally designed to verify file integrity - so speed was a primary concern. If a hash algorithm is fast, so is brute-forcing passwords that use it. Modern GPUs can compute hashes obscenely fast (spend a bit and you can hit 1 billion hashes per second), so a good way to thwart attackers is simply to use a hash algorithm that is slow. Several algorithms exist that are designed to be utilized specifically for passwords (such as PBKDF2 and BCrypt), and they are all much slower to compute for this very reason.

These algorithms also usually have a “work factor” that allows you to adjust the time it takes to compute the hash, which provides even more future-proofing against even faster password cracking hardware that is yet to be developed.
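To make the work factor concrete, here is a minimal sketch using the .NET framework's Rfc2898DeriveBytes class, which implements PBKDF2. The iteration counts, the 16-byte salt, the 32-byte output and the choice of HMAC-SHA256 are illustrative assumptions, not recommendations from this post; the constructor overload taking a HashAlgorithmName needs .NET Framework 4.7.2 or later (or .NET Core 2.0 or later).

```csharp
using System;
using System.Diagnostics;
using System.Security.Cryptography;

public static class WorkFactorDemo
{
    // Derives a 32-byte PBKDF2 hash. The iteration count is the "work factor":
    // raising it makes every guess proportionally more expensive for an attacker.
    public static byte[] Hash(string password, byte[] salt, int iterations)
    {
        using (var kdf = new Rfc2898DeriveBytes(password, salt, iterations, HashAlgorithmName.SHA256))
        {
            return kdf.GetBytes(32);
        }
    }

    public static void Main()
    {
        // Random per-user salt (16 bytes is an assumption for this sketch)
        var salt = new byte[16];
        using (var rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(salt);
        }

        // Time the same password at increasing work factors
        foreach (int iterations in new[] { 1000, 10000, 100000 })
        {
            var timer = Stopwatch.StartNew();
            Hash("correct horse battery staple", salt, iterations);
            Console.WriteLine("{0,7} iterations: {1} ms", iterations, timer.ElapsedMilliseconds);
        }
    }
}
```

The timings depend entirely on your hardware, but they should grow roughly linearly with the iteration count, which is exactly the knob you turn as cracking hardware gets faster.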

When would this affect Sitecore?

To be clear, this is not a remotely-exploitable vulnerability. It’s also not a Sitecore-specific vulnerability - it affects nearly all sites running ASP.NET’s Membership provider, which until the very recent Universal Providers defaulted to the SHA1 hash algorithm to store passwords. To exploit what I’m describing here, an attacker would first have to gain access to your databases, and would then use an offline password cracker like oclHashcat-plus to break the password hashes and log in to Sitecore.

Using PBKDF2 with Sitecore

There’s a class in the .NET framework that implements the PBKDF2 algorithm already, but it does so in a way that makes it difficult to add to a membership provider. There’s a very nice NuGet package called Zetetic.Security that wraps this functionality into a KeyedHashAlgorithm that can be used easily with membership. It’s very easy to install PBKDF2 into a membership provider using it - you have to install the NuGet package and change two configurations.

The only problem is that existing users will be unable to log in, because the existing hashes are all in SHA1 (the default algorithm for both Sitecore and membership) and the membership provider now expects PBKDF2 hashes. If there are only a few users you can just reset passwords and be done with it. But if you have an existing userbase, you probably wish you could allow existing users to keep signing in, and new users (or password changes) to convert to PBKDF2. I’ve implemented a prototype extension of the SqlMembershipProvider that does exactly this by allowing the ValidateUser method to attempt SHA1 if PBKDF2 validation fails.
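The fallback idea can be sketched outside of a membership provider. This is not the prototype mentioned above and it does not touch SqlMembershipProvider; the FallbackValidator class, its parameters and the stored-hash layout are all hypothetical. In particular, the legacy format shown here (Base64 of SHA1 over the salt bytes followed by the UTF-16 password bytes) is an assumption about how the stock provider encodes hashed passwords.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class FallbackValidator
{
    // Hypothetical parameters, for illustration only
    private const int Iterations = 10000;
    private const int HashBytes = 32;

    public static string HashPbkdf2(string password, byte[] salt)
    {
        using (var kdf = new Rfc2898DeriveBytes(password, salt, Iterations, HashAlgorithmName.SHA256))
        {
            return Convert.ToBase64String(kdf.GetBytes(HashBytes));
        }
    }

    // Assumed legacy format: Base64(SHA1(salt bytes + UTF-16 password bytes))
    public static string HashLegacySha1(string password, byte[] salt)
    {
        byte[] passwordBytes = Encoding.Unicode.GetBytes(password);
        byte[] input = new byte[salt.Length + passwordBytes.Length];
        Buffer.BlockCopy(salt, 0, input, 0, salt.Length);
        Buffer.BlockCopy(passwordBytes, 0, input, salt.Length, passwordBytes.Length);
        using (var sha1 = SHA1.Create())
        {
            return Convert.ToBase64String(sha1.ComputeHash(input));
        }
    }

    // Tries the new algorithm first and only falls back to SHA1 if that fails
    public static bool Validate(string password, byte[] salt, string storedHash)
    {
        return SlowEquals(HashPbkdf2(password, salt), storedHash)
            || SlowEquals(HashLegacySha1(password, salt), storedHash);
    }

    // Compares every character so the time taken does not reveal where the mismatch is
    private static bool SlowEquals(string a, string b)
    {
        int diff = a.Length ^ b.Length;
        for (int i = 0; i < a.Length && i < b.Length; i++)
        {
            diff |= a[i] ^ b[i];
        }
        return diff == 0;
    }
}
```

With this shape, a user whose stored hash is still SHA1 keeps signing in, while any hash written after the switch is PBKDF2 and validates on the first attempt.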

Other uses

The solution discussed in this post was tested against Sitecore, but is a generic method you could use with any .NET app that uses SQL Membership to store user info. I hope this random crypto rant has been useful to you :)

Editing hidden template fields in the page editor

August 14, 2013

Sitecore

Recently on a project I came across a need to allow editors who were using the Sitecore Page Editor to modify some fields that had no logical place on the actual editor screen. In this case, the fields were relating to page SEO - meta keywords and description among other things. Since these are in the head of the page, there’s no good place to stick an edit frame or custom experience button - they aren’t even part of a rendering placed on the page.

While poking around the WebEdit customization options, I hit on the idea of adding a WebEdit ribbon button that would act as a custom experience button would, and load a subset of the item fields in a content editor window. It turns out this was actually relatively easy.

In the Core database, the default WebEdit ribbon is defined under /sitecore/content/Applications/WebEdit/Ribbons/WebEdit/Page Editor. Under this item are items for each ‘chunk’ in the ribbon, and under that each command button, like this:

To implement your own command, you just need to add a Small Button or Large Button item underneath a Chunk - new or existing - to place your button on the ribbon. The “SEO” chunk in the above screenshot is an example of what you want to end up with.

The key piece of all of this is how to have your button actually load the field editor, and telling it what fields to load. You do this by configuring the Click field on the button to run the webedit:fieldeditor command like so:

webedit:fieldeditor(fields=Browser Title|Description|Keywords, command={007A0E9E-59AA-48BB-84F2-6D25A8D2EF80})

The fields argument is pretty easy to understand - a pipe delimited list of fields to edit, just like a WebEdit Custom Experience Button has (this command is in fact the one used by custom experience buttons). I am not sure what the command parameter is used for. Internally it seems to need to be the ID of an item that exists in the core db - I used the GUID of my button item, and that worked fine.

And here’s what you end up with when you’re done, for a very small amount of work:

This worked for me on Sitecore 7 (RTM), but I suspect the technique would work great on earlier versions of Sitecore as well.

A subtle error: using System.IO.Path in an HTTP context

June 15, 2013

Sitecore

Here’s a fun little issue that you might eventually run across. Suppose you’re writing some code that needs to retrieve the extension from a URL. For example, a URL rewriter or Sitecore pipeline that acts only on certain file types.

You might think, as I would have, “oh, that’s built in to .NET - we’ll just use System.IO.Path.GetExtension()!”

And that would work, almost all of the time. The only issue comes from an internal implementation detail of IO.Path: it checks that the path does not contain characters that are invalid in a filesystem path. Specifically, “, <, >, |, and ASCII unprintables (< 32). Well, those characters (except perhaps the unprintables) are valid in a URL - so trying to get the extension of a URL path containing these characters will throw a nice fat exception:

System.ArgumentException: Illegal characters in path.
   at System.IO.Path.CheckInvalidPathChars(String path)
   at System.IO.Path.GetExtension(String path)

Unfortunately there is no way to disable this behavior - which is logical, given the purpose of IO.Path as a processor for file system paths, not URLs. Not even first parsing the URL using the System.Uri class will fix this, as this StackOverflow question suggests. The LocalPath property still includes the invalid characters that break Path.GetFileName() or Path.GetExtension().

There are a couple of ways I could see solving this problem. The first, and simplest (though possibly prone to security issues), would be to replicate what Path.GetExtension() does but omit the invalid characters check. Reference:

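// Mirrors the framework's Path.GetExtension() logic, minus the invalid-character check.
// The method name is made up for this example; the separator constants live on System.IO.Path.
static string GetExtensionUnchecked(string path)
{
    if (path == null)
        return null;

    int length = path.Length;
    for (int i = length; --i >= 0; )
    {
        char ch = path[i];
        if (ch == '.')
        {
            if (i != length - 1)
                return path.Substring(i, length - i);
            else
                return String.Empty;
        }
        // stop at the last path segment: a dot further left is not an extension
        if (ch == Path.DirectorySeparatorChar || ch == Path.AltDirectorySeparatorChar || ch == Path.VolumeSeparatorChar)
            break;
    }
    return String.Empty;
}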

The second would be to remove any invalid characters prior to calling Path.GetExtension():

string url = "\"http://mo\"nkeys/foo/bar/\"";

// requires System.IO and System.Collections.Generic
var invalidChars = new HashSet<char>(Path.GetInvalidPathChars());
for (int i = url.Length - 1; i >= 0; i--)
{
    // technically you could use a stringbuilder, but since 99% of the time this won't ever be used, this seems optimized enough
    if (invalidChars.Contains(url[i])) url = url.Remove(i, 1);
}

var ext = Path.GetExtension(url);

Personally I like the second solution, since you’d benefit from any upstream fixes in Path.GetExtension() in future framework releases.

Scoping searches to the current context in Sitecore 7's LINQ

May 10, 2013

Sitecore, Lucene

As a followup to my previous post about gotchas with Sitecore 7’s LINQ provider, here’s another thing to consider.

The indexing providers are very greedy about indexing. This means that unlike with traditional querying with Sitecore.Data.Item, where your results are automatically filtered by the context language and latest item version, no such filtering occurs with LINQ. You will receive all versions and all languages unless you specify otherwise.

As you might imagine, this can result in unexpectedly large quantities of search results in some cases. It can also be extra sneaky since during development you might only have one version in one language - so you wouldn’t even notice the issue.

So how do you fix the issue? First, let’s talk about versions. The default indexing configuration includes a field called _latestversion. This is a boolean field that is only set to true for items that are the latest version in their language. We can take advantage of this by implementing a property on our mapped objects and mapping it to this index field like so:

[IndexField("_latestversion")]
public bool IsLatestVersion { get; set; }

Then, when we write a query we want to limit, we simply add a clause to the query:

.Where(x => x.IsLatestVersion)
// alternatively if you don't want to be strongly typed and have an indexer, you can use
.Where(x => x["_latestversion"] == "1")

Now you’ll only get the latest version. Now for languages, which are also pretty simple. If you’re inheriting from the SearchResultItem class, you already have a Language property. Otherwise you can add one like so:

[IndexField("_language")]
public string Language { get; set; }

Then, we add the following clause to the query:

.Where(x => x.Language == Sitecore.Context.Language.Name)

Now we get results like regular queries. If you're like me, the next question you're asking is "how can I just write this once and forget about it?" For example something like:

public static IQueryable<T> GetFilteredQueryable<T>(this IProviderSearchContext context)
    where T : MyItemBaseClass
{
    return context.GetQueryable<T>().Where(x => x.Language == Sitecore.Context.Language.Name && x.IsLatestVersion);
}

Unfortunately, this seems to be nigh impossible with the current revision of Sitecore 7. The issue has to do with how LINQ resolves expressions involving generic types that are not resolved at compile time. Effectively the expression in the example above converts to:

.Where(x=> ((T)x).Language == Sitecore.Context.Language.Name)

Notice the cast to T? That throws the expression parser for a loop. I've been told this will be fixed in a later release of Sitecore 7, but will not be part of the RTM release, so for the moment it looks like we're writing filtering on each query.

Sitecore 7 LINQ gotchas

April 15, 2013

Sitecore, Lucene

The upcoming Sitecore 7 release brings with it a new “LINQ-to-provider” architecture that effectively allows you to query search indexes using standard LINQ syntax. Mark Stiles wrote a pretty good synopsis of the feature that you should probably read first if you’re unfamiliar with the basics of how it works. This post won’t cover the basics.

I’ve been diving in to the underpinnings of the LINQ architecture and have discovered a number of things that may well cause confusion when people start using this technology. Be warned that this post is based on a non-final release of Sitecore 7, and may well contain technical inaccuracies compared to the final release.

You have to enable field storage to get output

By default, the values of fields are not stored in the index. If the values are not stored, you can query against the index using custom object mapping (e.g. filters work), but you will not see any field values mapped into results objects. You can define field storage parameters either on a per-field basis, or a per-field-type (e.g. Single-Line Text) in the default index configuration file (in App_Config/Include).

Changes to the storage type require an index rebuild before the storage is reflected.

LINQ is not C Sharp

Yeah, you heard me right. LINQ may look exactly like C#, but it is not parsed the same way. The lambda expressions you may use to construct queries against Sitecore are compiled into an Abstract Syntax Tree (AST) - sort of a programmatic representation of the code forming the lambda - and that is in turn parsed by the Sitecore LINQ provider.

The code you write into lambdas is not executed as normal C# code is. This is important to remember, because effectively the LINQ provider is simply mapping your query in as simple terms as possible to a key-value query inside of Lucene (or SOLR, etc). For example:

// we'll use this as a complex property type to map into a lucene object
public class FieldType {
    public string Value { get; set; }
    public string SomeOtherValue { get; set; }
}
// this will be the class we'll query on in LINQ
public class MappedClass {
    [IndexField("field1")]
    public FieldType Field1 { get; set; }
}

// example queries (abbreviated code)
var query1 = context.GetQueryable<MappedClass>().Where(x=>x.Field1.Value == "foo");
var query2 = context.GetQueryable<MappedClass>().Where(x=>x.Field1.SomeOtherValue == "foo");

You’d expect query1 and query2 to be different wouldn’t you? NOPE. You’re not writing C# here, you’re writing an AST in C# syntax. The Sitecore LINQ engine takes the shortest path to a key-value query pair. What this really means is that it:

Resolves the Lucene field name of the field you're querying on (in this case, "field1")
Evaluates the operator you used, and the constant value you've assigned to compare to
Constructs a Lucene query expression based on that

In effect, you can only ever have one value queried for each property. In the example above both queries are in effect x.Field1 == "foo". The query would be that way even if you did a crazy query like x.Field1.Foo.Bar.Baz.Boink.StartsWith("foo") - that would boil down to x.Field1.StartsWith("foo").
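You can see why this happens by inspecting an expression tree yourself. The sketch below is not Sitecore's parser: RootMemberName is a hypothetical helper that does what a naive provider might do, walking the member chain back to the property hanging directly off the lambda parameter. Both example predicates resolve to the same property.

```csharp
using System;
using System.Linq.Expressions;

public class FieldType
{
    public string Value { get; set; }
    public string SomeOtherValue { get; set; }
}

public class MappedClass
{
    public FieldType Field1 { get; set; }
}

public static class ExpressionDemo
{
    // Hypothetical helper: finds the property directly on the lambda parameter,
    // discarding any nested members (x.Field1.Value -> Field1).
    public static string RootMemberName(Expression<Func<MappedClass, bool>> predicate)
    {
        // The lambda is never compiled or executed; we only read its structure
        var comparison = (BinaryExpression)predicate.Body;
        var member = (MemberExpression)comparison.Left;
        while (member.Expression is MemberExpression inner)
        {
            member = inner;
        }
        return member.Member.Name;
    }

    public static void Main()
    {
        Console.WriteLine(RootMemberName(x => x.Field1.Value == "foo"));          // Field1
        Console.WriteLine(RootMemberName(x => x.Field1.SomeOtherValue == "foo")); // Field1
    }
}
```

Because the provider only needs a field name, an operator and a constant, everything after the mapped property is noise as far as the resulting query is concerned.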

There is a facility you can tie into that controls how Sitecore converts types in queries (TypeConverters). Unfortunately, that does not solve the problem of disambiguating properties - the conversion only informs the engine how to convert the constant portion of the query (in this case, the string).

Sitecore LINQ does not care if the return entity type is mapped to a valid object for it

If you execute a query against a type, say the MappedClass type in the previous example, the LINQ layer will map all results of the query against the MappedClass type. Sounds great, but be careful - it will also map results that may not have the expected template to map to the MappedClass type onto it.

For example, suppose I made a model for the Sample Item template that comes with Sitecore. Then I queried the whole database as SampleItem. Out of my results, probably only two are really designed to be mapped to my SampleItem - the rest will have nulls everywhere. This is potentially problematic if you forget to be specific in your queries to limit the template ID to the expected one.

You must enumerate the results before the SearchContext is disposed

If you’ve ever dealt with NHibernate or other lazy-loaded ORMs, this might make perfect sense to you. A typical search query method might look something like this:

public IEnumerable<MappedClass> Foo()
{
    var index = SearchManager.GetIndex("sitecore_master_index");
    using (var context = index.CreateSearchContext(SearchSecurityOptions.EnableSecurityCheck)) {
        return context.GetQueryable<MappedClass>().Where(x=>x.Name == "foo");
    }
}

Can you guess the problem? The query doesn’t actually execute until you enumerate the values (e.g. run it through foreach, .ToArray(), etc). If you return the lazy IEnumerable<MappedClass>, it will not be enumerated until after the SearchContext has been disposed of. Which means that will throw an exception!

To avoid this problem you need to either:

Return a pre-enumerated object. Usually the simplest form of this is simply to return the query with .ToArray() at the end, thus enumerating the query into an array before the context is out of scope. Warning: This also means that you should filter as far as possible within the query, including paging, especially with large result sets.
Execute all the code within the context's scope. This is probably less desirable as it either means spaghetti code in your codebehind, or a request-level context manager like NHibernate sometimes does.
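The same trap can be reproduced without Sitecore. In this sketch FakeSearchContext is a made-up stand-in for a search context: its iterator refuses to produce results once the context has been disposed, which mimics what happens when a lazy query outlives its using block.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up stand-in for a search context; not a Sitecore type
public sealed class FakeSearchContext : IDisposable
{
    private bool _disposed;

    // Iterator method: nothing in here runs until the result is enumerated
    public IEnumerable<string> GetItems()
    {
        foreach (var name in new[] { "foo", "bar", "foo" })
        {
            if (_disposed) throw new ObjectDisposedException(nameof(FakeSearchContext));
            yield return name;
        }
    }

    public void Dispose()
    {
        _disposed = true;
    }
}

public static class DeferredDemo
{
    // Returns a lazy query; the context is already disposed when the caller enumerates it
    public static IEnumerable<string> Lazy()
    {
        using (var context = new FakeSearchContext())
        {
            return context.GetItems().Where(x => x == "foo");
        }
    }

    // Enumerates inside the using block, so the results are safe to return
    public static string[] Eager()
    {
        using (var context = new FakeSearchContext())
        {
            return context.GetItems().Where(x => x == "foo").ToArray();
        }
    }

    public static void Main()
    {
        Console.WriteLine(Eager().Length); // 2

        try
        {
            Lazy().ToArray();
        }
        catch (ObjectDisposedException)
        {
            Console.WriteLine("lazy query ran after the context was disposed");
        }
    }
}
```

The only difference between the two methods is where enumeration happens, which is why adding .ToArray() before leaving the using block is usually the simplest fix.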

The object mapper uses cached reflection to set each mapped property

Yes, it uses “the R word.” Reflection is great, but it’s not all that fast. This is fine if you’re running optimized code: you’ll want to perform pagination and filtering within Lucene, and avoid returning large quantities of mapped objects. The performance is dependent on both the number of objects and the number of properties you have on each object as each property results in a reflection property set.

I would suggest trying to keep the number of mapped objects returned under 100 in most cases, and/or using output caching to minimize the effect of reflection.

It’s also possible to patch the way the mapping code works. I have a reflection-less proof of concept that in relatively unscientific tests is about 50-100x faster than the reflection method. It enforces some code generation requirements, however, and is not as general purpose or as simple to use as the reflection method.

It’s still awesome

While this post has largely focused on unexpected gotchas around the Sitecore LINQ provider, it’s actually pretty darn nice once you get used to its quirks. It’s certainly loads easier to use than any previous Lucene integration, and the backend extensibility allows for all sorts of interesting extensions and value storage.

Sitecore MVP 2013

February 13, 2013

Sitecore

It’s been announced that I was selected as a Sitecore MVP for 2013! Some of the folks at the office whipped up a neat infographic showing where all the MVPs this year are from - check it out.

Selecting a Sitecore Rendering Technology

February 7, 2013

Sitecore

Sitecore has a dizzying array of ways you can write renderings. While there is documentation that explains what they are, there’s a lot less about where each kind should be used. This post will attempt to explain when you should use each kind of rendering.

So what kinds of renderings are there?

Sublayouts (UserControls/.ascx)
Web controls (WebControl/.cs)
Razor renderings (.cshtml) - Sitecore MVC-style
Razor renderings (.cshtml) - WebForms-style
Method renderings (.cs)
URL renderings
XSLT renderings (.xslt)
Custom - yes, you can write your own rendering provider

That's a lot of ways to emit HTML to a page. Let's take a look at each kind and examine their strengths and weaknesses.

Sublayouts (User Controls)

These are probably the kind of rendering you should be using most of the time unless you have Razor at your disposal. Sublayouts are confusingly named because most of the time they are simply a rendering and not an actual subdivision of a layout. These are basically identical to a traditional .NET User Control - they have a page lifecycle, code behinds, events and all the other traditional Web Forms trappings. They have a relatively HTML-like appearance that makes them sensible to edit if you have HTML/CSS folks collaborating with you, unlike the C#-based renderings.

However they also have the same issues as their User Control cousins. Web Forms' at times utterly verbose syntax and confusing event/postback model can introduce bugs. Highly branched markup emission is also very hard in User Controls because the markup is all encoded in the .ascx file, and you have to resort to MultiViews or PlaceHolders and setting a ton of Visible properties to make it work.

Verdict: Use these for relatively static markup emission or places where the Web Forms event model will help you - like custom form renderings.

Web Controls

Web controls are simply C# classes that derive from the Sitecore WebControl base class. Web controls are perfect if you have to do a rendering whose markup has a lot of branching, for example a list that might contain two or three different kinds of elements, because you can modularize the rendering into C# methods.

On the other hand WebControls can be extremely hard to read if not written in a disciplined manner. There is no obvious HTML emission, so you'll have trouble with HTML/CSS folks when they need to change markup or add a class - it's all in C#. You can also end up with spaghetti renderings whose code flow is very hard to follow. You also have to remember that unless you override the GetCachingID() method, your WebControl cannot be output cached.

Verdict: Use these for places where you need tight control over HTML emitted, or have a lot of branches in your markup emission.

Razor Renderings (MVC-style)

Razor is a templating language traditionally associated with ASP.NET MVC. It dispenses with a lot of the usually unnecessary page lifecycle of a Web Form for a more simple template that is both readable as HTML and allows a decent amount of flexibility in terms of composing branch-heavy renderings. If you're using Sitecore's MVC support it's a no-brainer to use Razor renderings for nearly all purposes.

However to use the built in Razor support you must use Sitecore's MVC mode - which means you have to do everything in Razor. You also have to register controllers and views as items, and lose a lot of module support - for example, Web Forms for Marketers cannot presently run in MVC mode. At present this makes it nearly untenable to implement most real world sites using Sitecore's MVC mode.

Verdict: If you've got a Sitecore MVC site, use it.

Razor Renderings (custom)

There are a couple of third party shared source modules (such as Glass) that have implemented a custom rendering type that invokes the Razor template engine outside of a MVC context. This means you could reap the benefits of a Razor template's syntax, without needing to resort to Sitecore's shaky configuration-over-convention MVC implementation. These are dependent on how you feed the view data to the Razor rendering however, and each implementation works slightly differently.

Verdict: If you're using one of these modules, you'll probably implement most of your renderings in Razor without needing Sitecore MVC.

Method Renderings, URL Renderings

These render output from a C# method that returns a string, and a given URL respectively.

Verdict: There's almost no good use case for either of these rendering types.

XSLT Renderings

These were once the main type of rendering in use. They use extended XSLT syntax to transform an item "XML" into a rendering. While they can be useful for EXTREMELY simple renderings, they do not scale well to any level of complexity. Most very simple renderings may at some point gain a measure of complexity, which would then mean either very ugly and slow XSLT or a complete rewrite in a different rendering technology. Do yourself a favor and save the rewrite - use something other than XSLT in the first place.

Verdict: Don't use.

Feel free to take these recommendations with a grain of salt. These are my opinions, based on the projects I've worked on. Your project may come across good reasons why I'm absolutely wrong about one of these options. Keep your eyes open :)

2010

Sitecore multi-site installations and output caching

May 10, 2010

Sitecore

When running Sitecore in a multi-site configuration you may run into an odd issue: output caching may seem to get too greedy and not clear when you’d expect it to.

There’s a simple culprit: the default Sitecore setup includes an event handler, Sitecore.Publishing.HtmlCacheClearer, that is invoked on the publish:end event. This event handler has a list of sites assigned to it, and the default is “website” - great, until you need to have more than one site and publishing doesn’t clear your site’s output cache. Fortunately it’s easy to configure more sites: just add more site nodes to the XML. You cannot however use config includes to allow each site to individually add itself to the list from its own config file.

There’s also a nuclear option: you can implement your own event handler that clears all sites’ caches. I’m not sure if this would have a detrimental effect on any of the system sites (i.e. shell), but you could exclude it. An example of doing that:

string[] siteNames = Factory.GetSiteNames();
for (int i = 0; i < siteNames.Length; i++)
{
    SiteInfo siteInfo = Factory.GetSiteInfo(siteNames[i]);
    if (siteInfo != null)
    {
        siteInfo.HtmlCache.Clear();
    }
}