Scoping searches to the current context in Sitecore 7's LINQ

May 10, 2013

As a followup to my previous post about gotchas with Sitecore 7’s LINQ provider, here’s another thing to consider.

The indexing providers are very greedy about indexing. This means that unlike with traditional querying with Sitecore.Data.Item, where your results are automatically filtered by the context language and latest item version, no such filtering occurs with LINQ. You will receive all versions and all languages unless you specify otherwise.

As you might imagine, this can result in unexpectedly large quantities of search results in some cases. It can also be extra sneaky since during development you might only have one version in one language - so you wouldn’t even notice the issue.

So how do you fix the issue? First, let’s talk about versions. The default indexing configuration includes a field called _latestversion.
This is a boolean field that is only set to true for items that are the latest version in their language. We can take advantage of this by implementing a property on our mapped objects and mapping it to this index field like so:

[IndexField("_latestversion")]
public bool IsLatestVersion { get; set; }

Then, when we write a query we want to limit, we simply add a clause to the query:

.Where(x => x.IsLatestVersion)
// alternatively if you don’t want to be strongly typed and have an indexer, you can use
.Where(x=> x[“_latestversion”] == “1”)

Now you’ll only get the latest version. Now for languages, which are also pretty simple. If you’re inheriting from the SearchResultItem class, you already have a Language property. Otherwise you can add one like so:

[IndexField("_language")]
public string Language { get; set; }

Then, we add the following clause to the query:

.Where(x => x.Language == Sitecore.Context.Language.Name)

Now we get results like regular queries. If you're like me, the next question you're asking is "how can I just write this once and forget about it?" For example something like:

public static IQueryable<T> GetFilteredQueryable<T>(this IProviderSearchContext context)
    where T : MyItemBaseClass
{
    return context.GetQueryable<T>().Where(x => x.Language == Sitecore.Context.Language.Name && x.IsLatestVersion);
}

Unfortunately, this seems to be nigh impossible with the current revision of Sitecore 7. The issue has to do with how LINQ resolves expressions involving generic types that are not resolved at compile time. Effectively the expression in the example above converts to:

.Where(x=> ((T)x).Language == Sitecore.Context.Language.Name)

Notice the cast to T? That throws the expression parser for a loop. I've been told this will be fixed in a later release of Sitecore 7, but will not be part of the RTM release, so for the moment it looks like we're writing filtering on each query.

Sitecore 7 LINQ gotchas

April 15, 2013

SitecoreLucene

The upcoming Sitecore 7 release brings with it a new “LINQ-to-provider” architecture that effectively allows you to query search indexes using standard LINQ syntax. Mark Stiles wrote a pretty good synopsis of the feature that you should probably read first if you’re unfamiliar with the basics of how it works. This post won’t cover the basics.

I’ve been diving in to the underpinnings of the LINQ architecture and have discovered a number of things that may well cause confusion when people start using this technology. Be warned that this post is based on a non-final release of Sitecore 7, and may well contain technical inaccuracies compared to the final release.

You have to enable field storage to get output

By default, the values of fields are not stored in the index. If the values are not stored, you can query against the index using custom object mapping (e.g. filters work), but you will not see any field values mapped into results objects. You can define field storage parameters either on a per-field basis, or a per-field-type (e.g. Single-Line Text) in the default index configuration file (in App_Config/Include).

Changes to the storage type require an index rebuild before the storage is reflected.

LINQ is not C Sharp

Yeah, you heard me right. LINQ may look exactly like C#, but it is not parsed the same way. The lambda expressions you may use to construct queries against Sitecore are compiled into an Abstract Syntax Tree (AST) - sort of a programmatic representation of the code forming the lambda - and that is in turn parsed by the Sitecore LINQ provider.

The code you write into lambdas is not executed as normal C# code is. This is important to remember, because effectively the LINQ provider is simply mapping your query in as simple terms as possible to a key-value query inside of Lucene (or SOLR, etc). For example:

// we'll use this as a complex property type to map into a lucene object<br>
public class FieldType {
    public string Value { get; set; }
    public string SomeOtherValue { get; set; }
}
// this will be the class we'll query on in LINQ
public class MappedClass {
    [IndexField("field1")]
    public FieldType Field1 { get; set; }
}

// example queries (abbreviated code)
var query1 = context.GetQueryable<mappedclass>().Where(x=>x.Field1.Value == "foo");
var query2 = context.GetQueryable<mappedclass>().Where(x=>x.Field1.SomeOtherValue == "foo");

You’d expect query1 and query2 to be different wouldn’t you? NOPE. You’re not writing C# here, you’re writing an AST in C# syntax. The Sitecore LINQ engine takes the shortest path to a key-value query pair. What this really means is that it:

Resolves the Lucene field name of the field you're querying on (in this case, "field1")
Evaluates the operator you used, and the constant value you've assigned to compare to
Constructs a Lucene query expression based on that

In effect, you can only ever have one value queried for each property. In the example above both examples are in effect x.Field1 == "foo". The query would be that way even if you did a crazy query like x.Field1.Foo.Bar.Baz.Boink.StartsWith("foo") - that would boil down to x.Field1.StartsWith("foo").

There is a facility you can tie into that controls how Sitecore converts types in queries (TypeConverters). Unfortunately, that does not solve the problem of disambiguating properties - the conversion only informs the engine how to convert the constant portion of the query (in this case, the string.

Sitecore LINQ does not care if the return entity type is mapped to a valid object for it

If you execute a query against a type, say the MappedClass type in the previous example, the LINQ layer will map all results of the query against the MappedClass type. Sounds great, but be careful - it will also map results that may not have the expected template to map to the MappedClass type onto it.

For example, suppose I made a model for the Sample Item template that comes with Sitecore. Then I queried the whole database as SampleItem. Out of my results, probably only two are really designed to be mapped to my SampleItem - the rest will have nulls everywhere. This is potentially problematic if you forget to be specific in your queries to limit the template ID to the expected one.

You must enumerate the results before the SearchContext is disposed

If you’ve ever dealt with NHibernate or other lazy-loaded ORMs, this might make perfect sense to you. A typical search query method might look something like this:

public IEnumerable<MappedClass> Foo()
{
    var index = SearchManager.GetIndex("sitecore_master_index");
    using (var context = index.CreateSearchContext(SearchSecurityOptions.EnableSecurityCheck)) {
        return context.GetQueryable<mappedclass>().Where(x=>x.Name == "foo");
    }
}

Can you guess the problem? IEnumerable doesn’t actually execute until you enumerate the values (e.g. run it through foreach, .ToArray(), etc). If you return the IEnumerable<mappedclass>, it cannot be enumerated until the SearchContext has already been disposed of. Which means that will throw an exception!

To avoid this problem you need to either:

Return a pre-enumerated object. Usually the simplest form of this is simply to return the query with .ToArray() at the end, thus enumerating the query into an array before the context is out of scope. Warning: This also means that you should filter as far as possible within the query, including paging, especially with large result sets.
Execute all the code within the context's scope. This is probably less desirable as it either means spaghetti code in your codebehind, or a request-level context manager like NHibernate sometimes does.

The object mapper uses cached reflection to set each mapped property

Yes, it uses “the R word.” Reflection is great, but it’s not all that fast. This is fine if you’re running optimized code: you’ll want to perform pagination and filtering within Lucene, and avoid returning large quantities of mapped objects. The performance is dependent on both the number of objects and the number of properties you have on each object as each property results in a reflection property set.

I would suggest trying to keep the number of mapped objects returned under 100 in most cases, and/or using output caching to minimize the effect of reflection.

It’s also possible to patch the way the mapping code works (I have a reflection-less proof of concept that in relatively unscientific tests is about 50-100x faster than the reflection method. It enforces some code generation requirements however and is not as general purpose or as simple to use as the reflection method.

It’s still awesome

While this post has largely focused on unexpected gotchas around the Sitecore LINQ provider, it’s actually pretty darn nice once you get used to its quirks. It’s certainly loads easier to use than any previous Lucene integration, and the backend extensibility allows for all sorts of interesting extensions and value storage.