2015

Unicorn 3.0.1 Released

October 11, 2015

Unicorn

The first maintenance release for Unicorn 3.x has been released to NuGet. Unicorn 3.0.1 brings with it a small set of bug fixes and improvements:

Data provider config patch is compatible with Sitecore Commerce and other modules that add data providers (thanks @jonnekats!)
Fixed a bug in Transparent Sync where syncing an item based on a template that only existed in transparent sync (and not the database) would act like the template did not exist. Ironically this was fixed in 3.0.0 but some n00b forgot to merge the config to activate it! (thanks Kevin Williams!)
Fixed a race condition that could cause SFS trees to incorrectly reinitialize during a reserialize operation and cause an error (thanks @akshaysura13!)
Content editor warnings have been improved:
- The name of the Unicorn configuration that includes the item is now shown so it’s easy to trace back why something is included.
- For transparent sync items, there is an indication of whether the item is on disk only or both on disk and in the database. Note that items in both places may well have different field values; there is no equality check.

Due to the fix to Transparent Sync, this update is recommended for anyone using that feature.

Have fun!

Unicorn 3.0 Released

October 2, 2015

Unicorn

patrick you're a bloody genius

It’s been over a year since the last major release of Unicorn, but I haven’t been bored. Today I’m happy to announce the release of Unicorn 3.0 and Rainbow 1.0.

Unicorn 101

Sitecore development artifacts are both code and database items, such as rendering code and rendering items. As developers, we use serialization to write our database artifacts into source control along with our code so that we have a record of all our development artifacts. Unicorn is a tool to make serializing items and sharing them across teams and deployed environments easy and fun.

Unicorn takes advantage of keeping serialized items always up to date on the disk, which allows all merging to take place using your source control tool. Modern source control tools such as Git or SVN are capable of automatically merging the vast majority of item changes, which saves you time and reduces the chance of human error when merging.

Unicorn 3 is fresh off the compiler and brings with it a huge raft of improvements. This version brings its own serialization format and filesystem hierarchy that are far more friendly to source control and merge conflicts than the default Sitecore format. It’s also ridiculously fast - about 50% faster overall than Unicorn 2 or Sitecore serialization APIs.

What else is new?

In fact there are lots more new features added to Unicorn 3 since the what’s new blog post was written. Not just minor details either! Read on…

Transparent Sync

This is a feature that demanded its own blog post! Transparent Sync enables Unicorn to sync the serialized items in real time, completely automatically. It does this by using its data provider to directly read the serialized items and inject them into the Sitecore content tree. The items on disk are the items in Sitecore: it bypasses the database entirely for transparently synced items.

Seriously, it’s magic. Transparent Sync might be the best feature of Unicorn 3. You should give it a try!

New Configuration Architecture

Previously Unicorn was distributed with its default example configuration directly in Unicorn.config. This was suboptimal because it encouraged modifying the stock config file, which made future upgrades more of a merging challenge than they needed to be. In the new order, Unicorn ships without any configurations defined. There are two example configuration-adding patch files distributed with the NuGet package which you can duplicate and edit to your desires, and the README distributed with the NuGet package details what you need to do to get started.

Control Panel UX Improvements

The control panel has been redesigned to have an improved UX when there are many configurations defined. In the old UI, once more than a couple configurations were defined the page would need to scroll. The new UI reduces the vertical space for each configuration significantly:

The new control panel also has an experience for selecting only the configurations you wish to sync in a single batch:

For automated builds, you can also copy the URL on the sync button when you’ve got multiple configurations selected in the control panel to run the same batch at build time. This is great if you’ve got some configurations that you may not wish to sync during a CI build.

New Items Only Evaluator

This is an evaluator that changes the behavior of syncing to only write new items from serialization into Sitecore. Existing items with the same ID or items that are not serialized are left alone. This can be useful for example if you’re wanting to push some developer-initiated content items up, like metadata or lookup source items, that you want to always exist but once they have been created they become the content editor’s to do with as they will. A sample configuration that enables the NIO evaluator ships with the NuGet package. Thanks to Nathanael Mann for the feature request.
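To make the behavior concrete, here is a minimal Python sketch of the new-items-only rule described above. It is not Unicorn's code: the function name and the dictionaries keyed by item ID are hypothetical stand-ins for the serialized items and the Sitecore database.

```python
def sync_new_items_only(serialized, database):
    """Copy serialized items into the database only when their ID is absent.

    Both arguments are dicts mapping item ID -> field dict (a hypothetical
    shape chosen for this sketch). Returns the IDs that were created.
    """
    created = []
    for item_id, fields in serialized.items():
        if item_id in database:
            # Existing item: leave it alone, even if the serialized copy differs.
            continue
        database[item_id] = dict(fields)
        created.append(item_id)
    # Items that exist only in the database are never touched or deleted.
    return created


serialized = {
    "id-1": {"name": "Lookup A", "title": "From developer"},
    "id-2": {"name": "Lookup B", "title": "From developer"},
}
database = {
    "id-1": {"name": "Lookup A", "title": "Edited by content editor"},
    "id-9": {"name": "Editor item", "title": "Not serialized"},
}
created = sync_new_items_only(serialized, database)
print(created)
```

In this example only id-2 is created; the editor's change to id-1 survives, and the unserialized id-9 is left where it is.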

Exclude all children with the predicate

To exclude all children of an item simply add a trailing slash to the <exclude> like so:

<include database="master" path="/sitecore/content">
    <exclude path="/sitecore/content/" />
</include>

Visual Studio Control Panel Support

Recently a plugin for Visual Studio was released that enables you to run Unicorn syncs directly within Visual Studio. Unicorn 3 supports this tool out of the box: simply enable the Unicorn.Remote.config.disabled patch file.

The Unicorn Control Panel for Visual Studio requires VS2013 or VS2015. It is developed by Andrii Snihyr - thank you for the community support!

TFS Support

Fellow Sitecore MVP and Connective DX’er Dave Peterson has written up a plugin for Rainbow/Unicorn that integrates it with TFS. TFS 2010 (or server workspaces in 2012 and later) have the unfortunate limitation that all files are locked by default. This interferes with Unicorn’s operation as it writes files as items are changed in Sitecore, resulting in errors.

The TFS integration hooks the TFS API to Rainbow’s SFS data store, causing it to actually check out the file Unicorn is about to write or delete before acting. You must use a 32-bit IIS app pool when using the plugin, as the TFS API is 32-bit only.

The TFS plugin is available on NuGet; for installation instructions and the latest info see the README on GitHub

Documentation

The documentation in the README, comments in the stock config files, and verbiage in the actual control panel and logs have all been reviewed and improved. If you get confused by anything, let me know and I’ll make the docs better or the error message clearer in the next release.

Thank you

Unicorn is a project by and for the Sitecore community. I’d like to thank everyone who’s contributed to the project in a major or minor way: without you, we wouldn’t have this tool today. Thank you!

Introducing Transparent Sync in Unicorn 3

October 2, 2015

Unicorn

A couple years ago Alex Shyba told me about an idea he had: what if we used a data provider to map serialized items directly into the content tree, bypassing the database and eliminating the need to sync? I was like

I went nuts and wrote a prototype that did exactly that.

The prototype, nicknamed Rhino because it has a prominent horn like certain other mythical beasts, actually worked fairly well. Unfortunately there are two hard problems in computer science: naming, cache invalidation, and off by one errors. Cache invalidation, specifically using the FileSystemWatcher to observe file changes by source control, was unreliable. Because of how core serialization is to Sitecore development practice, unreliability is not acceptable. Reluctantly, I shelved Rhino and worked on Unicorn 2 instead.

The idea of Rhino stuck around. The improvements brought around in Rainbow, such as partial item reading and tighter control around storage, enabled working around the limitations that had precluded Rhino from being useful in production. Thus the idea returns as Transparent Sync, which might just be the best part of Unicorn 3.

Transparent Sync enables Unicorn to sync serialized items in real time, completely automatically. It does this by using its data provider to directly read the serialized items and inject them into the Sitecore content tree. The items on disk are the items in Sitecore: it bypasses the database entirely for transparently synced items. Changes made on disk update nearly instantly in the Sitecore editing interfaces.

not sure if witchcraft, or transparent sync

Imagine a scenario where a development team is working with a feature-branch driven workflow, like GitHub Flow. In order to perform a code review when using Transparent Sync, you merely check out the feature branch under review and your Sitecore is immediately configured with the items that the feature includes.

When should we use Transparent Sync?

Transparent Sync is turned off by default because you should understand how it works before enabling it.

Transparent Sync is excellent for development artifacts like templates and rendering items, but it’s inappropriate if you’ve got hundreds of thousands of content items. At startup Unicorn must build an index of metadata, which involves reading the headers of each serialized file. If you have an SSD this penalty is pretty minimal, but with a traditional hard drive not so much. In testing I enabled transparent sync for the whole default core and master databases (19,228 items) using an SSD and noted an increase of about 100ms in startup times on average.

Because Transparent Sync bypasses the normal sync process, transparent sync also bypasses anything that is hooked to sync. This would include things like custom evaluators (like NewItemsOnlyEvaluator) and the sync event pipelines. If you are relying on these customizations, turn transparent sync off for the configurations that use them.

How do we use Transparent Sync?

Note: you must perform an initial serialization of a configuration with Transparent Sync off before you enable it. Otherwise the items in the configuration will seem to disappear as transparent sync shows all zero items that are on disk!

Turning transparent sync on is really easy: take the <configuration> you want to add transparent sync to and put this line in it:

<dataProviderConfiguration enableTransparentSync="true" type="Unicorn.Data.DataProvider.DefaultUnicornDataProviderConfiguration, Unicorn" />

You can also change the setting in the global <defaults> if you want to change the default setting of transparent sync for all configurations.

Once Transparent Sync is enabled all you have to do is change items on disk and the updates will immediately appear within Sitecore.

Going to production

In production we usually remove the Unicorn data provider as it’s not normally required. However without the Unicorn data provider, transparently synced items disappear: they aren’t really in the database at all, they’re on disk. There are two approaches to solve this problem:

Transparent Synced configurations can sync in the traditional way. This will persist the disk-based items permanently in the database.
Keep the data provider enabled in production. This is appealing because it means you can just deploy files to production and be done with it - and rollback in the same way. Be aware that if the IIS app pool identity cannot write to the serialized items you will be unable to make any ‘emergency’ changes to the items in Sitecore.

What happens if I reserialize a Transparent Sync configuration?

The database is used for all reserialize operations. In a Transparent Sync configuration, it is common for items to NOT reside in the database at all. If you reserialize this configuration it will be reset to what is in the database, thus deleting any Transparent Sync items that are not already in the database. Similarly if you use ‘Dump Item’ to do a partial reserialize on a Transparent Sync item it will be reverted to what is in the database, which may well be ‘nothing’.

There are warnings in the control panel if you reserialize a Transparent Sync configuration.

Anything else?

I hope you have as much fun with Transparent Sync as I had making it!

Unicorn 3: What's new?

September 14, 2015

Unicorn

Now that I’ve finished being confusing with my last three posts about Rainbow, let’s talk about Unicorn 3.

Unicorn 3 is the product of a year of thought and implementation. The design goal was nothing less than fixing every annoyance or issue that we ran across in daily usage of Unicorn 2. So what’s new?

New serialization format

High on the list of annoyances with daily Unicorn usage was the difficulty of resolving merge conflicts in the Sitecore serialization format.

The only fix for this was to make a better format: human readable, easily mergeable, and fixing the daily annoyances of the Sitecore format. Because no tool is an island, the new serialization tools Unicorn 3 uses have been split off into their own library: Rainbow. Rainbow enables others to use the new serialization tools developed for Unicorn 3, without depending on Unicorn.

The details of the new format and why it exists can be found in part 1 of my Rainbow blog series.

New file organization

Alongside designing the new serialization format, I also had to resolve the longstanding limitations and bugs in the way Sitecore’s built in serialization stores its file hierarchy. This was worth a whole blog post in and of itself, but in summary I think those limitations have been eliminated.

The new hierarchy also alters the way the storage tree operates. Whereas Sitecore’s model represents an entire database, Unicorn 3 represents a set of subtrees of the database. Depending on the depth of the root paths, this can result in much shorter filesystem paths and fewer over-length path loops.

Improved user experience

Improved console messaging

Unicorn 3 has greatly improved messaging, both reducing extra output that isn’t necessary as well as vastly better error formatting. This hilarity was a constant annoyance if something went wrong in Unicorn 2 (in this case, invalid serialization format):

Unicorn 3’s version of the same error:

Editor warnings for Unicorn-controlled items

Another common issue is having someone edit an item that Unicorn controls on a deployed server, and having their changes overwritten by the next automated deployment. People asked for a way they could know if an item was “Unicorned” or not. Well, now there is one:

The message shown changes depending on the environment as well; this is how it looks when “production mode” is on:

And this is what happens if you attempt to save the item when production mode is enabled:

Unicorn-enabled serialize commands

Ever used these handy tools on the by-default-hidden Developer tab of the Sitecore ribbon?

Guess what happens if you use these commands on an item Unicorn 3 controls? You guessed it: you can do partial syncing and partial reserialization using these commands. In Unicorn terms, there is no difference between “update” and “revert”: both just mean “sync.”

Note: Using the commands on non-Unicorn items results in their default behaviour. Also “update database” and “revert database” do not interface with Unicorn at all, and perform their default actions.

Performance: 50% more of it

Unicorn 3 has had significant performance profiling applied to it, and is about 10-20% faster than Unicorn 2 or stock Sitecore serialization. On top of that, multithreading is utilized to further increase sync and reserialize speed. Between threading and optimizations, it’s generally about 50% faster than Unicorn 2. For maximum performance, using SSD storage is highly recommended as it has a major impact on sync speed.

Auto-publish synced Items

Unicorn 3 enables efficient auto-publishing of items that a sync has modified, using manual publish queue injection. This option is enabled by default by the Unicorn.AutoPublish.config file. You can remove this file to disable the feature if you don’t like it.

Deleted template field handling

In Unicorn 2, deleting template fields could cause havoc. If any other serialized item contained a value for the deleted field, and you failed to reserialize that item after deleting the field, your next sync would be greeted with a big ugly error and you’d have to go manually remove the offending field from the file.

Unicorn 3 fixes this issue by making this error a warning instead. (We can’t just ignore it, because ignoring it would cause syncs that created template fields as well as values for that field to not load the values the first time.)

Versioned to Shared field conversion

Converting a field between shared, versioned, and unversioned by syncing a template was not supported in Unicorn 2. The field itself would change, but the existing values for the field would not move to the correct database table in Sitecore. Unicorn 3 includes the necessary adaptation to migrate stored values when field sharing is changed by a sync.

Improved configuration scheme

Unicorn 2 used a single configuration file: Serialization.config. This was a bit confusing as parts of it were ideally removed when deployed to a CE or CD environment and that was not always obvious.

Version 3 addresses this situation by creating its own App_Config\Unicorn folder and splitting the configuration into files that can simply be deleted instead of modified. This folder contains:

Unicorn.config: Contains core configuration, e.g. predicates and dependency setup, can be left intact anywhere.
Unicorn.DataProvider.config: Attaches the Unicorn Data Provider, which handles automatic serialization of changed items to disk. Can be removed for any environment not collecting changes. (usually, any non-dev environment)
Unicorn.UI.config: Adds the Unicorn Control Panel, Partial Sync, editor warnings, and other end-user UI elements. Remove for CD environments.
Unicorn.AutoPublish.config: Causes Unicorn to automatically publish items that it changes during a sync. Remove if you don’t want this feature, or on CD environments.
Unicorn.Deployed.config.disabled: If this config is enabled, typically on a deployed CE instance, saving an item added to Unicorn results in a warning confirmation.

Each of these config files also has comments explaining the above as well as what the settings within do.

Technical Improvements

Unicorn 3 also includes numerous miscellaneous technical improvements.

No third party dependencies. (that means you, Ninject)
Improved default predicate configuration settings account for Sitecore 8 additions and common modules.
Added SerializationHelper, a handy class that lets you more easily invoke Unicorn operations such as Sync without a lot of setup, if you want to programmatically use Unicorn.
Added pipelines to hook to sync events:
- unicornSyncBegin occurs when a configuration sync begins.
- unicornSyncComplete occurs when a configuration sync ends.
- unicornSyncEnd occurs at the end of all syncs (e.g. after all configurations have synced if batching > 1).
Thanks to Rainbow, the object model is unified (ALL items are IItemData both from Sitecore and serialized). Previously each had their own parallel models.

System Requirements

Unicorn 3 has been developed on Sitecore 7.2 Update-5, as well as Sitecore 8 Update-5. It should be compatible with all versions of Sitecore 7 and 8.

Unicorn 3’s code should be easy to adapt to Sitecore 6.2-6.6, however there is no official support for it so you’d want to compile from source.

Ok, ok. Is it released yet?

No. But it is in an actively updated beta on NuGet that is fairly stable :)

Rainbow Part 3: What is Rainbow?

September 10, 2015

Unicorn

It may seem a bit weird that in part 3 of a series, we’re talking about what something is.

Nobody can tell you what it is, you have to see it for yourself

After all, shouldn’t that be part 1? Nope, this is iterative development :)

So, what is Rainbow? Rainbow is designed to be a complete replacement for the Sitecore serialization format and filesystem organization, as well as enabling cross-source item comparison. It is a pure code library that comes with no UI of any kind, that is designed to be used with other libraries that use its serialization services with their own UIs. Libraries that consume Rainbow - such as Unicorn - gain the ability to abstract themselves from serialization details. Libraries that extend Rainbow can add new serialization formats, new places to store serialized items, and new ways to organize them.

Rainbow Features

Universal Item Data and Data Stores

Rainbow implements a set of interfaces that wrap the structure of a Sitecore item - item, version, and field. These interfaces provide a universal language that all Rainbow data stores can implement against. You could get an IItemData from Sitecore and write it out to disk as a YAML formatted item. You could get an IItemData from a web service, and deserialize it into a Sitecore database. You could construct an IItemData programmatically, and serialize it to a Sitecore database. Implementations of IDataStore provide places to store item data. It’s completely universal, and everything Rainbow does revolves around these abstractions.

YAML-based serialization formatter improves the storage format for serialized items

This post goes into more detail about the hows and whys of the YAML serializer
The format is valid YAML. YAML is a language that is designed to store object graphs in a human readable fashion. It uses significant whitespace and indentation to denote data boundaries. Note: only a subset of the YAML spec is allowed for performance reasons.
Any type of endline support. Yes, even \r because one of the default Sitecore database items uses that in its text! No more .gitattributes needed.
No more Content-Length on fields that requires manual recalculation after merge conflicts
Multilists are stored multi-line so fewer conflicts can occur
XML fields (layout, rules) are stored pretty-printed so that fewer conflicts can occur
Customize how fields are stored when serialized yourself with Field Formatters

Serialization File System (SFS) storage hierarchy

This post goes into more detail about SFS
Human readable file hierarchy
Extremely long item name support
Unlimited path length support
Supports non-unique paths (two items under the same parent with the same name), while keeping it human-readable
Stores each included subtree in its own hierarchy, reducing file name length
You can plug in the serialization formatter you desire - such as the YAML provider - to format items how you want in the tree

Deserialize abstract items into Sitecore with the Sitecore storage provider

Turn Sitecore items into IItemData with the ItemData class
Deserialize an IItemData instance into a Sitecore item with DefaultDeserializer (used via SitecoreDataStore)
Query the Sitecore tree in abstract with SitecoreDataStore, as if it were any other serialization store

Item comparison APIs

Compare any two IItemData instances regardless of source
Customize comparison for field types or specific fields with Field Comparers
Get a complete readout of changes as an object model
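As an illustration of the comparison idea, here is a small Python sketch. It is not Rainbow's API: items are modeled as plain dicts of field name to value, and `compare_items` and `whitespace_insensitive` are hypothetical names. The point is that the comparison does not care where either item came from, and that a per-field comparer can override plain equality.

```python
def compare_items(source, target, comparers=None):
    """Return a list of (field, source_value, target_value) for differing fields.

    comparers optionally maps a field name to a function(a, b) -> bool that
    decides whether two values count as equal for that field.
    """
    comparers = comparers or {}
    changes = []
    for field in sorted(set(source) | set(target)):
        a, b = source.get(field), target.get(field)
        if a is None or b is None:
            # Field missing on one side: only equal if missing on both.
            equal = a == b
        else:
            equal = comparers.get(field, lambda x, y: x == y)(a, b)
        if not equal:
            changes.append((field, a, b))
    return changes


def whitespace_insensitive(a, b):
    """Treat values as equal when they differ only in whitespace layout."""
    return a.split() == b.split()


from_disk = {"Title": "Home", "Layout": "<r>\n  <d />\n</r>", "Text": "Hello"}
from_database = {"Title": "Home", "Layout": "<r> <d /> </r>", "Text": "Hello!"}

changes = compare_items(from_disk, from_database, {"Layout": whitespace_insensitive})
print(changes)
```

With the custom comparer the pretty-printed Layout value is considered unchanged, so only the Text field is reported. Without it, both Layout and Text would show up as changes.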

Improvements

Improvements are in comparison to Sitecore serialization and the functionality in Unicorn 2.

Deleting template fields will no longer cause errors on deserialization for items that are serialized with a value for the deleted field (e.g. standard values)
Deserialization knows how to properly change field sharing or field versioning and port existing data to the new sharing setting

Rainbow Organization

Rainbow consists of several projects:

The core Rainbow project contains core interfaces and components that almost all serialization components might need, for example SFS, item comparison tools, field formatters, and item filtering.
Rainbow.Storage.Yaml implements serializing and deserializing items using a YAML-based format. This format is ridiculously easier to read and merge than standard Sitecore serialization format, lacking any content length attributes, supporting any type of newline characters, and supporting pretty-printing field values (e.g. multilists, layout) for simpler merging when conflicts occur.
Rainbow.Storage.Sc implements a data store using the Sitecore database. This can be used to read and write items to Sitecore using the IDataStore interface.

Extending Rainbow

Rainbow is designed to be loosely coupled. It’s recommended that you employ a Dependency Injection framework (e.g. SimpleInjector) to make your life constructing Rainbow objects easier. Of course you can also construct objects without DI.

Rainbow has extensive unit and integration tests and hopefully easy to understand code, which serve as its living documentation. As of now, Rainbow has 90% test coverage and 220 tests. Unicorn, which uses Rainbow as a service, can also be a useful source of examples.

If you can’t find what you’re looking for you can find me on Twitter (@kamsar) or on Sitecore Community Slack.

Next time, we’ll start talking about what’s new in Unicorn 3 - other than all the Rainbow things that come along because Unicorn uses Rainbow.

Reinventing the Serialization File System: Rainbow Preview, Part 2

August 5, 2015

Unicorn

This is the second post in a series about Rainbow, an advanced serialization library for Sitecore. Part 1, dealing with improving the serialization file format, can be found here.

Introducing Serialization File System

Rainbow supports the idea of a data store, which is an abstraction of the necessary components to store and retrieve Sitecore items. A data store need not be serialized: Sitecore’s database is itself implemented as a data store as far as Rainbow is concerned.

This time we’ll be talking about the Serialization File System data store. Serialization File System, or SFS for short, is a pattern for organizing files on disk to represent a Sitecore item tree. It’s only a pattern: it depends on a serialization formatter, such as the YAML formatter, to do the actual serialization and deserialization. This means that you can use whatever format you want, without having to reimplement the whole organizational structure. Or if you can’t stand SFS you can make your own data store and keep the YAML format :)

Why do we need SFS?

To understand why we need SFS, let’s get down to data structures for a minute here. Windows’ file system is essentially a B-tree. Sitecore’s content tree is also essentially a B-tree. So we can map the two together pretty easily, right? WRONG.

Path Length

Sitecore’s content tree is effectively infinite (by definitions of “infinite” that mean “up to 20 levels by default”) in depth. The Win32 filesystem APIs, on the other hand, have a maximum path length of 260 characters (MAX_PATH). So now we run into a problem where you can have a path in Sitecore that is unrepresentable on the filesystem, because the content path is too long. Whoops. Sitecore’s serialization APIs handle this situation, albeit a bit clunkily. They take the item path and apply a trivial hash algorithm to it, then put the item at the root of the serialization tree in a folder named that hash. For example (slightly simplified):

given /sitecore/foo/bar/baz/quux/quince (imagine this is longer)
baz might be written to c:\serialization\master\sitecore\foo\bar\baz.item
but quux might go to c:\serialization\A8CF73\quux.item
and quince might go to c:\serialization\C732F1\quince.item - not even a child of quux

This solution works, but it also means that the hierarchy becomes nearly unintelligible once short-paths start being used. The hash is not a standard algorithm; there is no obvious way to see what it means without reading the .item file within to see what its path is. It also makes merging very unintuitive if the short-pathed items are changed because the file path tells you nearly nothing.

So now we’ve seen path length problems, but what about a special case of that? What happens if you create an item name that is so long that it alone becomes too long to fit in the Windows filesystem path limits? Imagine an item with a 300-character name. Well if this occurs - as it pretty easily can with Web Forms for Marketers - the Sitecore serialization APIs all choke. Unicorn 2 fails because it uses the Sitecore APIs. Pretty sure TDS does too, but I could be wrong.

Filenames are not unique

Sitecore’s content tree, and the APIs to retrieve items “by path,” are quite misleading. Why? On the Windows filesystem the file name is a unique key. However under the covers of Sitecore the item’s ID is the unique key. Duplicate names are totally allowed - which means that getting something “by path” is horribly ambiguous (.5% of the time, but still). But this is the basis upon which the Sitecore standard serialization hierarchy is built: ambiguity.

But they made the right choice. Do you want to merge conflicts in items named by ID on disk? How about dealing with the fact that you’d hit the path length limit in about 3 levels with 35+ character file names? Thought not. But this means we have a problem to solve: how do we map non-unique nodes in the database (the item name) onto unique nodes (file names) on the file system?

Sitecore’s serialization APIs solve this in a somewhat decent fashion. When an item is serialized the parent path is checked for other items of the same name, and if one exists then the item’s file name has the item’s ID appended to it. For example:

given two items with path /sitecore/foo
when you write them to disk they would get a path like c:\serialization\master\sitecore\foo_d0ec0aa931eb46ecb241d1ca18b4c5b2.item where each item has its ID postfixed to the filename

Seems legit, right? Unfortunately it’s very broken and can actually corrupt your serialization tree. Don’t believe me? Try this:

given an item with path /sitecore/foo, serialize it
now we have c:\serialization\master\sitecore\foo.item
next we create another foo item with a different ID
now we’ll have foo.item, and foo_d0ec0aa931eb46ecb241d1ca18b4c5b2.item in the same folder
so far, so good. but now serialize the original foo item again.
now we have foo.item with old data, foo_d0ec0aa931eb46ecb241d1ca18b4c5b2.item, AND foo_2195e766591d4baf8ac63a1efa43526d.item
yep. two items on disk for the original foo.item, and a corrupted tree. Bad.
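The steps above can be replayed with a few lines of Python. This is only a simulation of the naming rule as described in this post, not Sitecore's actual implementation: the database is a list of dicts, the disk is a dict of file names, and which ID belongs to which item is assumed from the file names in the example.

```python
def serialized_file_name(item, database):
    """Naming rule as described above: append the item ID only when another
    item with the same path (a same-named sibling) exists in the database."""
    name = item["path"].rsplit("/", 1)[-1]
    has_twin = any(
        other["path"] == item["path"] and other["id"] != item["id"]
        for other in database
    )
    return f"{name}_{item['id']}.item" if has_twin else f"{name}.item"


def serialize(item, database, disk):
    """Write the item to the simulated disk (file name -> item ID)."""
    disk[serialized_file_name(item, database)] = item["id"]


database, disk = [], {}

first = {"id": "2195e766591d4baf8ac63a1efa43526d", "path": "/sitecore/foo"}
database.append(first)
serialize(first, database, disk)   # no twin yet, so this writes foo.item

second = {"id": "d0ec0aa931eb46ecb241d1ca18b4c5b2", "path": "/sitecore/foo"}
database.append(second)
serialize(second, database, disk)  # twin exists, so this writes foo_d0ec...item

serialize(first, database, disk)   # twin exists now, so this writes foo_2195...item

for file_name in sorted(disk):
    print(file_name, "->", disk[file_name])
```

After the last call the simulated folder holds three files, two of which point at the original item: the stale foo.item and the new ID-suffixed copy. Nothing ever cleans up the old name, which is the corruption described above.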

Pathing bugs in the API

The Sitecore serialization pathing APIs are unfortunately pretty buggy. There are several methods that assume you’d never serialize an item anywhere outside the default serialization folder - and in fact throw an error if you try it. These methods are all static, and thus the only way to change their behavior is by decompiling them and using your own fixed copy of them. That’s more than a little suboptimal, and it’s quite intentional that Rainbow has zero dependency on the Sitecore serialization APIs.

Tired of hearing me rant about bugs? Me too, how about we talk about solving these problems instead!

How does SFS work

The SFS data store is capable of storing practically infinite item path depths, as long of an item name as you please, handling duplicate filenames in all cases, and writing to any path you please. SFS is based on the idea of a ‘solid’ tree, where every node with children must also contain the serialized parent, for example:

given /sitecore as the root
you could serialize /sitecore/foo
but you could not serialize /sitecore/foo/bar without also serializing foo
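One way to picture the ‘solid’ rule is as a check over a set of item paths. The Python below is a sketch under my own assumptions (the function name `missing_parents` is hypothetical and the tree root itself is taken as given); it is not how SFS enforces the rule internally.

```python
def missing_parents(root, item_paths):
    """Return the ancestors below root that would also have to be serialized
    for item_paths to form a solid tree (no unserialized parents)."""
    serialized = set(item_paths)
    missing = set()
    for path in item_paths:
        parent = path.rsplit("/", 1)[0]
        # Walk upward until we reach the tree root.
        while parent != root and parent.startswith(root + "/"):
            if parent not in serialized:
                missing.add(parent)
            parent = parent.rsplit("/", 1)[0]
    return sorted(missing)


print(missing_parents("/sitecore", ["/sitecore/foo"]))
print(missing_parents("/sitecore", ["/sitecore/foo/bar"]))
print(missing_parents("/sitecore", ["/sitecore/foo", "/sitecore/foo/bar"]))
```

The first and third calls report nothing missing, while the second reports that /sitecore/foo must be serialized before bar can be.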

Sitecore serialization does allow for ‘sparse’ trees where you can have unserialized parents. The astute may be reading this and asking “wtf, do we have to serialize the whole database then?”

SFS supports relative trees instead of a single monolithic tree that represents an entire database like Sitecore uses. For example:

given /sitecore/templates/User Defined as the root of a tree
User Defined might be serialized as c:\rainbows\User Defined\User Defined.yml

See how Rainbow ignores the relative Sitecore path? This means for deep tree roots your filesystem path length can be much shorter because it doesn’t need empty parents, resulting in fewer over-length file paths. It also means that you can browse to the items with fewer clicks in Windows Explorer. Unicorn 3 also allows you to name your trees (which in Unicorn terms map to an <include> entry on your predicate), so your serialization folder might resemble:

 c:\rainbow
     Templates
         User Defined.yml
     Funny Giphys
         Animations.yml
     Old School Content
         Home.yml
         Home
             About us.yml

Note how the tree root folder name need not match the root item name (it does by default, but that makes duplicate names pretty easy to run into). In the above case the c:\rainbow\Templates tree might be rooted in Sitecore at /sitecore/templates/User Defined. Note: the use of solid trees does preclude some kinds of exclusions, namely anything not path-based, because other exclusions could result in a sparse tree.
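
To make the relative tree idea concrete, here is a rough sketch of the path mapping in Python. It is an illustration only: the function `sfs_path` and its parameters are hypothetical, and it ignores long paths and duplicate names entirely.

```python
from pathlib import PureWindowsPath


def sfs_path(item_path: str, tree_root: str, tree_name: str,
             base: str, ext: str = ".yml") -> str:
    """Map a Sitecore item path to a file path inside a named, relative tree."""
    if item_path != tree_root and not item_path.startswith(tree_root + "/"):
        raise ValueError(f"{item_path} is not part of the tree at {tree_root}")
    # Only the root item's own name survives; its Sitecore parents are dropped.
    root_item_name = tree_root.rsplit("/", 1)[-1]
    relative = item_path[len(tree_root):].strip("/")
    segments = [root_item_name] + (relative.split("/") if relative else [])
    return str(PureWindowsPath(base, tree_name, *segments[:-1], segments[-1] + ext))
```

With a tree named Templates rooted at /sitecore/templates/User Defined, the root item lands at c:\rainbow\Templates\User Defined.yml no matter how deep the root sits in Sitecore.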

Long Path Handling

SFS handles long content paths by using loopback paths. These are similar in concept to Sitecore’s hashed paths, but unlike hash-paths they actually transplant any children of that path under the loopback path. The loopback paths are also named by the item ID of the parent of the items in the loopback. Let’s look at an example (with a contrived very short max path length):

Given c:\rainbow\root\some\rather\long\path\length\thing\parent.yml (ID: 2195e766-591d-4baf-8ac6-3a1efa43526d)
Suppose that adding 3 characters to the end of “parent” makes the child path over-max-length. So if we add loopchild under parent,
c:\rainbow\root\2195e766591d4baf8ac63a1efa43526d\loopchild.yml becomes the child’s path, based on the parent item’s ID
If we add quux under loopchild, it goes to c:\rainbow\root\2195e766591d4baf8ac63a1efa43526d\loopchild\quux.yml - under the same loopback, adding a level of human readability hash-paths lack

Loopback paths may loop multiple times for extremely long path lengths. Loopbacks also handle the case where some children are short enough to live under the parent while others have names that push them over the limit and into a loopback.
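
Here is a simplified sketch of the loopback idea in Python. It is not the SFS implementation: `loopback_path` is a hypothetical name, the chain of (name, ID) pairs is an assumed input shape, and it does not cover the case where even the loopback path itself is too long.

```python
def loopback_path(root: str, chain: list[tuple[str, str]],
                  max_len: int, ext: str = ".yml") -> str:
    """Return the file path for the last item in chain.

    chain lists (item name, item ID) pairs from the tree's root item down to
    the item being written; root is the physical folder of the tree.
    """
    if not chain:
        raise ValueError("chain must contain at least one item")
    folder = root      # folder that receives the current item's file
    parent_id = None   # ID of the previous item in the chain
    path = ""
    for name, item_id in chain:
        path = f"{folder}\\{name}{ext}"
        if len(path) > max_len and parent_id is not None:
            # Too long: loop back to the tree root, into a folder named
            # after the parent's ID (dashes removed).
            folder = f"{root}\\{parent_id.replace('-', '')}"
            path = f"{folder}\\{name}{ext}"
        # Children of this item live in a folder named after it.
        folder = path[: -len(ext)]
        parent_id = item_id
    return path
```

Running the example above through this with a contrived limit of 68 characters puts loopchild under the folder named for parent’s ID, and quux under loopchild within that same loopback.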

Duplicate File Name Handling

If you have items of the same name under the same parent in Sitecore, SFS uses a similar approach to Sitecore’s APIs with item_id.yml as the filename format. However SFS does a lot of correctness verification that Sitecore does not, because SFS generates the paths based on the filesystem and not Sitecore. For example, suppose you had two items of the same name and then added different children to each. SFS will actually resolve the path by evaluating down the filesystem paths until it finds the matching path regardless of parentage - and it returns all matches instead of whichever one it feels like. The case of writing items at different times is also handled; SFS checks existing same named items for any with the same ID and reuses the name. So you never get renames for no reason or corrupted trees.
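
The name reuse rule is what prevents the corruption described earlier. A minimal sketch, assuming we already know which item ID is stored in each existing file (the `existing` mapping and the function name are hypothetical, not SFS’s real API):

```python
def choose_filename(name: str, item_id: str, existing: dict[str, str],
                    ext: str = ".yml") -> str:
    """Pick a file name for an item; existing maps file names to stored item IDs."""
    compact_id = item_id.replace("-", "").lower()
    plain = f"{name}{ext}"
    suffixed = f"{name}_{compact_id}{ext}"
    # Reuse whichever file already holds this item, so rewriting never renames it.
    for candidate in (plain, suffixed):
        if existing.get(candidate, "").replace("-", "").lower() == compact_id:
            return candidate
    # New item: take the plain name if it is free, otherwise disambiguate by ID.
    return plain if plain not in existing else suffixed
```

Serializing the original foo a second time resolves to foo.yml again rather than producing a third file.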

This may sound like a lot of file reading. Yes, it can cause more file reads than Sitecore’s approach. However with a smart path to ID cache, and the ability to read only item metadata (which in the case of YAML items means reading just the first 4 lines of the file, which is 3x or more faster than reading the whole thing), both dumping and syncing items from SFS are up to 50% faster than Unicorn 2 on the same items.
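
Reading only the metadata is cheap because the header sits at the very top of the file. A small sketch of the idea in Python, assuming the header keys are ID, Parent, Template and Path as in the YAML format used by Rainbow (the function name is made up, and the real reader is .NET):

```python
from typing import TextIO

HEADER_KEYS = ("ID", "Parent", "Template", "Path")


def read_metadata(stream: TextIO) -> dict[str, str]:
    """Read the item header and stop without touching any field data."""
    metadata: dict[str, str] = {}
    for line in stream:
        line = line.rstrip("\r\n")  # works for both \n and \r\n endings
        if line == "---":           # YAML document start marker
            continue
        key, separator, value = line.partition(": ")
        if not separator or key not in HEADER_KEYS:
            break                   # reached SharedFields or other content
        metadata[key] = value
        if len(metadata) == len(HEADER_KEYS):
            break                   # header complete, skip the rest of the file
    return metadata
```

Because the loop breaks as soon as the header is complete, the rest of the file is never read from the stream.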

Long Item Name Handling

Obscenely long item names are handled by simply truncating them before putting them on the filesystem. A setting controls the maximum name length. Parent items are properly disambiguated using ID suffixes if two differently named items truncate to the same short name. I serialized a whole Sitecore database with max filename length = 5 to test this. So many duplicate names then!
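
A rough sketch of truncation with ID-based disambiguation, under the assumption that every colliding sibling gets a suffix (SFS may well keep the plain name for the first one; this only illustrates the principle, and `short_names` is a hypothetical helper):

```python
from collections import Counter


def short_names(items: list[tuple[str, str]], max_len: int) -> dict[str, str]:
    """Map item ID -> on-disk name for sibling items given as (name, item ID)."""
    truncated = {item_id: name[:max_len] for name, item_id in items}
    # Compare case-insensitively, since Windows file names ignore case.
    counts = Counter(short.lower() for short in truncated.values())
    return {
        item_id: short if counts[short.lower()] == 1
        else f"{short}_{item_id.replace('-', '')}"
        for item_id, short in truncated.items()
    }
```

With a max length of 4, Homepage and Homes both truncate to Home and each picks up its ID as a suffix, while About simply becomes Abou.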

Reliability

SFS (and the YAML formatting pieces we talked about last time) has 95% code coverage, and all bugs that have been found so far are covered with additional tests.

Hey, what about an index file to speed up querying?

In fact Rainbow was originally designed to be a general purpose data store, where you could directly query an item by ID, path, template ID, or parent ID. It had in memory indexes that it would maintain, backed variously by a single global index file, and reading the headers of each serialized file. The indexes worked pretty well, in fact. But they had several major problems that caused me to scrap them:

There’s no good way to maintain an index cache, because you may not be the only writer to the index file (e.g. you git pull someone else’s changes). FileSystemWatcher is not 100% reliable, and that would be a necessity to avoid data corruption, which is a big deal when you’re talking about serialization.
A centralized index file precludes the possibility of easily copying items between trees because the index entries would have to go with it
A filesystem logically organized for a computer and an index, where items are stored by ID-based filenames, is nearly unintelligible to a human. Merging becomes hairy, and it gets easy to commit mistakes because you can’t readily see the item path.

Can I have it yet?

Why yes, yes you can. But…

Back up in your *** with the resurrection

…so if you could just go ahead and download the beta, that’d be great.

Rainbow and Unicorn 3 betas are currently available from NuGet (you’ll need to enable prerelease packages). For a fresh install it should be as simple as installing the Unicorn package - unless you want to hack around the Rainbow APIs, in which case Unicorn is not required. For upgrading from Unicorn 2, there’s a doc for that.

It’s vaguely stable, but hasn’t had extended testing in real life Sitecore development like a final release will have. It no doubt has some bugs. The bugs may be more than minor. Feel free to test it if that doesn’t scare you; send me a ticket on GitHub if you find issues. Now’s a great time for feature requests too!

Rethinking the Sitecore Serialization Format: Rainbow Preview, part 1

July 22, 2015

Unicorn

If you’ve worked with Sitecore in a team setting for any length of time, you’ve probably had to deal with item serialization. Item serialization is nearly a requirement to be effective when you need to share templates, renderings, placeholder settings, custom experience buttons, and all the other Sitecore items stored in the databases that are effectively development artifacts, as opposed to content. Being development artifacts, we want to keep them under source control so we can version them, develop feature branches with them, and deploy them using continuous integration to our shared Sitecore installations.

If you’ve dealt with Sitecore’s serialization format in a team environment for any length of time, you’ve probably started to realize some of its shortcomings. Because,

One does not simply serialize without merge conflicts

Let’s pick on multilist fields, for example. This is what a multilist looks like in SSF:

----field----
field: {E391B526-D0C5-439D-803E-17512EAE6222}
name: Allowed Controls
key: allowed controls
content-length: 194

{E11BDB3B-1436-4059-90F6-DE2EE52A4EB4}|{D9C54253-37FF-4D64-8894-5373D8799361}|{F118E540-CC75-4AA9-A62B-D6ED9E6F77E4}|{A813194F-32F4-4501-A430-6602ABF73535}|{2F4ADF0B-9633-4EE9-B339-8CA32E2C3293}

Now let’s imagine using this in a team environment. Alice creates a new rendering, and needs to add it to placeholder settings so it can be used in Experience Editor. Meanwhile on another branch, Bob adds a different rendering he’s made to the same placeholder settings. What happens? Merge conflict. And not a simple, easy to solve conflict: one where a very long single line has to be merged by hand - because merging is line oriented. On top of that, you must not forget to recalculate the content-length. Oh joy.
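
To see why hand-merging is error prone, here is a tiny Python sketch of what adding one rendering does to the field. It assumes content-length counts the characters of the value, which matches the 194 in the example above (5 braced GUIDs of 38 characters plus 4 pipes); the helper names are made up for illustration.

```python
def content_length(value: str) -> int:
    """Length header for a field value (assumed to be a character count)."""
    return len(value)


def append_guid(value: str, guid: str) -> str:
    """Add one more ID to a pipe-delimited multilist value."""
    return f"{value}|{guid}" if value else guid
```

Appending a single braced GUID to the value above moves the length from 194 to 233, and both the value line and the content-length line have to be fixed up by whoever resolves the conflict.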

Then let’s take a look at the data that’s stored. Do we need key to load the field? Do we need name even? Nope, though having a name around makes it easier to understand - we just don’t need two. Then there’s the fact that the format is endline specific - don’t leave home in Git without the special .gitattributes to leave line endings alone, or the files won’t be read by the Sitecore APIs.

Here’s another gremlin about serialization: it saves fields that aren’t important to development artifacts that are version controlled. Yes, I’m talking about you __Revision, __Updated by and __Updated. Certain Sitecore tools - like say the template editor - cause a ton of item saves and they are not picky about whether any actual data fields have changed. This means that if I add a Foo field to my Bar template, the last updated on Bar, Foo, and every one of Bar’s fields gets changed. Even if there is an actual data change, and the data change is auto-mergeable, you can still get conflicts on the statistical fields. Welcome to annoying merge conflict city, folks!

Part I: The JSON era

I’ve been at this a while. Version 1 is largely on the junk pile. Why? JSON.

JSON seemed like an obvious candidate format: it’s quick to parse, mature, and easy as pie to implement with JSON.NET and a few POCOs. In fact, its performance is quite similar to ye olde content-length above. It was certainly a step up, and I was not the only person to think of this idea. I learned a lot and got many ideas from Robin’s prototype, not the least of which was that we should reformat field values to make merging easier.

Oddly enough it was the idea of field reformatting that made JSON an untenable format. The problem is that merging is line oriented - so we would want to reformat that multilist from the first example with one GUID per line such that merge tools could make sense of it. Well, JSON does not allow literal newlines in string values; instead it uses the escape sequences \r and/or \n. Unhelpful - but it makes parsing really fast.
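
You can see the problem with any JSON encoder. A quick Python illustration using the standard json module (the `Value` key is just a stand-in, not the format from my JSON prototype):

```python
import json

guids = [
    "{E11BDB3B-1436-4059-90F6-DE2EE52A4EB4}",
    "{D9C54253-37FF-4D64-8894-5373D8799361}",
]

# One GUID per line in memory...
multiline_value = "\n".join(guids)

# ...but the encoder escapes the newline, so the value is one physical line on disk.
encoded = json.dumps({"Value": multiline_value}, indent=2)
physical_lines = encoded.splitlines()
```

The encoded document has three physical lines (the braces and the one Value line), so a line-oriented merge tool still sees the whole multilist as a single line.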

Part II: YAML Ain’t Markup Language

With JSON in the bin, I started poking around for formats that had already been created which would support our needs. YAML (the acronym is the title of this section) fit the bill nicely. The design goals of YAML are to be a more human readable superset of JSON for things like configuration files and object persistence. Allows multiple lines, human readable, allows lists and nesting - nice.

Well the downside is that because of the flexibility of the format, YAML parsers are on the order of 10-100x slower than JSON parsers. It was slooooow. But the good news is that the YAML-based serialization format I had designed was much, much simpler than the entire YAML specification. So I wrote my own reader and writer that supported only the subset of YAML that was necessary. It was fast. The format was easy to read and understand. It had the ability to add field formatters to make values mergeable (at present, for multilists and layout fields). I wanted to start using it pretty badly in real projects :)

So without further ado, here is the same item that we took the multilist from above, but in YAML:

---
ID: 38ddd69e-fb0a-4970-926e-dfb0e5b9a5e1
Parent: 68e4c671-797d-4a89-8fa6-775926f1381d
Template: 5c547d4e-7111-4995-95b0-6b561751bf2e
Path: /sitecore/layout/Placeholder Settings/reductio/ad/absurdum
SharedFields:
- ID: 7256bdab-1fd2-49dd-b205-cb4873d2917c
  # Placeholder Key
  Value: heading
- ID: e391b526-d0c5-439d-803e-17512eae6222
  # Allowed Controls
  Type: TreelistEx
  Value: |
    {E11BDB3B-1436-4059-90F6-DE2EE52A4EB4}
    {D9C54253-37FF-4D64-8894-5373D8799361}
    {F118E540-CC75-4AA9-A62B-D6ED9E6F77E4}
    {A813194F-32F4-4501-A430-6602ABF73535}
    {2F4ADF0B-9633-4EE9-B339-8CA32E2C3293}
Languages:
- Language: en
  Versions:
  - Version: 1
    Fields:
    - ID: 25bed78c-4957-4165-998a-ca1b52f67497
      # __Created
      Value: 20100310T143300
    - ID: 52807595-0f8f-4b20-8d2a-cb71d28c6103
      # __Owner
      Value: sitecore\admin
    - ID: 5dd74568-4d4b-44c1-b513-0af5f4cda34f
      # __Created by
      Value: sitecore\admin
    - ID: 87871ff5-1965-46d6-884f-01d6a0b9c4c1
      # Description
      Value: <p>The heading of a page, above any content renderings.</p>

Notice how the fields’ names - nonessential data - are in YAML comments. Present for humans to read, not necessary for the machine. The Allowed Controls TreelistEx field is also an example of the YAML multiline format, starting with the pipe and the data on a newline, indented further. YAML uses significant whitespace to define structure, making it easy to read and also not requiring hacks like content-length to efficiently parse.

You may notice that only the Allowed Controls field has a field type value. This is because the value was formatted with a FieldFormatter, so when deserializing the value it uses the type to figure out which formatter to “unformat” the value back into Sitecore with.
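
The formatter concept boils down to a reversible transformation. A minimal sketch in Python of what a multilist formatter might do (the function names are hypothetical; Rainbow’s real formatters are .NET classes):

```python
def format_multilist(raw: str) -> str:
    """Turn Sitecore's pipe-delimited ID list into one ID per line for merging."""
    return "\n".join(part for part in raw.split("|") if part)


def unformat_multilist(formatted: str) -> str:
    """Reverse the formatting so the value can go back into Sitecore."""
    return "|".join(line.strip() for line in formatted.splitlines() if line.strip())
```

The round trip matters: whatever gets written to disk must unformat back to exactly the value Sitecore stored.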

The way that languages are structured is also slightly different here: each language has a top level list item, but versions are all grouped hierarchically under their language. So in this format, language is its own construct, rather than being flattened into pairs like “da-DK #1” the way the Sitecore database does.

We’ve also used field level ignores to not even store the constantly changing statistics fields (e.g. __Updated). This is optional, but it does lead to wonderfully compact files on disk.

The YAML format is also completely endline ignorant - it doesn’t care if it gets \n or \r\n.

Yes, please?

Well sorry - it’s not quite done yet, though the YAML part is pretty stable. The YAML format is a component of my upcoming Rainbow library. Rainbow is essentially a modernized serialization API that aggressively improves both the serialization format and the storage hierarchy, plus providing deep item comparison capabilities.

Rainbow aims to only be an API and has no default configuration or frontend. It will be freely available to use.

I have several projects that I intend to use Rainbow for - maybe you can think of some uses too?

Unicorn 3.0, being developed alongside it, will use Rainbow for storage, comparison, and formatting needs.
Rainbow for SPE will enable serializing and deserializing items in YAML using Sitecore Powershell Extensions

At present, Rainbow is alpha-quality. I wouldn’t use it unless you want to get your hands dirty, and I might change APIs as needed.

In the next post of this series, I’ll go over the thinking behind improvements to the standard storage hierarchy to enable infinite depth while still being human readable. But there are still some bugs to fix before I write that ;)

2014

Now available: Unicorn 2

March 28, 2014

Unicorn

I’m happy to announce that the second major version of Unicorn has been released. Unicorn is a free and open source tool to automatically synchronize item changes between Sitecore instances. If you’re new to Unicorn, the README describes what it is in more detail.

Unicorn, like many projects, evolved out of a proof of concept. Like most proof of concept projects, Unicorn suffered from a lack of flexibility because it did just enough to prove that it worked. Almost as soon as the first public version was released, work started on the refactoring that would become Unicorn 2. This second major revision of Unicorn is the result of more than 100 hours of work. Both code and UX have been refactored, decoupled, fixed, and improved. Many of the improvements are due to the input of the Sitecore community. Thank you to the folks who have sent issues, questions, and chatted about Unicorn since it was released. You all are awesome.

What’s new in Unicorn 2

More reliable change detection. Instead of detecting changes using event handlers, a data provider is now used. This makes Unicorn immune to EventDisabler, so it no longer misses changes to content.
Support for multiple configurations. A configuration is a set of dependencies, such as a predicate, serialization provider, and evaluator. These allow you to configure sets of content to serialize differently.
A built-in control panel that walks you through common configuration issues and initial setup, and allows executing sync and reserialize operations once set up.
Greatly improved logging. All operations are logged to the Sitecore logs, and the formatting has been streamlined to be terse and relevant.
Consistency checking during a sync that finds common merge errors and flags them as errors.
Better item comparison. All item fields are now compared when deciding whether to update an item, whereas v1 only evaluated the updated and revision fields.
Field-level exclusion. You can now have certain fields be ignored when checking for updates or deserializing an item.
Supports syncing media items (such as rendering thumbnails for page editor)
Tons of bug fixes for unusual situations (for example, moving and copying items between included and not included paths)
Compatible with Sitecore 6.5 and later. The original version was for Sitecore 7 only.
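
The item comparison and field-level exclusion items above combine naturally. Here is a conceptual sketch in Python, with items reduced to plain field name to value dictionaries (an assumption for illustration; Unicorn’s evaluators work on Sitecore items and are configured in Serialization.config):

```python
def changed_fields(source: dict[str, str], target: dict[str, str],
                   ignored: frozenset[str] = frozenset()) -> list[str]:
    """List the fields whose values differ, skipping any excluded fields."""
    names = (source.keys() | target.keys()) - ignored
    return sorted(name for name in names if source.get(name) != target.get(name))
```

If the only differences are in excluded fields such as the revision and updated fields, the result is empty and the item would not need an update.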

How do I get it and install it?

Upgrading from 1.x? Read this
You’ll need Sitecore 6.5.0 (121009) or later. Tested with Sitecore through 7.1 Update-1.
Install Unicorn. This is as simple as adding the Unicorn NuGet package to your project.
Configure what to serialize in the example configuration’s Predicate registration. There will be an App_Config/Include/Serialization.config file installed, which has a commented example of this syntax.
Run a build in Visual Studio to make sure the output files are up to date.
Visit $yoursite/unicorn.aspx and it will walk you through initial serialization. This will take the preset you configured and serialize all of the included items in it to disk.
- NOTE: make sure to serialize an authoritative database with all items present. Other databases will be made to look just like this one when sync occurs.
- NOTE: if you’re using Git, you need to make sure that Git doesn’t fool with the line endings of your serialized files. Add *.item -text to a .gitattributes file in the repo root. See this blog post for details.
Commit your serialized items to source control.

If you want to install Unicorn from source, the procedure is quite easy. The README has directions for this.

Neat things you can do with Unicorn

Code Review and Branching

Recently I’ve started doing a lot more code review using GitHub pull requests. Unicorn makes Sitecore code review very easy - all you have to do is switch to the branch under review and sync Unicorn. Now your Sitecore has all the templates and renderings from that branch. When you’re done you checkout the original branch and sync again - and you’re back where you started. Very convenient.

Syncing test content between developers

If your site’s information architecture allows it, you can include a path under the site’s home item, such as ‘Samples’ to be synced using Unicorn. This content can be used by developers to create examples of content types under development to share for QA. The same technique can also be used to share rendering thumbnails in the media library.

Integrating with continuous integration

Unicorn is very simple to run using any CI server or script that can make HTTP requests. Once that is set up, your integration server - or even live - will never have outdated templates again. Example scripts for this can be found here

Integrate with deployment tools

Several people have integrated Unicorn with tools such as Sitecore Courier and Sitecore.Ship to create update packages to deploy to remote environments.

What’s new in the backend

The largest change in the backend is having it be a modular dependency-based system. Unlike v1 which had issues like the inability to use anything but the default preset, v2 allows you to reconfigure nearly all aspects of the system by changing the classes registered in the Serialization.config file. Multiple configurations also allow for multiple configurations of the dependencies, so you can have more than one implementation of these extension points in any given project. This allows you to not only have different behavior per configuration but also to sync configurations separately if you wish to.

There is a rather long description of customization in the repository’s README file on GitHub if you’re interested.

Should I use this now?

That’s up to you. I am using it for my daily team development tasks, and consider it to be generally more stable than v1 because of the safety improvements (data provider, syncing all fields, consistency checking, better log messages).

Hope you enjoy it, and as always you can get ahold of me on Twitter or GitHub if you run across issues or have ideas.

2013

Unicorn 1.0.4 Released

July 17, 2013

Unicorn

I’ve released Unicorn 1.0.4 to NuGet. This release fixes bugs that would cause serialization inconsistencies to occur if items having children under Unicorn’s control were renamed or moved (the paths stored in the moved children would still be the original path).

The solution is to re-serialize children after moving or renaming, which means Unicorn will not scale very well if you put a giant number of items under it and then move the parent of them all. But then it’s not designed for that in the first place - a template with 30-40 subitems should be tolerably quick for a relatively unusual operation.
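
The underlying problem is that each serialized child stores its own full path. A small Python sketch of the path fix-up that a move implies (purely illustrative; `rebase_paths` is a made-up helper, and Unicorn itself re-serializes the children from Sitecore rather than rewriting strings):

```python
def rebase_paths(paths: list[str], old_parent: str, new_parent: str) -> list[str]:
    """Rewrite stored item paths after old_parent is moved or renamed to new_parent."""
    prefix = old_parent + "/"
    return [
        new_parent + path[len(old_parent):]
        if path == old_parent or path.startswith(prefix)
        else path
        for path in paths
    ]
```

Every descendant has to be touched, which is why the cost grows with the number of items under the moved parent.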

This fix has been verified using Rhino.Fsck, a library for verifying the consistency of a serialized tree of items.

Rhino solves this issue slightly more elegantly by simply reading the items from disk, changing the path, and re-writing the SyncItem, but it’s also lower level. Eventually I’d like to deprecate Unicorn in favor of Rhino.

Unicorn 1.0.3 Released

June 21, 2013

Unicorn

I’ve pushed Unicorn 1.0.3 to NuGet. This release fixes a number of bugs, improves logging, and removes some annoyances:

Deleting a whole tree could cause the app pool to be killed due to a bug in Sitecore’s serialization event handler (380479) - this fix expands on a previous workaround to cover recursive deletes
Fixed an issue where if you changed an item too rapidly after saving or moving it, you could get file read errors or spurious “conflicts” reported due to the async ShadowWriter still being in process.
“Inconsequential” item saves are now ignored. For example, the Template Editor gleefully changes the revision and last updated date of every template field when it saves, not merely changed ones. Those, as well as rename events where the name does not actually change (again, Template Editor ftw), are ignored and updated files are not written to disk. These fields are also ignored when calculating serialization conflicts on item save.
Items deleted by Unicorn are now placed in the recycle bin instead of completely removed, so they can be restored if mistakenly deleted.
Unicorn sync logs are automatically written to the Sitecore log as well (requires dependency upgrade to Kamsar.WebConsole 1.2.2 for the new TeeProgressStatus)