Towards healing a fork (ab)using traits

By stan shepherd February 25th, 2010

Traits seem to offer intriguing possibilities for refactoring. One situation they hint at simplifying is that where large amounts of duplicate code crop up in Smalltalk; this can occur even where the code itself is attractively written. We tend naturally to find duplicate code off-putting; it’s not the Smalltalk way.
Because trait functionality is separate from the inheritance hierarchy, it appears to offer a way to factor out these duplicates without otherwise affecting the functionality provided by inheritance. There is a trade-off between the saving in duplicate code and the new elements that need to be introduced: the traits themselves (for a discussion of this see this paper, or for something more robust this traits thread on Nabble).
A recent discussion about why the new OmniBrowser had forked from the original offered a test case; it would be interesting to measure the code and understandability benefits against the trade-offs in attempting to ‘partially heal’ a fork using traits. The results offer some insights about trait usage. They also show a (to me) surprising result regarding the statelessness of traits.

Why it might be hard

Consider the situation where a package has been forked, and now two versions of a class exist:

PreForkedClass

#changeInstVar
date := Date today

PostForkedClass

#changeInstVar
date := Date today

We observe that the methods are still common, and hence candidates for merging.

We could move the common method into a trait:

THealedTrait

#changeInstVar
date := Date today

and our original classes now use the trait:

PreForkedClass
uses: THealedTrait

PostForkedClass
uses: THealedTrait

However, we now have a trait that is storing into a variable. So now we will run into the following check, that ensures traits are stateless:

TOBCmdMoveToTrait
#checkIfOK: compiledMethod

...

compiledMethod hasInstVarRef
    ifTrue: [self
        cannotMoveError: 'The method to move refer to instance variables. Please, add accessors'
        for: compiledMethod]

There is a discussion of the essentially stateless nature of traits in this paper (pdf),
and here a nice summary of the drawbacks of statelessness:

The incompleteness of traits results in a number of annoying limitations,
namely: (i) trait reusability is impacted because the required interface is
typically cluttered with uninteresting required accessors, (ii) client classes
are forced to implement boilerplate glue code, (iii) the introduction of new
state in a trait propagates required accessors to all client classes, and
(iv) public accessors break encapsulation of the client class.

Because traits are stateless, state references need to be abstracted out before the trait is used. Hence:

THealedTrait

#changeInstVar
self date: Date today

This then puts the onus back on the original classes to provide the accessors to the variable:

PreForkedClass

#date: aDate
date := aDate

#date
^date

Following this pattern, you provide accessors for each instance variable or class variable. You also need to pay attention to where the variable is defined in the hierarchy, where accessors may be overridden, etc. Refactoring tools don’t see a getter with lazy initialization as a getter, meaning that an automated process will likely create an extra accessor in these cases. The profusion of accessors can lead to pollution of the interface of the package we are trying to simplify, and runs counter to the benefit of introducing traits.

One approach to calming this profusion of methods is to delegate state for a trait to a helper class with this sole task. The scenario looks like:

PreForkedClass

#plug
^plug ifNil: [plug := HealedClassPlug new]

Now we can put all the accessors in the common trait:

THealedTrait

#date: aDate
self plug date: aDate

and the variable itself is in the plug class:

HealedClassPlug

#date: aDate
date := aDate

Now our original classes don’t have to define and maintain the accessors themselves: they need only provide a single variable, the plug, with its accessors. This becomes tidier, and can be worthwhile if the commonality between the forked versions is high.
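Python has no traits, so the following is only a loose analogy of the plug arrangement: a mixin class stands in for THealedTrait, and a small helper object stands in for HealedClassPlug. The method names set_date, get_date and change_inst_var are invented for this sketch, and nothing here is taken from the OmniBrowser code.

```python
from datetime import date


class HealedClassPlug:
    """Stand-in for the plug class: the only place the state lives."""

    def __init__(self):
        self.date = None


class THealedTrait:
    """Stand-in for the trait: it never touches a field of the host class,
    it only goes through self.plug."""

    def set_date(self, a_date):
        self.plug.date = a_date

    def get_date(self):
        return self.plug.date

    def change_inst_var(self):
        self.set_date(date.today())


class PreForkedClass(THealedTrait):
    """Host class: its only obligation is to supply a lazily created plug."""

    _plug = None

    @property
    def plug(self):
        if self._plug is None:
            self._plug = HealedClassPlug()
        return self._plug


class PostForkedClass(THealedTrait):
    """The forked twin supplies its own plug in the same way."""

    _plug = None

    @property
    def plug(self):
        if self._plug is None:
            self._plug = HealedClassPlug()
        return self._plug


pre = PreForkedClass()
post = PostForkedClass()
pre.change_inst_var()
print(pre.get_date(), post.get_date())
```

Each host object gets its own plug, so the shared methods never see the other fork's state; the cost is one extra object and one extra indirection per instance.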

Curiouser and curiouser

A slightly surprising discovery is that traits can be ‘tricked’ to handle state directly. If we remove the check, mentioned above, for variable references, the traits will compose and operate fine.

In this case, if we move the method

#date: aDate
date := aDate

into its trait unchanged, date will continue to bind to the instance variable in the class using the trait. This greatly boosts the payoff of using traits here. For example, merging the (reluctantly) forked OmniBrowser2 and OmniBrowser, we end up with the following figures:

mutually matched classes                 292
classes with all methods merged           64
  (i.e. classes are now effectively
  identical, except for inheritance)
merged methods                          1374
methods different between packages       566
methods with no equivalent               479

When semi healed, over the combined OmniBrowser, OmniBrowser2, OB, O2 packages, the arithmetic is:

              Forked             Partially healed
              classes  methods   classes  methods
OmniBrowser        57      689        57      205
OmniBrowser2       57      724        57      240
HETestTraits        0        0       279     1379
OB                407     2895       407     1983
O2                424     3473       424     2572
Total             945     7781      1224     6379
delta                                279    -1402

i.e. some 279 extra traits, and 1402 fewer methods.
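The totals and deltas can be re-derived from the per-package rows. The short Python check below uses only the counts quoted in the table; the variable names are mine, and the "leakage" figure is simply the gap between the net method reduction and the number of methods now living in the traits.

```python
# (classes, methods) per package, copied from the table above.
forked = {
    "OmniBrowser": (57, 689),
    "OmniBrowser2": (57, 724),
    "HETestTraits": (0, 0),
    "OB": (407, 2895),
    "O2": (424, 3473),
}
healed = {
    "OmniBrowser": (57, 205),
    "OmniBrowser2": (57, 240),
    "HETestTraits": (279, 1379),
    "OB": (407, 1983),
    "O2": (424, 2572),
}


def totals(counts):
    """Sum the class and method columns over all packages."""
    classes = sum(c for c, _ in counts.values())
    methods = sum(m for _, m in counts.values())
    return classes, methods


forked_total = totals(forked)
healed_total = totals(healed)
delta = (healed_total[0] - forked_total[0], healed_total[1] - forked_total[1])

# Methods that ended up in the shared traits package.
trait_methods = healed["HETestTraits"][1]
# Gap between the net reduction in methods and the trait method count.
leakage = -delta[1] - trait_methods

print("forked total:", forked_total)
print("healed total:", healed_total)
print("delta:", delta)
print("leakage:", leakage)
```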

Since there is some leakage in the counting (you might expect the reduction in methods to equal the number of methods in the shared traits), let us take the lower number of 1379 as the duplicate methods saved. Against this, we have the increase in traits of 279. So, is there a net benefit?

While the new traits are unfortunate, they are essentially static. That is, there is one trait for each pair of classes that correspond between forked and original packages. Once created, these traits will not need maintenance in themselves; it is their methods that will be maintained. Here is the payoff: there are nearly 1400 methods that can still be maintained jointly for the original and the forked projects. This scheme also means that the forks don’t diverge as quickly, as they do when, for example, each fixes the same bug in its own copy of a method. Over time this process of divergence makes the possibility of healing the fork recede. For example, in OB and O2, small bugs are fixed separately by the maintainers (see the Google Code bug report).

This might point to a practical application of traits in reducing forking. Suppose you want to extend a package, and subclassing doesn’t allow you to achieve what you want. Being aware that you will want to minimize the extra maintenance, even that you may want to merge later, you can create a new version of the package using traits. Then only methods that specifically need to be different are changed. The two packages can be jointly maintained in all but their deltas. And if a point is reached where the fork can be healed, only those parts not in the common traits will need attention. The common code can be trivially reintegrated.

Practical consideration – and does it work?

Current browser support will not allow the ‘quasi-stateful traits’ to be maintained, as they contain references that the compiler doesn’t recognise:

[Figure: browser showing unknown variable in quasi-stateful trait]

We can demonstrate that this limitation need not stop our approach, by editing the method in a text editor and re-importing to our trait:

[Figure: browser showing shared modified trait]

The reimported method correctly binds the instance variables in the separate browsers.

The partially merged image also passes the same full system tests as a fresh release candidate image plus OmniBrowser2, with the exception of:

self assert: Undeclared isEmpty

So the next time you want to prototype an alternative version of some functionality, and subclassing won’t do it for you, and most importantly you’re feeling experimental, you could try an approach like this:

Original hierarchy

[Figure: forking schematic start showing sample hierarchy]

Copy hierarchy and extract all functionality to traits

[Figure: forking scheme step 2, dissect classes. There are two copies of the hierarchy, and all the methods have been moved out into corresponding traits. The original classes are skeletons.]

Now changes can be made to those parts only that need to be different

[Figure: forking scheme step 3, devolved versions. Changed methods can be overrides, new methods separate to forked classes, or new common methods. The changes show up distinctly in blue.]

Further work

A comparison of a conventional refactoring of these two packages to avoid duplication could be of great interest. To make this approach productive, some change to the toolset is necessary. The browser would need to accept a status of ‘requires a variable’ in the same way it now requires a method.
Before rewriting your mission-critical high speed trading system in this way, some rigour needs to be applied to ascertain whether it should work. For now, it is certainly safe to refactor your implementation of Pong this way (as long as you save your image first).
The reverse process, of reversing all the traits back into separate hierarchies, appears doable.

Ω If you have a short or longer term project involving financial modeling and / or refactoring, I will refactor for cash.

Please leave a comment and I’ll forward contact details
