Home » JVM Languages » Scala » Refactoring with Kleisli Composition

About Yoav Abrahami

Yoav Abrahami
Yoav is the Chief Architect at Wix.com, working with developers and operations to build the company's future products as well as accelerating and improving development processes. Prior to joining Wix, Yoav was an Architect at Amdocs Cramer OSS division. Yoav holds a MS in Physics and a BS in Computer Science from the Tel Aviv University.

Refactoring with Kleisli Composition

For quite awhile we have been maintaining an application that processes XML and JSON data. Usually the maintenance consists of fixing defects and adding minor features, but sometimes it requires refactoring old code.

Consider, for example, a function that extracts an XML node by path:

import scala.xml.{Node => XmlNode}
 
def getByPath(path: List[String], root: XmlNode): Option[XmlNode] =
  path match {
    case name::names =>
      for {
        node1 <- root.child.find(_.label == name)
        node2 <- getByPath(names, node1)
      } yield node2
    case _ => Some(root)
  }

This function works fine, but requirements change and now we need to:

  • Extract nodes from JSON and other tree-like data structures, not only XML
  • Return a descriptive error message if the node is not found

This post explains how to refactor getByPath to meet the new requirements.

Refactoring with Kleisli Composition

Let’s factor out a piece of code that creates a function to extract a child node by name. We could name it createFunctionToExtractChildNodeByName, but let’s name it child for brevity.

val child: String => XmlNode => Option[XmlNode] =
  name => node => node.child.find(_.label == name)

Now we can make the key observation: our getByPath is a sequential composition of functions that extract child nodes.The code below shows an implementation of this composition:

def compose(getChildA: XmlNode => Option[XmlNode],
            getChildB: XmlNode => Option[XmlNode]): XmlNode => Option[XmlNode] =
  node => for {
            a  <- getChildA(node)
            ab <- getChildB(a)
          } yield ab

Fortunately, the Scalaz library provides a more generic way to compose function A => M[A], where M is a monad. The library defines Kleisli[M, A, B]: a wrapper for A => M[B], which has method >=> to chain the Kleisli wrappers in the same way as andThen chains regular functions. We will call this chain Kleisli composition. The code below provides a composition example:

val getChildA: XmlNode => Option[XmlNode] = child(“a”)
val getChildB: XmlNode => Option[XmlNode] = child(“b”)
 
import scalaz._, Scalaz._
 
val getChildAB: Kleisli[Option, XmlNode, XmlNode] =
  Kleisli(getChildA) >=> Kleisli(getChildB)

Note the point-free style we are using here. It is very common for functional programmers to write functions as a composition of other functions, never mentioning the actual arguments they will be applied to.

The Kleisli composition is exactly what we need to implement our getByPath as the composition of functions extracting child nodes.

import scalaz._, Scalaz._
 
def getByPath(path: List[String], root: XmlNode): Option[XmlNode] =
  path.map(name => Kleisli(child(name)))
    .fold(Kleisli.ask[Option, XmlNode]) {_ >=> _}
    .run(root)

Note the using of Kleisli.ask[Option, XmlNode] as the neutral element of the fold. We need this neutral element to handle a special case when path is Nil. Kleisli.ask[Option, XmlNode] is just an alias of a function from any node to Some(node).

Abstracting over XmlNode

Let’s generalize our solution and abstract it over XmlNode. We can rewrite it as the following generic function:

def getByPathGeneric[A](child: String => A => Option[A])
                       (path: List[String], root: A): Option[A] =
  path.map(name => Kleisli(child(name)))
    .fold(Kleisli.ask[Option, A]) {_ >=> _}
    .run(root)

Now we can reuse this generic function to extract a node from JSON (we use json4s here):

import org.json4s._
 
def getByPath(path: List[String], root: JValue): Option[JValue] = {
  val child: String => JValue => Option[JValue] = name => json =>
    json match {
      case JObject(obj) => obj collectFirst {case (k, v) if k == name => v}
      case _ => None
    }
  getByPathGeneric(child)(path, root)
}

Note that we wrote a new function, child: JValue => Option[JValue], to handle JSON instead of XML, but getByPathGeneric remains unmodified and handles both XML and JSON.

Abstracting over Option

We can generalize getByPathGeneric even further and abstract it over Option with Scalaz, which provides an instance of scalaz.Monad[Option]. So we can rewrite getByPathGeneric as follows:

import scalaz._, Scalaz._
 
def getByPathGeneric[M[_]: Monad, A](child: String => A => M[A])
                                    (path: List[String], root: A): M[A]=
  path.map(name => Kleisli(child(name)))
    .fold(Kleisli.ask[M, A]) {_ >=> _}
    .run(root)

Now we can implement our original getByPath with getByPathGeneric:

def getByPath(path: List[String], root: XmlNode): Option[XmlNode] = {
  val child: String => XmlNode => Option[XmlNode] = name => node =>
    node.child.find(_.label == name)
  getByPathGeneric(child)(path, root)
}

Next we can reuse getByPathGeneric to return an error message if the node is not found.

To do this, we will use scalaz.\/ (aka disjunction), which is a monadic right-biased version of scala.Either. On top of that, Scalaz provides implicit class OptionOps with method toRightDisjunction[B](b: B), which converts Option[A] to scalaz.B\/A so that Some(a) becomes Right(a) and None becomes Left(b). You can find more info about \/ in other blogs.

Thus we can write a function, which reuses getByPathGeneric, to return an error message instead of None if the node is not found:

type Result[A] = String\/A
 
def getResultByPath(path: List[String], root: XmlNode): Result[XmlNode] = {
  val child: String => XmlNode => Result[XmlNode] = name => node =>
    node.child.find(_.label == name).toRightDisjunction(s"$name not found")
  getByPathGeneric(child)(path, root)
}

Conclusion

The original getByPath function handled only XML data and returned None if the node was not found. We also needed it to handle JSON and return a descriptive message instead of None.

We have seen how using Kleisli composition provided by Scalaz can factor out the generic function getByPathGeneric, which we abstracted further using generics (in order to support JSON) and disjunction (to generalize over Option).

Reference: Refactoring with Kleisli Composition from our JCG partner Michael Dagaev at the Wix IO blog.

Do you want to know how to develop your skillset to become a Java Rockstar?

Subscribe to our newsletter to start Rocking right now!

To get you started we give you our best selling eBooks for FREE!

 

1. JPA Mini Book

2. JVM Troubleshooting Guide

3. JUnit Tutorial for Unit Testing

4. Java Annotations Tutorial

5. Java Interview Questions

6. Spring Interview Questions

7. Android UI Design

 

and many more ....

 

 

One comment

  1. I have enjoyed your article, thank you. Perhaps it would be good to point out that the real benefit of using the Kleisli arrow comes only when one abstracts over the monad. All the rest (abstracting over Node etc.) could also have been done with the original recursive formulation of getByPath. Also, the factoring out of the “glue” code comes at a price. While the original recursive implementation would run in constant space (with tail recursion optimization), the other approach builds up the entire function chain in memory before applying it all at once, requiring space linear in the size of the input.

    I’m not sure about the motivation for disjunction. Why not just throw an exception? This is Scala, after all, not Haskell. The link to blog.evilmonkeylabs.com seems to be dead

    Once we have the fully general implementation, more possibilities come to mind: for example, it would now be possible not only to return a single node at a path, but even a list of nodes without changing the implementation (List is also a Monad, and path interpreted as shorthand for an XPath expression may match several elements). Or contents of directory folders. We can even think of the state transitions of a finite state machine in the same way. Even if you don’t use the general approach in practice (for any reason), just knowing it is there broadens the mind.

Leave a Reply

Your email address will not be published. Required fields are marked *

*

Want to take your Java skills to the next level?

Grab our programming books for FREE!

Here are some of the eBooks you will get:

  • Spring Interview QnA
  • Multithreading & Concurrency QnA
  • JPA Minibook
  • JVM Troubleshooting Guide
  • Advanced Java
  • Java Interview QnA
  • Java Design Patterns