Akka: An Introduction

Akkas documentation is immense. This series helps tackle the many components by providing a working example of the master slave design pattern built with this powerful tool. The following article reviews the higher level concepts behind Akka and its usage.

Links are provided to different parts of the Akka documentation throughout the article.

Akka

Akka is a software tool  used to build multi-threaded and distributed systems based on the actor model. It takes care of lower level systems building by providing high level APIs for node and actor generation.

Actors are the primitives behind Akka. They are useful for performing repeated tasks concurrently.  Actors run until terminated, receiving work through message passing.

actorsys

Resource Usage

Akka is extremely lightweight. The creators boast that the tool can handle thousands of actors on a single machine.

Message passing occurs through mailboxes. The maximum number of messages a mailbox holds is configurable with a 1000 messages default but messages must be under one megabyte.

The Actor

The actor is the universal primitive used in Akka. Unlike when using threading in a program language, this primitive runs like a daemon server. As such, it should be shut down gracefully.

Actors are user created.

class MyActor extends Actor with ActorLogging{

     override def preStart()= log.debug("Starting")
     override def postStop()= log.debug("Stopping")
     override def preRestart(reason: Throwable, message: Option[Any]) = log.error(s"Restarting because of ${reason.message}. ${message}")     
     override def postRestart(reason : Throwable) = 

     override def receive():Receive={
         case _ => sender ! "Hello from Actor"
     }
}

object MyActor{
   def setupMyActor()={
        val conf = ConfigFactory.load()
        val system = ActorSystem("MySystem",conf)
        val actor : ActorRef = system.actorOf(Props[MyActor],name = "myactor") 
   }
}

 

The example above creates an actor and a Scala companion class for instantiation.

Actors must extend Actor. ActorLogging provides the log library. The optional functions preRestart and postRestart handle exceptions, while the optional preStart and postStop methods handle setup and tear down tasks. The basic actor above incorporates logging and error processing.

An actor can:

  • Create and supervise other actors
  • Maintain a State
  • Control the flow of work in a system
  • Perform a unit of work on request or repeatably
  • Send and receive messages
  • Return the results of a computation

Akka’s serialization is extremely powerful. Anything available across a cluster or on the classpath that implements Serializable can be sent to and from an actor. Instances of classes are de-serialized without having the programmer recast them.

When to Use Akka

Actor systems are not a universal solution. When not performing repeated tasks and not benefiting from high levels of concurrency, they are a hindrance.

State persistence also weighs heavily in the use of an actor system. Take the example of cookies in network requests. Maintaining different network sessions across different remote actors can be highly beneficial in ingestion.

Any task provided to an actor should contain very little new code and a limited number of configuration variables.

Questions that should be asked based on these concepts include:

  • Can I break up tasks into sufficiently large quantities of work to benefit from concurrency?
  • How often will  tasks be repeated in the system?
  • How minimal can I make the configuration for the actor if necessary?
  • Is there a need for state persistence?
  • Is there a significant need for concurrency or is it a nice thought?
  • Is there a resource constraint that distribution can solve or that will limit threading?

State is definitely a large reason to use Akka. This could be in the form of actually  maintaining variables or in the actor itself.

In some distributed use cases involving the processing of enormous numbers of short lived requests, the actors own state and Akka’s mailbox capabilities are what is most important. This is the reasoning behind tools built on Akka such as Spark.

As is always the case when deciding to create a new system, the following should be asked as well:

  • Can I use an existing tools such as Spark or Tensor Flow?
  • Does the time required to build the system outweigh the overall benefit it will provide?

Clustering

Clustering is available in Akka. Akka provides high level APIs for generating distributed clusters. Specified seed nodes handle communications, serving as the system’s entry point.

Network design is provided entirely by the developer. Since node generation, logging, fault tolerance, and basic communication are the only pieces of a distributed system Akka handle’s, any distribution model will suffice. Two common models are the master-slave and graph based models.

Akka clusters are resilient with well developed fault tolerance.

Configuration

Configuration is performed either through a file or in a class. Many components can be configured including logging levels, cluster components, metrics collection, and routers.

Conclusion

This article is the entry point for the Akka series, providing the basic understanding needed as we begin to build a cluster.

Sbt Pack With Xerial

Adding Jars to a classpath should not be a chore. Often, using retrieveManaged in an SBT build is not quite what we want. When dealing with more than a few minor dependencies, having each dependency placed in its own folder is problematic. This article discusses a solution to this issue using Xerial’s sbt-pack plugin.

 

Xerial Pack

Xerial offers a plugin that will package all jars in a single folder and create a bat for executing the configurable main class. This allows for every jar to be placed on the classpath without listing every folder. It also creates a single directory for all dependencies.

Simply place the following in project/plugins.sbt:

addSbtPlugin("org.xerial.sbt" % "sbt-pack" % "0.8.2")  // for sbt-0.13.x or higher

Then specify the packaging options in build.sbt:

packAutoSettings

In this instance, the main class will be automatically found. More options are discussed at the Xerial Github page.

Packaging the jars requires a single command:

sbt pack

Conclusion

Using the classpath with many dependencies does not need to be a chore. Simply import the Xerial plugin and run sbt pack

Enriching Scala With Implicits

Imagine you are an ETL developer using Spring Cloud Data Flow. Nothing is really available for distributed systems and streaming ETL that is as powerful as this tool. Alteryx and Pentaho are at least a year away from pushing out anything as capable. While Pentaho might work, there are just too many holes to fill.

However, you could do with a more compact code language than Java when programming for Spring. A powerful solution is to combine the Spring ecosystem with Scala, using implicits to eliminate redundant code.

This article focuses on using the enrichment pattern in Scala code through the IterableLike library and the concept of the implicit.

 

clock

Time is Money

Implicits

Implicits allow code that is in scope to be utilized when variables are not defined. Only one implicit of a type may be defined within a class:

implicit val myImplicitString : String = "hello there"

def printHello(str : String)={
   println(str)//should print "hello there"
}

This class creates an implicit string and utilizes it in the method printHello. Having two implicit strings confuses the compiler and causes it to crash.

Implicits create the possibility of using the enrichment pattern as described in the next to last section by attaching an implicit definition to a library when a non-existing function is called.

Remap An Object

Our enrichment example contains a method that removes an item from an IterableLike object only when a specific condition holds:

class IterableScalaFuncs[A,Repr](xs : IterableLike[A,Repr]){

    /**
      * Remove an object by a single property value.
      * @param f              The thunk to use
      * @param cbf            The CanBuildFrom to get a builder which should not be touched
      * @tparam That          The result type
      * @return That which should just be of type A
    */
    def removeObjectMatching[That](f : A => Boolean)(implicit cbf : CanBuildFrom[Repr,A,That]):That={
        val builder = cbf(xs.repr)
        val it = xs.iterator

        while(it.hasNext){
            val o = it.next()
            if(!f(o)){
                builder += o
            }
        }
        builder.result()
    }

}

IterableLikeScalaFuncs contains a removeObjectMatching method that takes the result as a type parameter, the thunk to match with in the parameter list, and implicitly connects the existing CanBuildFrom from IterableLike in the net parameter list. It then creates a builder of our type and proceeds to populate it with objects not matching the thunk before returning a new collection with the appropriate items removed.

Enrichment Pattern

The enrichment pattern in Scala embellishes libraries by appending code to them as opposed to creating a wrapper class which must be instantiated. This requires implicitly attaching method definitions to existing libraries but allows Scala to be easily used from the console and in classes.

The class in the previous section can be attached to all IterableLike collections implicitly:

/**
  * The enrichment for Iterables wrapping IterablLike with IterableScalaFuncs
  * @param xs       Our IterableLike object
  * @tparam A       The type of the Iterable
  * @tparam Repr    Traversable Repr
*/
implicit def enrichIterable[A, Repr](xs: IterableLike[A, Repr]) = new IterableScalaFuncs(xs)

The method enrichIterable attaches to the target collections and is used as follows:

import ScalaFuncImplicits._
val list : List[(Int,Int)] = List[(Int,Int)]((1,2),(2,3)).removeObjectMatching(_.1 == 1) //should product List((2,3))

Conclusion

This article reviewed the power of Scala Implicits to reduce redundant code without accounting for every type. The enrichment pattern can be used to easily integrate methods with existing libraries.

Code is available at Github

Avoiding Duplication Issues in SBT

It goes without saying that any series on sbt and sbt assembly needs to also have a small section on avoiding the dreaded deduplication issue.

This article reviews how to specify merging in sbt assembly as described on the sbt assembly Github page and examines the PathList for added depth.

Related Articles:

Merge Strategies

When building a fat JAR in sbt assembly, it is common to run into the following error:

[error] (*:assembly) deduplicate: different file contents found in the following:

This error proceeds a list of files with duplication issues.

The build.sbt file offers a way to avoid this error via the merge strategy. Using the error output, it is possible to choose an appropriate strategy to deal with duplication issues in assembly:

assemblyMergeStrategy in assembly := {
  case "Logger.scala" => MergeStrategy.first
  case "levels.scala" => MergeStrategy.first
  case "Tidier.scala" => MergeStrategy.first
  case "logback.xml" => MergeStrategy.first
  case "LogFilter.class" => MergeStrategy.first
  case PathList(ps @ _*) if ps.last startsWith "LogFilter" => MergeStrategy.first
  case PathList(ps @ _*) if ps.last startsWith "Logger" => MergeStrategy.first
  case PathList(ps @ _*) if ps.last startsWith "Tidier" => MergeStrategy.first
  case PathList(ps @ _*) if ps.last startsWith "FastDate" => MergeStrategy.first
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}

In this instance. The first discovered file listed in the sbt error log is chosen. The PathList obtains the entire path with last choosing the last part of the path.

A file name may be matched directly.

PathList

Sbt merge makes use of the PathList. The full object is quite small:

object PathList {
  private val sysFileSep = System.getProperty("file.separator")
  def unapplySeq(path: String): Option[Seq[String]] = {
    val split = path.split(if (sysFileSep.equals( """\""")) """\\""" else sysFileSep)
    if (split.size == 0) None
    else Some(split.toList)
  }
}

This code utilizes the the specified system separator, “\” by default, to split a path. The return type is a List of strings.

List has some special Scala based properties. For instance, it is possible to search for anything under javax.servlet.* using:

PathList("javax", "servlet", xs @ _*) 

xs @_* searches for anything after the javax.servlet package.

Conclusion

This article reviews some basics of the merge strategy in sbt with a further explanation of the PathList.

A Quick Note on SBT Packaging: Packaging Library Dependencies, Skipping Tests, and Publishing Fat Jars

Sbt is a powerful build tool for recurring builds using in a nearly automated way with the tilde to simplifying the Maven build system.

There is the matter of slow build times with assembly. Avoiding slow builds with sbt assembly and automating the process with sbt ~ package is simple. However, using the output code requires having dependencies.

Even then, compiling a fat JAR for use across an organization may be necessary in some circumstances such as when using native libraries. This can avoid costly build problems.

This article reviews packaging dependencies with sbt assembly, offers tips for avoiding tests in all of sbt, and provides an overview of publishing fat JARS with assembly using the sbt commands publish and publish-local.

Related Articles:

Packaging Dependencies

Getting library dependencies to package alongside your jar is simple. Add the following code to build.sbt

retrieveManaged := true

The jars will be under root at /lib_managed.

Skipping Tests

If you want to avoid testing as well, add:

test in assembly := {} //your static test class array

This tells assembly not to run any tests by passing a dynamic structure with no class name elements.

Publishing An Assembled Fat JAR

There may be a need to publish a final fat JAR locally or to a remote repository such as Nexus. This may create a merge in other assemblies or with other dependencies so care should be exercised.

The following code, direcectly from the sbt Github repository, pushes an assembled jar to a repository:

artifact in (Compile, assembly) := {
  val art = (artifact in (Compile, assembly)).value
  art.copy(`classifier` = Some("assembly"))
}

addArtifact(artifact in (Compile, assembly), assembly)

The sbt publish and sbt publish-local commands should now push a fat JAR to the target repository.

Conclusion

Avoiding running assembly constantly and publishing fat JARS requires a simple change to build.sbt. With them, it is possible to obtain dependencies for running on the classpath, avoid running tests on every assembly, and publishing fat JARS.

An Introduction to Using Spring With Scala: A Positive View with Tips

Many ask why mix Spring and Scala. Why not?

Scala is a resilient language and the movement to the Scala foundation has only made it and its interaction with Java stronger. Scala reduces those many lines of Java clutter to fewer significantly more readable lines of elegant code with ease. It is faster than Python and yet acquiring the same serialization capability while already having a much better functional aspect.

Spring is the powerful go to too for any Java programmer, abstracting everything from web app security and email to wiring a backend. Together, Scala 2.12+ and Spring make a potent duo.

This article examines a few key traits for those using Scala 2.12+ with Spring 3+.

bean

Mr. Bean

Some of the Benefits of Mixing

Don’t recreate the wheel in a language that already uses your favorite Java libraries.

Scala mixed with Spring:

  • Elminate Lines of Java Code
  • Obtains more functional power than Javaslang
  • Makes use of most if not all of the functionality of Spring in Scala
  • Places powerful streamlined threading capacity in the hands of the programmer
  • Creates much broader serialization capacity than Java

This is a non-exhaustive list of benefits.

When Dependency Injection Matters

Dependency injection is useful in many situations. At my workplace, I have developed tools that reduce thousands of lines of code to a few configuration scripts using Scala and Spring.

Dependency injection is useful when writing large amounts of code hinders productivity. It may be less useful when speed is the primary concern.

Annotation Configs

Every annotation in Spring works with Scala. @Service, @Controller, and the remaining stereotypes, @Autowired and all of the popular annotations are useable.

Using them is the same in Java as in Scala.

@Service
class QAController{
   ....
}

Scala.Beans

Unfortunately, Scala does not create getters and setters for beans. It is therefore necessary to use the specialized @BeanProperty from Scala.Beans. This property cannot be attached to a private variable.

@BeanProperty
val conf: String = null

If generating a boolean getter and setter, @BooleanBeanProperty should be used.

@BooleanBeanProperty
val isWorking : Boolean = false

Scala’s Beans package contains other useful tools that give some power over configuration.

Autowiring

Autowiring does not require jumping through hoops. The underlying principal is the same as when using Java. It is only necessary to combine the @BeanProperty with @Autowired.

@BeanProperty
@Autowired(required = false)
val tableConf : TableConfigurator = null

Here, the autowired tableConf property is not required.

Configuration Classes

The XML context in Spring is slated for deprecation. To make code that will last, it is necessary to use a configuration class. Scala works seamlessly with the @Configuration component.

@Configuration
class AppConfig{
  @Bean
  def getAnomalyConfigurator():AnomalyConfigurator= {
    new AnomalyConfigurator {
      override val maxConcurrent: Int = 3
      override val errorAcceptanceCriteria: Double = 5.0
      override val columLevelStatsTable: String = "test"
      override val maxHardDifference: Int = 100
      override val schemaLevelStatsTable: String = "test"
      override val runId: Int = 1
    }
  }


  @Bean
  def getQAController():QAController={
    new QAController
  }
}

As with Java, the configuration generates beans. There is no difference in useage between the languages.

Create Classes without Defining Them

One of the more interesting features of Java is the ability to define a class as needed. Scala can do this as well.

new AnomalyConfigurator {
      ... //properties to use
}

This feature is useful when creating configuration classes whose traits take advantage of Spring.

Creating a Context

Creating contexts is the same in Scala as in Java. The difference is that classOf[] must be used in place of the .class property to obtain class names.

val context: AnnotationConfigApplicationContext = new AnnotationConfigApplicationContext()
context.register(classOf[AppConfig])
context.refresh()
val controller: QAController = context.getBean(classOf[QAController])

Conclusion

Scala 2.12+ works seamlessly with Spring. Requisite annotations and tools are all available in the language which compiles to comparable Java byte code.

Code for this article is available on Github

Faster Compilation of Scala Jars

Compiling Scala code is a pain. For testing and even deployment, there is a much faster method to deploy Scala code. Consider a recent project I am working on. The addition of three large fat jars and OpenCV caused a compile time of roughly ten minutes. Obviously, that is not production quality. As much as it is fun to check how the president is causing stocks to roller coaster when he speaks or yell explitives around co-workers, it is probably not desirable.

This article reviews continuous packaging with sbt. The sbt assembly project is not covered here as in other articles in the series.

Related Articles:

Repeated Compiling and Packaging

Sbt offers an extremely easy way to continuously package sources. Use it.

sbt ~ package

The tilde works for any command and forces SBT to wait for a source change. For instance, ~ compile has the same effect as ~ package but runs the compile command instead of package.

This spins up a check for source file changes and packages sources whenever changes are discovered. Pressing enter will exit the process.

Running the Packaged Sources

We were using sbt assembly for standalone jars. In test, this is fast becoming a bad idea. Instead, place a large dependency jar or multiple smaller dependency jars in a folder and add them to the class path. The following code took compilation from 10 minutes to under one. Packaging takes mere seconds but a full update will take longer.

$ sbt clean
$ sbt package
$ java -cp ".:/path/to/my/jars/*" package.path.to.main.Main [arguments]

This set of commands cleans, packages, and then runs the Main class with your arguments.

This works miracles. The classpath is system dependent, a colon separates entries in Linux and a semi-colon separates entries in Windows.

It is possible to avoid building fat jars using this method. This greatly reduces packaging times which can grow to minutes. In combination with a continuous build process, it takes a fraction of the time to run Scala code as when using the assembly command.

Make Sure You Use the Right Jars

When moving to running with the classpath command and continual packaging in Scala, any merge strategy will disappear. To avoid this, try creating an all in one jar and only placing this in your classpath alongside any jar containing your main class.

Conclusion

This article reviewed a way to more efficiently package and run scala jars without waiting for assembly. The method here works well in test or in a well defined environment.