Akka: An Introduction

Akka's documentation is immense. This series helps tackle its many components by providing a working example of the master-slave design pattern built with this powerful tool. The following article reviews the higher-level concepts behind Akka and its usage.

Links are provided to different parts of the Akka documentation throughout the article.

Akka

Akka is a software tool used to build multi-threaded and distributed systems based on the actor model. It takes care of lower-level systems building by providing high-level APIs for node and actor generation.

Actors are the primitives behind Akka. They are useful for performing repeated tasks concurrently. Actors run until terminated, receiving work through message passing.


Resource Usage

Akka is extremely lightweight. Its creators boast that the tool can handle thousands of actors on a single machine.

Message passing occurs through mailboxes. The maximum number of messages a mailbox holds is configurable (the default is 1,000), and individual messages must be under one megabyte.
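For example, a bounded mailbox can be declared in configuration. The sketch below is a minimal illustration: the mailbox name and the values are placeholders rather than defaults, though the keys are standard Akka mailbox settings.

import com.typesafe.config.ConfigFactory

//a minimal sketch of a bounded mailbox declaration; the name and values are illustrative
val mailboxConf = ConfigFactory.parseString(
  """
    |my-bounded-mailbox {
    |  mailbox-type = "akka.dispatch.BoundedMailbox"
    |  mailbox-capacity = 1000
    |  mailbox-push-timeout-time = 10s
    |}
  """.stripMargin)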

The Actor

The actor is the universal primitive used in Akka. Unlike threads in a programming language, this primitive runs like a daemon server. As such, it should be shut down gracefully.

Actors are user created:

import akka.actor.{Actor, ActorLogging, ActorRef, ActorSystem, Props}
import com.typesafe.config.ConfigFactory

class MyActor extends Actor with ActorLogging{

     override def preStart() = log.debug("Starting")
     override def postStop() = log.debug("Stopping")
     override def preRestart(reason: Throwable, message: Option[Any]) = log.error(s"Restarting because of ${reason.getMessage}. ${message}")
     override def postRestart(reason: Throwable) = log.debug(s"Restarted after ${reason.getMessage}")

     override def receive: Receive = {
         case _ => sender() ! "Hello from Actor"
     }
}

object MyActor{
   def setupMyActor(): ActorRef = {
        val conf = ConfigFactory.load()
        val system = ActorSystem("MySystem", conf)
        system.actorOf(Props[MyActor], name = "myactor")
   }
}

 

The example above creates an actor and a Scala companion object for instantiation.

Actors must extend Actor. ActorLogging provides the log library. The optional preRestart and postRestart hooks handle restarts after failures, while the optional preStart and postStop methods handle setup and teardown tasks. The basic actor above incorporates logging and error processing.
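Since the actor above replies to any message, the ask pattern can retrieve that reply. This is a minimal sketch, assuming myactor is the ActorRef returned by setupMyActor:

import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Await
import scala.concurrent.duration._

implicit val timeout : Timeout = Timeout(5.seconds)
val future = myactor ? "any message"               //ask returns a Future holding the reply
val reply = Await.result(future, timeout.duration)
println(reply)                                     //prints "Hello from Actor"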

An actor can:

  • Create and supervise other actors
  • Maintain state
  • Control the flow of work in a system
  • Perform a unit of work on request or repeatably
  • Send and receive messages
  • Return the results of a computation

Akka’s serialization is extremely powerful. Anything on the classpath or available across a cluster that implements Serializable can be sent to and from an actor. Instances of classes are deserialized without the programmer having to recast them.
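As an illustration, any case class visible to both sides can serve as a message, since Scala case classes extend Serializable by default. WorkRequest below is a hypothetical message type, and myactor is assumed from the earlier example:

//hypothetical message type; case classes are Serializable by default
case class WorkRequest(id : Long, payload : String)

//sent to a local or remote actor exactly like the string message above
myactor ! WorkRequest(1L, "process me")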

When to Use Akka

Actor systems are not a universal solution. When a workload neither performs repeated tasks nor benefits from high levels of concurrency, they are a hindrance.

State persistence also weighs heavily in the decision to use an actor system. Take the example of cookies in network requests: maintaining different network sessions across different remote actors can be highly beneficial in ingestion.

Any task provided to an actor should contain very little new code and a limited number of configuration variables.

Questions that should be asked based on these concepts include:

  • Can I break tasks into sufficiently many units of work to benefit from concurrency?
  • How often will tasks be repeated in the system?
  • How minimal can I make the configuration for the actor if necessary?
  • Is there a need for state persistence?
  • Is there a significant need for concurrency, or is it merely a nice thought?
  • Is there a resource constraint that distribution can solve or that will limit threading?

State is definitely a large reason to use Akka. This could be in the form of variables actually maintained by an actor or in the state of the actor itself.

In some distributed use cases involving the processing of enormous numbers of short-lived requests, the actor's own state and Akka's mailbox capabilities are what matter most. This is the reasoning behind tools built on Akka, such as Spark.

As is always the case when deciding to create a new system, the following should be asked as well:

  • Can I use an existing tool such as Spark or TensorFlow?
  • Does the time required to build the system outweigh the overall benefit it will provide?

Clustering

Clustering is available in Akka, which provides high-level APIs for generating distributed clusters. Specified seed nodes serve as the system's entry point, handling initial communications.

Network design is left entirely to the developer. Since node generation, logging, fault tolerance, and basic communication are the only pieces of a distributed system Akka handles, any distribution model will suffice. Two common models are the master-slave and graph-based models.

Akka clusters are resilient, with well-developed fault tolerance.

Configuration

Configuration is performed either through a file or in a class. Many components can be configured, including logging levels, cluster components, metrics collection, and routers.
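A minimal sketch of the in-class approach follows. The keys shown are standard Akka settings from this era of the library; the values are examples only:

import com.typesafe.config.ConfigFactory

//example settings only; loglevel, provider, and seed-nodes are standard Akka keys
val conf = ConfigFactory.parseString(
  """
    |akka {
    |  loglevel = "DEBUG"
    |  actor.provider = "akka.cluster.ClusterActorRefProvider"
    |  cluster.seed-nodes = ["akka.tcp://MySystem@127.0.0.1:2551"]
    |}
  """.stripMargin).withFallback(ConfigFactory.load())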

Conclusion

This article is the entry point for the Akka series, providing the basic understanding needed as we begin to build a cluster.

Sbt Pack With Xerial

Adding jars to a classpath should not be a chore. Often, using retrieveManaged in an sbt build is not quite what we want: when dealing with more than a few dependencies, having each one placed in its own folder is problematic. This article discusses a solution to this issue using Xerial's sbt-pack plugin.

 

Xerial Pack

Xerial offers a plugin that packages all jars into a single folder and creates launch scripts for executing a configurable main class. This allows every jar to be placed on the classpath without listing every folder, and it keeps all dependencies in a single directory.

Simply place the following in project/plugins.sbt:

addSbtPlugin("org.xerial.sbt" % "sbt-pack" % "0.8.2")  // for sbt-0.13.x or higher

Then specify the packaging options in build.sbt:

packAutoSettings

In this instance, the main class will be found automatically. More options are discussed on the Xerial GitHub page.
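If automatic discovery is not desired, the plugin's documentation also shows an explicit mapping. The program name and class below are illustrative:

packSettings

packMain := Map("my-program" -> "com.example.Main")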

Packaging the jars requires a single command:

sbt pack

Conclusion

Using the classpath with many dependencies does not need to be a chore. Simply import the Xerial plugin and run sbt pack.

Enriching Scala With Implicits

Imagine you are an ETL developer using Spring Cloud Data Flow. Nothing available for distributed systems and streaming ETL is as powerful as this tool. Alteryx and Pentaho are at least a year away from releasing anything as capable. While Pentaho might work, there are just too many holes to fill.

However, you could do with a more compact code language than Java when programming for Spring. A powerful solution is to combine the Spring ecosystem with Scala, using implicits to eliminate redundant code.

This article focuses on using the enrichment pattern in Scala code through the IterableLike library and the concept of the implicit.

 

[Image: clock, "Time is Money"]

Implicits

Implicits allow in-scope values to be supplied automatically where arguments are not given explicitly. Only one implicit of a given type should be defined within a scope:

implicit val myImplicitString : String = "hello there"

def printHello(implicit str : String) = {
   println(str) //should print "hello there"
}

printHello //the compiler supplies myImplicitString from scope

This snippet creates an implicit string and uses it in the method printHello. Having two implicit strings in scope makes resolution ambiguous and causes a compilation error.
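To illustrate the ambiguity, the following does not compile once the implicit is requested:

implicit val first : String = "first"
implicit val second : String = "second"

//implicitly[String] //error: ambiguous implicit values: both first and second match type String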

Implicits make the enrichment pattern, described below, possible by attaching an implicit conversion to a library type so that a method the original type lacks can be called.

Remap An Object

Our enrichment example contains a method that removes items from an IterableLike object when a specific condition holds:

import scala.collection.IterableLike
import scala.collection.generic.CanBuildFrom

class IterableScalaFuncs[A,Repr](xs : IterableLike[A,Repr]){

    /**
      * Remove objects matching a predicate.
      * @param f              The predicate to match
      * @param cbf            The CanBuildFrom supplying the builder (resolved implicitly)
      * @tparam That          The result collection type
      * @return               A new collection with matching items removed
    */
    def removeObjectMatching[That](f : A => Boolean)(implicit cbf : CanBuildFrom[Repr,A,That]):That={
        val builder = cbf(xs.repr)
        val it = xs.iterator

        while(it.hasNext){
            val o = it.next()
            if(!f(o)){
                builder += o
            }
        }
        builder.result()
    }

}

IterableScalaFuncs contains a removeObjectMatching method that takes the result type as a type parameter and the predicate to match in its first parameter list, and implicitly receives the existing CanBuildFrom for IterableLike in the next parameter list. It then creates a builder of our type, populates it with objects not matching the predicate, and returns a new collection with the appropriate items removed.

Enrichment Pattern

The enrichment pattern in Scala embellishes libraries by appending code to them as opposed to creating a wrapper class which must be instantiated. This requires implicitly attaching method definitions to existing libraries but allows Scala to be easily used from the console and in classes.

The class in the previous section can be attached to all IterableLike collections implicitly:

/**
  * The enrichment for Iterables wrapping IterableLike with IterableScalaFuncs
  * @param xs       Our IterableLike object
  * @tparam A       The type of the Iterable
  * @tparam Repr    Traversable Repr
*/
implicit def enrichIterable[A, Repr](xs: IterableLike[A, Repr]): IterableScalaFuncs[A, Repr] = new IterableScalaFuncs(xs)

The method enrichIterable attaches to the target collections and is used as follows:

import ScalaFuncImplicits._
val list : List[(Int,Int)] = List[(Int,Int)]((1,2),(2,3)).removeObjectMatching(_._1 == 1) //should produce List((2,3))

Conclusion

This article reviewed the power of Scala Implicits to reduce redundant code without accounting for every type. The enrichment pattern can be used to easily integrate methods with existing libraries.

Code is available on GitHub.

A Quick Note on SBT Packaging: Packaging Library Dependencies, Skipping Tests, and Publishing Fat Jars

Sbt is a powerful build tool that simplifies the Maven build system and, with the tilde operator, supports recurring builds in a nearly automated way.

There is the matter of slow build times with assembly. Avoiding slow builds by automating with sbt ~package instead is simple, but running the packaged output requires having its dependencies available.

Even then, compiling a fat JAR for use across an organization may be necessary in some circumstances, such as when using native libraries, and can avoid costly build problems.

This article reviews packaging dependencies with sbt, offers tips for skipping tests, and provides an overview of publishing fat JARs with assembly using the sbt commands publish and publish-local.


Packaging Dependencies

Getting library dependencies to package alongside your jar is simple. Add the following code to build.sbt:

retrieveManaged := true

The jars will be under the project root at /lib_managed.

Skipping Tests

If you want to avoid testing as well, add:

test in assembly := {} //empty task: skip tests during assembly

This overrides the test task for assembly with an empty task, so no tests run when building the fat JAR.

Publishing An Assembled Fat JAR

There may be a need to publish a final fat JAR locally or to a remote repository such as Nexus. This may cause merge conflicts in downstream assemblies or with other dependencies, so care should be exercised.

The following code, directly from the sbt-assembly GitHub repository, pushes an assembled jar to a repository:

artifact in (Compile, assembly) := {
  val art = (artifact in (Compile, assembly)).value
  art.copy(`classifier` = Some("assembly"))
}

addArtifact(artifact in (Compile, assembly), assembly)

The sbt publish and sbt publish-local commands should now push a fat JAR to the target repository.

Conclusion

Avoiding constant assembly runs and publishing fat JARs requires only simple changes to build.sbt. With them, it is possible to obtain dependencies for the classpath, avoid running tests on every assembly, and publish fat JARs.

Faster Compilation of Scala Jars

Compiling Scala code is a pain. For testing and even deployment, there is a much faster method to run Scala code. Consider a recent project I am working on: the addition of three large fat jars and OpenCV caused a compile time of roughly ten minutes. Obviously, that is not production quality. As much fun as it is to check how the president is causing stocks to roller coaster when he speaks or to yell expletives around co-workers, it is probably not desirable.

This article reviews continuous packaging with sbt. The sbt-assembly plugin is covered in other articles in the series and is not repeated here.


Repeated Compiling and Packaging

Sbt offers an extremely easy way to continuously package sources. Use it.

sbt ~package

The tilde works for any command and forces sbt to wait for a source change. For instance, ~compile has the same effect as ~package but runs the compile command instead of package.

This spins up a check for source file changes and packages sources whenever changes are discovered. Pressing enter will exit the process.

Running the Packaged Sources

We were using sbt assembly for standalone jars. For testing, this quickly becomes a bad idea. Instead, place a large dependency jar or multiple smaller dependency jars in a folder and add them to the classpath. The following commands took compilation from ten minutes to under one. Packaging takes mere seconds, though a full update will take longer.

$ sbt clean
$ sbt package
$ java -cp ".:/path/to/my/jars/*" package.path.to.main.Main [arguments]

This set of commands cleans, packages, and then runs the Main class with your arguments.

This works miracles. Note that the classpath separator is system dependent: a colon separates entries on Linux and a semicolon separates entries on Windows.
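On Windows, the equivalent command would therefore look like the following (the jar path is illustrative):

java -cp ".;C:\path\to\my\jars\*" package.path.to.main.Main [arguments]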

It is possible to avoid building fat jars using this method, which greatly reduces packaging times that can otherwise grow to minutes. In combination with a continuous build process, running Scala code takes a fraction of the time required by the assembly command.

Make Sure You Use the Right Jars

When moving to classpath-based runs with continual packaging, any assembly merge strategy will disappear. To avoid problems, try creating an all-in-one dependency jar and placing only it on the classpath alongside any jar containing your main class.

Conclusion

This article reviewed a way to more efficiently package and run Scala jars without waiting for assembly. The method here works well in test or in a well-defined environment.

JavaCV Basics: Basic Image Processing

Here, we analyze some of the basic image processing tools in OpenCV and their use in GoatImage.

All code is available on GitHub under GoatImage. To fully understand this article, read the related articles and look at this code.

Select functions are exemplified here. For GoatImage, JavaDocs can be generated that further explain the functions. The functions explained are:

  • Sharpen
  • Contrast
  • Blur

Dilate, rotate, erode, min thresholding, and max thresholding are left to the code. Thresholding in OpenCV is described in depth, with graphs and charts, in the documentation.


Basic Processing in Computer Vision

Basic processing is the key to successful recognition. Training sets come in a specific form, and pre-processing is usually required to ensure the accuracy and quality of a program. JavaCV and OpenCV are fast enough to improve algorithmic performance in a variety of circumstances at a relatively small cost in speed. Each transform applied to an image takes time and memory but pays off handsomely when done correctly.

Kernel Processing

Most of these functions are linear transformations. A linear transformation uses a function to map one matrix to another (Ax = b). In image processing, the kernel matrix is used to do this: a weighted matrix maps each pixel to a new value computed from the pixel and its neighbors.

For an overview of image processing kernels, see wikipedia.
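For example, the classic 3x3 sharpening kernel, used again in the sharpen function below, weights the center pixel against its four direct neighbors:

 0 -1  0
-1  5 -1
 0 -1  0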

Kernels may be generated in JavaCV.


  /**
    * Create a kernel from a double array (write large kernels more understandably)
    * @param kernelArray      The two-dimensional array of kernel values as signed ints
    * @return                 The kernel mat
    */
  def generateKernel(kernelArray: Array[Array[Int]]):Mat={
    val m = if(kernelArray != null) kernelArray.length else 0
    if(m == 0 ){
      throw new IllegalStateException("Your Kernel Array Must be Initialized with values")
    }

    if(kernelArray(0).length != m){
      throw new IllegalStateException("Your Kernel Array Must be Square and not sparse.")
    }

    val kernel = new Mat(m,m,CV_32F,new Scalar(0))
    val ki = kernel.createIndexer().asInstanceOf[FloatIndexer]

    for(i <- 0 until m){
      for(j <- 0 until m){
        ki.put(i,j,kernelArray(i)(j))
      }
    }
    kernel
  }

More reliably, OpenCV provides a function for generating a Gaussian kernel.

/**
    * Generate the square Gaussian kernel. I think the pattern is a(0,0) = 1, a(1,0) = n, a(2,0) = n + 2i, with rows as a(2,1) = a(2,0) * n, adding two to the middle and then subtracting.
    * However, there were only two examples on the page I found, so do not use that without verification.
    *
    * @param kernelMN    The m and n for our kernel matrix
    * @param sigma       The sigma to multiply by (kernel standard deviation)
    * @return            The resulting kernel matrix
    */
  def generateGaussianKernel(kernelMN : Int, sigma : Double):Mat={
    getGaussianKernel(kernelMN,sigma)
  }

Sharpen with a Custom Kernel

Applying a kernel in OpenCV can be done with the filter2D method.

filter2D(srcMat,outMat,srcMat.depth(),kernel)

Here a sharpening kernel using the function above is applied.

/**
    * Sharpen an image with a standard sharpening kernel.
    * @param image    The image to sharpen
    * @return         A new and sharper image
    */
  def sharpen(image : Image):Image={
    val srcMat = new Mat(image.image)
    val outMat = new Mat(srcMat.rows(),srcMat.cols(),srcMat.`type`())

    val karr : Array[Array[Int]] = Array[Array[Int]](Array(0,-1,0),Array(-1,5,-1),Array(0,-1,0))
    val kernel : Mat = this.generateKernel(karr)
    filter2D(srcMat,outMat,srcMat.depth(),kernel)
    new Image(new IplImage(outMat),image.name,image.itype)
  }

Contrast

Contrast increases the color intensity of an image by applying an equation, equalizing the histogram, or adjusting based on neighboring pixels.

One form of Contrast applies a direct function to an image:

/**
    * Use an equation applied to the pixels to increase contrast. It appears that
    * the majority of the effect occurs from converting back and forth with a very
    * minor impact for the values. However, the impact is softer than with equalizing
    * histograms. Try sharpen as well. The kernel kicks up contrast around edges.
    *
    * (maxIntensity/phi)*(x/(maxIntensity/theta))**0.5
    *
    * @param image                The image to use
    * @param maxIntensity         The maximum intensity (numerator)
    * @param phi                  Phi value to use
    * @param theta                Theta value to use
    * @return                     A new Image with adjusted contrast
    */
  def contrast(image : Image, maxIntensity : Double, phi : Double = 0.5, theta : Double = 0.5):Image={
    val srcMat = new Mat(image.image)
    val outMat = new Mat(srcMat.rows(),srcMat.cols(),srcMat.`type`())

    val usrcMat = new Mat()
    val dest = new Mat(srcMat.rows(),srcMat.cols(),usrcMat.`type`())
    srcMat.convertTo(usrcMat,CV_32F,1/255.0,0)

    pow(usrcMat,0.5,dest)
    //fold the scalar factors (maxIntensity/phi and 1/sqrt(maxIntensity/theta)) into the 8-bit conversion,
    //since multiply(Mat, double) yields a MatExpr rather than modifying the Mat in place
    val scale = (maxIntensity / phi) / Math.pow(maxIntensity / theta,0.5)
    dest.convertTo(outMat,CV_8U,255.0 * scale,0)

    new Image(new IplImage(outMat),image.name,image.itype)
  }

Here the image is manipulated using matrix equations to form a new image where pixel intensities are improved for clarity.

Another form of contrast equalizes the image histogram:

/**
    * A form of contrast based around equalizing image histograms.
    *
    * @param image    The image to equalize
    * @return         A new Image
    */
  def equalizeHistogram(image : Image):Image={
    val srcMat = new Mat(image.image)
    val outMat = new Mat(srcMat.rows(),srcMat.cols(),srcMat.`type`())
    equalizeHist(srcMat,outMat)
    new Image(new IplImage(outMat),image.name,image.itype)
  }

The JavaCV method equalizeHist is used here.

Blur

Blurring uses averaging to dull images.

Gaussian blurring uses a Gaussian-derived kernel to blur. This kernel uses a weighted averaging function, as opposed to equal weighting of neighboring pixels.

 /**
    * Perform a Gaussian blur. The larger the kernel, the more blurred the image will be.
    *
    * @param image              The image to use
    * @param degree             Strength of the blur
    * @param kernelMN           The kernel height and width should match (for instance 5x5)
    * @param sigma              The sigma to use in generating the matrix
    * @param depth              The depth to use
    * @param brightenFactor     A factor to brighten the result by when greater than 0
    * @return                   A new Image
    */
  def gaussianBlur(image : Image, degree : Int = 1, kernelMN : Int = 3, sigma : Double = 60, depth : Int = -1, brightenFactor : Int = 0):Image={
    val srcMat = new Mat(image.image)
    val outMat = new Mat(srcMat.rows(),srcMat.cols(),srcMat.`type`())

    //blur with a square kernelMN x kernelMN Gaussian kernel
    GaussianBlur(srcMat,outMat,new Size(kernelMN,kernelMN),sigma)

    var outImage : Image = new Image(new IplImage(outMat),image.name,image.itype)

    if(brightenFactor > 0){
      outImage = this.brighten(outImage,brightenFactor)
    }
    outImage
  }

A box blur uses a straight kernel to blur, often weighting pixels equally.

 /**
    * Perform a box blur and return a new Image. Increasing the factor has a significant impact.
    * This algorithm tends to be overly powerful. It wiped the lines out of my test image.
    *
    * @param image   The Image object
    * @param factor  The weight applied to each cell of the 3x3 kernel
    * @param depth   The depth to use with -1 as default corresponding to image.depth
    * @return        A new Image
    */
  def boxBlur(image : Image,factor: Int = 1,depth : Int = -1):Image={
    val srcMat = new Mat(image.image)
    val outMat = new Mat(srcMat.rows(),srcMat.cols(),srcMat.`type`())

    //build kernel
    val kernel : Mat = this.generateKernel(Array(Array(factor,factor,factor),Array(factor,factor,factor),Array(factor,factor,factor)))
    //normalize so the kernel averages rather than sums; convertTo scales the values in place
    kernel.convertTo(kernel,CV_32F,1.0/9.0,0)

    //apply kernel
    filter2D(srcMat,outMat, depth, kernel)

    new Image(new IplImage(outMat),image.name,image.itype)
  }

Unsharp Masking

Once a blurred Mat is obtained, it is possible to perform an unsharp mask. The unsharp mask brings out certain features by subtracting the blurred image from the original while taking an additional factor into account. With addWeighted, the result is original * alpha + blurred * beta + gamma, where a negative beta performs the subtraction.

def unsharpMask(image : Image, kernelMN : Int = 3, sigma : Double = 60,alpha : Double = 1.5, beta : Double= -0.5,gamma : Double = 2.0,brightenFactor : Int = 0):Image={
    val srcMat : Mat = new Mat(image.image)
    val outMat = new Mat(srcMat.rows(),srcMat.cols(),srcMat.`type`())
    val retMat = new Mat(srcMat.rows(),srcMat.cols(),srcMat.`type`())

    //using these methods allows the matrix kernel size to grow
    GaussianBlur(srcMat,outMat,new Size(kernelMN,kernelMN),sigma)
    addWeighted(srcMat,alpha,outMat,beta,gamma,retMat)

    var outImage : Image = new Image(new IplImage(outMat),image.name,image.itype)

    if(brightenFactor > 0){
      outImage = this.brighten(outImage,brightenFactor)
    }

    outImage
  }

Conclusion

This article examined various image processing techniques.

JavaCV Basics: Splitting Objects

Here we put together functions from previous articles in a use case where objects are discovered in an image, split out, and rotated.

All code is available on GitHub under the GoatImage project.


Why Split Objects

At times, objects need to be tracked reliably, OCR needs to be broken into more manageable tasks, or another task requires splitting and rotation. Recognition and other forms of statistical computing particularly benefit from such standardization.

Splitting allows object-by-object recognition, which may or may not improve accuracy depending on the data used to train an algorithm and the type of algorithm used. Bayesian-based networks, including RNNs, benefit from this task significantly.

Splitting and Rotating

The following function in GoatImage performs contouring to find objects, creates a minimum-area rectangle around each, and finally rotates each object based on its skew angle.

/**
    * Split an image using an existing contouring function. Take each ROI, rotate it, and return new Images along with the original.
    *
    * @param image              The image to split objects from
    * @param contourType        The contour type to use, defaulting to CV_RETR_LIST
    * @param minBoxArea         Minimum box area to accept (-1 means everything and is default)
    * @param maxBoxArea         Maximum box area to accept (-1 means everything and is default)
    * @param show               Whether or not to show the image. Default is false.
    * @param xPosSort           Whether or not to sort the objects by their x position. Default is true. This is faster than a full sort
    * @return                   A tuple with the original Image and a List of split out Image objects named by the original_itemNumber
    */
  def splitObjects(image : Image, contourType : Int=  CV_RETR_LIST,minBoxArea : Int = -1, maxBoxArea : Int = -1, show : Boolean= false,xPosSort : Boolean = true):(Image,List[(Image,BoundingBox)])={
    //contour
    val imTup : (Image, List[BoundingBox]) = this.contour(image,contourType)

    var imObjs : List[(Image,BoundingBox)] = List[(Image,BoundingBox)]()

    var boxes : List[BoundingBox] = imTup._2

    //ensure that the boxes are sorted by x position
    if(xPosSort){
      boxes = boxes.sortBy(_.x1)
    }

    if(minBoxArea > 0){
        boxes = boxes.filter({x => (x.width * x.height) > minBoxArea})
    }

    if(maxBoxArea > 0){
      boxes = boxes.filter({x => (x.width * x.height) < maxBoxArea})
    }

    //get and rotate objects
    var idx : Int = 0
    for(box <-  boxes){
      println(box)
      val im = this.rotateImage(box.image,box.skewAngle)
      if(show){
        im.showImage(s"My Box ${idx}")
      }
      im.setName(im.getName().replaceAll("\\..*","")+s"_${idx}."+im.itype.toString.toLowerCase())
      imObjs = imObjs :+ (im,box)
      idx += 1
    }

    (image,imObjs)
  }

Contours are filtered after sorting, if desired. For each box, rotation is performed and the resulting image is returned as a new Image.
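A hypothetical usage sketch follows; the goatImage instance and myImage value are assumed rather than taken from the project:

//hypothetical usage; goatImage and myImage are assumed
val (original, objects) = goatImage.splitObjects(myImage, minBoxArea = 100)
objects.foreach { case (obj, box) => println(s"${obj.getName()} -> ${box}") }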

Conclusion

Here the splitObjects function of GoatImage is reviewed, showing how the library and OpenCV split and rotate objects as part of standardization for object recognition and OCR.