Avoiding Duplication Issues in SBT

It goes without saying that any series on sbt and sbt assembly needs to also have a small section on avoiding the dreaded deduplication issue.

This article reviews how to specify merging in sbt assembly as described on the sbt assembly GitHub page and examines the PathList for added depth.

Merge Strategies

When building a fat JAR in sbt assembly, it is common to run into the following error:

[error] (*:assembly) deduplicate: different file contents found in the following:

This error precedes a list of files with duplication issues.

The build.sbt file offers a way to avoid this error via the merge strategy. Using the error output, it is possible to choose an appropriate strategy to deal with duplication issues in assembly:

assemblyMergeStrategy in assembly := {
  case "Logger.scala" => MergeStrategy.first
  case "levels.scala" => MergeStrategy.first
  case "Tidier.scala" => MergeStrategy.first
  case "logback.xml" => MergeStrategy.first
  case "LogFilter.class" => MergeStrategy.first
  case PathList(ps @ _*) if ps.last startsWith "LogFilter" => MergeStrategy.first
  case PathList(ps @ _*) if ps.last startsWith "Logger" => MergeStrategy.first
  case PathList(ps @ _*) if ps.last startsWith "Tidier" => MergeStrategy.first
  case PathList(ps @ _*) if ps.last startsWith "FastDate" => MergeStrategy.first
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}

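In this instance, MergeStrategy.first keeps the first of the conflicting files listed in the sbt error log. PathList splits the entire path into its segments, and last selects the final segment, which is the file name.

A string literal matches the whole path directly, so a bare file name such as "logback.xml" only matches a file at the root of the JAR.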
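Other strategies from sbt assembly can be mixed into the same block. The fragment below is a sketch rather than a recommended configuration: it assumes the same older `in assembly` syntax used above, and the choice of which paths to discard or concatenate is an assumption made for illustration.

```scala
// build.sbt fragment (a sketch; adjust the paths to the files named in your own error output)
assemblyMergeStrategy in assembly := {
  // drop the manifest that every dependency JAR carries
  case PathList("META-INF", "MANIFEST.MF") => MergeStrategy.discard
  // append configuration files instead of picking one
  case "reference.conf" => MergeStrategy.concat
  // keep the last conflicting copy of anything ending in .properties
  case PathList(ps @ _*) if ps.last.endsWith(".properties") => MergeStrategy.last
  // fall back to the plugin default for everything else
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
```

Order matters here because the first matching case wins, so the more specific patterns belong at the top.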

PathList

Sbt merge makes use of the PathList. The full object is quite small:

object PathList {
  private val sysFileSep = System.getProperty("file.separator")
  def unapplySeq(path: String): Option[Seq[String]] = {
    val split = path.split(if (sysFileSep.equals( """\""")) """\\""" else sysFileSep)
    if (split.size == 0) None
    else Some(split.toList)
  }
}

This code uses the system file separator, “\” on Windows and “/” elsewhere, to split a path. The result is an Option wrapping the List of path segments, or None when the split yields nothing.

Because unapplySeq returns a sequence, PathList can be used as a pattern in a Scala match. For instance, it is possible to match anything under javax.servlet.* using:

PathList("javax", "servlet", xs @ _*) 

xs @ _* binds whatever path segments follow the javax.servlet package.
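The matching behaviour can be tried outside sbt. The sketch below copies the PathList object from above into a standalone file. The classify helper and the PathListDemo object are hypothetical names introduced only for this illustration.

```scala
// Local copy of the PathList extractor shown above, so the sketch runs without sbt assembly.
object PathList {
  private val sysFileSep = System.getProperty("file.separator")
  def unapplySeq(path: String): Option[Seq[String]] = {
    val split = path.split(if (sysFileSep.equals("""\""")) """\\""" else sysFileSep)
    if (split.size == 0) None
    else Some(split.toList)
  }
}

object PathListDemo {
  // Hypothetical helper: reports which pattern a path falls under.
  def classify(path: String): String = path match {
    // fixed leading segments, with the remainder bound to xs
    case PathList("javax", "servlet", xs @ _*) => s"servlet:${xs.mkString("/")}"
    // whole path bound to ps, guard on the file name
    case PathList(ps @ _*) if ps.last.startsWith("Logger") => "logger"
    case _ => "other"
  }

  def main(args: Array[String]): Unit = {
    // build paths with the platform separator so the split behaves the same everywhere
    val sep = java.io.File.separator
    println(classify(Seq("javax", "servlet", "http", "HttpServlet.class").mkString(sep)))
    println(classify(Seq("org", "slf4j", "Logger.class").mkString(sep)))
  }
}
```

The first pattern fixes the leading segments and captures the rest, while the second captures everything and relies on a guard, which is the form used in the merge strategy earlier.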

Conclusion

This article reviews some basics of the merge strategy in sbt with a further explanation of the PathList.

A Quick Note on SBT Packaging: Packaging Library Dependencies, Skipping Tests, and Publishing Fat Jars

Sbt is a powerful build tool that simplifies the Maven build system and, with the tilde, reruns builds in a nearly automated way.

There is the matter of slow build times with assembly. Avoiding slow builds by skipping sbt assembly and automating the process with sbt ~package is simple. However, running the packaged output requires having its dependencies available.

Even then, compiling a fat JAR for use across an organization may be necessary in some circumstances such as when using native libraries. This can avoid costly build problems.

This article reviews packaging dependencies with sbt assembly, offers tips for avoiding tests in all of sbt, and provides an overview of publishing fat JARs with assembly using the sbt commands publish and publish-local.

Packaging Dependencies

Getting library dependencies to package alongside your JAR is simple. Add the following code to build.sbt:

retrieveManaged := true

The JARs will be placed in lib_managed under the project root.

Skipping Tests

If you want to avoid testing as well, add:

test in assembly := {} // empty task body, so assembly runs no tests

This overrides the test task that assembly invokes with an empty body, so no tests run when the fat JAR is built.

Publishing An Assembled Fat JAR

There may be a need to publish a final fat JAR locally or to a remote repository such as Nexus. This may create merge conflicts in other assemblies or with other dependencies, so care should be exercised.

The following code, directly from the sbt assembly GitHub repository, attaches the assembled JAR to the published artifacts:

artifact in (Compile, assembly) := {
  val art = (artifact in (Compile, assembly)).value
  art.copy(`classifier` = Some("assembly"))
}

addArtifact(artifact in (Compile, assembly), assembly)

The sbt publish and sbt publish-local commands should now push a fat JAR to the target repository.

Conclusion

Avoiding constant runs of assembly and publishing fat JARs each require only a simple change to build.sbt. With these changes, it is possible to obtain dependencies for running on the classpath, avoid running tests on every assembly, and publish fat JARs.

An Introduction to Using Spring With Scala: A Positive View with Tips

Many ask why mix Spring and Scala. Why not?

Scala is a resilient language and the movement to the Scala foundation has only made it and its interaction with Java stronger. Scala reduces those many lines of Java clutter to fewer, significantly more readable lines of elegant code. It is faster than Python and is acquiring the same serialization capability while already having a much better functional aspect.

Spring is the powerful go-to tool for any Java programmer, abstracting everything from web app security and email to wiring a backend. Together, Scala 2.12+ and Spring make a potent duo.

This article examines a few key traits for those using Scala 2.12+ with Spring 3+.

Some of the Benefits of Mixing

Don’t reinvent the wheel in a language that already uses your favorite Java libraries.

Scala mixed with Spring:

  • Eliminates lines of Java code
  • Obtains more functional power than Javaslang
  • Makes use of most if not all of the functionality of Spring in Scala
  • Places powerful streamlined threading capacity in the hands of the programmer
  • Creates much broader serialization capacity than Java

This is a non-exhaustive list of benefits.

When Dependency Injection Matters

Dependency injection is useful in many situations. At my workplace, I have developed tools that reduce thousands of lines of code to a few configuration scripts using Scala and Spring.

Dependency injection is useful when writing large amounts of code hinders productivity. It may be less useful when speed is the primary concern.

Annotation Configs

Every annotation in Spring works with Scala. @Service, @Controller, and the remaining stereotypes, along with @Autowired and all of the popular annotations, are usable.

Using them is the same in Java as in Scala.

@Service
class QAController{
   ....
}

scala.beans

Unfortunately, Scala does not create JavaBean-style getters and setters for beans. It is therefore necessary to use the specialized @BeanProperty from scala.beans. This annotation cannot be attached to a private variable, and the field must be a var for a setter to be generated.

@BeanProperty
var conf: String = null

If generating a boolean getter and setter, @BooleanBeanProperty should be used. The getter is named by prefixing the field name with is, so a field named working yields isWorking.

@BooleanBeanProperty
var working : Boolean = false

Scala’s Beans package contains other useful tools that give some power over configuration.
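To see what a Java framework such as Spring finds on an annotated class, the generated accessors can be listed through reflection. This is a minimal sketch that needs only the Scala standard library. JobSettings and BeanPropertyDemo are hypothetical names, and the fields are declared as var so that setters are generated as well.

```scala
import scala.beans.{BeanProperty, BooleanBeanProperty}

// Hypothetical bean used only for illustration.
class JobSettings {
  @BeanProperty var conf: String = null
  @BooleanBeanProperty var working: Boolean = false
}

object BeanPropertyDemo {
  // Names of the JavaBean-style accessors visible to Java code through reflection.
  def accessorNames: List[String] =
    classOf[JobSettings].getMethods.toList
      .map(_.getName)
      .filter(n => n.startsWith("get") || n.startsWith("set") || n.startsWith("is"))
      .filterNot(_ == "getClass") // inherited from java.lang.Object
      .sorted

  def main(args: Array[String]): Unit =
    println(accessorNames.mkString(", "))
}
```

Reflection is used instead of calling the accessors directly because that is how a container discovers them, and it avoids depending on whether the compiler exposes the generated methods to Scala callers.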

Autowiring

Autowiring does not require jumping through hoops. The underlying principle is the same as when using Java. It is only necessary to combine the @BeanProperty with @Autowired.

@BeanProperty
@Autowired(required = false)
var tableConf : TableConfigurator = null

Here, the autowired tableConf property is not required.

Configuration Classes

Java-based configuration has largely displaced the XML context in Spring. To make code that will last, it is worth using a configuration class. Scala works seamlessly with the @Configuration annotation.

@Configuration
class AppConfig{
  @Bean
  def getAnomalyConfigurator():AnomalyConfigurator= {
    new AnomalyConfigurator {
      override val maxConcurrent: Int = 3
      override val errorAcceptanceCriteria: Double = 5.0
      override val columLevelStatsTable: String = "test"
      override val maxHardDifference: Int = 100
      override val schemaLevelStatsTable: String = "test"
      override val runId: Int = 1
    }
  }


  @Bean
  def getQAController():QAController={
    new QAController
  }
}

As with Java, the configuration generates beans. There is no difference in usage between the languages.

Create Classes without Defining Them

One of the more interesting features of Java is the ability to define an anonymous class as needed. Scala can do this as well.

new AnomalyConfigurator {
      ... //properties to use
}

This feature is useful when creating configuration classes whose traits take advantage of Spring.
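A self-contained version of the same idea is sketched below. RetryConfigurator is a hypothetical, trimmed-down trait standing in for the article's AnomalyConfigurator, which has more members.

```scala
// Hypothetical trait with two abstract values and one derived method.
trait RetryConfigurator {
  val maxConcurrent: Int
  val runId: Int
  def describe: String = s"run $runId with up to $maxConcurrent workers"
}

object AnonymousClassDemo {
  // The braces after `new` define an anonymous class that fills in the abstract members.
  def build(): RetryConfigurator = new RetryConfigurator {
    override val maxConcurrent: Int = 3
    override val runId: Int = 1
  }

  def main(args: Array[String]): Unit =
    println(build().describe)
}
```

Inside a @Bean method this keeps the concrete values next to the wiring instead of in a separate named subclass.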

Creating a Context

Creating contexts is the same in Scala as in Java. The difference is that classOf[] must be used in place of the .class property to obtain Class objects.

val context: AnnotationConfigApplicationContext = new AnnotationConfigApplicationContext()
context.register(classOf[AppConfig])
context.refresh()
val controller: QAController = context.getBean(classOf[QAController])

Conclusion

Scala 2.12+ works seamlessly with Spring. Requisite annotations and tools are all available in the language which compiles to comparable Java byte code.

Code for this article is available on GitHub.

Faster Compilation of Scala Jars

Compiling Scala code is a pain. For testing and even deployment, there is a much faster way to package and run Scala code. Consider a recent project I am working on. The addition of three large fat jars and OpenCV caused a compile time of roughly ten minutes. Obviously, that is not production quality. As much as it is fun to check how the president is causing stocks to roller coaster when he speaks or yell expletives around co-workers, it is probably not desirable.

This article reviews continuous packaging with sbt. Unlike other articles in the series, it does not cover the sbt assembly project.

Repeated Compiling and Packaging

Sbt offers an extremely easy way to continuously package sources. Use it.

sbt "~package"

The tilde works for any command and forces SBT to wait for a source change. For instance, ~ compile has the same effect as ~ package but runs the compile command instead of package.

This spins up a check for source file changes and packages sources whenever changes are discovered. Pressing enter will exit the process.

Running the Packaged Sources

We were using sbt assembly for standalone jars. In test, this is fast becoming a bad idea. Instead, place a large dependency jar or multiple smaller dependency jars in a folder and add them to the class path. The following code took compilation from 10 minutes to under one. Packaging takes mere seconds but a full update will take longer.

$ sbt clean
$ sbt package
$ java -cp ".:/path/to/my/jars/*" package.path.to.main.Main [arguments]

This set of commands cleans, packages, and then runs the Main class with your arguments.

This works miracles. The classpath separator is system dependent: a colon separates entries on Linux and a semicolon separates entries on Windows.
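When the classpath string is assembled by a script or launcher rather than typed by hand, the platform separator can be read from the JVM instead of hard-coded. This is a small sketch. ClasspathBuilder is a hypothetical name and the paths in main are placeholders.

```scala
import java.io.File

object ClasspathBuilder {
  // Join entries with the platform separator (":" on Linux, ";" on Windows).
  def build(entries: Seq[String]): String = entries.mkString(File.pathSeparator)

  def main(args: Array[String]): Unit =
    // placeholder entries; substitute the folder that holds your dependency jars
    println(build(Seq(".", "/path/to/my/jars/*")))
}
```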

It is possible to avoid building fat jars using this method. This greatly reduces packaging times, which can grow to minutes. In combination with a continuous build process, running Scala code takes a fraction of the time it does when using the assembly command.

Make Sure You Use the Right Jars

When moving to running from the classpath with continual packaging, any merge strategy no longer applies because the JARs are never merged, so duplicate classes are resolved by classpath order instead. To avoid this, try creating an all-in-one dependency JAR and placing only it in your classpath alongside the JAR containing your main class.

Conclusion

This article reviewed a way to more efficiently package and run Scala JARs without waiting for assembly. The method here works well in test or in a well-defined environment.

JavaCV Basics: Basic Image Processing

Here, we analyze some of the basic image processing tools in OpenCV and their use in GoatImage.

All code is available on GitHub under GoatImage. To fully understand this article, read the related articles and look at this code.

Select functions are exemplified here. In GoatImage, JavaDocs can be generated further explaining the functions. The functions explained are:

  • Sharpen
  • Contrast
  • Blur

Dilate, rotate, erode, min thresholding, and max thresholding are left to the code. Thresholding in OpenCV is described in depth with graphs and charts via the documentation.

Basic Processing in Computer Vision

Basic processing is the key to successful recognition. Training sets come in a specific form. Pre-processing is usually required to ensure the accuracy and quality of a program. JavaCV and OpenCV are fast enough that pre-processing can improve algorithmic performance at a modest cost in speed. Each transform applied to an image takes time and memory but will pay off handsomely if done correctly.

Kernel Processing

Most of these functions are linear transformations. A linear transformation uses a function to map one matrix to another (Ax = b). In image processing, a kernel matrix is used to do this. Each output pixel is a weighted sum of the input pixel and its neighbors, with the weights taken from the kernel.

For an overview of image processing kernels, see Wikipedia.
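The weighted sum can be written out in plain Scala without JavaCV. The sketch below is an illustration of the idea, not how OpenCV implements it. KernelSketch and applyKernel are hypothetical names, border pixels are simply copied, and results are not clamped to the 0-255 range.

```scala
object KernelSketch {
  type Grid = Array[Array[Double]]

  // Apply a square, odd-sized kernel to every interior pixel; border pixels are copied unchanged.
  def applyKernel(image: Grid, kernel: Grid): Grid = {
    val r = kernel.length / 2
    val out = image.map(_.clone())
    for {
      y <- r until image.length - r
      x <- r until image(y).length - r
    } {
      // weighted sum of the pixel and its neighbors
      var sum = 0.0
      for (ky <- -r to r; kx <- -r to r) {
        sum += image(y + ky)(x + kx) * kernel(ky + r)(kx + r)
      }
      out(y)(x) = sum
    }
    out
  }

  // The same sharpening weights used later in this article.
  val sharpen: Grid = Array(
    Array(0.0, -1.0, 0.0),
    Array(-1.0, 5.0, -1.0),
    Array(0.0, -1.0, 0.0)
  )
}
```

Because the sharpening weights sum to one, a flat region keeps its value while a pixel that differs from its neighbors is pushed further away from them.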

Kernels may be generated in JavaCV.


  /**
    * Create a kernel from a two-dimensional array (write large kernels more understandably)
    * @param kernelArray      The two-dimensional array holding the kernel values as signed ints
    * @return                 The kernel mat
    */
  def generateKernel(kernelArray: Array[Array[Int]]):Mat={
    val m = if(kernelArray != null) kernelArray.length else 0
    if(m == 0 ){
      throw new IllegalStateException("Your Kernel Array Must be Initialized with values")
    }

    if(kernelArray(0).length != m){
      throw new IllegalStateException("Your Kernel Array Must be Square and not sparse.")
    }

    val kernel = new Mat(m,m,CV_32F,new Scalar(0))
    val ki = kernel.createIndexer().asInstanceOf[FloatIndexer]

    for(i <- 0 until m){
      for(j <- 0 until m){
        ki.put(i,j,kernelArray(i)(j))
      }
    }
    kernel
  }

More reliably, there is a function for generating a Gaussian Kernel.

/**
    * Generate Gaussian kernel coefficients. OpenCV's getGaussianKernel returns a
    * kernelMN x 1 column of coefficients rather than a square matrix. A square kernel
    * is the product of this column and its transpose.
    *
    * @param kernelMN    The kernel size, which should be odd and positive
    * @param sigma       The Gaussian standard deviation
    * @return            The resulting column of kernel coefficients
    */
  def generateGaussianKernel(kernelMN : Int, sigma : Double):Mat={
    getGaussianKernel(kernelMN,sigma)
  }
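For intuition about the values involved, Gaussian coefficients can be computed by hand. The following is a plain Scala sketch of the standard normalized Gaussian, not a reimplementation of OpenCV's function, which also has special cases this code ignores. GaussianSketch and its methods are hypothetical names.

```scala
object GaussianSketch {
  // Normalized 1-D Gaussian coefficients for an odd size and a positive sigma.
  def coefficients(size: Int, sigma: Double): Array[Double] = {
    require(size > 0 && size % 2 == 1, "size must be odd and positive")
    require(sigma > 0, "sigma must be positive")
    val center = (size - 1) / 2.0
    val raw = Array.tabulate(size) { i =>
      val d = i - center
      math.exp(-(d * d) / (2 * sigma * sigma))
    }
    // divide by the total so the weights sum to one
    val total = raw.sum
    raw.map(_ / total)
  }

  // The product of the column with its transpose gives the square kernel.
  def square(size: Int, sigma: Double): Array[Array[Double]] = {
    val c = coefficients(size, sigma)
    c.map(a => c.map(b => a * b))
  }
}
```

The weights are symmetric, peak at the center, and sum to one, so applying them blurs without changing overall brightness.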

Sharpen with a Custom Kernel

Applying a kernel in OpenCV can be done with the filter2D method.

filter2D(srcMat,outMat,srcMat.depth(),kernel)

Here a sharpening kernel using the function above is applied.

/**
    * Sharpen an image with a standard sharpening kernel.
    * @param image    The image to sharpen
    * @return         A new and sharper image
    */
  def sharpen(image : Image):Image={
    val srcMat = new Mat(image.image)
    val outMat = new Mat(srcMat.rows(),srcMat.cols(),srcMat.`type`())

    val karr : Array[Array[Int]] = Array[Array[Int]](Array(0,-1,0),Array(-1,5,-1),Array(0,-1,0))
    val kernel : Mat = this.generateKernel(karr)
    filter2D(srcMat,outMat,srcMat.depth(),kernel)
    new Image(new IplImage(outMat),image.name,image.itype)
  }

Contrast

Contrast kicks up the color intensity in images by equation, equalization, or based on neighboring pixels.

One form of Contrast applies a direct function to an image:

/**
    * Use an equation applied to the pixels to increase contrast. The impact is
    * softer than with equalizing histograms. Try sharpen as well. The kernel
    * kicks up contrast around edges.
    *
    * (maxIntensity/phi)*(x/(maxIntensity/theta))**0.5
    *
    * @param image                The image to use
    * @param maxIntensity         The maximum intensity (numerator)
    * @param phi                  Phi value to use
    * @param theta                Theta value to use
    * @return                     A new Image with the equation applied
    */
  def contrast(image : Image, maxIntensity : Double, phi : Double = 0.5, theta : Double = 0.5):Image={
    val srcMat = new Mat(image.image)
    val outMat = new Mat(srcMat.rows(),srcMat.cols(),srcMat.`type`())

    //work in floating point on the raw intensities so x matches the equation
    val usrcMat = new Mat()
    val dest = new Mat()
    srcMat.convertTo(usrcMat,CV_32F,1.0,0)

    //x**0.5
    pow(usrcMat,0.5,dest)

    //multiply(dest, scalar) returns a new expression and leaves dest untouched,
    //so both scale factors are folded into the final conversion instead
    val fm = 1 / Math.pow(maxIntensity / theta,0.5)
    dest.convertTo(outMat,CV_8U,(maxIntensity / phi) * fm,0)

    new Image(new IplImage(outMat),image.name,image.itype)
  }

Here the image is manipulated using matrix equations to form a new image where pixel intensities are improved for clarity.

Another form of contrast equalizes the image histogram:

/**
    * A form of contrast based around equalizing image histograms.
    * OpenCV's equalizeHist expects an 8-bit single-channel (grayscale) image.
    *
    * @param image    The image to equalize
    * @return         A new Image
    */
  def equalizeHistogram(image : Image):Image={
    val srcMat = new Mat(image.image)
    val outMat = new Mat(srcMat.rows(),srcMat.cols(),srcMat.`type`())
    equalizeHist(srcMat,outMat)
    new Image(new IplImage(outMat),image.name,image.itype)
  }

The JavaCV method equalizeHist is used here.

Blur

Blurring uses averaging to dull images.

Gaussian blurring uses a Gaussian-derived kernel to blur. This kernel weights nearby pixels more heavily than distant ones, as opposed to the equal weighting of neighboring pixels.

 /**
    * Perform a Gaussian blur. The larger the kernel the more blurred the image will be.
    *
    * @param image              The image to use
    * @param degree             Strength of the blur
    * @param kernelMN           The kernel height and width should match (for instance 5x5)
    * @param sigma              The sigma to use in generating the matrix
    * @param depth              The depth to use
    * @param brightenFactor     A factor to brighten the result by, applied only when greater than 0
    */
    ...
    if(brightenFactor > 0){
      outImage = this.brighten(outImage,brightenFactor)
    }
    outImage
  }

A box blur uses a uniform kernel to blur, weighting neighboring pixels equally.

 /**
    * Perform a box blur and return a new Image. Increasing the factor has a significant impact.
    * This algorithm tends to be overly powerful. It wiped the lines out of my test image.
    *
    * @param image   The Image object
    * @param depth   The depth to use with -1 as default corresponding to image.depth
    * @return        A new Image
    */
  def boxBlur(image : Image,factor: Int = 1,depth : Int = -1):Image={
    val srcMat = new Mat(image.image)
    val outMat = new Mat(srcMat.rows(),srcMat.cols(),srcMat.`type`())

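    //build kernel; divide(kernel, 9.0) returns a new expression and leaves kernel
    //unchanged, so the 1/9 normalization is applied through convertTo instead
    val rawKernel : Mat = this.generateKernel(Array(Array(factor,factor,factor),Array(factor,factor,factor),Array(factor,factor,factor)))
    val kernel : Mat = new Mat()
    rawKernel.convertTo(kernel,CV_32F,1.0 / 9.0,0)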

    //apply kernel
    filter2D(srcMat,outMat, depth, kernel)

    new Image(new IplImage(outMat),image.name,image.itype)
  }

Unsharp Masking

Once a blurred Mat is achieved, it is possible to perform an unsharp mask. The unsharp mask brings out certain features by subtracting a weighted copy of the blurred image from a weighted copy of the original and adding a constant offset.

def unsharpMask(image : Image, kernelMN : Int = 3, sigma : Double = 60,alpha : Double = 1.5, beta : Double= -0.5,gamma : Double = 2.0,brightenFactor : Int = 0):Image={
    val srcMat : Mat = new Mat(image.image)
    val outMat = new Mat(srcMat.rows(),srcMat.cols(),srcMat.`type`())
    val retMat = new Mat(srcMat.rows(),srcMat.cols(),srcMat.`type`())

    //using these methods allows the matrix kernel size to grow
    GaussianBlur(srcMat,outMat,new Size(kernelMN,kernelMN),sigma)
    addWeighted(srcMat,alpha,outMat,beta,gamma,retMat)

    //wrap the weighted result (retMat), not the intermediate blur (outMat)
    var outImage : Image = new Image(new IplImage(retMat),image.name,image.itype)

    if(brightenFactor > 0){
      outImage = this.brighten(outImage,brightenFactor)
    }

    outImage
  }
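The arithmetic behind the weighted sum is alpha * original + beta * blurred + gamma for each pixel, clamped to the 8-bit range. The sketch below works on plain integer arrays so it runs without JavaCV. UnsharpSketch is a hypothetical name, the rounding is ordinary half-up rounding, and gamma defaults to 0 here for easier reading.

```scala
object UnsharpSketch {
  // Round, then clamp to the 8-bit range.
  private def saturate(v: Double): Int =
    math.max(0, math.min(255, math.round(v).toInt))

  // Per-pixel weighted sum of the original and the blurred values.
  def unsharp(src: Array[Int], blurred: Array[Int],
              alpha: Double = 1.5, beta: Double = -0.5, gamma: Double = 0.0): Array[Int] = {
    require(src.length == blurred.length, "inputs must be the same size")
    src.zip(blurred).map { case (s, b) => saturate(alpha * s + beta * b + gamma) }
  }
}
```

Where the original and the blur agree, the pixel is unchanged. Where they differ, as they do around edges, the difference is amplified.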

Conclusion

This article examined various image processing techniques.

JavaCV Basics: Splitting Objects

Here we put together functions from previous articles to describe a use case where objects are discovered in an image and rotated.

All code is available on GitHub under the GoatImage project.

Why Split Objects

At times, objects need to be tracked reliably, OCR needs to be broken down to more manageable tasks, or there is another task requiring splitting and rotation. Particularly, recognition and other forms of statistical computing benefit from such standardization.

Splitting allows object by object recognition which may or may not improve accuracy depending on the data used to train an algorithm and even the type of algorithm used. Bayesian based networks, including RNNs, benefit from this task significantly.

Splitting and Rotating

The following function in GoatImage performs contouring to find objects, creates a minimum-area rectangle for each, and finally rotates objects based on their skew angle.

/**
    * Split an image using an existing contouring function. Take each ROI, rotate it, and return the new Images along with the original.
    *
    * @param image              The image to split objects from
    * @param contourType        The contour type to use defaulting to CV_RETR_LIST
    * @param minBoxArea         Minimum box area to accept (-1 means everything and is default)
    * @param maxBoxArea         Maximum box area to accept (-1 means everything and is default)
    * @param show               Whether or not to show the image. Default is false.
    * @param xPosSort           Whether or not to sort the objects by their x position. Default is true. This is faster than a full sort
    * @return                   A tuple with the original Image and a List of split out Image objects, each paired with its BoundingBox and named original_itemNumber
    */
  def splitObjects(image : Image, contourType : Int=  CV_RETR_LIST,minBoxArea : Int = -1, maxBoxArea : Int = -1, show : Boolean= false,xPosSort : Boolean = true):(Image,List[(Image,BoundingBox)])={
    //contour
    val imTup : (Image, List[BoundingBox]) = this.contour(image,contourType)

    var imObjs : List[(Image,BoundingBox)] = List[(Image,BoundingBox)]()

    var boxes : List[BoundingBox] = imTup._2

    //ensure that the boxes are sorted by x position
    if(xPosSort){
      boxes = boxes.sortBy(_.x1)
    }

    if(minBoxArea > 0){
        boxes = boxes.filter({x => (x.width * x.height) > minBoxArea})
    }

    if(maxBoxArea > 0){
      boxes = boxes.filter({x => (x.width * x.height) < maxBoxArea})
    }

    //get and rotate objects
    var idx : Int = 0
    for(box <-  boxes){
      println(box)
      val im = this.rotateImage(box.image,box.skewAngle)
      if(show){
        im.showImage(s"My Box ${idx}")
      }
      im.setName(im.getName().replaceAll("\\..*","")+s"_${idx}."+im.itype.toString.toLowerCase())
      imObjs = imObjs :+ (im,box)
      idx += 1
    }

    (image,imObjs)
  }

Contours are filtered after sorting if desired. For each box, rotation is performed and the resulting image returned as a new Image.
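The sort-then-filter step can be isolated from the image handling. In the sketch below, Box is a hypothetical stand-in for GoatImage's BoundingBox that holds only the fields the filter reads, and BoxFilterSketch is likewise a name introduced for illustration.

```scala
// Hypothetical stand-in for BoundingBox with only the fields the filter needs.
final case class Box(x1: Int, width: Int, height: Int) {
  def area: Int = width * height
}

object BoxFilterSketch {
  // Sort by x position, then keep boxes strictly inside the area limits.
  // A limit of zero or less is ignored, mirroring the -1 defaults above.
  def select(boxes: List[Box], minBoxArea: Int = -1, maxBoxArea: Int = -1): List[Box] =
    boxes
      .sortBy(_.x1)
      .filter(b => minBoxArea <= 0 || b.area > minBoxArea)
      .filter(b => maxBoxArea <= 0 || b.area < maxBoxArea)
}
```

Expressing the limits as conditions inside the filters avoids the reassigned var used in splitObjects while keeping the same behaviour.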

Conclusion

Here the splitObjects function of GoatImage is reviewed, revealing how the library and OpenCV split and rotate objects as part of standardization for object recognition and OCR.

JavaCV Basics: Cropping

The ROI code is broken on the JavaCV example site. Here we will look at cropping an image by defining a region of interest. The remaining JavaCV example code should work.

All code is available on GitHub under the GoatImage project.

Defining an ROI

Setting a Region of Interest (ROI) requires using the cvSetImageROI function, which takes an IplImage and a CvRect representing the region of interest.

cvSetImageROI(image, rect)

Putting it all Together By Cropping

Cropping takes our ROI and generates a new image fairly directly.

/**
    * Crop an existing image.
    *
    * @param image      The image to crop
    * @param x          The starting x coordinate
    * @param y          The starting y coordinate
    * @param width      The width
    * @param height     The height
    * @return           A new Image
    */
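  def crop(image : Image, x : Int, y : Int, width : Int, height : Int): Image={
    val rect = new CvRect(x,y,width,height)
    val uImage : IplImage = image.image.clone()
    cvSetImageROI(uImage, rect)
    //allocate an image the size of the ROI and copy the ROI pixels into it
    val cropped : IplImage = cvCreateImage(cvGetSize(uImage),image.image.depth(),image.image.nChannels())
    cvCopy(uImage, cropped)
    cvResetImageROI(uImage)
    new Image(cropped,image.name,image.itype)
  }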

Conclusion

Simple cropping was introduced to rectify an issue with the ROI example from JavaCV.