JavaCV Basics: Basic Image Processing

Here, we analyze some of the basic image processing tools in OpenCV and their use in GoatImage.

All code is available on GitHub under GoatImage. To fully understand this article, read the related articles and look at this code.

Select functions are exemplified here. In GoatImage, JavaDocs can be generated further explaining the functions. The functions explained are:

  • Sharpen
  • Contrast
  • Blur

Dilate, rotate, erode, min thresholding, and max thresholding are left to the code. Thresholding in OpenCV is described in depth with graphs and charts via the documentation.

Related Articles:

Basic Processing in Computer Vision

Basic processing is the key to successful recognition. Training sets come in a specific form. Pre-processing is usually required to ensure the accuracy and quality of a program. JavaCV and OpenCV are fast enough to work in a variety of circumstances to improve algorithmic performance at a much lower speed reduction cost. Each transform applied to an image takes time and memory but will pay off handsomely if done correctly.

Kernel Processing

Most of these functions are linear transformations. A linear transformation uses a function to map one matrix to another (Ax = b). In image processing, the matrix kernel is used to do this. Basically a weighted matrix can be used to map a certain point or pixel value.

For an overview of image processing kernels, see wikipedia.

Kernels may be generated in JavaCV.


  /**
    * Create a kernel from a double array (write large kernels more understandably)
    * @param kernelArray      The double array of doubles with the kernel values as signed ints
    * @return                 The kernel mat
    */
  def generateKernel(kernelArray: Array[Array[Int]]):Mat={
    val m = if(kernelArray != null) kernelArray.length else 0
    if(m == 0 ){
      throw new IllegalStateException("Your Kernel Array Must be Initialized with values")
    }

    if(kernelArray(0).length != m){
      throw new IllegalStateException("Your Kernel Array Must be Square and not sparse.")
    }

    val kernel = new Mat(m,m,CV_32F,new Scalar(0))
    val ki = kernel.createIndexer().asInstanceOf[FloatIndexer]

    for(i <- 0 until m){
      for(j <- 0 until m){
        ki.put(i,j,kernelArray(i)(j))
      }
    }
    kernel
  }

More reliably, there is a function for generating a Gaussian Kernel.

/**
    * Generate the square gaussian kernel. I think the pattern is a(0,0)=1 a(1,0) = n a(2,0) = n+2i with rows as a(2,1) = a(2,0) * n and adding two to the middle then subtracting.
    * However, there were only two examples on the page I found so do not use that without verification.
    *
    * @param kernelMN    The m and n for our kernel matrix
    * @param sigma       The sigma to multiply by (kernel standard deviation)
    * @return            The resulting kernel matrix
    */
  def generateGaussianKernel(kernelMN : Int, sigma : Double):Mat={
    getGaussianKernel(kernelMN,sigma)
  }

Sharpen with A Cutom Kernel

Applying a kernel in OpenCV can be done with the filter2D method.

filter2D(srcMat,outMat,srcMat.depth(),kernel)

Here a sharpening kernel using the function above is applied.

/**
    * Sharpen an image with a standard sharpening kernel.
    * @param image    The image to sharpen
    * @return         A new and sharper image
    */
  def sharpen(image : Image):Image={
    val srcMat = new Mat(image.image)
    val outMat = new Mat(srcMat.rows(),srcMat.cols(),srcMat.`type`())

    val karr : Array[Array[Int]] = Array[Array[Int]](Array(0,-1,0),Array(-1,5,-1),Array(0,-1,0))
    val kernel : Mat = this.generateKernel(karr)
    filter2D(srcMat,outMat,srcMat.depth(),kernel)
    new Image(new IplImage(outMat),image.name,image.itype)
  }

Contrast

Contrast kicks up the color intensity in images by equation, equalization, or based on neighboring pixels.

One form of Contrast applies a direct function to an image:

/**
    * Use an equation applied to the pixels to increase contrast. It appears that
    * the majority of the effect occurs from converting back and forth with a very
    * minor impact for the values. However, the impact is softer than with equalizing
    * histograms. Try sharpen as well. The kernel kicks up contrast around edges.
    *
    * (maxIntensity/phi)*(x/(maxIntensity/theta))**0.5
    *
    * @param image                The image to use
    * @param maxIntensity         The maximum intensity (numerator)
    * @param phi                  Phi value to use
    * @param theta                Theta value to use
    * @return
    */
  def contrast(image : Image, maxIntensity : Double, phi : Double = 0.5, theta : Double = 0.5):Image={
    val srcMat = new Mat(image.image)
    val outMat = new Mat(srcMat.rows(),srcMat.cols(),srcMat.`type`())

    val usrcMat = new Mat()
    val dest = new Mat(srcMat.rows(),srcMat.cols(),usrcMat.`type`())
    srcMat.convertTo(usrcMat,CV_32F,1/255.0,0)

    pow(usrcMat,0.5,dest)
    multiply(dest,(maxIntensity / phi))
    val fm = 1 / Math.pow(maxIntensity / theta,0.5)
    multiply(dest, fm)
    dest.convertTo(outMat,CV_8U,255.0,0)

    new Image(new IplImage(outMat),image.name,image.itype)
  }

Here the image is manipulated using matrix equations to form a new image where pixel intensities are improved for clarity.

Another form of contrast equalizes the image histogram:

/**
* A form of contrast based around equalizing image histograms.
*
* @param image The image to equalize
* @return A new Image
*/
def equalizeHistogram(image : Image):Image={
val srcMat = new Mat(image.image)
val outMat = new Mat(srcMat.rows(),srcMat.cols(),srcMat.`type`())
equalizeHist(srcMat,outMat)
new Image(new IplImage(outMat),image.name,image.itype)
}

The JavaCV method equalizeHist is used here.

Blur

Blurring uses averaging to dull images.

Gaussian blurring uses a Gaussian derived kernel to blur. This kernel uses an averaging function as opposed to equal weighting of neighboring pixels.

 /**
    * Perform a Gaussian blur. The larger the kernel the more blurred the image will be.
    *
    * @param image              The image to use
    * @param degree             Strength of the blur
    * @param kernelMN           The kernel height and width should match (for instance 5x5)
    * @param sigma              The sigma to use in generating the matrix
    * @param depth              The depth to use
    * @param brightenFactor     A factor to brighten the result by with  0){
      outImage = this.brighten(outImage,brightenFactor)
    }
    outImage
  }

A box blur uses a straight kernel to blur, often weighting pixels equally.

 /**
    * Perform a box blur and return a new Image. Increasing the factor has a significant impact.
    * This algorithm tends to be overly powerful. It wiped the lines out of my test image.
    *
    * @param image   The Image object
    * @param depth   The depth to use with -1 as default corresponding to image.depth
    * @return        A new Image
    */
  def boxBlur(image : Image,factor: Int = 1,depth : Int = -1):Image={
    val srcMat = new Mat(image.image)
    val outMat = new Mat(srcMat.rows(),srcMat.cols(),srcMat.`type`())

    //build kernel
    val kernel : Mat = this.generateKernel(Array(Array(factor,factor,factor),Array(factor,factor,factor),Array(factor,factor,factor)))
    divide(kernel,9.0)

    //apply kernel
    filter2D(srcMat,outMat, depth, kernel)

    new Image(new IplImage(outMat),image.name,image.itype)
  }

Unsharp Masking

Once a blurred Mat is achieved, it is possible to perform an unsharp mask. The unsharp mask brings out certain features by subtracting the blurred image from the original while taking into account an aditional factor.

def unsharpMask(image : Image, kernelMN : Int = 3, sigma : Double = 60,alpha : Double = 1.5, beta : Double= -0.5,gamma : Double = 2.0,brightenFactor : Int = 0):Image={
    val srcMat : Mat = new Mat(image.image)
    val outMat = new Mat(srcMat.rows(),srcMat.cols(),srcMat.`type`())
    val retMat = new Mat(srcMat.rows(),srcMat.cols(),srcMat.`type`())

    //using htese methods allows the matrix kernel size to grow
    GaussianBlur(srcMat,outMat,new Size(kernelMN,kernelMN),sigma)
    addWeighted(srcMat,alpha,outMat,beta,gamma,retMat)

    var outImage : Image = new Image(new IplImage(outMat),image.name,image.itype)

    if(brightenFactor > 0){
      outImage = this.brighten(outImage,brightenFactor)
    }

    outImage
  }

Conclusion

This article examined various image processing techniques.

Advertisements

JavaCV Basics: Splitting Objects

Here we put together functions from previous articles to describe a use case where objects are discovered in an image and rotated.

All code is available on GitHub under the GoatImage project.

Related Articles:

Why Split Objects

At times, objects need to be tracked reliably, OCR needs to be broken down to more manageable tasks, or there is another task requiring splitting and rotation. Particularly, recognition and other forms of statistical computing benefit from such standardization.

Splitting allows object by object recognition which may or may not improve accuracy depending on the data used to train an algorithm and even the type of algorithm used. Bayesian based networks, including RNNs, benefit from this task significantly.

Splitting and Rotating

The following function in GoatImage performs contouring to find objects, creates minimum area rect, and finally rotates objects based on their skew angle.

/**
    * Split an image using an existing contouring function. Take each RIO, rotate, and return new Images with the original,
    *
    * @param image              The image to split objects from
    * @param contourType        The contour type to use defaulting to CV_RETR_EXTERNAL
    * @param minBoxArea         Minumum box area to accept (-1 means everything and is default)
    * @param maxBoxArea         Maximum box area to accept (-1 means everything and is default)
    * @param show               Whether or not to show the image. Default is false.
    * @param xPosSort           Whether or not to sort the objects by their x position. Default is true. This is faster than a full sort
    * @return                   A tuple with the original Image and a List of split out Image objects named by the original_itemNumber
    */
  def splitObjects(image : Image, contourType : Int=  CV_RETR_LIST,minBoxArea : Int = -1, maxBoxArea : Int = -1, show : Boolean= false,xPosSort : Boolean = true):(Image,List[(Image,BoundingBox)])={
    //contour
    val imTup : (Image, List[BoundingBox]) = this.contour(image,contourType)

    var imObjs : List[(Image,BoundingBox)] = List[(Image,BoundingBox)]()

    var boxes : List[BoundingBox] = imTup._2

    //ensure that the boxes are sorted by x position
    if(xPosSort){
      boxes = boxes.sortBy(_.x1)
    }

    if(minBoxArea > 0){
        boxes = boxes.filter({x => (x.width * x.height) > minBoxArea})
    }

    if(maxBoxArea > 0){
      boxes = boxes.filter({x => (x.width * x.height) < maxBoxArea})
    }

    //get and rotate objects
    var idx : Int = 0
    for(box <-  boxes){
      println(box)
      val im = this.rotateImage(box.image,box.skewAngle)
      if(show){
        im.showImage(s"My Box ${idx}")
      }
      im.setName(im.getName().replaceAll("\\..*","")+s"_${idx}."+im.itype.toString.toLowerCase())
      imObjs = imObjs :+ (im,box)
      idx += 1
    }

    (image,imObjs)
  }

Contours are filtered after sorting if desired. For each box, rotation is performed and the resulting image returned as a new Image.

Conclusion

Here the splitObjects function of GoatImage is reviewed, revealing how the library and OpenCV splits and rotates objects as part of standardization for object recognition and OCR.

JavaCV Basics: Cropping

The ROI code is  broken on the JavaCV example site. Here we will look at cropping an image by defining a region of interest. The remaining JavaCV example code should work.

All code is available on GitHub under the GoatImage project.

Related Articles:

Defining an ROI

Setting a Region of Interest (ROI) requires using the cvSetImageROI function which takes an IplImages and a Rect representing the region of interest.

cvSetImageROI(image, rect)

Putting it all Together By Cropping

Cropping takes our ROI and generates a new image fairly directly.

/**
    * Crop an existing image.
    *
    * @param image      The image to crop
    * @param x          The starting x coordinate
    * @param y          The starting y coordinate
    * @param width      The width
    * @param height     The height
    * @return           A new Image
    */
  def crop(image : Image, x : Int, y : Int, width : Int, height : Int): Image={
    val rect = new CvRect(x,y,width,height)
    val uImage : IplImage = image.image.clone()
    cvSetImageROI(uImage, rect)
    new Image(cvCreateImage(cvGetSize(uImage),image.image.depth(),image.image.nChannels()),image.name,image.itype)
  }

Conclusion

Simple cropping was introduced to rectify an issue with the ROI example from JavaCV.

JavaCV Basics: Rotating

Rotating an image is a common task. This article reviews how to rotate a matrix using JavaCV.

These tutorials utilize GoatImage. The Image object used in the code examples comes from this library.

Related Articles:

Rotation Matrix

The rotation matrix  is used to map from one pixel position to another. The matrix, shown below, uses trigonometric functions.

rotation

Rotation is a linear transformation. A linear transformation uses a function to map from one matrix to another. In image processing, the matrix kernel is used to perform this mapping.

Rotation in JavaCV 3

Rotation in JavaCV 3 utilizes a generated rotation matrix and the warp affine function. The function getRotationMatrix2D generates a two dimensional matrix using a center Point2f, angle, and a scale.

/**
    * Rotate an image by a specified angle using an affine transformation.
    *
    * @param image      The image to rotate
    * @param angle      The angle to rotate by
    * @return           A rotated Image
    */
  def rotateImage(image : Image,angle : Double):Image={
    val srcMat = new Mat(image.image)
    val outMat = new Mat(srcMat.cols(),srcMat.rows(),srcMat.`type`())
    val cosv = Math.cos(angle)
    val sinv = Math.sin(angle)
    val width = image.image.width
    val height = image.image.height
    val cx = width/2
    val cy = height/2

    //(image.image.width*cosv + image.image.height*sinv, image.image.width*sinv + image.image.height*cosv);
    val rotMat : Mat = getRotationMatrix2D(new Point2f(cx.toInt,cy.toInt),angle,1)

    warpAffine(srcMat,outMat,rotMat,srcMat.size())
    new Image(new IplImage(outMat),image.name,image.itype)
  }

The Angle

The angle in OpenCV and thus JavaCV is centered at -45 degrees due to the use of vertical contouring. If an image is less than -45 degrees, adding 90 degrees will correct this offset.

val angle = if(minAreaRect.angle < -45.0) minAreaRect.angle + 90 else minAreaRect.angle

Conclusion

In this tutorial we reviewed the function in GoatImage for rotating images using OpenCv. The functions getRotationMatrix2D and warpAffine were introduced. Basic kernel processing was introduced as well.

JavaCV Basics: Contouring

Here, we look at contouring in JavaCV 3. Contouring discovers the boundaries in an image that stand out from the background. It is a key part of object tracking, rotation, and many other tasks. JavaCV and OpenCV allow for the creation of bounding boxes around objects discovered through contouring.

The tutorials utilizes GoatImage.  The Image object is from this library.

All code is available on GitHub under GoatImage.

Related Articles:

Contouring

Remember contouring in Calc3. The same principal is used in image processing. Contour lines have a constant value. In imaging the value can be of a certain intensity or color. This differs from contouring a shape which uses values such as those obtained from the derivative of an equation.

contour

JavaCV

JavaCV includes functions for finding contours. The MatVector is used to store the discovered contour lines. MatVector in JavaCV is used in place of the CvSeq from OpenCV.

 val chainType : Int = CHAIN_APPROX_SIMPLE //chaining explained below
 val contourType : Int = CV_RETR_LIST //return values described below 
 val contours = new MatVector()
 val hierarchy = new Mat()
 val contImage : IplImage = image.image.clone()
 findContours(new Mat(image.image.clone()), contours,contourType, chainType, new Point(0, 0))

Contour lines can be stored in the MatVector in several forms. These options are:

  • CV_RETR_EXTERNAL : Returns only external contours
  • CV_RETR_LIST : Returns a list of every contour including nested contours
  • CV_RETR_CCOMP : Returns contours organized by inner and outer contours
  • CV_RETR_TREE : Returns a hierarchical ordering of contours nested by tree level

Similarly, several forms of chaining are available with varying effects. Chaining defines the level of approximation used in estimating the points forming the contour line. Approximation types include:

  • CHAIN_APPROX_NONE : Store every point
  • CHAIN_APPROX_SIMPLE : Encode values by storing only the endpoints of an interval and coefficients for the line through the endpoints
  • CHAIN_APPROX_TC89_L1 : Use a variant of the Ten Chin algorithm
  • CHAIN_APPROX_TC89_KCOS : Another variant of Ten Chin

A paper is available describing the Ten Chin algorithm.

Draw Image Contours

JavaCV contains a method for drawing your contours. Specifically, drawContours may be used from opencv_imgproc.

/**
    * A test function that draws the contours found in the Image.
    * @param image
    * @param contourType
    * @return
    */
  def drawImageContours(image : Image, contourType : Int = CV_RETR_LIST):(Image,Image,Long)={
    val dstMat : Mat = new Mat(image.image.clone())
    val srcMat = new Mat(image.image)
    val storage : CvMemStorage =cvCreateMemStorage(0)
    val contours = new MatVector()
    val hierarchy = new Mat()
    findContours(new Mat(image.image.clone()), contours,contourType, CHAIN_APPROX_SIMPLE, new Point(0, 0))
    val colorDst = new Mat(srcMat.size(), CV_8UC3, new Scalar(0))
    drawContours(colorDst, contours, -1  , new Scalar(255,0,0,0))
    (image,new Image(new IplImage(colorDst),image.name,image.itype),contours.size())
  }

Bounding Boxes and Min Area Rectangles

In this tutorial, bounding boxes and minimum area rectangles are obtained through JavaCV specific functions.

The bounding box is an upright Rect. Rect is a JavaCV variant of the cvRect. The bottom left coordinate, width, height, and position are stored in the box. The center is calculable as x + width/2, y + height /2.

val idx : Int  = 0
boundingRect(contours.get(idx))

The minimum bounding rectangle is a Rotated Rect which stores points relative to the actual bounding rectangle. This object has angle and point attributes in addition to relative x and y values and width and height.

val idx : Int = 0
minAreaRect(contours.get(idx))

To get a bounding circle, use the circle function ported from OpenCV.

Putting it all Together

The following code obtains contours and a list of bounding boxes.

 /**
    * Contour and return the bounding boxes and the slope of the found box. Boxes that overlap
    * will be combined to make a much bigger box.
    *
    * @param image            The image to use in detection.
    * @param contourType      The type of contour to use by default grab only the external RETR_CCOMP returns and organizes inner and outer contours & RETR_LIST gives everything
    * @return           A tuple of the image and a list of tuples with bounding boxes and their bottom line slopes
    */
  def contour(image : Image, contourType : Int =  CV_RETR_LIST):(Image,List[BoundingBox])={
    var minRects : List[BoundingBox] = List[BoundingBox]()
    val total : Int = 0
    val storage : CvMemStorage =cvCreateMemStorage(0)
    val contours = new MatVector()
    val hierarchy = new Mat()
    val contImage : IplImage = image.image.clone()
    findContours(new Mat(image.image.clone()), contours,contourType, CHAIN_APPROX_SIMPLE, new Point(0, 0))

    for(idx <- 0 until contours.size().toInt) {
      val clMat = new Mat()
      new Mat(image.image).copyTo(clMat)
      val clImage : IplImage = new IplImage(clMat)
      val boundRect = minAreaRect(contours.get(idx))
      val bbx = boundRect.boundingRect()
      cvSetImageROI(clImage,cvRect(bbx.x(),bbx.y(),bbx.width(),bbx.height()))
      val cropped : IplImage = cvCreateImage(cvGetSize(clImage),clImage.depth(),clImage.nChannels())
      cvCopy(clImage,cropped)
      val angle : Double = if(boundRect.angle() < -45.0) boundRect.angle + 90 else boundRect.angle()
      minRects = minRects :+ new BoundingBox(boundRect.boundingRect().x,boundRect.boundingRect().y,boundRect.boundingRect().width,boundRect.boundingRect().height,angle,new Image(cropped,s"${image.name}_${idx}",image.itype))
    }

    //setup bounding boxes
    (image,minRects)
  }

Conclusion

Here, we looked at contouring in JavaCV. JavaCV functions were used to find contours, draw them, and obtain bounding boxes.

Headless Testing and Scraping with Java FX

There is a lot of JavaScript in the world today and there is a need to get things moving quickly. Whether testing multiple websites or acquiring data for ETL and/or analysis, a tool needs to exist that does not leak memory as much as Selenium. Until recently, Selenium was really the only option for webkit, JCEF and writing native bindings for Chromium have been options for a while. Java 7 and Java 8 have stepped into the void with the JavaFX tools. These tools can be used to automate scraping and testing where network calls for HTML, Json, CSVs, pdfs, or what not are more tedious and difficult.

The FX Package

FX is much better than the television channel with some exceptions. Java created a sleeker version of Chromium based on webkit. While webkit suffers from some serious setbacks, Java FX also incorporates nearly any part of the java.net framework. Setting SSL Handlers, proxies, and the like works the same as with java.net. Therefore, FX can be used to intercept traffic (e.g. directly stream images that are incoming to a file named by URL without making more network calls), present a nifty front end controlled by JavaScript and querying for components,

Ui4J

Ui4j is as equally nifty as the FX package. While FX is not capable of going headless without a lot of work, Ui4j takes the work out of such a project using Monocle or Xvfb. Unfortunately, there are some issues getting Monocle to run by setting -Dui4j.headless=true on command line or using system properties after jdk1.8.0_20. Oracle removed Monocle from the jdk after this release and forced the programs using the server to OpenMonocle. However, xvfb-run -a works equally well. The -a option automatically chooses a server number. The github site does claim compatibility with Monocle though.

On top of headless mode, the authors have made working with FX simple. Run JavaScript as needed, incorporate interceptors with ease, run javascript, and avoid nasty waitFor calls and Selanese (this is an entire language within your existing language).

TestFX

There is an alternative to Ui4j in TestFX. It is geared towards testing. Rather than using an Assert after calling or with ((String) page.executeScript(“document.documentElement.innerHTML”)), methods such as verifyThat exist. Combine with Scala and have a wonderfully compact day. The authors have also managed to get a workaround for the Monocle problem.

Multiple Proxies

The only negative side effect of FX is that multiple instances must be run to use multiple proxies. Java and Scala for that matter set one proxy per JVM. Luckily, both Java and Scala have subprocess modules. The lovely data friendly language that is Scala makes this task as simple as Process(“java -jar myjar.jar -p my:proxy”).!. Simply run the command which returns the exit status and blocks until complete (see Futures to make this a better version of non-blocking) and use tools like Scopt to get the proxy and set it in a new Browser session. Better yet, take a look at my Scala macros article for some tips on loading code from a file (please don’t pass it as command line). RMI would probably be a bit better for large code but it may be possible to better secure a file than compiled code using checksums.

Conclusion

Throw out Selenium, get rid of the extra Selanese parsing and get Ui4J or TestFX for webkit testing. Sadly, it does not work with Gecko so Chromium is needed to replace these tests and obtain such terrific options as –ignore-certificate-errors. There are cases where fonts in the SSL will wreak havoc before you can even handle the incoming text no matter how low level you write your connections. For simple page pulls, stick to Apache HTTP Components which contains a fairly fast, somewhat mid-tier RAM usage asynchronous thread pool useable in Java or Scala. Sorry for the brevity folks but I tried to answer a question or two that was not in tutorials or documentation. Busy!

Mixing Java XML and Scala XML for Maximum Speed

Lets face it. ETL and data tasks are expected to be finished quickly. However, the amount of crud pushed through XML can be overwhelming. XML manipulation itself in java can take a while despite being relatively easy due to the need to loop through every attribute that one wished to create or read from. On the other hand, Scala, despite being a terrifically complete language based on the JVM suffers from some of the issues of a functional language. Only Scales XML offers a way to actually perform in place changes on XML. It is not, however, necessary to download an entirely new dependency on top of the existing Java and Scala libraries already available when using Scala.

Scala XML

There are definitive advantages to Scala XML. Imported under scala.xml._ and an other relevant subpackages, creation of XML is much faster and easier than with Java. All nodes and attributes are created as strings, making picking up Scala XML a breeze.

   
   val txtVal="myVal"
   val otherTxtVal="Hello World"
   val node = <tag attr={txtVal} attr2="other"> {otherTxtVal} </tag>

In order to generate child nodes, simply create an iterable with the nodes.

   val attrs:List[String] = List("1","2","3") //because we rarely know what the actual values are, these will be mapped
   val node = <property name="list"><list>{attrs.map(<value> li </value>)}</list></property> // a Spring list

Another extremely decent addition that brings speed to Scala XML parsing is the Pretty Printer under scala.xml.PrettyPrinter(width:int,spacer:int)

     val printer = new PrettyPrinter(250,5) //will create lines of width 250 and a tab space of 5 chars at each depth level
     val output = printer.format(xml)

There is one serious issue with Scala XML. Scala XML uses immutable lists. This means that nodes are difficult to append to. Rewrite
Rules and transformations
are available and particularly useful for filtering and deleting
but not necessarily good at appending.

   class Filter extends RewriteRule{
         override def transform(n:Node):Seq[Node] = n match{
            case (n \\ "bean[@id='myId']).length > 0 => NodeSeq.Empty //please bear with me as I am still learning Scala
            case n.Elem(prefix, label, attribs, scope, child @ _*) => NodeSeq.Empty
            case e:Elem => n
            case _ =>{
               println("Unrecognized")
               throw new RuntimeExceptione
            }
         }
   }

   val newXML = new RuleTransformer(new Filter()).transform(xml) //returns filtered xml

Such code can become cumbersome when writing a ton of appendage rules each with a special case. Looking at the example from the github page makes this abundantly clear. Another drawback is the speed impact a large number of cases can generate since XML is re-written with every iteration. >a href=”https://github.com/chris-twiner/scalesXml”>Scales XML overcomes this burden but has a much less Scala-esque API and requires an additional download. Memory will be impacted as well as copies of the original information are stored.

Java Document

Unlike Scala XML, Java already has a built in in-place XML editor available after the release of Java 4. Importing this library is not outside the standard uses of Scala. Perhaps the Scala developers intended for their library to mainly be used in parsing. Hence, there is a mkString within the native library but a developer or programmer must download sbt with sbt.IO for a scala File System API that itself uses Java’s File class. There is also the lack of networking tools leading to third party tools and a Json library not quite excelling at re-writing Json in older Scala versions (non-existant in 2.11.0+). While it would be nice to see this change, there is no guarantee that it will. Instead, java fills in the gaps.

Here the javax.xml libraries come particularly in handy. The parsers and transformers can read and generate strings, making interoperability a breeze. Xpath expressions may not be as simple as in Scala where xml // “xpath” or xml / “xpath” will return the nodes but it is painless.

   import javax.xml.parsers._ //quick and dirty
   import org.w4c.dom._
   import javax.xml._    
   val xmlstr = <tag>My Text Value</tag>
   val doc:Document = DocumentBuilderFactory.newInstance.newDocumentBuilder.parse(new ByteArrayInputStream(xmlstr.getBytes))
   val nodes:NodeSet = xpath.compile("//tag").evaluate(doc,NodeConstants.NodeSet)

With a nodelist, one can getAttributes(), setAttribute(key,value), appendChild(node), insertBefore(nodea,nodeb) and call other useful methods, such as getNextSibling() or getPreviousSibling(), from the API. Just be careful to use the document element when making calls on the nodes where necessary.

Inter-operability

Inter-operability between scala and java is quite simple. Use a builder with an input stream obtained from the string to read an XML document to Java and a transformer to read that XML back to something Scala enjoys.

//an example getting a string with Pretty Printer from an existing document
   val xmlstr = <tag>My Text Value</tag>
   val doc: Document = DocumentBuilderFactory.newInstance.newDocumentBuilder.parse(new ByteArrayInputStream(xmlstr.getBytes))
   val printer = new scala.xml.PrettyPrinter(250,5)
   val sw = new java.io.StringWriter()
   val xmlString = javax.xml.transform.TransformerFactory.newInstance().newTransformer().transform(new javax.xml.transform.dom.DOMSource(doc),new javax.xml.transform.stream.StreamResult(sw))
   val xml = printer.format()