Akka: Properly Build a New Stream Source with Video Stream Output

The Akka documentation is extensive and yet is limited in some ways. One such way is in describing the breadth of issues required to build a functional streaming source in Akka Streams. This article covers source generation in more detail to help you avoid a major catastrophe.

We will use video processing with FFMPEG in our example source as we need to break apart frames and then emit every frame in a stream, an intensive real world task. There is no guarantee that our video stream will be production grade but it can start you on the path to success.

Building A Source

Building a source does not need to be a struggle, but there are several actions that each source must account for. It is worth reading Lightbend’s custom stream processing documentation first.

These actions are:

  • Handling the Completion Downstream
  • Handling a Success
  • Handling a Failure

The Akka graph stage, which is used to create a custom stream, offers an onDownstreamFinish handler, as shown in the example below. This handler is not well documented.

Using a Materialized Value

Often, we want to create a future from a stream to ensure a stream completed as wanted. Promises in Scala offer a way to create futures while notifying the programmer of failures and successes.

A promise is easy to create:

     val promise : Promise[Long] = Promise[Long]()
     promise.tryFailure(new Exception("Test Failed"))
     val future = promise.future

A promise is generated and given a success or failure. A future is then generated which can be handled normally.

Putting It All Together

Each concept can be applied to the generation of a custom source:

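import java.io.FileNotFoundException
import java.nio.file.Path

import akka.stream.{Attributes, Outlet, SourceShape}
import akka.stream.stage.{GraphStageLogic, GraphStageWithMaterializedValue, OutHandler}
import org.bytedeco.javacv.{FFmpegFrameGrabber, Frame, OpenCVFrameConverter}
// the IplImage package differs between JavaCV versions; adjust as needed
import org.bytedeco.javacpp.opencv_core.IplImage
import org.joda.time.DateTime
import scala.concurrent.{Future, Promise}

case class VFrame(id : Long, path : Path, frame : IplImage, captureDate : Long = DateTime.now().getMillis)

class VideoSource(path : Path) extends GraphStageWithMaterializedValue[SourceShape[VFrame],Future[Boolean]]{
  val out : Outlet[VFrame] = Outlet("VideoFrameSource")
  override def shape: SourceShape[VFrame] = SourceShape(out)

  override def createLogicAndMaterializedValue(inheritedAttributes: Attributes): (GraphStageLogic, Future[Boolean]) = {
    val promise : Promise[Boolean] = Promise[Boolean]()
    var readImages : Long  = 0L
    var grabber : FFmpegFrameGrabber = null
    var frame : Frame = null
    val converter = new OpenCVFrameConverter.ToIplImage

    val logic = new GraphStageLogic(shape){
      setHandler(out, new OutHandler{
        override def onPull(): Unit = {
          try {
            if(grabber == null){
              // open the video lazily on the first pull
              if(!path.toFile.exists()){
                throw new FileNotFoundException(s"Path to ${path.toFile.getAbsolutePath} does not Exist")
              }
              grabber = new FFmpegFrameGrabber(path.toFile.getAbsolutePath)
              grabber.start()
            }
            frame = grabber.grab()
            if(frame != null) {
              val im =  converter.convert(frame)
              readImages += 1
              push(out, VFrame(readImages, path, im))
            }else{
              // no frames left, so finish normally
              success()
            }
          }catch{
            case t : Throwable => fail(t)
          }
        }

        // release the grabber, ignoring errors raised while closing
        def closeGrabber(): Unit = {
          if(grabber != null){
            try {
              grabber.stop()
              grabber.release()
            }catch {
              case t : Throwable => ()
            }
            grabber = null
          }
        }

        def fail(ex : Throwable)={
          closeGrabber()
          promise.tryFailure(ex)
          failStage(ex)
        }

        def success()={
          closeGrabber()
          promise.trySuccess(true)
          completeStage()
        }

        override def onDownstreamFinish(): Unit = {
          success()
        }
      })
    }

    logic -> promise.future
  }
}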

This class creates a source with a materialized value, a topic not fully covered in the Lightbend documentation. The source counts the frames it reads and materializes a Future[Boolean] that completes when the stream ends or fails. It overrides onDownstreamFinish and implements custom success and failure processing.

The anatomy of the above source is simple. A class is created which takes in a Path to a video file. FFmpeg is used to obtain frames from the video. The source requires the creation of specialized source logic, after specifying the source shape and type, through createLogicAndMaterializedValue. The Promise named promise carries the materialized value. The method returns the logic together with a future, which resolves when success or fail is called. The method completeStage is called to complete the source.


This article examined the creation of new sources in Akka. We reviewed handling the completion of downstream sources, success, failure, and the return of a materialized value. Creating sources is easier than Lightbend lets on.


Open Source Data Science, the Great Resource No One Knows About

There is a growing industry for online technology courses, and it is gaining traction among people who were in school when fields like data science were still the plaything of graduate students and PhDs in computer science, statistics, and even, to a degree, biology. However, these online courses will never match the pool of knowledge one can drink from by taking even an undergraduate computer science or mathematics class at a middling state school today (I would encourage everyone to avoid business schools like the plague for technology).

In an industry that is constantly transforming itself, and especially one where the field of data will provide long-term work, these courses may appear quite appealing. However, they are often too shallow to provide much breadth, and it is ridiculous to think that they can convey the depth of the 1000 page thesis that led to the stochastic approach to matrix operations and eventually Spark. We are all forgetting about the greatest resources available today. The internet, open source code, and a search engine can add layers of depth to what would otherwise be an education not able to provide enough grounding for employment.

Do Take the Online Courses

First off, the online courses from Coursera are great. They can provide a basic overview of some of the field. Urbana offers a great data science course and I am constantly stumbling into blogs presenting concepts from it. However, what can someone fit into 2-3 hours per week for six weeks in a field that may require 2-3 years of undergraduate coursework, and even some masters level topics, to begin to reach expert level?

You may learn a basic K Means or deploy some subset of algorithms, but can you optimize them, and do you really know more than the Bayesian probabilities that you likely also learned in a statistics class?

Where Open Source Fits In

Luckily, many of the advanced concepts and a ton of research is actually available online for free. The culmination of decades of research is available at your fingertips in open source projects.

Sparse Matrix research, edge detection algorithms, information theory, text tiling, hashing, vectorizing, and more are all available to anyone willing to put in the time to learn them adequately.


Documentation is widely available and often on github for:

These github accounts also contain useful links to websites explaining the code, containing further documentation (javadocs), and giving some conceptual depth and further research opportunities.

A wide majority of conceptual literature can be found with a simple search.

Sit down, read the conceptual literature. Find books on topics like numerical analysis, and apply what you spent tens or even hundreds of thousands of dollars to learn in school.

Secure Your Data Series: Why a Captcha Alone Fails

Since captchas are meant to be unreadable by a computer, they are a great tool for learning the task of OCR. As even Google now admits, Captchas are breakable. This is concerning from a security standpoint, since even an open source OCR engine like Tesseract can defeat the system. A little computer vision and some basic pre-processing in Python will break most Captchas. That is why I champion mapping and analyzing click stream data with Latent Dirichlet Allocation to classify human from non-human or hacker from non-hacker (stay tuned, it's coming). Combine a captcha that has a higher probability of failure for automated processes (guessing here) with the LDA approach and click stream data formed into vectors (literal mathematical vectors), and security becomes a lot better.

Let’s do some Captcha breaking, but beware: this is purely educational and not for breaking the law! Many Captchas have sound options to comply with disability laws, and the simpler sound puzzles can be broken with speech recognition such as Sphinx4. However, the distortion of the sound in modern Captchas can make OCR useful for aiding the disabled. Basically, there are uses of this code that are likely to remain legal as companies look to narrow the definition of authorized access.

Image Preprocessing

Captcha images contain all sorts of clutter and manipulations with the goal of eliminating readability. This makes pre-processing critical to the goal at hand. Speed is the crucial consideration in this task so any library or custom code needs to be extremely efficient.

Two modules exist in Python that help with preprocessing. They are OpenCV (cv2) and pillow (PIL). My image toolset can also be used for these tasks.

OpenCV is a powerful open source library that aims to make a lot of calculus and differential equation based code for computer vision incredibly easy to deploy. It runs extremely quickly. The modules are largely written in C and there is also a C++ API. OpenCV is great if you want to write custom code too, as the tutorials also dive deeply into the mathematics behind each program. For this case, functions from cv2 including resize (which goes further than basic expansion), Canny edge detection, and blurring are highly effective. After writing the same routines in Java and even using a graphics card library in Python to do the same tasks, it appears that OpenCV matches or is only slightly slower than the custom code. The images are small though.

Other modules are incredibly good at performing contouring, corner detection, and key point finding. If you have a computer vision or artificial intelligence task, OpenCV is the go-to API.

import cv2

For basic pre-processing, pillow is also an incredibly fast library. Again, compared to my own Java code, the modules work at about the same speed. The idea behind them is the use of kernels, small matrices filled with weights that can be used to distribute color in an image via a window.

from PIL import ImageEnhance
from PIL import ImageFilter
from PIL import Image

All of the necessary pre-processing, whether custom or module based, can be completed in less than one second, producing the result shown below. However, it is necessary to fiddle with the images until they look as close as possible to the way a normal input would.

Overall, the total time taken to break a captcha ranges from roughly one second or less to four seconds on a dual core machine with 4 GB of RAM. Completing tasks with custom code may improve speed when using faster matrix libraries but numpy is fairly efficient in today’s world.

One extremely useful trick is to resize the image and improve contrast.
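A minimal sketch of that trick with pillow follows. The function name, the scale factor of 3, and the contrast factor of 2.0 are illustrative assumptions, not values from the original experiments, and a synthetic grey image stands in for a real captcha.

```python
from PIL import Image, ImageEnhance

def enlarge_and_boost(img, scale=3, contrast=2.0):
    """Upscale an image and increase its contrast (illustrative values)."""
    width, height = img.size
    # Bicubic interpolation keeps letter edges smoother than nearest-neighbour.
    bigger = img.resize((width * scale, height * scale), Image.BICUBIC)
    # A factor above 1.0 increases contrast; 1.0 returns the image unchanged.
    return ImageEnhance.Contrast(bigger).enhance(contrast)

# Stand-in for a captcha loaded with Image.open(...).convert("L")
sample = Image.new("L", (40, 15), color=120)
result = enlarge_and_boost(sample)
```

Larger letters generally give the later edge detection and OCR stages more pixels to work with, which is why the resize comes first.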



If using numpy, there is an extremely useful way to apply a function to all pixels of an image as well using some Python magic.
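One way to do this, sketched below, is to let numpy evaluate a condition over the whole array at once, or to wrap a scalar function with np.vectorize. Which approach was meant here is an assumption; the function names and the threshold of 128 are illustrative.

```python
import numpy as np

def binarize(pixels, threshold=128):
    """Map every pixel to pure black (0) or pure white (255) in one pass."""
    # np.where evaluates the condition on the whole array at once,
    # so no explicit Python loop over rows and columns is needed.
    return np.where(pixels < threshold, 0, 255).astype(np.uint8)

def apply_to_pixels(func, pixels):
    """Apply an arbitrary scalar function to each pixel (slower, but general)."""
    return np.vectorize(func)(pixels)

gray = np.array([[10, 200], [128, 90]], dtype=np.uint8)
black_white = binarize(gray)
inverted = apply_to_pixels(lambda p: 255 - p, gray)
```

The np.where form is usually the faster of the two, since np.vectorize still calls the Python function once per pixel.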


Decluttering with Statistics

Certain transforms and other techniques may leave unwanted clutter in the image. Small or abnormally sized objects can be removed from an image with basic statistics. Remember that the standard deviation is sqrt(sum((x-mean)^2)/n), and that a value 1.5 standard deviations from the mean is a normal outlier while one 3 standard deviations away is an extreme outlier. This can be used to eliminate elements that are longer than others. The example below follows an object and eliminates it based on width and has proven successful. If vertical objects are present, surface area coverage may be a better consideration. These work better than contouring here because the letters are not always connected properly. They need to be readable by a human and not a computer.

def declutter(self,inarr):
        """Declutter an Image"""
        #get the avg, total
        for i in range(height):
            for c in arr[i]:
                if c < 128 and account is True:
        #calculate sd
        for n in wsarr:
        #perform declutter
        for i in range(height):
            for c in arr[i]:
                if c < 128 and account is True:
                    if (j-ws) > (avg+o) or (j-ws) <(avg-o):
                        for j in range(j-ws):
        print str(total)+" objects removed"
        return (arr,total)       
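Because parts of the listing above are missing, here is a small self-contained sketch of the same idea: measure the width of each dark run in a row, then flag the widths that sit too far from the mean. The helper names, the threshold of 128, and the cutoff k are assumptions chosen for illustration, not the original implementation.

```python
import math

def run_widths(row, threshold=128):
    """Measure the width of each run of dark pixels in one image row."""
    widths, run = [], 0
    for value in row:
        if value < threshold:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths

def outlier_widths(widths, k=1.5):
    """Return indices of widths more than k standard deviations from the mean."""
    n = len(widths)
    if n == 0:
        return []
    mean = sum(widths) / float(n)
    # Population standard deviation: sqrt(sum((x - mean)^2) / n)
    sd = math.sqrt(sum((w - mean) ** 2 for w in widths) / n)
    return [i for i, w in enumerate(widths) if abs(w - mean) > k * sd]

widths = run_widths([255, 0, 0, 255, 0, 255, 0, 0, 0, 0])
flagged = outlier_widths([4, 5, 4, 5, 4, 30])
```

The flagged indices identify the runs to paint over with the background colour, such as a long strike-through line among normal letter strokes.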


Rotating with the Bounding Box and an Ode to OpenCV

In order to complete our work, it is a good idea to know how to find the minimum bounding box and rotate the image. This is made difficult in many instances by the fact that the letters in a Captcha are not always continuous black lines. Thankfully, OpenCV contains a multitude of feature detectors that can help outline or find key features for the different objects.

My initial attempts at this involved a gaggle of different methods. After attempts at using a threshold with Euclidean distances to produce a set further reduced by basic statistics revolving around centroids, and a few other methods, I arrived at contours. The Euclidean distance method would likely work on regular text lines but, here, characters are intentionally smashed together or unwanted lines mixed in. I kept getting double letters with different Captchas.  The result of using these methods is the feeling of frustration.

In contrast to the feeling of being lost, OpenCV’s contour class can find angles, bounding boxes that are rotated, and many other useful features. Contours rock.

A contour basically uses defined edges. Algorithms include using Latent Dirichlet Allocation to group objects found from edge detection and marching squares with defined edges following certain angles and patterns, much like a Haar cascade in a way. Obviously any AI implementation is much stronger.

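   def getContours(self,img):
        """Get Contours from CV2"""
        #binarize first; the threshold value of 127 is an assumed default
        ret,thresh = cv2.threshold(img,127,255,0)
        return cv2.findContours(thresh,cv2.RETR_TREE,cv2.CHAIN_APPROX_SIMPLE)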

With the contours found, it is a good idea to discover any patterns in spacing and use those to combine the boxes with an expanded intersection condition and a while loop.
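A rough sketch of that merging step is shown below. Boxes are (x, y, w, h) tuples as returned by cv2.boundingRect; the gap parameter plays the role of the expanded intersection condition, and its value of 2 pixels, like the function names, is an assumption for illustration.

```python
def boxes_touch(a, b, gap=2):
    """True when boxes (x, y, w, h) overlap or lie within `gap` pixels of each other."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return (ax - gap <= bx + bw and bx - gap <= ax + aw and
            ay - gap <= by + bh and by - gap <= ay + ah)

def union(a, b):
    """Smallest box covering both a and b."""
    x0, y0 = min(a[0], b[0]), min(a[1], b[1])
    x1 = max(a[0] + a[2], b[0] + b[2])
    y1 = max(a[1] + a[3], b[1] + b[3])
    return (x0, y0, x1 - x0, y1 - y0)

def merge_boxes(boxes, gap=2):
    """Repeatedly merge touching boxes until no pair can be combined."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if boxes_touch(boxes[i], boxes[j], gap):
                    boxes[i] = union(boxes[i], boxes[j])
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break
    return boxes

letters = merge_boxes([(0, 0, 5, 10), (6, 0, 5, 10), (30, 0, 5, 10)])
```

The while loop restarts after every merge because a newly enlarged box may now touch a box it previously missed.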

Now, we can rotate our individual letters. The equation here is simple and the code can be written easily. The following kernel is useful for rotation about the center of an image.
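For reference, the standard 2D rotation of a point about a center can be sketched as follows. This is the textbook rotation matrix, assumed to be the kernel meant here; the function name is illustrative.

```python
import math

def rotate_point(x, y, cx, cy, degrees):
    """Rotate (x, y) about (cx, cy) using the 2x2 rotation matrix."""
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    dx, dy = x - cx, y - cy
    # [x']   [cos -sin] [dx]   [cx]
    # [y'] = [sin  cos] [dy] + [cy]
    return (cx + cos_t * dx - sin_t * dy,
            cy + sin_t * dx + cos_t * dy)

px, py = rotate_point(2, 1, 1, 1, 90)
```

In image coordinates the y axis points down, so a positive angle appears clockwise on screen.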

The code for the equation in scipy is as follows. It is possible to use the above matrix on each pixel to discover the new location as well. I wrote the kernel into the sample.

   def rotate(self,img,cnt):
        """
        Utilizes contouring via cv2 to recognize letters and rotate them.
        Takes in a numpy array to work with and returns a set of numpy arrays.
        This will rotate on a provided contour.
        *Required Parameters*
        :param img: image as a numpy array
        :param cnt: contour to rotate on
        """
            #get the basic points
            #get the rotational angle
            print "Rotational Degrees: "+str(degree)
            return im
        #rotate with interpolation in scipy
        return ndimage.rotate(im,degree,mode='nearest',cval=100)

Another, more powerful rotation uses the same ndimage (whose cropping algorithm likely interpolates much better than simply multiplying [(x,y),(x1,y1)] by [(x),(y)]) and crops. This uses a best fit line with slope found using one of a variety of least squares calculations. The cv2 function used (fitLine) returns a unit vector (vx, vy) collinear to the line and a point (x0, y0) on the line.

   def rotateFromPixels(self,image,cnt,tolerance=(10*math.pi)/180):
        """
        Takes in contour information and rotates based on
        the line of best fit through contour points. I discovered
        that this could be done after getting the original box though
        the box may reduce the effect of large skews.
        This program also considers the minimum area rectangle in 
        case a letter is actually skewed only stretched (skew not detectable from box). I'm guessing
        that these will not always be the same since a statistical best fit
        is not always the same as finding the absolute max and min points.
        *Required Parameters*
        :param image: the image to rotate
        :param cnt: the contour information
        *Optional Parameters*
        :param tolerance: the allowable tolerance (deviation from 0 degrees in radians)
        """
        #parenthesize the slope and force float division
        print str(math.atan(float(y1-y0)/(x1-x0)))
        print "BoxPheta: "+str(boxpheta)
        if abs(boxpheta-d90) > tolerance:
            #find the perpendicular slope to the given slope
            if vx[0] == 0 and vy[0] == 1:
                return image2
                print "Slope Points: "+str(vx[0])+","+str(vy[0])
                print "Slope: "+str(slope)
                print "Pheta: "+str(pheta)
                print "Pheta (degrees)"+str((pheta*180)/math.pi)
                print "\n\n\n\n\n"
            if vx[0] >0:
        return image2 

If rotation is necessary, it may be possible to stick letters in a set and then attempt to read them individually rather than stitch an image together. Since a white background is present, it is also possible to rotate the letters and stick them back into the image. Please read the alignment section with code regarding alignment to see how this is done.


Aligning individual images is normally necessary, especially for systems considering shapes that use LDA or a similar AI task to learn different groupings. If a u is too far above a preceding letter like an S, the response from the system may be ” instead of u.

Using cv2, it is not difficult to align letters that are nearly good enough to run through OCR. OpenCV includes powerful contouring tools based on machine learning techniques. Contouring allows individual letters to be discerned, allowing for even more techniques to be applied, such as a rotational matrix applied to the image matrix as described in rotation.

The following code exemplifies the process of alignment. It does not, however, consider whether a letter has a stem.

   def alignLetters(self,img,maxBoxArea=None,minBoxArea=None,printBoxArea=False,imShow=False):
        """
        Align Letters in a Pre-Processed Image. Options are provided
        to limit the size of accepted bounding boxes, helping eliminate 
        non-contours and the usual box covering the image as a whole.
        Returns a PIL Image.
        *Required Parameters*
        :param img: numpy array image to use
        *Optional Parameters*
        :param maxBoxArea: maximum bounding box area to accept (None is a non check)
        :param minBoxArea: minimum bounding box area to accept (None is a non check)
        :param printBoxArea: boolean to state whether to print box areas (list trimmed if maxBoxArea or minBoxArea set)
        :param imShow: show the individual contoured images and final image
        """
        #setup image
        #get the contours and bounding boxes and filter out bad bounding boxes
        for cnt in contours:
            #obtain only bounding boxes meeting a certain criteria
            if (maxBoxArea is None or w*h<maxBoxArea) and (minBoxArea is None or w*h>minBoxArea):
                if printBoxArea is True: 
                    print str(w*h)
                if h>maxheight:
                if w>maxwidth:
        #write each box to a new image aligned at the bottom
        for x in istarts:
            if imShow is True:
            if x+imgx>maxWidth:
            for i in range(x,x+imgx):
                if minyminy:
                    endImg.putpixel((i,height), img.getpixel((i-x,iheight)))
        endImg=endImg.crop((0,minHeight, maxWidth,maxheight))
        if imShow is True:
        return endImg

Some Final Processing

Your image should either be split up and stored in a set of characters or look fairly discernible by this point. If it is not, take the time to do some final pre-processing. Using pillow (PIL) to expand edges and eliminate disconnects is one important task for final processing. Try to make the image look like newsprint.

Tesseract: The Easy Part

Now, the image is ready for Tesseract. The command line code can be run via the subprocess library from a temporary file or a pipe can be established directly to Tesseract. The OCR splits the letters into key features, clusters on them, and then either compares the offsets of characters it finds to a letter set or runs a co-variance comparison. I personally will test the co-variance approach in a later article and am building a massive training set.

If you have a set of letters, be sure to use the -psm option (--psm in Tesseract 4 and later) and set it to 10. This tells Tesseract to perform single character analysis.
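As a sketch of the subprocess route, the helper below builds the command line for a single character image and reads back the text file that Tesseract writes. The function names and the output base name are hypothetical; the page segmentation flag is spelled -psm in Tesseract 3.x and --psm in later releases, so it is left switchable.

```python
import subprocess

def tesseract_command(image_path, output_base, psm=10, legacy_flag=True):
    """Build the argument list for the tesseract command line tool.

    legacy_flag=True uses -psm (Tesseract 3.x); False uses --psm (4.x and later).
    """
    flag = "-psm" if legacy_flag else "--psm"
    return ["tesseract", image_path, output_base, flag, str(psm)]

def read_character(image_path, output_base="ocr_out", **kwargs):
    """Run Tesseract on one image and return the recognized text."""
    subprocess.check_call(tesseract_command(image_path, output_base, **kwargs))
    # Tesseract appends .txt to the output base name.
    with open(output_base + ".txt") as handle:
        return handle.read().strip()

command = tesseract_command("letter.png", "out")
```

Calling read_character requires the tesseract binary on the PATH; building the command separately makes the flag choice easy to check without running OCR.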

Running Tesseract is simple with pytesser.

  import pytesser
  from PIL import Image

Please. Do not Spam. I am not responsible for the misuse of this Code. This is purely educational!!!

For shits and giggles, here is an LDA algorithm that can be used with a covariance matrix and your own really, really big set of letters, from scikit-learn. Python really is excellent for AI, much easier to use than Java if you want infinite matrix size thanks to gensim, numpy, and scikit-learn. Dare I say numpy is even faster at matrix solving than almost all Java packages when not using a tool like Mahout, due to the use of BLAS and LAPACK when possible. That is in part why I used Tesseract. The other reason is that Google runs or ran a Captcha service while also supplying this really amazing OCR product.

Morning Joe: Can Computer Vision Technology Help De-Militarize the Police and Provide Assistance?

There has been an explosion of computer vision technology in the past few years, or even the last decade or so considering OpenCV has been around that long. The recent events in Ferguson have created a need for keeping the police in line as well as the need to present credible evidence regarding certain situations.

Many police departments are starting to test programs that place snake cams like those used in the military on officers. While this could be viewed as more militarization, it also can present departments with a black eye if power is abused.

What if the lawyers, police, and ethics commissions could have a way of recognizing potentially dangerous situations before they happen? What if there was a light weight solution that allowed data programs to monitor situations in real or near real time, spot troublesome incidents, and provide alerts when situations were likely to get out of hand? What if potentially unethical situations could be flagged?

The answer is that this is possible without too much development already.

Statistical patterns can be used to predict behaviour long before anything happens. Microsoft and Facebook can accurately predict what you will be doing a year from now. The sad state of the current near police state is that the government has as much or more data on officers and citizens than Microsoft and Facebook.

These patterns can be used to narrow the video from those snake cams to potentially harmful situations for real time monitoring.

From there, a plethora of strong open source tools can be used to spot everything from weapons and the potential use of force, using the training capabilities of OpenCV and some basic kinematics (video is just a bunch of really quickly taken photos played in order), speech using Sphinx4 (a work in progress for captchas but probably not for clear speech), and even optical character recognition with pytesser. A bit of image pre-processing and OCR in Tesseract can already break nearly every captcha on the market in under one second with a single core and less than 2 gb of RAM. The same goes for using corner detection and OCR on a pdf table. Why can’t it be used in this situation?

The result in this case should be a more ethical police force and better safety to quell the fears of officers and civilians alike.

Call me crazy but we can go deeper than just using snake cams on officers to police officers and provide assistance.  Quantum computing and/or better processors and graphics cards will only make this more of a reality.

Canny Edge Detection in Java

So you don’t really have a decent graphics card, CUDA in C or pyCuda are not options since you don’t have an NVIDIA card, or you just want something completely cross-platform without a large amount of research. Canny edge detection in straight Java does not need to be slow.

Accomplishing a faster and even memory efficient Canny edge detection algorithm only requires the use of loops and the proxy design pattern. Basically, simple code applied to the theory will do the trick. All of the code is available at my github repository.

The most important initial task is actually to pre-process the image to bring out the edges that are wanted and obscure those that might get caught up in the detection. This is likely standard practice as unwanted edges are still edges. It is possible to implement a series of simple corrections such as a Gaussian or box blur, denoise, or an unsharp mask. The links in the previous sentence were not used in the following blur or my code libraries but the concept remains the same.

Gaussian Blur Example
The gaussian blur is an equation based way of getting rid of unwanted edges. The equation averages pixels within a certain window using Gauss’ statistical normal distribution equation.
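The equation in question is the two-dimensional Gaussian, G(x, y) = 1/(2πσ²) · exp(−(x² + y²)/(2σ²)), where x and y are offsets from the center of the window. As a quick illustration, written in Python for brevity, a normalized kernel can be built like this; the function name and the choice to pass sigma separately from the window size are assumptions.

```python
import math

def gaussian_kernel(size, sigma):
    """Build a size x size Gaussian kernel whose weights sum to 1."""
    center = size // 2
    kernel = []
    for i in range(size):
        row = []
        for j in range(size):
            dist_sq = (i - center) ** 2 + (j - center) ** 2
            weight = math.exp(-dist_sq / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)
            row.append(weight)
        kernel.append(row)
    # Normalize so that blurring does not brighten or darken the image.
    total = sum(sum(row) for row in kernel)
    return [[w / total for w in row] for row in kernel]

kernel = gaussian_kernel(3, 1.0)
```

The center cell always carries the largest weight, and the weights fall off symmetrically toward the corners.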

This equation is implemented simply in Java by applying a window to an image.

private void blurGauss(double radius) {
		// TODO create a kernel and implement a gaussian blur
		// this is a somewhat lazy implementation setting a definite center and
		// not wrapping to avoid black space

		// the radius of the kernel should be at least 3*the blur factor
		BufferedImage proxyimage=image.getImage();
		int rows_and_columns = (int) Math.round(radius);

		while (rows_and_columns % 2 == 0 && rows_and_columns != 0) {
			// force an odd kernel size so there is a single center cell
			rows_and_columns += 1;
		}

		while (rows_and_columns > proxyimage.getWidth()) {
			rows_and_columns = 3;
		}
		int centerx = ((rows_and_columns + 1) / 2) - 1;
		int centery = ((rows_and_columns + 1) / 2) - 1;

		// the kernel sum
		float sum_multiplier = 0;

		/* get the kernel */
		// the base for gaussian filtering
		float base_start = (float) (1 / (2 * Math.PI * Math.pow(radius, 2)));

		// the multiplier matrix to be applied to every pixel, ensured to be one
		float[][] arr = new float[rows_and_columns][rows_and_columns];

		// the central coordinates

		for (int i = 0; i < rows_and_columns; i++) {
			for (int j = 0; j < rows_and_columns; j++) {
				float exp = 0;

				// calculate the corners
				exp = (float) -1.0
						* (float) ((Math.pow((Math.abs(i - centerx)), 2) + (Math
								.pow(Math.abs(j - centery), 2))) / (2 * Math
								.pow(radius, 2)));
				float base = (float) (base_start * Math.exp(exp));
				arr[i][j] = base;

				sum_multiplier += base;


		/* replace the values by multiplying by the sum_multiplier */
		// get the multiplier
		sum_multiplier = (float) 1 / sum_multiplier;

		// multiply by the sum multiplier for each number
		for (int i = 0; i < rows_and_columns; i++) {
			for (int j = 0; j < rows_and_columns; j++) {
				arr[i][j] = arr[i][j] * sum_multiplier;


		// blur the image using the matrix
		complete_gauss(arr, rows_and_columns, centerx, centery);

	private void complete_gauss(float[][] arr, int rows_and_columns,int centerx, int centery) {
		// TODO complete the gaussian blur by applying the kernel for each pixel
		BufferedImage proxyimage=image.getImage();
		// the blurred image
		BufferedImage image2 = new BufferedImage(proxyimage.getWidth(),proxyimage.getHeight(), BufferedImage.TYPE_INT_RGB);

		// the r,g,b, values
		int r = 0;
		int g = 0;
		int b = 0;

		int i = 0;
		int j = 0;

		// the image height and width
		int width = image2.getWidth();
		int height = image2.getHeight();

		int tempi = 0;
		int tempj = 0;
		int thisx = 0;
		int thisy = 0;
		if (arr.length != 1) {

			for (int x = 0; x < width; x++) {
				for (int y = 0; y < height; y++) {

					// the values surrounding the pixel and the resulting blur
					// multiply pixel and its neighbors by the appropriate
					// amount

					i = (int) -Math.ceil((double) rows_and_columns / 2);
					j = (int) -Math.ceil((double) rows_and_columns / 2);

					while (i < Math.ceil((double) rows_and_columns / 2)
							& j < Math.ceil((double) rows_and_columns / 2)) {

						// sets the pixel coordinates
						thisx = x + i;

						if (thisx < 0 || thisx >= proxyimage.getWidth()) {
							thisx = 0;
						}

						thisy = y + j;

						if (thisy < 0 || thisy >= proxyimage.getHeight()) {
							thisy = 0;
						}

						// the implementation
						tempi = (int) (Math
								.round(((double) rows_and_columns / 2)) + i);
						tempj = (int) (Math
								.round(((double) rows_and_columns / 2)) + j);

						if (tempi >= arr[0].length) {
							tempi = 0;

						if (tempj >= arr[0].length) {
							tempj = 0;

						r += (new Color(proxyimage.getRGB((thisx), (thisy)))
								.getRed() * arr[tempi][tempj]);
						g += (new Color(proxyimage.getRGB((thisx), (thisy)))
								.getGreen() * arr[tempi][tempj]);
						b += (new Color(proxyimage.getRGB((thisx), (thisy)))
								.getBlue() * arr[tempi][tempj]);


						if (j == Math.round((double) rows_and_columns / 2)) {
							j = 0;


					// set the new rgb values with a brightening factor
					r = Math.min(255, Math.max(0, r
											+ ((int) Math.round(arr[0].length
													* arr[0].length))));
					g = Math.min(255, Math.max(0, g
											+ ((int) Math.round(arr[0].length
													* arr[0].length))));
					b = Math.min(255, Math.max(0, b
											+ ((int) Math.round(arr[0].length
													* arr[0].length))));

					Color rgb = new Color(r, g, b);
					image2.setRGB(x, y, rgb.getRGB());
					r = 0;
					g = 0;
					b = 0;

					i = 0;
					j = 0;

A matrix is generated and then used to blur the image in several loops. Methods were created to make the code more understandable.

Although an “is-a” classification is often associated with interfaces or abstract classes, the proxy design pattern is better implemented with an interface that controls access to the “expensive object.”

Steps to Canny Edge Detection
Canny edge detection takes three steps. These steps prepare the image, mark potential edges, and weed out the best edges.

They are:

  1. Blur with or without denoise and convert to greyscale
  2. Perform an analysis based on threshold values using an intensity gradient
  3. Perform hysteresis

Sobel Based Intensity Gradient
One version of the intensity gradient (read here for more depth on the algorithm) is derived using the Sobel gradient. The gradient is applied in a similar way to the blur, using a window and a matrix.

The matrix finds specific changes in intensity to discover which potential edges are the best candidates. The matrix is convolved with the image to obtain the result.
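A compact sketch of the Sobel step on a plain 2D list of grey values, written in Python for brevity, is shown below. It uses the common approximation |G| = |Gx| + |Gy| and clamps the result to 255; the function name and the list-of-lists image format are assumptions.

```python
def sobel_magnitude(gray):
    """Approximate gradient magnitude |Gx| + |Gy| for a 2D list of grey values."""
    height, width = len(gray), len(gray[0])
    out = [[0] * width for _ in range(height)]
    # Border pixels are skipped because the 3x3 window does not fit there.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Horizontal change: right column minus left column (weights 1, 2, 1)
            gx = ((gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]) -
                  (gray[y - 1][x - 1] + 2 * gray[y][x - 1] + gray[y + 1][x - 1]))
            # Vertical change: bottom row minus top row (weights 1, 2, 1)
            gy = ((gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]) -
                  (gray[y - 1][x - 1] + 2 * gray[y - 1][x] + gray[y - 1][x + 1]))
            out[y][x] = min(255, abs(gx) + abs(gy))
    return out

step = [[0, 0, 255, 255] for _ in range(4)]
flat = [[50] * 4 for _ in range(4)]
step_edges = sobel_magnitude(step)
flat_edges = sobel_magnitude(flat)
```

A vertical step between dark and light columns produces a strong response, while a flat region produces none.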

Perform Hysteresis
Hysteresis weeds out the remaining noise from the image, leaving the actual edges. This is necessary when using the Sobel gradient since it finds too many candidates. The trick is to separate edges from non-edges using threshold values applied to the intensity gradient. Pixels above an upper threshold are kept as edges, pixels below a lower threshold are thrown out, and pixels in between are kept only if they connect to an edge.

A faster way to perform this, if necessary, is to try to use a depth first search-like algorithm to find the ends of the edge, taking connected edges and leaving the rest. This action is fairly accurate.
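The depth first idea can be sketched as follows, in Python for brevity: seed a stack with the strong pixels, then follow any neighbouring pixel that clears the lower threshold. The function name and the use of an 8-pixel neighbourhood are assumptions.

```python
def hysteresis(magnitude, low, high):
    """Keep pixels >= high, plus pixels >= low that connect to them."""
    height, width = len(magnitude), len(magnitude[0])
    edges = [[0] * width for _ in range(height)]
    # Strong pixels seed the search.
    stack = [(y, x) for y in range(height) for x in range(width)
             if magnitude[y][x] >= high]
    for y, x in stack:
        edges[y][x] = 255
    # Depth first walk: follow weak pixels that touch an accepted edge pixel.
    while stack:
        y, x = stack.pop()
        for ny in range(max(0, y - 1), min(height, y + 2)):
            for nx in range(max(0, x - 1), min(width, x + 2)):
                if edges[ny][nx] == 0 and magnitude[ny][nx] >= low:
                    edges[ny][nx] = 255
                    stack.append((ny, nx))
    return edges

traced = hysteresis([[200, 100, 100, 0, 100]], low=50, high=150)
```

The last pixel in the sample row clears the lower threshold but is dropped, because the gap before it cuts it off from any strong pixel.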

The Code
Sobel Based Intensity Gradient

private void getIntensity() {
		// TODO calculate magnitude
		/*
		 * Kernels
		 * G(x): -1|0|1, -2|0|2, -1|0|1    G(y): -1|-2|-1, 0|0|0, 1|2|1
		 * |G| (magnitude for each cell) approx. = |G(x)| + |G(y)| =
		 * |(p1+2p2+p3)-(p7+2p8+p9)| + |(p3+2p6+p9)-(p1+2p4+p7)|; blank rows or
		 * columns are left out of the calc.
		 */

		// the buffered image
		BufferedImage image2 = new BufferedImage(image.getWidth(),
				image.getHeight(), BufferedImage.TYPE_BYTE_GRAY);

		// gives ultimate control can also use image libraries
		// the current position properties
		int x = 0;
		int y = 0;

		// the image width and height properties
		int width = image.getWidth();
		int height = image.getHeight();

		// iterate through the image
		for (y = 1; y < height - 1; y++) {
			for (x = 1; x < width - 1; x++) {
				// convert to greyscale by masking (32 bit color representing
				// intensity --> reduce to greyscale by taking only set bits)
				// gets the pixels surrounding the center (the center is always
				// weighted at 0 in the convolution matrix)
				int c1 = (image.getRGB(x - 1, y - 1) & 0xFF);
				int c2 = (image.getRGB(x - 1, y) & 0xFF);
				int c3 = (image.getRGB(x - 1, y + 1) & 0xFF);
				int c4 = (image.getRGB(x, y - 1) & 0xFF);
				int c6 = (image.getRGB(x, y + 1) & 0xFF);
				int c7 = (image.getRGB(x + 1, y - 1) & 0xFF);
				int c8 = (image.getRGB(x + 1, y) & 0xFF);
				int c9 = (image.getRGB(x + 1, y + 1) & 0xFF);

				// apply the magnitude of the convolution kernel (blank
				// column/row not applied)
				// differential x and y gradients are as follows
				// the sum of the absolute values approximates the magnitude
				/*
				 * Lxx = |1,-2,1|*L Lyy= {1,-2,1}*L ({} because its vertical and
				 * not horizontal)
				 */
				int color = Math.abs((c1 + (2 * c2) + c3)
						- (c7 + (2 * c8) + c9))
						+ Math.abs((c3 + (2 * c6) + c9) - (c1 + (2 * c4) + c7));

				// trim to fit the appropriate color pattern
				color = Math.min(255, Math.max(0, color));

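				// set new pixel of the edge; the intensity is copied into the
				// red, green, and blue channels so the grey value is preserved
				int grey = (color << 16) | (color << 8) | color;
				image2.setRGB(x, y, grey);
			}
		}

		// reset the image
		image = image2;
}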


private void hysterisis() {
		// TODO perform a non-greedy hysteresis using upper and lower threshold
		// values
		int width = image.getWidth();
		int height = image.getHeight();

		Color curcol = null;
		int r = 0;
		int g = 0;
		int b = 0;

		ve = new String[width][height];

		// mark every pixel as not yet visited ("n")
		for (int i = 0; i < width; i++) {
			for (int j = 0; j < height; j++) {
				ve[i][j] = "n";
			}
		}

		for (int i = 0; i < height; i++) {
			for (int j = 0; j < width; j++) {
				curcol = new Color(image.getRGB(j, i));
				if (ve[j][i].compareTo("n") == 0
						&& (((curcol.getRed() + curcol.getBlue() + curcol
								.getGreen()) / 3) > upperthreshold)) {
					// strong edge: mark as confirmed ("c") and follow it
					ve[j][i] = "c";
					image.setRGB(j, i, new Color(255, 255, 255).getRGB());

					follow_edge(j, i, width, height);
				} else if (ve[j][i].compareTo("n") == 0) {
					// not an edge: mark as visited ("v") and blank it
					ve[j][i] = "v";
					image.setRGB(j, i, new Color(0, 0, 0).getRGB());
				}
			}
		}
}



Depth First Like Noise Reduction

private void follow_edge(int j, int i, int width, int height) {
		// TODO recursively search edges (memory shouldn't be a problem here
		// since the set is finite and there should be fewer steps than the
		// number of pixels)

		// search the eight side boxes for a proper edge marking non-edges as
		// visited, follow any edge with the for-loop acting
		// as the restarter

		int x = j - 1;
		int y = i - 1;
		Color curcol = null;

		for (int k = 0; k < 9; k++) {
			// stay inside the image and skip the center pixel itself
			if (x >= 0 && x < width && y >= 0 && y < height
					&& !(x == j && y == i)) {
				// check the color of the neighbor rather than the center
				curcol = new Color(image.getRGB(x, y));
				if (ve[x][y].compareTo("n") == 0
						&& ((curcol.getRed() + curcol.getBlue() + curcol
								.getGreen()) / 3) > lowerthreshold) {
					ve[x][y] = "c";
					image.setRGB(x, y, new Color(255, 255, 255).getRGB());

					follow_edge(x, y, width, height);
				} else if (ve[x][y].compareTo("n") == 0) {
					ve[x][y] = "v";
					image.setRGB(x, y, new Color(0, 0, 0).getRGB());
				}
			}

			// check x and y by k: move right, wrapping to the start of the
			// next row after every third cell of the 3x3 window
			if ((k % 3) == 2) {
				x = (j - 1);
				y++;
			} else {
				x++;
			}
		}
}



Laplace Based Intensity Gradient as Opposed to Sobel

The Sobel gradient is not the only method for performing intensity analysis. A Laplacian operator can be used to obtain a different matrix. The Sobel detector is less sensitive to light differences and yields both magnitude and direction but is slightly more complicated. The Laplace gradient may also reduce the need for post-processing as the Sobel gradient normally accepts too many values.

The coefficients of the Laplace mask sum to 0, so areas of constant intensity produce no response. This gives the following matrix.

-1 -1 -1
-1 8 -1
-1 -1 -1

The matrix is used to transform each pixel’s RGB value based on whether or not it is part of a candidate edge.

private void find_all_edges() {
		// TODO find all edges using laplace rather than sobel and hysteresis
		// (noise can interfere with the result)
		// the new buffered image containing the edges
		BufferedImage image2 = new BufferedImage(image.getWidth(),
				image.getHeight(), BufferedImage.TYPE_INT_RGB);

		// manual pixel access gives ultimate control; image libraries can also be used
		// the current position properties
		int x = 0;
		int y = 0;

		// the image width and height properties
		int width = image.getWidth();
		int height = image.getHeight();

		/*
		 * Denoise Using Rewritten Code found at
		 * http://introcs.cs.princeton.edu/java/31datatype/LaplaceFilter.java.html
		 * Using laplace is better than averaging the neighbors from each part
		 * of an image as it does a better job of getting rid of gaussian noise
		 * without overdoing it
		 * Applies a default filter:
		 * -1|-1|-1 -1|8|-1 -1|-1|-1
		 */

		// perform the laplace for each number
		for (y = 1; y < height - 1; y++) {
			for (x = 1; x < width - 1; x++) {

				// get the neighbor pixels for the transform
				Color c00 = new Color(image.getRGB(x - 1, y - 1));
				Color c01 = new Color(image.getRGB(x - 1, y));
				Color c02 = new Color(image.getRGB(x - 1, y + 1));
				Color c10 = new Color(image.getRGB(x, y - 1));
				Color c11 = new Color(image.getRGB(x, y));
				Color c12 = new Color(image.getRGB(x, y + 1));
				Color c20 = new Color(image.getRGB(x + 1, y - 1));
				Color c21 = new Color(image.getRGB(x + 1, y));
				Color c22 = new Color(image.getRGB(x + 1, y + 1));

				/* apply the matrix */
				// to check, try using gauss jordan

				// apply the transformation for r
				int r = -c00.getRed() - c01.getRed() - c02.getRed()
						+ -c10.getRed() + 8 * c11.getRed() - c12.getRed()
						+ -c20.getRed() - c21.getRed() - c22.getRed();

				// apply the transformation for g
				int g = -c00.getGreen() - c01.getGreen() - c02.getGreen()
						+ -c10.getGreen() + 8 * c11.getGreen() - c12.getGreen()
						+ -c20.getGreen() - c21.getGreen() - c22.getGreen();

				// apply the transformation for b
				int b = -c00.getBlue() - c01.getBlue() - c02.getBlue()
						+ -c10.getBlue() + 8 * c11.getBlue() - c12.getBlue()
						+ -c20.getBlue() - c21.getBlue() - c22.getBlue();

				// set the new rgb values
				r = Math.min(255, Math.max(0, r));
				g = Math.min(255, Math.max(0, g));
				b = Math.min(255, Math.max(0, b));
				Color c = new Color(r, g, b);

				image2.setRGB(x, y, c.getRGB());
			}
		}
		image = image2;
}
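
To see how the Laplace matrix behaves, the sketch below applies it to the center of a tiny greyscale grid. `LaplaceSketch` and `respond` are hypothetical names, and a single intensity channel stands in for the separate red, green, and blue passes of the listing.

```java
// A minimal sketch of one Laplace kernel application; names are illustrative.
final class LaplaceSketch {
    static final int[][] KERNEL = {{-1, -1, -1}, {-1, 8, -1}, {-1, -1, -1}};

    // Applies the kernel at (row, col) and clamps the result to 0-255.
    // Assumes (row, col) does not lie on the border of the grid.
    static int respond(int[][] grey, int row, int col) {
        int sum = 0;
        for (int dr = -1; dr <= 1; dr++) {
            for (int dc = -1; dc <= 1; dc++) {
                sum += KERNEL[dr + 1][dc + 1] * grey[row + dr][col + dc];
            }
        }
        return Math.min(255, Math.max(0, sum));
    }

    public static void main(String[] args) {
        int[][] flat = {{50, 50, 50}, {50, 50, 50}, {50, 50, 50}};
        int[][] spot = {{10, 10, 10}, {10, 40, 10}, {10, 10, 10}};
        System.out.println(respond(flat, 1, 1)); // prints 0
        System.out.println(respond(spot, 1, 1)); // prints 240
    }
}
```

The second result comes from 8 * 40 minus the eight neighbors at 10 each, which shows how strongly the kernel amplifies a pixel that differs from its surroundings.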


The following before and after pictures show the results of applying the Sobel based matrix to the image and using the depth first search like approach to hysteresis.

cross-country before


edge detected

Python Based Edge Detection

In Python, OpenCV performs this operation in a single line.

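import cv2

# "input.jpg" and the thresholds 100 and 200 are placeholders, not values from the original
edges = cv2.Canny(cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE), 100, 200)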

Java Edge Detection and Imaging SDK: Get Involved, Create a Better Java Tool

Searching for a way to parse certain files in Java with the re-usability of the Spring framework, I discovered that there are not many imaging SDK tools available with the power of Python's PIL or Pillow. I decided to go back to the drawing board. The issue was made more important by the need to run the tools on a server without much of a graphics processor. My servers tend to have native graphics processing capability, 4 GB of RAM, and a dual-core 2.0 GHz processor.

In response, I began writing an imaging toolkit. Documentation and the classes are available on GitHub to be forked and contributed to.

Current features (all homebrew implementations of common theory) include but are not limited to:

  • Gaussian Blur
  • Box Blur
  • Sharpen
  • Denoise
  • Color Inversion
  • Decluttering
  • Canny Edge Detection

Memory use is improved with the use of the proxy pattern and speed improvements include the use of greedy algorithms.