Secure Your Data Series: Protecting from SQL Injection

I have been collecting data and building databases for a while. Along the way, I have come across a wide variety of mistakes in thought and code, ranging from the notion that a captcha works to a failure to adequately protect against SQL injection. Therefore, I am going to publish fixes to what I find here, with examples using Groovy or Spring and Java.

First up: the SQL injection attack. Attempts to stop these attacks range from using JavaScript to setting variables. I recently came across a site that tried to stop them by limiting wildcards and querying with a variable. However, resetting that variable in the request header gave me access to everything on the site in one fell swoop.

Suggestions

A few suggestions follow:

  • Do not check anything with JavaScript alone; this is too easy to get around by simply rewriting the page.
  • Write rules in the back end to protect from unauthorized access using SQL. Check variable combinations.
  • Eliminate any suspicious, non-allowable wildcards and look for key terms such as LIKE, ILIKE, %, and = that are common in WHERE clauses. SELECT statements in SQL/PostgreSQL follow a form similar to SELECT <column list> FROM <table> WHERE <column> ILIKE '%answer%'. A minimal keyword filter is sketched after this list.
  • Track IP addresses and requested pages using a symbol table (a HashMap or the like) to block attempts to post directly to the server when the endpoint is not a public API. This should be done in addition to any viewstate tracking, and I strongly encourage the use of event validation.
  • Use event validation if available or come up with your own.
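
As a rough illustration of the keyword check above, here is a minimal sketch. The keyword list and the isSuspicious helper are my own placeholders, not a complete or authoritative filter; parameterized queries should still back any real endpoint.

import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public final class SqlInputFilter {

    //hypothetical deny-list of tokens that rarely belong in plain user input
    private static final List<Pattern> SUSPICIOUS = Arrays.asList(
            Pattern.compile("(?i)\\bILIKE\\b"),
            Pattern.compile("(?i)\\bLIKE\\b"),
            Pattern.compile("(?i)\\bUNION\\b"),
            Pattern.compile("(?i)\\bDROP\\b"),
            Pattern.compile("%"),
            Pattern.compile("="));

    //returns true if the raw input contains any token from the deny-list
    public static boolean isSuspicious(String input) {
        if (input == null) {
            return false;
        }
        for (Pattern p : SUSPICIOUS) {
            if (p.matcher(input).find()) {
                return true;
            }
        }
        return false;
    }
}

Whether a term like = should be rejected depends entirely on what the field legitimately accepts, so treat the list above as a starting point rather than a rule set.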

Implementation

Most requests arrive as POST requests, and most are handled in Java using Spring or in a format such as aspx, with Groovy's Grails framework starting to become popular.

  
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.context.request.RequestContextHolder;
import org.springframework.web.context.request.ServletRequestAttributes;

@Controller
@RequestMapping("/path")
public class PostController{
      //RequestMethod.GET also exists
      @RequestMapping(value="/path",method=RequestMethod.POST)
      public ResponseEntity<String> newAnswer(@RequestParam(value="stringq",required=true) String stringq,
                @RequestParam(value="wildcard",required=true) boolean wildcard){
           //prep response
           HttpHeaders headers=new HttpHeaders();
           headers.set("Content-Type","text/html; charset=utf-8");

           //get IP from the current request
           String ip=((ServletRequestAttributes) RequestContextHolder.currentRequestAttributes())
                   .getRequest().getRemoteAddr();

           String data=null;

           //check for appropriate referring page, assumes post-like nature with single url
           //IPmap is a HashMap set up at the beginning of the program
           //and accessible throughout the server's existence
           if("appropriatePageURL".equals(IPmap.get(ip))){
               //create variable data here with response JSON or string
           }else{
               //create inappropriate use message
           }
           return new ResponseEntity<String>(data, headers, HttpStatus.CREATED);
      }
}

The hash map in this case is a static hash map. However, for security (even when using package-level access modifiers), a file-backed store may be a better solution, though it will add time to the response.
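
For reference, a minimal sketch of such a map follows. The holder class name and the recordVisit helper are my own, and ConcurrentHashMap is my assumption for thread safety since many request threads will read and write the map; the controller above would reference it through this class or a static import.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public final class IPTracker {

    //application-wide map of client IP -> last page the client legitimately loaded
    public static final Map<String, String> IPmap = new ConcurrentHashMap<String, String>();

    //record the page an IP just visited (called from the page-serving controller)
    public static void recordVisit(String ip, String pageUrl) {
        IPmap.put(ip, pageUrl);
    }
}

Entries should also be expired or cleared periodically so the map does not grow without bound over the server's lifetime.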

Morning Joe: Can Computer Vision Technology Help De-Militarize the Police and Provide Assistance?

There has been an explosion of computer vision technology in the past few years, or even the last decade or so considering OpenCV has been around that long. The recent events in Ferguson have created a need for keeping the police in line, as well as a need to present credible evidence regarding certain situations.

Many police departments are starting to test programs that place snake cams, like those used in the military, on officers. While this could be viewed as more militarization, it can also give a department a black eye if power is abused.

What if lawyers, police, and ethics commissions had a way of recognizing potentially dangerous situations before they happen? What if there were a lightweight solution that allowed data programs to monitor situations in real or near-real time, spot troublesome incidents, and provide alerts when situations were likely to get out of hand? What if potentially unethical situations could be flagged?

The answer is that this is already possible without too much development.

Statistical patterns can be used to predict behaviour long before anything happens. Companies like Microsoft and Facebook claim they can predict what you will be doing a year from now. The sad state of the current near-police state is that the government has as much or more data on officers and citizens than Microsoft and Facebook do.

These patterns can be used to narrow the video from those snake cams to potentially harmful situations for real time monitoring.

From there, a plethora of strong open source tools can be used to spot everything from weapons and potential use of force, using the training capabilities of OpenCV and some basic kinematics (video is just a sequence of rapidly taken photos played in order), to speech, using Sphinx4 (still a work in progress for captchas, but probably not for clear speech), and even optical character recognition with pytesser. A bit of image pre-processing and OCR in Tesseract can already break nearly every captcha on the market in under one second with a single core and less than 2 GB of RAM. The same goes for using corner detection and OCR on a PDF table. Why can't it be used in this situation?

The result in this case should be a more ethical police force and better safety, calming the fears of officers and civilians alike.

Call me crazy, but we can go deeper than snake cams alone, both to police officers and to provide them assistance. Quantum computing and/or better processors and graphics cards will only make this more of a reality.

The Benefits of Combining Fork Join Pools with PostgreSQL

Warning: As I am incredibly busy at the moment, benchmarks are not necessarily provided. This is a performance review based on work experience and the accompanying documents for my superiors.

PostgreSQL is a terrific free tool. In my opinion, it can do almost everything an Oracle product can, and it has quick functions for performing critical tasks such as dumping an entire database to CSV from a programming language. However, insertion can be slow. I am constantly updating three or more databases with 15 or more attributes apiece. Unfortunately, my company's estimated revenue is roughly the cost of an Oracle or Microsoft license. While PostgreSQL's developers promise multi-threading in the future, Java, together with Apache's connection pooling, has already solved a significant portion of this task for us using Fork Join Pools. The result can be an improvement of thousands of records per minute.

Fork Join Pools and Why to use SE 8

Java SE 7 introduced the Fork Join Pool to the java.util.concurrent package. Oracle recommends using it only for extremely intensive tasks. The tool works by sharing tasks among threads through work-stealing. Java SE 8 improves on the algorithm and reduces the number of tasks that are dropped.

Setting Up the Pool In Spring 4

I reviewed several different connection pools before settling on the vastly improved, newest version of Apache DBCP. BoneCP offers better performance, but at the cost of critical flaws: chiefly, the current version fails to close connections properly, leading its developer to recommend reverting to version 0.8. The new Apache DBCP outperformed C3P0 in my benchmark tests. My assessment is provided below. It is based on connections from a machine over the internet to a co-location facility and from a machine to a local machine.

Connection Pool    Pros        Cons
Apache DBCP        Reliable    Somewhat slower than BoneCP
BoneCP             Fast        Somewhat unreliable
C3P0               Reliable    Slow and a worse option than DBCP

Setting up DBCP is incredibly simple and configurable. Spring can be configured with Java annotations or XML. For my purposes, XML was a decent option since my teammates use XML in their daily work, so the XML configuration is provided below. A decent way to set this up is to use the data source bean as a reference in the declaration of the DAO template's data source.

<bean id="dataSource" destroy-method="close"
  class="org.apache.commons.dbcp.BasicDataSource">
    <property name="driverClassName" value="${jdbcdriver}"/>
    <property name="url" value="${jdbcurl}"/>
    <property name="username" value="${dbcusername}"/>
    <property name="password" value="${dbcpass}"/>
    <property name="initialSize" value="3"/>
    <property name="validationQuery" value="Select 1" />
</bean>

Other options are documented at the DBCP site. A validation query is included here because I ran into a connection-validation issue in my own work.

A reference can be provided to the DAO template using:

     <bean id="jdbcTemplate" ref="dataSource"/>

Delivering the Code with a DAO Template: Avoiding Race Conditions

An important consideration is how multiple connections will be handled by the DAO template. Fortunately, Java offers asynchronous execution, and Spring allows it to be declared with the @Async annotation rather than writing the threading or synchronization code by hand.

   @Async
   protected void postData(Map data){
       //executed on a separate thread by Spring's task executor
       //(requires async support to be enabled, e.g. <task:annotation-driven/> in the XML configuration)
   }

The method above is now executed asynchronously.
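
Tying this to the connection pool above, a minimal sketch of such a DAO might look like the following. The table name, column names, and PostDataDao class are my own placeholders rather than part of the earlier configuration, and the sketch assumes component scanning and async support are enabled; the JdbcTemplate is wired to the pooled data source defined above.

import java.util.Map;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Repository;

@Repository
public class PostDataDao {

    private final JdbcTemplate jdbcTemplate;

    //the JdbcTemplate is injected with the pooled DataSource configured earlier
    @Autowired
    public PostDataDao(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    //asynchronously insert one record; the ? placeholders keep the SQL injection-safe
    @Async
    public void postData(Map<String, Object> data) {
        jdbcTemplate.update(
                "INSERT INTO records (name, value) VALUES (?, ?)", //hypothetical table and columns
                data.get("name"),
                data.get("value"));
    }
}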

Spring accepts the upper type in the collections hierarchy that the more common collections implement or extend. Basically, due to inheritance, a HashMap "is-a" Map, but a Map reference cannot automatically be treated as a HashMap. Call the getClass() method to see this in action: if getClass() returns java.util.HashMap, the Map reference can be downcast to HashMap to gain any benefits beyond the Map interface. Object references are passed by value (effectively, a memory location), so the underlying object keeps its concrete type, as the short example below shows.
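
A short illustration of that downcast (the inspect method and its contents are hypothetical):

import java.util.HashMap;
import java.util.Map;

public class MapCastExample {

    //a caller such as Spring can hand this method the interface type even when the object is a HashMap
    static void inspect(Map<String, String> data) {
        System.out.println(data.getClass()); //prints class java.util.HashMap
        if (data instanceof HashMap) {
            HashMap<String, String> hashMap = (HashMap<String, String>) data;
            //hashMap now exposes HashMap-specific behaviour such as clone()
            System.out.println(hashMap.clone());
        }
    }

    public static void main(String[] args) {
        Map<String, String> data = new HashMap<String, String>();
        data.put("key", "value");
        inspect(data);
    }
}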

The Fork Join Pool

In this instance, the Fork Join Pool should accept RecursiveAction tasks. They are not only useful for recursive work; due to the work-stealing, they improve I/O-bound and other intensive tasks as well. Oddly, I ran into an issue where value-returning RecursiveTask instances (my parsers) would not completely close, so I switched to using Callables instead. Fork Join Pools explicitly accept Runnable, Callable, and ForkJoinTask instances, so Runnable, Callable, ForkJoinTask, RecursiveTask, and RecursiveAction objects are all acceptable. A Callable submission is sketched below.
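
Since I ended up submitting Callables, here is a minimal sketch of what that looks like; the parsing logic is just a placeholder.

import java.util.concurrent.Callable;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Future;

public class CallableSubmitExample {

    public static void main(String[] args) throws Exception {
        ForkJoinPool fjp = new ForkJoinPool(Runtime.getRuntime().availableProcessors());

        //submit() returns a Future, so the parsed result can be collected later
        Future<Integer> parsed = fjp.submit(new Callable<Integer>() {
            @Override
            public Integer call() {
                //placeholder for the real parsing work
                return 42;
            }
        });

        System.out.println("Parsed " + parsed.get() + " records");
        fjp.shutdown();
    }
}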

Instantiation for this task requires:

ForkJoinPool fjp=new ForkJoinPool(Runtime.getRuntime().availableProcessors()*procnum);

The argument is the desired level of parallelism (roughly, the number of worker threads). A value that is not a positive integer within the supported range throws an IllegalArgumentException.

Keep in mind that Amdahl's law applies (Wikipedia has a good article on it): too many processes will cause the overall job to slow down, as will inappropriate use.

To submit a process use:

fjp.execute(new SQLRecursiveAction(/**parameters**/));

The SQL class (in this case an inner class) would be:

private static class SQLRecursiveAction extends RecursiveAction{
      private final String aString;

      public SQLRecursiveAction(String aString){
            this.aString=aString;
      }

      @Override
      protected void compute(){
            /**call to DAO Template using aString**/
      }
}

The compute method is required because RecursiveAction declares it as a protected abstract method, and abstract methods must be implemented by (and therefore visible to) the concrete subclass.

Once tasks are submitted, do not shut down the Fork Join Pool if it will be reused. Instead, there are other ways to wait for completion that allow reuse.

int w=0;
while(!fjp.isQuiescent() && fjp.getActiveThreadCount()>0){
    w++;
}
//log.info("Waited for "+w+" cycles");
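
If busy-waiting is a concern, Java 8's ForkJoinPool also exposes awaitQuiescence, which blocks (while possibly helping to run queued tasks) until the pool is quiet or the timeout elapses. The timeout value here is arbitrary.

//Java 8: wait until the pool is quiescent or the timeout passes
boolean quiet = fjp.awaitQuiescence(10, java.util.concurrent.TimeUnit.MINUTES);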

When the pool is no longer needed, shut it down.

fjp.shutdown(); //orderly shutdown
//fjp.shutdownNow(); for immediate shutdown

Findings

In the end, the Fork Join Pool significantly improved the performance of my application, which can parse any type of document into attributes stored in JSON strings and posted to the database. The number of records I was able to parse and post increased dramatically, from several hundred per minute to over 10,000 records with the same number of attributes. It may also be wise to consider other improvements, such as Tokutek's fractal tree indexing for large, normalized databases.