Secure Your Data Series: Protecting from SQL Injection

I have been collecting data and building databases for a while. Along the way, I have come across a wide variety of mistakes in thought and code, ranging from the notion that a CAPTCHA works to a failure to adequately protect against SQL injection. I am therefore going to publish fixes for what I find here, with examples using Groovy or Spring and Java.

First up, the SQL injection attack. Attempts to stop these attacks range from using JavaScript to setting variables. I recently came across a site that tried to stop these attacks by limiting wildcards and querying with a variable. However, changing the header by resetting the variable gave me access to everything on the site in one fell swoop.


A few suggestions follow:

  • Do not rely on JavaScript for any check; it is too easy to get around by simply rewriting the page.
  • Write rules in the back end to protect against unauthorized access through SQL. Check variable combinations.
  • Eliminate any suspicious or disallowed wildcards and look for key terms such as LIKE, ILIKE, %, and = that are common in WHERE clauses. SELECT statements in SQL/PostgreSQL follow a form similar to SELECT <data list> FROM <table> WHERE <variable> ILIKE '%answer%'
  • Track IP addresses and web pages using a symbol table (a HashMap or the like) to stop attempts to simply post to a server when it is not an API. This should be done in addition to any viewstate tracking, and I strongly encourage the use of event validation.
  • Use event validation if it is available, or come up with your own.
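The back-end check on wildcards and key terms could be sketched roughly as follows. This is only an illustration under my own assumptions: the class name SearchTermCheck, the method isAllowed, and the exact token lists are hypothetical, and a real rule set would depend on what the site legitimately needs to accept.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class SearchTermCheck {

    // Hypothetical symbols this sketch refuses to see in a plain search term.
    private static final List<String> SUSPICIOUS_TOKENS =
            Arrays.asList("%", "=", ";", "--", "'");

    // Hypothetical key terms, compared as whole words and ignoring case.
    private static final List<String> SUSPICIOUS_WORDS =
            Arrays.asList("like", "ilike");

    /** Returns true when the term is non-empty and contains no suspicious token or key term. */
    static boolean isAllowed(String term) {
        if (term == null || term.trim().isEmpty()) {
            return false;
        }
        for (String token : SUSPICIOUS_TOKENS) {
            if (term.contains(token)) {
                return false;
            }
        }
        // Split on anything that is not a letter so "likely" is not mistaken for LIKE.
        for (String word : term.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (SUSPICIOUS_WORDS.contains(word)) {
                return false;
            }
        }
        return true;
    }
}
```

A controller would call SearchTermCheck.isAllowed(stringq) before building any query. A filter like this is usually paired with bound parameters (for example java.sql.PreparedStatement), so that the term never becomes part of the SQL text itself.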


Most requests arrive as a POST, and most of them are handled in Java using Spring or in a format such as ASPX, with Groovy and Grails starting to become popular.

import javax.servlet.http.HttpServletRequest;

import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RequestParam;

@Controller
public class PostController {

    // RequestMethod.GET also exists; the "/answer" path is a placeholder, not from a real site
    @RequestMapping(value = "/answer", method = RequestMethod.POST)
    public ResponseEntity<String> newAnswer(
            @RequestParam(value = "stringq", required = true) String stringq,
            // no name given, so Spring falls back to the argument name "wildcard"
            @RequestParam(required = true) boolean wildcard,
            HttpServletRequest request) {
        // prep response
        HttpHeaders headers = new HttpHeaders();
        headers.set("Content-Type", "text/html; charset=utf-8");

        // get IP
        String ip = request.getRemoteAddr();

        String data = null;

        // check for appropriate response page, assumes post-like nature with single url
        // makes a call to a HashMap entity set up at the beginning of the program
        // and accessible throughout the server's existence
        //     create variable data here with response JSON or string
        //     create inappropriate use message
        return new ResponseEntity<String>(data, headers, HttpStatus.CREATED);
    }
}

The HashMap in this case is a static HashMap. However, for security (even when using package-level access modifiers), a file may be the better solution, though it will add time to the response.
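As a rough illustration of the lookup the controller comments describe, here is a minimal sketch of tracking which page each IP address was last served. The class PageTracker and its method names are hypothetical, and I use an instance-level ConcurrentHashMap here rather than a static HashMap so that concurrent requests do not corrupt the map; treat that as my assumption, not a requirement.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class PageTracker {

    // IP address -> last page this server handed to that address.
    private final Map<String, String> lastPageByIp = new ConcurrentHashMap<>();

    /** Records that a page was served to the given address; null arguments are ignored. */
    void recordPageServed(String ip, String page) {
        if (ip != null && page != null) {
            lastPageByIp.put(ip, page);
        }
    }

    /** A POST is expected only if this address was last served the page the form lives on. */
    boolean isExpectedPost(String ip, String formPage) {
        if (ip == null || formPage == null) {
            return false;
        }
        return formPage.equals(lastPageByIp.get(ip));
    }
}
```

The page handler would call recordPageServed when it renders the form, and the POST handler would return the inappropriate use message whenever isExpectedPost is false. An IP address alone is a weak identifier (shared networks, proxies), which is one reason to keep it alongside viewstate tracking and event validation rather than in place of them.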

A Quickie: So Fedora Wireless Is Not Working with BCM 4313

It's 1 am and, instead of working over my VPN, I have been trying to get Fedora 20 to just use my wireless. Stuck and banging my head against a wall, I found some instructions that actually worked, a bit.

Run lspci first to find which network controller you have (lspci | grep -i network).

If you have a Broadcom 4313 like I do, then you may need to update your drivers using yum. The instructions are here. It bought me enough speed to resolve any further issues. I have no idea why this is not fixed despite all the bug reports and forum questions. Considering this post still loads extremely slowly on Fedora but really quickly on Windows 8 or Ubuntu on my desktop, there is still a problem. That is with the cache deleted.


Cheers! I think.

A New Project: Is a Distance Based Regular Expression Method Feasible?

So, I would like to find specific data in scraped web pages, PDFs, and just about anything under the sun without taking a lot of time. After looking over the various fuzzy matching algorithms such as Jaro-Winkler, Metaphone, and Levenshtein and finding that none of them had an especially wide application, I decided that developing a regular expression based distance algorithm may be more feasible.

The idea is simple: start with a regular expression, build a probability distribution across a known good data set or multiple data sets, and test for the appropriate expression across every web page. The best score across multiple columns would be the clear winner in this case.
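To make the scoring idea concrete, here is a minimal sketch that uses the plain match rate as the score, a much simpler stand-in for a full probability distribution. The class PatternScore and both method names are my own placeholders, and treating a column as a list of strings is an assumption.

```java
import java.util.List;
import java.util.regex.Pattern;

final class PatternScore {

    /** Fraction of values (0.0 to 1.0) that the pattern matches in full. */
    static double matchRate(Pattern pattern, List<String> values) {
        if (values.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        for (String value : values) {
            if (pattern.matcher(value).matches()) {
                hits++;
            }
        }
        return (double) hits / values.size();
    }

    /** Index of the column with the highest match rate, or -1 when there are no columns. */
    static int bestColumn(Pattern pattern, List<List<String>> columns) {
        int best = -1;
        double bestRate = -1.0;
        for (int i = 0; i < columns.size(); i++) {
            double rate = matchRate(pattern, columns.get(i));
            if (rate > bestRate) {
                bestRate = rate;
                best = i;
            }
        }
        return best;
    }
}
```

With the pattern [A-Z]+\s+[0-9]+, a column holding "ZZ 1" and "QQ 22" scores 1.0 and would win over a column of free text. Ties go to the earliest column in this sketch, which is an arbitrary choice.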

Building out the expression would include taking known good data and either finding a combination of the base pattern and the data that works or building an entirely new pattern. Patterns that appear across a large proportion of the set should be combined. If [A-Z]+[\s]+[0-9]+[A-Z] and [A-Z]+[\s]+[0-9]+ appear often in the same or an equivalent place, or even [a-z]+[\s]+[0-9]+, then the pattern should likely be [A-Z\s0-9a-z]+, provided the set is similarly structured. Since the goal is to save time programming regular expressions to further parse XPath or other regular expression results, this is useful.
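One possible way to find which patterns recur in known good data is to reduce each value to a coarse "shape" and count the shapes. The sketch below is my own assumption about how that might start, not a settled design: ShapeBuilder, shapeOf, and shapeCounts are hypothetical names, and only ASCII letters, digits, and common whitespace get a character class. It stops short of merging shapes into one broader class.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

final class ShapeBuilder {

    /** Maps one character to a character class; anything else becomes a quoted literal. */
    private static String classOf(char c) {
        if (c >= 'A' && c <= 'Z') return "[A-Z]+";
        if (c >= 'a' && c <= 'z') return "[a-z]+";
        if (c >= '0' && c <= '9') return "[0-9]+";
        if (" \t\n\f\r".indexOf(c) >= 0) return "[\\s]+";
        return Pattern.quote(String.valueOf(c));
    }

    /** Builds a regular expression describing the value, collapsing runs of one class. */
    static String shapeOf(String value) {
        StringBuilder shape = new StringBuilder();
        String previous = null;
        for (char c : value.toCharArray()) {
            String current = classOf(c);
            // Only repeatable classes are collapsed; repeated literals are kept one by one.
            boolean sameRun = current.equals(previous) && current.endsWith("]+");
            if (!sameRun) {
                shape.append(current);
            }
            previous = current;
        }
        return shape.toString();
    }

    /** Counts how many values share each shape, in first-seen order. */
    static Map<String, Integer> shapeCounts(List<String> values) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String value : values) {
            counts.merge(shapeOf(value), 1, Integer::sum);
        }
        return counts;
    }
}
```

Shapes that cover a large share of the set are the candidates for combining. Deciding when [A-Z]+ and [a-z]+ in the same position should collapse into one broader class is the part that still needs a rule.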

The tricky part of the project will be designing a similarity score that adequately equates the expressions without too many outliers. Whether this is done with a simple difference test resulting in a statistical distribution or with a straightforward score needs to be tested.
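As one candidate for the "simple difference test", the sketch below compares how often each pattern occurs in the known good set with how often it occurs in a candidate set, and turns half the summed absolute difference into a score between 0 and 1. This particular measure is my assumption and has not been tested against real data; ShapeSimilarity and similarity are hypothetical names, and each map is assumed to go from a pattern string to the number of values it matched.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class ShapeSimilarity {

    /** Returns 1.0 for identical pattern distributions and 0.0 for fully disjoint ones. */
    static double similarity(Map<String, Integer> known, Map<String, Integer> candidate) {
        double knownTotal = total(known);
        double candidateTotal = total(candidate);
        if (knownTotal == 0 || candidateTotal == 0) {
            return 0.0;
        }
        Set<String> patterns = new HashSet<>(known.keySet());
        patterns.addAll(candidate.keySet());

        double difference = 0.0;
        for (String pattern : patterns) {
            double p = known.getOrDefault(pattern, 0) / knownTotal;
            double q = candidate.getOrDefault(pattern, 0) / candidateTotal;
            difference += Math.abs(p - q);
        }
        // The summed difference ranges from 0 to 2, so halve it before inverting.
        return 1.0 - difference / 2.0;
    }

    private static double total(Map<String, Integer> counts) {
        double sum = 0;
        for (int n : counts.values()) {
            sum += n;
        }
        return sum;
    }
}
```

Because the counts are normalized first, a candidate set that is smaller than the known set but has the same proportions still scores 1.0. Whether a single number like this is enough, or the full distribution of differences has to be kept, is exactly the open question.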

In all likelihood, recurring words should be used to break ties or bolster weak scores.

The new project will hopefully be available on SourceForge for data integration and pre-curation processes.