Morning Joe: Big Data Tools for Review

Big data is a big topic, and many tools are starting to spring up. However, implementations of the old SQL standard are falling far behind these tools. My recent post on using a Fork Join Pool for inserting and pulling data can help if you are using PostgreSQL, but it also destroys bandwidth. It causes headaches for everyone whenever I need to do some serious checking of my ETL, parsing, normalization, data work, or other processes outside of our co-location. Although this is nothing new, I’ve compiled a list of tools to go forth and make little rocks from big rocks all day.

I’ve found some companies and a tool for a big review later on; this is my prework post while my IDE starts up. So far, I have found technologies for SQL databases that use Hadoop, fractal trees, and Cassandra to speed up the process. They are not focused specifically on speed but can help create faster database access and lower coding time.

What I’ve found so far:

  • Cassandra (open source): promises scalability and availability alongside a plethora of features (maybe for implementing other tools)
  • Oracle Data Integration Adapter for Hadoop: promises the speed of Hadoop connected to an Oracle database
  • BigSQL (open source): promises to combine Cassandra, PostgreSQL, and Hadoop into a blazing fast package for analysis
  • MapR Technologies (somewhat open source): offers a wide variety of products to improve speed in querying and analysis, from Hive and map reduce to Hadoop itself
  • Fractal Tree Indexing (open source): Tokutek’s fractal tree indexing speeds up insertions using buffers on each tree node
  • Alteryx: a tool for quicker data processing, though not quite as fast as the others (good if your budget does not allow clusters but allows something better than Pentaho)
  • MongoDB (open source): combines map reduce and other technologies with large databases. Tokutek tested its fractal tree indexes on MongoDB
  • Pentaho (open source): an open source alternative to Alteryx
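
Since "buffers on each tree node" is the least self-explanatory idea in the list, here is a toy sketch of it. This is not Tokutek's implementation; the class name, the fixed pivots, and the buffer size are all my own assumptions, and a real fractal tree has many levels with a buffer at every internal node. The point is only that an insert lands in a cheap buffer and the costly trip down to the leaves is paid once per batch.

```python
import bisect


class BufferedNode:
    """Toy two-level index: a root that buffers inserts above its leaves.

    Hypothetical illustration only, not Tokutek's actual data structure.
    """

    def __init__(self, pivots, buffer_size=4):
        self.pivots = pivots                  # sorted keys that split the leaves
        self.leaves = [dict() for _ in range(len(pivots) + 1)]
        self.buffer = []                      # pending (key, value) messages
        self.buffer_size = buffer_size
        self.flushes = 0                      # number of batched pushes to leaves

    def _leaf_for(self, key):
        # Pick the leaf whose key range contains this key.
        return self.leaves[bisect.bisect_right(self.pivots, key)]

    def insert(self, key, value):
        # Cheap path: append to the buffer and touch no leaf.
        self.buffer.append((key, value))
        if len(self.buffer) >= self.buffer_size:
            self.flush()

    def flush(self):
        # Expensive path, paid once per batch: route each message to its leaf.
        for key, value in self.buffer:
            self._leaf_for(key)[key] = value
        self.buffer.clear()
        self.flushes += 1

    def get(self, key):
        # The newest buffered message wins; otherwise fall back to the leaf.
        for k, v in reversed(self.buffer):
            if k == key:
                return v
        return self._leaf_for(key).get(key)


index = BufferedNode(pivots=[10, 20], buffer_size=4)
for k in [5, 25, 15, 7, 30]:
    index.insert(k, f"row-{k}")
```

Five inserts here cost a single flush, and lookups still see rows that are sitting in the buffer.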

Many of these tools are already implemented in others such as Pentaho. Personally, I would like to see a SQL-like language that uses these tools alongside a query processor. It would make the tasks even faster to write; think Java vs. Python. You could have 10 lines of map reduce code, 5 minutes of click and drag, or a one-line, easy-going query that writes as you think.
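
To make the "10 lines versus one line" point concrete, here is a rough sketch in Python. The sample rows and the `sales` table are made up, and the standard library's `sqlite3` is only standing in for the kind of SQL-like layer I have in mind; it is not one of the tools listed above. Both halves compute the same totals per country.

```python
import sqlite3
from collections import defaultdict

# Made-up sample data: (country, amount) pairs.
rows = [("us", 3), ("de", 5), ("us", 4), ("fr", 1), ("de", 2)]


# Hand-written map reduce: emit pairs, group them by key, reduce each group.
def map_phase(records):
    for country, amount in records:
        yield country, amount


def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}


mapreduce_totals = reduce_phase(shuffle(map_phase(rows)))

# The same aggregation as a single declarative query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (country TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
sql_totals = dict(
    conn.execute("SELECT country, SUM(amount) FROM sales GROUP BY country")
)
conn.close()
```

The query says what I want and leaves the how to the engine, which is exactly the trade I would like these big data tools to offer.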

To be clear, I am not ranking these, only marking them for future review since this is what has piqued my interest today. Cheers!
