Data collection, cleaning, and presentation are a pain, especially when dealing with a multitude of sources. When APIs aren’t available and every step is taken to keep people from getting data, it can be incredibly tedious just to get the data. Parsing in this instance, of course, can be made easier by relating terms in a dictionary and using the document’s structure to your advantage. At worst it is just a few lines of regex or several XPath expressions and more cleaning with Pentaho. I’ve gone a bit further by enforcing key constraints and naming conventions with the help of Java Spring.
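As a rough illustration of relating terms through a dictionary, here is a minimal Python sketch. The `TERM_MAP` entries, the `Label: value` layout, and the `parse_record` helper are all assumptions made up for the example, not taken from any particular source.

```python
import re

# Hypothetical mapping from labels seen on different sites to one canonical name.
TERM_MAP = {
    "temp": "temperature_c",
    "temperature": "temperature_c",
    "air temp": "temperature_c",
    "rh": "humidity_pct",
    "humidity": "humidity_pct",
}

# Matches "Label: 12.5" style pairs; this layout is an assumption for the sketch.
PAIR_RE = re.compile(r"(?P<label>[A-Za-z ]+?)\s*:\s*(?P<value>-?\d+(?:\.\d+)?)")


def parse_record(text):
    """Return {canonical_name: float} for every recognized label in text."""
    record = {}
    for match in PAIR_RE.finditer(text):
        label = match.group("label").strip().lower()
        canonical = TERM_MAP.get(label)
        if canonical is not None:  # labels missing from the dictionary are skipped
            record[canonical] = float(match.group("value"))
    return record
```

Calling `parse_record("Air Temp: -3.5; RH: 81")` gives both values under their canonical names, whichever spelling the source used.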
It seems that IBM is making this process a little less time-consuming with Watson. Watson appears to be able to find patterns and relations in a CSV or other structured file with minimal effort.
This could really benefit database design by letting a computer find and surface the patterns that drive key creation and normalization. After all, I would love to be able to focus less on key creation in a maximum-volume industry and more on pumping scripts into my automated controllers. The less work and the more productivity per person, the more profit.
So I want to gather climate data and make some predictions of my own using a variety of factors and an Arduino Nano programmed in assembler. That requires storing the data I collect and ensuring that it can be streamed and is accountable. Which normalization level do I use?
- 1NF: Only reduces horizontal redundancy, so no.
- 2NF: Only reduces vertical redundancy, so no.
- 3NF: Closer. Everything relates to the key. BCNF is even closer, since the key explains everything with all candidate keys separated.
- 4NF: Splits out multiple redundancies and further reduces data. So weather data can be separated by sensor, or snow-water equivalent by area and layer.
- 5NF: Accounts for more business-like rules. Is this overdoing it? It is semantic. Do I know enough to use it?
- 6NF: Takes over the whole set of related values with a join. It is good for temporal data.
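The step of splitting out multiple independent redundancies, which is what fourth normal form addresses, can be sketched with plain Python sets. The station, sensor, and area names below are invented for illustration. The point is only that two independent multivalued facts stored in one table force every combination, while splitting them loses nothing.

```python
# Hypothetical rows: each station has a set of sensors and, independently,
# a set of survey areas. One combined table must hold every combination.
station_sensor_area = {
    ("S1", "thermometer", "north_slope"),
    ("S1", "thermometer", "valley"),
    ("S1", "snow_pillow", "north_slope"),
    ("S1", "snow_pillow", "valley"),
    ("S2", "thermometer", "ridge"),
}

# 4NF-style split: one relation per independent multivalued fact.
station_sensor = {(st, se) for st, se, _ in station_sensor_area}
station_area = {(st, ar) for st, _, ar in station_sensor_area}


def natural_join(left, right):
    """Join two (station, x) relations on the shared station column."""
    return {(st, x, y) for st, x in left for st2, y in right if st == st2}


# Rejoining the two smaller relations reproduces the original rows exactly.
rejoined = natural_join(station_sensor, station_area)
lossless = rejoined == station_sensor_area
```

Here five combined rows become two relations of three rows each, and the saving grows quickly as a station gains more sensors or areas.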
My data is meant to persist once it is inserted. It must be separated for easy mathematical calculations. Finally, it deals with nature, so relationships should probably not be rule-defined. In particular, it deals with a side of nature that no one really knows much about. I want to preserve all possible relationships. Therefore, 5NF is a bit much.
I do need to relate things to keys so I can grab data by specific area, day, weather, type of phenomenon, or whatever else I need. I also need to split the data into easy-to-grab attributes, each with an appropriate impact. The goal is prediction and calculation.
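A minimal sketch of what keyed lookups like this might look like, using Python's built-in `sqlite3` module with an in-memory database. The table names, columns, and sample readings are assumptions for the example, not the project's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
CREATE TABLE area (area_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE sensor (sensor_id INTEGER PRIMARY KEY, kind TEXT NOT NULL UNIQUE);
CREATE TABLE reading (
    area_id INTEGER NOT NULL REFERENCES area(area_id),
    sensor_id INTEGER NOT NULL REFERENCES sensor(sensor_id),
    day TEXT NOT NULL,
    value REAL NOT NULL,
    PRIMARY KEY (area_id, sensor_id, day)  -- one value per area, sensor, and day
);
""")

# Hypothetical sample data.
conn.execute("INSERT INTO area VALUES (1, 'north_slope')")
conn.execute("INSERT INTO sensor VALUES (1, 'temperature_c')")
conn.executemany(
    "INSERT INTO reading VALUES (?, ?, ?, ?)",
    [(1, 1, "2015-01-01", -4.0), (1, 1, "2015-01-02", -2.0)],
)


def mean_value(conn, area, kind):
    """Average reading for one area and sensor kind, or None if there are no rows."""
    row = conn.execute(
        """
        SELECT AVG(r.value) FROM reading r
        JOIN area a ON a.area_id = r.area_id
        JOIN sensor s ON s.sensor_id = r.sensor_id
        WHERE a.name = ? AND s.kind = ?
        """,
        (area, kind),
    ).fetchone()
    return row[0]
```

Because each reading is pinned to an area, a sensor, and a day, a call such as `mean_value(conn, "north_slope", "temperature_c")` can filter on any of those keys without touching unrelated attributes.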
I am going to use 4NF. Check back for more on this project.