The Case for Using an IRM to Scale Data Intake

Among many, there are three major problems faced by an analyst before data is useful:

  • data aggregation and storage
  • data security and access
  • data wrangling (ETL/ELT)

This article deals with data security and access using an information resource management system, IRM. My own company, Simplr Insites LLC, is writing such a system alongside a file storage solution in an effort to modernize the research process.

Problem

One significant problem faced in research and cooperation is the attainment of clean and useful data. Obtaining this data often means gaining access to systems, forming legal agreements, obfuscating certain data, and embarking on the painful process of data wrangling.

While ETL and ELT are critical steps, just obtaining sensitive data, even from within an organization, is tricky. Consider the following cases related directly to access:

  • data sets include confidential information
  • data sets are ensnared in legal agreements regarding who can access data
  • users want to control access to data to ensure it is not misused
  • external users are allowed varying degrees of access

IRM as a Solution

Oracle generated a solution that attempts to tackle the data security issue. The Oracle IRM documentation provides a rather informative graphical overview of their tool:

irm

In this system, an external user accesses a load balanced IRM server application which controls rights and access to different resources and files. Several firewalls help to improve security along with authentication, access grants, and encryption. Web services  and internal users utilize the IRM server as well.

Beyond the visible components, tokens can be used to instantly manage resources and propagate access changes.

Most file systems also offer the capability to pull the date when a resource was created or modified and various permissions information. This is useful for logging purposes.

Setting Up an IRM

It is not necessary to rely on Oracle for an IRM solution. In fact, the Oracle IRM only works with Microsoft Windows.

Each component can be paired with a reliable tool, most of which I have blogged about. A set of pairings might include

Base Application and Resource Management Django with Secure Login
REST API Resource Access Django OAuth Toolkit
Access Management Django Oauth Toolkit and a Database System
Individual Resource Tokens Randomly Generated and Hashed Key
File Storage GlusterFS or an Encrpytable File System
Encryption of Resources PyCrypto or a Similar Tool
Firewalls IP Tables or another firewall
Two Step Verification through SMS Twilio
Key Storage Stack Exchange Blackbox
VPN Access Firefox
Logging and Anomaly Detection Elastic APM and the ElkStack

Logging

Logging is critical to security. Logs allow administrators to spot harmful activity, generate statistical models based on usage, and aid in auditing the system.

Tokens

Tokens are a perfect solution for controlling document access in the system. They allow a user to gain access to a document, offer scopes for access, and often contain scopes that grant levels of access to a resource.

A user should be required to log in to the application to retrieve a token which refreshes on a regular schedule. These tokens can be revoked and changed by a resource owner or administrator much like using a file system.

Fernet Encryption

While RSA encryption is useful for two way encryption, Fernet encryption is stronger and more useful for storing files. If a system does not offer encryption, tools such as PyCrypto offer Fernet encryption.

Storing Keys

Keys should not be stored in the open. If compromised, it is extremely easy to gain access to a key stored in plain text. Instead, tools such as Stack Exchange’s Blackbox store keys in a system backed by a GPG key ring.

Two Step Downloading for Extra Security

Downloading a file in a secure manner might require extra protection, particularly when an external but trusted user desires access to a resource. To avoid spoofing and avoid a compromised computer from gaining access to a resource, two step verification is a recommended step.

In this process the external user provides an access token to obtain a document which is verified. On verification, a text message containing an access code is sent to the external user and the internal user is notified of the access. The external user enters the code and, if required, the resource owner or admin approves the download.

This type of process is not difficult to implement through desktop or web applications using push notifications or persistent storage.

Conclusion

Secured yet accessible storage is a critical problem for any data analyst or scientist. Using an established IRM or implementing a similar tool helps secure access and empower analytics.

Better Key Storage With Blackbox, RSA, Redis, and the Fernet Algorithm in Django

safe

Python lacks a proper key store.  This is an unnerving issue when trying to build a secure application. More troubling is the plain text storage of RSA keys. This article examines a process for storing keys in an encrypted manner on black box as well as the storage of keys using the Fernet algorithm and encryption through RSA in Django using redis for speed.

Problem

Unlike Java which already has a key store, Python lacks the ability to store keys for data encryption. Python developers are left with only basic methods for storing keys and this often means doing so in plain text.

That method is inexplicably terrible when working with FERPA/HIPPA and especially the increasingly difficult state guidelines for storing sensitive information.

Solution

One solution, of many, is to use Stack Exchange black box to store keys and the Fernet algorithm to encrypt the keys in a cache. In this way keys are stored in an encrypted format in a hidden file as well as in a secure format in memory.

Black Box

Stack Exchange’s Black Box offers a perfect storage solution for keys using a gpg keyring to encrypt data. The tool was made to store secrets in a git repository.

Check out my Python API for reading files from black box. It is possible to add a user to the administrator file in order to avoid entering a password each time.

Storing Encrypted Keys in Django

Once the keys are encrypted and accessible, a large application needs to ensure speed. To help alleviate sluggishness, it is possible to store keys using the Fernet algorithm in any cache that Django provides.

It is possible to use the cryptography package for this task.

from cryptography.fernet import Fernet
from django.core.cache import cache

key = Fernet.generate_key()
f = Fernet(key)
token = f.encrypt(b"my deep dark secret")
cache.set('my_token', token)

Conclusion

It is possible to recreate a secure keystore using a mix of Stack Exchange Black Box and the Fernet algorithm when creating a Django application. The implementation above may not be production ready but is a proof of concept.

Security in a Microservices Environment

immunity

Too often, security is overlooked in a system. It is not enough to separate hardware or merely have an SSL certificate. There are some tricks to helping build secure microservices.

This article examines some of the security mechanisms available for helping prevent unauthorized access to a system. Notice the title related to security as nothing is ever 100 percent secured.

Related Articles:

Discovering Sharable Resources in a Microservices Environment

Segmenting Microservices

General Principles

While this article does not cover every potential security measure, a few are presented.

In general, however, a security policy and the following implementation should:

  • cover the entire system as a unit and examine each component
  • provide mechanisms for common logging, bug tracking, and monitoring
  • be weary of data visibility
  • analyze your data
  • provide a mechanism for security auditing and recommendations
  • include penetration testing
  • include policies regarding self-checking, monitoring, and violations
  • make everyone paranoid (cheers!)

Self-Checking

The most vulnerable point in any organization is the employee. Security experts are good but Lola at the front desk probably doesn’t really care as much whether her password is cat or randomized into a million pieces. Of course, long obscure passwords were found to be about as secure as cat recently so maybe Lola is a tad on the efficient side.

It is wise to:

  • phish employees (offer to take them fishing)
  • attack your system with brute force and deploy other mechanisms for evil
  • check for backdoors
  • penetration test using other means

There have been software companies where standard passwords are never changed while proprietary information is routed through the system. Imagine having the ability to type guest and admin into a console, setup a RabbitMQ router and instantly have access to every proprietary real estate listing.

Hardware Separation

Hardware separation is critical for the success of a system. Externally facing applications should never be placed on the same hardware as sensitive services meant for internal consumption.

In general the following should be separated:

  • a web application
  • data storage
  • services accessing data storage
  • services maintaining sensitive information

Simplr Insites, LLC, of which I am 50 percent owner, separates each of these types of services. With the increasing scrutiny placed on IT firms regarding personal information, privacy, and planning, this is not just a good thing but required by law. Colorado’s recent educational privacy laws demand that a security plan be presented for approval before an organization becomes a partner with a school district.

Password Protection

The most obvious point is to protect services via password on any forward facing component.  That means utilizing the most appropriate form of protection and staying abreast of security vulnerabilities.

Blowfish is broken but pbkdf is a recommended algorithm. Make NIST a regular part of your reading material.

Licenses and Tokens

It is not enough to secure front end applications, any access within the system should be authorized. Giving external services tokens for access and utilizing appropriate oauth2 toolkits helps to make unauthorized access more difficult. Again, nothing is impossible.

There is a difference between a license and a token in our systems. A license is often used to give general access to a system. A token grants access to components in the system. In this way, we can revoke specific access rights without invalidating a token or invalidate access to a system altogether.

Always remember to store tokens and licenses correctly and not in plain text anywhere within the system. Using an encryption algorithm such as Fernet encryption is a good start.

Encryption

Some data needs to be protected. This includes identifying information and licenses. It also means obtaining a trusted SSL certificate for each microservice.

Knowing which forms of encryption are considered secure is yet another reason to support the Boulder hippie coalition at NIST. They may be about peace and free love but are still concerned for their privacy.

When secured data needs to be transmitted, store a RSA private key at an appropriate place on the service and ensure that the front end maintains the public key. When data needs to be stored, Fernet encryption might be more recommendable. Of course, be weary of the date this article was published.

Monitoring Networks

Monitoring networks for unusual behavior goes well beyond intuition and Sentry, Zabbix, or the Elastic APM (used as an application logger) these days. These tools are terrific and must be deployed to tackle breaches alongside a strong firewall.

However, advances in neural networks and pattern matching allow security analysts to find anomalies. It might be possible to use a LSTM to detect and block unwanted users in real time.

To this end, companies such as Dark Trace are promising.

A combined approach is recommended as someone will be able to trick any security system. If it is possible, it will happen. Do not drink the cool-aid regarding cyber-immunity but do notice the word play.

Accounting

Related to monitoring a network, log everything that is feasible. If a user accesses a service or causes a job to be scheduled, log that job. If a service generates an error, log the error.

The Elastic APM is a terrific tool for logging, monitoring and bug reporting. Zabbix is available for monitoring hardware. Other companies I have worked for utilize Sentry for bug reporting as well.

Conclusion

A multi-pronged approach to security is a necessity in a micro-services environment. A decent system bridges the gap between services, monitors all traffic as a whole, allows learned security experts to find issues, tracks bugs in a single location, and provides much more.

Private Immutable Configuration Hack in Python

Pyhton is notoriously not secure by default. However, it is still possible to generate a level of security through name mangling and other means. This article explains how to use name mangling, the singleton pattern, and class methods to create more secure access to configuration in Python3.

Singleton and Class Methods

Setting up a singleton in Python is simple:

class ChatConfig():

    __config = None

    class __Config:

        def __init__(self, config):
            self.config = config

    @classmethod
    def set_config(cls, config):
        if cls.__config is None:
            cls.__config = config


    @classmethod
    def get_config(cls):
        return cls.__config

 

The internal class and variable are mangled so as to make the variable itself private. This allows a single configuration to exist across the different packages while keeping the internal variable private and allowing for the variable to be set only once.

Conclusion

Python is not entirely insecure. It only takes code. This article offers an example of a way to set a variable once and share the setup among multiple packages.

Opinion: FERPA, HIPPA and State Compliance for Sensitive Data, Follow NIST Dammit!

nistident_fleft_300ppi

Legal frameworks for data security are needed now more than ever. Recognizing this long after the journalistic debate giving rise to FERPA and HIPPA, states are creating a range of protection. It is, in my opinion, a sad fact that legislatures are the force driving this in the wake of breaches at Target and the rise of the Silk Road.

Navigating these frameworks and avoiding being placed in front of your local school firing squad sputtering garbage is actually quite simple. Use NIST, follow it well, and add some common sense.

My state of Colorado recently enacted some of the toughest data protection laws in the country. The important pieces for tech companies are:

  • Partners need to be verified by the school. Third parties can be vouched for but should also seek verification
  • Data can only be used within the confines of the objective stated in the contract
  • Companies working with information should de-identify student data for external programs unless working with a trusted party and should only work under the confines of their contract with allowed parties
  • Information cannot be used for direct marketing purposes
  • A security plan must be presented, in full, and actually deployed
    • Use access roles, proper password practices; etc. There are a few companies I can think of that really don’t
  • Keep information in audit tables
  • A breech will result in a public hearing and intense scrutiny as well as a possible black-listing

Other states, Washington in particular, already have similar laws. Hawaii, a state we are working within through a partner, actually verifies organizations as research partners.

FERPA and HIPPA compliance is really no different:

  • Certain data cannot be given out even under FOIA
  • Protect against threats using reasonable practices
  • Use common sense

Nothing, of course is secure and nothing ever will be. We, as developers, strive to achieve a level of difficulty that dismays.

So, how do you protect data. The government answered that too through NIST:

NIST even maintains a set of acceptable hashes.

As always, remember that security is fluid. Use common sense, patch known breaches, don’t use your database system full of customer information to run your HVAC. You know, the basics.

Two Step Verification in a Flask REST App

flask

Flask is great. It is simple, easy, and allows for lightning fast deployment. However, there are a few security problems that should be worked out before using it in production.

This article examines how to deploy two step verificatiom and ip and mac address tracking alongside JWT tokens in Flask.

Code for this article is on my Github.

OS and Hardware Security

Software is just a series of electrons floating around the Internet. Fans, special devices for man in the middle attacks, and general human ignorance can all circumvent good practices.



Some things that should be done prior to development are:

  • Assign proper roles to users with appropriate security measures
  • Setup IP tables and other forms of firewall protection
  • Don’t randomly open ports to the world
  • Isolate unprotected devices from those handling highly secure data (a web server from your ETL servers for instance)
  • Ensure passwords are fairly secure (8-20 memorable characters avoiding certain others)
  • Use endpoint security such as RSA keys where appropriate

Proper Security

Like all things security, articles should not promote a version of encryption as secure or make claims using algorithms that could be rendered useless even as I write. All good algorithms sour

I can, however, provide a list of algorithms to not use:

  • bcrypt
  • blowfish

Remember, that all algorithms are usually broken. The US government currently lists AES as use-able and pbkdf can render sha512 useful. SHA512 is currently promoted as a good algorithm by NIST. 

JWT in Flask

JWT tokens are useful in that they store the information necessary to keep a user logged in. They are great for single page applications where session tracking might be in-appropriate. Know your use case.

A strong and configurable tool for implementing JWT keys in Flask is flask_jwt_extended which rides on the Flask-Security module.

Implementing JWT is fairly simple:

from flask import Flask
from flask_jwt_extended import JWTManager, jwt_required

app = Flask(__name__)
jwt = JWTManager(app)
jwt.init_app(app)

@app.route('/login', methods=['POST'])
def login():
    access_token = create_access_token(identity=username)
    return jsonify(access_token=access_token)

@app.route('/is_working')
@jwt_required
def is_working():
    return json.dumps({'Success': True}), 200, {'ContentType': 'application/json'}

It appears that Flask-Security was recently fixed so that password hashing works appropriately once again.

from flask_security import Security
from flask_security.utils import encrypt_password, verify_password

security = Security(app, datastore)
pwd = encrypt_password("test")
if verify_password("test", pwd):
    print("Verified")

Email Server

Before discussing two step verification, it is necessary to setup a test email server and be able to send emails. The smtplib offers the functionality of a web server in a simple configurable Python application. I personally printed out the input so will not post the code here. The Python docs are a good place to get started.

Sending emails can be done through smtplib or Flask-Mail. The smtplib library will be more flexible.

The following sets up a smptlib for sending an email:

import smtplib

....

host = email_config['host']
port = email_config['port']
email_server = smtplib.SMTP(host, port)
if email_config.get('ehlo', False):
    email_server.ehlo()
if email_config.get('start_tls', False):
    certfile = email_config.get('tls_cert', None)
    keyfile = email_config.get('tls_key', None)
    context = email_config.get('context', None)
    email_server.starttls(keyfile, certfile, context)
if email_config.get('user') and email_config.get('password'):
    user = email_config.get('user')
    password = email_config.get('password')
    email_server.login(user, password)
...
email_server.sendmail(recipient, [sender], msg.as_string())

Many different options are configurable using smtplib. These settings can be set using Flask-Mail but any code needed to help perform setup might be an issue.

Two Step Verification

It is now possible to extend the login function to include multi step authorization. The important pieces of the puzzle are obtaining an ip and/or mac address, verifying a password as shown, sending an email with a verification code, handling receipt of the code, and persistence.

Most of this is shown in my own open source project. This code uses uuid to generate a unique code:

import uuid
...
code = uuid.uuid4()

This code is hashed as before and stored using SQLAlchemy.

The basic process followed in my Github code is:

  1. Use login() to retrieve the JWT key and check for a matching mac address and ip
  2. Send an email verification code as needed
  3. Through verify_ip_code and verify_mac_code the code is validated and databases updated

The login function contains the majority of calls for two step verification.

Conclusion

This article examined the basics required to create two step verification in Python using Flask using examples and code from my Github repository.

It is important to use the most up to date algorithms. This article made no attempt to recommend an encryption algorithm.