Running Django Tests Without Migrations

python_and_mouse

Django is a strong framework for building reliable websites. However, it is showing signs of age. Particularly, it is failing where tools such as Spring Boot excel.

This article examines how to overcome yet another issue encountered in Django, running tests without migrating databases.

Use Case

There are many times when a database migration is inappropriate for testing:

  • we have separated test, development, and production databases with custom features requiring testing
  • there are fixtures pre-loaded into a database
  • we have legwork to complete through scripts alongside database setup
  • dropping and re-creating a database eliminates data that is absolutely necessary to another function in our test environment
  • nothing we do can be included in a TestRunner, at least not reliably

Sadly, Django developers, among other issues they refuse to fix, were not willing to create a workaround until recently. The workaround does not work in the latest version.

Django Unit Test Framework

The Django unit test framework comes with many features typically reserved for tools such as ghost.py or selenium.

For instance, Django provides:

from django.test import TestCase
from django.test import Client
from django.test import RequestFactory

A RequestFactory is used to perform specific tests while the Client acts as a browser would.

While there is an option to use pytest, the changing nature of Django manages to continually break this tool. Importing models from different applications using multiple databases is a broken process at the time of publication.

Unittest Format

Django provides an extensive tutorial for setting up unit tests. The setup is straightforward:

from django.test import TestCase
from django.test import Client
from django.test import RequestFactory


class AdminTestCases(TestCase):
    multi_db = True

    def setUp(self):
        self.client = Client()
        self.rf = RequestFactory()

    def test_create_superuser(self):
        pass

    def test_remove_superuser(self):
        pass

    def test_fail_create_superuser(self):
        pass

    def test_fail_remove_superuser(self):
        pass

    def tearDown(self):
        pass

This skeleton prepares  a client and request factory, generates tests, and provides a tear down function. Setup and tear down appear to run before and after each test.

Multiple Databases

For security and other purposes, it is often necessary to separate databases. This offers a degree of protection for sensitive information.

Unfortunately, this also presents an issue with the tests in which a single database is used. To avoid this, each test case must include the following:

multi_db = True

This line informs the the test runner to search for different databases.

Ignoring Migrated Databases

Now, for the meat and potatoes. We want to ignore database migration altogether at times. It is still wise to have a script and drop any unused data.

Django allows for the generation of custom TestRunner classes. As of Django > 2.0. the following runner avoids migrating a database:

from django.test.runner import DiscoverRunner


class NoMigrationTestRunner(DiscoverRunner):
  """ A test runner to test without database creation """

  def setup_databases(self, **kwargs):
    """ Override the database creation defined in parent class """
    pass

  def teardown_databases(self, old_config, **kwargs):
    """ Override the database teardown defined in parent class """
    pass

The NoMigrationTestRunner extends the DiscoverRunner and directly overwrites the setup_databases and teardown_databases methods. These methods are used to setup certain connections, create database tables, and perform cleanup.

In this example, the setup and teardown methods are left blank to avoid creating new tables as well as to allow for the smooth usage of multiple databases.

Django is informed of the new test runner through a variable in the relevant settings configuration:

TEST_RUNNER = 'path_to.NoMigrationTestRunner'

Execution

Tests are executed using manage.py:

python manage.py test
python<version> manage.py path.to.tests.TestModule --settings=<settings_file>

The first example is generic and collects tests found through our DiscoverRunner. The second command runs tests using a specific python version and a path to the relevant module which is imported using this string. A settings file was also provided in the second command.

Conclusion

Django has extensive documentation. However, as the framework changes, certain aspects will break. This article covered how to setup tests for multiple databases, not migrate databases in test, and how to execute our tests.

Automating Django Database Re-Creation on PostgreSQL

border_wall

photo: Oren Ziv/Activestills.org

The Django database migration system is a mess. This mess is made more difficult when using PostgreSQL.

Schemas go unrecognized without a work around, indices can break when using the work around, and processes are very difficult to automate with the existing code base.

The authors of Django, or someone such as myself who occasionally finds the time to dip in and unclutter a mess or two, will eventually resolve this. However, the migration system today is a train wreck.

My current solution to migrating a Postgres based Django application is presented in this article.

Benefits of Getting Down and Dirty

The benefits of taking the time to master this process for your own projects are not limited to the ability to automate Django. They include the ability to easily manage migrations and even automate builds.

Imagine having a DDL that is defined in your application and changed with one line of code without ever touching your database.

Django is a powerful and concurrent web framework with an enormous amount of add-ons and features. Mastering this task can open your code base to amazing amounts of abstraction and potential.

Well Defined Problems

While Django is powerful, integration with Postgres is horrible. A schema detection problem can cause a multitude of problems as can issues related to using multiple databases. When combined, these problems sap hours of valuable time.

The most major issues are:

  • lack of migration support for multiple databases
  • lack of schema support (an issue recognized over 13 years ago)
  • some indices (PostGIS polygon ids here) break when using the schema workaround

Luckily, the solutions for these problems are simple and require hacking only a few lines of your configuration.

Solving the Multiple Database Issue

This issue is more annoying than a major problem. Simply obtain a list of your applications and use your database names as defined by your database router in migration.

If there are migrations you want to reset, use the following article instead of the migrate command to delete your migrations and make sure to drop tables from your database without deleting any required schemas.

Using python[version] manage.py migrate [app] zero  did not actually drop and recreate my tables for some reason.

For thoroughness, discover your applications using:

python[verion] manage.py showmigrations

With your applications defined, make the migrations and run the second line of code in an appropriate order for each application:

python[version] manage.py makemigrations [--settings=my_settings.py]
python[version] manage.py migrate [app] --database= [--settings=my_settings.py]
....

As promised, the solution is simple. The –database switch matches the database name in your configuration and must be present as Django only recognizes your default configuration if it is not. This can complete migrations without actually performing them.

Solving the Schema Problem without Breaking PostGIS Indices

Django offers terrific Postgres support, including support for PostGIS. However, Postgres schemas are not supported. Tickets were opened over 13 years ago but were merged and forgotten.

To ensure that schemas are read properly, setup a database router and add a database configuration utilizing the following template:

"default": {
        "ENGINE": "django.contrib.gis.db.backends.postgis",
        "NAME": "test",
        "USER": "test",
        "PASSWORD": "test",
        "HOST": "127.0.0.1",
        "PORT": "5432",
        "OPTIONS": {
            "options": "-csearch_path=my_schema,public"
        }
 }

An set of options append to your dsn including a schema and a backup schema, used if the first schema is not present, are added via the configuration. Note the lack of spacing.

Now that settings.py is configured and a database router is established, add the following Meta class to each Model:

    class Meta():
        db_table=u'schema\".\"table'

Notice the awkward value for db_table. This is due to the way that tables are specified in Django. It is possible to leave managed as True, allowing Django to perform migrations, as long as the database is cleaned up a bit.

If there are any indices breaking on migration at this point, simply drop the table definition and use whichever schema this table ends up in. There is no apparent work around for this.

Now Your Migration Can Be Run In a Script, Even in a CI Tool

After quite a bit of fiddling, I found that it is possible to script and thus automate database builds. This is incredibly useful for testing.

My script, written to recreate a test environment, included a few direct SQL statements as well as calls to manage.py:

echo "DROP SCHEMA IF EXISTS auth CASCADE" | python3 ./app_repo/simplred/manage.py dbshell --settings=test_settings
echo "DROP SCHEMA IF EXISTS filestorage CASCADE" | python3 ./app_repo/simplred/manage.py dbshell --settings=test_settings
echo "DROP SCHEMA IF EXISTS licensing CASCADE" | python3 ./app_repo/simplred/manage.py dbshell --settings=test_settings
echo "DROP SCHEMA IF EXISTS simplred CASCADE" | python3 ./app_repo/simplred/manage.py dbshell --settings=test_settings
echo "DROP SCHEMA IF EXISTS fileauth CASCADE" | python3 ./app_repo/simplred/manage.py dbshell --settings=test_settings
echo "DROP SCHEMA IF EXISTS licenseauth CASCADE" | python3 ./app_repo/simplred/manage.py dbshell --settings=test_settings
echo "DROP OWNED BY simplrdev" | python3 ./app_repo/simplred/manage.py dbshell --settings=test_settings
echo "CREATE SCHEMA IF NOT EXISTS auth" | python3 ./app_repo/simplred/manage.py dbshell --settings=test_settings
echo "CREATE SCHEMA IF NOT EXISTS filestorage" | python3 ./app_repo/simplred/manage.py dbshell --settings=test_settings
echo "CREATE SCHEMA IF NOT EXISTS licensing" | python3 ./app_repo/simplred/manage.py dbshell --settings=test_settings
echo "CREATE SCHEMA IF NOT EXISTS simplred" | python3 ./app_repo/simplred/manage.py dbshell --settings=test_settings
echo "CREATE SCHEMA IF NOT EXISTS licenseauth" | python3 ./app_repo/simplred/manage.py dbshell --settings=test_settings
echo "CREATE SCHEMA IF NOT EXISTS fileauth" | python3 ./app_repo/simplred/manage.py dbshell --settings=test_settings
find ./app_repo -path "*/migrations/*.py" -not -name "__init__.py" -delete
find ./app_repo -path "*/migrations/*.pyc"  -delete
python3 ./app_repo/simplred/manage.py makemigrations --settings=test_settings
python3 ./app_repo/simplred/manage.py migrate auth --settings=test_settings --database=auth
python3 ./app_repo/simplred/manage.py migrate admin --settings=test_settings --database=auth
python3 ./app_repo/simplred/manage.py migrate registration --settings=test_settings --database=auth
python3 ./app_repo/simplred/manage.py migrate sessions --settings=test_settings --database=auth
python3 ./app_repo/simplred/manage.py migrate oauth2_provider --settings=test_settings --database=auth
python3 ./app_repo/simplred/manage.py migrate auth_middleware --settings=test_settings --database=auth
python3 ./app_repo/simplred/manage.py migrate simplredapp --settings=test_settings --database=data
python3 ./app_repo/simplred/manage.py loaddata --settings=test_settings --database=data fixture

Assuming the appropriate security measures are followed, this script works well in a Bamboo job.  My script drops and recreates any necessary database components as well as clears migrations and then creates and performs migrations. Remember, this script recreates a test environment.

The running Django application, which is updated via a webhook, doesn’t actually break when I do this. I now have a completely automated test environment predicated on merely merging pull requests into my test branch and ignoring any migrations folders through .gitignore.

Conclusion

PostgreSQL is powerful, as is Django, but combining the two requires some finagling to achieve maximum efficiency. Here, we examined a few issues encountered when using Django and Postgres together and discussed how to solve them.

We discovered that it is possible to automate database migration without too much effort if you already know the hacks that make this process work.

 

Better Key Storage With Blackbox, RSA, Redis, and the Fernet Algorithm in Django

safe

Python lacks a proper key store.  This is an unnerving issue when trying to build a secure application. More troubling is the plain text storage of RSA keys. This article examines a process for storing keys in an encrypted manner on black box as well as the storage of keys using the Fernet algorithm and encryption through RSA in Django using redis for speed.

Problem

Unlike Java which already has a key store, Python lacks the ability to store keys for data encryption. Python developers are left with only basic methods for storing keys and this often means doing so in plain text.

That method is inexplicably terrible when working with FERPA/HIPPA and especially the increasingly difficult state guidelines for storing sensitive information.

Solution

One solution, of many, is to use Stack Exchange black box to store keys and the Fernet algorithm to encrypt the keys in a cache. In this way keys are stored in an encrypted format in a hidden file as well as in a secure format in memory.

Black Box

Stack Exchange’s Black Box offers a perfect storage solution for keys using a gpg keyring to encrypt data. The tool was made to store secrets in a git repository.

Check out my Python API for reading files from black box. It is possible to add a user to the administrator file in order to avoid entering a password each time.

Storing Encrypted Keys in Django

Once the keys are encrypted and accessible, a large application needs to ensure speed. To help alleviate sluggishness, it is possible to store keys using the Fernet algorithm in any cache that Django provides.

It is possible to use the cryptography package for this task.

from cryptography.fernet import Fernet
from django.core.cache import cache

key = Fernet.generate_key()
f = Fernet(key)
token = f.encrypt(b"my deep dark secret")
cache.set('my_token', token)

Conclusion

It is possible to recreate a secure keystore using a mix of Stack Exchange Black Box and the Fernet algorithm when creating a Django application. The implementation above may not be production ready but is a proof of concept.

Encrypting Data in Django with the Fernet Algorithm

lock

At some point, it will be necessary to encrypt data. While most queries are performed raw, there is still a use for the models Django provides in encrypting and decrypting data or even just in obtaining a model from a raw query.

This article examines how to apply the Fernet algorithm to save data in an encrypted format using Django.



 

Fernet Encryption

Fernet encryption utilizes the AES method at its core. This method is widely accepted and more powerful than RSA when there is no need for communication.

RSA is a powerful tool when requiring that data be passed over the wire. In this example, we are more concerned with data storage.

Secret Key

The Fernet algorithm requires using a secret key to store data.

from cryptography.fernet import Fernet

Fernet.generate_key()

This key should be stored in a secure fashion. Options for retrieving the key include loading the key from a file passed through an environment variable.

Encrypted Field

Django provides an excellent tutorial for writing custom fields. By overwriting the from_db_value and get_db_prep_value methods it is possible to achieve decryption and encryption respectively.

HOME = str(Path.home())


class EncryptedFernetField(models.TextField):
    """
    A field where data is encrypted using the Fernet Algorithm
    """

    description = "An encrypted field for storing information using Fernet Encryption"

    def __init__(self, *args, **kwargs):
        self.__key_path = os.environ.get('FERNET_KEY', None)
        if self.__key_path is None:
            self.__key_dir = os.environ.get('field_key_dir', None)
            if self.__key_dir is None:
                self.__key_dir = os.path.sep.join([HOME, 'field_keys'])
                if os.path.exists(self.__key_dir) is False:
                    os.mkdir(self.__key_dir)
            key = Fernet.generate_key()
            with open(os.path.sep.join([self.__key_dir, 'fernet.key']), 'w') as fp:
                fp.write(key.decode('utf-8'))
            self.__key_path = os.path.sep.join([self.__key_dir, 'fernet.key'])
        super().__init__(*args, **kwargs)

    def deconstruct(self):
        name, path, args, kwargs = super().deconstruct()
        return name, path, args, kwargs

    def from_db_value(self, value, expression, connection):
        if self.__key_path and value:
            value = base64.b64decode(value)
            with open(self.__key_path, 'r') as fp:
                key = fp.read()
            f = Fernet(key)
            value = f.decrypt(value)
        return value

    def to_python(self, value):
        return value

    def get_db_prep_value(self, value, connection, prepared=False):
        if value:
            if self.__key_path:
                with open(self.__key_path, 'r') as fp:
                    key = fp.read().encode()
                f = Fernet(key)
                value = f.encrypt(value.encode())
                value = base64.b64encode(value).decode('utf-8')
        return value

Encrypted values are stored in base 64 to avoid potential byte related issues. This class utilizes the models.TextField to save data as a blob.

Using the Field

The field is used in the same way as any other.

class License(models.Model):
    username = models.CharField(max_length=1024)
    application = models.CharField(max_length=1024)
    unique_id = models.IntegerField(unique=True)
    license = EncryptedFernetField()
    active = models.BooleanField(default=False)
    expiration_date = models.DateField()

    class Meta:
        unique_together = (('username', 'application'), )

 

Conclusion

Data encryption is fairly straightforward in Django. This tutorial examined how to create an encrypted field using the Fernet algorithm.

WebFlow Designer: Mediocre at Best

disappointed-face_1f61e

Web Flow is one of several visual HTML editors to come on the market in the hopes of replacing Komodo or Dream Weaver. While promising, it may be best to wait for Framer X before purchasing a visual editor.

Since the creation of Wix and the corresponding Wix editor, the idea of giving designers a no-code way to develop web pages has taken root in the development community. The reason such tools have not gained much in the way of popularity is easily apparent with Web Flow.

To be certain, Web Flow contains quite a bit of promise as both a design tool and CMS for simpler websites. The live content editors in particular are a nice to have for teams of any size.

The pros of Web Flow can be summed up as follows:

  • simple, fast, and live development for simple websites
  • basic editing support
  • basic script and style support through an embed tag
  • can be cheaper than a tool such as Dream Weaver for smaller organizations
  • can connect to article-style and other forms of data not requiring heavy back-end support

Just from the pros, it is obvious that a visual editor is perfect for the rapid prototyping associated with the design sprint concept. This is also not a difficult tool to create when considering the current power of Javascript and JQuery, especially when supporting a Mozilla based browser.

For each of the positive characteristics of Web Flow, there is at least one downside:

  • only allows style and script editing in an embed tag
  • contains almost no support for external CSS files
  • cannot be installed on site and does not use a company database
  • fails to provide advanced CSS features beyond a box shadow or border radius
  • class editing cannot be done separately from the generation of an element
  • a parent element must first be created to generate a new class
  • promotes sloppy, terrifying CSS that a middle-ware developer will quit over
  • is not drag and drop and has almost no such support in the editor

Basically, Web Flow and even tools such as Framer area far from ready for the production of modern websites.  Still, if you need to give a designer access to any form of tool that allows for inheritance based web editing or need slightly more than Wikipedia or Wix, Web Flow is decent. Still, Framer X promises to be a much better tool.

This tool, in the current form, is at best a 5/10. It is an average effort from what I hope is not an arrogant fool as can be the case with people who don’t want to actually work on their product that I at least hope will continue to improve to an 8 or 9 of 10.

 

 

 

Differences in Tablets and Browsers Over the Last Few Generations

We all have to do this at some point, create an all encompassing, GUI program that works between the different generations of IPad and even up to 1920 x insane. Increasingly, as of 2018, it doesn’t seem to matter whether you are working in the back or front end, at some point you are creating some sort of front end. Making matters more complicated, a slew of useful and enticing features are starting to become available.

This article is geared towards examining the increasing scale of screen widths in tablets and the differences in available features over time.

Features

Browser features have increasingly grown more powerful. Just a few newer features are:

The links above lead to the MDN pages including browser support. In general, Microsoft Edge or Explorer nearly never supports modern features. However, Microsoft’s browser usage is dropping steadily with Edge taking less than 3 percent of the market and Explorer taking less than 10 percent.

Solid, responsive web applications can be built for Opera, Safari, Chrome, and Firefox without entirely alienating all users but with a slightly outmoded design using CSS hacks or even web frameworks through header parsing.

Firefox, Chrome, and Safari continue to lead the pack in terms of support for newer features.

Device Usage

In terms of devices, the variety of tablet brands is growing. This is leading to a growth in the use of Android devices. This will mean that browsers such as Chrome and Firefox will grow increasingly popular.

Apple continues to lose market share while Microsoft is gaining ground. This could be due to the poor practices of Apple in relation to copyrights and development.


Reactive Trend

The overall trend is towards responsive web development. Each page scales to nearly any resolution required with the exception of mobile which requires a separate site.

This trend carries towards features as well. The days of using stateful web design, jumping from page to page, are nearly dead. Each page is practically a

Future development and certainly my own are also abandoning the typical grid system. Divisions can now become polygons. It is possible to draw complex shapes with d3js.

Web Applications in Unexpected Places

Anything and everything is becoming possible. Web applications are achieving the same power as desktop and mobile applications. However, security will still be a concern.

This power includes on the monitoring side of the IOT sphere. With the ability to launch a browser through tools such as QT and PyQT,  web applications will start showing up in unexpected places.

Consequently, web sockets, RTC, MQPP, and XMPP  will likely grow in popularity.


Screen Sizes

Screens have grown in resolution at the low and high end as expected. Since 2012, screen resolutions have grown from 960 x 720 for an IPAD to 1024 x 768 as well as up to 1366 x 768 on desktop, up from 1280 x 800.

A basic website should be safe coding, in 2018 using:

  • 960 x 720: Older Generation IPads
  • 1024 x 768: Newer Generation of IPads and older desktop screens
  • 1280 x 800: For older screens
  • 1366 x 768: The most common resolution of 2018
  • 1920 x 1080: The future most popular resolution

To stay relevant, plan on coding for both Ipad resolutions as low as 960 x 702 and desktop resolutions as high as 1920 x 1080. My own sites split this resolution into 7 different tiers.

Of course, create a mobile site as well at about 420 width.

The tiers I use are:

  • 900 – 999 width
  • 1000 – 1199 width
  • 12000 – 12299 width (some computers have bizarre resolutions in this range)
  • 1300-1399 width
  • 1400-1599 width
  • 1600 – 1799 width
  • 1800+ width

Frameworks

Frameworks have not changed much but have gotten better. Django now supports channels, Spring has been fully coupled with boot for some time, and Flask is becoming more secure.

More interestingly, React has gained ground due to the power it maintains in building responsive applications. This is really nothing new.

Personally, I utilize Django these days with Flask for micro-services. This allows me to  maintain as single stack for both my front and back-end which utilizes Celery,  Thespian, PyTorch, and Python’s other powerful data tools.

 

””

Two Step Verification in a Flask REST App

flask

Flask is great. It is simple, easy, and allows for lightning fast deployment. However, there are a few security problems that should be worked out before using it in production.

This article examines how to deploy two step verificatiom and ip and mac address tracking alongside JWT tokens in Flask.

Code for this article is on my Github.

OS and Hardware Security

Software is just a series of electrons floating around the Internet. Fans, special devices for man in the middle attacks, and general human ignorance can all circumvent good practices.



Some things that should be done prior to development are:

  • Assign proper roles to users with appropriate security measures
  • Setup IP tables and other forms of firewall protection
  • Don’t randomly open ports to the world
  • Isolate unprotected devices from those handling highly secure data (a web server from your ETL servers for instance)
  • Ensure passwords are fairly secure (8-20 memorable characters avoiding certain others)
  • Use endpoint security such as RSA keys where appropriate

Proper Security

Like all things security, articles should not promote a version of encryption as secure or make claims using algorithms that could be rendered useless even as I write. All good algorithms sour

I can, however, provide a list of algorithms to not use:

  • bcrypt
  • blowfish

Remember, that all algorithms are usually broken. The US government currently lists AES as use-able and pbkdf can render sha512 useful. SHA512 is currently promoted as a good algorithm by NIST. 

JWT in Flask

JWT tokens are useful in that they store the information necessary to keep a user logged in. They are great for single page applications where session tracking might be in-appropriate. Know your use case.

A strong and configurable tool for implementing JWT keys in Flask is flask_jwt_extended which rides on the Flask-Security module.

Implementing JWT is fairly simple:

from flask import Flask
from flask_jwt_extended import JWTManager, jwt_required

app = Flask(__name__)
jwt = JWTManager(app)
jwt.init_app(app)

@app.route('/login', methods=['POST'])
def login():
    access_token = create_access_token(identity=username)
    return jsonify(access_token=access_token)

@app.route('/is_working')
@jwt_required
def is_working():
    return json.dumps({'Success': True}), 200, {'ContentType': 'application/json'}

It appears that Flask-Security was recently fixed so that password hashing works appropriately once again.

from flask_security import Security
from flask_security.utils import encrypt_password, verify_password

security = Security(app, datastore)
pwd = encrypt_password("test")
if verify_password("test", pwd):
    print("Verified")

Email Server

Before discussing two step verification, it is necessary to setup a test email server and be able to send emails. The smtplib offers the functionality of a web server in a simple configurable Python application. I personally printed out the input so will not post the code here. The Python docs are a good place to get started.

Sending emails can be done through smtplib or Flask-Mail. The smtplib library will be more flexible.

The following sets up a smptlib for sending an email:

import smtplib

....

host = email_config['host']
port = email_config['port']
email_server = smtplib.SMTP(host, port)
if email_config.get('ehlo', False):
    email_server.ehlo()
if email_config.get('start_tls', False):
    certfile = email_config.get('tls_cert', None)
    keyfile = email_config.get('tls_key', None)
    context = email_config.get('context', None)
    email_server.starttls(keyfile, certfile, context)
if email_config.get('user') and email_config.get('password'):
    user = email_config.get('user')
    password = email_config.get('password')
    email_server.login(user, password)
...
email_server.sendmail(recipient, [sender], msg.as_string())

Many different options are configurable using smtplib. These settings can be set using Flask-Mail but any code needed to help perform setup might be an issue.

Two Step Verification

It is now possible to extend the login function to include multi step authorization. The important pieces of the puzzle are obtaining an ip and/or mac address, verifying a password as shown, sending an email with a verification code, handling receipt of the code, and persistence.

Most of this is shown in my own open source project. This code uses uuid to generate a unique code:

import uuid
...
code = uuid.uuid4()

This code is hashed as before and stored using SQLAlchemy.

The basic process followed in my Github code is:

  1. Use login() to retrieve the JWT key and check for a matching mac address and ip
  2. Send an email verification code as needed
  3. Through verify_ip_code and verify_mac_code the code is validated and databases updated

The login function contains the majority of calls for two step verification.

Conclusion

This article examined the basics required to create two step verification in Python using Flask using examples and code from my Github repository.

It is important to use the most up to date algorithms. This article made no attempt to recommend an encryption algorithm.

Using JCEF in Eclipse

After some serious issues with Selenium and with a need to stay up on the latest and greatest to ensure that I do not fall behind in running test-like scripts, I came across Chromium Embedded Extensions. Having most of my tools in java, I decided to install and use JCEF. This proved to be a bit of a nightmare. Hopefully, the following post will alleviate problems for users.

Before continuing, I did receive extensive support from MaGreenblatt (a cheif contributor and maybe the lead for JCEF).

Python PDF 3: Writing With HTML and XML

Alas, I have discovered the potent mixture of Jinja, weasyprint and Pandas. Mixing these tools with matplotlib and Python image modules yields a way to write PDF documents with relative ease and with the styling help of HTML. It would also be able to use a tool like xmltopdf for generating pdf files from XML. Previous Posts dealt with this using a more complicated tool, PyPDF2.

A Basic HTML Template

In this tutorial, I am using jinja to create tables. My tables will not have much in the way of styling but it is also possible to add styles with jinja or by using a tool such as Django-Tables2. Both tools are incredibly similar to the Django platform.

A template is needed in order to generate HTML pages for conversion to pdf format. Jinja follows a basic format with double curly braces used to mark where items are entered encapsulating the title of the property.


<!DOCTYPE html>
<html>
<head lang="en">
<meta charset="UTF-8">
<title>{{ title }}</title>
</head>
<body>
<div>
<h1>Weekly Summary Report</h1>
{{ summary_pivot_table }}
</div>

<div>
<h1>Frequency Report</h1>
{{ frequency_table }}
</div>

<div>
<h1>Weekly Source Reports</h1>
{{ source_pivot_table }}
</div>
</body>
</html>

In this case, there is a title and three reports. It would be easy to add CSS tags and generate different styles using the division tags. These will be converted by weasyprint later.

Writing to the Template

Writing to a template with Jinja requires using the dictionary data structure.

adminVars={"title":"Weekly Statistics","frequency_table":freqFrame.to_html(),"summary_pivot_table":sframe.to_html(),"source_pivot_table":tframe.to_html()}

Generating Data

Generating data is simple with Pandas. This is especially true with databases. One only needs to connect to a database using a SQLAlchemy engine and perform any necessary query. It is also possible to concatenate as many queries as necessary to generate a table.

import sqlalchemy
import pandas

#create alchemy engine      

dsn='postgresql+psycopg2://'+cfp.getvar("db","user","string")+":"+cfp.getVar("db", "passw","string")+"@"+cfp.getVar("db","host","string")+":"+cfp.getVar
("db","port","string")+"/"+cfp.getVar("db","dbname","string")

engine=sqlalchemy.create_engine(dsn)
        
#get totals table
query=cfp.getVar("sql","weekly_totals_complete","string")
tframe=pandas.read_sql_query(query,engine)

Concatenation is not difficult either using the concat function.

pandas.concat([pandas.read_sql_query(query,engine) for query in tables])

New columns will be generated with NaN values.

Performing Basic Operations on Dataframes

Performing operations on dataframes is easy with numpy or scipy.

import numpy

#operate on tframe from above
tframe.apply(numpy.average,axis=0)

Dataframes themselves have operations that can be formed on them and use numpy.

#tframe from above
tframe.mean()

A list of operations is provided in the Pandas documentation.

More complicated operations may require unpacking the values or using generator functions

Using Weazy Print

Once the resources and template are prepared, simply call on weazy print to convert the html resulting from the template to a PDF.

An extra import is needed to fetch resources such as images from links embedded within the url.

Otherwise, generate a pandas data frame, conver the frame to html and place as the value attached to the appropriate template key in your dictionary and then convert. The example code uses SQLAlchemy to fetch resources from a PostgreSQL database.

from crawleraids.ConfigVars import Config
from jinja2 import Environment,FileSystemLoader
import pandas
from weasyprint import HTML,default_url_fetcher
import sqlalchemy

def fetchURL(url):
   '''
   Provide a resource obtainer for getting urls to weazy print
   '''
   return weasyprint.default_url_fetcher(url)


def generatePDF(fpath):
       '''
       Generate the pdf.
       '''
       #create alchemy engine
       cfp=Config(fpath)
       
dsn='postgresql+psycopg2://'+cfp.getvar("db","user","string")+":"+cfp.getVar("db", "passw","string")+"@"+cfp.getVar("db","host","string")+":"+cfp.getVar("db","port","string")+"/"+cfp.getVar("db","dbname","string")
        engine=sqlalchemy.create_engine(dsn)
        
        #get totals table
        query=cfp.getVar("sql","weekly_totals_complete","string")
        tframe=pandas.read_sql_query(query,engine)
        
        #get summary stats table
        query=cfp.getVar("sql","weekly_summary_complete","string")
        sframe=pandas.read_sql_query(query,engine)
        
        #get frequencies
        query=cfp.getVar("sql","weekly_frequency","string")
        freqFrame=pandas.read_sql_query(query,engine)
         
        #get the resource loader 
        env=Environment(loader=FileSystemLoader('.'))
        template=env.get_template("aboveTemplate.html")
         
       #fill the template
       vars={"title":"Weekly Statistics","frequency_table":freqFrame.to_html(),"summary_pivot_table":sframe.to_html(),"source_pivot_table":tframe.to_html()}
       html=template.render(vars)
       
       #use weazy print to convert to pdf
       HTML(string=html).write_pdf(target="/reports/report.pdf",stylesheets=cfp.getVar("style","aboveTemplate","string"))

All of the power of Pandas is now at the disposal of the programmer along with anything that can be embedded in a url.

Generating and Saving Graphs with Pandas and Matplotlib or PyPlot

It is possible to embed graphs into a pdf by saving them as images.

Obviously, the folks behind pdf allow most things made of bytes to be placed in objects in a PDF (a pdf is a series of pdf objects much like xml with byte strings in base 64 as the text). See my magic numbers post and try to parse or write your own image to a pdf if you really want to dive into the subject.

Generating graphs is simple Pandas. Just make sure to match the template graph with the image url.

import matplotlib.pyplot as plt

data=[[1,3,5,2],[1,3,4]] #perform operations on the data to transform the graph. Each array is a new plot line.
df = DataFrame(data,columns=[['PlotA','PlotB']))
fig=df.plot()
fig=fig.get_figure()
fig.savefig('graph.png') #also can save as own pdf to be merged as described in an earlier post

It is possible to do this directly with pyplot as well.

Using Flask to Create a PDF Web Server

It appears from comments and questions that pfd servers are often a request. The clunkiness of Spring can now be replaced easily with the combination of the mentioned tools and the Flask web framework. These tools allow for the quick and easy creation of a pdf web server. However, asyncore with socket, Spring with Java based tools, or other tools will need to be run if the plan is to use something akin to the proxy pattern, a sad state of affairs.

To create the server, simply create a method with an annotation specifying the path, much as would happen in spring.

from flask import Flask
try:
   from cStringIO import StringIO
except:
   import StringIO

from flask import send_file
app = Flask(__name__)

def otherFunc():
    pass

@app.route("/")
def generatePDF():
    #code to generate PDF........
    StringIO(pdf)
    return send_file(pdf, attachment_filename='file.pdf')

if __name__ == "__main__":
    app.run()

Weasyprint also includes a way to incorporate pre-generated pdfs from within the same application.