The Case for Using an IRM to Scale Data Intake

Among many, there are three major problems faced by an analyst before data is useful:

  • data aggregation and storage
  • data security and access
  • data wrangling (ETL/ELT)

This article deals with data security and access using an information resource management system, IRM. My own company, Simplr Insites LLC, is writing such a system alongside a file storage solution in an effort to modernize the research process.

Problem

One significant problem faced in research and cooperation is the attainment of clean and useful data. Obtaining this data often means gaining access to systems, forming legal agreements, obfuscating certain data, and embarking on the painful process of data wrangling.

While ETL and ELT are critical steps, just obtaining sensitive data, even from within an organization, is tricky. Consider the following cases related directly to access:

  • data sets include confidential information
  • data sets are ensnared in legal agreements regarding who can access data
  • users want to control access to data to ensure it is not misused
  • external users are allowed varying degrees of access

IRM as a Solution

Oracle generated a solution that attempts to tackle the data security issue. The Oracle IRM documentation provides a rather informative graphical overview of their tool:

irm

In this system, an external user accesses a load balanced IRM server application which controls rights and access to different resources and files. Several firewalls help to improve security along with authentication, access grants, and encryption. Web services  and internal users utilize the IRM server as well.

Beyond the visible components, tokens can be used to instantly manage resources and propagate access changes.

Most file systems also offer the capability to pull the date when a resource was created or modified and various permissions information. This is useful for logging purposes.

Setting Up an IRM

It is not necessary to rely on Oracle for an IRM solution. In fact, the Oracle IRM only works with Microsoft Windows.

Each component can be paired with a reliable tool, most of which I have blogged about. A set of pairings might include

Base Application and Resource Management Django with Secure Login
REST API Resource Access Django OAuth Toolkit
Access Management Django Oauth Toolkit and a Database System
Individual Resource Tokens Randomly Generated and Hashed Key
File Storage GlusterFS or an Encrpytable File System
Encryption of Resources PyCrypto or a Similar Tool
Firewalls IP Tables or another firewall
Two Step Verification through SMS Twilio
Key Storage Stack Exchange Blackbox
VPN Access Firefox
Logging and Anomaly Detection Elastic APM and the ElkStack

Logging

Logging is critical to security. Logs allow administrators to spot harmful activity, generate statistical models based on usage, and aid in auditing the system.

Tokens

Tokens are a perfect solution for controlling document access in the system. They allow a user to gain access to a document, offer scopes for access, and often contain scopes that grant levels of access to a resource.

A user should be required to log in to the application to retrieve a token which refreshes on a regular schedule. These tokens can be revoked and changed by a resource owner or administrator much like using a file system.

Fernet Encryption

While RSA encryption is useful for two way encryption, Fernet encryption is stronger and more useful for storing files. If a system does not offer encryption, tools such as PyCrypto offer Fernet encryption.

Storing Keys

Keys should not be stored in the open. If compromised, it is extremely easy to gain access to a key stored in plain text. Instead, tools such as Stack Exchange’s Blackbox store keys in a system backed by a GPG key ring.

Two Step Downloading for Extra Security

Downloading a file in a secure manner might require extra protection, particularly when an external but trusted user desires access to a resource. To avoid spoofing and avoid a compromised computer from gaining access to a resource, two step verification is a recommended step.

In this process the external user provides an access token to obtain a document which is verified. On verification, a text message containing an access code is sent to the external user and the internal user is notified of the access. The external user enters the code and, if required, the resource owner or admin approves the download.

This type of process is not difficult to implement through desktop or web applications using push notifications or persistent storage.

Conclusion

Secured yet accessible storage is a critical problem for any data analyst or scientist. Using an established IRM or implementing a similar tool helps secure access and empower analytics.

Agile is Flexible: Deploying Agile in a Team Environment

Agile is not strict. In fact, agile is meant to be flexible. The concept comes with choices. Does your team use Scrum or Kanban? Are your retrospectives long or short?

This article examines the deployment of Agile, knowledge gleaned from deployment in multiple organizations.



Agile

The principles of agile are meant to provide flexibility and understanding. A few telling signs of this are forthcoming in the agile manifesto, which promotes:

  • simplicity by maximizing efficiency at work
  • early and continuous release of product
  • cross-organizational communication
  • working software as the primary sign of progress
  • embracing change

Deployment Headaches

The agile concepts and practices are some of the easiest to learn but maximizing efficiency does not mean having seven hours of meetings. It does not mean creating an unproductive comradery either.

Some of the pitfalls of deployment include:

  • meetings that eat away at productivity rather than help due to their length
  • a barrage of questions from various project managers that further eat away at productivity
  • stakeholder meetings that degrade into requirements analysis if there is no product to release yet
  • tickets that move into a Sprint or up a Kanban board in a way that slows the process down

Headache Exemplified

Issues in deployment can cause a backlog to grow uncontrollably. The amount of work can grow if agile is not deployed correctly and flexibly.

For instance, at a recent company I worked for,  the daily barrage of questions and meetings for the better part of a morning or day caused a significant slowdown in release times. The backlog grew by up to twenty tickets each week as pressure to complete tickets created more bugs.

Mounting pressure nearly created legal problems. In fact, as a new developer, I was the scapegoat for at least one problem, having a ticket offloaded onto myself only after a client demanded resolution and threatened a lawsuit.

Alleviating the Headaches

It is possible to alleviate the pain of deployment with a few simple measures including:

  • finding effective project managers and team leads for each team that are the sole source from which questions and tickets are generated
  • keeping meeting times small and encourage team members to find problems and solve them by embracing analysis of problems and their solutions at the appropriate meeting (some organizations look at an attempt to find the source of an issue as making excuses)
  •  finding an effective meeting length, often dedicating less time than recommended
  • planning stakeholder meetings effectively around the release of tangible product and using the project manager wisely to keep in contact with and update clients
  • utilizing software appropriately, keep teams as separated as needed

Clearly, there are decisions to be made regarding separation and time. Failing to address these problems can be deadly.

Tinkering with these resolutions, I have found that stakeholder meetings happen sporadically at first, ensuing as recommended after a product becomes substantial. Requirements gathering is itself a continuous project. I have also found that early stage projects can often make better use of Kanban, allowing for more flexibility, while established projects make better use of Scrum, when tickets are defined at the backlog grooming and retrospective.



Conclusion

While agile methodologies are effective, they are by no means inflexible. They were not intended to be. Using your time effectively is at the root of any project management philosophy.

Proper use of agile with a precise application for your organization is the key to maximizing effectiveness.

 

A Guide to Defining Business Objectives

It can be said that in the world of the actual developer when the client or boss stops complaining and critiquing, the developer was automated to the point of being redundant. Basically, if no one is complaining, the developer is out of work.

Everyone wants a flying car, even when the job involves ingestion. However, there is a fine line in a business between a business requirement and idealism or ignorance. This is especially true when reporting to a boss whose technical competence and knowledge is less than stellar. So what rules have I discovered that will keep you employed? How do you avoid going to far off track when creating SCRUM tasks? Read on.

This article is an opinion piece on defining objectives and, to some extent, building the infrastructure required for a project.

Defining Core Objectives

All actual rules should be tightly cropped to the actual end client’s needs. This means being social with the end user. The first thing to do should be to find the most pessimistic person in the office and the person who will be using your product most. They may be the same people. Strike up a nice conversation with them. Get to know them  and then start asking questions about what they absolutely need. If there are multiple people in the office interacting directly with your end product, work from the most pessimistic and necessary to the least, establishing an absolute core of necessity.

By default, your boss will usually be the most optimistic person in the office. They are often less knowledgeable of the technical requirements and often of technology in general. Your goal is to temper their expectations to something reasonable. I have found this to be true of business partners as well.  You should understand what they want as they will provide a small set of absolute requirements but also keep in mind that if they can squeeze gold from water, they will.

If you find yourself limiting your objectives, use memos and presentations to make sure that you are providing a solid line of reasoning for why something cannot be done. Learn the IEEE standards for documentation and get familiar with Microsoft Office or Libre Office. Always offer a solution and state what would be needed to actually accomplish an objective and why it may be infeasible. In doing so, you may find a compromise or a solution. Offer them as alternatives. Do not be overly technical with the less technical.

My line of work requires providing relatively sensitive information in bulk at speed with a fair degree of normalization and quality. Basically, I am developing and building a large distributed ingestion and ETL engine with unique requirements that do not fit existing tools. This process has been a wreck as I was new to development coming in, given a largely inappropriate set of technology, ignored, and asked to build a Netflix style stack from the hulk of Geo Cities.

Defining business requirements was the first task I struggled with. Competing and even conflicting interests came from everywhere. My boss wanted and to a large degree wants an auto-scaling system with a great degree of statistical prediction on top of a massive number of sources from just one person. My co-workers, clients in the most strict sense, want normalization, de-duplication, anomaly detection,  and a large number of enhancements on top of a high degree of speed. No one seemed to grasp the technical requirements but everyone had an idea of what they needed.

In solving this mess, I found that my most valuable resource was the end user who was both most pessimistic of what we could deliver and had less technical skill than hoped for.  She is extremely bright and quite amazing, so bringing her up to speed was a simple task. However, she was very vague about what she wanted. In this case, I was able to discern requirements from my bosses optimism and a set of questions posed to her. As she also creates the tickets stemming from issues in the system, she indirectly defines our objectives as well.

Available Technology

The availability of technology will determine how much you can do. Your company will often try to provide less than the required amount of technology. Use your standards based documentation, cost models, and business writing to jockey for more. If you are under-respected, find someone who has clout and push them to act on your behalf.

As a junior employee several years ago, I found myself needing to push for basic technologies. My boss wanted me to use Pentaho for highly concurrent yet state based networking tasks on large documents ranging from HTML to PDF. In addition to this, he wanted automation and a tool that could scale easily. Pentaho was a poor tool choice. Worse, I had no server access. It took one year before I was able to start leaning on a more senior employee to lobby for more leniency and after another year and a half, servers. It took another year before I had appropriate access. If I was not developing a company, one that now has clients, I would have quit. The important take away, get to know your senior employees and use them on your behalf when you need to. Bribes can be paid in donuts where I work.

Promise Appropriately, Deliver With Quality

Some organizations require under-promising and over-delivering. These tend to be large organizations with performance review systems in desperate need of an overhaul. If you find yourself in such a situation, lobby to change the system. A solid set of reasoning goes a long way.

Most of us are in a position to promise an appropriate number of features with improvements over time. If you use SCRUM, this model fits perfectly. Devise your tasks around this model. Know who is on your team and promise an appropriate unit of work. Sales targets are built around what you can deliver. They are kept on your quality and  the ease of handling your product. Do not deliver to little, you will be fired but don’t define so much as to raise exuberance to an unsatisfiable level.

In my ingestion job, promises are made based on quality and quantity. I use the SCRUM model to refine our promises. Knowing my new co-worker’s capacity, fairly dismal, and my own, swamped with creating the tool, I can temper our tasks to meet business goals. Over time, we are able to include more business requirements on top of the number of sources being output and improving existing tools.

Hire Talent

If you are in the position of being able to hire people to expand on what you can achieve,  I do not recommend telling your boss an entry level position will suffice as they will then find someone with no skill. Also, push to be in the loop on the hiring process. The right person can make or break a project. My current co-worker is stuck re-running old tasks as he had no knowledge of our required tools and concepts despite my memo to my boss. Over time, he will get better but, with little skill, that may be too long. Sometimes the most difficult higher ups are those who are nice at heart but ignorant in practice.

Tickets

Your core requirements are not the ten commandments. You are not defining a society and universal morals but a more organic project. Requirements and objectives will change over time. The best thing you can do is to establish a ticket system, choose a solid system as changing to a different tool later is difficult. Patterns from these tickets will create new tools, define more requirements, and help you to better define your process.

In finding an appropriate system, ask:

  • Do I need an API that can interact with my or my clients tools?
  • Do I need SCRUM and Kanban capabilities on top of ticketing?
  • How hard is it to communicate with the client or end user?

At my work, I implemented a manual SCRUM board for certain tasks which had a positive impact on my overwhelmed co-worker who found JIRA cumbersome and full of lag. It is. We use JIRA for bug reporting and associated Kanban capabilities.

Cost

Cost is that lurking issue many will ignore. You need to document cost and use it to explain the feasibility of an objective. When possible create statistical tools that you can use to predict the burden on profitability and justify decisions. Money is the most powerful reasoning tool you have.

Conclusion

This opinion piece reviewed my lessons for entry level software developers looking to learn how to define business objectives. Overall, my advice is to:

  • Define core objectives starting with the most important and pessimistic users
  • Dive into your bosses core requirements and use their optimism to define the icing on the cake
  • Build on your objectives and requirements over time
  • Be involved in your bosses decisions
  • Define an appropriate number of objectives that allow you to deliver quality work (you will build on your past work over time)
  • Communicate and use an appropriate project management framework
  • Track costs and build statistical tools
  • Learn IEEE standards based documentation such as Software Design Documents and Database Design Documents, get familiar with business writing
  • Make sure you hire the right people