Red Hat: Is IBM Making a Desperate Grab?

CentOS, Fedora, Ubuntu: which comes to mind first? Even in the world of servers, the likely answer is Ubuntu, which is Debian-based rather than Red Hat-based.

There are a few enormous reasons why Ubuntu and Debian are winning the battle for Linux supremacy.

  1. Ease of Use
  2. Widely supported
  3. Extremely configurable
  4. Open source all the way through and thus FREE
  5. Popular, both an upside and a downside
  6. Ubuntu is a leader in developing distributed computing systems
  7. Ubuntu works with Amazon and others and will continue to do so unless it is sold

These are only a few of the reasons Ubuntu and Debian constantly top the list of popular Linux distributions.

The same cannot easily be said for Fedora or CentOS. The lag in support alone costs time and money while frustrating users. While most aspects of the operating system are the same, Ubuntu has some serious advantages in a rapidly changing world.

The main advantage given for using CentOS is security. Put another way, however, CentOS often lags behind in support for various packages to allow for a testing phase. This might mean that a small undetected vulnerability leaks all of your files to an attacker while a developer quietly patches it. It also means installing some cutting-edge technology from source. CentOS may not be more secure at all.

Furthermore, when developing security applications or building extremely small packages, Arch Linux is typically the operating system of choice.

Fedora is definitely an unstable release as it tends to be where testing occurs for Red Hat.

Now, consider some of the other downsides which start to resemble a loop:

  • less popular operating systems tend to receive less support
  • operating systems that were once popular may be out of reach of many when starting to sell commercially (a concern for startups)
  • IBM is notoriously behind the times and seeing revenue decline
  • IBM has failed to capitalize even when it has a market advantage by years
  • commercial products restrict usage, further limiting the user base

With an increased likelihood that Red Hat distributions will suddenly stop being made available by Amazon and used by Oracle, Red Hat’s competition could be seeing purchase offers and a huge boost. However, if Ubuntu remains free, its existing popularity could spike a little.

There are trends which might break this loop. Commercial products always receive support. Commercial products are also peddled in schools like drugs are to doctors. The last point is more or less moot, as degree-seeking students in non-business technical classes (where databases, operating systems, and more technical math are taught) tend to shun anything that costs them money, and Oracle and Android have an advantage everywhere else.

Basically, in the specialist, educational, and enthusiast markets, consumers and professionals may well start to shy away from Red Hat for fear of tying themselves, their products, and their wallets to IBM’s.

All things considered, why would IBM buy Red Hat for over $30 billion? Red Hat brings in roughly $3 billion in revenue per year.

  • Improve and modernize their own systems – at $30 billion?
  • Acquire talent that might improve their future product offerings to compete with Microsoft, Amazon, and the open source community – at $30 billion?
  • Spend even more money to build a business operating system to compete with Windows to try to attract Microsoft customers (Linux users don’t use Windows for the reasons they won’t use Red Hat now)

I am sure that the last point is fairly moot given Microsoft’s decades-long technological lead over IBM.

Perhaps I am not seeing something, but the conclusion seems to be that IBM’s lag in the AI space after an early lead has left it desperate. With the decline of Red Hat’s popularity and stock price over the last year mirroring the growth of other systems over the last 10 years, two dying animals are trying to combine their strengths.

The Case for Using an IRM to Scale Data Intake

Among many, there are three major problems faced by an analyst before data is useful:

  • data aggregation and storage
  • data security and access
  • data wrangling (ETL/ELT)

This article deals with data security and access using an information resource management system, IRM. My own company, Simplr Insites LLC, is writing such a system alongside a file storage solution in an effort to modernize the research process.

Problem

One significant problem faced in research and cooperation is the attainment of clean and useful data. Obtaining this data often means gaining access to systems, forming legal agreements, obfuscating certain data, and embarking on the painful process of data wrangling.

While ETL and ELT are critical steps, just obtaining sensitive data, even from within an organization, is tricky. Consider the following cases related directly to access:

  • data sets include confidential information
  • data sets are ensnared in legal agreements regarding who can access data
  • users want to control access to data to ensure it is not misused
  • external users are allowed varying degrees of access

IRM as a Solution

Oracle developed a solution that attempts to tackle the data security issue. The Oracle IRM documentation provides a rather informative graphical overview of their tool:

[Figure: Oracle IRM architecture overview]

In this system, an external user accesses a load-balanced IRM server application which controls rights and access to different resources and files. Several firewalls help to improve security along with authentication, access grants, and encryption. Web services and internal users utilize the IRM server as well.

Beyond the visible components, tokens can be used to instantly manage resources and propagate access changes.

Most file systems also offer the capability to pull the date when a resource was created or modified and various permissions information. This is useful for logging purposes.
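As a rough illustration of the metadata mentioned above, the following sketch uses Python's standard library to collect a file's modification time and permission bits into a record that could be written to a log. The function name `describe_resource` and the layout of the returned dictionary are assumptions made for this example, not part of any IRM product.

```python
import os
import stat
from datetime import datetime, timezone


def describe_resource(path):
    """Return basic metadata for a file as a dict suitable for logging."""
    info = os.stat(path)
    return {
        "path": os.path.abspath(path),
        "size_bytes": info.st_size,
        # st_mtime is the last modification time in seconds since the epoch
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
        # filemode renders the permission bits in the familiar "-rw-r--r--" form
        "permissions": stat.filemode(info.st_mode),
        "owner_uid": info.st_uid,
    }
```

Creation time is left out on purpose because its availability differs between platforms and file systems, while the modification time is available everywhere `os.stat` works.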

Setting Up an IRM

It is not necessary to rely on Oracle for an IRM solution. In fact, the Oracle IRM only works with Microsoft Windows.

Each component can be paired with a reliable tool, most of which I have blogged about. A set of pairings might include:

  • Base application and resource management: Django with secure login
  • REST API resource access: Django OAuth Toolkit
  • Access management: Django OAuth Toolkit and a database system
  • Individual resource tokens: randomly generated and hashed key
  • File storage: GlusterFS or an encryptable file system
  • Encryption of resources: PyCrypto or a similar tool
  • Firewalls: iptables or another firewall
  • Two-step verification through SMS: Twilio
  • Key storage: Stack Exchange Blackbox
  • VPN access: Firefox
  • Logging and anomaly detection: Elastic APM and the ELK Stack

Logging

Logging is critical to security. Logs allow administrators to spot harmful activity, generate statistical models based on usage, and aid in auditing the system.
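A minimal sketch of what an access log entry might look like, assuming Python's standard `logging` module and one JSON record per line so that a log shipper can parse it. The logger name `irm.audit`, the function `log_access`, and the field names are hypothetical choices for this example.

```python
import json
import logging
from datetime import datetime, timezone

audit_logger = logging.getLogger("irm.audit")  # hypothetical logger name


def log_access(user, resource, action, allowed):
    """Emit one JSON-formatted audit record and return it as a string."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "resource": resource,
        "action": action,
        "allowed": allowed,
    }
    line = json.dumps(record, sort_keys=True)
    # Denied requests are logged at WARNING so they stand out during review
    level = logging.INFO if allowed else logging.WARNING
    audit_logger.log(level, line)
    return line
```

Calling `logging.basicConfig(level=logging.INFO)` once at startup would be enough to see both allowed and denied records on the console. In a real deployment the handler would more likely write to a file or forward to the logging stack.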

Tokens

Tokens are a good fit for controlling document access in the system. They allow a user to gain access to a document and often carry scopes that grant specific levels of access to a resource.

A user should be required to log in to the application to retrieve a token, which refreshes on a regular schedule. These tokens can be revoked and changed by a resource owner or administrator, much like permissions in a file system.
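The token life cycle described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not how Django OAuth Toolkit works internally: the in-memory dictionary stands in for a database table, and the names `issue_token`, `is_allowed`, and `revoke_token` are hypothetical.

```python
import hashlib
import secrets
import time

# In-memory stand-in for a database table: token hash -> grant details
_grants = {}


def _hash_token(token):
    """Hash the raw token so the stored value cannot be replayed if leaked."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def issue_token(user, resource, scopes, ttl_seconds=3600):
    """Create a random token, store only its hash, and return the raw token."""
    token = secrets.token_urlsafe(32)
    _grants[_hash_token(token)] = {
        "user": user,
        "resource": resource,
        "scopes": set(scopes),
        "expires_at": time.time() + ttl_seconds,
    }
    return token


def is_allowed(token, resource, scope):
    """Check that the token is known, unexpired, and carries the scope."""
    grant = _grants.get(_hash_token(token))
    if grant is None or time.time() >= grant["expires_at"]:
        return False
    return grant["resource"] == resource and scope in grant["scopes"]


def revoke_token(token):
    """Remove a grant so the token stops working immediately."""
    return _grants.pop(_hash_token(token), None) is not None
```

Storing only the hash means that a leaked grants table does not hand out working tokens, and the expiry time forces users back through login on a regular schedule.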

Fernet Encryption

While RSA is an asymmetric scheme suited to exchanging keys or small messages between parties, Fernet is a symmetric, authenticated scheme that is better suited to encrypting stored files. If a system does not offer encryption, the Python cryptography package provides a Fernet implementation.
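A short sketch of encrypting a resource with Fernet, assuming the third-party `cryptography` package is installed (`pip install cryptography`), which is where the Fernet implementation for Python lives. The helper names `encrypt_bytes` and `decrypt_bytes` and the sample payload are made up for this example.

```python
from cryptography.fernet import Fernet, InvalidToken


def encrypt_bytes(key, data):
    """Encrypt and authenticate data with a Fernet key."""
    return Fernet(key).encrypt(data)


def decrypt_bytes(key, token):
    """Return the plaintext, or None if the key is wrong or the data was altered."""
    try:
        return Fernet(key).decrypt(token)
    except InvalidToken:
        return None


key = Fernet.generate_key()  # must be kept secret and stored outside the code
token = encrypt_bytes(key, b"quarterly results")
```

Because Fernet authenticates as well as encrypts, a tampered file or a wrong key fails loudly at decryption time instead of yielding garbage. Fernet works on whole byte strings in memory, so very large files may need to be split into chunks first.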

Storing Keys

Keys should not be stored in the open. If a system is compromised, it is extremely easy to gain access to a key stored in plain text. Instead, tools such as Stack Exchange’s Blackbox store keys in encrypted form, backed by a GPG key ring.

Two Step Downloading for Extra Security

Downloading a file in a secure manner might require extra protection, particularly when an external but trusted user desires access to a resource. To guard against spoofing and to keep a compromised computer from gaining access to a resource, two-step verification is recommended.

In this process, the external user provides an access token, which is verified, to request a document. On verification, a text message containing an access code is sent to the external user and the internal user is notified of the access. The external user enters the code and, if required, the resource owner or admin approves the download.
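The flow above might be sketched as two functions, one per step. Everything here is an assumption for illustration: the dictionary stands in for persistent storage, `send_sms` is any callable that delivers a message (a real system would wrap an SMS provider behind it), the five-minute lifetime is arbitrary, and token verification is assumed to have happened before `start_download` is called.

```python
import hmac
import secrets
import time

CODE_TTL_SECONDS = 300  # assumed lifetime of a texted code

# In-memory stand-in for persistent storage: request id -> pending download
_pending = {}


def start_download(request_id, token_is_valid, send_sms):
    """Step 1: once the access token checks out, text a one-time code."""
    if not token_is_valid:
        return False
    code = f"{secrets.randbelow(10**6):06d}"
    _pending[request_id] = {
        "code": code,
        "expires_at": time.time() + CODE_TTL_SECONDS,
    }
    send_sms(f"Your download code is {code}")
    return True


def finish_download(request_id, submitted_code, owner_approved=True):
    """Step 2: release the file only if the code matches and has not expired."""
    pending = _pending.pop(request_id, None)  # each code is single use
    if pending is None or time.time() > pending["expires_at"]:
        return False
    # compare_digest avoids leaking how much of the code matched via timing
    code_ok = hmac.compare_digest(
        pending["code"].encode("utf-8"), submitted_code.encode("utf-8")
    )
    return code_ok and owner_approved
```

Popping the pending entry on the first attempt means a wrong guess invalidates the code, so the external user has to restart the download instead of retrying until a guess succeeds.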

This type of process is not difficult to implement through desktop or web applications using push notifications or persistent storage.

Conclusion

Secured yet accessible storage is a critical problem for any data analyst or scientist. Using an established IRM or implementing a similar tool helps secure access and empower analytics.