Quasi Quotes, Serializability, and Scala

Serialization of functions may seem like a security concern. However, it is also a major need. Tools like Java RMI, Spark, and Akka owe at least part of their reputation to this ability. Where flexibility is needed and systems are less exposed, distributing functions is not necessarily a bad thing. The concepts presented here require Scala 2.11 or a later 2.x release.

A few use cases to get started:

  • Distributing custom functions with pre-written code
  • Keeping logic centralized in systems that may be spread over multiple Linux containers managed by tools such as Mesos
  • Executing user code in a trusted, secure, non-networked, and isolated environment
  • Tools like JSFiddle, but for Scala

Follow me through reflection because, well, I may need to explain it to someone soon.

Reflection

Scala lets a programmer inspect and manipulate the abstract syntax tree and change elements of code. There is an article on the Scala website about this. The tool here is reflection, the ability of a program to inspect and manipulate itself. Programs on a runtime that compiles just in time and keeps type metadata available while running, such as the JVM, are much more easily manipulated than those that do not. Reflection performed while the program is running is called runtime reflection.

Reflection involves the ability to reify (ie. make explicit) otherwise-implicit elements of a program. These elements can be either static program elements like classes, methods, or expressions, or dynamic elements like the current continuation or execution events such as method invocations and field accesses. One usually distinguishes between compile-time and runtime reflection depending on when the reflection process is performed. Compile-time reflection is a powerful way to develop program transformers and generators, while runtime reflection is typically used to adapt the language semantics or to support very late binding between software components. (Heather Miller, Eugene Burmako, and Philipp Haller, Scala documentation)
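As a small taste of runtime reflection, the sketch below inspects a type while the program runs and calls a getter chosen by name. It assumes scala-reflect is on the class path; Point, RuntimeReflectionDemo, fieldNames, and readField are hypothetical names made up for illustration.

```scala
import scala.reflect.runtime.universe._

// Hypothetical example type, used only to have something to inspect
case class Point(x: Int, y: Int)

object RuntimeReflectionDemo {
  // Names of Point's case accessors, discovered from its type at run time,
  // in declaration order
  def fieldNames: List[String] =
    typeOf[Point].decls.sorted.collect {
      case m: MethodSymbol if m.isCaseAccessor => m.name.toString
    }

  // Call a getter chosen by name through a reflective mirror
  // instead of writing p.x or p.y directly
  def readField(p: Point, name: String): Any = {
    val mirror = runtimeMirror(getClass.getClassLoader)
    val getter = typeOf[Point].decl(TermName(name)).asMethod
    mirror.reflect(p).reflectMethod(getter)()
  }
}
```

Here readField(Point(3, 4), "y") returns 4 even though the field name is only known as a string at run time.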

Reflection could be useful in building ETL parsing tools, tuning code for operating systems, or changing query builders to fit certain databases.

Now for my specific use case. I need to be able to serialize code and pass it to child processes running in separate JVMs. Why? To reduce management issues and improve flexibility in software I am writing for work.

Scala Macros

Scala macros look like functions but manipulate the program's syntax trees at compile time. Their uses include tasks such as tuning functions, CSV parsing, and generating data validators. The data validation link provides a nice overview of macros before digging into the Scala website. It is possible to build a basic tree from code in this way.
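To see what such a tree looks like without writing a macro, reify can capture a snippet of code as a tree instead of running it. This is a minimal sketch assuming scala-reflect on the class path; TreeDemo, source, and literals are hypothetical names.

```scala
import scala.reflect.runtime.universe._

object TreeDemo {
  // reify captures the code as a typed syntax tree instead of running it
  val expr: Expr[Int => Int] = reify { (i: Int) => i + 1 }

  // showCode prints the tree back as Scala source
  def source: String = showCode(expr.tree)

  // Walk the tree and collect the value of every literal in it
  def literals: List[Any] = expr.tree.collect { case Literal(Constant(v)) => v }
}
```

A macro implementation works on the same kind of tree; the difference is that it receives the tree from the compiler rather than from reify.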

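Scala 2.10-Style Macros

In the original macro API (introduced in Scala 2.10, and deprecated but still available in 2.11), we declare a method whose body is the macro keyword followed by a reference to the implementation method. T comes from the type parameter of the type enclosing the method, and V is the method's own type parameter. This link lets us define the elements of the macro in the implementation.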

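import scala.language.experimental.macros

// An object cannot take type parameters, so the enclosing type is a class
class Enclosure[T] {
    // The body is the macro keyword plus a reference to the implementation
    def myFunction[V](f: T => V): Enclosure[V] = macro Func_Impl.myFunction[T, V]
}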

Our macro takes a function from T to V and returns an Enclosure[V]. Its implementation is the method myFunction in the object Func_Impl.

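We then define the implementation (a basic, slight deviation from the example on the Scala website). The implementation runs at compile time, while the macro is being expanded, and T and V may still be abstract at that point, so we ask for weak type tags (c.WeakTypeTag) rather than full type tags. The tags, like the trees, come from the macro context c. The implementation takes the context, then the expression for the function transforming the data, and returns an expression for the result.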

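import scala.reflect.macros.Context // scala.reflect.macros.blackbox.Context in Scala 2.11+

object Func_Impl {
    // Weak type tags, since T and V may be abstract when the macro expands
    def myFunction[T: c.WeakTypeTag, V: c.WeakTypeTag](c: Context)(f: c.Expr[T => V]): c.Expr[Enclosure[V]] = ...
}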

Inside the implementation, the tree in f can be inspected or rewritten, and reify can turn ordinary Scala code into the expression the macro returns, following the explicit structure in which a value of type T is transformed into a V.

This code is meant only as an introduction; the macro documentation on the Scala website goes into much more depth on this subject, including how to manipulate symbol trees and types.

Scala 2.11 Macros

Scala 2.11 changed macros quite a bit. There are now blackbox and whitebox macros. A blackbox macro faithfully follows its signature: its transformation can be understood from its input and output types alone, without knowing its inner workings (think of testing code against an interface). A whitebox macro may refine its result type beyond what its signature declares, so its signature says less about what it does.

Creation of macros changed significantly as well: an implementation now takes a context from either scala.reflect.macros.blackbox or scala.reflect.macros.whitebox, and the old scala.reflect.macros.Context is deprecated. Tags have changed as well, with the changes documented on the Scala website.

Quasi Quotes

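Quasiquotes take the difficulty out of Scala macros for our task. Instead of assembling trees by hand, we write ordinary Scala code inside q"..." and get the corresponding tree back. We simply write the code, parse and define it with Scala's runtime toolbox (scala.tools.reflect.ToolBox, which needs scala-compiler on the class path), and then obtain the function through evaluation. The toolbox, not the quasiquote itself, performs the compilation.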

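import scala.reflect.runtime.universe.runtimeMirror
import scala.tools.reflect.ToolBox // needs scala-compiler on the class path

// The toolbox parses, compiles, and evaluates trees at run time
val tb = runtimeMirror(getClass.getClassLoader).mkToolBox()
import tb.u._ // quasiquotes and tree types from the toolbox's own universe

val f = q"def myFunction(i: Int): Int = i + 1"
// Wrap the method in an object so the toolbox can define it as a module
val wrapper = "object FunctionWrapper { " + f + " }"
val symbol = tb.define(tb.parse(wrapper).asInstanceOf[ImplDef])

// Eta-expand the method into a function value and evaluate it
val func = tb.eval(q"${Ident(symbol)}.myFunction _").asInstanceOf[Int => Int]
func(1)

// result should be 2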

The code here built the method with a quasiquote, wrapped it in an object and parsed the result, defined that object with the toolbox to obtain its symbol, and evaluated a reference to the method to obtain the function.
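Quasiquotes work in both directions: they build trees, and used as patterns they take trees apart. The sketch below needs only scala-reflect, since nothing is compiled; QuasiquoteDemo, operands, and body are hypothetical names chosen for illustration.

```scala
import scala.reflect.runtime.universe._

object QuasiquoteDemo {
  // Build trees from code snippets; $rhs splices an existing tree into a new one
  val rhs: Tree = q"i * 2"
  val defn: Tree = q"def double(i: Int): Int = $rhs"

  // Quasiquotes also work as patterns that take a tree apart
  def operands(t: Tree): Option[(Tree, Tree)] = t match {
    case q"$a * $b" => Some((a, b))
    case _          => None
  }

  // Pull the body back out of a method definition with one parameter list
  def body(t: Tree): Option[Tree] = t match {
    case q"def $name(..$params): $tpt = $b" => Some(b)
    case _                                  => None
  }
}
```

Matching this way is how code received as a tree could be checked for an expected shape before it is handed to the toolbox.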

Serialization

With a cursory view of macros, we can now look at serialization. It is actually quite simple: here, serialization just means taking a string and serializing it. Arbitrary classes do not extend the Serializable trait, and neither does Any, so an arbitrary value cannot simply be written out. Even a function value that is serializable can only be read back if the receiving JVM already has its compiled class on the class path. Therefore, it is necessary to find alternate means to serialize the code. Some blogs recommend approaches such as shim functions, which may better suit your needs. However, String is serializable, and as long as the functions are defined in the code we send, quasiquotes and the toolbox can compile them on the other side. Just ensure that any other libraries the code links against are already on the class path.

Serializing a class that carries the code as a string is then simple.

// Carries the function's source code as a String alongside other data
@SerialVersionUID(10L)
class OurClass(val code: String, val v2: Int, val v3: Double) extends Serializable {
    override def toString = s"{code:'$code',v2: '$v2',v3: '$v3'}"
}
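To tie the pieces together, here is a sketch of a round trip: Java serialization of the holder on the sending side, then deserialization and compilation of the code string with the toolbox on the receiving side. It assumes scala-compiler is on the class path; ShipCode, toBytes, fromBytes, and compile are hypothetical names, and OurClass is repeated so the block compiles on its own. Both sides run in one JVM here purely for illustration.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream, ObjectStreamClass}
import scala.reflect.runtime.universe.runtimeMirror
import scala.tools.reflect.ToolBox

// Same shape as OurClass above: the function travels as its source code
@SerialVersionUID(10L)
class OurClass(val code: String, val v2: Int, val v3: Double) extends Serializable {
  override def toString = s"{code:'$code',v2: '$v2',v3: '$v3'}"
}

object ShipCode {
  // Sender side: plain Java serialization into a byte array
  def toBytes(o: OurClass): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(o) finally out.close()
    bytes.toByteArray
  }

  // Receiver side: resolve classes with this loader so OurClass is found
  def fromBytes(b: Array[Byte]): OurClass = {
    val loader = getClass.getClassLoader
    val in = new ObjectInputStream(new ByteArrayInputStream(b)) {
      override protected def resolveClass(desc: ObjectStreamClass): Class[_] =
        Class.forName(desc.getName, false, loader)
    }
    try in.readObject().asInstanceOf[OurClass] finally in.close()
  }

  // Receiver side: compile the source string into a function with the toolbox
  def compile(code: String): Int => Int = {
    val tb = runtimeMirror(getClass.getClassLoader).mkToolBox()
    tb.eval(tb.parse(code)).asInstanceOf[Int => Int]
  }
}
```

For example, sending new OurClass("(i: Int) => i + 1", 2, 3.0) through toBytes and fromBytes and then compiling its code yields a function that maps 1 to 2. The cast in compile is only as safe as the code received, which is why the security measures below matter.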

Security Mechanisms

This is by no means secure or, in many cases, a bright idea. However, when it is possible and the usefulness outweighs the problems, there are some measures that add a bit of security.

  • Compute and pass checksums (Adler-32, for example) or Hamming codes along with the code
  • Encrypt transmissions
  • Enforce access control (user permissions, user names, and passwords)
  • Isolate the environments that execute this sort of code
  • Only generate and run code passed in this manner on internal networks
  • If absolutely necessary, use a VPN
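The checksum idea from the first item can be sketched with the JDK's java.util.zip.Adler32: the sender attaches the checksum of the code string, and the receiver recomputes it before compiling anything. CodeChecksum, adler32, and verify are hypothetical names for this illustration.

```scala
import java.nio.charset.StandardCharsets
import java.util.zip.Adler32

object CodeChecksum {
  // Adler-32 over the UTF-8 bytes of the code string
  def adler32(code: String): Long = {
    val checksum = new Adler32()
    checksum.update(code.getBytes(StandardCharsets.UTF_8))
    checksum.getValue
  }

  // The receiver recomputes the checksum and rejects mismatched payloads
  def verify(code: String, expected: Long): Boolean = adler32(code) == expected
}
```

As a sanity check, the Adler-32 of "Wikipedia" is 0x11E60398. Adler-32 catches accidental corruption but not deliberate tampering, since anyone who changes the code can recompute it; a keyed MAC such as HmacSHA256 via javax.crypto.Mac would be the stronger choice for that.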

Conclusion

There are certainly more complex cases, but this is a starting point for reflection, macros, quasiquotes, and serialization, and a way to tie them together. The linked resources should prove useful for more depth.
