Infrastructure as code?

I had an interesting conversation with a friend of mine, @dtcb, about the size of the code we deploy.

In all of the infrastructure efforts I’ve been part of for the last 6 years, the size of the code deployed grew, sometimes to the point that the operational complexity was so high that the code had to be broken apart. This is in line with the thinking behind micro-services as an architectural solution: break the complexity of a monolithic architecture into a piecemeal composition of components that is in turn easier to deploy and debug.

What struck me at that time is that micro-services might in fact be too big for deployment. What if instead you could deploy classes? Literally, a simple interface that did one (or very few) things well. What if your operating system was in a way an API to deploy services (it is!) but the size of the code deployed was so small that it would in turn be hard to make mistakes? What if each of these classes had built-in monitoring, tracing and debugging hooks? What if each running operator (see where I’m going?) could run multiple versions of itself? What if you let a higher-level runtime (higher than the programming language runtime) worry about the performance and gluing of components?

In a way this is exactly what a CQRS/Streaming architecture affords the developer. Notice that I said ‘architecture’ and not ‘Samza, Storm, etc.’ because as of today all these services require you to deploy a fat jar or tar.gz file that contains the glue itself. It is not left up to a higher-level construct to build up the ‘Computational Topology’ of your code. As a developer you specify (in pseudo code):

// assuming an aggregation here.
//
val t = new topology
t.group_by ( some field in stream, typically user id)
 .operator( usually a commutative and associative - like Sum())
 .save( to some database )
 .emit( next tuple to chain more operators )
 .....
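
To make the pseudo code above a little more concrete, here is a minimal single-process sketch in Python. The `Topology` class, its `run` method and the `value_fn` parameter are all made up for illustration; they are not the API of Samza, Storm or any other real system, and a real runtime would distribute this work instead of looping over a list.

```python
class Topology:
    """Hypothetical builder: records the steps, then runs them over an iterable."""

    def __init__(self):
        self._key_fn = None
        self._reduce_fn = None
        self._sinks = []

    def group_by(self, key_fn):
        self._key_fn = key_fn
        return self

    def operator(self, reduce_fn):
        # reduce_fn should be commutative and associative, e.g. addition.
        self._reduce_fn = reduce_fn
        return self

    def save(self, store):
        # Any dict-like object stands in for "some database" here.
        self._sinks.append(store)
        return self

    def run(self, records, value_fn):
        totals = {}
        for record in records:
            key = self._key_fn(record)
            value = value_fn(record)
            if key in totals:
                totals[key] = self._reduce_fn(totals[key], value)
            else:
                totals[key] = value
            for store in self._sinks:
                store[key] = totals[key]
            # Emit the updated aggregate so another operator could be chained.
            yield key, totals[key]


events = [
    {"user_id": "a", "clicks": 1},
    {"user_id": "b", "clicks": 5},
    {"user_id": "a", "clicks": 2},
]
database = {}
topology = (
    Topology()
    .group_by(lambda r: r["user_id"])
    .operator(lambda x, y: x + y)
    .save(database)
)
emitted = list(topology.run(events, value_fn=lambda r: r["clicks"]))
```

The point of the sketch is that the developer only describes the shape of the computation; where and how `run` executes is left to something else.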

So where does ‘infrastructure as code’ come into play?

Imagine some magical command line tool that allowed you to deploy a single class/interface and run it, regardless of the programming language. What matters more than just ‘running’ your code is that the tool would manage the connections between components.

What if you could write a source of data in Python and then have a heavy machine learning algorithm in Java? What if you empowered each team in an organization to deploy code, and the code carried enough (possibly implicit) information about the size of the job and how it would in turn execute in physical space? If you achieve this goal, you have in turn achieved ‘infrastructure as code’. You would be able to modify your running infrastructure by running more code.

For example, suppose you started with code as follows:

  new topology().addStream("hdfs://file").map((x: String) => println(x))

and then you added a simple filter:

new topology()
    .addStream("hdfs://file")
    .filter((x: String) => x == "blue you know you're my boy!")
    .map((x: String) => println(x))

What if, instead of just tearing everything down, there was a system that allowed you to modify the runtime of these applications to run the filter function in the same process, on the same machine, or across a series of machines, without having to take down your running code?

I do know that the Erlang VM allows you to deploy byte code on a running VM. And I know that Spark gives you this native (Scala) interface for dealing with streams across multiple machines.

I want both. Really, I want the ability to redeploy the map function at runtime: pause all its upstream operators, deploy a new one and keep producing. I also want the ability to add some arbitrary computation graph above the mapping function without affecting anything else. In fact I don’t even want to write them in the same file. I just want to refer to these functions by name.
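
As a rough illustration of "refer to functions by name and redeploy them at runtime", here is a toy single-process sketch in Python. `OperatorRegistry`, `deploy` and `call` are hypothetical names invented for this example, and the lock is only a stand-in for pausing upstream operators; a real system would have to do this across processes and machines.

```python
import threading


class OperatorRegistry:
    """Hypothetical registry: operators are looked up by name on every call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._operators = {}

    def deploy(self, name, fn):
        # Holding the lock stands in for "pausing upstream operators":
        # lookups wait briefly while the new version is swapped in.
        with self._lock:
            self._operators[name] = fn

    def call(self, name, record):
        with self._lock:
            fn = self._operators[name]
        return fn(record)


registry = OperatorRegistry()
registry.deploy("shout", lambda line: line.upper())


def pipeline(lines):
    # The pipeline only knows the operator's name, never the function itself.
    for line in lines:
        yield registry.call("shout", line)


stream = pipeline(["one", "two", "three"])
first = next(stream)  # processed by the first version

# Redeploy under the same name while the pipeline is still running.
registry.deploy("shout", lambda line: line + "!")
rest = list(stream)  # processed by the new version
```

Because the pipeline resolves the name on every record, the second deploy takes effect without restarting the generator that is already in flight.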

I suppose I’m trying to solve some of these challenges. I think that as of today we don’t have the tools (at least I don’t know of them) that would allow us to have this simple programming paradigm, but we have some of the tools.

Obviously not complete, here is a sample of the system I’m working on:

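# milliseconds_now, gen_tweets, thrift_to_bytes and rbonut come from the
# surrounding system and are not shown here.
class MockTweetComputation:
    def initialize(self, ctx):
        # Request a timer at the current time to kick off the first batch.
        ctx.set_timer(milliseconds_now())

    def process_timer(self, ctx, timer):
        # Publish every generated tweet to the 'tweets' stream ...
        for tweet in gen_tweets():
            ctx.produce_record('tweets', '_', thrift_to_bytes(tweet))
        # ... then re-arm the timer one second from now.
        ctx.set_timer(milliseconds_now() + 1000)

    def metadata(self):
        # Declare the operator's name plus the streams it reads and writes.
        return rbonut.Metadata(
                name='mock-tweets-generator',
                istreams=[],
                ostreams=['tweets']
                )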

You’ve figured it out!

In essence, a globally unique pub-sub system gives you the flexibility I’m talking about here. It would allow you to refer to functions by name (same as a program), and the arguments to your function would be explicitly declared, as would the output of your program, so that some scheduler/oracle could co-locate/inline/scale work for you.
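
Here is a small in-process sketch of that idea in Python: operators declare their input and output streams, and a broker wires them together purely by stream name. The `Broker`, `Doubler` and `Collector` classes are hypothetical and are not part of the system described in this post; they only show how explicit declarations give a scheduler enough information to connect the pieces.

```python
from collections import defaultdict, deque


class Doubler:
    name = "doubler"
    istreams = ["numbers"]
    ostreams = ["doubled"]

    def process(self, stream, value, produce):
        produce("doubled", value * 2)


class Collector:
    name = "collector"
    istreams = ["doubled"]
    ostreams = []

    def __init__(self):
        self.seen = []

    def process(self, stream, value, produce):
        self.seen.append(value)


class Broker:
    """Hypothetical in-process pub-sub: routes records by stream name only."""

    def __init__(self):
        self._subscribers = defaultdict(list)
        self._queue = deque()

    def register(self, operator):
        # The declared istreams are all the broker needs to wire things up.
        for stream in operator.istreams:
            self._subscribers[stream].append(operator)

    def publish(self, stream, value):
        self._queue.append((stream, value))

    def run(self):
        # Drain the queue, handing each record to every subscriber.
        while self._queue:
            stream, value = self._queue.popleft()
            for operator in self._subscribers[stream]:
                operator.process(stream, value, self.publish)


broker = Broker()
collector = Collector()
broker.register(Doubler())
broker.register(collector)
for n in [1, 2, 3]:
    broker.publish("numbers", n)
broker.run()
```

Neither operator knows the other exists. Since the only coupling is the stream name, a smarter broker would be free to inline the two operators, run them on different machines, or scale one of them out.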

In brief, the above operator is something like this in C++:

#include <list>

struct Tweet {
  char data[140];
};

std::list<Tweet> mockTweetGenerator() {
  std::list<Tweet> retval;
  // ... do lots of work
  return retval;
}

Notice that the istreams of mock-tweets-generator is an empty list, just as the function takes no arguments. However, the ostreams (return values) is a list of one ‘type’.

Ok… so what’s that about code size?

Well, the argument is that by making the code small, you’ll make fewer mistakes. You’ll be able to iterate faster on your problems. It’s the same case people have been making for decades.

What is exciting for me is that we - at work - built a system that actually empowers you to do just that. For all programming languages, and through a simple pub-sub model, you can effectively change how you chain complex computational models, with some small added overhead.

We built the glue, and hopefully our application developers can just build the application and deploy code without fear. Deploy code fast.

Note: It is clear to me that this paradigm doesn’t work for all the problems in the world. You would need a unified runtime and a unified programming language, which is almost what Erlang gives you. However, this would defeat one of our goals, which is to allow anyone with any programming language to take advantage of this model.

Soon, I’ll talk about our isolation, message guarantees, etc.

Thoughts? - please let me know!