Green Fields Ahead

I can do anything!

Greenfield projects in software imply that there won’t be a legacy system to accommodate. So this time everything can be done right, all that “old cruft” can be ignored, and we’ll all be coding in a land of rainbows, joy, and sunshine.

The reality is that greenfield projects are considered riskier. My earlier article on Getting to Enterprise explains one warning about greenfield risk: starting with enterprise tech because “you’ll need it anyway.”

Feel free to search for articles on the risks of greenfield development. This article focuses on what the coders need and do during greenfield and where not having legacy helps and hinders them.

There’s so much to choose!

How can we write code before we know so many things? For instance, we need to know:

  • The continuous integration system
  • The version control system
  • What kind of containerization we’re using
  • Whether we’ll be using Kubernetes to manage the containers
  • Which platform: AWS? Azure? Google?
  • What stack?

Of course, we need to lay out everything above before we start coding, because we want to get it right!

Nonsense.

That sounds strange? I’ll repeat it: nonsense!

Even ignoring the modern promotion of agile methodologies (vs waterfall, which would require you to decide on the deployment in advance, since everything is decided in advance), the implication is that at the start, when you know the least about this new endeavor and what it will need, you should commit to a particular deployment model.

In fact, there is no inherent need to choose most of what’s in that list up front.

Why starting with solution-space technology isn’t effective

The key driving point for any software coding effort is to understand the problem thoroughly. Wallow in the data, saturate yourself in the intentions of the people who will need your solution. Choose a good representation of the problem and model that in your solution so their intentions can be carried out.

This has nothing to do with how it deploys. Deployment, fundamentally, is a dev-ops exercise.

This doesn’t mean dev-ops is not important. It’s not possible to succeed with your project if you can’t deploy.

But until you have running code, what will you deploy?

Until you’ve measured the performance of your running code (speed, memory needed, storage required, etc.) how will you be able to choose the deployment technology best suited? How will you know when you need to optimize some area, to perhaps affect the choice of later deployment?

You don’t need to commit up-front to nearly as much as you might imagine. For instance, as a simple mental exercise, answer the following:

How does my server side JS code change between Express.js and AWS Lambda?

Simple enough question, right?

Some people would point out that all the routing, the access of the query parameters and body (assuming JSON body) and the callbacks to specify the return headers change. That’s a lot of change!

Here’s a secret: if you have to change the code that does the required work on receipt of the API message because you have changed the code that receives the API request, you have written your code badly.

How can I say such a thing? Of course the code between Lambda and Express.js is different! Any review of the documentation of the two products shows it!

I have spent the last year teaching developers from non-traditional backgrounds how to build good code. Imagine (as they actually implement) a facility for doing “gross to net” calculations for South African Revenue Service PAYE and UIF that will eventually be able to handle any country in the world. The details don’t matter. What matters is the basic way the working code is called:

// calling in unit test:
import * as calc from "./calc.js";

// populate an object with key/value pairs of data on
// period gross earnings, periods per year, and age

const result = calc.execute(source);

// use the data in result, which is an object of key/value
// pairs

// throw exception if result != expected result

The real system is a bit more elaborate, but not much more. The point is, calc.execute(source) -> result, and both source and result can be transformed to/from JSON.

Hooking this to Lambda requires the following Lambda-specific boundary code:

import * as calc from "./calc.js";

export const handler = async (event) => {

    // route checking logic, extract JSON body in event to source

    const source = ...; // use event object here
    const result = calc.execute(source);

    // convert the result into JSON and push it back
    return {
        statusCode: 200,
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(result),
    };
};

Of course, that looks nothing like my AWS lambda, but that’s how all the demo programs show you to do it. The point remains: nothing in that above code requires a single line of modification to the working gross to net logic inside calc.js.

Well, what about Express.js?

import express from "express";
import * as calc from "./calc.js";

const app = express();
app.use(express.json()); // parse JSON request bodies into req.body

const route = ...;

app.post(route, (req, res) => {
    // fetch from req the JSON for the source
    const source = ...; // use req object here
    const result = calc.execute(source);

    // deliver the result
    res.send( /* format result here as needed */ );
});

The block of code for Express is different from the one for Lambda, but it is also “new code” that alters nothing in the logic inside calc.js.

Why is it that unit testing, choosing Express.js, or choosing Lambda has no impact on the code in calc.js?

Separation of concerns ensures that code for integration with an API engine (any API engine) doesn’t leak into the code that does the work.

It’s this same capability that allows us to write unit tests for calc.js without having to set up any form of API at all. It’s easy to unit-test a pure functional API that accepts inputs and produces outputs. So easy it’s almost unfair.

So easy that excellent developers do it all the time.
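To make that concrete, here is a minimal sketch of such a test. The execute function below is only a stand-in for the one in calc.js: the field names (periodGross, periodsPerYear, age, periodDeduction, periodNet) and the flat 20% deduction are hypothetical placeholders I made up for illustration, not the real PAYE or UIF rules.

```javascript
// Stand-in for the execute() exported by calc.js.
// ASSUMPTION: the field names and the flat 20% rate are hypothetical
// placeholders, not real PAYE/UIF rules.
const HYPOTHETICAL_RATE_PERCENT = 20;

function execute(source) {
    const annualGross = source.periodGross * source.periodsPerYear;
    const annualDeduction = (annualGross * HYPOTHETICAL_RATE_PERCENT) / 100;
    const periodDeduction = annualDeduction / source.periodsPerYear;
    return { periodDeduction, periodNet: source.periodGross - periodDeduction };
}

// The whole test: build plain input, call, compare against plain output.
function checkExecute() {
    const source = { periodGross: 10000, periodsPerYear: 12, age: 40 };
    const expected = { periodDeduction: 2000, periodNet: 8000 };
    const result = execute(source);

    // Both objects are plain JSON-friendly data with the same key order,
    // so comparing their JSON text is enough for this sketch.
    if (JSON.stringify(result) !== JSON.stringify(expected)) {
        throw new Error(
            `expected ${JSON.stringify(expected)}, got ${JSON.stringify(result)}`
        );
    }
    return true;
}

checkExecute();
```

No server, no router, no request object. Any test runner would do here; a bare throw is used only to keep the sketch free of dependencies.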

Because of this one aspect of coding discipline, the only technical decision you need to enable developers to start working on solving the domain problem at hand is “What language and version of that language are we using for this component?”

I implied that my gross to net calculation was JS using ES6 (because I used import instead of require). That is the only solution space choice needed to get to work.

There is code unique to each eventual choice of solution tech!

As the above thought experiment with Lambda vs Express.js shows, there is per-solution unique boundary code. That’s generally the case.

Because we rejected selection of the deployment technology in advance, we as developers writing code were constrained (limited!) in what we could do within our code. We couldn’t take advantage of any of the deployment tech!

For instance, suppose an exception triggers deep within the guts of calc.js. How do we send the error details back to the client who used the API? Don’t we have to know how to send responses?

Of course not! Knowing that would cripple our unit testing, forcing us into mocks. Even thinking such a thing implies that it’s not possible to separate concerns.

Instead, throw a custom exception with the details wrapped.

Let the boundary code unique to Express or to Lambda (or whatever your chosen tech stack) catch the domain errors and transform them into whatever the tech requires.
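A minimal sketch of that split might look like the following. Everything named here is an assumption for illustration: the DomainError class, the validation rule inside the stand-in execute, the toLambdaResponse helper, and the choice of HTTP status 422 for domain errors. The response shape (statusCode, headers, body) is the one Lambda proxy integrations expect.

```javascript
// Domain side: knows nothing about HTTP, Express, or Lambda.
class DomainError extends Error {
    constructor(message, details) {
        super(message);
        this.name = "DomainError";
        this.details = details; // plain data describing what went wrong
    }
}

// Stand-in for calc.execute(); the validation rule is a hypothetical example.
function execute(source) {
    if (!(source.periodsPerYear > 0)) {
        throw new DomainError("periodsPerYear must be positive", {
            field: "periodsPerYear",
            value: source.periodsPerYear,
        });
    }
    return { annualGross: source.periodGross * source.periodsPerYear };
}

// Boundary side (Lambda-flavoured): translate outcomes into a response object.
// ASSUMPTION: mapping domain errors to status 422 is my choice for this sketch.
function toLambdaResponse(source) {
    const headers = { "Content-Type": "application/json" };
    try {
        return { statusCode: 200, headers, body: JSON.stringify(execute(source)) };
    } catch (err) {
        if (err instanceof DomainError) {
            return {
                statusCode: 422,
                headers,
                body: JSON.stringify({ error: err.message, details: err.details }),
            };
        }
        throw err; // not a domain problem; let the platform report it
    }
}
```

An Express version would catch the same DomainError and call res.status(...).json(...) instead. The domain code would not change.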

One of the best sanity tests of your choices is being able to answer the question “How does that get back to the user?” with “Don’t know, don’t care.” The domain code throws a domain exception. That’s the extent of the knowledge needed at that point.

That restriction, that limitation, that constraint, that absence of knowledge ensures that the code is freely reusable, from simple test harnesses to arbitrarily complex deployments.

But I want to write it once and have it run anywhere!

Attempting to make solution-domain code (such as the routing layer of Express.js vs the routing of Lambda) portable can be done, but it’s hard. And it’s not feasible to build that abstraction until you have two working versions (one per tech) to refactor into some form of common framework.

Such a framework, written once and run on different competing vendors, can thus emerge based on your needs. Should you make it rich enough (which is conceptually possible), you can reuse it for other projects. That’s how frameworks that let a single SQL statement work across many SQL vendors came about, for instance. But it’s a project of its own, and using it has a learning curve. That’s where enterprise frameworks come from, and starting with enterprise frameworks makes it harder.

If you end up with such a framework of your own, please do share. It will join all the other frameworks that try it — and likely it will be as complex as all the others as well.

What about other boundaries than API?

The concept of separation of concerns applies at every connection between your code and anything else. This is where Dependency Injection comes into play, for instance.

Any time code reaches for anything not “within its control,” that dependency increases the pain of writing unit tests. While I’m not going to promote Test Driven Development as a process (because I’m process agnostic), I am going to point out: if you can’t easily (without mocks) make a unit test for a function or class, you have failed to manage your boundaries well!
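Here is a minimal sketch of what injecting a dependency can look like in plain JavaScript. The names makeCalculator and lookupRatePercent, the field names, and the 10% rate are all hypothetical. The idea is only that the calculation receives its rate source as an argument instead of reaching out to a file, database, or service on its own.

```javascript
// ASSUMPTION: makeCalculator, lookupRatePercent, and the field names are
// hypothetical; a real rate source might be a file, a database, or a
// remote service.
function makeCalculator(lookupRatePercent) {
    // The returned execute() only uses what it was handed.
    return function execute(source) {
        const ratePercent = lookupRatePercent(source.age);
        const periodDeduction = (source.periodGross * ratePercent) / 100;
        return { periodDeduction, periodNet: source.periodGross - periodDeduction };
    };
}

// In a test, the dependency is just a plain function that returns 10 (percent).
const executeWithFixedRate = makeCalculator(() => 10);
const result = executeWithFixedRate({ periodGross: 5000, age: 30 });
console.log(result.periodNet); // 4500
```

No mocking library is involved. The boundary code decides what the real lookup is, and the test supplies an ordinary function in its place.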

While I coached the developers in my program, they told me more than once that something must be used, or that something wasn’t possible given the environment or technology. Oddly, they always managed to maintain separation of concerns once they realized I wasn’t going to give them a “special case” exemption. And the result of what they did was testable!

The goal isn’t to try to write portable code (though that’s a pleasant side-effect). The goal isn’t to write easily testable code (though that’s also a pleasant side effect). The goal is to write code which precisely performs the intended result on the problem’s representation without requiring knowledge beyond the code being worked on.

The rallying cry for this is “Don’t know, don’t care!” because that’s how you know you aren’t dependent on something.

But what about Docker vs dedicated EC2 images?

If your code has to directly access something that is affected by Docker or EC2 or Lambda, and it’s not code intentionally unique that is part of the boundary, then your code needs to be refactored or rewritten.

I could do an example above to show what changes between Docker and EC2. The problem is, both versions would likely look the same. Docker and EC2 (or any other similar tech) cause minimal to no code changes. They do profoundly change the deployment, of course, but not the code they bundle.

This is why, as a developer, I’m container agnostic. I develop on a box, be it laptop or desktop, be it Mac, Linux, or Windows. If I’m developing in JavaScript ES6 then the different OS and platform affect only how the code is launched. I don’t have to think, “Now, since this is going to deploy in Docker, I need to change how I code it.”

What differences there are in the launcher are boundary differences, almost always in configuration scripts and such. Only very rarely code.
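As a small sketch of keeping those launcher differences at the boundary: the code below reads its settings in exactly one place and hands plain values to everything else. The variable names PORT and TAX_YEAR and their defaults are hypothetical. How the environment gets populated (a container setting, a launch script, a shell profile) is the deployment’s business.

```javascript
// Boundary: the only place that knows settings come from environment variables.
// ASSUMPTION: PORT and TAX_YEAR, and their defaults, are hypothetical names
// and values used for illustration.
function readConfig(env) {
    return {
        port: Number(env.PORT ?? 3000),
        taxYear: Number(env.TAX_YEAR ?? 2024),
    };
}

// Everything past the boundary receives plain values and never touches
// process.env, so it behaves the same on a laptop, in a container, or on a VM.
function describeRun(config) {
    return `tax year ${config.taxYear} on port ${config.port}`;
}

console.log(describeRun(readConfig(process.env)));
```

Because readConfig takes the environment as an argument, it can be exercised with a plain object in a test, the same way the domain code is.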

When do I care about dev-ops then?

If you’ve got a large enough team, the dev-ops group will be working in parallel, so that they can measure the performance of each of the versions of the domain code. They will ask some dev to write the boundary specific bit (or do it themselves in some cases) but that doesn’t impact the problem domain code at all.

If there aren’t enough people to have dedicated dev-ops, then it will become a later stage in the process.

I’ve had people attempt to explain, “I have to know in advance how the deployment will work before I write code so I don’t rewrite code.” As described above, the only code that could need rewriting is the boundary-specific code, which is tiny.

Once the unit tests show that all the code for “doing the job” works, that’s when the small team can turn their attention to packaging and deployment.

In order to enter production, the system must be deployed. So deferring the dev-ops to the later phases doesn’t mean going live without scalability or testing. It simply means that by the time dev-ops work starts, the majority of the coding work is done, or at least stable enough that there aren’t large changes to the externally exposed entry points.

Only the entry points are needed to write the boundary code anyway!

Attempting to implement the deployment path in advance means re-doing the dev-ops work constantly — or worse, it means refusing to change the external entry points because that would break the deployment.

That pre-locked in choice then makes later work harder — that becomes the new legacy that is hated but must be endured. You will have done it to yourselves, which is even more painful.

Lead with enterprise, lead with solution space mandates, and you doom your coders to legacy restrictions in advance.

Lead with representing the problem and capturing the intentions, code for testability, and your deployment is done more easily with better understanding of what it needs to do. Your coders are free to create the most capable entry points that support fully the problem to be solved without “working around” anything. That sounds like the point, right?

And if you don’t start with deployment until the code is stable, there’s no cost for re-doing deployment (or having screaming fights between dev-ops and dev) until the code works. No re-doing means lower costs.

What about estimation, costs, and business models which need target environment choices to get price quotes?

It is reasonable when doing the business planning and estimation to seek details such as what it will cost to deploy and run.

For instance, AWS can get very expensive if used naively.

When I do high-speed algorithmic trading development for clients, AWS is often my recommendation. AWS makes ingestion from the market feed free and charges for delivery out of its network. For trading, thousands or millions of ticks are ingested and then orders (a tiny number of bytes) are placed with brokers. Reports are also produced, which again are tiny. This makes AWS really cost-effective for that type of workflow.

Compare that with a workflow that involves doing custom video (say, writing a competitor to MS Teams). The ingestion of the video streams is free, but pushing one person’s stream out to six other members is expensive comparatively. Cloudflare might serve that need a lot more cost-effectively.

So, shouldn’t those decisions be made before we start coding? So we know if we’re going AWS or Cloudflare, or which parts will be AWS and which Cloudflare?

Those decisions don’t affect the coding! If the system requires a process that takes a set of audio/video streams and merges them, that process will work regardless of which platform hosts the streams. The code won’t change. The choice of programming language would be the driving force for developers.

AWS vs Azure vs Google vs … those have little or no effect on the code developed. The most effect they would have is on the boundary code, which is the least complex code: its complexity is purely what Fred Brooks calls accidental complexity. So long as your working code passes tests, you can integrate it into any of the deployment environments.

Problems with integration are caused by poor integration technology and poorly written code to integrate. You control the choice of integration tech — but you own the code to integrate.

Conclusion

While it may seem counter-intuitive, when doing greenfield development, let the problem drive the code, and the working code drive the choice of deployment technology.

Don’t force a new legacy system into being to make your development painful before you even get started!

Keep the Light,
Otter
Brian Jones
