Works on my Machine

One of the most insidious obstacles to continuous delivery (and to continuous flow in software delivery generally) is the works-on-my-machine phenomenon. Anyone who has worked on a software development team or an infrastructure support team has experienced it. Anyone who works with such teams has heard the phrase spoken during (attempted) demos. The issue is so common there’s even a badge for it:

Perhaps you have earned this badge yourself. I have several. You should see my trophy room.
There’s a longstanding tradition on Agile teams that may have originated at ThoughtWorks around the turn of the century. It goes like this: When someone violates the ancient engineering principle, “Don’t do anything stupid on purpose,” they have to pay a penalty. The penalty might be to drop a dollar into the team snack jar, or something much worse (for an introverted technical type), like standing in front of the team and singing a song. To explain a failed demo with a glib “<shrug>Works on my machine!</shrug>” qualifies.
It may not be possible to avoid the problem in all situations. As Forrest Gump said…well, you know what he said. But we can minimize the problem by paying attention to a few obvious things. (Yes, I understand “obvious” is a word to be used advisedly.)
Pitfall 1: Leftover configuration
Problem: Leftover configuration from previous work enables the code to work on the development environment (and maybe the test environment, too) while it fails on other environments.
Pitfall 2: Development/test configuration differs from production
The solutions to this pitfall are so similar to those for Pitfall 1 that I’m going to group the two.
Solution (tl;dr): Don’t reuse environments.
Common situation: Many developers set up an environment they like on their laptop/desktop or on the team’s shared development environment. The environment grows from project to project, as more libraries are added and more configuration options are set. Sometimes the configurations conflict with one another, and teams/individuals often make manual configuration adjustments depending on which project is active at the moment. It doesn’t take long for the development configuration to become very different from the configuration of the target production environment. Libraries that are present on the development system may not exist on the production system. You may run your local tests assuming you’ve configured things the same as production only to discover later that you’ve been using a different version of a key library than the one in production. Subtle and unpredictable differences in behavior occur across development, test, and production environments. The situation creates challenges not only during development, but also during production support work when we’re trying to reproduce reported behavior.
Solution (long): Create an isolated, dedicated development environment for each project
There’s more than one practical approach. You can probably think of several. Here are a few possibilities:
- Provision a new VM (locally, on your machine) for each project. (I had to add “locally, on your machine” because I’ve learned that in many larger organizations, developers must jump through bureaucratic hoops to get access to a VM, and VMs are managed solely by a separate functional silo. Go figure.)
- Do your development in an isolated environment (including testing in the lower levels of the test automation pyramid), like Docker or similar.
- Do your development on a cloud-based development environment that is provisioned by the cloud provider when you define a new project.
- Set up your continuous integration (CI) pipeline to provision a fresh VM for each build/test run, to ensure nothing will be left over from the last build that might pollute the results of the current build.
- Set up your continuous delivery (CD) pipeline to provision a fresh execution environment for higher-level testing and for production, rather than promoting code and configuration files into an existing environment (for the same reason). Note that this approach also gives you the advantage of linting, style-checking, and validating the provisioning scripts in the normal course of a build/deploy cycle. Convenient.
All those options won’t be feasible for every conceivable platform or stack. Pick and choose, and roll your own as appropriate. In general, all these things are pretty easy to do if you’re working on Linux. All of them can be done for other *nix systems with some effort. Most of them are reasonably easy to do with Windows; the only issue there is licensing, and if your company has an enterprise license, you’re all set. For other platforms, such as IBM zOS or HP NonStop, expect to do some hand-rolling of tools.
Anything that’s feasible in your situation and that helps you isolate your development and test environments will be helpful. If you can’t do all these things in your situation, don’t worry about it. Just do what you can do.
Provision a new VM locally
If you’re working on a desktop, laptop, or shared development server running Linux, FreeBSD, Solaris, Windows, or OSX, then you’re in good shape. You can use virtualization software such as VirtualBox or VMware to stand up and tear down local VMs at will. For the less-mainstream platforms, you may have to build the virtualization tool from source.
One thing I usually recommend is that developers cultivate an attitude of laziness in themselves. Well, the right kind of laziness, that is. You shouldn’t feel perfectly happy provisioning a server manually more than once. Take the time during that first provisioning exercise to script the things you discover along the way. Then you won’t have to remember them and repeat the same mis-steps again. (Well, unless you enjoy that sort of thing, of course.)
For example, here are a few provisioning scripts that I’ve come up with when I needed to set up development environments. These are all based on Ubuntu Linux and written in Bash. I don’t know if they’ll help you, but they work on my machine.
- Provision a Node dev environment
- Provision a Python dev environment
- Provision a COBOL dev environment
- Provision a Java dev environment
- Provision a Ruby dev environment
If your company is running RedHat Linux in production, you’ll probably want to adjust these scripts to run on CentOS or Fedora, so that your development environments will be reasonably close to the target environments. No big deal.
If you want to be even lazier, you can use a tool like Vagrant to simplify the configuration definitions for your VMs.
One more thing: Whatever scripts you write and whatever definition files you write for provisioning tools, keep them under version control along with each project. Make sure whatever is in version control for a given project is everything necessary to work on that project…code, tests, documentation, scripts…everything. This is rather important, I think.
Do your development in a container
One way of isolating your development environment is to run it in a container. Most of the tools you’ll read about when you search for information about containers are really orchestration tools intended to help us manage multiple containers, typically in a production environment. For local development purposes, you really don’t need that much functionality. There are a couple of practical containers for this purpose:
These are Linux-based. Whether it’s practical for you to containerize your development environment depends on what technologies you need. To containerize a development environment for another OS, such as Windows, may not be worth the effort over just running a full-blown VM. For other platforms, it’s probably impossible to containerize a development environment.
Develop in the cloud
This is a relatively new option, and it’s feasible for a limited set of technologies. The advantage over building a local development environment is that you can stand up a fresh environment for each project, guaranteeing you won’t have any components or configuration settings left over from previous work. Here are a couple of options:
Expect to see these environments improve, and expect to see more players in this market. Check which technologies and languages are supported so see whether one of these will be a fit for your needs. Because of the rapid pace of change, there’s no sense in listing what’s available as of the date of this article.
Generate test environments on the fly as part of your CI build
Once you have a script that spins up a VM or configures a container, it’s easy to add it to your CI build. The advantage is that your tests will run on a pristine environment, with no chance of false positives due to leftover configuration from previous versions of the application or from other applications that had previously shared the same static test environment, or because of test data modified in a previous test run.
Many people have scripts that they’ve hacked up to simplify their lives, but they may not be suitable for unattended execution. Your scripts (or the tools you use to interpret declarative configuration specifications) have to be able to run without issuing any prompts (such as prompting for an administrator password). They also need to be idempotent (that is, it won’t do any harm to run them multiple times, in case of restarts). Any runtime values that must be provided to the script have to be obtainable by the script as it runs, and not require any manual “tweaking” prior to each run.
The idea of “generating an environment” may sound infeasible for some stacks. Take the suggestion broadly. For a Linux environment, it’s pretty common to create a VM whenever you need one. For other environments, you may not be able to do exactly that, but there may be some steps you can take based on the general notion of creating an environment on the fly.
For example, a team working on a CICS application on an IBM mainframe can define and start a CICS environment any time by running it as a standard job. In the early 1980s, we used to do that routinely. As the 1980s dragged on (and continued through the 1990s and 2000s, in some organizations), the world of corporate IT became increasingly bureaucratized until this capability was taken out of developers’ hands.
Strangely, as of 2017 very few development teams have the option to run their own CICS environments for experimentation, development, and initial testing. I say “strangely” because so many other aspects of our working lives have improved dramatically, while that aspect seems to have moved in retrograde. We don’t have such problems working on the front end of our applications, but when we move to the back end we fall through a sort of time warp.
From a purely technical point of view, there’s nothing to stop a development team from doing this. It qualifies as “generating an environment,” in my view. You can’t run a CICS system “in the cloud” or “on a VM” (at least, not as of 2017), but you can apply “cloud thinking” to the challenge of managing your resources.
Similarly, you can apply “cloud thinking” to other resources in your environment, as well. Use your imagination and creativity. Isn’t that why you chose this field of work, after all?
Generate production environments on the fly as part of your CD pipeline
This suggestion is pretty much the same as the previous one, except that it occurs later in the CI/CD pipeline. Once you have some form of automated deployment in place, you can extend that process to include automatically spinning up VMs or automatically reloading and provisioning hardware servers as part of the deployment process. At that point, “deployment” really means creating and provisioning the target environment, as opposed to moving code into an existing environment.
This approach solves a number of problems beyond simple configuration differences. For instance, if a hacker has introduced anything to the production environment, rebuilding that environment out of source that you control eliminates that malware. People are discovering there’s value in rebuilding production machines and VMs frequently even if there are no changes to “deploy,” for that reason as well as to avoid “configuration drift” that occurs when we apply changes over time to a long-running instance.
Many organizations run Windows servers in production, mainly to support third-party packages that require that OS. An issue with deploying to an existing Windows server is that many applications require an installer to be present on the target instance. Generally, information security people frown on having installers available on any production instance. (FWIW, I agree with them.)
If you create a Windows VM or provision a Windows server on the fly from controlled sources, then you don’t need the installer once the provisioning is complete. You won’t re-install an application; if a change is necessary, you’ll rebuild the entire instance. You can prepare the environment before it’s accessible in production, and then delete any installers that were used to provision it. So, this approach addresses more than just the works-on-my-machine problem.
When it comes to back end systems like zOS, you won’t be spinning up your own CICS regions and LPARs for production deployment. The “cloud thinking” in that case is to have two identical production environments. Deployment then becomes a matter of switching traffic between the two environments, rather than migrating code. This makes it easier to implement production releases without impacting customers. It also helps alleviate the works-on-my-machine problem, as testing late in the delivery cycle occurs on a real production environment (even if customers aren’t pointed to it yet).
The usual objection to this is the cost (that is, fees paid to IBM) to support twin environments. This objection is usually raised by people who have not fully analyzed the costs of all the delay and rework inherent in doing things the “old way.”
Pitfall 3: Unpleasant surprises when code is merged
Problem: Different teams and individuals handle code check-out and check-in in various ways. Some check out code once and modify it throughout the course of a project, possibly over a period of weeks or months. Others commit small changes frequently, updating their local copy and committing changes many times per day. Most teams fall somewhere between those extremes.
Generally, the longer you keep code checked out and the more changes you make to it, the greater the chances of a collision when you merge. It’s also likely that you will have forgotten exactly why you made every little change, and so will the other people who have modified the same chunks of code. Merges can be a hassle.
During these merge events, all other value-add work stops. Everyone is trying to figure out how to merge the changes. Tempers flare. Everyone can claim, accurately, that the system works on their machine.
Solution: A simple way to avoid this sort of thing is to commit small changes frequently, run the test suite with everyone’s changes in place, and deal with minor collisions quickly before memory fades. It’s substantially less stressful.
The best part is you don’t need any special tooling to do this. It’s just a question of self-discipline. On the other hand, it only takes one individual who keeps code checked out for a long time to mess everyone else up. Be aware of that, and kindly help your colleagues establish good habits.
Pitfall 4: Integration errors discovered late
Problem: This problem is similar to Pitfall 3, but one level of abstraction higher. Even if a team commits small changes frequently and runs a comprehensive suite of automated tests with every commit, they may experience significant issues integrating their code with other components of the solution, or interacting with other applications in context.
The code may work on my machine, as well as on my team’s integration test environment, but as soon as we take the next step forward, all hell breaks loose.
Solution: There are a couple of solutions to this problem. The first is static code analysis. It’s becoming the norm for a continuous integration pipeline to include static code analysis as part of every build. This occurs before the code is compiled. Static code analysis tools examine the source code as text, looking for patterns that are known to result in integration errors (among other things).
Static code analysis can detect structural problems in the code such as cyclic dependencies and high cyclomatic complexity, as well as other basic problems like dead code and violations of coding standards that tend to increase cruft in a codebase. It’s just the sort of cruft that causes merge hassles, too.
A related suggestion is to take any warning-level errors from static code analysis tools and from compilers as real errors. Accumulating warning-level errors is a great way to end up with mysterious, unexpected behaviors at runtime.
The second solution is to integrate components and run automated integration test suites frequently. Set up the CI pipeline so that when all unit-level checks pass, then integration-level checks are executed automatically. Let failures at that level break the build, just as you do with the unit-level checks.
With these two methods, you can detect integration errors as early as possible in the delivery pipeline. The earlier you detect a problem, the easier it is to fix.
Pitfall 5: Deployments are nightmarish all-night marathons
Problem: Circa 2017 it’s still common to find organizations where people have “release parties” whenever they deploy code to production. Release parties are just like all-night frat parties, only without the fun.
The problem is that the first time applications are executed in a production-like environment is when they are executed in the real production environment. Many issues only become visible when the team tries to deploy to production.
Of course, there’s no time or budget allocated for that. People working in a rush may get the system up and running somehow, but often at the cost of regressions that pop up later in the form of production support issues.
And it’s all because at each stage of the delivery pipeline, the system “worked on my machine,” whether a developer’s laptop, a shared test environment configured differently from production, or some other unreliable environment.
Solution: The solution is to configure every environment throughout the delivery pipeline as close to production as possible. The following are general guidelines that you may need to modify depending on local circumstances.
If you have a staging environment, rather than twin production environments, it should be configured with all internal interfaces live and external interfaces stubbed, mocked, or virtualized. Even if this is as far as you take the idea, it will probably eliminate the need for release parties. But if you can, it’s good to continue upstream in the pipeline, to reduce unexpected delays in promoting code along.
Test environments between development and staging should be running the same version of the OS and libraries as production. They should be isolated at the appropriate boundary based on the scope of testing to be performed.
At the beginning of the pipeline, if it’s possible develop on the same OS and same general configuration as production. It’s likely you will not have as much memory or as many processors as in the production environment. The development environment also will not have any live interfaces; all dependencies external to the application will be faked.
At a minimum, match the OS and release level to production as closely as you can. For instance, if you’ll be deploying to Windows Server 2016, then use a Windows Server 2016 VM to run your quick CI build and unit test suite. Windows Server 2016 is based on NT 10, so do your development work on Windows 10 because it’s also based on NT 10. Similarly, if the production environment is Windows Server 2008 R2 (based on NT 6.1) then develop on Windows 7 (also based on NT 6.1). You won’t be able to eliminate every single configuration difference, but you will be able to avoid the majority of incompatibilities.
Follow the same rule of thumb for Linux targets and development systems. For instance, if you will deploy to RHEL 7.3 (kernel version 3.10.x), then run unit tests on the same OS if possible. Otherwise, look for (or build) a version of CentOS based on the same kernel version as your production RHEL (don’t assume). At a minimum, run unit tests on a Linux distro based on the same kernel version as the target production instance. Do your development on CentOS or a Fedora-based distro to minimize inconsistencies with RHEL.
If you’re using a dynamic infrastructure management approach that includes building OS instances from source, then this problem becomes much easier to control. You can build your development, test, and production environments from the same sources, assuring version consistency throughout the delivery pipeline. But the reality is that very few organizations are managing infrastructure in this way as of 2017. It’s more likely that you’ll configure and provision OS instances based on a published ISO, and then install packages from a private or public repo. You’ll have to pay close attention to versions.
If you’re doing development work on your own laptop or desktop, and you’re using a cross-platform language (Ruby, Python, Java, etc.), you might think it doesn’t matter which OS you use. You might have a nice development stack on Windows or OSX (or whatever) that you’re comfortable with. Even so, it’s a good idea to spin up a local VM running an OS that’s closer to the production environment, just to avoid unexpected surprises.
For embedded development where the development processor is different from the target processor, include a compile step in your low-level TDD cycle with the compiler options set for the target platform. This can expose errors that don’t occur when you compile for the development platform. Sometimes the same version of the same library will exhibit different behaviors when executed on different processors.
Another suggestion for embedded development is to constrain your development environment to have the same memory limits and other resource constraints as the target platform. You can catch certain types of errors early by doing this.
For some of the older back end platforms, it’s possible to do development and unit testing off-platform for convenience. Fairly early in the delivery pipeline, you’ll want to upload your source to an environment on the target platform and buld/test there.
For instance, for a C++ application on, say, HP NonStop, it’s convenient to do TDD on whatever local environment you like (assuming that’s feasible for the type of application), using any compiler and a unit testing framework like CppUnit.
Similarly, it’s convenient to do COBOL development and unit testing on a Linux instance using GnuCOBOL; much faster and easier than using OEDIT on-platform for fine-grained TDD.
But in these cases the target execution environment is very different from the development environment. You’ll want to exercise the code on-platform early in the delivery pipeline to eliminate works-on-my-machine surprises.
Summary
The author’s observation is that the works-on-my-machine problem is one of the leading causes of developer stress and lost time. The author further observes that the main cause of the works-on-my-machine problem is differences in configuration across development, test, and production environments.
The basic advice is to avoid configuration differences to the extent possible. Take pains to ensure all environments are as similar to production as is practical. Pay attention to OS kernel versions, library versions, API versions, compiler versions, and the versions of any home-grown utilities and libraries. When differences can’t be avoided, then make note of them and treat them as risks. Wrap them in test cases to provide early warning of any issues.
The second suggestion is to automate as much testing as possible at different levels of abstraction, merge code frequently, build the application frequently, run the automated test suites frequently, deploy frequently, and (where feasible) build the execution environment frequently. This will help you detect problems early, while the most recent changes are still fresh in your mind, and while the issues are still minor.
Let’s fix the world so that the next generation of software developers doesn’t understand the phrase, “Works on my machine.”
 
            
Comments (2)
Raja Nagendra Kumar
This kind of responses clearly say the primitive engineering skills of the team/systems. No product should ever think of products engineering without proper CI/CD
RC
Bravo! This article hits the nail squarely on the head: the “builds on my machine” fallacy is not just confined to developer workstations, but ALL static “pet” servers, including production.
It astonishes me that, in 2018, the same developers who understand the concept of setUp() and tearDown() in their unit tests don’t understand the importance of applying the same to their deployment environments. As a consequence, their servers are inevitably as unique as snowflakes, and deployment to production becomes a spin of the roulette wheel.
If you’re not able to create virtual machines or containers “on the fly”, then at the very least you should be cleaning your environments after every use, and then recreating their configuration from code (e.g. Ansible) on each subsequent cycle. And this includes Production.