I wrote this blog real quick, errors may exist, YMMV.
I’ve been exploring DevOps tooling for immutable infrastructure and ephemeral server environments. I’ve been doing this work inside a company that relies primarily on Microsoft and Windows technology, a departure from my usual OSS and *nix tech stacks.
During the course of my trials, I ran into a need to speed up orchestration and provisioning of virtual machines in Azure. I had already developed an orchestration and provisioning framework with Ansible, but the total time to spin up a new box was 25–30 minutes. While that isn’t an awful amount of time, there was some excitement around seeing whether we could get it down to about 5 minutes.
To speed up the process of creating new VMs, I had to deal with the time provisioning takes. Creating a VM in Azure takes about 3–5 minutes. Running through all of my Ansible playbooks, however, took another 20–25 minutes. If I could eliminate those 20–25 minutes of provisioning, I could hit our 5 minute goal for a fully configured VM. To do that, we need to build an image of a fully provisioned VM ahead of time and then deploy copies of that image. Enter Packer.
I’m generally a big fan of HashiCorp tech. While Packer has some impressive attributes, it’s a bit rough around the edges. In particular, there’s room for improvement around launching provisioning, exposing variables outside of the Packer process, and documentation. Those problems are compounded when you take tooling built with a *nix philosophy and start applying it to proprietary MS technology (WinRM).
There are a couple of gotchas with the default WinRM setup on a Packer compute node. I won’t spend much time on them, but here they are:
1 — By default Kerberos is the only auth mechanism available.
2 — The winrm_password variable doesn’t work (the default randomized password can’t be overridden).
3 — There is a fixed set of variables that can be passed to your provisioner, and the randomized hostname of the compute node isn’t one of them. The same goes for the IP address.
4 — The “suggested/default” mechanism for using Ansible to provision with Packer is to install an Ansible plugin specifically to wrap the Packer connection to the VM. I don’t understand this approach, and it seems like a pain.
Those are some problems, but what’s the solution? The one I came up with boils down to the following:
1 — Use the “shell-local” provisioner to call Ansible from your local shell, as though you were manually running Ansible from the CLI.
2 — Override the default randomized compute node name with something static that we can expect to exist later in the process.
3 — Use Ansible dynamic inventory to search for VMs with the compute node name we specified in step 2. This way Ansible queries Azure by hostname and dynamically builds an inventory for us containing the Packer compute node’s IP address.
4 — Use the “powershell” provisioner to enable basic auth (local username and password) in WinRM before launching our Ansible provisioner.
Let’s take a look at our Packer JSON file.
Our first section, “variables”, is where we define our known VM compute node name. When Packer launches an Azure VM to base our image on, this variable provides both the hostname and the VM name in Azure.
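It looks roughly like this; the name itself is just an example (anything that fits Windows’ 15-character computer-name limit works), and the empty credential variables are there so the var-file described below has somewhere to land:

"variables": {
  "computename": "packerbuild01",
  "client_id": "",
  "client_secret": "",
  "subscription_id": "",
  "tenant_id": ""
},

In the azure-arm builder, computename gets consumed by the temp_compute_name option, which is what overrides the randomized compute node name.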
Our “builders” section is fairly typical. There are a few things to note:
1 — I’m defining a separate file with my Azure credentials. That file gets passed in as a command line argument, e.g.
packer build -var-file=~/azure-secrets.json packerbuild.json
2 — I’ve specified a virtual network and subnet to build our VM inside of. This forces Packer to build the VM on a private network inside Azure and ensures the box gets a private IP address rather than being exposed to the open internet.
3 — Everything else in our builder is a fairly typical Packer directive.
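Trimmed down, the builder looks roughly like this; the image, resource group, network names, and VM size are placeholders rather than the values from my environment:

"builders": [
  {
    "type": "azure-arm",
    "client_id": "{{user `client_id`}}",
    "client_secret": "{{user `client_secret`}}",
    "subscription_id": "{{user `subscription_id`}}",
    "tenant_id": "{{user `tenant_id`}}",
    "os_type": "Windows",
    "image_publisher": "MicrosoftWindowsServer",
    "image_offer": "WindowsServer",
    "image_sku": "2016-Datacenter",
    "vm_size": "Standard_DS2_v2",
    "build_resource_group_name": "build-rg",
    "managed_image_resource_group_name": "image-rg",
    "managed_image_name": "provisioned-base-image",
    "temp_compute_name": "{{user `computename`}}",
    "virtual_network_resource_group_name": "network-rg",
    "virtual_network_name": "private-vnet",
    "virtual_network_subnet_name": "build-subnet",
    "communicator": "winrm",
    "winrm_use_ssl": true,
    "winrm_insecure": true,
    "winrm_timeout": "30m",
    "winrm_username": "packer"
  }
],

The azure-secrets.json var-file just supplies values for the client_id, client_secret, subscription_id, and tenant_id user variables.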
Our “provisioners” section is where the magic happens. I’ll walk through the pieces below and then show the whole section stitched together.
1 — In our provisioners we expose some internal Packer variables to the environment of our local shell. This sets corresponding shell variables for the randomized Packer VM password and for our known compute node name.
"environment_vars": ["WINRMPASS={{.WinRMPassword}}", "COMPUTE_NAME={{user `computename`}}"],
2 — We enable basic authentication for WinRM with a powershell provisioner.
winrm set winrm/config/service/auth '@{Basic=\"true\"}'
3 — We provide our hostname to Ansible dynamic inventory (more on this later).
echo \"- name != '$COMPUTE_NAME'\" >> dynamic_azure_rm.yml
4 — We launch Ansible to provision our VM.
ansible-playbook -vvv -i \"dynamic_azure_rm.yml\" -u packer -e ansible_password=\"$WINRMPASS\" -e@../variables/vars.yml ../playbooks/provisionedbox.yml
5 — We generalize our VM with Sysprep. (This allows creation of a generic image.)
& $env:SystemRoot\\System32\\Sysprep\\Sysprep.exe /oobe /generalize /quiet /quit", "while($true) { $imageState = Get-ItemProperty HKLM:\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Setup\\State | Select ImageState; if($imageState.ImageState -ne 'IMAGE_STATE_GENERALIZE_RESEAL_TO_OOBE') { Write-Output $imageState.ImageState; Start-Sleep -s 10 } else { break } }
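Stitched together, the whole “provisioners” section looks roughly like this (the playbook path, variables file, and packer user are the ones described above; treat the rest as a sketch rather than a drop-in file):

"provisioners": [
  {
    "type": "powershell",
    "inline": [
      "winrm set winrm/config/service/auth '@{Basic=\"true\"}'"
    ]
  },
  {
    "type": "shell-local",
    "environment_vars": [
      "WINRMPASS={{.WinRMPassword}}",
      "COMPUTE_NAME={{user `computename`}}"
    ],
    "inline": [
      "echo \"- name != '$COMPUTE_NAME'\" >> dynamic_azure_rm.yml",
      "ansible-playbook -vvv -i \"dynamic_azure_rm.yml\" -u packer -e ansible_password=\"$WINRMPASS\" -e@../variables/vars.yml ../playbooks/provisionedbox.yml"
    ]
  },
  {
    "type": "powershell",
    "inline": [
      "& $env:SystemRoot\\System32\\Sysprep\\Sysprep.exe /oobe /generalize /quiet /quit",
      "while($true) { $imageState = Get-ItemProperty HKLM:\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Setup\\State | Select ImageState; if($imageState.ImageState -ne 'IMAGE_STATE_GENERALIZE_RESEAL_TO_OOBE') { Write-Output $imageState.ImageState; Start-Sleep -s 10 } else { break } }"
    ]
  }
]

The ordering matters: basic auth has to be enabled over Packer’s existing WinRM connection before the shell-local step hands the same credentials off to Ansible.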
After this runs through, we get our Packer-built image. But what’s the deal with Ansible and dynamic inventory?
There’s a lot that could be (and has been) said about dynamic inventory with Ansible, and I won’t re-invent the wheel. However, I will show you my example and talk a bit about it.
Here’s my example of a dynamic inventory YAML file for Azure.
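It looks roughly like this, using the azure_rm inventory plugin; the resource group name is a placeholder, and exclude_host_filters is deliberately the last key so the echo from the provisioner can append our name filter beneath it:

plugin: azure_rm
include_vm_resource_groups:
- build-rg
auth_source: auto
exclude_host_filters:
- powerstate != 'running'

As an aside, the azure_rm plugin only accepts inventory files whose names end in azure_rm.yml (or .yaml), which dynamic_azure_rm.yml happens to satisfy.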
This file gets parsed, queries for VMs in an Azure resource group, and then excludes any VM whose hostname makes the expression “name != 'known-vm-name'” evaluate to true. In other words, it queries for all VMs with the hostname known-vm-name. The results of that query are provided to Ansible as an inventory, complete with the IP address Ansible needs to connect to our remote Packer node.
A quick mention of our vars.yml file (the one passed with -e@ above). It provides some configuration parameters for our WinRM connection. Ansible expects a solid network connection over WinRM, and I’ve found that to be lacking in my current environment (not sure why), so I’ve tossed in some params to increase timeout values. Also, let’s ignore that self-signed certificate on our target node.
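The WinRM-related pieces of that file look something like this; the exact timeout values below are illustrative, so tune them to whatever your connection needs:

ansible_connection: winrm
ansible_port: 5986
ansible_winrm_scheme: https
ansible_winrm_transport: basic
ansible_winrm_server_cert_validation: ignore
ansible_winrm_operation_timeout_sec: 120
ansible_winrm_read_timeout_sec: 150

The read timeout needs to stay higher than the operation timeout, and cert validation is set to ignore because the build VM only has the self-signed certificate generated for WinRM over HTTPS.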
All in all, that’s the gist. There’s a lot more that could be said on the nuances of Packer, Ansible, WinRM, Azure, and the like. It’s worth pointing out that there is some discussion in the Packer community around better methods of exposing variables and hand-offs to provisioners. Hopefully that discussion will lead to future improvements. In the meantime, here’s a workaround to a problem a few folks (myself included) seem to have with the current model.
