Infrastructure as Code (IaC) Part 2: Configuration Management With Ansible
Part 2 of our Infrastructure as Code series, where we use Ansible to configure and deploy a complete Kubernetes cluster on the infrastructure we built with Terraform, demonstrating the power of combining infrastructure provisioning with configuration management.

In this second installment of our Infrastructure as Code series, we’ll bridge the gap between infrastructure provisioning and application deployment by introducing Ansible for configuration management. While Terraform excels at creating and managing infrastructure resources, Ansible shines at configuring and maintaining the software stack that runs on top of that infrastructure.
Ansible is an agentless automation platform that uses SSH to execute tasks across multiple systems simultaneously. Unlike infrastructure provisioning tools, Ansible focuses on configuration management, application deployment, and orchestration. It uses simple, human-readable YAML playbooks to describe automation jobs, making it accessible to both developers and system administrators.
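To make the playbook format concrete, here is a minimal, hypothetical example (the webservers group and the nginx package are illustrative placeholders, not part of this series):
---
# Illustrative playbook: install nginx on a hypothetical "webservers" inventory group
- name: Configure web servers
  hosts: webservers        # inventory group to target
  become: true             # escalate privileges (like sudo)
  tasks:
    - name: Ensure nginx is installed
      ansible.builtin.apt:
        name: nginx
        state: present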
In this tutorial, we’ll take the 5-VM cluster we created in Part 1 and transform it into a fully functional Kubernetes cluster using Ansible. You’ll learn how to:
- Integrate Terraform and Ansible for end-to-end infrastructure automation
- Build reusable Ansible roles for consistent configuration management
- Automate Kubernetes cluster deployment using kubeadm and container runtime setup
- Implement dynamic inventory generation that adapts to changing infrastructure
- Create deployment pipelines that combine infrastructure provisioning with application readiness
This hands-on approach demonstrates a real-world DevOps pattern where infrastructure provisioning and configuration management work together seamlessly. By the end of this tutorial, you’ll have a complete automation pipeline that can provision virtual machines and configure them into a production-ready Kubernetes cluster with a single command.
Prerequisites and Current State
This tutorial picks up right where Part 1 of our Infrastructure as Code series: Introduction to Terraform left off. To follow along, you’ll need to have already completed that Terraform tutorial and have your 5-VM cluster infrastructure ready. This cluster, made up of one master node and four worker nodes, will be the environment where we deploy Kubernetes.
It’s crucial that you complete Part 1 first because the Ansible configuration we’re about to build relies on the specific VM setup, networking, and cloud-init templates established there. The seamless integration we’ll demonstrate between Terraform and Ansible depends on both tools working with the exact same infrastructure state and SSH key configuration.
Install Ansible
First, we must install the required packages for Ansible:
# On Ubuntu/Debian:
sudo apt update
sudo apt install ansible
# On RHEL/CentOS/Fedora:
sudo dnf install ansible
# or on older systems:
sudo yum install ansible
# On macOS:
brew install ansible
# Alternative: Install via pip (works on any OS with Python):
pip install ansible
Also install required system packages:
# On Ubuntu/Debian:
sudo apt install jq ssh-client
# On RHEL/CentOS/Fedora:
sudo dnf install jq openssh-clients
# On macOS:
brew install jq
# (ssh is already included)
Ansible Collections
Ansible collections are distribution formats for packaging and distributing Ansible content including playbooks, roles, modules, and plugins. They extend Ansible’s core functionality by providing specialized modules for specific platforms and services. For our Kubernetes deployment, we need collections that can interact with Kubernetes APIs and provide additional system administration capabilities.
# Install Kubernetes collection (required for control-plane role)
ansible-galaxy collection install kubernetes.core
# Install community.general collection (often useful)
ansible-galaxy collection install community.general
Python Dependencies
# Install Python Kubernetes client (required by kubernetes.core collection)
pip install kubernetes
# Alternative: install via system package manager
# On Ubuntu/Debian:
sudo apt install python3-kubernetes
# On RHEL/CentOS/Fedora:
sudo dnf install python3-kubernetes
Project Structure
Before diving into Ansible configuration, we need to restructure our project to accommodate both Terraform and Ansible components. This organization follows DevOps best practices by separating infrastructure provisioning from configuration management while maintaining clear relationships between components.
Understanding how Ansible organizes automation content is crucial for building maintainable configurations. Ansible uses several key concepts:
- Roles: Reusable units of automation that group related tasks, variables, and files
- Playbooks: YAML files that define which roles to apply to which hosts
- Inventory: Files that define the hosts and groups that Ansible will manage
Our project structure separates these concerns while enabling seamless integration between Terraform’s infrastructure provisioning and Ansible’s configuration management.
First, create a parent directory to house both your Terraform and Ansible projects. Move your existing introduction-to-terraform directory into this new structure, then create the Ansible directory alongside it.
Your final project structure should look like this:
infrastructure-as-code/
├── Makefile
├── introduction-to-terraform/
| ├── main.tf # Primary resource definitions
| ├── variables.tf # Input variable declarations
| ├── outputs.tf # Output value definitions
| ├── locals.tf # Local value computations
| └── cloud-init/ # VM initialization templates
| ├── user-data.tpl # User and SSH configuration
| └── network-config.tpl # Static IP configuration
└── configuration-with-ansible/
├── ansible.cfg # Ansible configuration file with SSH settings and output formatting
├── generate_inventory.sh # Script to parse Terraform output and generate Ansible inventory
├── inventory.ini # Generated inventory file (created by generate_inventory.sh)
├── site.yml # Main Ansible playbook that orchestrates all roles
└── roles/ # Directory containing all Ansible roles
├── common/ # Role for common tasks across all nodes
│ └── tasks/
│ └── main.yml # Disables swap, loads kernel modules, sets sysctl parameters
├── containerd/ # Role for container runtime installation and configuration
│ └── tasks/
│ └── main.yml # Installs containerd and configures systemd cgroup driver
├── kubernetes/ # Role for Kubernetes component installation
│ └── tasks/
│ └── main.yml # Installs kubelet, kubeadm, kubectl with version pinning
├── control-plane/ # Role for Kubernetes master node setup
│ └── tasks/
│ └── main.yml # Runs kubeadm init, sets up kubeconfig, installs Calico CNI
└── worker/ # Role for Kubernetes worker node setup
└── tasks/
└── main.yml # Joins worker nodes to the cluster using kubeadm join
Configuring Ansible for Dynamic Infrastructure
Ansible’s configuration file (ansible.cfg) controls how Ansible behaves when connecting to and managing remote hosts. When working with dynamic infrastructure, where IP addresses and host details change frequently, specific configuration optimizations become essential for reliability and performance.
The configuration settings we’ll implement address several challenges common in automated infrastructure environments:
- SSH connection optimization reduces overhead when managing multiple hosts simultaneously
- Security settings handle the dynamic nature of VM IP addresses and SSH keys
- Performance tuning enables faster execution across multiple nodes
- User configuration accommodates the cloud-init user setup from our Terraform deployment
These settings ensure Ansible can reliably connect to and manage the VMs created by Terraform, even when those VMs are destroyed and recreated with different SSH host keys.
Create a new file configuration-with-ansible/ansible.cfg:
[defaults]
host_key_checking = False
pipelining = True
gathering = smart
fact_caching = memory
stdout_callback = yaml
bin_ansible_callbacks = True
[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes
Dynamic Inventory Generation
Static inventory files work well for stable infrastructure, but they become a maintenance burden when working with dynamic infrastructure that changes frequently. In our Terraform-Ansible integration, VM IP addresses and host details are determined at provision time, making static inventory files impractical.
Dynamic inventory generation solves this problem by programmatically extracting infrastructure details from Terraform’s state and converting them into Ansible inventory format. This approach ensures that Ansible always has current, accurate information about the infrastructure it needs to manage, eliminating manual inventory maintenance and reducing the potential for configuration drift.
Create a new file named generate_inventory.sh in the configuration-with-ansible directory:
#!/usr/bin/env bash
set -e
TF_OUTPUT_JSON="$1"
INVENTORY_FILE="$2"
if [[ ! -f "$TF_OUTPUT_JSON" ]]; then
echo "Error: Terraform output file $TF_OUTPUT_JSON not found"
exit 1
fi
# Extract SSH configuration from Terraform outputs
SSH_USER=$(jq -r '.ssh_user.value // "ubuntu"' "$TF_OUTPUT_JSON")
SSH_KEY=$(jq -r '.ssh_private_key_path.value // "~/.ssh/id_rsa"' "$TF_OUTPUT_JSON")
# Extract IPs and create inventory entries with SSH config
masters=$(jq -r '.master_ips.value // {} | to_entries[] | "\(.key) ansible_host=\(.value)"' "$TF_OUTPUT_JSON")
workers=$(jq -r '.worker_ips.value // {} | to_entries[] | "\(.key) ansible_host=\(.value)"' "$TF_OUTPUT_JSON")
# Create inventory file
{
echo "[masters]"
if [[ -n "$masters" ]]; then
echo "$masters"
fi
echo ""
echo "[workers]"
if [[ -n "$workers" ]]; then
echo "$workers"
fi
echo ""
echo "[all:vars]"
echo "ansible_user=$SSH_USER"
echo "ansible_ssh_private_key_file=$SSH_KEY"
echo "ansible_ssh_common_args='-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null'"
} > "$INVENTORY_FILE"
echo "Generated inventory file: $INVENTORY_FILE"
cat "$INVENTORY_FILE"
This script demonstrates several important bash scripting and JSON processing techniques:
- Error handling: The set -e directive ensures the script exits immediately if any command fails
- JSON parsing: Uses jq to extract specific values from Terraform’s JSON output, with fallback defaults using the // operator
- String processing: Constructs Ansible inventory entries by combining host names with IP addresses
- File generation: Creates a properly formatted INI-style inventory file with host groups and global variables
The script separates master and worker nodes into distinct inventory groups, enabling Ansible to apply different roles to different node types. The [all:vars] section provides SSH configuration that applies to all hosts, ensuring consistent connection behavior across the entire cluster.
Testing Connectivity
After generating the inventory, verify that Ansible can successfully connect to all nodes:
# Make the script executable
chmod +x configuration-with-ansible/generate_inventory.sh
# Test the inventory generation (assuming Terraform has been applied)
cd infrastructure-as-code
configuration-with-ansible/generate_inventory.sh introduction-to-terraform/terraform_output.json configuration-with-ansible/inventory.ini
If everything is working properly, the generate_inventory.sh script should have generated an inventory file and returned something similar to:
Generated inventory file: configuration-with-ansible/inventory.ini
[masters]
master-1 ansible_host=192.168.122.100
[workers]
worker-1 ansible_host=192.168.122.101
worker-2 ansible_host=192.168.122.102
worker-3 ansible_host=192.168.122.103
worker-4 ansible_host=192.168.122.104
[all:vars]
ansible_user=ubuntu
ansible_ssh_private_key_file=~/.ssh/id_rsa
ansible_ssh_common_args='-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null'
Now, use ansible to perform a ping test on the hosts in your newly created inventory file:
# Test Ansible connectivity
ANSIBLE_CONFIG=configuration-with-ansible/ansible.cfg ansible -i \
configuration-with-ansible/inventory.ini all -m ping
You should get output similar to:
PLAY [Ansible Ad-Hoc] *******************************************************************************
TASK [ping] *****************************************************************************************
ok: [master-1]
ok: [worker-3]
ok: [worker-2]
ok: [worker-1]
ok: [worker-4]
PLAY RECAP ******************************************************************************************
master-1 : ok=1 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
worker-1 : ok=1 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
worker-2 : ok=1 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
worker-3 : ok=1 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
worker-4 : ok=1 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
Building Reusable Ansible Roles
Ansible roles provide a structured way to organize automation content into reusable components. Each role encapsulates a specific piece of functionality, such as installing containerd, configuring networking, or setting up a database, making it easy to compose complex automation by combining simple, focused roles.
Roles promote consistency across environments by ensuring the same configuration steps are applied identically every time. They also enable collaboration by providing clear interfaces and documentation for automation components. In our Kubernetes deployment, we’ll create specialized roles for each layer of the technology stack, building from basic system configuration up to application-ready cluster components.
The Common Role
The common role establishes the foundational system configuration required for Kubernetes nodes. This includes package installation, kernel parameter tuning, and system service configuration that must be consistent across all cluster members. By implementing these prerequisites in a shared role, we ensure that both master and worker nodes start from an identical, known-good state.
Create a new file configuration-with-ansible/roles/common/tasks/main.yml:
- name: Log system information
ansible.builtin.debug:
msg:
- "=== Node Information ==="
- "Node: {{ inventory_hostname }}"
- "OS: {{ ansible_distribution }} {{ ansible_distribution_version }}"
- "Kernel: {{ ansible_kernel }}"
- "Memory: {{ ansible_memtotal_mb }}MB"
- "CPU: {{ ansible_processor_vcpus }} cores"
- "Architecture: {{ ansible_architecture }}"
- "========================"
- name: Update apt cache
ansible.builtin.apt:
update_cache: yes
cache_valid_time: 3600
- name: Install required packages
  ansible.builtin.apt:
    name:
      - python3-pip
      - python3-setuptools
      - python3-kubernetes
      - python3-yaml
      - apt-transport-https
      - ca-certificates
      - curl
      - gnupg
      - lsb-release
    state: present
  register: apt_result
- name: Disable swap
  ansible.builtin.command: swapoff -a
  when: ansible_swaptotal_mb > 0
- name: Remove swap entry from /etc/fstab
ansible.builtin.lineinfile:
path: /etc/fstab
regexp: '^\s*([^#]\S+\s+\S+\s+swap\s+)'
state: absent
- name: Ensure required kernel modules are loaded
ansible.builtin.modprobe:
name: "{{ item }}"
state: present
loop:
- br_netfilter
- overlay
- name: Ensure sysctl settings for Kubernetes networking
ansible.builtin.sysctl:
name: "{{ item.key }}"
value: "{{ item.value }}"
state: present
reload: yes
loop:
- { key: 'net.bridge.bridge-nf-call-iptables', value: 1 }
- { key: 'net.ipv4.ip_forward', value: 1 }
- { key: 'net.bridge.bridge-nf-call-ip6tables', value: 1 }
- name: Log package installation results
ansible.builtin.debug:
msg: "Installed packages: {{ apt_result.stdout_lines | default([]) }}"
when: apt_result is defined and apt_result.changed
This role performs several critical system-level configurations:
- Package management: Installs Python libraries and system tools required by subsequent roles and Kubernetes components
- Swap disabling: Kubernetes requires swap to be disabled for proper memory management and performance
- Kernel module loading: Enables container networking features (br_netfilter) and overlay filesystem support (overlay)
- Network parameter tuning: Configures kernel parameters for proper container networking and IP forwarding
These configurations address Kubernetes’ specific requirements for the underlying operating system, ensuring that the cluster components can function correctly once installed.
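If you want to spot-check these settings after the role runs, the following commands (run on any node over SSH) should confirm them:
# Swap should report 0 across the board
free -h | grep -i swap
# Both kernel modules should be listed
lsmod | grep -E 'br_netfilter|overlay'
# All three sysctl values should be 1
sysctl net.bridge.bridge-nf-call-iptables net.ipv4.ip_forward net.bridge.bridge-nf-call-ip6tables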
The Containerd Role
The containerd role installs and configures the container runtime that Kubernetes will use to manage application containers. Containerd is a high-performance container runtime that implements the Container Runtime Interface (CRI) specification, making it compatible with Kubernetes. This role ensures that the container runtime is properly integrated with systemd for process management and cgroup handling.
Create a new file configuration-with-ansible/roles/containerd/tasks/main.yml:
---
# roles/containerd/tasks/main.yml
- name: Install containerd
ansible.builtin.apt:
name: containerd
state: present
update_cache: yes
- name: Configure containerd with systemd cgroup driver
ansible.builtin.shell: |
mkdir -p /etc/containerd
containerd config default | sed 's/SystemdCgroup = false/SystemdCgroup = true/' > /etc/containerd/config.toml
args:
creates: /etc/containerd/config.toml
- name: Restart containerd
ansible.builtin.service:
name: containerd
state: restarted
enabled: yes
The systemd cgroup driver configuration is particularly important as it ensures that containerd and Kubernetes use the same cgroup hierarchy, preventing resource management conflicts.
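A quick way to verify this on any node is to inspect the generated configuration and the service state:
# Should print: SystemdCgroup = true
grep SystemdCgroup /etc/containerd/config.toml
# Should show the service as active (running)
systemctl status containerd --no-pager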
The Kubernetes Role
The Kubernetes role installs the core Kubernetes components (kubelet, kubeadm, and kubectl) from the official Kubernetes package repository. This role carefully manages package versions to ensure cluster consistency and includes version pinning to prevent unexpected upgrades that could destabilize the cluster.
Create a new file configuration-with-ansible/roles/kubernetes/tasks/main.yml:
---
# roles/kubernetes/tasks/main.yml
- name: Update apt cache
ansible.builtin.apt:
update_cache: yes
- name: Install required packages
ansible.builtin.apt:
name:
- apt-transport-https
- ca-certificates
- curl
- gpg
state: present
- name: Create keyrings directory
ansible.builtin.file:
path: /etc/apt/keyrings
state: directory
mode: '0755'
- name: Add Kubernetes apt key
ansible.builtin.shell: |
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.28/deb/Release.key | gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
args:
creates: /etc/apt/keyrings/kubernetes-apt-keyring.gpg
- name: Add Kubernetes apt repository
ansible.builtin.apt_repository:
repo: "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.28/deb/ /"
state: present
filename: kubernetes
- name: Install kubelet, kubeadm, kubectl
ansible.builtin.apt:
name:
- kubelet=1.28.*
- kubeadm=1.28.*
- kubectl=1.28.*
state: present
update_cache: yes
- name: Hold Kubernetes packages
ansible.builtin.dpkg_selections:
name: "{{ item }}"
selection: hold
loop:
- kubelet
- kubeadm
- kubectl
- name: Enable and start kubelet
ansible.builtin.service:
name: kubelet
enabled: yes
state: started
This role manages the installation and configuration of Kubernetes components:
- Repository management: Adds the official Kubernetes APT repository with proper GPG key verification for security
- Package installation: Installs specific versions of kubelet (node agent), kubeadm (cluster bootstrapping tool), and kubectl (command-line interface)
- Version pinning: Uses dpkg_selections to prevent automatic package updates that could break cluster compatibility
- Service management: Enables the kubelet service so it can be started by kubeadm during cluster initialization
Version pinning is crucial in Kubernetes environments because minor version differences between cluster components can cause compatibility issues or unexpected behavior.
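You can confirm the holds and the installed versions on any node with standard apt and Kubernetes tooling:
# Should list kubelet, kubeadm, and kubectl as held
apt-mark showhold
# Should report a 1.28.x version
kubectl version --client
kubeadm version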
The Control Plane Role
The control plane role transforms a prepared node into a Kubernetes master by initializing the cluster control plane components. This role handles the complex bootstrap process that creates the cluster’s initial state, configures administrative access, and installs essential cluster networking components.
Create a new file configuration-with-ansible/roles/control-plane/tasks/main.yml:
---
# roles/control-plane/tasks/main.yml
- name: Check if kubeadm has already run
ansible.builtin.stat:
path: /etc/kubernetes/admin.conf
register: kubeadm_init_stat
- name: Initialize Kubernetes control plane
ansible.builtin.command: kubeadm init --pod-network-cidr=192.168.0.0/16
when: not kubeadm_init_stat.stat.exists
register: kubeadm_init_result
- name: Create .kube directory for ubuntu user
ansible.builtin.file:
path: /home/ubuntu/.kube
state: directory
owner: ubuntu
group: ubuntu
mode: '0755'
- name: Copy kubeconfig for ubuntu user
ansible.builtin.copy:
src: /etc/kubernetes/admin.conf
dest: /home/ubuntu/.kube/config
remote_src: yes
owner: ubuntu
group: ubuntu
mode: '0600'
- name: Generate kubeadm join command
ansible.builtin.shell: kubeadm token create --print-join-command
register: join_command_result
when: not kubeadm_init_stat.stat.exists or ansible_play_hosts | length > 1
- name: Save join command to file
ansible.builtin.copy:
content: "{{ join_command_result.stdout }}"
dest: /tmp/kubeadm_join_cmd.sh
mode: '0755'
when: join_command_result is defined and join_command_result.stdout is defined
- name: Install Calico CNI
ansible.builtin.shell: kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/calico.yaml
become_user: ubuntu
environment:
KUBECONFIG: /home/ubuntu/.kube/config
when: not kubeadm_init_stat.stat.exists
register: calico_install
- name: Wait for Calico Controller to be ready
ansible.builtin.command:
cmd: kubectl rollout status deployment/calico-kube-controllers -n kube-system --timeout=300s
become_user: ubuntu
environment:
KUBECONFIG: /home/ubuntu/.kube/config
changed_when: false
when: not kubeadm_init_stat.stat.exists
- name: Wait for Calico Node DaemonSet to be ready
ansible.builtin.command:
cmd: kubectl rollout status daemonset/calico-node -n kube-system --timeout=300s
become_user: ubuntu
environment:
KUBECONFIG: /home/ubuntu/.kube/config
changed_when: false
when: not kubeadm_init_stat.stat.exists
- name: Display Calico installation result
ansible.builtin.debug:
var: calico_install
when: not kubeadm_init_stat.stat.exists
- name: Verify system pods are running
ansible.builtin.shell: kubectl get pods -n kube-system
become_user: ubuntu
environment:
KUBECONFIG: /home/ubuntu/.kube/config
register: system_pods
when: not kubeadm_init_stat.stat.exists
- name: Display system pods status
ansible.builtin.debug:
var: system_pods.stdout_lines
when: not kubeadm_init_stat.stat.exists and system_pods is defined
This role orchestrates the complex process of creating a Kubernetes control plane:
- Idempotency checking: Uses the presence of /etc/kubernetes/admin.conf to determine if cluster initialization has already occurred
- Cluster initialization: Runs kubeadm init with a specific pod network CIDR that’s compatible with Calico CNI
- User access configuration: Sets up kubectl access for the ubuntu user by copying and configuring the kubeconfig file
- Join token generation: Creates the kubeadm join command that worker nodes will use to join the cluster
- Network plugin installation: Downloads and applies the Calico CNI manifest to enable pod-to-pod networking
The Calico CNI (Container Network Interface) plugin is essential because Kubernetes doesn’t include built-in networking for pod communication across nodes. Calico provides this functionality using BGP routing and network policies.
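After the role completes, a few kubectl checks from the master (using the kubeconfig the role copied for the ubuntu user) confirm that Calico is healthy and the control-plane node is Ready:
# Run on the master node as the ubuntu user
kubectl get pods -n kube-system -l k8s-app=calico-node -o wide
kubectl get deployment calico-kube-controllers -n kube-system
# The control-plane node should report Ready once the CNI is up
kubectl get nodes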
The Worker Node Role
The worker node role handles the process of joining additional nodes to an existing Kubernetes cluster. This role retrieves the join command generated by the control plane and executes it on worker nodes, establishing secure communication with the cluster and registering the node as available for workload scheduling.
Create a new file configuration-with-ansible/roles/worker/tasks/main.yml:
---
# roles/worker/tasks/main.yml
- name: Check if node is already joined
ansible.builtin.stat:
path: /etc/kubernetes/kubelet.conf
register: kubelet_conf_stat
- name: Fetch join command from master
ansible.builtin.slurp:
src: /tmp/kubeadm_join_cmd.sh
delegate_to: "{{ groups['masters'][0] }}"
register: join_cmd_content
when: not kubelet_conf_stat.stat.exists
- name: Join the node to the cluster
ansible.builtin.shell: "{{ join_cmd_content.content | b64decode | trim }}"
when: not kubelet_conf_stat.stat.exists and join_cmd_content is defined
This role manages the worker node join process:
- Join status checking: Examines the /etc/kubernetes/kubelet.conf file to determine if the node has already joined a cluster
- Command retrieval: Uses Ansible’s delegation feature to fetch the join command from the master node without requiring shared storage
- Secure joining: Executes the kubeadm join command which establishes encrypted communication with the control plane and registers the node
The delegate_to directive is particularly important here, as it allows worker nodes to retrieve information from the master node dynamically, eliminating the need for external coordination mechanisms or shared file systems.
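Once the worker role has run on every node, a single check from the master confirms that all of them registered with the cluster:
# Run on the master node; all five nodes should appear and reach Ready within a minute or two
kubectl get nodes -o wide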
Orchestrating the Deployment
The main playbook (site.yml) serves as the orchestration layer that coordinates the application of our roles across different node types. This playbook demonstrates key Ansible concepts including host groups, role sequencing, and privilege escalation. By organizing the deployment into logical phases, we ensure that dependencies are satisfied and that cluster initialization occurs in the correct order.
The playbook structure reflects the natural dependency hierarchy of Kubernetes cluster deployment: all nodes need basic system configuration first, then the control plane must be established before worker nodes can join. This sequencing is critical for successful cluster initialization.
Create a new file configuration-with-ansible/site.yml:
- hosts: all
become: true
roles:
- common
- containerd
- kubernetes
- hosts: masters
become: true
roles:
- control-plane
- hosts: workers
become: true
roles:
- worker
In Ansible, become: true tells Ansible to run tasks with elevated privileges, typically as the root user. This is similar to using sudo on the command line.
Why is this needed?
Many system-level tasks (like installing packages, modifying system files, or configuring services) require root access. By setting become: true, you ensure these tasks have the necessary permissions.
Note: If the user Ansible connects with over SSH does not have sudo privileges, become: true will not work.
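If you ever need finer control, become can also be scoped to individual tasks instead of the whole play; the sketch below is purely illustrative:
---
# Illustrative only: privilege escalation per task rather than per play
- hosts: workers
  tasks:
    - name: Runs as the regular SSH user (ubuntu)
      ansible.builtin.command: whoami
      changed_when: false

    - name: Runs as root via privilege escalation
      ansible.builtin.command: whoami
      become: true
      changed_when: false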
Integration with Terraform
Creating a seamless integration between Terraform and Ansible requires careful coordination of the deployment pipeline. The Makefile provides an automation layer that orchestrates the entire infrastructure lifecycle, from initial provisioning through application deployment to final cleanup.
The automation ensures that infrastructure changes are immediately followed by configuration updates, maintaining consistency between the desired state defined in code and the actual deployed state.
Create a new file in the project root called Makefile:
.PHONY: plan apply inventory wait-for-ssh ansible deploy destroy clean-ssh
TF_DIR := introduction-to-terraform
ANSIBLE_DIR := configuration-with-ansible
INVENTORY := $(ANSIBLE_DIR)/inventory.ini
TF_OUTPUT := $(TF_DIR)/terraform_output.json
plan:
cd $(TF_DIR) && terraform init && terraform plan
apply:
cd $(TF_DIR) && terraform init && terraform apply -auto-approve
cd $(TF_DIR) && terraform output -json > terraform_output.json
inventory: apply
$(ANSIBLE_DIR)/generate_inventory.sh $(TF_OUTPUT) $(INVENTORY)
wait-for-ssh: inventory
$(ANSIBLE_DIR)/wait_for_ssh.sh $(INVENTORY)
ansible: wait-for-ssh
ANSIBLE_CONFIG=$(ANSIBLE_DIR)/ansible.cfg ansible-playbook -i $(INVENTORY) $(ANSIBLE_DIR)/site.yml
deploy: apply inventory wait-for-ssh ansible
destroy:
cd $(TF_DIR) && terraform destroy -auto-approve
$(MAKE) clean-ssh
clean-ssh:
@echo "Clearing SSH known_hosts entries for libvirt VMs..."
@bash -c 'for ip in {100..104}; do ssh-keygen -f "$$HOME/.ssh/known_hosts" -R "192.168.122.$$ip" 2>/dev/null || true; done'
@echo "SSH known_hosts cleaned"
For more information on Makefiles and further examples, check out this resource.
We also need a helper script that ensures all VMs are up and reachable before the automation continues. Create a new file configuration-with-ansible/wait_for_ssh.sh:
#!/usr/bin/env bash
set -e
INVENTORY_FILE="$1"
if [[ ! -f "$INVENTORY_FILE" ]]; then
echo "Error: Inventory file $INVENTORY_FILE not found"
exit 1
fi
echo "Waiting for SSH to be available on all VMs..."
# Extract IPs from inventory file
ips=$(grep -E "ansible_host=" "$INVENTORY_FILE" | sed 's/.*ansible_host=\([0-9.]*\).*/\1/')
for ip in $ips; do
echo "Waiting for SSH on $ip..."
timeout=120
while ! nc -z "$ip" 22 2>/dev/null && [ $timeout -gt 0 ]; do
sleep 2
timeout=$((timeout-2))
done
if [ $timeout -le 0 ]; then
echo "Timeout waiting for SSH on $ip"
exit 1
else
echo "SSH available on $ip"
fi
done
echo "All VMs are ready for SSH connections"
Don’t forget to make the script executable:
chmod +x configuration-with-ansible/wait_for_ssh.sh
Testing and Validation
After deploying your Kubernetes cluster, it’s essential to verify that all components are functioning correctly before deploying workloads. This validation process ensures cluster health and helps identify any configuration issues early.
Verifying Cluster Functionality
Connect to your master node and run these validation commands:
# SSH to the master node (replace IP with your master's IP)
ssh -i ~/.ssh/id_rsa ubuntu@192.168.122.100
# Check cluster status
kubectl get nodes
# Verify all nodes are in Ready state
kubectl get nodes -o wide
# Check pod status across all namespaces
kubectl get pods --all-namespaces
# Verify Calico networking is working
kubectl get pods -n kube-system | grep calico
Running Test Workloads
Deploy a simple test application to validate cluster functionality:
# Create a test deployment
kubectl create deployment nginx-test --image=nginx:latest --replicas=3
# Expose the deployment as a service
kubectl expose deployment nginx-test --port=80 --target-port=80
# Check if pods are distributed across worker nodes
kubectl get pods -o wide
# Test service connectivity
kubectl get svc nginx-test
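To confirm the service actually answers requests, you can curl it from a short-lived pod inside the cluster (the curl image below is just one convenient choice), then remove the test resources:
# Launch a throwaway pod, curl the service by its DNS name, and clean the pod up automatically
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- curl -s http://nginx-test
# Remove the test deployment and service when finished
kubectl delete deployment nginx-test
kubectl delete service nginx-test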
Troubleshooting Common Issues
If you encounter problems, check these common areas:
- Node connectivity: Ensure all nodes can communicate on the pod network CIDR
- Container runtime: Verify containerd is running on all nodes with systemctl status containerd
- Kubelet status: Check kubelet logs with journalctl -u kubelet -f
- CNI networking: Verify Calico pods are running in the kube-system namespace
Cleanup and Tear Down
Proper cleanup is essential when working with dynamic infrastructure, especially in development and testing environments. The cleanup process must handle both the application layer (Kubernetes cluster) and the infrastructure layer (virtual machines) while managing ancillary effects like SSH known_hosts entries.
Ansible Considerations for Infrastructure Destruction
Unlike some configuration management tools, Ansible doesn’t automatically track and reverse the changes it makes. When destroying infrastructure, it’s often more efficient to destroy the underlying VMs rather than attempting to reverse all configuration changes. However, in production environments, you might want to create specific “cleanup” playbooks for graceful service shutdown and data preservation.
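As a sketch of what such a cleanup playbook might contain (it is not part of this tutorial’s repository), resetting kubeadm state on every node before a rebuild could look like this:
---
# cleanup.yml -- illustrative sketch only
- hosts: all
  become: true
  tasks:
    - name: Reset any existing kubeadm state
      ansible.builtin.command: kubeadm reset -f
      ignore_errors: true

    - name: Remove leftover CNI configuration
      ansible.builtin.file:
        path: /etc/cni/net.d
        state: absent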
SSH Known_hosts Management
When VMs are destroyed and recreated, their SSH host keys change, leading to SSH connection warnings. The clean-ssh target in our Makefile proactively removes these entries, preventing connection issues in subsequent deployments.
Complete Cleanup Workflow
To tear down the entire environment:
# Destroy everything and clean SSH entries
make destroy
# Or run individual steps
make clean-ssh # Clean SSH known_hosts only
cd introduction-to-terraform && terraform destroy # Destroy infrastructure only
This approach ensures complete environment cleanup while maintaining the ability to quickly rebuild the infrastructure for testing or development purposes.
Best Practices and Next Steps
The Terraform and Ansible integration pattern we’ve implemented represents a powerful foundation for production infrastructure automation. However, several considerations become important as you scale this approach or adapt it for production use.
Security Considerations for Production Deployments
Production environments require additional security measures:
- SSH key management: Implement proper key rotation and use dedicated service accounts rather than personal SSH keys
- Network segmentation: Configure firewalls and network policies to restrict communication between cluster components
- Secrets management: Use tools like HashiCorp Vault or Kubernetes secrets for sensitive configuration data
- RBAC implementation: Configure Kubernetes Role-Based Access Control to limit user and service permissions
Scaling the Approach for Larger Clusters
As your infrastructure grows, consider these optimizations:
- Ansible parallelism: Tune the forks setting in ansible.cfg to manage more nodes simultaneously (see the example after this list)
- Role parameterization: Add variables to roles for different environment configurations (dev, staging, production)
- Inventory grouping: Create more sophisticated inventory groups for different node types or environments
- State management: Consider using remote state storage for Terraform and Ansible facts caching for improved performance
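For example, raising the fork count is a one-line change in the [defaults] section of ansible.cfg (the value below is only an illustration; Ansible’s default is 5):
[defaults]
forks = 20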
Adding Monitoring and Logging Roles
Extend the automation with additional roles for operational capabilities:
- Prometheus monitoring: Create roles for metrics collection and alerting
- Log aggregation: Implement centralized logging with tools like Fluentd or Logstash
- Backup automation: Add roles for automated backup and disaster recovery procedures
Version Management and GitOps Integration
For production-ready deployments, implement version control and deployment automation:
- Git workflow: Store all configuration in version control with proper branching strategies
- CI/CD integration: Automate testing and deployment using tools like GitLab CI or GitHub Actions
- Immutable infrastructure: Consider implementing blue-green deployments for safer production updates
- Configuration drift detection: Implement monitoring to detect when actual configuration diverges from desired state
Conclusion
The integration of Terraform and Ansible represents a powerful paradigm in Infrastructure as Code that addresses the complete infrastructure lifecycle. By combining Terraform’s declarative infrastructure provisioning with Ansible’s flexible configuration management, we’ve created an automation pipeline that can consistently deploy complex, multi-tier applications from bare metal to production-ready state.
Benefits of the Terraform + Ansible Approach
This integrated approach offers several key advantages:
- Separation of concerns: Infrastructure provisioning and application configuration are handled by tools optimized for each task
- Flexibility: Changes to infrastructure or application configuration can be made independently
- Reusability: Ansible roles can be applied to infrastructure provisioned by any tool, not just Terraform
- Testability: Each layer can be tested independently, improving reliability and debugging capability
- Scalability: The pattern scales from development environments to large production deployments
When to Use This Pattern vs Alternatives
This Terraform-Ansible pattern works best when:
- You need complex, multi-step configuration that goes beyond basic package installation
- Your infrastructure spans multiple environments or cloud providers
- You require fine-grained control over the configuration process
- Your team has expertise in both infrastructure and configuration management
Alternative approaches like cloud-init, Helm charts, or container-based deployments may be more appropriate for simpler use cases or when working within specific ecosystems like Kubernetes-native applications.
Further Learning Resources
To deepen your understanding of Infrastructure as Code and expand on the concepts covered in this series:
- Terraform Documentation: HashiCorp’s official documentation for advanced provider usage and state management
- Ansible Documentation: Red Hat’s Ansible documentation for advanced playbook patterns and enterprise features
- Kubernetes the Hard Way: Kelsey Hightower’s tutorial for understanding Kubernetes internals
- Infrastructure as Code Patterns: Explore advanced patterns in Kief Morris’s “Infrastructure as Code” book
- GitOps Practices: Learn about ArgoCD and Flux for Kubernetes-native deployment automation
The foundation you’ve built with this two-part series provides a solid base for exploring more advanced DevOps patterns and tools. Whether you’re managing a homelab or preparing for production deployments, the principles of declarative infrastructure and automated configuration management will serve you well as you continue to build reliable, scalable systems.
Stay Tuned…
In Part 3 of this series, we’ll take the next logical step by enhancing our Kubernetes cluster with production-ready components including MetalLB load balancing, Istio service mesh for traffic management, and persistent storage solutions, all automated through Ansible.
As always, you can find all the code examples and configuration files from this tutorial in our GitHub repository.

Aaron Mathis
Systems administrator and software engineer specializing in cloud development, AI/ML, and modern web technologies. Passionate about building scalable solutions and sharing knowledge with the developer community.