
Infrastructure as Code in Azure: Security Hardening and Configuration With Ansible

Part 2 of our Infrastructure as Code in Azure series, where we establish security baselines and harden Azure infrastructure by integrating Terraform provisioning with Ansible-driven configuration management.

Aaron Mathis
42 min read

Continuing our Infrastructure as Code in Azure series, this article bridges the gap between infrastructure provisioning and application configuration by introducing Ansible for automated management. In Part 1, we launched a single VM in Azure using Terraform. Now, we’ll build on that foundation by using Ansible to harden security and configure the VM for production workloads.

Ansible is a powerful, agentless automation tool that uses SSH to execute tasks across systems. Unlike Terraform, which focuses on resource provisioning, Ansible excels at configuration management, application deployment, and orchestration. Its YAML-based playbooks are easy to read and write, making automation accessible to both developers and operations teams.
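
As a quick illustration of the agentless model, the ad-hoc command below connects to a host over SSH and runs Ansible’s ping module; no agent has to be installed on the target first. The IP address is a placeholder, so substitute your own VM’s address.

# Ad-hoc example: run the ping module against a single host over SSH
# (203.0.113.10 is a placeholder address)
ansible all -i "203.0.113.10," -u azureuser --private-key ~/.ssh/id_rsa -m ping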

In this tutorial, you’ll learn how to:

  • Combine Terraform and Ansible for a streamlined automation workflow
  • Develop reusable Ansible roles to standardize configuration and security
  • Automate VM setup including hardening SSH, configuring firewalls, and more
  • Generate dynamic inventories that reflect your current infrastructure state
  • Build deployment pipelines that integrate provisioning and configuration steps

This practical guide demonstrates how infrastructure provisioning and configuration management can work together for efficient, reliable deployments. By the end, you’ll have an automated pipeline capable of launching and configuring a secure, production-ready VM in Azure with a single command.

Prerequisites and Current State

This tutorial continues from Part 1: Infrastructure as Code in Azure – Introduction to Terraform. To proceed, ensure you have completed Part 1 and provisioned your Azure VM using Terraform. The security hardening and configuration steps in this guide assume your VM, networking, and cloud-init setup match the previous tutorial.

Completing Part 1 is essential, as the Ansible automation here depends on the infrastructure state and SSH key configuration established earlier. This ensures seamless integration between Terraform provisioning and Ansible-driven configuration management.

Install Ansible

First, we must install the required packages for Ansible:

# On Ubuntu/Debian:
sudo apt update
sudo apt install ansible

# On RHEL/CentOS/Fedora:
sudo dnf install ansible
# or on older systems:
sudo yum install ansible

# On macOS:
brew install ansible

# Alternative: Install via pip (works on any OS with Python):
pip install ansible

Also install required system packages:

# On Ubuntu/Debian:
sudo apt install jq ssh-client

# On RHEL/CentOS/Fedora:
sudo dnf install jq openssh-clients

# On macOS:
brew install jq
# (ssh is already included)

Ansible Collections

Ansible collections are a packaging and distribution format for Ansible content such as playbooks, roles, modules, and plugins. They extend Ansible’s core functionality by providing specialized modules for specific platforms and services.


# Install community.general collection (often useful)
ansible-galaxy collection install community.general

Project Structure

Before diving into Ansible configuration, we need to restructure our project to accommodate both Terraform and Ansible components. This organization follows DevOps best practices by separating infrastructure provisioning from configuration management while maintaining clear relationships between components.

Understanding how Ansible organizes automation content is crucial for building maintainable configurations. Ansible uses several key concepts:

  • Roles: Reusable units of automation that group related tasks, variables, and files
  • Playbooks: YAML files that define which roles to apply to which hosts
  • Inventory: Files that define the hosts and groups that Ansible will manage

Our project structure separates these concerns while enabling seamless integration between Terraform’s infrastructure provisioning and Ansible’s configuration management.

Begin by creating a provisioning directory at the root of your project. Move all existing Terraform files into this folder to organize your infrastructure code. Next, create a configuration-management directory alongside provisioning to house your Ansible configuration and roles.

# Create provisioning directory
mkdir provisioning/

# Move the existing Terraform files, state, and environment config into provisioning
mv *.tf .env *.tfstate *.tfvars *.hcl .terraform provisioning/

# Create configuration-management directory
mkdir configuration-management/

Your final project structure should look like this:

infrastructure-in-azure/
├── Makefile                           # Automation workflow integrating Terraform and Ansible
├── provisioning/
│   ├── .env                           # Environment variables for Azure authentication
│   ├── .gitignore                     # Git ignore file for Terraform state and secrets
│   ├── budgets.tf                     # Resource definition for budget guardrails
│   ├── main.tf                        # Primary resource definitions (VM, networking, etc.)
│   ├── outputs.tf                     # Output value definitions for Ansible integration
│   └── variables.tf                   # Input variable declarations
└── configuration-management/
    ├── ansible.cfg                    # Ansible configuration file with SSH settings and output formatting
    ├── deploy.sh                      # Main deployment script that orchestrates the entire process
    ├── requirements.yml               # Ansible Galaxy collections and dependencies
    ├── site.yml                       # Main Ansible playbook that orchestrates all roles
    ├── group_vars/                    # Group-level variables for different host types
    │   ├── all.yml                    # Variables that apply to all hosts
    │   └── azure_vms.yml              # Variables specific to Azure virtual machines
    ├── inventory/
    │   └── hosts.yml                  # Dynamic inventory file with VM connection details
    └── roles/                         # Directory containing all Ansible roles for security hardening
        ├── system-hardening/          # Role for OS security hardening and updates
        │   └── tasks/
        │       └── main.yml           # System security configurations and package management
        ├── firewall/                  # Role for UFW firewall configuration
        │   ├── tasks/
        │   │   └── main.yml           # Firewall rules and network security setup
        │   └── handlers/
        │       └── main.yml           # Service restart handlers for firewall changes
        ├── ssh-hardening/             # Role for SSH security hardening
        │   ├── tasks/
        │   │   └── main.yml           # SSH daemon configuration and security settings
        │   └── handlers/
        │       └── main.yml           # SSH service restart handlers
        ├── fail2ban/                  # Role for intrusion prevention system
        │   ├── tasks/
        │   │   └── main.yml           # Fail2ban installation and configuration
        │   ├── handlers/
        │   │   └── main.yml           # Fail2ban service management handlers
        │   └── templates/
        │       └── jail.local.j2      # Fail2ban configuration template
        ├── time-sync/                 # Role for network time synchronization
        │   ├── tasks/
        │   │   └── main.yml           # Chrony installation and time sync configuration
        │   ├── handlers/
        │   │   └── main.yml           # Chrony service restart handlers
        │   └── templates/
        │       └── chrony.conf.j2     # Chrony configuration template
        ├── azure-monitor/             # Role for Azure monitoring and system metrics
        │   ├── tasks/
        │   │   └── main.yml           # Monitoring tools installation and configuration
        │   └── handlers/
        │       └── main.yml           # Monitoring service handlers
        └── cron-jobs/                 # Role for automated maintenance tasks
            └── tasks/
                └── main.yml           # System backup, log rotation, and monitoring cron jobs

Configuring Ansible for Azure VM Management

Ansible’s configuration file (ansible.cfg) defines how Ansible connects to and manages our Azure VM. Since we’re working with cloud infrastructure that can be destroyed and recreated, we need specific configuration settings that handle the dynamic nature of cloud VMs and streamline the connection process.

The configuration settings we’ll implement address several key requirements for our Azure VM automation:

  • Cloud-friendly SSH settings that handle VMs being destroyed and recreated with new SSH host keys
  • Authentication configuration that works seamlessly with our cloud-init user setup from Part 1
  • Privilege escalation to run security hardening tasks that require root access
  • Performance optimizations for reliable execution over internet connections to Azure

These settings ensure Ansible can consistently connect to our Azure VM and execute our security hardening playbook, even when the VM is destroyed and recreated during testing and development.

Create the Ansible configuration file configuration-management/ansible.cfg:

[defaults]
host_key_checking = False
inventory = inventory/hosts.yml
remote_user = azureuser
private_key_file = ~/.ssh/id_rsa
timeout = 30
interpreter_python = /usr/bin/python3
gathering = smart
fact_caching = memory
fact_caching_timeout = 86400

[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o PubkeyAcceptedKeyTypes=+ssh-rsa
pipelining = True
control_path_dir = /tmp/.ansible-cp

[inventory]
enable_plugins = host_list, script, auto, yaml, ini, toml

[privilege_escalation]
become = True
become_method = sudo
become_user = root
become_ask_pass = False

Let’s break down these configuration settings:

[defaults] Section:

  • host_key_checking = False: Disables SSH host key verification, essential for cloud VMs that get new SSH keys when recreated
  • inventory = inventory/hosts.yml: Specifies the default inventory file location where our VM details are stored
  • remote_user = azureuser: Sets the default SSH user that matches our cloud-init configuration from Part 1
  • private_key_file = ~/.ssh/id_rsa: Points to the SSH private key file we generated in Part 1
  • gathering = smart: Optimizes fact gathering by caching system information to improve performance
  • fact_caching = memory: Stores gathered facts in memory for the duration of the playbook run

[ssh_connection] Section:

  • ssh_args: Configures SSH connection optimization with persistent connections and bypasses host key checking for cloud environments
  • pipelining = True: Reduces the number of SSH operations by sending multiple commands in one connection, improving performance over internet connections
  • control_path_dir = /tmp/.ansible-cp: Specifies where SSH control sockets are stored for connection reuse

[privilege_escalation] Section:

  • become = True: Automatically escalates privileges using sudo for security hardening tasks that require root access
  • become_method = sudo: Specifies sudo as the privilege escalation method
  • become_ask_pass = False: Assumes passwordless sudo is configured (which our cloud-init setup from Part 1 provides)
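
With ansible.cfg in place, a quick sanity check (run from the configuration-management directory so the file is picked up) is to ask Ansible which settings differ from its built-in defaults:

# Show only the settings that differ from Ansible's built-in defaults
cd configuration-management
ansible-config dump --only-changed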

Dynamic Inventory Generation

Unlike static environments where server details rarely change, cloud infrastructure is inherently dynamic. IP addresses, hostnames, and even the number of VMs can change with each deployment. Static inventory files become a maintenance burden and source of errors in such environments.

Dynamic inventory generation solves this challenge by programmatically extracting current infrastructure state from Terraform and converting it into Ansible-compatible inventory format. This ensures Ansible always has accurate, up-to-date information about the infrastructure it needs to manage.

Our deployment script automatically generates the inventory file by extracting the VM’s public IP address from Terraform’s output and writing it to the inventory file. This approach eliminates manual inventory management and reduces deployment errors.

Create the deployment script configuration-management/deploy.sh:

#!/bin/bash
# Ansible deployment script for Azure VM configuration
set -e

# Color codes for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color

# Configuration
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" &> /dev/null && pwd)"
TERRAFORM_DIR="../provisioning"
ANSIBLE_DIR="."

# Function to print colored output
print_status() {
    echo -e "${GREEN}[INFO]${NC} $1"
}

print_warning() {
    echo -e "${YELLOW}[WARN]${NC} $1"
}

print_error() {
    echo -e "${RED}[ERROR]${NC} $1"
}

# Check if Ansible is installed
check_ansible() {
    if ! command -v ansible &> /dev/null; then
        print_error "Ansible is not installed. Please install Ansible 11.x"
        exit 1
    fi
    
    ANSIBLE_VERSION=$(ansible --version | head -n1 | awk '{print $NF}' | tr -d '[]')
    print_status "Found Ansible version: $ANSIBLE_VERSION"
}

# Install Ansible collections
install_collections() {
    print_status "Installing required Ansible collections..."
    ansible-galaxy collection install -r requirements.yml --force
}

# Get VM IP from Terraform output and write inventory file
get_vm_ip() {
    if [ -f "$TERRAFORM_DIR/terraform.tfstate" ]; then
        VM_IP=$(cd "$TERRAFORM_DIR" && terraform output -raw public_ip_address 2>/dev/null || echo "")
        if [ -n "$VM_IP" ]; then
            print_status "Found VM IP from Terraform: $VM_IP"
            print_status "Writing dynamic inventory file..."
            
            # Write the inventory file dynamically
            cat > inventory/hosts.yml << EOF
all:
  children:
    azure_vms:
      hosts:
        demo-vm:
          ansible_host: $VM_IP
          ansible_user: azureuser
          ansible_ssh_private_key_file: ~/.ssh/id_rsa
          ansible_python_interpreter: /usr/bin/python3
      vars:
        ansible_ssh_common_args: '-o StrictHostKeyChecking=no'
        environment_name: "{{ env | default('dev') }}"
        vm_size: Standard_B1s
        location: eastus
EOF
            print_status "Inventory file updated with VM IP: $VM_IP"
        else
            print_warning "Could not get VM IP from Terraform output"
            print_warning "Please manually update inventory/hosts.yml with the correct IP"
        fi
    else
        print_warning "Terraform state file not found"
        print_warning "Please manually update inventory/hosts.yml with the VM IP"
    fi
}

# Test connectivity to VM
test_connectivity() {
    print_status "Testing connectivity to VM..."
    if ansible all -m ping -i inventory/hosts.yml &> /dev/null; then
        print_status "Successfully connected to VM"
    else
        print_error "Cannot connect to VM. Please check:"
        echo "  1. VM IP address in inventory/hosts.yml"
        echo "  2. SSH key permissions"
        echo "  3. Network security group rules"
        echo "  4. VM is running"
        exit 1
    fi
}

# Run Ansible playbook
deploy_configuration() {
    print_status "Starting Ansible deployment..."
    
    # Run the playbook with verbose output
    ansible-playbook -i inventory/hosts.yml site.yml \
        --ssh-extra-args='-o StrictHostKeyChecking=no' \
        -v
    
    if [ $? -eq 0 ]; then
        print_status "Deployment completed successfully!"
        print_status "Your secure VM is now configured and ready for use!"
    else
        print_error "Deployment failed!"
        exit 1
    fi
}

# Main execution
main() {
    print_status "Starting Azure VM configuration deployment..."
    
    cd "$SCRIPT_DIR"
    
    check_ansible
    install_collections
    get_vm_ip
    test_connectivity
    deploy_configuration
    
    print_status "Deployment script completed!"
}

# Run main function
main "$@"

Make sure the file is executable:

chmod +x deploy.sh

This deployment script demonstrates several important DevOps automation patterns:

Error Handling and Validation:

  • Uses set -e to exit immediately on any command failure
  • Implements comprehensive pre-flight checks for Ansible installation and connectivity
  • Provides clear, colored output to help users understand deployment progress and issues

Dynamic Infrastructure Management:

  • Extracts VM IP address directly from Terraform state using terraform output
  • Automatically generates inventory files, eliminating manual configuration
  • Gracefully handles missing Terraform state with appropriate warnings

Modular Execution Flow:

  • Separates concerns into discrete functions (connectivity testing, collection installation, etc.)
  • Enables easy debugging and maintenance of individual deployment steps
  • Provides a clear execution flow that matches typical deployment workflows

Create the Ansible collections requirements file configuration-management/requirements.yml:

---
# Ansible Galaxy requirements for Azure VM configuration
collections:
  - name: community.general
    version: ">=7.0.0"
  - name: ansible.posix
    version: ">=1.5.0"

This requirements file ensures we have access to extended Ansible modules for system administration and POSIX-specific operations that our hardening roles will need.


Group Variables

Group variables in Ansible provide a powerful way to define configuration values that apply to groups of hosts or all hosts in your inventory. They follow a hierarchical precedence model where more specific variables override general ones, enabling flexible configuration management across different environments and host types.

Variables defined in group_vars/all.yml apply to every host in your inventory, while variables in group_vars/azure_vms.yml apply only to hosts in the azure_vms group. This separation allows for environment-specific customization while maintaining consistent base configurations.
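
Once the variable files and inventory described below exist, an ad-hoc debug call is a handy way to see which value a host actually ends up with after the group levels are merged. For example, to check ssh_port for the demo-vm host:

# Inspect the value a host receives after group_vars are merged
ansible demo-vm -i inventory/hosts.yml -m ansible.builtin.debug -a "var=ssh_port"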

Create the global variables file configuration-management/group_vars/all.yml:

# Common variables for all hosts
ssh_allowed_users: ["azureuser"]
ssh_port: 22
fail2ban_maxretry: 3
fail2ban_bantime: 3600
fail2ban_findtime: 600

# System configuration
timezone: "UTC"
ntp_servers:
  - 0.pool.ntp.org
  - 1.pool.ntp.org
  - 2.pool.ntp.org
  - 3.pool.ntp.org

# Cron configuration
backup_schedule: "0 2 * * *"  # Daily at 2 AM
log_rotation_schedule: "0 3 * * *"  # Daily at 3 AM

# Azure Monitor configuration
azure_monitor_enabled: true

# Common Azure VM settings, moved from azure_vms.yml
firewall_rules:
  - port: "{{ ssh_port | default(22) }}"
    proto: tcp
    rule: allow
    src: "{{ allowed_ssh_cidr | default('any') }}"

disable_ipv6: true
kernel_parameters:
  - { name: net.ipv4.ip_forward, value: 0 }
  - { name: net.ipv4.conf.all.send_redirects, value: 0 }
  - { name: net.ipv4.conf.default.send_redirects, value: 0 }

These global variables establish security baselines and operational standards:

Security Configuration:

  • ssh_allowed_users: Restricts SSH access to specific user accounts, implementing the principle of least privilege
  • fail2ban_* variables: Configure intrusion prevention thresholds (3 failed attempts, 1-hour ban, 10-minute detection window)

System Standards:

  • timezone: Standardizes time zone across all systems for consistent logging and scheduling
  • ntp_servers: Defines reliable time synchronization sources for accurate system clocks

Operational Automation:

  • backup_schedule and log_rotation_schedule: Establish automated maintenance windows during low-usage hours
  • azure_monitor_enabled: Controls whether monitoring components are installed and configured

Network Security:

  • firewall_rules: Defines specific firewall rules for Azure VMs, allowing SSH access while maintaining security
  • disable_ipv6: Disables IPv6 to reduce attack surface in environments that don’t require it

Kernel Hardening:

  • net.ipv4.ip_forward: Disables IP forwarding to prevent the VM from being used as a router
  • send_redirects parameters: Prevent ICMP redirect attacks by disabling redirect message transmission

The inventory file structure ties everything together. The deployment script automatically generates configuration-management/inventory/hosts.yml with the current VM IP address, but here’s the structure it creates:

all:
  children:
    azure_vms:
      hosts:
        demo-vm:
          ansible_host: <VM_IP_FROM_TERRAFORM>
          ansible_user: azureuser
          ansible_ssh_private_key_file: ~/.ssh/id_rsa
          ansible_python_interpreter: /usr/bin/python3
      vars:
        ansible_ssh_common_args: '-o StrictHostKeyChecking=no'
        environment_name: "{{ env | default('dev') }}"
        vm_size: Standard_B1s
        location: eastus

This inventory structure demonstrates YAML-based inventory format:

  • Hierarchical organization: Groups hosts logically (azure_vms) while maintaining flexibility for expansion
  • Host-specific variables: Individual connection parameters for each VM, with the IP address automatically populated from Terraform output
  • Group-level variables: Connection and environment settings that apply to all Azure VMs

The <VM_IP_FROM_TERRAFORM> placeholder gets replaced with the actual public IP address when the deployment script runs, ensuring the inventory always reflects the current infrastructure state.
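
After the deployment script has written the file, you can preview the inventory exactly as Ansible parses it, which is a quick way to catch indentation or grouping mistakes:

# Show the generated inventory as Ansible sees it
ansible-inventory -i inventory/hosts.yml --graph
ansible-inventory -i inventory/hosts.yml --host demo-vm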


Building Reusable Ansible Roles

Ansible roles provide a structured way to organize automation tasks, variables, files, and templates into reusable components. Each role focuses on a specific aspect of system configuration, making playbooks more maintainable and enabling code reuse across different environments and projects.

Our security hardening approach implements defense-in-depth through multiple specialized roles. Each role addresses specific attack vectors and compliance requirements, working together to create a comprehensive security posture.
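
Before filling in the individual tasks, you can create the directory skeleton for all of the roles (plus the group_vars and inventory directories used earlier) in one pass. This is just a convenience sketch; each role only needs the subdirectories it actually uses.

# Create the role directory skeleton used throughout this article
cd configuration-management
mkdir -p group_vars inventory
mkdir -p roles/{system-hardening,cron-jobs}/tasks
mkdir -p roles/{firewall,ssh-hardening,azure-monitor}/{tasks,handlers}
mkdir -p roles/{fail2ban,time-sync}/{tasks,handlers,templates}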

System Hardening Role

The system-hardening role establishes the foundation for a secure system by implementing operating system-level security controls, automated patch management, and basic system monitoring.

Create configuration-management/roles/system-hardening/tasks/main.yml:

---
- name: Update all packages
  ansible.builtin.apt:
    upgrade: dist
    update_cache: true
    autoremove: true
    autoclean: true
  register: update_result
  retries: 3
  delay: 10

- name: Install essential security packages
  ansible.builtin.apt:
    name:
      - unattended-upgrades
      - apt-listchanges
      - update-notifier-common
      - curl
      - wget
      - vim
      - htop
      - ufw
      - rsyslog
      - logrotate
    state: present

- name: Configure automatic security updates
  ansible.builtin.copy:
    dest: /etc/apt/apt.conf.d/20auto-upgrades
    content: |
      APT::Periodic::Update-Package-Lists "1";
      APT::Periodic::Unattended-Upgrade "1";
      APT::Periodic::AutocleanInterval "7";
    mode: '0644'
    backup: true

- name: Configure unattended upgrades
  ansible.builtin.copy:
    dest: /etc/apt/apt.conf.d/50unattended-upgrades
    content: |
      Unattended-Upgrade::Allowed-Origins {
          "${distro_id}:${distro_codename}";
          "${distro_id}:${distro_codename}-security";
          "${distro_id}:${distro_codename}-updates";
      };
      Unattended-Upgrade::AutoFixInterruptedDpkg "true";
      Unattended-Upgrade::MinimalSteps "true";
      Unattended-Upgrade::Remove-Unused-Dependencies "true";
      Unattended-Upgrade::Automatic-Reboot "false";
    mode: '0644'
    backup: true

- name: Set kernel parameters for security
  ansible.posix.sysctl:
    name: "{{ item.name }}"
    value: "{{ item.value }}"
    state: present
    sysctl_set: true
    reload: true
  loop: "{{ kernel_parameters }}"
  when: kernel_parameters is defined

- name: Disable IPv6 if configured
  ansible.posix.sysctl:
    name: "{{ item }}"
    value: 1
    state: present
    sysctl_set: true
    reload: true
  loop:
    - net.ipv6.conf.all.disable_ipv6
    - net.ipv6.conf.default.disable_ipv6
    - net.ipv6.conf.lo.disable_ipv6
  when: disable_ipv6 | default(false)

- name: Remove unnecessary packages
  ansible.builtin.apt:
    name:
      - telnet
      - rsh-client
      - rsh-redone-client
    state: absent
    autoremove: true

- name: Ensure rsyslog is running
  ansible.builtin.systemd:
    name: rsyslog
    state: started
    enabled: true

- name: Configure log rotation
  ansible.builtin.copy:
    dest: /etc/logrotate.d/custom
    content: |
      /var/log/*.log {
          daily
          missingok
          rotate 52
          compress
          delaycompress
          notifempty
          create 0644 root root
      }
    mode: '0644'

This role implements several critical security measures:

Automated Patch Management:

  • Configures unattended-upgrades to automatically apply security patches
  • Enables daily package list updates and weekly cleanup of package cache
  • Implements retry logic for package updates to handle temporary network issues

Kernel Security Hardening:

  • Applies sysctl parameters from group variables to harden network stack
  • Optionally disables IPv6 to reduce attack surface when not needed
  • Configures network parameters to prevent IP forwarding and redirect attacks

Attack Surface Reduction:

  • Removes legacy remote access tools (telnet, rsh) that transmit credentials in clear text
  • Installs only essential security tools and system monitoring utilities
  • Configures comprehensive log rotation to prevent disk space exhaustion
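
Once the role has run, a couple of spot checks confirm the settings took effect. For example, using ad-hoc commands from your workstation (assuming the generated inventory):

# Verify a hardened kernel parameter and the unattended-upgrades policy
ansible azure_vms -i inventory/hosts.yml -b -m ansible.builtin.command -a "sysctl net.ipv4.ip_forward"
ansible azure_vms -i inventory/hosts.yml -b -m ansible.builtin.command -a "apt-config dump APT::Periodic::Unattended-Upgrade"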

Firewall Role

The firewall role implements network-level security using Ubuntu’s Uncomplicated Firewall (UFW). It follows a default-deny approach, explicitly allowing only necessary network traffic while logging denied connections for security monitoring.

Create configuration-management/roles/firewall/tasks/main.yml:

---
- name: Reset UFW to defaults
  community.general.ufw:
    state: reset
  notify: Restart ufw

- name: Set UFW default policies
  community.general.ufw:
    direction: "{{ item.direction }}"
    policy: "{{ item.policy }}"
  loop:
    - { direction: 'incoming', policy: 'deny' }
    - { direction: 'outgoing', policy: 'allow' }
    - { direction: 'routed', policy: 'deny' }

- name: Configure UFW rules
  community.general.ufw:
    rule: "{{ item.rule }}"
    port: "{{ item.port }}"
    proto: "{{ item.proto }}"
    src: "{{ item.src | default(omit) }}"
    comment: "{{ item.comment | default(omit) }}"
  loop: "{{ firewall_rules }}"
  notify: Restart ufw

- name: Enable UFW logging
  community.general.ufw:
    logging: 'on'

- name: Enable UFW
  community.general.ufw:
    state: enabled

- name: Ensure UFW is enabled and running
  ansible.builtin.systemd:
    name: ufw
    state: started
    enabled: true

Create the corresponding handler file configuration-management/roles/firewall/handlers/main.yml:

---
- name: Restart ufw
  ansible.builtin.systemd:
    name: ufw
    state: restarted

- name: Reload ufw
  ansible.builtin.command: ufw --force reload
  changed_when: true

The firewall role implements defense-in-depth network security:

Default-Deny Security Model:

  • Denies all incoming traffic by default, requiring explicit rules for allowed services
  • Allows outgoing traffic to support system updates and normal operations
  • Denies routed traffic to prevent the VM from being used as a gateway

Flexible Rule Management:

  • Uses group variables to define firewall rules, enabling environment-specific configurations
  • Supports source IP restrictions for enhanced security
  • Enables comprehensive logging for security monitoring and incident response

Handler-Based Service Management:

  • Uses Ansible handlers to restart UFW only when configuration changes occur
  • Ensures firewall service starts automatically on system boot
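
A quick way to review the resulting rule set after the role runs (again assuming the generated inventory):

# Review the active UFW rules and default policies on the VM
ansible azure_vms -i inventory/hosts.yml -b -m ansible.builtin.command -a "ufw status verbose"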

SSH Hardening Role

SSH hardening is critical for cloud VMs since SSH is often the primary remote access method. This role implements SSH security best practices to prevent brute force attacks, disable weak authentication methods, and enable comprehensive logging.

Create configuration-management/roles/ssh-hardening/tasks/main.yml:

---
- name: Backup original SSH config
  ansible.builtin.copy:
    src: /etc/ssh/sshd_config
    dest: /etc/ssh/sshd_config.backup
    remote_src: true
    force: false
    mode: '0600'

- name: Configure SSH hardening
  ansible.builtin.lineinfile:
    path: /etc/ssh/sshd_config
    regexp: "{{ item.regexp }}"
    line: "{{ item.line }}"
    backup: true
  loop:
    - { regexp: '^#?PermitRootLogin ', line: 'PermitRootLogin no' }
    - { regexp: '^#?PasswordAuthentication ', line: 'PasswordAuthentication no' }
    - { regexp: '^#?PubkeyAuthentication ', line: 'PubkeyAuthentication yes' }
    - { regexp: '^#?AuthorizedKeysFile ', line: 'AuthorizedKeysFile .ssh/authorized_keys' }
    - { regexp: '^#?PermitEmptyPasswords ', line: 'PermitEmptyPasswords no' }
    - { regexp: '^#?ChallengeResponseAuthentication ', line: 'ChallengeResponseAuthentication no' }
    - { regexp: '^#?UsePAM ', line: 'UsePAM yes' }
    - { regexp: '^#?X11Forwarding ', line: 'X11Forwarding no' }
    - { regexp: '^#?PrintMotd ', line: 'PrintMotd no' }
    - { regexp: '^#?TCPKeepAlive ', line: 'TCPKeepAlive yes' }
    - { regexp: '^#?ClientAliveInterval ', line: 'ClientAliveInterval 300' }
    - { regexp: '^#?ClientAliveCountMax ', line: 'ClientAliveCountMax 2' }
    - { regexp: '^#?MaxAuthTries ', line: 'MaxAuthTries 3' }
    - { regexp: '^#?MaxSessions ', line: 'MaxSessions 2' }
    - { regexp: '^#?Protocol ', line: 'Protocol 2' }
    - { regexp: '^#?LogLevel ', line: 'LogLevel VERBOSE' }
  notify: Restart ssh

- name: Configure allowed users
  ansible.builtin.lineinfile:
    path: /etc/ssh/sshd_config
    regexp: '^#?AllowUsers '
    line: 'AllowUsers {{ ssh_allowed_users | join(" ") }}'
    backup: true
  when: ssh_allowed_users is defined
  notify: Restart ssh

- name: Configure strong ciphers and MACs
  ansible.builtin.blockinfile:
    path: /etc/ssh/sshd_config
    block: |
      # Strong Ciphers and MACs
      Ciphers chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com,aes256-ctr,aes192-ctr,aes128-ctr
      MACs hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha2-256,hmac-sha2-512
      KexAlgorithms curve25519-sha256@libssh.org,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group14-sha256
    marker: "# {mark} ANSIBLE MANAGED BLOCK - SSH CRYPTO"
    backup: true
  notify: Restart ssh

- name: Validate SSH configuration
  ansible.builtin.command: sshd -t
  changed_when: false

- name: Ensure SSH service is enabled
  ansible.builtin.systemd:
    name: ssh
    enabled: true

Create the SSH handler file configuration-management/roles/ssh-hardening/handlers/main.yml:

---
- name: Restart ssh
  ansible.builtin.systemd:
    name: ssh
    state: restarted

This SSH hardening role implements comprehensive SSH security:

Authentication Security:

  • Disables root login and password authentication, enforcing key-based authentication
  • Restricts SSH access to specific user accounts defined in group variables
  • Limits authentication attempts and concurrent sessions to prevent brute force attacks

Cryptographic Hardening:

  • Configures strong encryption ciphers and message authentication codes (MACs)
  • Uses modern key exchange algorithms that provide forward secrecy
  • Disables weak cryptographic options that could be exploited

Session Management:

  • Configures client alive intervals to terminate idle sessions
  • Enables verbose logging for security auditing
  • Validates SSH configuration before applying changes to prevent lockouts
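
Because a broken sshd_config can lock you out, it is worth checking the effective settings from a session that is already open before logging out. sshd -T prints the fully resolved configuration:

# From an existing SSH session on the VM; keep it open until a fresh login succeeds
sudo sshd -T | grep -Ei 'permitrootlogin|passwordauthentication|maxauthtries|ciphers|macs'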

Fail2ban Role

Fail2ban provides automated intrusion prevention by monitoring log files for malicious activity and temporarily banning IP addresses that exceed failure thresholds. This role is particularly important for Internet-facing VMs that attract automated attacks.

Create configuration-management/roles/fail2ban/tasks/main.yml:

---
- name: Install fail2ban
  ansible.builtin.apt:
    name: fail2ban
    state: present

- name: Create fail2ban local configuration
  ansible.builtin.template:
    src: jail.local.j2
    dest: /etc/fail2ban/jail.local
    backup: true
    mode: '0644'
  notify: Restart fail2ban

- name: Create SSH jail configuration
  ansible.builtin.copy:
    dest: /etc/fail2ban/jail.d/ssh.conf
    content: |
      [sshd]
      enabled = true
      port = 22
      filter = sshd
      logpath = /var/log/auth.log
      maxretry = {{ fail2ban_maxretry }}
      bantime = {{ fail2ban_bantime }}
      findtime = {{ fail2ban_findtime }}
    mode: '0644'
    backup: true
  notify: Restart fail2ban

- name: Start and enable fail2ban
  ansible.builtin.systemd:
    name: fail2ban
    state: started
    enabled: true

Create the fail2ban handler configuration-management/roles/fail2ban/handlers/main.yml:

---
- name: Restart fail2ban
  ansible.builtin.systemd:
    name: fail2ban
    state: restarted

- name: Reload fail2ban
  ansible.builtin.systemd:
    name: fail2ban
    state: reloaded

Create the fail2ban configuration template configuration-management/roles/fail2ban/templates/jail.local.j2:

[DEFAULT]
# Ban hosts for one hour:
bantime = {{ fail2ban_bantime }}

# Override /etc/fail2ban/jail.d/00-firewalld.conf:
banaction = ufw

# A host is banned if it has generated "maxretry" during the last "findtime"
# seconds.
findtime = {{ fail2ban_findtime }}

# "maxretry" is the number of failures before a host get banned.
maxretry = {{ fail2ban_maxretry }}

# Destination email address used solely for the interpolations in
# jail.{conf,local,d/*} configuration files.
destemail = root@localhost

# Sender email address used solely for some actions
sender = root@localhost

# Email action configuration
action = %(action_mw)s

[sshd]
enabled = true
port = 22
filter = sshd
logpath = /var/log/auth.log
maxretry = {{ fail2ban_maxretry }}
bantime = {{ fail2ban_bantime }}
findtime = {{ fail2ban_findtime }}

The fail2ban role provides adaptive security through automated threat response:

Intelligent Threat Detection:

  • Monitors authentication logs for failed SSH login attempts
  • Uses configurable thresholds to balance security with usability
  • Integrates with UFW firewall for immediate IP address blocking

Template-Based Configuration:

  • Uses Jinja2 templates to generate configuration files from group variables
  • Enables environment-specific tuning of detection and ban parameters
  • Supports easy extension to monitor additional services
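
After the role runs, fail2ban-client confirms that the SSH jail is active and shows any currently banned addresses:

# Check overall fail2ban status and the sshd jail specifically (run on the VM)
sudo fail2ban-client status
sudo fail2ban-client status sshd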

Time Synchronization Role

Accurate time synchronization is critical for security logging, authentication protocols, and distributed system coordination. This role configures chrony for reliable NTP synchronization with multiple time sources.

Create configuration-management/roles/time-sync/tasks/main.yml:

---
- name: Install chrony
  ansible.builtin.apt:
    name: chrony
    state: present
    update_cache: true

- name: Configure chrony
  ansible.builtin.template:
    src: chrony.conf.j2
    dest: /etc/chrony/chrony.conf
    backup: true
    mode: '0644'
  notify: Restart chrony

- name: Start and enable chrony service
  ansible.builtin.systemd:
    name: chrony
    state: started
    enabled: true

- name: Wait for chrony to start
  ansible.builtin.wait_for:
    timeout: 10
  delegate_to: localhost

- name: Force time synchronization
  ansible.builtin.command: chronyc makestep
  changed_when: true
  failed_when: false

- name: Check chrony sources
  ansible.builtin.command: chronyc sources -v
  register: chrony_sources
  changed_when: false

- name: Display chrony sources
  ansible.builtin.debug:
    var: chrony_sources.stdout_lines
  when: chrony_sources is defined

Create the chrony handler configuration-management/roles/time-sync/handlers/main.yml:

---
- name: Restart chrony
  ansible.builtin.systemd:
    name: chrony
    state: restarted

Create the chrony configuration template configuration-management/roles/time-sync/templates/chrony.conf.j2:

# Use public servers from the pool.ntp.org project.
{% for server in ntp_servers %}
pool {{ server }} iburst
{% endfor %}

# Record the rate at which the system clock gains/loses time.
driftfile /var/lib/chrony/drift

# Allow the system clock to be stepped in the first three updates
# if its offset is larger than 1 second.
makestep 1.0 3

# Enable kernel synchronization of the real-time clock (RTC).
rtcsync

# Enable hardware timestamping on all interfaces that support it.
#hwtimestamp *

# Increase the minimum number of selectable sources required to adjust
# the system clock.
#minsources 2

# Allow NTP client access from local network.
#allow 192.168.0.0/16

# Serve time even if not synchronized to a time source.
#local stratum 10

# Specify file containing keys for NTP authentication.
keyfile /etc/chrony/chrony.keys

# Specify directory for log files.
logdir /var/log/chrony

# Select which information is logged.
#log measurements statistics tracking

The time synchronization role ensures accurate system clocks:

Reliable Time Sources:

  • Configures multiple NTP pool servers for redundancy
  • Uses the iburst option for faster initial synchronization
  • Implements drift compensation for improved long-term accuracy

Template-Driven Configuration:

  • Uses Jinja2 loops to iterate over NTP servers from group variables
  • Enables easy customization of time sources for different environments
  • Includes comprehensive logging and monitoring configuration
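
Beyond the chronyc sources check the role already performs, chronyc tracking gives a concise view of the current offset and stratum once synchronization settles:

# Summarize the current synchronization state (run on the VM)
chronyc tracking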

Azure Monitor Role

The azure-monitor role sets up system monitoring and prepares for integration with Azure’s monitoring services. While full Azure Monitor integration requires workspace credentials, this role provides comprehensive local monitoring capabilities.

Create configuration-management/roles/azure-monitor/tasks/main.yml:

---
- name: Add Microsoft GPG key
  ansible.builtin.apt_key:
    url: https://packages.microsoft.com/keys/microsoft.asc
    state: present
  when: azure_monitor_enabled | default(false)

- name: Add Microsoft Azure CLI repository
  ansible.builtin.apt_repository:
    repo: "deb [arch=amd64] https://packages.microsoft.com/repos/azure-cli/ {{ ansible_distribution_release }} main"
    state: present
    update_cache: true
  when: azure_monitor_enabled | default(false)

- name: Install Azure Monitor Agent (alternative method)
  ansible.builtin.apt:
    name:
      - azure-cli
    state: present
    update_cache: true
  when: azure_monitor_enabled | default(false)

- name: Download Azure Monitor Agent installer script
  ansible.builtin.get_url:
    url: "https://raw.githubusercontent.com/microsoft/OMS-Agent-for-Linux/master/installer/scripts/onboard_agent.sh"
    dest: /tmp/onboard_agent.sh
    mode: '0755'
  when: azure_monitor_enabled | default(false)
  failed_when: false

- name: Install Azure Monitor Agent via script (if available)
  ansible.builtin.command: /tmp/onboard_agent.sh -w YOUR_WORKSPACE_ID -s YOUR_SHARED_KEY
  when:
    - azure_monitor_enabled | default(false)
    - false  # Disabled until workspace credentials are provided
  failed_when: false
  changed_when: false

- name: Install basic monitoring tools instead
  ansible.builtin.apt:
    name:
      - sysstat
      - iotop
      - nethogs
      - ncdu
      - htop
      - nmon
    state: present

- name: Configure sysstat
  ansible.builtin.lineinfile:
    path: /etc/default/sysstat
    regexp: '^ENABLED='
    line: 'ENABLED="true"'
    backup: true
  notify: Restart sysstat

- name: Start and enable sysstat
  ansible.builtin.systemd:
    name: sysstat
    state: started
    enabled: true

- name: Create basic monitoring script
  ansible.builtin.copy:
    dest: /usr/local/bin/system-monitor.sh
    content: |
      #!/bin/bash
      # Basic system monitoring script
      echo "=== System Monitor Report ==="
      echo "Date: $(date)"
      echo "Uptime: $(uptime)"
      echo "Load Average: $(cat /proc/loadavg)"
      echo "Memory Usage:"
      free -h
      echo "Disk Usage:"
      df -h
      echo "Top 5 CPU processes:"
      ps aux --sort=-%cpu | head -6
      echo "Top 5 Memory processes:"
      ps aux --sort=-%mem | head -6
    mode: '0755'

- name: Schedule basic monitoring
  ansible.builtin.cron:
    name: "system monitor"
    minute: "*/15"
    job: "/usr/local/bin/system-monitor.sh >> /var/log/system-monitor.log 2>&1"

- name: Clean up downloaded files
  ansible.builtin.file:
    path: "{{ item }}"
    state: absent
  loop:
    - /tmp/azuremonitoragent.deb
    - /tmp/onboard_agent.sh

Create the monitoring handler configuration-management/roles/azure-monitor/handlers/main.yml:

---
- name: Restart sysstat
  ansible.builtin.service:
    name: sysstat
    state: restarted
    enabled: true

The azure-monitor role provides comprehensive system monitoring:

Azure Integration Preparation:

  • Installs Azure CLI for potential cloud service integration
  • Downloads Azure Monitor Agent installer for future use
  • Configures conditional installation based on workspace credentials

Local Monitoring Capabilities:

  • Installs essential system monitoring tools (sysstat, htop, iotop, etc.)
  • Creates custom monitoring scripts for regular system health checks
  • Schedules automated monitoring reports every 15 minutes
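
Once sysstat has been collecting for a while, sar can take a few live CPU samples, and the output of the scheduled script accumulates in its log file:

# Sample CPU utilization three times at one-second intervals (run on the VM)
sar -u 1 3

# Review output from the scheduled monitoring script
tail -n 40 /var/log/system-monitor.log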

Cron Jobs Role

The cron-jobs role implements automated maintenance tasks essential for production systems, including system backups, log rotation, and proactive monitoring.

Create configuration-management/roles/cron-jobs/tasks/main.yml:

---
- name: Install necessary packages for cron
  ansible.builtin.package:
    name:
      - cron
      - logrotate
    state: present

- name: Ensure cron service is running
  ansible.builtin.systemd:
    name: cron
    state: started
    enabled: true

- name: Create backup directory
  ansible.builtin.file:
    path: /var/backups/system
    state: directory
    owner: root
    group: root
    mode: '0755'

- name: Create backup scripts directory
  ansible.builtin.file:
    path: /opt/scripts
    state: directory
    owner: root
    group: root
    mode: '0755'

- name: Create system backup script
  ansible.builtin.copy:
    dest: /opt/scripts/system_backup.sh
    content: |
      #!/bin/bash
      # System backup script
      set -e

      BACKUP_DIR="/var/backups/system"
      DATE=$(date +%Y%m%d_%H%M%S)
      BACKUP_FILE="system_backup_${DATE}.tar.gz"

      # Create backup
      tar -czf "${BACKUP_DIR}/${BACKUP_FILE}" \
          --exclude='/proc' \
          --exclude='/tmp' \
          --exclude='/sys' \
          --exclude='/dev' \
          --exclude='/var/backups' \
          --exclude='/var/cache' \
          --exclude='/var/tmp' \
          /etc \
          /home/{{ ansible_user }}/.ssh \
          /var/log 2>/dev/null || true

      # Keep only last 7 days of backups
      find "${BACKUP_DIR}" -name "system_backup_*.tar.gz" -mtime +7 -delete

      # Log backup completion
      echo "$(date): System backup completed - ${BACKUP_FILE}" >> /var/log/backup.log
    mode: '0755'

- name: Create log rotation script
  ansible.builtin.copy:
    dest: /opt/scripts/log_rotation.sh
    content: |
      #!/bin/bash
      # Log rotation script
      set -e

      # Rotate application logs
      find /var/log -name "*.log" -size +100M -exec gzip {} \;
      find /var/log -name "*.gz" -mtime +30 -delete

      # Clean old journal logs
      journalctl --vacuum-time=30d

      # Log rotation completion
      echo "$(date): Log rotation completed" >> /var/log/maintenance.log
    mode: '0755'

- name: Create system monitoring script
  ansible.builtin.copy:
    dest: /opt/scripts/system_monitor.sh
    content: |
      #!/bin/bash
      # System monitoring script
      set -e

      LOG_FILE="/var/log/system_monitor.log"

      # Check disk usage
      DISK_USAGE=$(df / | tail -1 | awk '{print $5}' | sed 's/%//')
      if [ "$DISK_USAGE" -gt 85 ]; then
          echo "$(date): WARNING - Disk usage is ${DISK_USAGE}%" >> "$LOG_FILE"
      fi

      # Check memory usage
      MEM_USAGE=$(free | grep Mem | awk '{printf("%.0f", $3/$2 * 100.0)}')
      if [ "$MEM_USAGE" -gt 90 ]; then
          echo "$(date): WARNING - Memory usage is ${MEM_USAGE}%" >> "$LOG_FILE"
      fi

      # Log system status
      echo "$(date): System check completed - Disk: ${DISK_USAGE}%, Memory: ${MEM_USAGE}%" >> "$LOG_FILE"
    mode: '0755'

- name: Set up log rotation for system logs
  ansible.builtin.cron:
    name: "logrotate system logs"
    minute: "0"
    hour: "0"
    job: "/usr/sbin/logrotate /etc/logrotate.conf"
    user: root

- name: Clean temporary files daily
  ansible.builtin.cron:
    name: "clean temp files"
    minute: "30"
    hour: "3"
    job: "find /tmp -type f -atime +7 -delete && find /var/tmp -type f -atime +30 -delete"
    user: root

- name: Update package cache weekly
  ansible.builtin.cron:
    name: "update package cache"
    minute: "0"
    hour: "2"
    weekday: "0"
    job: "apt-get update"
    user: root

- name: Security updates check
  ansible.builtin.cron:
    name: "security updates check"
    minute: "0"
    hour: "6"
    job: "apt list --upgradable 2>/dev/null | grep -i security > /var/log/security-updates.log"
    user: root

- name: Disk usage monitoring
  ansible.builtin.cron:
    name: "disk usage alert"
    minute: "*/30"
    job: "df -h | awk '$5 > 80 {print $0}' | mail -s 'Disk Usage Alert' root@localhost"
    user: root

- name: System backup (if enabled)
  ansible.builtin.cron:
    name: "system backup"
    minute: "0"
    hour: "1"
    job: "/opt/scripts/system_backup.sh"
    user: root
  when: enable_system_backup | default(false)

- name: System monitoring cron job
  ansible.builtin.cron:
    name: "system monitoring"
    minute: "*/15"
    job: "/opt/scripts/system_monitor.sh"
    user: root

- name: Log rotation cron job
  ansible.builtin.cron:
    name: "log rotation"
    minute: "0"
    hour: "2"
    job: "/opt/scripts/log_rotation.sh"
    user: root

- name: Create log files with proper permissions
  ansible.builtin.file:
    path: "{{ item }}"
    state: touch
    owner: root
    group: root
    mode: '0644'
  loop:
    - /var/log/backup.log
    - /var/log/maintenance.log
    - /var/log/system_monitor.log
    - /var/log/updates.log

The cron-jobs role automates essential maintenance tasks:

Backup Automation:

  • Creates compressed system backups including configuration files and user data
  • Implements automatic cleanup to prevent disk space exhaustion
  • Logs backup operations for monitoring and compliance

Log Management:

  • Rotates large log files to prevent disk space issues
  • Compresses old logs and removes outdated archives
  • Manages systemd journal logs with vacuum operations

Proactive Monitoring:

  • Monitors disk and memory usage with configurable thresholds
  • Generates alerts when resource usage exceeds safe limits
  • Schedules regular update checks to identify available security patches
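
You can exercise the backup script once by hand and confirm the scheduled jobs before waiting for the timers to fire; note that the backup cron entry is only created when enable_system_backup is set:

# Run the backup script manually and list the scheduled jobs (run on the VM)
sudo /opt/scripts/system_backup.sh
sudo ls -lh /var/backups/system/
sudo crontab -l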

Orchestrating the Deployment

The main Ansible playbook orchestrates all roles and defines the overall deployment workflow. Playbooks use YAML syntax to describe the desired state of systems and specify which roles should be applied to which host groups.

Create the main playbook configuration-management/site.yml:

---
- name: Configure Azure VM for Production Deployment
  hosts: azure_vms
  become: true
  gather_facts: true

  pre_tasks:
    - name: Wait for cloud-init to finish
      ansible.builtin.command: cloud-init status --wait
      changed_when: false
      failed_when: false        # skip on images without cloud-init

    - name: Wait for dpkg / apt locks to clear (max 10 min)
      ansible.builtin.shell: |
        lsof /var/lib/dpkg/lock-frontend /var/lib/apt/lists/lock || true
      register: lock_check
      retries: 20               # 20 × 30 s = 10 min
      delay: 30
      until: lock_check.stdout == ""
      changed_when: false

    - name: Update package cache
      ansible.builtin.apt:
        update_cache: yes
        cache_valid_time: 3600
      retries: 5                # extra safety in case a timer kicks in
      delay: 15
      register: apt_cache
      until: apt_cache is succeeded
      tags: [system, updates]

  roles:
    - role: system-hardening
      tags: [system, security, hardening]

    - role: firewall
      tags: [security, firewall]

    - role: ssh-hardening
      tags: [security, ssh]

    - role: fail2ban
      tags: [security, fail2ban]

    - role: time-sync
      tags: [system, time]

    - role: azure-monitor
      tags: [monitoring, azure]

    - role: cron-jobs
      tags: [system, cron, backup]

  post_tasks:
    - name: Verify all services are running
      ansible.builtin.service:
        name: "{{ item }}"
        state: started
        enabled: true
      loop:
        - fail2ban
        - chrony
      tags: [verification]

    - name: Display deployment summary
      ansible.builtin.debug:
        msg:
          - "Deployment completed successfully!"
          - "SSH Port: {{ ssh_port | default('22') }}"
      tags: [summary]

This playbook demonstrates several Ansible best practices:

Structured Execution Flow:

  • pre_tasks run before roles to ensure the system is ready for configuration
  • Roles execute in dependency order (base hardening before service-specific security)
  • post_tasks verify successful deployment and provide status information

Tag-Based Execution:

  • Each role and task includes relevant tags for selective execution
  • Enables running specific security configurations without full deployment
  • Supports maintenance workflows (e.g., ansible-playbook site.yml --tags=security)

Error Handling and Verification:

  • Includes verification steps to ensure critical services are running
  • Provides deployment summary with key configuration details
  • Uses gather_facts to collect system information for template rendering
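
Before a full run, ansible-playbook can also list the tags and tasks a given invocation would cover, which is useful when planning selective executions:

# Preview available tags and the tasks a tagged run would include
ansible-playbook -i inventory/hosts.yml site.yml --list-tags
ansible-playbook -i inventory/hosts.yml site.yml --tags=security --list-tasks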

Integration with Terraform

The integration between Terraform and Ansible is orchestrated through a comprehensive Makefile that provides a unified interface for infrastructure provisioning and configuration management. This approach follows DevOps best practices by creating a single entry point for complex multi-tool workflows.

Create a new file called Makefile in the infrastructure-in-azure project root:

# Makefile for Azure VM provisioning and configuration
# This integrates Terraform (infrastructure) with Ansible (configuration)

.PHONY: help plan apply destroy configure deploy clean status ssh validate logs

# Default variables
TERRAFORM_DIR := provisioning
ANSIBLE_DIR := configuration-management
# Use full path or shell expansion for SSH key
SSH_KEY := $(HOME)/.ssh/id_rsa

# Colors for output
GREEN := \033[0;32m
YELLOW := \033[1;33m
RED := \033[0;31m
NC := \033[0m # No Color

# Force bash usage for advanced features
SHELL := /bin/bash

help: ## Show this help message
	@echo "Azure VM Infrastructure & Configuration Management"
	@echo "=================================================="
	@echo ""
	@echo "Available targets:"
	@grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | sort | awk 'BEGIN {FS = ":.*?## "}; {printf "  $(YELLOW)%-15s$(NC) %s\n", $$1, $$2}'
	@echo ""
	@echo "Typical workflow:"
	@echo "  1. make plan     - Review infrastructure changes"
	@echo "  2. make deploy   - Deploy infrastructure + configuration"
	@echo "  3. make ssh      - Connect to the VM"
	@echo "  4. make destroy  - Clean up resources"

check-prereqs: ## Check if required tools are installed
	@echo -e "$(GREEN)Checking prerequisites...$(NC)"
	@command -v terraform >/dev/null 2>&1 || { echo -e "$(RED)Error: terraform is not installed$(NC)" >&2; exit 1; }
	@command -v ansible >/dev/null 2>&1 || { echo -e "$(RED)Error: ansible is not installed$(NC)" >&2; exit 1; }
	@if [ -f "$(SSH_KEY)" ]; then \
		echo -e "$(GREEN)✓ Found SSH key: $(SSH_KEY)$(NC)"; \
	else \
		echo -e "$(RED)Error: SSH key not found at $(SSH_KEY)$(NC)"; \
		echo -e "$(YELLOW)Please ensure your SSH key exists at $(SSH_KEY)$(NC)"; \
		exit 1; \
	fi
	@echo -e "$(GREEN)✓ All prerequisites satisfied$(NC)"

plan: check-prereqs ## Plan Terraform infrastructure changes
	@echo -e "$(GREEN)Planning infrastructure changes...$(NC)"
	cd $(TERRAFORM_DIR) && source .env && terraform init && terraform plan
	@echo -e "$(GREEN)✓ Plan completed$(NC)"

apply: check-prereqs ## Apply Terraform infrastructure changes
	@echo -e "$(GREEN)Applying infrastructure changes...$(NC)"
	cd $(TERRAFORM_DIR) && source .env && terraform apply -auto-approve
	@echo -e "$(GREEN)✓ Infrastructure deployed$(NC)"

configure: ## Run Ansible configuration on existing VM
	@echo -e "$(GREEN)Configuring VM with Ansible...$(NC)"
	cd $(ANSIBLE_DIR) && ./deploy.sh
	@echo -e "$(GREEN)✓ VM configuration completed$(NC)"

deploy: plan apply configure ## Full deployment: provision infrastructure + configure VM
	@echo -e "$(GREEN)Full deployment completed!$(NC)"
	@echo ""
	@echo -e "$(YELLOW)Deployment Summary:$(NC)"
	@echo "=================="
	@cd $(TERRAFORM_DIR) && echo "VM Name: $$(terraform output -raw vm_name)"
	@cd $(TERRAFORM_DIR) && echo "Public IP: $$(terraform output -raw public_ip_address)"
	@cd $(TERRAFORM_DIR) && echo "SSH Command: $$(terraform output -raw ssh_connection_command)"
	@cd $(TERRAFORM_DIR) && echo "Web App: http://$$(terraform output -raw public_ip_address)"
	@echo ""
	@echo -e "$(GREEN)Your VM is ready for use!$(NC)"

status: ## Show current infrastructure status
	@echo -e "$(GREEN)Infrastructure Status:$(NC)"
	@echo "====================="
	@if [ -f "$(TERRAFORM_DIR)/terraform.tfstate" ]; then \
		cd $(TERRAFORM_DIR) && terraform show -json | jq -r '.values.root_module.resources[] | select(.type=="azurerm_linux_virtual_machine") | "VM Status: " + .values.name + " (" + .values.location + ")"' 2>/dev/null || echo "VM: Deployed"; \
		echo "Public IP: $$(cd $(TERRAFORM_DIR) && terraform output -raw public_ip_address 2>/dev/null || echo 'Not available')"; \
		echo "Resource Group: $$(cd $(TERRAFORM_DIR) && terraform output -raw resource_group_name 2>/dev/null || echo 'Not available')"; \
	else \
		echo -e "$(YELLOW)No infrastructure deployed$(NC)"; \
	fi

ssh: ## SSH into the VM
	@echo -e "$(GREEN)Connecting to VM...$(NC)"
	@cd $(TERRAFORM_DIR) && $$(terraform output -raw ssh_connection_command 2>/dev/null) || { echo -e "$(RED)Cannot get SSH command. Is the VM deployed?$(NC)" >&2; exit 1; }

test: ## Test the deployed web application
	@echo -e "$(GREEN)Testing web application...$(NC)"
	@VM_IP=$$(cd $(TERRAFORM_DIR) && terraform output -raw public_ip_address 2>/dev/null) && \
	if [ -n "$$VM_IP" ]; then \
		echo "Testing HTTP connection to $$VM_IP..."; \
		curl -v --connect-timeout 10 "http://$$VM_IP" || echo -e "$(YELLOW)Connection failed$(NC)"; \
	else \
		echo -e "$(RED)VM IP not available$(NC)"; \
	fi

logs: ## Show Ansible deployment logs
	@echo -e "$(GREEN)Recent deployment logs:$(NC)"
	@if [ -f "$(ANSIBLE_DIR)/ansible.log" ]; then \
		tail -50 "$(ANSIBLE_DIR)/ansible.log"; \
	else \
	echo -e "$(YELLOW)No deployment logs found$(NC)"; \
	fi

clean: ## Clean Terraform cache and temporary files
	@echo -e "$(GREEN)Cleaning temporary files...$(NC)"
	rm -rf $(TERRAFORM_DIR)/.terraform
	rm -f $(TERRAFORM_DIR)/.terraform.lock.hcl
	rm -f $(TERRAFORM_DIR)/terraform.tfstate.backup
	rm -f $(ANSIBLE_DIR)/ansible.log
	@echo -e "$(GREEN)✓ Cleanup completed$(NC)"

validate: ## Validate Terraform and Ansible configurations
	@echo -e "$(GREEN)Validating configurations...$(NC)"
	cd $(TERRAFORM_DIR) && terraform fmt
	cd $(TERRAFORM_DIR) && source .env && terraform init -backend=false
	cd $(TERRAFORM_DIR) && source .env && terraform validate
	cd $(ANSIBLE_DIR) && ansible-playbook --syntax-check site.yml -i inventory/hosts.yml
	@echo -e "$(GREEN)✓ All configurations valid$(NC)"

destroy: ## Destroy all infrastructure
	@echo -e "$(RED)WARNING: This will destroy all infrastructure!$(NC)"
	@read -p "Are you sure? [y/N] " -n 1 -r; \
	echo ""; \
	if [[ $$REPLY =~ ^[Yy]$$ ]]; then \
		echo -e "$(GREEN)Destroying infrastructure...$(NC)"; \
		cd $(TERRAFORM_DIR) && source .env && terraform destroy -auto-approve; \
		echo -e "$(GREEN)✓ Infrastructure destroyed$(NC)"; \
	else \
		echo -e "$(YELLOW)Destroy cancelled$(NC)"; \
	fi

# Set default target
.DEFAULT_GOAL := help

This Makefile provides several key automation benefits:

Unified Workflow Management:

  • Single entry point for complex multi-tool deployments
  • Consistent command interface across development and production environments
  • Built-in help system and colored output for improved user experience

Safety and Validation:

  • Prerequisite checking ensures required tools and credentials are available
  • Configuration validation prevents deployment of invalid configurations
  • Interactive confirmation for destructive operations

Operational Efficiency:

  • Combined deploy target orchestrates infrastructure provisioning and configuration
  • Status checking and log viewing for troubleshooting
  • Cleanup targets for development environment management

Go ahead and test the Makefile by running make help to see usage information:

Terminal output of running make help


Testing and Validation

After deployment, it’s crucial to verify that all security hardening measures are functioning correctly. This testing approach validates both the technical implementation and the security posture of your hardened VM.

Connectivity Testing

First, test basic connectivity and verify the dynamic inventory generation:

# Test the deployment script
make deploy

# Verify connectivity using Ansible
cd configuration-management
ansible all -m ping -i inventory/hosts.yml

# Check SSH connectivity manually
ssh azureuser@<VM_IP> -i ~/.ssh/id_rsa
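
If you want to confirm what the dynamic inventory actually contains before running any playbooks, ansible-inventory can render it for you. The commands below assume the inventory path used throughout this tutorial:

# Dump the parsed inventory as JSON, including host variables such as the public IP
ansible-inventory -i inventory/hosts.yml --list

# Or view just the group/host structure at a glance
ansible-inventory -i inventory/hosts.yml --graph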

Security Validation

Verify that security hardening measures are active:

# SSH into the VM
make ssh

# Check firewall status
sudo ufw status

# Verify fail2ban is running
sudo systemctl status fail2ban

Terminal output of checking firewall and fail2ban status

# Check SSH configuration
sudo sshd -T -f /etc/ssh/sshd_config | \
	grep -Ei '(permitrootlogin|passwordauthentication|maxauthtries|port)'

# Verify time synchronization
chronyc sources -v

Terminal output of verifying SSH hardening and time synchronization
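
If you prefer a single pass over these checks, a small script can summarize the security posture when run on the VM. This is a minimal sketch; the checks and expected values simply mirror the hardening applied in this tutorial, so adjust them if your role variables differ:

#!/usr/bin/env bash
# security-check.sh - quick post-hardening verification (run on the VM)
set -u

pass=0; fail=0

check() {
  local desc="$1" cmd="$2"
  if bash -c "$cmd" >/dev/null 2>&1; then
    echo "PASS: $desc"; pass=$((pass + 1))
  else
    echo "FAIL: $desc"; fail=$((fail + 1))
  fi
}

check "ufw firewall is active"         "sudo ufw status | grep -q 'Status: active'"
check "fail2ban service is running"    "systemctl is-active --quiet fail2ban"
check "root SSH login is disabled"     "sudo sshd -T | grep -q 'permitrootlogin no'"
check "password authentication is off" "sudo sshd -T | grep -q 'passwordauthentication no'"
check "chrony reports normal status"   "chronyc tracking | grep -qi 'normal'"

echo "Summary: $pass passed, $fail failed"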

Service Monitoring

Verify that all monitoring and maintenance services are operational:

# Check cron jobs
sudo crontab -l

# Verify system monitoring logs
sudo tail -f /var/log/system_monitor.log

# Check backup operations
sudo ls -la /var/backups/system/
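
To go a step beyond eyeballing the logs, you can check that the monitoring job has actually written output recently and that backups are accumulating. The paths below match the roles built earlier in this tutorial; change them if you customized the variables:

# Confirm the monitoring script wrote to its log within the last hour
if sudo find /var/log/system_monitor.log -mmin -60 2>/dev/null | grep -q .; then
  echo "Monitoring log updated within the last hour"
else
  echo "WARNING: monitoring log is stale or missing"
fi

# Show the most recent backup archives, newest first
sudo ls -lt /var/backups/system/ | head -n 5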

Tag-Based Testing

Use Ansible tags to test specific configuration areas (run these from the configuration-management directory on your workstation):

# Test only security configurations
ansible-playbook -i inventory/hosts.yml site.yml --tags=security --check

# Verify monitoring setup
ansible-playbook -i inventory/hosts.yml site.yml --tags=monitoring --check

# Test firewall configuration
ansible-playbook -i inventory/hosts.yml site.yml --tags=firewall --check
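
If you are unsure which tags are available in the playbook, Ansible can list them without executing anything:

# List every tag defined across the playbook and its roles
ansible-playbook -i inventory/hosts.yml site.yml --list-tags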

Cleanup and Tear Down

When you’re finished with the tutorial or want to start fresh, proper cleanup prevents ongoing Azure charges and maintains a clean development environment.

Automated Cleanup

Use the Makefile for safe, interactive cleanup:

# Destroy infrastructure with confirmation
make destroy

# Clean temporary files and caches
make clean

Manual Cleanup

For more granular control over the cleanup process:

# Destroy infrastructure using Terraform directly
cd provisioning
source .env
terraform destroy

# Remove Ansible temporary files
cd ../configuration-management
rm -f ansible.log
rm -rf .ansible

Azure Resource Verification

Verify all resources are properly deleted:

# Check resource group contents
az resource list --resource-group <your-resource-group-name>

# Delete resource group if empty
az group delete --name <your-resource-group-name> --yes
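
As a final sanity check, az group exists returns a simple boolean, which makes it easy to confirm (or script) that teardown finished:

# Prints "true" while the resource group still exists, "false" once deletion completes
az group exists --name <your-resource-group-name>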

Best Practices and Next Steps

Throughout this tutorial, we’ve built more than just a hardened Azure VM; we’ve created a foundation for enterprise-grade infrastructure automation that scales well beyond single-server deployments. The patterns and practices we’ve implemented here mirror those used by DevOps teams managing infrastructure at scale across major organizations.

The defense-in-depth security approach we’ve implemented demonstrates how multiple security layers work together to create a robust security posture. Rather than relying on a single security measure, we’ve combined firewall rules, SSH hardening, intrusion prevention, and automated monitoring to create overlapping security controls. This approach ensures that if one security layer is compromised, others remain in place to protect your infrastructure.

Our configuration management strategy using Ansible roles provides several key advantages that become even more valuable as your infrastructure grows. The idempotent operations we’ve built ensure that running the same playbook multiple times produces consistent results, making deployments predictable and safe. Version-controlling all our infrastructure and configuration code means every change is tracked, auditable, and reversible: critical requirements for production environments.

The operational excellence patterns we’ve established go beyond basic system administration. Automated backup and log rotation prevent the operational issues that can take down production systems, while proactive monitoring with configurable alerting thresholds helps you catch problems before they impact users. The comprehensive logging we’ve implemented supports both security auditing and troubleshooting, providing the visibility needed to maintain complex systems.

As you move toward production deployments, several scaling considerations become important. Multi-environment support requires separate variable files for development, staging, and production environments, along with environment-specific security policies and monitoring thresholds. Implementing automated testing pipelines for configuration changes ensures that infrastructure modifications are validated before reaching production systems.
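
One common way to structure this is an environment-per-directory layout on the Ansible side plus per-environment variable files on the Terraform side. The names below are purely illustrative, not part of what we built in this tutorial:

configuration-management/
  inventory/
    dev/
      hosts.yml
      group_vars/all.yml        # dev-specific thresholds and ports
    staging/
    production/
  roles/
  site.yml

provisioning/
  environments/
    dev.tfvars
    staging.tfvars
    production.tfvars
  main.tf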

For larger deployments, high availability becomes critical. This involves load balancer configuration for multi-VM deployments, database clustering and backup strategies, and comprehensive disaster recovery planning and testing. The role-based architecture we’ve built provides an excellent foundation for these more complex scenarios.

Compliance and governance requirements in enterprise environments can be addressed through automated compliance scanning and reporting, policy as code using tools like Open Policy Agent, and regular security assessments and penetration testing. The audit trails and documentation we’ve built into our Infrastructure as Code approach provide the foundation for meeting these requirements.

The skills and patterns you’ve learned in this tutorial transfer directly to larger, more complex deployments. The role-based architecture we’ve built can be extended to support additional services like web servers, databases, and application stacks. The group variable system provides the flexibility needed to manage multiple environments and host types while maintaining consistent base configurations across your infrastructure.


Conclusion

In this tutorial, we’ve successfully bridged the gap between infrastructure provisioning and application readiness by implementing comprehensive security hardening and configuration management using Ansible. Building on the foundation established in Part 1, we’ve created a production-ready virtual machine that implements security best practices and automated maintenance procedures.

The key accomplishments of this tutorial include:

Comprehensive Security Implementation:

  • Multi-layered security approach including firewall configuration, SSH hardening, and intrusion prevention
  • Automated security updates and patch management
  • Network-level protection with intelligent threat detection and response

Automated Configuration Management:

  • Dynamic inventory generation that adapts to changing infrastructure
  • Reusable Ansible roles that can be applied across multiple environments
  • Template-driven configuration that enables environment-specific customization

Operational Excellence:

  • Automated backup and monitoring systems for proactive maintenance
  • Comprehensive logging and alerting for security monitoring
  • Integration workflows that combine infrastructure provisioning with application configuration

DevOps Integration:

  • Unified automation workflows through Makefile orchestration
  • Version-controlled infrastructure and configuration code
  • Testing and validation procedures that ensure deployment reliability

This automation pipeline demonstrates real-world DevOps practices used by enterprise teams to manage cloud infrastructure at scale. The security hardening measures we’ve implemented provide a solid foundation for hosting production applications, while the automated maintenance procedures ensure long-term operational stability.

The skills and patterns learned in this tutorial directly transfer to larger, more complex deployments. The role-based architecture we’ve built can be extended to support additional services, while the group variable system provides flexibility for managing multiple environments and host types.


Further Learning Resources

To deepen your understanding of the tools and concepts covered in this tutorial, explore these comprehensive resources:

Ansible Mastery

Azure Cloud Platform

Infrastructure as Code

DevOps and Security


Stay Tuned for Part 3

In the next installment of our Infrastructure as Code in Azure series, we’ll take our automation to the next level by implementing GitHub Actions CI/CD pipelines for infrastructure deployment. You’ll learn how to:

Build Production-Ready CI/CD Pipelines:

  • Configure GitHub Actions workflows for automated Terraform deployments
  • Implement secure Azure authentication using OpenID Connect (OIDC)
  • Create approval workflows with Terraform plan output review processes

Advanced DevOps Patterns:

  • Multi-environment deployment strategies (dev, staging, production)
  • Automated testing and validation of infrastructure changes
  • Security scanning and compliance checking in CI/CD pipelines

GitOps Workflows:

  • Pull request-based infrastructure changes with automated planning
  • Environment promotion workflows and release management
  • Rollback strategies and disaster recovery automation

Part 3 will demonstrate how to move from manual deployments to fully automated, enterprise-grade infrastructure workflows that scale to support multiple teams and environments. We’ll build on the solid foundation we’ve established with Terraform and Ansible to create a complete DevOps automation pipeline.

Aaron Mathis

Systems administrator and software engineer specializing in cloud development, AI/ML, and modern web technologies. Passionate about building scalable solutions and sharing knowledge with the developer community.
