In this blog, we will explore how to set up simple prototype AWS infrastructure for running AI workloads using Terraform.
This guide will walk you through creating a GPU-enabled EC2 instance and other related AWS resources.
We will be using LocalAI, an open-source alternative to OpenAI: https://localai.io/
Please keep in mind that this setup is for demo purposes only. Do not use it in production environments.
Terraform is a powerful Infrastructure as Code (IaC) tool that simplifies provisioning and managing cloud resources. When paired with AWS, Terraform enables developers to automate infrastructure deployment, ensuring consistency and scalability. This approach is particularly valuable for AI workloads, which often require GPU instances and precise configurations to handle computationally intensive tasks.
Prerequisites
- AWS Account: With permissions to create GPU-enabled EC2 instances, security groups, VPC and subnets
- Terraform: Installed on your local machine.
- AWS CLI: Installed and configured with appropriate credentials (a quick check is shown below).
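If you are not sure whether the CLI is already set up, the snippet below is a quick way to check; the region value is just an example matching the one used later in this guide:
# Set a default region for the CLI (example value, matches this guide)
aws configure set region eu-central-1
# Verify that the CLI can authenticate against your account
aws sts get-caller-identity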
Step 1: Setting Up Terraform Provider
Create a directory for your Terraform project and add a configuration file named main.tf.
Define the AWS provider:
# Configure the required providers
terraform {
 required_providers {
  aws = {
   source  = "hashicorp/aws"
   version = "~> 5.0"
  }
 }
}
# Configure the AWS provider
provider "aws" {
 region  = "eu-central-1"
}
This configuration specifies the AWS provider and sets the region where resources will be created.
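If you prefer not to hard-code the region, you can pull it from a variable instead. This is an optional, minimal sketch; the variable name aws_region is just an example and the provider block here would replace the one above:
# Optional alternative: make the region configurable via a variable
variable "aws_region" {
 description = "AWS region to deploy the demo into"
 type        = string
 default     = "eu-central-1"
}

provider "aws" {
 region = var.aws_region
}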
Step 2: Setting Up VPC
We need to create a Virtual Private Cloud (VPC). A VPC allows us to define our own network space, including subnets, routing, and internet connectivity. In this step, we will set up a VPC with a subnet, route table, and an internet gateway to enable external access.
Create a new .tf file, for example, vpc.tf, and insert the following configuration:
# Create a VPC
resource "aws_vpc" "vpc" {
 cidr_block            = "172.31.0.0/16"
 enable_dns_hostnames       = true
 enable_dns_support        = true
 assign_generated_ipv6_cidr_block = true
 tags = {
  Name = "vpc"
 }
}
# Create a Subnet
resource "aws_subnet" "private_1a" {
 vpc_id              = aws_vpc.vpc.id
 cidr_block            = "172.31.16.0/20"
 availability_zone        = "eu-central-1a"
 tags = {
  Name = "1a-subnet"
 }
}
# Create a Route table
resource "aws_route_table" "route_table" {
 vpc_id = aws_vpc.vpc.id
 tags = {
  Name = "route-table"
 }
}
# Create a Route table association, bind route table with subnet
resource "aws_route_table_association" "private_1a" {
 subnet_id    = aws_subnet.private_1a.id
 route_table_id = aws_route_table.route_table.id
}
# Create an Internet Gateway
resource "aws_internet_gateway" "internet_gateway" {
 vpc_id = aws_vpc.vpc.id
 tags = {
  Name = "internet-gateway"
 }
}
# Associate the Internet Gateway with the custom route table
resource "aws_route" "ipv4" {
 route_table_id     = aws_route_table.route_table.id
 destination_cidr_block = "0.0.0.0/0" # This represents all internet traffic
 gateway_id       = aws_internet_gateway.internet_gateway.id
}
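Since the VPC requests an Amazon-provided IPv6 block (assign_generated_ipv6_cidr_block = true), you could optionally add a matching IPv6 default route as well. A minimal sketch; note that the subnet above is not assigned an IPv6 range, so this only matters if you extend the setup later:
# Optional: default IPv6 route through the same internet gateway
resource "aws_route" "ipv6" {
 route_table_id              = aws_route_table.route_table.id
 destination_ipv6_cidr_block = "::/0"
 gateway_id                  = aws_internet_gateway.internet_gateway.id
}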
Step 3: Defining EC2 Instance with GPU
AI workloads often require GPU-enabled instances. For this guide, we will use g4dn.xlarge, one of the cheapest GPU instance types, which is well suited for AI and machine learning workloads.
Create a new .tf file, for example, gpu_instance.tf and insert the following:
resource "aws_instance" "gpu_instance" {
 ami             = "ami-03250b0e01c28d196"
 instance_type        = "g4dn.xlarge"
 subnet_id          = aws_subnet.private_1a.id
 vpc_security_group_ids    = [aws_security_group.gpu_instance_sg.id]
 associate_public_ip_address = true
 root_block_device {
  volume_size      = 100
  delete_on_termination = true
 }
 user_data = file("user_data.sh")
 tags = {
  Name = "gpu_instance"
 }
}
output "gpu_instance_public_ip" {
 value = aws_instance.gpu_instance.public_ip
}
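The ami value above is a fixed ID that is only valid in eu-central-1 and will age over time. If you prefer, a data source can look up a current Ubuntu image instead; a minimal sketch, assuming you want the latest Ubuntu 22.04 image published by Canonical (account 099720109477):
# Optional: look up a recent Ubuntu 22.04 AMI instead of hard-coding the ID
data "aws_ami" "ubuntu" {
 most_recent = true
 owners      = ["099720109477"] # Canonical

 filter {
  name   = "name"
  values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
 }
}
You would then set ami = data.aws_ami.ubuntu.id in the aws_instance resource.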
Step 4: Creating User Data Script
The user_data.sh script automates instance setup by installing dependencies, configuring GPU support for Docker, and starting LocalAI with Docker Compose.
Create a file named user_data.sh:
#!/bin/bash
# Update and upgrade system packages
apt-get update -y
apt-get upgrade -y
# Install necessary dependencies: Docker, Git, and build-essential
apt-get install -y docker.io docker-compose-v2 git git-lfs build-essential
# Install NVIDIA drivers
apt-get install -y nvidia-driver-510 nvidia-dkms-510
# Install NVIDIA Container Toolkit for GPU support
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
 && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
# Install the NVIDIA container runtime and configure it for Docker
apt-get update -y
apt-get install -y nvidia-container-toolkit
nvidia-ctk runtime configure --runtime=docker
# Restart Docker to apply changes
systemctl restart docker
# Prepare the models directory and write the Docker Compose file for LocalAI
mkdir -p /home/ubuntu/models
echo 'services:
  local-ai:
    container_name: local-ai
    image: localai/localai:latest-aio-gpu-nvidia-cuda-12
    volumes:
      - ./models:/build/models
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    ports:
      - 8080:8080
    restart: unless-stopped' >> /home/ubuntu/docker-compose.yml
# Start the LocalAI container with GPU support
docker compose -f /home/ubuntu/docker-compose.yml up -d
This script will do the following:
- Updates and upgrades system packages.
- Installs Docker, Git, and build-essential.
- Installs the NVIDIA driver.
- Installs and configures the NVIDIA Container Toolkit.
- Starts the LocalAI container with GPU support (a quick way to verify this is shown below).
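Once the instance is up, you can SSH in (for example via EC2 Instance Connect) and check the progress. The commands below assume the default Ubuntu cloud image layout and the container name from the Compose file:
# Follow the user data / cloud-init log to watch the installation
sudo tail -f /var/log/cloud-init-output.log
# Verify that the GPU is visible inside the LocalAI container
sudo docker exec local-ai nvidia-smi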
Step 5: Configuring Security Group
We will enable SSH access from the AWS EC2 Instance Connect range (for the eu-central-1 region) and allow access to the app on port 8080. Define the security group in a file, e.g., sg.tf.
# Security group for the GPU instance
resource "aws_security_group" "gpu_instance_sg" {
 vpc_id    = aws_vpc.vpc.id
 name     = "gpu-instance-sg"
 description = "Security group for GPU Instance"
}
# Egress rule: Allow outbound HTTPS traffic
resource "aws_security_group_rule" "egress_https" {
 type        = "egress"
 from_port     = 443
 to_port      = 443
 protocol      = "tcp"
 security_group_id = aws_security_group.gpu_instance_sg.id
 cidr_blocks    = ["0.0.0.0/0"]
 description    = "Allow outbound HTTPS traffic"
}
# Egress rule: Allow outbound HTTP traffic
resource "aws_security_group_rule" "egress_http" {
 type        = "egress"
 from_port     = 80
 to_port      = 80
 protocol      = "tcp"
 security_group_id = aws_security_group.gpu_instance_sg.id
 cidr_blocks    = ["0.0.0.0/0"]
 description    = "Allow outbound HTTP traffic"
}
# Ingress rule: Allow SSH access from AWS EC2 Instance Connect (eu-central-1)
resource "aws_security_group_rule" "ingress_ssh_access" {
 type        = "ingress"
 from_port     = 22
 to_port      = 22
 protocol      = "tcp"
 security_group_id = aws_security_group.gpu_instance_sg.id
 cidr_blocks    = ["3.120.181.40/29"] # Adjust IP range as needed
 description    = "Allow SSH access from AWS EC2 Instance Connect (eu-central-1)"
}
# Ingress rule: Allow access to AI Frontend on port 8080
resource "aws_security_group_rule" "ingress_ai_frontend" {
 type        = "ingress"
 from_port     = 8080
 to_port      = 8080
 protocol      = "tcp"
 security_group_id = aws_security_group.gpu_instance_sg.id
 cidr_blocks    = ["0.0.0.0/0"]
 description    = "Allow access to AI on port 8080"
}
As stated in the introduction, this is for demo purposes only; do not use rules like these for serious or production usage.
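If you still want to experiment a little more safely, one simple improvement is to restrict port 8080 to your own public IP instead of the whole internet. A minimal sketch; the address below is a documentation placeholder that you would replace with your own /32:
# Example: restrict the app port to a single IP instead of 0.0.0.0/0
resource "aws_security_group_rule" "ingress_ai_frontend_restricted" {
 type              = "ingress"
 from_port         = 8080
 to_port           = 8080
 protocol          = "tcp"
 security_group_id = aws_security_group.gpu_instance_sg.id
 cidr_blocks       = ["203.0.113.10/32"] # replace with your own IP
 description       = "Allow access to AI on port 8080 from a single IP"
}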
Step 6: Initializing and Applying Terraform Configuration
At this point your directory should look like this:
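Assuming you used the file names suggested in the previous steps, it should contain:
main.tf
vpc.tf
gpu_instance.tf
user_data.sh
sg.tf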
Run the following commands to deploy infrastructure:
# Initialize Terraform in your project directory
terraform init
# Preview changes before applying them
terraform plan
# Apply configuration to create resources
terraform apply
# Get instance IP
terraform output gpu_instance_public_ip
Step 7: Accessing Your Instance
After deployment, retrieve the public IP of your instance from Terraform output or AWS Console.
It will probably take up to 20 minutes for everything to be ready.
Access the LocalAI instance in your browser at http://public-ipv4:8080 (replace public-ipv4 with the address from the previous step).
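LocalAI also exposes an OpenAI-compatible HTTP API on the same port, so you can query it from the command line as well. For example (replace public-ipv4 with your instance address; the model name is just an example, use one returned by /v1/models):
# List the models currently available on the instance
curl http://public-ipv4:8080/v1/models
# Send a chat completion request (model name is an example)
curl http://public-ipv4:8080/v1/chat/completions \
 -H "Content-Type: application/json" \
 -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello!"}]}'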
Step 8: About models
The Docker image used in this example comes with some preinstalled models. However, by clicking on "Models" in the top bar, you can explore and install any of the 877 available models.
For example:
Step 9: Use the models
Once the model(s) are installed, click on "Chat" in the top bar to start chatting with the AI, or generate some images by clicking on "Generate images".
Example: bird’s eye view: magnificent victorian city in a light mist, beside the sea, sunset; style: photorealistic | birds, bend walls
The resulting image is the output of the StableDiffusion model.
Using nvtop on the server, we can see the GPU and CPU usage required for image generation.
That's it—pretty easy!
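One last note: the g4dn.xlarge instance is billed for as long as it runs, so once you are done experimenting, tear the demo environment down:
# Remove all resources created by this configuration
terraform destroy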
Conclusion
Terraform simplifies AWS infrastructure provisioning for AI workloads by automating resource creation and configuration. However, in a production environment, more robust security practices should be followed, including stricter security groups, applying the principle of least privilege with IAM roles, and using HTTPS for secure communication. For scalable container management, Amazon ECS would be preferable over EC2 instances. For AI processing at scale, AWS SageMaker is ideal, and deploying a reverse proxy along with an Application Load Balancer would ensure efficient and secure communication.