
Prototype AI Infrastructure on AWS with Terraform: A Step-by-Step Guide

Tony Mostovac, Mid Software Developer
12.05.2025.

In this blog, we will explore how to set up a simple prototype AWS infrastructure for running AI workloads using Terraform.

This guide will walk you through creating a GPU-enabled EC2 instance and other related AWS resources.

We will be using LocalAI (https://localai.io/), an open-source alternative to OpenAI.

Please keep in mind that this setup is for demo purposes only. Do not use it in production environments.

Why Terraform for AWS?

Terraform is a powerful Infrastructure as Code (IaC) tool that simplifies provisioning and managing cloud resources. When paired with AWS, Terraform enables developers to automate infrastructure deployment, ensuring consistency and scalability. This approach is particularly valuable for AI workloads, which often require GPU instances and precise configurations to handle computationally intensive tasks.

Prerequisites

- AWS Account: With permissions to create GPU-enabled EC2 instances, security groups, VPCs, and subnets.

- Terraform: Installed on your local machine.

- AWS CLI: Installed and configured with appropriate credentials.
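To quickly confirm that the tooling is in place, you can run a couple of checks (these are standard Terraform and AWS CLI commands):

# Verify Terraform is installed
terraform version

# Verify the AWS CLI can authenticate with your credentials
aws sts get-caller-identity

# If credentials are not configured yet, set them up interactively
aws configure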

Step 1: Setting Up Terraform Provider

Create a directory for your Terraform project and initialize a configuration file named main.tf.

Define the AWS provider:

# Configure the required providers
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

# Configure the AWS provider
provider "aws" {
  region = "eu-central-1"
}

This configuration specifies the AWS provider and sets the region where resources will be created.

Step 2: Setting Up VPC

We need to create a Virtual Private Cloud (VPC). A VPC allows us to define our own network space, including subnets, routing, and internet connectivity. In this step, we will set up a VPC with a subnet, route table, and an internet gateway to enable external access.

Create a new .tf file, for example, vpc.tf, and insert the following configuration:

# Create a VPC
resource "aws_vpc" "vpc" {
  cidr_block                       = "172.31.0.0/16"
  enable_dns_hostnames             = true
  enable_dns_support               = true
  assign_generated_ipv6_cidr_block = true

  tags = {
    Name = "vpc"
  }
}

# Create a Subnet
resource "aws_subnet" "private_1a" {
  vpc_id            = aws_vpc.vpc.id
  cidr_block        = "172.31.16.0/20"
  availability_zone = "eu-central-1a"

  tags = {
    Name = "1a-subnet"
  }
}

# Create a Route table
resource "aws_route_table" "route_table" {
  vpc_id = aws_vpc.vpc.id

  tags = {
    Name = "route-table"
  }
}

# Create a Route table association, bind route table with subnet
resource "aws_route_table_association" "private_1a" {
  subnet_id      = aws_subnet.private_1a.id
  route_table_id = aws_route_table.route_table.id
}

# Create an Internet Gateway
resource "aws_internet_gateway" "internet_gateway" {
  vpc_id = aws_vpc.vpc.id

  tags = {
    Name = "internet-gateway"
  }
}

# Associate the Internet Gateway with the custom route table
resource "aws_route" "ipv4" {
  route_table_id         = aws_route_table.route_table.id
  destination_cidr_block = "0.0.0.0/0" # This represents all internet traffic
  gateway_id             = aws_internet_gateway.internet_gateway.id
}

Step 3: Defining EC2 Instance with GPU

AI workloads often require GPU-enabled instances. For this guide, we will use g4dn.xlarge, one of the cheapest GPU-enabled instance types, which is optimized for AI and machine learning workloads.

Create a new .tf file, for example, gpu_instance.tf and insert the following:

resource "aws_instance" "gpu_instance" {

  ami                         = "ami-03250b0e01c28d196"

  instance_type               = "g4dn.xlarge"

  subnet_id                   = aws_subnet.private_1a.id

  vpc_security_group_ids      = [aws_security_group.gpu_instance_sg.id]

  associate_public_ip_address = true

  root_block_device {

    volume_size           = 100

    delete_on_termination = true

  }

  user_data = file("user_data.sh")

  tags = {

    Name = "gpu_instance"

  }

}

output "gpu_instance_public_ip" {

  value = aws_instance.gpu_instance.public_ip

}
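Note that the AMI ID above is region-specific and will become outdated over time. Because the user data script in the next step relies on apt-get and the ubuntu user, an Ubuntu AMI is assumed; you can look up a current Ubuntu 22.04 AMI for your region with the AWS CLI, for example:

# Find the most recent official Ubuntu 22.04 AMI (099720109477 is Canonical's account ID)
aws ec2 describe-images \
  --owners 099720109477 \
  --filters "Name=name,Values=ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*" \
  --query "sort_by(Images, &CreationDate)[-1].ImageId" \
  --region eu-central-1 \
  --output text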

Step 4: Creating User Data Script

The user_data.sh script automates instance setup by installing dependencies, setting up the NVIDIA drivers and Container Toolkit, and starting the LocalAI container, which downloads its preconfigured models on first start.

Create a file named user_data.sh:

#!/bin/bash

# Update and upgrade system packages
apt-get update -y
apt-get upgrade -y

# Install necessary dependencies: Docker, Git, and build-essential
apt-get install -y docker.io docker-compose-v2 git git-lfs build-essential

# Install NVIDIA drivers
apt-get install -y nvidia-driver-510 nvidia-dkms-510

# Install NVIDIA Container Toolkit for GPU support
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install the NVIDIA container runtime and configure it for Docker
apt-get update -y
apt-get install -y nvidia-container-toolkit
nvidia-ctk runtime configure --runtime=docker

# Restart Docker to apply changes
systemctl restart docker

# Run the LocalAI Docker container with GPU support
mkdir -p /home/ubuntu/models

echo 'services:
  local-ai:
    container_name: local-ai
    image: localai/localai:latest-aio-gpu-nvidia-cuda-12
    volumes:
      - ./models:/build/models
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    ports:
      - 8080:8080
    restart: unless-stopped' > /home/ubuntu/docker-compose.yml

docker compose -f /home/ubuntu/docker-compose.yml up -d

This script does the following:

- Updates and upgrades system packages.

- Installs Docker, Git, and build-essential.

- Installs the NVIDIA drivers.

- Installs and configures the NVIDIA Container Toolkit.

- Runs the LocalAI container with GPU support.
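Once the instance has booted and the script has finished, you can connect to it (for example via EC2 Instance Connect) and verify the GPU setup; the commands below are a quick sanity check rather than a required step of this guide:

# Confirm the NVIDIA driver sees the GPU
nvidia-smi

# Confirm the LocalAI container is running and follow its startup logs
sudo docker ps
sudo docker logs -f local-ai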

Step 5: Configuring Security Group

We will enable SSH access from AWS (for the eu-central-1 region) and allow access to the app on port 8080. Define the security group in a file, e.g., sg.tf.

# Security group for the GPU instance
resource "aws_security_group" "gpu_instance_sg" {
  vpc_id      = aws_vpc.vpc.id
  name        = "gpu-instance-sg"
  description = "Security group for GPU Instance"
}

# Egress rule: Allow outbound HTTPS traffic
resource "aws_security_group_rule" "egress_https" {
  type              = "egress"
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  security_group_id = aws_security_group.gpu_instance_sg.id
  cidr_blocks       = ["0.0.0.0/0"]
  description       = "Allow outbound HTTPS traffic"
}

# Egress rule: Allow outbound HTTP traffic
resource "aws_security_group_rule" "egress_http" {
  type              = "egress"
  from_port         = 80
  to_port           = 80
  protocol          = "tcp"
  security_group_id = aws_security_group.gpu_instance_sg.id
  cidr_blocks       = ["0.0.0.0/0"]
  description       = "Allow outbound HTTP traffic"
}

# Ingress rule: Allow SSH access from AWS EC2 Instance Connect (eu-central-1)
resource "aws_security_group_rule" "ingress_ssh_access" {
  type              = "ingress"
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  security_group_id = aws_security_group.gpu_instance_sg.id
  cidr_blocks       = ["3.120.181.40/29"] # Adjust IP range as needed
  description       = "Allow SSH access from AWS EC2 Instance Connect (eu-central-1)"
}

# Ingress rule: Allow access to AI Frontend on port 8080
resource "aws_security_group_rule" "ingress_ai_frontend" {
  type              = "ingress"
  from_port         = 8080
  to_port           = 8080
  protocol          = "tcp"
  security_group_id = aws_security_group.gpu_instance_sg.id
  cidr_blocks       = ["0.0.0.0/0"]
  description       = "Allow access to AI on port 8080"
}

As stated in the introduction, this configuration is for demonstration purposes only; do not use such permissive rules in a production environment.
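If you want to at least limit access to your own machine while testing, you can look up your current public IP and use it (as a /32) in the cidr_blocks of the SSH and port 8080 ingress rules:

# Print your current public IP address
curl -s https://checkip.amazonaws.com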

Step 6: Initializing and Applying Terraform Configuration

At this point, your project directory should contain the following files:

- main.tf

- vpc.tf

- gpu_instance.tf

- sg.tf

- user_data.sh

Run the following commands to deploy infrastructure:

# Initialize Terraform in your project directory
terraform init

# Preview changes before applying them
terraform plan

# Apply configuration to create resources
terraform apply

# Get instance IP
terraform output gpu_instance_public_ip
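Keep in mind that a g4dn.xlarge instance is billed while it runs, so once you are done experimenting, tear the demo environment down:

# Destroy all resources created by this configuration to avoid ongoing charges
terraform destroy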

Step 7: Accessing Your Instance

After deployment, retrieve the public IP of your instance from the Terraform output or the AWS Console.

It will probably take up to 20 minutes for everything to be ready.
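If you want to follow the provisioning progress, connect to the instance (for example via EC2 Instance Connect) and watch cloud-init, which executes the user data script:

# Follow the output of the user data script
tail -f /var/log/cloud-init-output.log

# Or block until cloud-init reports that it has finished
cloud-init status --wait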

Access the LocalAI instance in your browser at http://<public-ipv4>:8080.
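LocalAI also exposes an OpenAI-compatible API on the same port, so a quick way to check that it is up is to list the available models:

# List the models LocalAI currently serves
curl http://<public-ipv4>:8080/v1/models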


Step 8: About models

The Docker image used in this example comes with some preinstalled models. However, by clicking on "Models" in the top bar, you can explore and install any of the 877 available models.


Step 9: Use the models

Once the model(s) are installed, click on "Chat" in the top bar to start chatting with the AI, or generate images by clicking on "Generate images".

Example: bird’s eye view: magnificent victorian city in a light mist, beside the sea, sunset; style: photorealistic | birds, bend walls


The resulting image is the output of the StableDiffusion model.
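Besides the web UI, the installed models can also be called programmatically through LocalAI's OpenAI-compatible API. A minimal sketch of a chat request is shown below; the model name depends on what you have installed (the AIO image maps common OpenAI names such as gpt-4 to bundled local models):

# Send a chat completion request to the LocalAI instance
curl http://<public-ipv4>:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello from the GPU instance!"}]
  }'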

Using nvtop on the server, we can see the GPU and CPU usage required for image generation.
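nvtop is not installed by the user data script, but it is available in the Ubuntu repositories and can be added directly on the instance:

# Install and start nvtop to monitor GPU utilization
sudo apt-get install -y nvtop
nvtop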


That's it—pretty easy!

Conclusion

Terraform simplifies AWS infrastructure provisioning for AI workloads by automating resource creation and configuration. However, in a production environment, more robust security practices should be followed, including stricter security groups, applying the principle of least privilege with IAM roles, and using HTTPS for secure communication. For scalable container management, Amazon ECS would be preferable over EC2 instances. For AI processing at scale, AWS SageMaker is ideal, and deploying a reverse proxy along with an Application Load Balancer would ensure efficient and secure communication.
