A video transcoding system
I’ve previously built a system that solves this problem; you can look at the demo here:
https://vodio.it.kodify.live.
The problem statement is as follows:
Transcoding refers to the process of converting a given input video into different renditions—that is, different resolutions and bitrates (1080p, 720p, 360p, etc.). Essentially, I want to be able to transcode videos into different video qualities (1080p, 720p, 360p, and so on).
Generally speaking, this can be done pretty easily. You could just use a tool called ffmpeg on your own local machine, which lets you transcode a video into several renditions in a single command:
```bash
ffmpeg -i input.mp4 \
  -c:v libx264 -b:v 3000k -maxrate 3000k -bufsize 6000k -c:a aac -b:a 128k output_3000k.mp4 \
  -c:v libx264 -b:v 1500k -maxrate 1500k -bufsize 3000k -c:a aac -b:a 96k output_1500k.mp4 \
  -c:v libx264 -b:v 800k -maxrate 800k -bufsize 1600k -c:a aac -b:a 64k output_800k.mp4
```
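In practice you’d generate this command from a “rendition ladder” rather than typing it by hand. Here’s a minimal Python sketch that builds the same argv as the command above; the ladder values mirror it exactly, and the file paths are just examples:

```python
# Sketch: build the multi-rendition ffmpeg command from a rendition ladder.
# The values below mirror the hand-written command above.
RENDITIONS = [
    # (video bitrate, maxrate, bufsize, audio bitrate, output file)
    ("3000k", "3000k", "6000k", "128k", "output_3000k.mp4"),
    ("1500k", "1500k", "3000k", "96k",  "output_1500k.mp4"),
    ("800k",  "800k",  "1600k", "64k",  "output_800k.mp4"),
]

def build_ffmpeg_cmd(input_path: str) -> list[str]:
    """Return an ffmpeg argv that encodes every rendition in one pass."""
    cmd = ["ffmpeg", "-i", input_path]
    for v_bitrate, maxrate, bufsize, a_bitrate, out in RENDITIONS:
        cmd += [
            "-c:v", "libx264", "-b:v", v_bitrate,
            "-maxrate", maxrate, "-bufsize", bufsize,
            "-c:a", "aac", "-b:a", a_bitrate,
            out,
        ]
    return cmd

# To actually run it (requires ffmpeg on PATH and an input.mp4 present):
#   subprocess.run(build_ffmpeg_cmd("input.mp4"), check=True)
```

The nice part of generating the command is that adding or dropping a rendition is a one-line change to the ladder.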
This is intuitive. However, video platforms like YouTube and Netflix need to transcode many videos in parallel. How does that get handled?
Initial Naive Solution:
As upload requests come in, we could transcode the videos on the backend servers themselves.
Now while this sounds fine in theory, the reality is more chaotic. A few things immediately break down with this approach:
These transcoding operations are computationally expensive and long-running.
If multiple users upload videos at the same time, we’ll need to manage several processes in parallel.
If a single backend instance dies, it could take down several running transcodes with it.
Scaling this setup automatically is non-trivial. You’d have to configure autoscaling, and manage concurrency manually.
This approach quickly turns into a mess, and doesn’t scale gracefully under pressure.
The Better Approach: ECS-Based Transcoding System
To solve these problems, I built a cleaner, more robust solution using AWS ECS (Elastic Container Service).
Here’s how it works:
Each video upload spins up one ECS task per video. This means no shared memory, no race conditions, and total isolation.
If one container fails, it only affects that one video. No global failure cascading into other jobs.
Failed jobs can be restarted automatically, and health checks can route traffic away from unhealthy tasks.
Once the transcoding finishes, the ECS task shuts down cleanly—no lingering zombie processes.
And since ECS integrates with CloudWatch (AWS’s monitoring service), I get job-level metrics and logs out of the box.
Before I jump into the details of the solution I went with, a quick detour: let’s talk about containers.
If you haven’t used Docker before, think of it as a way to package your app and all its dependencies into a box that runs the same way everywhere. That box is called a container.
Now imagine if you could spin up one of these containers per video job, instead of running everything on a single server. That way, if one job crashes or slows down, it doesn’t affect the others. Each job runs in complete isolation. Clean. Predictable.
This is where AWS ECS (Elastic Container Service) comes in.
ECS is basically a managed service for running these containers. You build your app into an image (from a Dockerfile), describe how to run it in a Task Definition, and ECS takes care of scheduling and running the containers for you.
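To make that concrete, here’s a sketch of the kind of Dockerfile I mean: a small image with ffmpeg installed plus an entrypoint script. The base image and script name are illustrative, not the demo’s actual files:

```dockerfile
# Illustrative image: ffmpeg plus a transcode script (names are examples).
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y ffmpeg && rm -rf /var/lib/apt/lists/*
COPY transcode.sh /usr/local/bin/transcode.sh
# The ECS task runs this script once; when it exits, the container exits.
ENTRYPOINT ["/usr/local/bin/transcode.sh"]
```

The Task Definition then points at this image and specifies CPU, memory, and logging configuration.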
So that’s exactly what I ended up doing.
When someone uploads a video, I create a new ECS task, which runs the ffmpeg commands baked into the container image.
Each task only handles one video transcoding job. No shared memory, no side effects, no weird concurrency issues.
If the container fails mid-job, that’s fine—it only affects that one video. The others keep chugging along.
Once the transcoding is complete, the ECS task shuts down cleanly, with the whole lifecycle managed by AWS.
ECS can also stream logs into CloudWatch (AWS’s monitoring solution), giving job-level visibility.
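To show what the upload path looks like in code, here’s a minimal sketch of launching one ECS (Fargate) task per video with boto3. The cluster name, task definition, container name, and subnet below are placeholders, not the demo’s actual configuration:

```python
# Sketch: launch one ECS Fargate task per uploaded video.
# Cluster, task definition, container name, and subnet are placeholders.
def run_transcode_task_params(video_key: str) -> dict:
    """Build the parameters for a single per-video ECS run_task call."""
    return {
        "cluster": "transcode-cluster",         # placeholder
        "taskDefinition": "ffmpeg-transcoder",  # placeholder
        "launchType": "FARGATE",
        "count": 1,  # exactly one task per video job
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": ["subnet-0123456789abcdef0"],  # placeholder
                "assignPublicIp": "ENABLED",
            }
        },
        # Tell the container which video to fetch via an env var.
        "overrides": {
            "containerOverrides": [{
                "name": "transcoder",  # placeholder container name
                "environment": [{"name": "VIDEO_KEY", "value": video_key}],
            }]
        },
    }

# In the upload handler, you would then call something like:
#   import boto3
#   boto3.client("ecs").run_task(**run_transcode_task_params("uploads/cat.mp4"))
```

Because each call launches an independent task, parallel uploads simply become parallel tasks; there is no shared worker pool to manage.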
This ECS-based architecture addresses the crux of the problem:
Transcoding is slow and CPU-heavy
Each video job should be sandboxed from the rest
And the system should gracefully handle failures and scale when needed
By isolating each job into a container, ECS gave me that: clean job boundaries, automatic cleanup, and fail-safety.