System Design Course

Day 3: Creating a Simple Log Collector Service

254-Day Distributed Log Processing System Implementation

System Design Course · May 15, 2025

Week 1: Setting Up the Infrastructure

Introduction: What Are We Building Today?

Today, we're creating a log collector service that watches local log files and detects new entries. This service builds upon yesterday's log generator and represents a critical component in our distributed log processing system.

Imagine you're a detective monitoring surveillance cameras across a city. You can't watch all cameras simultaneously, so you need a system that automatically alerts you when something important happens. Our log collector works similarly – it continuously watches log files and notifies the system when new information appears.

Why This Matters in Distributed Systems

In real-world distributed systems, logs are the lifeline for understanding what's happening across multiple services. Companies like Netflix, Amazon, and Google collect billions of log entries daily to:

  • Identify system failures before they affect users

  • Track suspicious activities for security purposes

  • Monitor performance to prevent slowdowns

  • Troubleshoot issues when they occur

The log collector is often the first step in a log pipeline that eventually feeds data to dashboards, alerts, and analytics systems that engineers rely on daily.

Where This Fits in Our Overall System

Let's understand where our log collector fits:

  1. Log Generator (Yesterday's component): Creates log entries

  2. Log Collector (Today's component): Watches and captures new log entries

  3. Log Processor (Future component): Analyzes and transforms logs

  4. Log Storage (Future component): Saves processed logs

  5. Log Query Engine (Future component): Allows searching logs

Our collector is the bridge between log generation and processing, ensuring no valuable information gets missed.

Yesterday, we built a log generator. Today's collector will watch those logs, detect new entries, and prepare them for the next stage of processing. Acting as the "gathering" stage of our data pipeline, it solves the problem of detecting and capturing constantly changing information.

Implementation: Building Our Log Collector

Let's create a service that:

  1. Watches specified log files

  2. Detects when new entries appear

  3. Captures those entries for further processing

We'll use Python for its simplicity and readability, together with the watchdog library to monitor file changes.

Step 1: Set Up the Project Structure

First, let's organize our project:

mkdir log-collector
cd log-collector
touch log_collector.py
touch Dockerfile
touch docker-compose.yml
touch requirements.txt
mkdir sample_logs
touch sample_logs/app.log

Step 2: Install Dependencies

In the requirements.txt file, add:

watchdog==2.1.9
pyyaml==6.0

The watchdog library will help us monitor file changes efficiently.

Step 3: Code the Log Collector

Let's implement our log collector in log_collector.py:
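As a rough sketch of the approach before we dive into the full implementation: the collector tracks a byte offset per file and reads only what was appended since the last change event. The names LogTailer and run_collector below are illustrative, and the wiring assumes watchdog's standard Observer/FileSystemEventHandler API:

```python
import os


class LogTailer:
    """Remembers a byte offset into a file and returns only newly appended lines."""

    def __init__(self, filepath, from_start=True):
        self.filepath = filepath
        # Start at the beginning to pick up existing entries, or at the
        # current end of file to collect only future ones.
        self.offset = 0 if from_start else os.path.getsize(filepath)

    def read_new_lines(self):
        """Return the lines added since the previous call."""
        with open(self.filepath, "r") as f:
            f.seek(self.offset)
            lines = f.readlines()
            self.offset = f.tell()
        return [line.rstrip("\n") for line in lines]


def run_collector(watch_dir="sample_logs", log_file="sample_logs/app.log"):
    """Wire the tailer to watchdog so appends are collected as they happen.

    Requires `pip install watchdog`; this wiring is a sketch, not the
    final implementation.
    """
    import time
    from watchdog.observers import Observer
    from watchdog.events import FileSystemEventHandler

    tailer = LogTailer(log_file)

    class LogFileHandler(FileSystemEventHandler):
        def on_modified(self, event):
            # Only react to changes to the file we are tailing.
            if os.path.abspath(event.src_path) == os.path.abspath(log_file):
                for line in tailer.read_new_lines():
                    print(f"Collected: {line}")  # later: forward to the processor

    observer = Observer()
    observer.schedule(LogFileHandler(), path=watch_dir, recursive=False)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
```

Because LogTailer remembers where it stopped reading, each appended line is captured exactly once, even if several writes happen between change events.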
