Day 3: Creating a Simple Log Collector Service
254-Day Distributed Log Processing System Implementation
Week 1: Setting Up the Infrastructure
Introduction: What Are We Building Today?
Today, we're creating a log collector service that watches local log files and detects new entries. This service builds upon yesterday's log generator and represents a critical component in our distributed log processing system.
Imagine you're a detective monitoring surveillance cameras across a city. You can't watch all cameras simultaneously, so you need a system that automatically alerts you when something important happens. Our log collector works similarly – it continuously watches log files and notifies the system when new information appears.
Why This Matters in Distributed Systems
In real-world distributed systems, logs are the lifeline for understanding what's happening across multiple services. Companies like Netflix, Amazon, and Google collect billions of log entries daily to:
Identify system failures before they affect users
Track suspicious activities for security purposes
Monitor performance to prevent slowdowns
Troubleshoot issues when they occur
The log collector is often the first step in a log pipeline that eventually feeds data to dashboards, alerts, and analytics systems that engineers rely on daily.
Where This Fits in Our Overall System
Let's understand where our log collector fits:
Log Generator (Yesterday's component): Creates log entries
Log Collector (Today's component): Watches and captures new log entries
Log Processor (Future component): Analyzes and transforms logs
Log Storage (Future component): Saves processed logs
Log Query Engine (Future component): Allows searching logs
Our collector is the bridge between log generation and processing, ensuring no valuable information gets missed.
Building on Yesterday's Work
Yesterday, we built a log generator. Today's collector will watch those logs, detect new entries, and prepare them for the next stage of processing. This component acts as the "gathering" stage in our data pipeline, solving the problem of detecting and capturing constantly changing information.
Implementation: Building Our Log Collector
Let's create a service that:
Watches specified log files
Detects when new entries appear
Captures those entries for further processing
We'll use Python with the watchdog library to monitor file changes, keeping the service simple and readable.
Step 1: Set Up the Project Structure
First, let's organize our project:
mkdir log-collector
cd log-collector
touch log_collector.py
touch Dockerfile
touch docker-compose.yml
touch requirements.txt
mkdir sample_logs
touch sample_logs/app.log
Step 2: Install Dependencies
In the requirements.txt file, add:
watchdog==2.1.9
pyyaml==6.0
The watchdog library will help us monitor file changes efficiently.
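If you haven't used watchdog before, the core pattern is small: subclass FileSystemEventHandler and register it with an Observer that watches a directory. Here's a throwaway sketch to try it on its own (quick_watch.py is just an illustration, not one of the project files; it watches the sample_logs directory from Step 1):

# quick_watch.py - throwaway sketch showing the basic watchdog pattern
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class PrintChanges(FileSystemEventHandler):
    def on_modified(self, event):
        # Called whenever something inside the watched directory changes.
        print(f"Modified: {event.src_path}")

observer = Observer()
observer.schedule(PrintChanges(), path="sample_logs", recursive=False)
observer.start()
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    observer.stop()
observer.join()

Run it, append a line to sample_logs/app.log, and you should see a "Modified" message. Our collector builds on exactly this pattern.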
Step 3: Code the Log Collector
Let's implement our log collector in log_collector.py:
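Below is one possible minimal implementation, a sketch rather than a polished service. It follows the design described later in this lesson: a watchdog-based watcher, a per-file read offset so only new content is read, an in-memory buffer, and batched JSON writes. The --log-files and --output-dir flags and the batch file naming are assumptions chosen to match the run commands that follow.

# log_collector.py - a minimal sketch of the collector described in this lesson
import argparse
import json
import os
import time
from datetime import datetime, timezone

from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler


class LogFileHandler(FileSystemEventHandler):
    """Watches specific files and captures lines appended since the last read."""

    def __init__(self, log_files, output_dir, batch_size=10):
        self.log_files = {os.path.abspath(p) for p in log_files}
        self.output_dir = output_dir
        self.batch_size = batch_size
        self.buffer = []
        self.batch_count = 0
        # Remember how far into each file we have already read.
        self.offsets = {path: 0 for path in self.log_files}
        os.makedirs(output_dir, exist_ok=True)

    def on_modified(self, event):
        path = os.path.abspath(event.src_path)
        if event.is_directory or path not in self.log_files:
            return
        self.collect_new_lines(path)

    def collect_new_lines(self, path):
        # Read only the content added since the last offset we recorded.
        # (Log rotation/truncation isn't handled here - see today's assignment.)
        with open(path, "r") as f:
            f.seek(self.offsets[path])
            new_content = f.read()
            self.offsets[path] = f.tell()
        for line in new_content.splitlines():
            if line.strip():
                self.buffer.append({
                    "source": path,
                    "collected_at": datetime.now(timezone.utc).isoformat(),
                    "message": line,
                })
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # Write buffered entries as one structured JSON batch.
        if not self.buffer:
            return
        self.batch_count += 1
        out_path = os.path.join(self.output_dir, f"batch_{self.batch_count:05d}.json")
        with open(out_path, "w") as f:
            json.dump(self.buffer, f, indent=2)
        print(f"Wrote {len(self.buffer)} entries to {out_path}")
        self.buffer = []


def main():
    parser = argparse.ArgumentParser(description="Simple log collector service")
    parser.add_argument("--log-files", nargs="+", required=True, help="log files to watch")
    parser.add_argument("--output-dir", default="collected_logs", help="where to write batches")
    args = parser.parse_args()

    handler = LogFileHandler(args.log_files, args.output_dir)
    observer = Observer()
    # watchdog watches directories, so schedule the parent directory of each file.
    for directory in {os.path.dirname(os.path.abspath(p)) for p in args.log_files}:
        observer.schedule(handler, directory, recursive=False)
    observer.start()
    print(f"Watching {len(args.log_files)} file(s); press Ctrl+C to stop.")
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
    handler.flush()  # write anything still buffered before exiting


if __name__ == "__main__":
    main()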
(Diagram: Log Collector State Machine)
How to Run the Log Collector
Now that we understand the code, let's run our collector!
Local Setup
Install the required dependencies:
pip install -r requirements.txt
Create a directory for collected logs:
mkdir collected_logs
Run the collector (assuming you have a log file to watch):
python log_collector.py --log-files path/to/your/log_file.log
Docker Setup
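The Dockerfile we created in Step 1 is still empty. Here's a minimal sketch of its contents that works with the commands below; the base image version is an arbitrary choice:

# Dockerfile - minimal sketch; base image and layout are assumptions
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY log_collector.py .
ENTRYPOINT ["python", "log_collector.py"]

Using ENTRYPOINT means any arguments you pass to docker run (like --log-files below) are appended to the python command.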
Build the Docker image:
docker build -t log-collector .
Run the container:
docker run -v $(pwd)/logs:/app/logs -v $(pwd)/collected_logs:/app/collected_logs log-collector --log-files /app/logs/app.log
Docker Compose (With Log Generator)
If you've completed yesterday's log generator, you can run both services together:
docker-compose up --build
This will start both the log generator and collector services, with the collector watching the logs produced by the generator!
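The docker-compose.yml we touched in Step 1 also needs content for this to work. Here's one possible sketch; the generator's build path and how it writes into the shared logs directory depend on how you set up Day 2, so adjust as needed:

# docker-compose.yml - one possible sketch; the generator's build path and
# log file location are assumptions about yesterday's setup
services:
  log-generator:
    build: ../log-generator        # hypothetical path to yesterday's project
    volumes:
      - ./logs:/app/logs           # assumes the generator writes app.log here
  log-collector:
    build: .
    depends_on:
      - log-generator
    volumes:
      - ./logs:/app/logs
      - ./collected_logs:/app/collected_logs
    command: ["--log-files", "/app/logs/app.log"]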
How It Works: Understanding the Code
Our log collector consists of several key components:
File System Watcher: Using the watchdog library, we monitor specified files for changes.
Change Detection: When a file changes, we read only the new content since our last check.
Log Entry Processing: New content is parsed into structured log entries.
Buffering and Batch Processing: Entries are collected in a buffer, then written in batches for efficiency.
This approach is similar to tools like Fluentd, Logstash, and Filebeat, which are used in large-scale production systems.
Real-World Applications
Log collection is a critical component in many systems:
Web Services: Collecting and monitoring access logs and error logs
IoT Devices: Gathering sensor data from distributed devices
Security Systems: Detecting and analyzing unusual patterns
Application Monitoring: Tracking performance and errors
Companies like Netflix use distributed log collection to monitor tens of thousands of servers, helping them identify issues before they affect users.
Build This Today: Your Log Collection System
Assignment: Enhanced Log Collector
Task: Extend the basic log collector to handle multiple log formats and add basic filtering.
Steps:
Setup: Copy all files from today's lesson into a new directory.
Modify the Log Collector:
Add a simple regex-based filtering mechanism
Support at least two different log formats (e.g., plain text and JSON)
Implement a basic tagging system for categorizing log entries (a starter sketch follows the Testing step below)
Testing:
Create sample logs in different formats
Configure your collector to watch these logs
Verify entries are properly filtered and categorized
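If you're not sure where to start, here's a tiny sketch of regex-based filtering and tagging. The patterns and tag names are placeholders to adapt to your own log formats, not part of the lesson's solution:

# starter sketch for the assignment - patterns and tag names are placeholders
import re

EXCLUDE_PATTERNS = [re.compile(r"healthcheck", re.IGNORECASE)]
TAG_RULES = {
    "error": re.compile(r"ERROR|exception", re.IGNORECASE),
    "auth": re.compile(r"login|logout|authentication", re.IGNORECASE),
}

def filter_and_tag(message):
    """Return None to drop an entry, otherwise a list of tags for it."""
    if any(p.search(message) for p in EXCLUDE_PATTERNS):
        return None
    return [tag for tag, pattern in TAG_RULES.items() if pattern.search(message)]

# Example: filter_and_tag("User login failed: Exception") -> ["error", "auth"]

You would call filter_and_tag on each new line before adding it to the collector's buffer, dropping entries that return None and storing the returned tags alongside the entry.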
Expected Outcome:
A running log collector that can:
Monitor multiple files in real-time
Parse different log formats
Filter entries based on content
Tag entries for future processing
Write collected entries in a structured format
Advanced Challenge:
Add error reporting (e.g., count of parsing failures)
Implement log rotation detection
Add support for compressed log files
Success Criteria:
Your collector successfully detects new log entries in various formats
Entries matching your exclusion filters are correctly dropped
Entries are properly tagged
JSON output files contain well-structured data
Conclusion
Today, you've built a fundamental component of any distributed logging system – a real-time log collector. This service bridges the gap between log generation and processing, solving the critical problem of how to efficiently detect and capture constantly changing information.
In the next lesson, we'll explore how to process and analyze these collected logs, extracting valuable insights and patterns. By the end of this week, you'll have a complete end-to-end logging pipeline that demonstrates core distributed systems concepts!
Remember: Log collection might seem simple, but it's an essential foundation for more complex distributed applications. The principles you've learned today – monitoring for changes, efficient processing, and batched operations – apply broadly across distributed systems design.
Checking Docker Logs in Parallel
You can monitor logs from both the collector and generator containers in multiple ways:
Method 1: Using docker-compose logs
# View logs from all services together
docker-compose logs -f
# View logs from specific services
docker-compose logs -f log-collector log-generator
The -f flag follows the logs in real time.
Method 2: Split terminal view
Open two terminal windows/panes
In the first terminal:
docker-compose logs -f log-generator
In the second terminal:
docker-compose logs -f log-collector
Method 3: Docker Desktop
If using Docker Desktop:
Open Docker Desktop
Go to the "Containers" tab
Select your running compose project
You can view logs for each service in separate tabs
Method 4: Log into containers
# Open a shell in the collector container
docker-compose exec log-collector sh
# Then view logs directly
tail -f /app/logs/app.log
You can use the same approach for the generator container to see runtime details beyond what's sent to stdout.