Streaming ingest · vector search · grounded answers

Every log line, queryable in plain English.

logconsolidator ingests your log files in real time, persists them to PostgreSQL and a ChromaDB vector store, and lets you ask natural-language questions over the result — with Grafana dashboards and auto-generated daily reports out of the box.

~/logconsolidator — python3 main.py
$ python3 main.py
[ingest] watching 4 sources · queue=bounded(2048)
[parse] LogEntry pipeline ready
[postgres] connected · table=logs
[chroma] collection=entries · persistent
[api] uvicorn @ http://0.0.0.0:8000
[reports] daily summary scheduled · 00:05 UTC

curl -NG localhost:8000/query --data-urlencode 'q=what failed overnight?'
data: Three services reported sustained 5xx from 02:11 to 02:34 UTC.
data: Root cause was Postgres connection pool exhaustion in checkout-svc.

A bounded, back-pressured path from disk to insight.

Two queues, two output adapters, one process. Producers can never overwhelm consumers — by design.

01 · ingest
FileWatcher
Polling thread reads new lines from configured sources.
02 · queue
Raw lines
Bounded queue.Queue · natural backpressure.
03 · parse
LogEntry
Source ID, timestamp, raw, fields — into a dataclass.
04 · queue
Structured
Second bounded queue feeds parallel adapters.
05 · dispatch
Postgres + Chroma
Durable store of record · vector index for semantic search.
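
In miniature, the topology looks like this: two bounded queues whose put() blocks when full, so every stage inherits backpressure from the stage after it. The names below are an illustrative sketch, not the project's actual identifiers.

pipeline_sketch.py python
import queue
import threading
import time
from dataclasses import dataclass, field

@dataclass
class LogEntry:
    """Shape from step 03 above: source ID, timestamp, raw line, fields."""
    source_id: str
    timestamp: str
    raw: str
    fields: dict = field(default_factory=dict)

# Two bounded queues: put() blocks when full, so a fast producer
# stalls instead of exhausting memory. That is the backpressure.
raw_lines  = queue.Queue(maxsize=2048)   # watcher -> parser
structured = queue.Queue(maxsize=2048)   # parser  -> adapters

def parser_loop():
    """Drain raw lines and turn each one into a structured LogEntry."""
    while True:
        source_id, line = raw_lines.get()
        structured.put(LogEntry(source_id, "", line))  # real parser fills timestamp/fields

def dispatch_loop(writers):
    """Hand every structured entry to each output adapter in turn
    (simplified to a single dispatcher thread here; the real pipeline
    keeps one writer per store)."""
    while True:
        entry = structured.get()
        for write in writers:   # e.g. write_postgres, write_chroma
            write(entry)

threading.Thread(target=parser_loop, daemon=True).start()
threading.Thread(target=dispatch_loop, args=([print],), daemon=True).start()
raw_lines.put(("demo", "Sep  3 02:14:07 sshd[4122]: Failed password"))
time.sleep(0.2)  # let the daemon threads drain the demo line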

Operational logging and a knowledge base — same data, two surfaces.

You don't choose between dashboards and natural-language search. The pipeline writes both, and the API serves both.

Real-time ingest

A polling FileWatcher tails your sources and pushes lines into a bounded queue, so a burst from a noisy source blocks at the queue instead of exhausting memory and taking down the pipeline.
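
A minimal sketch of that polling loop, assuming a plain seek-and-read tail; the actual FileWatcher may differ in details such as rotation handling.

filewatcher_sketch.py python
import time

def tail(path: str, put, interval: float = 1.0):
    """Poll one source file and push each new line downstream.
    `put` is the bounded queue's put(): it blocks when the queue is
    full, which is exactly what stops a noisy source from flooding us.
    Log rotation and truncation handling are omitted here.
    """
    pos = 0
    while True:
        with open(path, errors="replace") as f:
            f.seek(pos)
            for line in f:
                put(line.rstrip("\n"))
            pos = f.tell()
        time.sleep(interval)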

PostgreSQL of record

Every structured LogEntry is persisted via psycopg v3 — durable, queryable, and ready for your existing SQL tooling.
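
A sketch of what that write might look like with psycopg v3. The column names are assumptions; only the table name "logs" appears in the startup log above.

postgres_sketch.py python
import psycopg
from psycopg.types.json import Json

# Column names are assumptions; check the project's actual schema.
INSERT = "INSERT INTO logs (source_id, ts, raw, fields) VALUES (%s, %s, %s, %s)"

def write_postgres(conn: psycopg.Connection, entry) -> None:
    """Persist one structured LogEntry; commit per write for durability."""
    with conn.cursor() as cur:
        cur.execute(INSERT, (entry.source_id, entry.timestamp,
                             entry.raw, Json(entry.fields)))
    conn.commit()

# usage: conn = psycopg.connect("postgresql://user:pass@localhost:5432/logs")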

ChromaDB vector store

Each entry is upserted as a vector document into a persistent local collection, enabling semantic similarity search out of the box.
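
Sketched with the chromadb client: the storage path and metadata keys are assumptions, while the collection name "entries" comes from the startup log above.

chroma_sketch.py python
import chromadb

# Persistent local store; the path is an assumption.
client = chromadb.PersistentClient(path="./chroma")
entries = client.get_or_create_collection("entries")

def write_chroma(entry, entry_id: str) -> None:
    """Upsert one entry as a vector document. With no embedding function
    configured, Chroma embeds the text with its default model."""
    entries.upsert(
        ids=[entry_id],
        documents=[entry.raw],
        metadatas=[{"source_id": entry.source_id}],
    )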

Grafana-ready

Postgres is wired directly as a Grafana data source — build dashboards and run ad-hoc SQL on live log data without touching the app.

RAG over your logs

Chroma retrieves the most relevant entries by semantic similarity, OpenAI streams a grounded answer back — all behind one SSE endpoint.
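
The retrieve, ground, stream loop, sketched with FastAPI, chromadb, and the openai SDK. The model name, prompt wording, and n_results are assumptions, not the project's settings.

rag_sketch.py python
import chromadb
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from openai import OpenAI

app = FastAPI()
oai = OpenAI()  # reads OPENAI_API_KEY from the environment
entries = chromadb.PersistentClient(path="./chroma").get_or_create_collection("entries")

@app.get("/query")
def query(q: str):
    # retrieve: nearest log entries by semantic similarity
    hits = entries.query(query_texts=[q], n_results=5)
    context = "\n".join(hits["documents"][0])

    def sse():
        # ground + stream: answer only from the retrieved entries,
        # forwarding each token as a Server-Sent Event
        stream = oai.chat.completions.create(
            model="gpt-4o-mini",  # model choice is an assumption
            messages=[
                {"role": "system",
                 "content": f"Answer using only these log entries:\n{context}"},
                {"role": "user", "content": q},
            ],
            stream=True,
        )
        for chunk in stream:
            delta = chunk.choices[0].delta.content
            if delta:
                yield f"data: {delta}\n\n"

    return StreamingResponse(sse(), media_type="text/event-stream")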

Daily summary reports

A background scheduler runs the same RAG engine over the last 24 hours of Postgres data and emits a Markdown report — no setup needed.
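
One way to schedule that with no extra dependencies, as a sketch: run_report is a stand-in for the RAG engine applied to the previous 24 hours of Postgres rows.

reporter_sketch.py python
import datetime as dt
import threading
import time

def daily_report_loop(run_report):
    """Sleep until 00:05 UTC (matching the startup log above),
    run the report over the previous 24 hours, repeat."""
    while True:
        now = dt.datetime.now(dt.timezone.utc)
        next_run = now.replace(hour=0, minute=5, second=0, microsecond=0)
        if next_run <= now:
            next_run += dt.timedelta(days=1)
        time.sleep((next_run - now).total_seconds())
        run_report(next_run - dt.timedelta(days=1), next_run)

threading.Thread(
    target=daily_report_loop,
    args=(lambda since, until: print(f"# Daily report {since:%Y-%m-%d}"),),
    daemon=True,
).start()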

One process. One entry point. Clean shutdown.

A single python3 main.py starts the ingest pipeline, the HTTP server, and the report scheduler, with unified signal handling so Ctrl-C drains everything cleanly; a sketch of the shutdown pattern follows the list below.

  • FastAPI + Uvicorn — serves /query as Server-Sent Events, plus /reports CRUD and the static front-end.
  • Two bounded queues — between watcher → parser and parser → adapters, giving you backpressure for free.
  • Parallel output adapters — Postgres and Chroma consume from the same dispatch queue; one writer per store.
  • Streaming answers — the OpenAI Chat Completions response streams token-by-token through SSE to the caller.
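
A sketch of that shutdown pattern: one handler for SIGINT and SIGTERM flips a shared event that every loop checks, so Ctrl-C becomes a drain rather than a crash. The workers below are stand-ins for the real pipeline, server, and scheduler threads.

shutdown_sketch.py python
import signal
import threading
import time

stop = threading.Event()

def worker(name: str):
    """Stand-in for an ingest / API / scheduler loop that polls stop."""
    while not stop.is_set():
        time.sleep(0.5)
    print(f"[{name}] drained, exiting")

def main():
    # One handler for both signals: flip the shared event and let
    # every loop finish its current item before exiting.
    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, lambda *_: stop.set())
    threads = [threading.Thread(target=worker, args=(n,))
               for n in ("ingest", "api", "reports")]
    for t in threads:
        t.start()
    stop.wait()          # park the main thread until Ctrl-C / SIGTERM
    for t in threads:
        t.join()         # wait for each loop to drain

if __name__ == "__main__":
    main()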
PG
PostgreSQL · psycopg v3
durable store · grafana data source
CH
ChromaDB
persistent local · semantic similarity
RAG
OpenAI Chat Completions
retrieve · ground · stream answer
GF
Grafana
dashboards · ad-hoc SQL · live data

Five minutes from clone to streaming answers.

Configure your sources, point at a Postgres instance, and run a single command. The HTTP API and the daily reporter are already wired up.

config/sources/ssh_auth.json json
{
  "id": "ssh_auth",
  "path": "/var/log/secure.log",
  "parser": {
    "type": "regex",
    "patterns": {
      "timestamp": "(\\w+\\s+\\d+\\s+\\d+:\\d+:\\d+)",
      "session_id": "\\[(\\d+)\\]",
      "port":      "port\\s+(\\d+)"
    }
  },
  "classify": [
    {
      "match":                "failed password",
      "service":              "sshd",
      "event_type":           "failed_login",
      "severity":             "medium",
      "is_security_relevant": true
    }
  ]
}
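
To make the config concrete, here is a sketch of how named patterns like these could be applied: each regex is tried against the raw line and its first capture group becomes a field. This is a hypothetical helper, not the project's parser.

parser_sketch.py python
import re

def apply_patterns(patterns: dict[str, str], line: str) -> dict[str, str]:
    """Run each named regex from a source config against a raw line;
    the first capture group becomes that field's value."""
    fields = {}
    for name, pattern in patterns.items():
        m = re.search(pattern, line)
        if m:
            fields[name] = m.group(1)
    return fields

line = "Sep  3 02:14:07 host sshd[4122]: Failed password for root port 22"
print(apply_patterns({
    "timestamp":  r"(\w+\s+\d+\s+\d+:\d+:\d+)",
    "session_id": r"\[(\d+)\]",
    "port":       r"port\s+(\d+)",
}, line))
# -> {'timestamp': 'Sep  3 02:14:07', 'session_id': '4122', 'port': '22'}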
terminal shell
# 1. install
pip install -r requirements.txt

# 2. run everything (ingest + API + scheduler)
python3 main.py

# 3. run only the RAG / HTTP API
uvicorn logconsolidator.api.server:create_app --factory \
        --host 0.0.0.0 --port 8000

# 4. run only the log formatter
python3 -m src.logconsolidator

Stop grepping. Start asking.

Run logconsolidator alongside your existing services and turn unstructured logs into a knowledge base your whole team can query.