Streaming ingest · vector search · grounded answers

Every log line, queryable in plain English.

logconsolidator ingests your log files in real time, persists them to PostgreSQL and a ChromaDB vector store, and lets you ask natural-language questions over the result — with Grafana dashboards and auto-generated daily reports out of the box.

~/logconsolidator — python3 main.py
$ python3 main.py
[ingest] watching 4 sources · queue=bounded(2048)
[parse] LogEntry pipeline ready
[postgres] connected · table=logs
[chroma] collection=entries · persistent
[api] uvicorn @ http://0.0.0.0:8000
[reports] daily summary scheduled · 00:05 UTC

curl -NG localhost:8000/query --data-urlencode 'q=what failed overnight?'
data: Three services reported sustained 5xx from 02:11 to 02:34 UTC.
data: Root cause was Postgres connection pool exhaustion in checkout-svc.

A bounded, back-pressured path from disk to insight.

Two queues, two output adapters, one process. Producers can never overwhelm consumers — by design.

01 · ingest
FileWatcher
Polling thread reads new lines from configured sources.
02 · queue
Raw lines
Bounded queue.Queue · natural backpressure.
03 · parse
LogEntry
Source ID, timestamp, raw, fields — into a dataclass.
04 · queue
Structured
Second bounded queue feeds parallel adapters.
05 · dispatch
Postgres + Chroma
Durable store of record · vector index for semantic search.
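
In miniature, the topology looks like this: two bounded queues whose put() blocks when full, so every stage inherits backpressure from the stage after it. The names below are an illustrative sketch, not the project's actual identifiers.

pipeline_sketch.py python
import queue
import threading
import time
from dataclasses import dataclass, field

@dataclass
class LogEntry:
    """Shape from step 03 above: source ID, timestamp, raw line, fields."""
    source_id: str
    timestamp: str
    raw: str
    fields: dict = field(default_factory=dict)

# Two bounded queues: put() blocks when full, so a fast producer
# stalls instead of exhausting memory. That is the backpressure.
raw_lines  = queue.Queue(maxsize=2048)   # watcher -> parser
structured = queue.Queue(maxsize=2048)   # parser  -> adapters

def parser_loop():
    """Drain raw lines and turn each one into a structured LogEntry."""
    while True:
        source_id, line = raw_lines.get()
        structured.put(LogEntry(source_id, "", line))  # real parser fills timestamp/fields

def dispatch_loop(writers):
    """Hand every structured entry to each output adapter in turn
    (simplified to a single dispatcher thread here; the real pipeline
    keeps one writer per store)."""
    while True:
        entry = structured.get()
        for write in writers:   # e.g. write_postgres, write_chroma
            write(entry)

threading.Thread(target=parser_loop, daemon=True).start()
threading.Thread(target=dispatch_loop, args=([print],), daemon=True).start()
raw_lines.put(("demo", "Sep  3 02:14:07 sshd[4122]: Failed password"))
time.sleep(0.2)  # let the daemon threads drain the demo line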

Operational logging and a knowledge base — same data, two surfaces.

You don't choose between dashboards and natural-language search. The pipeline writes both, and the API serves both.

Real-time ingest

A polling FileWatcher tails your sources and pushes lines into a bounded queue, so a burst from a noisy source blocks at the queue instead of exhausting memory and taking down the pipeline.
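
A minimal sketch of that polling loop, assuming a plain seek-and-read tail; the actual FileWatcher may differ in details such as rotation handling.

filewatcher_sketch.py python
import time

def tail(path: str, put, interval: float = 1.0):
    """Poll one source file and push each new line downstream.
    `put` is the bounded queue's put(): it blocks when the queue is
    full, which is exactly what stops a noisy source from flooding us.
    Log rotation and truncation handling are omitted here.
    """
    pos = 0
    while True:
        with open(path, errors="replace") as f:
            f.seek(pos)
            for line in f:
                put(line.rstrip("\n"))
            pos = f.tell()
        time.sleep(interval)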

PostgreSQL of record

Every structured LogEntry is persisted via psycopg v3 — durable, queryable, and ready for your existing SQL tooling.
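
A sketch of what that write might look like with psycopg v3. The column names are assumptions; only the table name "logs" appears in the startup log above.

postgres_sketch.py python
import psycopg
from psycopg.types.json import Json

# Column names are assumptions; check the project's actual schema.
INSERT = "INSERT INTO logs (source_id, ts, raw, fields) VALUES (%s, %s, %s, %s)"

def write_postgres(conn: psycopg.Connection, entry) -> None:
    """Persist one structured LogEntry; commit per write for durability."""
    with conn.cursor() as cur:
        cur.execute(INSERT, (entry.source_id, entry.timestamp,
                             entry.raw, Json(entry.fields)))
    conn.commit()

# usage: conn = psycopg.connect("postgresql://user:pass@localhost:5432/logs")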

ChromaDB vector store

Each entry is upserted as a vector document into a persistent local collection, enabling semantic similarity search out of the box.
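
Sketched with the chromadb client: the storage path and metadata keys are assumptions, while the collection name "entries" comes from the startup log above.

chroma_sketch.py python
import chromadb

# Persistent local store; the path is an assumption.
client = chromadb.PersistentClient(path="./chroma")
entries = client.get_or_create_collection("entries")

def write_chroma(entry, entry_id: str) -> None:
    """Upsert one entry as a vector document. With no embedding function
    configured, Chroma embeds the text with its default model."""
    entries.upsert(
        ids=[entry_id],
        documents=[entry.raw],
        metadatas=[{"source_id": entry.source_id}],
    )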

Grafana-ready

Postgres is wired directly as a Grafana data source — build dashboards and run ad-hoc SQL on live log data without touching the app.

RAG over your logs

Chroma retrieves the most relevant entries by semantic similarity, OpenAI streams a grounded answer back — all behind one SSE endpoint.
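
The retrieve, ground, stream loop, sketched with FastAPI, chromadb, and the openai SDK. The model name, prompt wording, and n_results are assumptions, not the project's settings.

rag_sketch.py python
import chromadb
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from openai import OpenAI

app = FastAPI()
oai = OpenAI()  # reads OPENAI_API_KEY from the environment
entries = chromadb.PersistentClient(path="./chroma").get_or_create_collection("entries")

@app.get("/query")
def query(q: str):
    # retrieve: nearest log entries by semantic similarity
    hits = entries.query(query_texts=[q], n_results=5)
    context = "\n".join(hits["documents"][0])

    def sse():
        # ground + stream: answer only from the retrieved entries,
        # forwarding each token as a Server-Sent Event
        stream = oai.chat.completions.create(
            model="gpt-4o-mini",  # model choice is an assumption
            messages=[
                {"role": "system",
                 "content": f"Answer using only these log entries:\n{context}"},
                {"role": "user", "content": q},
            ],
            stream=True,
        )
        for chunk in stream:
            delta = chunk.choices[0].delta.content
            if delta:
                yield f"data: {delta}\n\n"

    return StreamingResponse(sse(), media_type="text/event-stream")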

Daily summary reports

A background scheduler runs the same RAG engine over the last 24 hours of Postgres data and emits a Markdown report — no setup needed.
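
One way to schedule that with no extra dependencies, as a sketch: run_report is a stand-in for the RAG engine applied to the previous 24 hours of Postgres rows.

reporter_sketch.py python
import datetime as dt
import threading
import time

def daily_report_loop(run_report):
    """Sleep until 00:05 UTC (matching the startup log above),
    run the report over the previous 24 hours, repeat."""
    while True:
        now = dt.datetime.now(dt.timezone.utc)
        next_run = now.replace(hour=0, minute=5, second=0, microsecond=0)
        if next_run <= now:
            next_run += dt.timedelta(days=1)
        time.sleep((next_run - now).total_seconds())
        run_report(next_run - dt.timedelta(days=1), next_run)

threading.Thread(
    target=daily_report_loop,
    args=(lambda since, until: print(f"# Daily report {since:%Y-%m-%d}"),),
    daemon=True,
).start()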

One process. One entry point. Clean shutdown.

A single python3 main.py starts the ingest pipeline, the HTTP server, and the report scheduler, with unified signal handling so Ctrl-C drains everything cleanly; a sketch of the shutdown pattern follows the list below.

  • FastAPI + Uvicorn — serves /query as Server-Sent Events, plus /reports CRUD and the static front-end.
  • Two bounded queues — between watcher → parser and parser → adapters, giving you backpressure for free.
  • Parallel output adapters — Postgres and Chroma consume from the same dispatch queue; one writer per store.
  • Streaming answers — the OpenAI Chat Completions response streams token-by-token through SSE to the caller.
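
A sketch of that shutdown pattern: one handler for SIGINT and SIGTERM flips a shared event that every loop checks, so Ctrl-C becomes a drain rather than a crash. The workers below are stand-ins for the real pipeline, server, and scheduler threads.

shutdown_sketch.py python
import signal
import threading
import time

stop = threading.Event()

def worker(name: str):
    """Stand-in for an ingest / API / scheduler loop that polls stop."""
    while not stop.is_set():
        time.sleep(0.5)
    print(f"[{name}] drained, exiting")

def main():
    # One handler for both signals: flip the shared event and let
    # every loop finish its current item before exiting.
    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, lambda *_: stop.set())
    threads = [threading.Thread(target=worker, args=(n,))
               for n in ("ingest", "api", "reports")]
    for t in threads:
        t.start()
    stop.wait()          # park the main thread until Ctrl-C / SIGTERM
    for t in threads:
        t.join()         # wait for each loop to drain

if __name__ == "__main__":
    main()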
PG
PostgreSQL · psycopg v3
durable store · grafana data source
CH
ChromaDB
persistent local · semantic similarity
RAG
OpenAI Chat Completions
retrieve · ground · stream answer
GF
Grafana
dashboards · ad-hoc SQL · live data

Five minutes from clone to streaming answers.

Configure your sources, point at a Postgres instance, and run a single command. The HTTP API and the daily reporter are already wired up.

config/sources/ssh_auth.json json
{
  "id": "ssh_auth",
  "path": "/var/log/secure.log",
  "parser": {
    "type": "regex",
    "patterns": {
      "timestamp": "(\\w+\\s+\\d+\\s+\\d+:\\d+:\\d+)",
      "session_id": "\\[(\\d+)\\]",
      "port":      "port\\s+(\\d+)"
    }
  },
  "classify": [
    {
      "match":                "failed password",
      "service":              "sshd",
      "event_type":           "failed_login",
      "severity":             "medium",
      "is_security_relevant": true
    }
  ]
}
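
To make the config concrete, here is a sketch of how named patterns like these could be applied: each regex is tried against the raw line and its first capture group becomes a field. This is a hypothetical helper, not the project's parser.

parser_sketch.py python
import re

def apply_patterns(patterns: dict[str, str], line: str) -> dict[str, str]:
    """Run each named regex from a source config against a raw line;
    the first capture group becomes that field's value."""
    fields = {}
    for name, pattern in patterns.items():
        m = re.search(pattern, line)
        if m:
            fields[name] = m.group(1)
    return fields

line = "Sep  3 02:14:07 host sshd[4122]: Failed password for root port 22"
print(apply_patterns({
    "timestamp":  r"(\w+\s+\d+\s+\d+:\d+:\d+)",
    "session_id": r"\[(\d+)\]",
    "port":       r"port\s+(\d+)",
}, line))
# -> {'timestamp': 'Sep  3 02:14:07', 'session_id': '4122', 'port': '22'}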
terminal shell
# 1. install
pip install -r requirements.txt

# 2. run everything (ingest + API + scheduler)
python3 main.py

# 3. run only the RAG / HTTP API
uvicorn logconsolidator.api.server:create_app --factory \
        --host 0.0.0.0 --port 8000

# 4. run only the log formatter
python3 -m src.logconsolidator

Stop grepping. Start asking.

Run logconsolidator alongside your existing services and turn unstructured logs into a knowledge base your whole team can query.