feat(haproxy-logs): ingest HAProxy request logs into Richie DB

Add a pipeline to load HAProxy `option httplog` lines into the Richie
database so bot/crawler traffic can be analyzed.

- model: HaproxyRequest mirroring the httplog format, with a unique
  line_hash dedup key and indexes on common filter columns
- migration: create the haproxy_request table (unique line_hash + indexes)
- haproxy_logs package:
  - parser: httplog line -> columns, strips the journald prefix and
    hashes the normalized line
  - ingest: batched, idempotent insert that skips rows whose line_hash
    already exists, so re-ingesting the same logs is a no-op
  - cli: ingest-only `haproxy-logs` command reading stdin or a file
- tests: parsing of a real GPTBot line and idempotent re-ingestion
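
The dedup behavior described above can be sketched as follows. This is a minimal illustration, not the code in this commit: the prefix regex, the two-column table, the `line_hash`/`ingest` function names, and the use of SHA-256 with SQLite's `INSERT OR IGNORE` are all assumptions chosen to show how a unique hash of the normalized line makes re-ingestion a no-op.

```python
import hashlib
import re
import sqlite3

# Assumed journald/syslog prefix shape: "Jun 23 21:13:20 host haproxy[1234]: ".
# The real parser may normalize differently.
_PREFIX = re.compile(r"^\w{3}\s+\d+\s+[\d:]+\s+\S+\s+haproxy\[\d+\]:\s+")


def line_hash(raw):
    """Strip the (assumed) journald prefix and hash the normalized line."""
    normalized = _PREFIX.sub("", raw.strip())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return normalized, digest


def ingest(conn, lines):
    """Insert lines in one batch; rows whose line_hash exists are skipped.

    Returns the number of rows actually inserted.
    """
    # Simplified stand-in for the haproxy_request table (hypothetical columns).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS haproxy_request ("
        " line_hash TEXT NOT NULL UNIQUE,"
        " raw_line TEXT NOT NULL)"
    )
    rows = [(digest, normalized) for normalized, digest in map(line_hash, lines)]
    before = conn.total_changes
    # The UNIQUE constraint plus OR IGNORE drops duplicates silently.
    conn.executemany(
        "INSERT OR IGNORE INTO haproxy_request (line_hash, raw_line) VALUES (?, ?)",
        rows,
    )
    conn.commit()
    return conn.total_changes - before
```

Because the hash is taken after the prefix is stripped, the same request captured through journald and through a plain log file would map to one row under these assumptions.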
2026-06-23 21:13:20 -04:00
parent e1c4ae0d6e
commit 1d1bafbd30
7 changed files with 576 additions and 0 deletions
+1
@@ -0,0 +1 @@
"""Load HAProxy ``option httplog`` lines into SQLite and query them."""