LogicMonitor OTLP Data Pipeline

Async data pipeline for ingesting and storing OTLP formatted JSON metrics from LogicMonitor Data Publisher (HTTPS) into Azure Data Lake Gen2 as Parquet. Serves as the data ingestion layer for the Precursor predictive ML ecosystem.

Ecosystem Overview

HttpIngest is the ingestion layer of a multi-tier ML ecosystem for predictive monitoring. It receives metrics and writes Parquet to ADLS. Downstream consumers read ADLS directly.

                         DATA LAYER (this project)
+-------------------------------------------------------------------------+
|                                                                         |
|   LogicMonitor         HttpIngest              Azure Data Lake Gen2     |
|   Collectors    --->   (Container App)   --->  (Parquet files)          |
|   (OTLP metrics)       /api/HttpIngest         stlmingestdatalake       |
|                                                                         |
+-------------------------------------------------------------------------+
                                   |
                      Direct ADLS reads (DuckDB, gsutil)
                                   |
              +--------------------+--------------------+
              v                                         v
+---------------------------+            +---------------------------+
|       ML LAYER            |            |    QUANTUM + PDP LAYER    |
|                           |            |                           |
|  Precursor (GCP Cloud Run)|            |  quantum_mcp              |
|  - X-DEC, Chronos, TTM   |            |  - QUBO optimization      |
|  - Training via Vertex AI |            |  - D-Wave, Qiskit, IBM    |
|  - Reads GCS Parquet      |            |  - Reads PDP tracker      |
+---------------------------+            +---------------------------+

Architecture

Current Version: v53 (ADLS-only mode, deployed 2026-03-31)

Components:

Azure Container Apps (async Python FastAPI, 0-3 replicas, scale-to-zero)
Azure Data Lake Gen2 (Parquet files, Hive-partitioned by time)
Azure Managed Identity (passwordless auth)

Data Flow:

LogicMonitor Collector 117 --> Data Publisher (OTLP, gzip)
    --> Container App (/api/HttpIngest)
    --> Async buffer (600s interval or 50K rows)
    --> ADLS Parquet (year/month/day/hour partitions)
    --> DuckDB direct reads (weekly training cron)
    --> gsutil cp to GCS (for Vertex AI training)

Key Features:

Async processing with buffered writes (non-blocking ADLS uploads via asyncio.to_thread)
Hive-partitioned Parquet files (year/month/day/hour)
Scale-to-zero (0 replicas when idle, scales to 3 under load)
Managed identity authentication (no password storage)
Gzip decompression on ingest
Prometheus /metrics endpoint

Removed in v53 (Session 28 cleanup, -12,100 lines):

Azure Synapse Serverless SQL (workspace deleted, never used for training)
ML query endpoints (/api/ml/*) -- training reads ADLS via DuckDB directly
PostgreSQL hot cache
Export format handlers (Grafana, PowerBI, CSV, JSON)
ODBC drivers from Docker image (~200MB savings)

Prerequisites

Azure subscription with Container Apps and Data Lake Gen2
LogicMonitor account with Collector HTTPS Publisher enabled
Azure CLI (az) version 2.50+
Python 3.12+ with uv package manager

Quick Start

# Clone repository
git clone https://github.com/ryanmat/HttpIngest.git
cd HttpIngest

# Install dependencies
uv sync

# Run locally (requires Azure credentials)
uv run python -m uvicorn containerapp_main:app --reload --host 0.0.0.0 --port 8000

Environment Variables

Required (set on Container App):

USE_MANAGED_IDENTITY=true           # Use Azure managed identity
ENABLE_COLLECTOR_PUBLISHER=true     # Enable collector data ingestion

Data Lake Configuration:

DATALAKE_ACCOUNT=stlmingestdatalake       # Storage account name
DATALAKE_FILESYSTEM=metrics               # Container name
DATALAKE_BASE_PATH=otlp                   # Base path in container
DATALAKE_FLUSH_INTERVAL_SECONDS=600       # Buffer flush interval (10 min)
DATALAKE_FLUSH_THRESHOLD_ROWS=50000       # Flush when buffer hits this size

Deployment

Quick Deploy

# 1. Build image in ACR
az acr build --registry acrctalmhttps001 \
  --image httpingest:v53 \
  --file Dockerfile.containerapp .

# 2. Deploy to Container App
az containerapp update --name ca-cta-lm-ingest \
  --resource-group CTA_Resource_Group \
  --image acrctalmhttps001.azurecr.io/httpingest:v53

# 3. Verify health
curl https://ca-cta-lm-ingest.greensea-6af53795.eastus.azurecontainerapps.io/api/health

Automated Deploy

./scripts/deploy.sh v53

API Endpoints

Ingestion

# Ingest OTLP metrics
curl -X POST https://your-app.azurecontainerapps.io/api/HttpIngest \
  -H "Content-Type: application/json" \
  -d @otlp_payload.json

# With gzip compression
curl -X POST https://your-app.azurecontainerapps.io/api/HttpIngest \
  -H "Content-Type: application/json" \
  -H "Content-Encoding: gzip" \
  --data-binary @otlp_payload.json.gz

Health

curl https://your-app.azurecontainerapps.io/api/health

Response:

{
  "status": "healthy",
  "version": "53.0.0",
  "mode": "datalake_only",
  "components": {
    "datalake": { "status": "healthy" }
  }
}

Metrics

# Prometheus-format metrics
curl https://your-app.azurecontainerapps.io/metrics

LogicMonitor Configuration

Data Publisher is configured on Collector 117 in the LM portal:

publisher.http.enable=true
publisher.http.url=https://ca-cta-lm-ingest.greensea-6af53795.eastus.azurecontainerapps.io/api/HttpIngest
publisher.http.format=otlp
publisher.http.compression=gzip
publisher.http.batch.size=100
publisher.http.batch.interval=30

Data Lake Structure

Parquet files are Hive-partitioned by time:

stlmingestdatalake/
  metrics/
    otlp/
      metric_data/
        year=2026/
          month=03/
            day=31/
              hour=12/
                part-20260331120000-abc123.parquet

Downstream consumers read ADLS directly:

Weekly training cron: DuckDB reads Parquet partitions, uploads to GCS via gsutil
Compaction script: scripts/compact_parquet.py merges small files to day-level (manual)

Running Tests

# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=src --cov-report=html

# Run specific test file
uv run pytest tests/test_datalake_components.py -v

Project Structure

.
├── containerapp_main.py      # FastAPI application entry point
├── Dockerfile.containerapp   # Container image (non-root, single worker)
├── pyproject.toml           # Python dependencies (uv), version 53.0.0
├── src/                     # Source code
│   ├── datalake_writer.py   # Async Data Lake Parquet writer
│   ├── ingestion_router.py  # Route data to ADLS backend
│   ├── otlp_parser.py       # OTLP parsing and normalization
│   └── tracing.py           # OpenTelemetry instrumentation
├── scripts/
│   ├── deploy.sh            # Automated ACR build + Container App deploy
│   └── compact_parquet.py   # Merge small Parquet files to day-level
├── tests/                   # 10 test files
└── docs/                    # Documentation

Monitoring

Container App Logs

# Stream logs
az containerapp logs show --name ca-cta-lm-ingest \
  --resource-group CTA_Resource_Group --follow

# Check for errors
az containerapp logs show --name ca-cta-lm-ingest \
  --resource-group CTA_Resource_Group --tail 50 | grep ERROR

Scaling

# Current: 0-3 replicas, scale-to-zero enabled
az containerapp update --name ca-cta-lm-ingest \
  --resource-group CTA_Resource_Group \
  --min-replicas 0 --max-replicas 3

Troubleshooting

Data Not Appearing

Default flush interval is 600 seconds (10 min), or 50,000 rows
Check container logs for flush activity: grep flush
Verify Data Publisher is active on Collector 117 in LM portal

Container App Not Starting

Check ACR image tag matches deployed revision
Verify Managed Identity has Storage Blob Data Contributor on ADLS

Stale Partitions

Run compaction script to merge small hourly files: uv run python scripts/compact_parquet.py

License

See LICENSE file for details.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LogicMonitor OTLP Data Pipeline

Ecosystem Overview

Architecture

Prerequisites

Quick Start

Environment Variables

Deployment

Quick Deploy

Automated Deploy

API Endpoints

Ingestion

Health

Metrics

LogicMonitor Configuration

Data Lake Structure

Running Tests

Project Structure

Monitoring

Container App Logs

Scaling

Troubleshooting

Data Not Appearing

Container App Not Starting

Stale Partitions

License

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 120 Commits
docs		docs
scripts		scripts
src		src
tests		tests
.dockerignore		.dockerignore
.env.example		.env.example
.gitignore		.gitignore
Dockerfile.containerapp		Dockerfile.containerapp
LICENSE		LICENSE
README.md		README.md
containerapp_main.py		containerapp_main.py
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Folders and files

Latest commit

History

Repository files navigation

LogicMonitor OTLP Data Pipeline

Ecosystem Overview

Architecture

Prerequisites

Quick Start

Environment Variables

Deployment

Quick Deploy

Automated Deploy

API Endpoints

Ingestion

Health

Metrics

LogicMonitor Configuration

Data Lake Structure

Running Tests

Project Structure

Monitoring

Container App Logs

Scaling

Troubleshooting

Data Not Appearing

Container App Not Starting

Stale Partitions

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages