Client project · AI automation

EyeNet

AI & automation infrastructure — LLM pipelines, ETL/ELT, containerized microservices, and production web/mobile apps.

Apr 2025 – May 2026 · Remote · Partial NDA

65% Less manual work

300+ Docs / week

10K+ Daily requests

92% Extraction accuracy

The Challenge

Hooks and events arrived with inconsistent schemas: missing fields, wrong types, variable structures. Without a robust normalization layer, the data could not be stored in a form usable for KPIs.

The Solution

A hybrid AI infrastructure combining proprietary APIs with a self-hosted GPU cluster. Sub-agent architecture for memory, context, and specialized delegation.

The Impact

65% reduction in manual work, 300+ docs processed weekly, 10K+ daily requests, and 92% extraction accuracy across two production systems.

Role: AI Systems Engineer

Timeline: Apr 2025 – May 2026

Architecture: Hybrid Cloud + GPU Cluster

Core Stack: Python, n8n, Docker, DeepSeek

“Using proprietary APIs for everything was not viable at scale. The decision to build our own GPU cluster changed the economics of the entire operation.”

The Infrastructure Bet

The problem

What forced the architecture.

Three hard problems that shaped every technical decision.

Unnormalized Data

External hooks arrived with no fixed schema. Fields missing, types inconsistent, structures variable. We had to build a normalization layer before any AI could process the data.

Data quality

Controlled AI Costs

Using OpenAI/Gemini for everything scaled linearly in cost. We built a GPU cluster with open-source models and reserved proprietary APIs only where ROI justified it.

Infrastructure

Agents with Memory

Assistants had to handle multiple intents, remember context, and delegate to specialized sub-agents without losing coherence in long conversations.

AI systems

How it works

From sources to deliverables.

Two production systems. Different architecture, same principle: clean data → intelligent processing → actionable output.

Key decisions

Trade-offs that shaped the systems.

Models

Own Cluster vs External APIs

For high-volume, repetitive use cases, proprietary API costs scale linearly with request volume. We stood up a GPU cluster running open-source models (LLaMA, Mistral, DeepSeek) for base workloads and reserved OpenAI/Gemini for cases where quality justified the cost.

Trade-off

Higher operational complexity in exchange for full control over latency, cost, and data privacy.
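
The split between cluster and proprietary APIs can be pictured as a small routing rule. This is a minimal sketch, not the project's actual logic: the `Task` fields, the `choose_backend` function, the backend labels, and the volume threshold are all hypothetical, and the choice to keep high-volume work on the cluster even when quality matters is an assumption.

```python
from dataclasses import dataclass

# Hypothetical backend labels; the real deployment names are not given in the source.
SELF_HOSTED = "self_hosted"
PROPRIETARY = "proprietary"


@dataclass(frozen=True)
class Task:
    name: str
    daily_calls: int          # rough expected volume (assumed field)
    needs_top_quality: bool   # True when output quality may justify API cost


def choose_backend(task: Task, high_volume_threshold: int = 1000) -> str:
    """Pick a backend for a task.

    High-volume work always goes to the self-hosted cluster, because that is
    where per-request API pricing hurts most. Low-volume work goes to a
    proprietary API only when it is flagged as quality-critical.
    """
    if task.daily_calls >= high_volume_threshold:
        return SELF_HOSTED
    if task.needs_top_quality:
        return PROPRIETARY
    return SELF_HOSTED
```

Keeping the rule in one function makes the cost policy explicit and easy to change when pricing or model quality shifts.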

Normalization

Schema-First from Day One

Hooks arrived with variable structures. The temptation was to process each payload and move on, but that would have made KPIs impossible. We designed the MongoDB schema around future aggregations before writing the first pipeline.

Trade-off

More initial design time, zero technical debt in later analytics.
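
A schema-first normalization step might look roughly like the sketch below. It assumes a deliberately tiny target schema; the field names (`type`, `event`, `amount`, `data`) and the `NormalizedEvent` shape are hypothetical illustrations, not the project's real schema. The point is that every payload, however irregular, leaves this function with the same fields and types.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class NormalizedEvent:
    source: str
    event_type: str
    amount: Optional[float]
    received_at: datetime


def to_float(value: Any) -> Optional[float]:
    """Coerce numbers and numeric strings to float; anything else becomes None."""
    if isinstance(value, bool):  # bool is a subclass of int, so reject it first
        return None
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        try:
            # Accept a decimal comma such as "12,50".
            return float(value.strip().replace(",", "."))
        except ValueError:
            return None
    return None


def normalize(raw: dict[str, Any], source: str) -> NormalizedEvent:
    """Map a loosely structured payload onto the fixed schema."""
    # Accept either a flat payload or one nested under "data".
    nested = raw.get("data")
    body = nested if isinstance(nested, dict) else raw
    return NormalizedEvent(
        source=source,
        event_type=str(body.get("type") or body.get("event") or "unknown"),
        amount=to_float(body.get("amount")),
        received_at=datetime.now(timezone.utc),
    )
```

A record in this shape can be turned into a plain dictionary with `dataclasses.asdict` before it is stored, so later aggregations never have to branch on payload variants.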

Agents

Sub-Agents vs Monolith

A single agent holding every tool broke down in long contexts and made routing errors. We split it into specialized sub-agents by domain (calendar, documents, data, general response), with a central orchestrator that delegates to them.

Trade-off

More nodes in the graph, explicit routing logic, but predictable and debuggable behavior.
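
The orchestrator pattern can be sketched in a few lines. This is an illustration under stated assumptions: the four domains come from the text above, but the keyword lists, function names, and keyword-based routing are hypothetical stand-ins. A production router would more likely use a classifier or a model call.

```python
from typing import Callable, Dict, Tuple

# Each sub-agent is modeled as a plain function from message to reply.
Agent = Callable[[str], str]


def calendar_agent(message: str) -> str:
    return f"[calendar] {message}"


def documents_agent(message: str) -> str:
    return f"[documents] {message}"


def data_agent(message: str) -> str:
    return f"[data] {message}"


def general_agent(message: str) -> str:
    return f"[general] {message}"


AGENTS: Dict[str, Agent] = {
    "calendar": calendar_agent,
    "documents": documents_agent,
    "data": data_agent,
    "general": general_agent,
}

# Hypothetical keyword lists, checked in insertion order.
ROUTES: Dict[str, Tuple[str, ...]] = {
    "calendar": ("meeting", "schedule", "calendar"),
    "documents": ("document", "pdf", "contract"),
    "data": ("report", "kpi", "metric"),
}


def route(message: str) -> str:
    """Return the domain of the first keyword match, or 'general'."""
    text = message.lower()
    for domain, keywords in ROUTES.items():
        if any(keyword in text for keyword in keywords):
            return domain
    return "general"


def orchestrate(message: str) -> str:
    """Delegate the message to exactly one sub-agent."""
    return AGENTS[route(message)](message)
```

Because `route` is a pure function, a misrouted message can be reproduced and debugged in isolation, which is the predictability the trade-off above refers to.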

Monitoring

Telegram as Alert System

Instead of building a monitoring dashboard from scratch, we integrated Telegram notifications directly into workflows. Every critical pipeline reports success/failure to the operator in real time.

Trade-off

Not Grafana, but instant, zero overhead, and the team already uses Telegram.
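
For reference, a pipeline alert of this kind reduces to one call to the Telegram Bot API `sendMessage` method. The project wired its alerts into workflows directly, so the Python below is only an illustrative sketch; the function names, message format, and the token and chat ID values are hypothetical.

```python
import json
import urllib.request


def build_alert_request(
    token: str, chat_id: str, pipeline: str, ok: bool
) -> urllib.request.Request:
    """Build (but do not send) a Telegram Bot API sendMessage request."""
    status = "SUCCESS" if ok else "FAILURE"
    payload = {"chat_id": chat_id, "text": f"[{status}] {pipeline}"}
    return urllib.request.Request(
        url=f"https://api.telegram.org/bot{token}/sendMessage",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_alert(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Separating request construction from sending keeps the message format testable without network access or a real bot token.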

Stack

Technologies and tools.

Ingestion

Webhooks, REST APIs, Telegram, n8n workflows

Processing

DeepSeek, LLaMA, Mistral, OpenAI, Gemini, GPU Cluster

Storage

MongoDB, PostgreSQL, Redis, Google Drive

Delivery

Docker, CI/CD, FastAPI, React Native, Next.js, Telegram

Monitoring

Telegram alerts, Grafana, Prometheus, Structured logging

Python, n8n, Docker, FastAPI, OpenAI, Gemini, DeepSeek, LLaMA, Mistral, PostgreSQL, MongoDB, Redis, CI/CD, React Native, Nginx, Grafana

Lessons learned

What broke in production.

Problems we had to recognize and fix in production.

Dirty Data in Production

The first production webhooks arrived with fields in unexpected formats that the normalization schema didn't anticipate. It took several rapid iterations on the Code layer before the pipeline stabilized.

Data quality

Cluster vs Production Gap

Models that worked well on the local cluster failed in production due to quantization differences and available memory. We learned to separate model evaluation environments from production inference.

Infrastructure
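
One way to catch part of this gap before deployment is a back-of-the-envelope memory check. The sketch below uses the standard approximation that weight memory is parameter count times bytes per parameter. The 20% overhead margin for activations and cache is an assumption, the function names are hypothetical, and the 7-billion-parameter figure in the usage note is an example rather than a model from the project.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights only, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


def fits(
    num_params: float,
    bits_per_param: int,
    available_gb: float,
    overhead_fraction: float = 0.2,  # assumed margin, not a measured value
) -> bool:
    """True if weights plus the assumed overhead margin fit in memory."""
    needed = weight_memory_gb(num_params, bits_per_param) * (1 + overhead_fraction)
    return needed <= available_gb
```

For a hypothetical 7-billion-parameter model, `weight_memory_gb(7e9, 16)` gives 14.0 and `weight_memory_gb(7e9, 4)` gives 3.5, so the same model can fit or fail on a 16 GB device depending only on its quantization.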

Context Window in Long Conversations

In long conversations, the main agent lost relevant context. Simple memory wasn't enough — we had to implement sliding window memory with context summarization.

AI systems
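
Sliding window memory with summarization can be sketched as follows. This is a minimal illustration, not the project's implementation: the class name, the `summarize` callback signature, and the placeholder summarizer are hypothetical. In a real system the callback would call a model to compress the evicted turn into the running summary.

```python
from collections import deque
from typing import Callable, Deque, List


class SlidingWindowMemory:
    """Keep the last `window_size` turns verbatim and fold older ones into a summary."""

    def __init__(self, window_size: int, summarize: Callable[[str, str], str]):
        if window_size < 1:
            raise ValueError("window_size must be at least 1")
        self.window_size = window_size
        self.summarize = summarize  # (current_summary, evicted_turn) -> new_summary
        self.window: Deque[str] = deque()
        self.summary = ""

    def add(self, turn: str) -> None:
        """Append a turn; evict and summarize the oldest once the window is full."""
        self.window.append(turn)
        while len(self.window) > self.window_size:
            evicted = self.window.popleft()
            self.summary = self.summarize(self.summary, evicted)

    def context(self) -> List[str]:
        """Return the summary (if any) followed by the verbatim recent turns."""
        prefix = [f"Summary: {self.summary}"] if self.summary else []
        return prefix + list(self.window)


def naive_summarize(summary: str, turn: str) -> str:
    """Placeholder summarizer that just concatenates; stands in for a model call."""
    return f"{summary} | {turn}" if summary else turn
```

The prompt size stays bounded by the window plus one summary line, while older turns still influence the conversation through the summary.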
