Blog
Modernizing Public Safety Networking: Spearheading eBPF with Cilium
As part of the current move to Kubernetes, the adoption of Cilium and eBPF is
underway. This strategic modernization aims to replace legacy iptables routing
with a high-performance, kernel-native networking foundation, providing the
sub-millisecond latency and deep observability required for real-time crisis management.
The Image is the OS: Scaling Immutable Infrastructure with Bootable Containers (bootc)
The future of the operating system involves bootc (Bootable Containers),
representing the final frontier of Platform Engineering. This transition
enables managing the entire OS lifecycle using OCI-compliant GitOps
workflows similar to those for applications, aiming for a declarative,
immutable foundation.
Unifying Observability: Eliminating Monitoring Silos with OpenTelemetry
In mission-critical systems, fragmented monitoring is a liability.
OpenTelemetry represents the production observability standard, unifying
traces, metrics, and logs into a single, vendor-neutral pipeline. This provides
the high-fidelity data required to maintain public safety infrastructure with
absolute confidence.
Scalable Monitoring: Why VictoriaMetrics is the Modern Alternative to Prometheus
As infrastructure grows, so does the volume of metrics. VictoriaMetrics offers a high-performance, cost-effective
monitoring solution that remains compatible with the Prometheus ecosystem while providing better compression,
lower resource usage, and simpler horizontal scaling.
Infrastructure as Data: Building a Self-Service Platform with Crossplane
As mission-critical infrastructure moves to Kubernetes,
Crossplane is being used to transform cloud resources into
declarative data. This transition is the foundation of an Internal Developer
Platform (IDP), aiming to eliminate provisioning bottlenecks and empower
engineering teams with a self-service model.
Visual Systems Thinking: Mapping Complex Kubernetes Architectures
In complex distributed systems, YAML is not a substitute for a mental model.
Automated visualization was used to transform "Black Box" manifests into clear,
navigable architectural diagrams, ensuring a shared understanding
of mission-critical public safety infrastructure.
Secure AI Adoption: Local LLMs with Ollama for Enterprise Privacy
GenAI is often credited with a 10x productivity boost, but for mission-critical SRE and development, public
APIs are a non-starter. The use of local LLMs was pioneered
with Ollama to bring AI-assisted coding to teams without leaking proprietary
code or public safety data.
Database Performance at Scale: Tuning CloudNativePG for High-Throughput Workloads
Moving mission-critical databases to Kubernetes is only half the battle.
CloudNativePG clusters were tuned to support a 10x increase in connection
volume and high-throughput ETL pipelines, proving that self-managed
database performance can rival, and even exceed, managed cloud services.
AI-Assisted SRE: Spearheading Triage with K8sGPT and Local Inference
As critical infrastructure migrates to Kubernetes, the integration of
K8sGPT and local LLMs is being spearheaded. This strategic initiative
aims to automate the initial triage of cluster failures, transforming
cryptic logs into actionable remediation steps while maintaining absolute
data privacy in production environments.
Beyond the Theory: Implementing Google SRE Principles in High-Stakes Environments
Google's SRE principles provide the blueprint, but real-world execution is a cultural
challenge. During past projects, "The SRE Book" was translated into actionable
engineering practices, using SLOs as Code and Error Budgets to balance rapid deployment
with the uncompromising uptime required for public safety.
Ransomware-Proof Infrastructure: Immutable Backups with MinIO and S3 Object Lock
In mission-critical SRE, "having a backup" is no longer enough.
A production-grade immutable storage vault has been implemented using
MinIO and S3 Object Lock. This provides a "Write Once, Read Many" (WORM)
guarantee for offsite backups, ensuring that public safety data remains
impervious to ransomware or accidental deletion.
Secure Data Governance: Scalable Database Management with pgAdmin and OAuth2
Granting developers access to production data shouldn't mean compromising on security.
pgAdmin was deployed on Kubernetes with OAuth2/OIDC integration, replacing insecure
port-forwarding with a centralized, identity-aware portal for the entire
CloudNativePG fleet.
Enterprise PostgreSQL on Kubernetes: Achieving High Availability with CloudNativePG
Managing stateful workloads on Kubernetes is a Tier-1 SRE challenge.
Implementing CloudNativePG (CNPG) as a production standard has
enabled the creation of a self-healing, highly available PostgreSQL
platform that matches RDS in reliability while surpassing it in
portability and control.
Automating High-Availability PostgreSQL on AWS: A Deep Dive into Trusted Postgres Architect (TPA)
Deploying production-ready PostgreSQL clusters requires more than just `apt-get install`. Trusted Postgres Architect
(TPA) by EDB brings Infrastructure as Code (IaC) principles to database orchestration, allowing you to provision,
configure, and manage highly available clusters on AWS EC2 with Ansible-driven automation.
The Developer Interface: Optimizing the Physical-to-Digital Bridge with QMK
As an SRE, the most important tool isn't the terminal—it's the interface that
connects the mind to the machine. Over the years, QMK Firmware has been used to
build customized, programmable hardware that reduces repetitive strain
and accelerates mission-critical workflows.
Converged Reliability: Strategic Hybrid Cloud with Harvester and Rancher
Cloud-Native doesn't always mean "In the Cloud." During recent
infrastructure projects, Harvester was used to transform bare-metal
hardware into a private cloud, bridging the gap between legacy VMs
and modern Kubernetes workloads with a single, unified control plane.
The SRE Knowledge Graph: Building a Second Brain for Mission-Critical Operations
In an SRE career spanning 26 years, the most valuable asset isn't just the code—it's
the accumulated knowledge of how systems fail and recover. Over that time,
Logseq has been used to build a private, graph-based "Second Brain,"
transforming scattered notes into a searchable, interconnected knowledge base
for mission-critical operations.
Mapping the Monolith: Visualizing 26 Years of Systems Integration with yEd
In an era of browser-based tools, yEd remains the SRE's secret weapon for mapping
chaotic systems. Its algorithmic layouts
are consistently used to transform hundreds of undocumented database
relationships into clear, hierarchical maps, providing the clarity needed to
scale complex platforms.

