Surya
Staff AI Platform & Cloud Engineer · PayPal · Chennai, India

I build the platform layer that lets enterprises connect AI to their backends.

Thirteen years across cloud, data, and enterprise systems — now focused on agentic enablement at production grade. I design the governed, auditable paths that let teams wire LLMs and agents into backend systems: MCP server frameworks, tool-use SDKs, and the middleware that absorbs identity, data governance, and guardrails.

Role: Staff Engineer · PayPal
Focus: Agentic AI · MCP · AIOps
Base: Google Cloud · Kubernetes
Status: Building in public
The pattern I build

  1. Agent / LLM: Tool-use
  2. MCP Server: FastMCP
  3. Federated Identity: SAML · OAuth
  4. Guardrails: Governance
  5. Enterprise Systems: Databases · APIs
01 — Selected Work

Platform work,
in production.

01

Governed AI-to-Backend Integration (MCP)

Defined an MCP → federated-identity → enterprise-systems topology, now the org-standard pattern for agentic workloads. First proven across SAP HANA, OData, and SaaS landscapes — cutting new integration build time from weeks to days.

MCP · Federated Identity · Enterprise Backends
AI Enablement
02

End-to-End MCP Framework & Reusable SDKs

Shipped an MCP framework with shared SDK libraries that absorb federated auth, query governance, and security guardrails — so consuming teams inherit production-grade safety by default instead of rebuilding it per system. Pioneered the FastMCP runtime in production.

FastMCP · Python · SAML / OAuth · Guardrails
AI Enablement
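
To make "safety by default" concrete, here is a minimal sketch of the idea in plain Python. It is illustrative only and is not the framework's actual code: `Caller`, `governed_tool`, `PolicyError`, `AUDIT_LOG`, `MAX_ROWS`, and the `health:read` scope are all hypothetical names, the identity is assumed to have been resolved upstream by the federated-identity layer, and the read-only check is deliberately naive.

```python
from dataclasses import dataclass
from functools import wraps
from typing import Callable

MAX_ROWS = 1000          # hypothetical org-wide row cap
AUDIT_LOG: list = []     # stand-in for a real audit sink


@dataclass(frozen=True)
class Caller:
    """Identity assumed to be resolved already by the SAML / OAuth layer."""
    subject: str
    scopes: frozenset


class PolicyError(Exception):
    """Raised when a call is rejected before it reaches the backend."""


def governed_tool(required_scope: str) -> Callable:
    """Wrap a tool function with a scope check, query governance, and auditing."""
    def decorator(fn: Callable) -> Callable:
        @wraps(fn)
        def wrapper(caller: Caller, query: str, limit: int = 100):
            allowed = required_scope in caller.scopes
            # Naive read-only check for illustration; a real guardrail
            # would parse the statement instead of matching a prefix.
            read_only = query.lstrip().lower().startswith("select")
            decision = "allow" if allowed and read_only else "deny"
            # Every decision is recorded, including rejected calls.
            AUDIT_LOG.append(
                {"subject": caller.subject, "tool": fn.__name__, "decision": decision}
            )
            if not allowed:
                raise PolicyError(f"missing scope: {required_scope}")
            if not read_only:
                raise PolicyError("only SELECT statements are permitted")
            # Clamp the requested row count to the governance cap.
            return fn(caller, query, min(limit, MAX_ROWS))
        return wrapper
    return decorator


@governed_tool(required_scope="health:read")
def query_system_health(caller: Caller, query: str, limit: int) -> dict:
    # Placeholder backend: a real tool would run the query against the system.
    return {"query": query, "limit": limit, "rows": []}
```

A consuming team would write only the body of `query_system_health`; the scope check, row cap, and audit record come from the shared wrapper, which is the point of putting them in an SDK rather than in each integration.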
03

AIOps Agent Fleet

Architecting AIOps across three agentic domains — Monitoring & Alerting (triage, Dataproc/BigQuery monitoring, vulnerability tracking), Operations (change coordination, patch-day automation, system-health analysis), and Knowledge & Query (L1 knowledge and troubleshooting agents).

LangGraph · n8n · RAG · Vector DBs
AIOps
04

Conversational Health-Monitoring Agent

Productionized an agent giving SRE and platform teams natural-language access to system health, sessions, and workload analytics on the HANA platform — now the reference implementation other agents are built from.

LLM Tool-Use · Embeddings · OpenTelemetry
AIOps
05

FinOps Governance Program

Drove FinOps governance delivering verified, recurring cost savings — BigQuery slot reservations, GCP committed-use discounts, rightsizing, and quota management — with no performance degradation.

BigQuery Slots · GCP CUDs · Rightsizing
Cloud & FinOps
06

Self-Service Platform & Terraform Frameworks

Built Terraform provisioning frameworks that eliminated ~80% of manual ClickOps and self-service platforms that offloaded ~70% of L1/L2 tasks — plus CLI-generated, production-ready MCP servers and an interactive docs site that turns one-off builds into a repeatable authoring path.

Terraform · GKE · CLI Tooling · GitOps
Platform
07

Real-Time Streaming Platform

Architected Apache Spark / Dataproc platforms processing 200M+ events per day — PySpark pipelines over Pub/Sub streaming and batch ingestion at multi-terabyte scale, with GKE-based orchestration and capacity monitoring.

PySpark · Dataproc · Pub/Sub · BigQuery
Cloud & Data
02 — Capabilities

What I work in,
concretely.

A

AI & Agentic

The platform layer for agentic workloads — governed, auditable, and reusable across teams.

  • Model Context Protocol (MCP)
  • FastMCP
  • LLM Tool-Use Frameworks
  • LangGraph
  • n8n
  • RAG & Vector DBs
  • Guardrail / Policy Design
B

Cloud & Infrastructure

Multi-cloud platforms with a deep Google Cloud foundation, built for resilience and cost.

  • Google Cloud (GCP)
  • AWS & Azure
  • Kubernetes & GKE
  • Terraform & Helm
  • Cloud Build · GitOps · CI/CD
  • FinOps Governance
C

Data & Streaming

Big-data platforms supporting streaming and batch ingestion at multi-terabyte scale.

  • Apache Spark
  • Dataproc
  • BigQuery
  • Pub/Sub
  • Apache NiFi
  • SAP HANA
D

Reliability & Security

Observability, chaos engineering, and InfoSec collaboration for audit-ready systems.

  • OpenTelemetry & Prometheus
  • Cloud Monitoring · Datadog
  • Chaos Engineering · MTTR
  • IAM · SAML / SSO / MFA · OAuth
  • DLP · TLS · Vuln Management
  • SOX · PCI-DSS Audit Readiness
Daily Tooling
MCP · Google Cloud · Kubernetes · Terraform · Docker · Helm · SAP · Spark · Datadog · Prometheus · Grafana · OpenTelemetry · Python · Ansible · GitHub Actions · Jenkins · Harness
03 — Experience

Thirteen years,
one through-line.

  1. 2017 — Present

    PayPal

    Staff System & Cloud Engineer

    Leading AI & agentic platform engineering (MCP) and AIOps — plus large-scale GCP migrations, Spark/Dataproc data platforms, FinOps governance, and SRE.

  2. 2015 — 2017

    DXC Technology (formerly CSC India)

    Application Delivery Engineer

    Ansible-based automation for patch management; SAP/Linux hardening for audit mandates; HA/DR with Pacemaker and SAP HANA clusters for Fortune 500 clients.

  3. 2012 — 2015

    Wipro Technologies

    Senior Project Engineer

    System automation and DR setups for global SAP clients; contributed to the HANA Center of Excellence on performance tuning and migration assessments.

Certifications

  • Google Cloud · Professional Cloud Architect
  • Google Cloud · Professional Cloud Network Engineer
  • Google Cloud · Professional Cloud DevOps Engineer
  • Google Cloud · Associate Cloud Engineer
  • CNCF · Certified Kubernetes Administrator (CKA)
  • HashiCorp · Terraform Associate
  • Red Hat · Certified Engineer (RHCE) & RHCSA
  • SAP · HANA DBA & Integration Developer
05 — Contact

Let's build something
that survives production.

I write and build around agentic platforms, MCP, and AIOps. Always happy to compare notes or talk shop — email is the fastest way to reach me.

suryal.k90@gmail.com