
Infrastructure & Server Setup

TendoPay runs on AWS with Singapore (ap-southeast-1) as the primary region and Mumbai, India (ap-south-1) as the fallback region, fronted by Cloudflare at the edge. This page describes the server stack — the runtime, the services it depends on, and how requests are handled in production.

High-Level Topology

[Figure: TendoPay High-Level Topology]

Requests reach Cloudflare first, where caching, WAF, and bot protection are applied. A handful of latency-sensitive endpoints (for example, the repayment calculator) are served entirely from Cloudflare Workers at the edge — they never hit the origin. Everything else is proxied to the origin in AWS.

The Server Stack

Web Tier — Nginx + PHP

  • Nginx terminates TLS at the origin and reverse-proxies to the Laravel application.
  • Server-side TLS requires TLS 1.2 as a minimum (with TLS 1.3 enabled where supported) and uses certificates signed with SHA-256.
  • Nginx also handles a small number of upstream proxies for third-party SDKs that require a same-origin endpoint.

Application Tier — PHP / Laravel

  • The Laravel application runs on a current, supported PHP release (PHP 8.x).
  • Required PHP extensions include the standard set plus intl, imagick, soap, xsl, redis, and a few others needed by specific integrations.
  • See the Backend (Laravel) page for details on what the application itself does.

Queue / Worker Tier — Horizon

  • Background work runs in a Laravel Horizon queue tier backed by Redis.
  • Workers are split into separate supervised process pools so that:
    • Web-facing jobs (OTP, webhook delivery, transactional email) stay responsive on the web tier.
    • Batch jobs (statement generation, reporting, payroll runs) run on the dedicated batch server so heavy work cannot starve the web pool.
    • Export jobs (large Excel/PDF exports) run on their own pool, also pinned to the batch server.
  • Worker processes are managed by Supervisor and restart automatically on failure.
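
The pool split above can be pictured as a routing table from job type to worker pool. The following is a minimal Python sketch for illustration only, not the actual Horizon configuration (which lives in the Laravel application and is not shown on this page). The pool names, job-type names, and the `pool_for` / `host_for` helpers are all hypothetical.

```python
# Hypothetical pool and job names; the real values live in the Laravel
# Horizon configuration, which this page does not show.
POOLS = {
    "web": {
        "host": "web tier",
        "jobs": {"otp", "webhook_delivery", "transactional_email"},
    },
    "batch": {
        "host": "batch server",
        "jobs": {"statement_generation", "reporting", "payroll_run"},
    },
    "export": {
        "host": "batch server",
        "jobs": {"excel_export", "pdf_export"},
    },
}


def pool_for(job_type: str) -> str:
    """Return the name of the worker pool that processes job_type."""
    for name, pool in POOLS.items():
        if job_type in pool["jobs"]:
            return name
    # Failing loudly avoids silently dropping heavy work onto the web pool.
    raise ValueError(f"no pool configured for job type {job_type!r}")


def host_for(job_type: str) -> str:
    """Return the server class on which job_type runs."""
    return POOLS[pool_for(job_type)]["host"]
```

The design point is that the mapping is explicit: an unknown job type raises an error instead of defaulting to the web pool, which is the pool the split is meant to protect.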

Data Tier

  • Amazon Aurora (MySQL-compatible) as the primary relational store. Aurora runs as a cluster — a single writer with read replicas — and provides storage-level replication, automatic failover within the region, and cross-region replication into the India fallback (see Multi-Region Layout below).
  • Redis for cache, sessions, queues, broadcasting, and rate limiting.
  • Meilisearch for full-text search behind the admin dashboard.
  • AWS S3 for object storage — KYC documents, signatures, generated PDFs, exports. Buckets are versioned and replicated cross-region to the fallback region.

Edge — Cloudflare

  • CDN and caching for static assets and cacheable HTML.
  • WAF and bot protection at the edge.
  • Cloudflare Turnstile as a CAPTCHA-alternative challenge for high-risk forms.
  • Cloudflare Workers for latency-sensitive endpoints that benefit from running close to the user. The repayment calculator is one example — it runs entirely on Workers without round-tripping to AWS.
  • DNS-level failover at Cloudflare routes traffic to the India fallback region when Singapore health checks fail for a sustained window.
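
The "sustained window" rule for DNS failover can be sketched as follows. This is an illustrative Python model, not Cloudflare's health-check implementation: the `HealthSample` type, the `should_fail_over` function, and the 120-second default window are assumptions, since this page does not state the actual window or probe interval.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class HealthSample:
    timestamp: float  # seconds, on any monotonically increasing clock
    healthy: bool


def should_fail_over(samples: Iterable[HealthSample], now: float,
                     window_seconds: float = 120.0) -> bool:
    """Return True only if every probe in the trailing window failed.

    window_seconds is an assumed value chosen for illustration.
    """
    samples = list(samples)
    if not samples:
        return False  # no data is not evidence of an outage
    cutoff = now - window_seconds
    # Require history reaching back to the start of the window, so a
    # freshly started monitor cannot trigger a failover on one bad probe.
    if min(s.timestamp for s in samples) > cutoff:
        return False
    recent = [s for s in samples if cutoff <= s.timestamp <= now]
    # A single success inside the window means the outage is not sustained.
    return bool(recent) and not any(s.healthy for s in recent)
```

Requiring the whole window to fail trades a slower cutover for fewer false positives: a brief blip in Singapore does not move traffic to the warm standby.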

Multi-Region Layout

TendoPay runs in two AWS regions for resilience and regulatory continuity:

Role | Region | Code | Use
Primary | Singapore | ap-southeast-1 | Serves all live customer and merchant traffic. Aurora writer and full application tier live here.
Fallback | Mumbai, India | ap-south-1 | Warm standby. Aurora cross-region read replica, replicated S3 buckets, and a scaled-down application tier ready to take over on failover.

  • The fallback region is a warm standby — application servers run at minimum capacity and scale up under Auto Scaling when traffic is shifted.
  • Aurora cross-region replication keeps the India cluster within seconds of Singapore. On failover, the India read replica is promoted to a writer.
  • S3 Cross-Region Replication (CRR) mirrors all object-storage buckets to India.
  • Redis and Meilisearch are restored from backup in the fallback region rather than running hot — both hold derived state that can be rebuilt.
  • Cloudflare DNS is the cutover mechanism: an automated health check flips DNS to the India load balancer when Singapore is unreachable.
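
The list above implies a different failover action for each component, depending on how it is kept in the fallback region. A minimal Python sketch of that mapping follows. The dictionary keys, state labels, and the `failover_actions` helper are hypothetical names introduced for illustration, and the order of the returned list is not meant to represent the actual runbook order.

```python
# How each component is kept in the India region, per the list above.
# Keys and state labels are hypothetical.
FALLBACK_STATE = {
    "aurora": "cross_region_replica",
    "s3": "replicated",
    "app_tier": "warm_standby",
    "redis": "backup",
    "meilisearch": "backup",
}

# What has to happen to bring a component in each state into service.
ACTION_FOR_STATE = {
    "cross_region_replica": "promote replica to writer",
    "replicated": "no action (already mirrored)",
    "warm_standby": "scale up via Auto Scaling",
    "backup": "restore from backup",
}


def failover_actions(state=None):
    """Return (component, action) pairs for a regional failover."""
    state = FALLBACK_STATE if state is None else state
    actions = [(name, ACTION_FOR_STATE[kind]) for name, kind in state.items()]
    # The DNS cutover applies to the region as a whole, not one component.
    actions.append(("cloudflare_dns", "route traffic to the India load balancer"))
    return actions
```

Calling `failover_actions()` makes the cost of each choice visible: only Redis and Meilisearch need a restore step, which is acceptable because both hold derived state.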

VPC, EC2, and Elastic Scaling

Each region runs its own Amazon VPC with public and private subnets spread across two Availability Zones:

  • Public subnets host the Application Load Balancer, NAT Gateways, and Bastion hosts for ops access.
  • Private subnets host the EC2 application instances, Horizon workers, the batch server, and the Aurora cluster.
  • Security Groups and NACLs enforce least-privilege traffic flow — only the load balancer can reach the app instances on the application port, only the app instances can reach Aurora, etc.
  • EC2 Auto Scaling Groups size the application tier elastically based on CPU and request-count CloudWatch alarms. Minimum, desired, and maximum capacity are tuned per environment.
  • VPC peering / Transit Gateway connects Singapore and India for replication traffic without traversing the public internet.
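
The least-privilege rule in the list above amounts to a default-deny allow-list. The Python sketch below models only the two flows this page states explicitly (load balancer to app instances, app instances to Aurora). The tier labels, port labels, and the `is_allowed` helper are hypothetical, and the real Security Group and NACL rules will contain more entries than this.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    source: str
    destination: str
    port: str  # symbolic label; actual port numbers are not given here


# Only the two flows stated on this page. Everything else is denied.
ALLOWED_FLOWS = frozenset({
    Flow("alb", "app", "app-port"),
    Flow("app", "aurora", "mysql"),
})


def is_allowed(flow: Flow) -> bool:
    """Default deny: a flow passes only if it is explicitly listed."""
    return flow in ALLOWED_FLOWS
```

For example, `is_allowed(Flow("alb", "aurora", "mysql"))` is false in this model: the load balancer has no path to the database even though it sits in the same VPC.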

Load Balancer & Failover

  • An AWS Application Load Balancer (ALB) sits in front of the EC2 application instances in each region, terminating TLS and distributing traffic across healthy targets in both AZs.
  • The ALB performs HTTP health checks on each instance; unhealthy targets are pulled out of rotation automatically.
  • Auto Scaling replaces failed instances to keep capacity at the desired count.
  • If an entire AZ becomes unhealthy the ALB continues serving from the surviving AZ.
  • If the whole Singapore region is unavailable, Cloudflare DNS failover redirects traffic to the India ALB, which targets the warm-standby application tier and the promoted Aurora cluster.

Batch Server

A dedicated batch server runs alongside the web tier for heavy scheduled and long-running work:

  • Statement generation, payroll runs, partner reconciliation files, large CSV/XLSX exports.
  • Runs the Laravel scheduler and the heavy-batch Horizon worker pool, isolated from web-facing workers so that a long export cannot starve OTP delivery or webhook handling.
  • Provisioned on its own EC2 instance type (memory-optimised) and sized independently of the web tier.
  • Runs in its own Auto Scaling Group, capped at one instance under normal load, so that an unhealthy instance is automatically replaced with a fresh one.
  • Mirrored in the India fallback region in stopped state, ready to be started during a regional failover.

AWS Backup

All stateful resources are protected by AWS Backup with a tiered retention plan:

  • Daily backups retained for 31 days: a month of daily restore points for recovering from accidental data loss, app regressions, or partner-data disputes.
  • Weekly backups retained for 52 weeks: a full year of weekly snapshots for compliance, audit, and long-range investigation requests.
  • Backup vaults are encrypted with KMS and are subject to vault-lock policies that prevent early deletion.
  • Backups are copied cross-region from Singapore into India so that a Singapore-region outage does not put the backup chain at risk.
  • Restores are exercised on a recurring schedule into an isolated account so that the recovery path is known to work, not assumed.

Resources covered include the Aurora cluster, EBS volumes attached to EC2 (app, batch, Redis, Meilisearch), and S3 buckets containing customer documents.
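
The retention tiers above reduce to simple date arithmetic. The Python sketch below shows it for illustration. AWS Backup enforces retention itself through the backup plan's lifecycle settings, so nothing like this runs in production. The `RETENTION` table and both helper names are hypothetical, and treating the expiry date as exclusive is an assumption.

```python
from datetime import date, timedelta

# Retention periods as stated in the plan above.
RETENTION = {
    "daily": timedelta(days=31),
    "weekly": timedelta(weeks=52),
}


def expires_on(kind: str, created: date) -> date:
    """Return the first date on which a backup is no longer retained."""
    return created + RETENTION[kind]


def is_retained(kind: str, created: date, today: date) -> bool:
    """Return True while the backup is still inside its retention period."""
    return today < expires_on(kind, created)
```

Under these assumptions a daily backup taken on 2024-01-01 is retained through 2024-01-31, and a weekly backup taken the same day is retained through 2024-12-29.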

Deployment

  • Servers are managed via Laravel Forge on top of AWS EC2 hosts.
  • Supervisor manages long-running processes (queue workers, Horizon, batch jobs).
  • Releases are zero-downtime: a new release is built into a versioned directory and the current symlink is flipped once health checks pass. Auto Scaling Group instance refresh is used to roll new AMIs out gradually.
  • Database migrations run as part of the deploy, against the Aurora writer; replicas (including the India cross-region replica) catch up automatically.
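
The symlink flip used for zero-downtime releases can be sketched in Python as below. This illustrates the general technique only. The actual deploy is driven by Laravel Forge, and the `releases/` and `current` paths, the `activate_release` function, and the `health_check` callable are assumptions made for the example. It relies on POSIX rename semantics, where renaming one symlink over another is atomic.

```python
import os
from typing import Callable


def activate_release(base_dir: str, release: str,
                     health_check: Callable[[str], bool]) -> str:
    """Point base_dir/current at base_dir/releases/<release>.

    health_check is a hypothetical callable that receives the release
    path and returns True when the new build is safe to serve.
    """
    target = os.path.join(base_dir, "releases", release)
    if not os.path.isdir(target):
        raise FileNotFoundError(f"release directory not found: {target}")
    if not health_check(target):
        # The existing 'current' link is untouched, so traffic keeps
        # hitting the previous release.
        raise RuntimeError(f"health check failed for {release}")

    current = os.path.join(base_dir, "current")
    staging = current + ".next"
    if os.path.lexists(staging):
        os.remove(staging)  # clear a leftover from an interrupted deploy
    # Create the new link under a temporary name, then rename it over the
    # old one. Requests see either the old or the new release, never a
    # missing 'current'.
    os.symlink(target, staging)
    os.replace(staging, current)
    return os.path.realpath(current)
```

Rolling back is the same operation with the previous release name, which is why each build goes into its own versioned directory and is never modified in place.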

CI / CD

  • GitHub Actions runs the full check matrix on every pull request: static analysis, code style, unit and feature tests, Cypress end-to-end tests, and linting for both Vue stacks.
  • Dependabot keeps dependencies current.
  • Release Drafter assembles release notes from merged PRs.

Observability

  • Errors and exceptions are reported into Jira — exceptions caught in the app create or update tickets in the engineering Jira project automatically, so every production error has an owner, a status, and a triage history.
  • Application and infrastructure metrics are collected by New Relic — APM traces for the Laravel app, throughput/latency for HTTP endpoints, queue depth and worker health, and host-level metrics for the EC2 fleet. Alert policies in New Relic page the on-call rotation when SLOs are breached.
  • CloudWatch collects AWS-side metrics (EC2, ALB, Aurora, AWS Backup) and forwards alarms to the same on-call pager that New Relic feeds.
  • Aurora Performance Insights is enabled for slow-query investigation.
  • AWS Backup events (success / failure / restore tests) page on-call so retention failures are caught the same day they happen.
  • Horizon and the Laravel scheduler monitor give operators in-app visibility into queue depth and scheduled-job liveness; the same signals are mirrored into New Relic for alerting.
  • Cloudflare provides edge-level traffic and security analytics, plus the health-check signal that drives DNS failover to India.
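
The "create or update" behaviour of the Jira error reporting implies some way of recognising a repeat of the same error. The Python sketch below shows one common approach, grouping by a fingerprint of the exception type and code location. It is an assumption about how such grouping could work, not a description of the actual integration: the `fingerprint` function, the `TicketStore` class, and the in-memory dictionary standing in for Jira are all hypothetical, and no Jira API is called.

```python
import hashlib


def fingerprint(exc_type: str, location: str) -> str:
    """Stable key for 'the same error': type plus code location.

    The message is left out on purpose, since it often contains
    per-request values (IDs, amounts) that would defeat grouping.
    """
    raw = f"{exc_type}|{location}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()[:16]


class TicketStore:
    """In-memory stand-in for the ticket tracker (hypothetical)."""

    def __init__(self) -> None:
        self.tickets: dict[str, dict] = {}

    def report(self, exc_type: str, location: str, message: str) -> str:
        """Create a ticket for a new error, or update the existing one."""
        key = fingerprint(exc_type, location)
        ticket = self.tickets.get(key)
        if ticket is None:
            ticket = {"summary": f"{exc_type} at {location}", "occurrences": 0}
            self.tickets[key] = ticket
        ticket["occurrences"] += 1
        ticket["last_message"] = message
        return key
```

Grouping this way keeps one ticket per distinct fault, so the owner, status, and triage history accumulate in a single place instead of being spread across duplicates.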