Terraform approach (how I keep it production-safe)
- Modules-first: small modules with clear interfaces, versioned releases, minimal side-effects.
- State hygiene: remote state + locking; avoid “one giant state” for blast-radius control.
- CI gating: fmt/validate/plan on PRs; apply only with approvals and protected branches.
- Guardrails: least privilege IAM, drift detection, tagging standards.
Real-world failure modes
- “Applied successfully but app is down” → infra != correctness; use health checks + progressive delivery.
- “State locked/corrupted” → strict backends + unlock procedures with audit trail.
- “Noisy plan diffs” → normalize inputs, stabilize modules, justify ignore_changes sparingly.
Links