Purpose: Standardized 8-category, 29-criteria framework for evaluating system testability and NFR compliance during architecture review (Phase 3) and NFR assessment.
When to Use:
How to Use:
Question: Can we verify this effectively without manual toil?
| # | Criterion | Risk if Unmet | Typical Test Scenarios (P0-P2) |
|---|---|---|---|
| 1.1 | Isolation: Can the service be tested with all downstream dependencies (DBs, APIs, Queues) mocked or stubbed? | Flaky tests; inability to test in isolation | P1: Service runs with mocked DB, P1: Service runs with mocked API, P2: Integration tests with real deps |
| 1.2 | Headless Interaction: Is 100% of the business logic accessible via API (REST/gRPC) to bypass the UI for testing? | Slow, brittle UI-based automation | P0: All core logic callable via API, P1: No UI dependency for critical paths |
| 1.3 | State Control: Do we have “Seeding APIs” or scripts to inject specific data states (e.g., “User with expired subscription”) instantly? | Long setup times; inability to test edge cases | P0: Seed baseline data, P0: Inject edge case data states, P1: Cleanup after tests |
| 1.4 | Sample Requests: Are there valid and invalid cURL/JSON sample requests provided in the design doc for QA to build upon? | Ambiguity on how to consume the service | P1: Valid request succeeds, P1: Invalid request fails with clear error |
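
Criteria 1.1 and 1.3 can be illustrated with a small sketch: a downstream repository is replaced by a mock, and a seeding helper injects the "user with expired subscription" state directly. `SubscriptionService`, `User`, and `seed_expired_user` are hypothetical names invented for this example, not part of any real service.

```python
from dataclasses import dataclass
from datetime import date
from unittest.mock import Mock


@dataclass
class User:
    user_id: str
    subscription_expires: date


class SubscriptionService:
    """Hypothetical service; the repository is injected so tests can swap it."""

    def __init__(self, user_repo):
        self._user_repo = user_repo

    def can_access(self, user_id: str, today: date) -> bool:
        user = self._user_repo.get(user_id)
        return user.subscription_expires >= today


def seed_expired_user(repo_mock: Mock) -> None:
    """Stand-in for a seeding API: injects a subscription that ended 2024-01-01."""
    repo_mock.get.return_value = User("u-1", date(2024, 1, 1))


# The service runs with its only downstream dependency mocked.
repo = Mock()
seed_expired_user(repo)
service = SubscriptionService(repo)
```

Because the dependency is injected rather than constructed inside the service, the edge case needs no real database and no multi-step setup.
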
Common Gaps:
Mitigation Examples:
`/api/test-data` seeding endpoints (dev/staging only)
Question: How do we fuel our tests safely?
| # | Criterion | Risk if Unmet | Typical Test Scenarios (P0-P2) |
|---|---|---|---|
| 2.1 | Segregation: Does the design support multi-tenancy or specific headers (e.g., x-test-user) to keep test data out of prod metrics? | Skewed business analytics; data pollution | P0: Multi-tenant isolation (customer A ≠ customer B), P1: Test data excluded from prod metrics |
| 2.2 | Generation: Can we use synthetic data, or do we rely on scrubbing production data (GDPR/PII risk)? | Privacy violations; dependency on stale data | P0: Faker-based synthetic data, P1: No production data in tests |
| 2.3 | Teardown: Is there a mechanism to “reset” the environment or clean up data after destructive tests? | Environment rot; subsequent test failures | P0: Automated cleanup after tests, P2: Environment reset script |
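
A minimal sketch of criteria 2.1 and 2.3, assuming a single `orders` table with a `customer_id` column and an `is_test` flag (both the schema and the helper names are illustrative assumptions). Every read is scoped by tenant, test rows are excluded by default, and teardown removes only test rows.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, "
        "customer_id TEXT NOT NULL, "
        "amount REAL NOT NULL, "
        "is_test INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def seed(conn: sqlite3.Connection) -> None:
    """Insert two production-like rows and one flagged test row."""
    conn.executemany(
        "INSERT INTO orders (customer_id, amount, is_test) VALUES (?, ?, ?)",
        [("cust-a", 10.0, 0), ("cust-b", 20.0, 0), ("cust-a", 99.0, 1)],
    )


def orders_for(conn, customer_id: str, include_test: bool = False) -> list:
    """Always scope by customer_id; exclude test rows unless asked."""
    sql = "SELECT amount FROM orders WHERE customer_id = ?"
    if not include_test:
        sql += " AND is_test = 0"
    sql += " ORDER BY id"
    return [row[0] for row in conn.execute(sql, (customer_id,))]


def teardown_test_data(conn) -> int:
    """Delete only flagged test rows and report how many were removed."""
    return conn.execute("DELETE FROM orders WHERE is_test = 1").rowcount
```

A tenant-isolation test then reduces to asserting that a query for one customer never returns another customer's rows.
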
Common Gaps:
`customer_id` scoping in queries (cross-tenant data leakage risk)
Mitigation Examples:
`customer_id` in all queries, add test-specific headers
Question: Can it grow, and will it stay up?
| # | Criterion | Risk if Unmet | Typical Test Scenarios (P0-P2) |
|---|---|---|---|
| 3.1 | Statelessness: Is the service stateless? If not, how is session state replicated across instances? | Inability to auto-scale horizontally | P1: Service restart mid-request → no data loss, P2: Horizontal scaling under load |
| 3.2 | Bottlenecks: Have we identified the weakest link (e.g., database connections, API rate limits) under load? | System crash during peak traffic | P2: Load test identifies bottleneck, P2: Connection pool exhaustion handled |
| 3.3 | SLA Definitions: What is the target Availability (e.g., 99.9%) and does the architecture support redundancy to meet it? | Breach of contract; customer churn | P1: Availability target defined, P2: Redundancy validated (multi-region/zone) |
| 3.4 | Circuit Breakers: If a dependency fails, does this service fail fast or hang? | Cascading failures taking down the whole platform | P1: Circuit breaker opens on 5 failures, P1: Auto-reset after recovery, P2: Timeout prevents hanging |
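
The fail-fast behavior in criterion 3.4 can be sketched as a small circuit breaker. This is an illustrative implementation, not a specific library's API: the threshold of 5 mirrors the scenario in the table, while the 30-second reset timeout and the class names are assumptions. The clock is injectable so tests do not need to sleep.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without trying the dependency."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout  # seconds; assumed value
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    @property
    def is_open(self) -> bool:
        if self._opened_at is None:
            return False
        # After the timeout the breaker lets one trial call through.
        return self._clock() - self._opened_at < self.reset_timeout

    def call(self, func, *args, **kwargs):
        if self.is_open:
            raise CircuitOpenError("dependency unavailable; failing fast")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        # Success closes the breaker and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

The P1 scenarios map directly onto this: five consecutive failures open the breaker, and a successful trial call after the timeout resets it.
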
Common Gaps:
Mitigation Examples:
Question: What happens when the worst-case scenario occurs?
| # | Criterion | Risk if Unmet | Typical Test Scenarios (P0-P2) |
|---|---|---|---|
| 4.1 | RTO/RPO: What is the Recovery Time Objective (how long to restore) and Recovery Point Objective (max data loss)? | Extended outages; data loss liability | P2: RTO defined and tested, P2: RPO validated (backup frequency) |
| 4.2 | Failover: Is region/zone failover automated or manual? Has it been practiced? | “Heroics” required during outages; human error | P2: Automated failover works, P2: Manual failover documented and tested |
| 4.3 | Backups: Are backups immutable and tested for restoration integrity? | Ransomware vulnerability; corrupted backups | P2: Backup restore succeeds, P2: Backup immutability validated |
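
For criterion 4.1, one way to turn RTO and RPO into checkable assertions is sketched below. It assumes backup completion timestamps and restore start/finish times are available from whatever backup tooling is in use; the function names are hypothetical.

```python
from datetime import datetime, timedelta


def worst_case_data_loss(backup_times: list[datetime]) -> timedelta:
    """Largest gap between consecutive backups, i.e. the worst-case loss window."""
    ordered = sorted(backup_times)
    gaps = (later - earlier for earlier, later in zip(ordered, ordered[1:]))
    return max(gaps, default=timedelta(0))


def meets_rpo(backup_times: list[datetime], rpo: timedelta) -> bool:
    """True if no gap between backups exceeds the Recovery Point Objective."""
    return worst_case_data_loss(backup_times) <= rpo


def meets_rto(restore_started: datetime, restore_finished: datetime, rto: timedelta) -> bool:
    """True if a measured restore completed within the Recovery Time Objective."""
    return restore_finished - restore_started <= rto
```

A restore drill can feed its measured timings into `meets_rto`, which makes "RTO defined and tested" a pass/fail result rather than a statement of intent.
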
Common Gaps:
Mitigation Examples:
Question: Is the design safe by default?
| # | Criterion | Risk if Unmet | Typical Test Scenarios (P0-P2) |
|---|---|---|---|
| 5.1 | AuthN/AuthZ: Does it implement standard protocols (OAuth2/OIDC)? Are permissions granular (Least Privilege)? | Unauthorized access; data leaks | P0: OAuth flow works, P0: Expired token rejected, P0: Insufficient permissions return 403, P1: Scope enforcement |
| 5.2 | Encryption: Is data encrypted at rest (DB) and in transit (TLS)? | Compliance violations; data theft | P1: Milvus data-at-rest encrypted, P1: TLS 1.2+ enforced, P2: Certificate rotation works |
| 5.3 | Secrets: Are API keys/passwords stored in a Vault (not in code or config files)? | Credentials leaked in git history | P1: No hardcoded secrets in code, P1: Secrets loaded from AWS Secrets Manager |
| 5.4 | Input Validation: Are inputs sanitized against Injection attacks (SQLi, XSS)? | System compromise via malicious payloads | P1: SQL injection sanitized, P1: XSS escaped, P2: Command injection prevented |
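
Criterion 5.4 can be made concrete with a short sketch using only the Python standard library: a parameterized query keeps user input out of the SQL text, and `html.escape` neutralizes markup before rendering. The `users` table and the helper names are assumptions for illustration.

```python
import html
import sqlite3


def make_users_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL)")
    conn.executemany("INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user(conn: sqlite3.Connection, username: str) -> list:
    """The ? placeholder binds input as data, so it is never parsed as SQL."""
    return conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    ).fetchall()


def render_comment(text: str) -> str:
    """Escape user-supplied text before embedding it in HTML."""
    return f"<p>{html.escape(text)}</p>"
```

A P1 injection test can then pass a classic payload such as `' OR '1'='1` and assert that no rows come back.
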
Common Gaps:
Mitigation Examples:
Question: Can we operate and fix this in production?
| # | Criterion | Risk if Unmet | Typical Test Scenarios (P0-P2) |
|---|---|---|---|
| 6.1 | Tracing: Does the service propagate W3C Trace Context / Correlation IDs for distributed tracing? | Impossible to debug errors across microservices | P2: W3C Trace Context propagated (EventBridge → Lambda → Service), P2: Correlation ID in all logs |
| 6.2 | Logs: Can log levels (INFO vs DEBUG) be toggled dynamically without a redeploy? | Inability to diagnose issues in real-time | P2: Log level toggle works without redeploy, P2: Logs structured (JSON format) |
| 6.3 | Metrics: Does it expose RED metrics (Rate, Errors, Duration) for Prometheus/Datadog? | Flying blind regarding system health | P2: /metrics endpoint exposes RED metrics, P2: Prometheus/Datadog scrapes successfully |
| 6.4 | Config: Is configuration externalized? Can we change behavior without a code build? | Rigid system; full deploys needed for minor tweaks | P2: Config change without code build, P2: Feature flags toggle behavior |
Common Gaps:
Mitigation Examples:
Question: How does it perform, and how does it feel?
| # | Criterion | Risk if Unmet | Typical Test Scenarios (P0-P2) |
|---|---|---|---|
| 7.1 | Latency (QoS): What are the P95 and P99 latency targets? | Slow API responses affecting throughput | P3: P95 latency <Xs (load test), P3: P99 latency <Ys (load test) |
| 7.2 | Throttling (QoS): Is there Rate Limiting to prevent “noisy neighbors” or DDoS? | Service degradation for all users due to one bad actor | P2: Rate limiting enforced, P2: 429 returned when limit exceeded |
| 7.3 | Perceived Performance (QoE): Does the UI show optimistic updates or skeletons while loading? | App feels sluggish to the user | P2: Skeleton/spinner shown while loading (E2E), P2: Optimistic updates (E2E) |
| 7.4 | Degradation (QoE): If the service is slow, does it show a friendly message or a raw stack trace? | Poor user trust; frustration | P2: Friendly error message shown (not stack trace), P1: Error boundary catches exceptions (E2E) |
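
The throttling scenarios in 7.2 can be sketched with a token bucket, one common rate-limiting technique (the source does not prescribe a specific algorithm). Capacity, refill rate, and the `handle_request` helper are illustrative assumptions; the clock is injectable so tests stay deterministic.

```python
import time


class TokenBucket:
    def __init__(self, capacity: int, refill_per_second: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = float(capacity)  # start full
        self._last = clock()

    def allow(self) -> bool:
        """Refill based on elapsed time, then spend one token if available."""
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_second)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


def handle_request(bucket: TokenBucket) -> int:
    """Return the HTTP status a gateway would send: 200, or 429 when throttled."""
    return 200 if bucket.allow() else 429
```

In practice one bucket would be kept per client or tenant, so a single noisy caller exhausts only its own allowance.
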
Common Gaps:
Mitigation Examples:
Question: How easily can we ship this?
| # | Criterion | Risk if Unmet | Typical Test Scenarios (P0-P2) |
|---|---|---|---|
| 8.1 | Zero Downtime: Does the design support Blue/Green or Canary deployments? | Maintenance windows required (downtime) | P2: Blue/Green deployment works, P2: Canary deployment gradual rollout |
| 8.2 | Backward Compatibility: Can we deploy the DB changes separately from the Code changes? | “Lock-step” deployments; high risk of breaking changes | P2: DB migration before code deploy, P2: Code handles old and new schema |
| 8.3 | Rollback: Is there an automated rollback trigger if Health Checks fail post-deploy? | Prolonged outages after a bad deploy | P2: Health check fails → automated rollback, P2: Rollback completes within RTO |
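
Criterion 8.3 can be sketched as a deploy step that probes health after release and rolls back on the first failed check. The `deploy`, `health_check`, and `rollback` callables are placeholders for whatever the pipeline actually provides, and the number of checks is an assumed default.

```python
from typing import Callable


def deploy_with_rollback(
    deploy: Callable[[], None],
    health_check: Callable[[], bool],
    rollback: Callable[[], None],
    checks: int = 3,
) -> str:
    """Deploy, probe health `checks` times, and roll back on the first failure."""
    deploy()
    for _ in range(checks):
        if not health_check():
            rollback()
            return "rolled_back"
    return "deployed"
```

Timing this function in a test environment also gives a measurement for the "rollback completes within RTO" scenario.
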
Common Gaps:
Mitigation Examples:
System-Level Mode (Phase 3):
In test-design-architecture.md:
## NFR Testability Requirements
**Based on ADR Quality Readiness Checklist**
### 1. Testability & Automation
Can we verify this effectively without manual toil?
| Criterion | Status | Gap/Requirement | Risk if Unmet |
| ---------------------------------------------------------------- | --------------- | ------------------------------------ | --------------------------------------- |
| ⬜ Isolation: Can service be tested with downstream deps mocked? | ⚠️ Gap | No mock endpoints for Athena queries | Flaky tests; can't test in isolation |
| ⬜ Headless: 100% business logic accessible via API? | ✅ Covered | All MCP tools are REST APIs | N/A |
| ⬜ State Control: Seeding APIs to inject data states? | ⚠️ Gap | Need `/api/test-data` endpoints | Long setup times; can't test edge cases |
| ⬜ Sample Requests: Valid/invalid cURL/JSON samples provided? | ⬜ Not Assessed | Pending finalization of ADR Tool schemas | Ambiguity on how to consume service |
**Actions Required:**
- [ ] Backend: Implement mock endpoints for Athena (R-002 blocker)
- [ ] Backend: Implement `/api/test-data` seeding APIs (R-002 blocker)
- [ ] PM: Finalize ADR Tool schemas with sample requests (Q4)
In test-design-qa.md:
## NFR Test Coverage Plan
**Based on ADR Quality Readiness Checklist**
### 1. Testability & Automation (4 criteria)
**Prerequisites from Architecture doc:**
- [ ] R-002: Test data seeding APIs implemented (blocker)
- [ ] Mock endpoints available for Athena queries
| Criterion | Test Scenarios | Priority | Test Count | Owner |
| ------------------------------- | -------------------------------------------------------------------- | -------- | ---------- | ---------------- |
| Isolation: Mock downstream deps | Mock Athena queries, Mock Milvus, Service runs isolated | P1 | 3 | Backend Dev + QA |
| Headless: API-accessible logic | All MCP tools callable via REST, No UI dependency for business logic | P0 | 5 | QA |
| State Control: Seeding APIs | Create test customer, Seed 1000 transactions, Inject edge cases | P0 | 4 | QA |
| Sample Requests: cURL examples | Valid request succeeds, Invalid request fails with clear error | P1 | 2 | QA |
**Detailed Test Scenarios:**
- [ ] Isolation: Service runs with Athena mocked (returns fixture data)
- [ ] Isolation: Service runs with Milvus mocked (returns ANN fixture)
- [ ] State Control: Seed test customer with 1000 baseline transactions
- [ ] State Control: Inject edge case (expired subscription user)
Output Structure:
# NFR Assessment: {Feature Name}
**Based on ADR Quality Readiness Checklist (8 categories, 29 criteria)**
## Assessment Summary
| Category | Status | Criteria Met | Evidence | Next Action |
| ----------------------------- | ----------- | ------------ | -------------------------------------- | -------------------- |
| 1. Testability & Automation | ⚠️ CONCERNS | 2/4 | Mock endpoints missing | Implement R-002 |
| 2. Test Data Strategy | ✅ PASS | 3/3 | Faker + auto-cleanup | None |
| 3. Scalability & Availability | ⚠️ CONCERNS | 1/4 | SLA undefined | Define SLA |
| 4. Disaster Recovery | ⚠️ CONCERNS | 0/3 | No RTO/RPO defined | Define recovery plan |
| 5. Security | ✅ PASS | 4/4 | OAuth 2.1 + TLS + Vault + Sanitization | None |
| 6. Monitorability | ⚠️ CONCERNS | 2/4 | No metrics endpoint | Add /metrics |
| 7. QoS & QoE | ⚠️ CONCERNS | 1/4 | Latency targets undefined | Define SLOs |
| 8. Deployability | ✅ PASS | 3/3 | Blue/Green + DB migrations + Rollback | None |
**Overall:** 16/29 criteria met (55%) → ⚠️ CONCERNS
**Gate Decision:** CONCERNS (requires mitigation plan before GA)
---
## Detailed Assessment
### 1. Testability & Automation (2/4 criteria met)
**Question:** Can we verify this effectively without manual toil?
| Criterion | Status | Evidence | Gap/Action |
| ---------------------------- | ------ | ------------------------ | -------------------------- |
| ⬜ Isolation: Mock deps | ⚠️ | No Athena mock | Implement mock endpoints |
| ⬜ Headless: API-accessible | ✅ | All MCP tools are REST | N/A |
| ⬜ State Control: Seeding | ⚠️ | `/api/test-data` pending | Pre-implementation blocker |
| ⬜ Sample Requests: Examples | ⬜ | Pending schemas | Finalize ADR Tools |
**Overall Status:** ⚠️ CONCERNS (2/4 criteria met)
**Next Actions:**
- [ ] Backend: Implement Athena mock endpoints (pre-implementation)
- [ ] Backend: Implement `/api/test-data` (pre-implementation)
- [ ] PM: Finalize sample requests (implementation phase)
{Repeat for all 8 categories}
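
The overall tally in the assessment summary can be recomputed mechanically from the per-category counts, which avoids arithmetic slips when the table is edited by hand. The sketch below uses the "Criteria Met" values from the example summary table; the dictionary and function names are illustrative.

```python
# (criteria met, criteria total) per category, taken from the example summary table
CATEGORY_RESULTS = {
    "Testability & Automation": (2, 4),
    "Test Data Strategy": (3, 3),
    "Scalability & Availability": (1, 4),
    "Disaster Recovery": (0, 3),
    "Security": (4, 4),
    "Monitorability": (2, 4),
    "QoS & QoE": (1, 4),
    "Deployability": (3, 3),
}


def overall(results: dict) -> tuple:
    """Return (met, total, percent rounded to the nearest integer)."""
    met = sum(m for m, _ in results.values())
    total = sum(t for _, t in results.values())
    return met, total, round(100 * met / total)


met, total, percent = overall(CATEGORY_RESULTS)
print(f"Overall: {met}/{total} criteria met ({percent}%)")
```

With the example values this yields 16 of 29 criteria met, or 55%.
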
For test-design workflow:
For nfr-assess workflow:
For Architecture teams:
For QA teams: