---
name: 'step-04-evaluate-and-score'
description: 'Orchestrate adaptive NFR evidence domain audits (agent-team, subagent, or sequential)'
nextStepFile: '{skill-root}/steps-c/step-04e-aggregate-nfr.md'
---

# Step 4: Orchestrate Adaptive NFR Evidence Audit

## STEP GOAL

Select execution mode deterministically, then audit NFR evidence domains using agent-team, subagent, or sequential execution while preserving output contracts.

## MANDATORY EXECUTION RULES

- 📖 Read the entire step file before acting
- ✅ Speak in `{communication_language}`
- ✅ Resolve execution mode from config (`tea_execution_mode`, `tea_capability_probe`)
- ✅ Apply fallback rules deterministically when requested mode is unsupported
- ✅ Wait for required worker steps to complete
- ❌ Do NOT skip capability checks when probing is enabled

---

## EXECUTION PROTOCOLS:

- 🎯 Follow the MANDATORY SEQUENCE exactly
- 💾 Wait for subagent outputs
- 📖 Load the next step only when instructed

---

## MANDATORY SEQUENCE

### 1. Prepare Execution Context

**Generate unique timestamp:**

```javascript
const timestamp = new Date().toISOString().replace(/[:.]/g, '-');
```

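The replacement above turns the colons and the dot of an ISO 8601 string into dashes so that the value can be embedded in a file name. A minimal sketch of the resulting shape, using a fixed date so the output is reproducible; `toFileSafeTimestamp` and `sample` are hypothetical names introduced only for this illustration:

```javascript
// Hypothetical wrapper around the expression above so it can be run against a fixed date.
const toFileSafeTimestamp = (date) => date.toISOString().replace(/[:.]/g, '-');

// 2024-01-02T03:04:05.678Z in UTC (month index 0 is January).
const sample = toFileSafeTimestamp(new Date(Date.UTC(2024, 0, 2, 3, 4, 5, 678)));

console.log(sample); // 2024-01-02T03-04-05-678Z
console.log(`/tmp/tea-nfr-security-${sample}.json`);
```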
**Prepare context:**

```javascript
const parseBooleanFlag = (value, defaultValue = true) => {
  if (typeof value === 'string') {
    const normalized = value.trim().toLowerCase();
    if (['false', '0', 'off', 'no'].includes(normalized)) return false;
    if (['true', '1', 'on', 'yes'].includes(normalized)) return true;
  }
  if (value === undefined || value === null) return defaultValue;
  return Boolean(value);
};

const subagentContext = {
  system_context: /* from Step 1 */,
  nfr_thresholds: /* from Step 2 */,
  evidence_gathered: /* from Step 3 */,
  config: {
    execution_mode: config.tea_execution_mode || 'auto', // "auto" | "subagent" | "agent-team" | "sequential"
    capability_probe: parseBooleanFlag(config.tea_capability_probe, true), // supports booleans and "false"/"true" strings
  },
  timestamp: timestamp
};
```

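To make the flag parsing concrete, the sketch below repeats `parseBooleanFlag` as defined above and runs it on a few representative inputs. The `cases` table is illustrative only. One behavior worth noticing: a non-empty string that is not on either list (for example `'maybe'`) falls through to `Boolean(value)` and is treated as `true`, while an empty string is treated as `false`.

```javascript
// Same definition as in the step above, repeated so the sketch runs on its own.
const parseBooleanFlag = (value, defaultValue = true) => {
  if (typeof value === 'string') {
    const normalized = value.trim().toLowerCase();
    if (['false', '0', 'off', 'no'].includes(normalized)) return false;
    if (['true', '1', 'on', 'yes'].includes(normalized)) return true;
  }
  if (value === undefined || value === null) return defaultValue;
  return Boolean(value);
};

// Illustrative inputs: [value, defaultValue, expected result].
const cases = [
  ['false', true, false],  // recognized "off" string
  [' OFF ', true, false],  // trimmed and lower-cased before matching
  ['yes', false, true],    // recognized "on" string
  [undefined, true, true], // missing value falls back to the default
  [null, false, false],    // null also falls back to the default
  [0, true, false],        // non-string values go through Boolean()
  ['maybe', false, true],  // unrecognized non-empty string is truthy
  ['', true, false],       // empty string is falsy
];

for (const [value, defaultValue, expected] of cases) {
  const actual = parseBooleanFlag(value, defaultValue);
  console.log(JSON.stringify(value), '=>', actual, actual === expected ? 'ok' : 'MISMATCH');
}
```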
---

### 2. Resolve Execution Mode with Capability Probe

```javascript
const normalizeUserExecutionMode = (mode) => {
  if (typeof mode !== 'string') return null;
  const normalized = mode.trim().toLowerCase().replace(/[-_]/g, ' ').replace(/\s+/g, ' ');
  if (normalized === 'auto') return 'auto';
  if (normalized === 'sequential') return 'sequential';
  if (normalized === 'subagent' || normalized === 'sub agent' || normalized === 'subagents' || normalized === 'sub agents') {
    return 'subagent';
  }
  if (normalized === 'agent team' || normalized === 'agent teams' || normalized === 'agentteam') {
    return 'agent-team';
  }
  return null;
};

const normalizeConfigExecutionMode = (mode) => {
  if (mode === 'auto' || mode === 'sequential' || mode === 'subagent' || mode === 'agent-team') {
    return mode;
  }
  return null;
};

// Explicit user instruction in the active run takes priority over config.
const explicitModeFromUser = normalizeUserExecutionMode(runtime.getExplicitExecutionModeHint?.() || null);
const requestedMode = explicitModeFromUser || normalizeConfigExecutionMode(subagentContext.config.execution_mode) || 'auto';
const probeEnabled = subagentContext.config.capability_probe;

// Capabilities stay false unless probing is enabled and the runtime confirms them.
const supports = {
  subagent: false,
  agentTeam: false,
};

if (probeEnabled) {
  supports.subagent = runtime.canLaunchSubagents?.() === true;
  supports.agentTeam = runtime.canLaunchAgentTeams?.() === true;
}

let resolvedMode = requestedMode;
if (requestedMode === 'auto') {
  if (supports.agentTeam) resolvedMode = 'agent-team';
  else if (supports.subagent) resolvedMode = 'subagent';
  else resolvedMode = 'sequential';
} else if (probeEnabled && requestedMode === 'agent-team' && !supports.agentTeam) {
  resolvedMode = supports.subagent ? 'subagent' : 'sequential';
} else if (probeEnabled && requestedMode === 'subagent' && !supports.subagent) {
  resolvedMode = 'sequential';
}

subagentContext.execution = {
  requestedMode,
  resolvedMode,
  probeEnabled,
  supports,
};
```

Resolution precedence:

1. Explicit user request in this run (`agent team` => `agent-team`; `subagent` => `subagent`; `sequential`; `auto`)
2. `tea_execution_mode` from config
3. Runtime capability fallback (when probing is enabled)

If probing is disabled, honor the requested mode strictly. If that mode cannot be executed at runtime, fail with an explicit error instead of falling back silently.

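The fallback rules above can be restated as a pure function, which makes the decision table easy to check case by case. This is a sketch under assumptions, not the step's prescribed implementation: `resolveMode` mirrors the branching shown in the code block of this section, while `assertModeIsExecutable` and its `canExecute` callback are hypothetical names illustrating one way to fail explicitly when probing is disabled.

```javascript
// Mirrors the resolution branches of this step as a side-effect-free function.
const resolveMode = (requestedMode, probeEnabled, supports) => {
  if (requestedMode === 'auto') {
    if (supports.agentTeam) return 'agent-team';
    if (supports.subagent) return 'subagent';
    return 'sequential';
  }
  if (probeEnabled && requestedMode === 'agent-team' && !supports.agentTeam) {
    return supports.subagent ? 'subagent' : 'sequential';
  }
  if (probeEnabled && requestedMode === 'subagent' && !supports.subagent) {
    return 'sequential';
  }
  return requestedMode; // honored as requested
};

// Hypothetical guard: throws instead of silently falling back to another mode.
const assertModeIsExecutable = (resolvedMode, canExecute) => {
  if (!canExecute(resolvedMode)) {
    throw new Error(`Execution mode "${resolvedMode}" cannot be executed by this runtime`);
  }
  return resolvedMode;
};

const none = { subagent: false, agentTeam: false };
const subagentOnly = { subagent: true, agentTeam: false };

console.log(resolveMode('auto', true, subagentOnly));       // subagent
console.log(resolveMode('agent-team', true, subagentOnly)); // subagent
console.log(resolveMode('agent-team', true, none));         // sequential
console.log(resolveMode('agent-team', false, none));        // agent-team (no probe, honored strictly)
```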
---

### 3. Dispatch 4 NFR Workers

**Subagent A: Security Evidence Audit**

- File: `./step-04a-subagent-security.md`
- Output: `/tmp/tea-nfr-security-${timestamp}.json`
- Execution:
  - `agent-team` or `subagent`: launch non-blocking
  - `sequential`: run blocking and wait
- Status: Running... ⟳

**Subagent B: Performance Evidence Audit**

- File: `./step-04b-subagent-performance.md`
- Output: `/tmp/tea-nfr-performance-${timestamp}.json`
- Status: Running... ⟳

**Subagent C: Reliability Evidence Audit**

- File: `./step-04c-subagent-reliability.md`
- Output: `/tmp/tea-nfr-reliability-${timestamp}.json`
- Status: Running... ⟳

**Subagent D: Scalability Evidence Audit**

- File: `./step-04d-subagent-scalability.md`
- Output: `/tmp/tea-nfr-scalability-${timestamp}.json`
- Status: Running... ⟳

In `agent-team` and `subagent` modes, the runtime decides worker scheduling and concurrency.

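The difference between blocking and non-blocking dispatch can be sketched as follows. `launchWorker` is a hypothetical stand-in for whatever the runtime uses to start a worker step, and `dispatchWorkers` is likewise an illustrative name. The use of `Promise.all` is an assumption made for the sketch: the actual scheduling in `agent-team` and `subagent` modes is left to the runtime.

```javascript
const domains = ['security', 'performance', 'reliability', 'scalability'];

// Hypothetical launcher: a real runtime would start the worker step here.
const launchWorker = async (domain, timestamp) => ({
  domain,
  output: `/tmp/tea-nfr-${domain}-${timestamp}.json`,
});

const dispatchWorkers = async (resolvedMode, timestamp, launch = launchWorker) => {
  if (resolvedMode === 'sequential') {
    // Blocking: each worker finishes before the next one starts.
    const results = [];
    for (const domain of domains) {
      results.push(await launch(domain, timestamp));
    }
    return results;
  }
  // agent-team / subagent: start every worker first, then wait for all of them.
  return Promise.all(domains.map((domain) => launch(domain, timestamp)));
};

dispatchWorkers('sequential', 'example-run').then((results) => {
  results.forEach(({ domain, output }) => console.log(domain, '->', output));
});
```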
---

### 4. Wait for Expected Worker Completion

**If `resolvedMode` is `agent-team` or `subagent`:**

```
⏳ Waiting for 4 NFR subagents to complete...
├── Subagent A (Security): Running... ⟳
├── Subagent B (Performance): Running... ⟳
├── Subagent C (Reliability): Running... ⟳
└── Subagent D (Scalability): Running... ⟳
[... time passes ...]
✅ All 4 NFR subagents completed!
```

**If `resolvedMode` is `sequential`:**

```
✅ Sequential mode: each worker already completed during dispatch.
```

---

### 5. Verify All Outputs Exist

```javascript
const outputs = ['security', 'performance', 'reliability', 'scalability'].map((domain) => `/tmp/tea-nfr-${domain}-${timestamp}.json`);

outputs.forEach((output) => {
  if (!fs.existsSync(output)) {
    throw new Error(`Subagent output missing: ${output}`);
  }
});
```

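The check above stops at the first missing file. A variant that reports every problem at once, and also confirms that each file parses as JSON, might look like the sketch below. `checkOutputs` is a hypothetical helper, and the JSON validation is an assumption based only on the `.json` extension of the output paths; the step itself requires existence only.

```javascript
const fs = require('node:fs');

// Hypothetical helper: collects problems instead of throwing on the first one.
const checkOutputs = (paths) => {
  const problems = [];
  for (const file of paths) {
    if (!fs.existsSync(file)) {
      problems.push(`missing: ${file}`);
      continue;
    }
    try {
      JSON.parse(fs.readFileSync(file, 'utf8'));
    } catch (error) {
      problems.push(`invalid JSON: ${file}`);
    }
  }
  return problems;
};

// Illustrative call with a path that is not expected to exist.
const problems = checkOutputs(['/tmp/tea-nfr-security-example-run-that-does-not-exist.json']);
if (problems.length > 0) {
  console.log(`Subagent output check failed:\n${problems.join('\n')}`);
}
```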
---

### 6. Execution Report

```
🚀 Performance Report:
- Execution Mode: {resolvedMode}
- Total Elapsed: ~mode-dependent
- Parallel Gain: ~67% faster when mode is subagent/agent-team
```

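For orientation, the gain from running workers concurrently can be estimated from per-worker durations: sequential time is roughly the sum, and ideal parallel time is roughly the longest single worker. The sketch below uses made-up durations, not measurements, and `parallelGain` is a hypothetical helper; it ignores launch overhead and any concurrency limits the runtime may impose, and it does not claim to be how the figure in the report template was derived.

```javascript
// Fraction of wall-clock time saved under ideal parallelism: 1 - max / sum.
const parallelGain = (durationsMs) => {
  const sequentialTotal = durationsMs.reduce((sum, ms) => sum + ms, 0);
  const parallelTotal = Math.max(...durationsMs);
  return 1 - parallelTotal / sequentialTotal;
};

const formatPercent = (fraction) => `${Math.round(fraction * 100)}%`;

// Four equally long workers: 1 - 60/240.
console.log(formatPercent(parallelGain([60, 60, 60, 60]))); // 75%

// One slower worker dominates: 1 - 90/270.
console.log(formatPercent(parallelGain([90, 60, 60, 60]))); // 67%
```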
---

### 7. Proceed to Aggregation

Load next step: `{nextStepFile}`

The aggregation step will:

- Read all 4 NFR domain outputs
- Calculate overall risk level
- Aggregate compliance status
- Identify cross-domain risks
- Generate executive summary

---

## EXIT CONDITION

Proceed when all 4 required worker steps have completed and their outputs exist.

---

## 🚨 SYSTEM SUCCESS METRICS

### ✅ SUCCESS:

- All required worker steps completed
- Fallback behavior respected configuration and capability probe rules

### ❌ FAILURE:

- One or more subagents failed
- Unsupported requested mode with probing disabled