step-03f-aggregate-scores.md 6.9KB

---
name: 'step-03f-aggregate-scores'
description: 'Aggregate quality dimension scores into overall 0-100 score'
nextStepFile: '{skill-root}/steps-c/step-04-generate-report.md'
outputFile: '{test_artifacts}/test-review.md'
---

# Step 3F: Aggregate Quality Scores

## STEP GOAL

Read the outputs from the 4 quality subagents, calculate a weighted overall score (0-100), and aggregate violations for report generation.

---

## MANDATORY EXECUTION RULES

- 📖 Read the entire step file before acting
- ✅ Speak in `{communication_language}`
- ✅ Read all 4 subagent outputs
- ✅ Calculate weighted overall score
- ✅ Aggregate violations by severity
- ❌ Do NOT re-evaluate quality (use subagent outputs)

---

## EXECUTION PROTOCOLS:

- 🎯 Follow the MANDATORY SEQUENCE exactly
- 💾 Record outputs before proceeding
- 📖 Load the next step only when instructed

---

## MANDATORY SEQUENCE

### 1. Read All Subagent Outputs

```javascript
const fs = require('fs');

// Use the SAME timestamp generated in Step 3 (do not regenerate).
// `subagentContext` is expected to be passed in from Step 3.
const timestamp = subagentContext?.timestamp;
if (!timestamp) {
  throw new Error('Missing timestamp from Step 3 context. Pass Step 3 timestamp into Step 3F.');
}

const dimensions = ['determinism', 'isolation', 'maintainability', 'performance'];
const results = {};

// Each subagent writes its result to a timestamped JSON file under /tmp.
dimensions.forEach((dim) => {
  const outputPath = `/tmp/tea-test-review-${dim}-${timestamp}.json`;
  results[dim] = JSON.parse(fs.readFileSync(outputPath, 'utf8'));
});
```
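
If one subagent file is missing or contains invalid JSON, the loop above stops at the first failure. A minimal sketch of a more forgiving loader follows. It assumes the same file-naming pattern; the helper name `readSubagentOutputs`, the `dir` parameter, and the demo data are hypothetical and not part of the workflow. It collects every failure before throwing, so one error message names all the broken dimensions.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: reads one JSON file per dimension and reports
// every missing or unparsable file in a single error.
function readSubagentOutputs(dir, dimensions, timestamp) {
  const results = {};
  const failures = [];
  for (const dim of dimensions) {
    const outputPath = path.join(dir, `tea-test-review-${dim}-${timestamp}.json`);
    try {
      results[dim] = JSON.parse(fs.readFileSync(outputPath, 'utf8'));
    } catch (err) {
      failures.push(`${dim}: ${err.message}`);
    }
  }
  if (failures.length > 0) {
    throw new Error(`Could not load subagent outputs:\n${failures.join('\n')}`);
  }
  return results;
}

// Usage with throwaway files in a temp directory (sample data only).
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'tea-demo-'));
const demoDims = ['determinism', 'isolation'];
for (const dim of demoDims) {
  fs.writeFileSync(
    path.join(demoDir, `tea-test-review-${dim}-demo.json`),
    JSON.stringify({ score: 80, violations: [], recommendations: [] }),
  );
}
const loaded = readSubagentOutputs(demoDir, demoDims, 'demo');
console.log(Object.keys(loaded)); // → [ 'determinism', 'isolation' ]
```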

**Verify all succeeded:**

```javascript
const allSucceeded = dimensions.every((dim) => results[dim].score !== undefined);
if (!allSucceeded) {
  throw new Error('One or more quality subagents failed!');
}
```
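
The check above only confirms that a `score` field exists. A stricter variant could also reject non-numeric or out-of-range values and report which dimensions failed. The sketch below assumes dimension scores are numbers between 0 and 100, as the `/100` display format suggests; `findInvalidDimensions` and the sample data are hypothetical.

```javascript
// Hypothetical helper: returns the dimensions whose output lacks a usable score.
function findInvalidDimensions(results, dimensions) {
  return dimensions.filter((dim) => {
    const score = results[dim]?.score;
    return typeof score !== 'number' || !Number.isFinite(score) || score < 0 || score > 100;
  });
}

// Sample data, invented for illustration.
const sampleResults = {
  determinism: { score: 85 },
  isolation: { score: 'n/a' }, // wrong type
  maintainability: {}, // score missing
  performance: { score: 120 }, // out of the assumed 0-100 range
};
const invalid = findInvalidDimensions(sampleResults, Object.keys(sampleResults));
console.log(invalid); // → [ 'isolation', 'maintainability', 'performance' ]
```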

---

### 2. Calculate Weighted Overall Score

**Dimension Weights** (based on TEA quality priorities):

```javascript
const weights = {
  determinism: 0.3, // 30% - Reliability and flake prevention
  isolation: 0.3, // 30% - Parallel safety and independence
  maintainability: 0.25, // 25% - Readability and long-term health
  performance: 0.15, // 15% - Speed and execution efficiency
};
```
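
Because the overall score is a weighted average, the weights need to sum to 1 for the result to stay on the 0-100 scale. A small guard, sketched here as an optional addition rather than part of the workflow, can catch a mistyped weight early. The tolerance value is an assumption.

```javascript
const weights = { determinism: 0.3, isolation: 0.3, maintainability: 0.25, performance: 0.15 };

// Compare with a tolerance because decimal fractions are not exact in floating point.
const weightTotal = Object.values(weights).reduce((sum, w) => sum + w, 0);
if (Math.abs(weightTotal - 1) > 1e-9) {
  throw new Error(`Dimension weights must sum to 1, got ${weightTotal}`);
}
console.log('Weights are consistent');
```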

**Calculate overall score:**

```javascript
// Weighted sum of the four dimension scores, rounded to the nearest integer.
const overallScore = dimensions.reduce((sum, dim) => {
  return sum + results[dim].score * weights[dim];
}, 0);

const roundedScore = Math.round(overallScore);
```
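
As a worked example of the weighted sum, the sketch below uses sample dimension scores that are invented purely for illustration.

```javascript
// Sample dimension scores, invented for illustration only.
const sampleScores = { determinism: 92, isolation: 85, maintainability: 70, performance: 60 };
const weights = { determinism: 0.3, isolation: 0.3, maintainability: 0.25, performance: 0.15 };

// 92*0.30 + 85*0.30 + 70*0.25 + 60*0.15 = 27.6 + 25.5 + 17.5 + 9.0 = 79.6
const overallScore = Object.keys(weights).reduce(
  (sum, dim) => sum + sampleScores[dim] * weights[dim],
  0,
);
const roundedScore = Math.round(overallScore);
console.log(roundedScore); // → 80
```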

**Determine grade:**

```javascript
const getGrade = (score) => {
  if (score >= 90) return 'A';
  if (score >= 80) return 'B';
  if (score >= 70) return 'C';
  if (score >= 60) return 'D';
  return 'F';
};

const overallGrade = getGrade(roundedScore);
```
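
Since the grade is derived from the rounded score, a raw score just below a threshold can still round up into the higher grade. The short sketch below illustrates this; `gradeFor` is a hypothetical wrapper added only for the demonstration.

```javascript
const getGrade = (score) => {
  if (score >= 90) return 'A';
  if (score >= 80) return 'B';
  if (score >= 70) return 'C';
  if (score >= 60) return 'D';
  return 'F';
};

// Hypothetical wrapper: round first, then grade, mirroring the order used in this step.
const gradeFor = (rawScore) => getGrade(Math.round(rawScore));

console.log(gradeFor(89.4)); // → B (rounds to 89)
console.log(gradeFor(89.5)); // → A (rounds to 90)
console.log(gradeFor(59.9)); // → D (rounds to 60)
```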

---

### 3. Aggregate Violations by Severity

**Collect all violations from all dimensions:**

```javascript
// Tag each violation with the dimension that reported it.
const allViolations = dimensions.flatMap((dim) =>
  results[dim].violations.map((v) => ({
    ...v,
    dimension: dim,
  })),
);

// Group by severity
const highSeverity = allViolations.filter((v) => v.severity === 'HIGH');
const mediumSeverity = allViolations.filter((v) => v.severity === 'MEDIUM');
const lowSeverity = allViolations.filter((v) => v.severity === 'LOW');

const violationSummary = {
  total: allViolations.length,
  HIGH: highSeverity.length,
  MEDIUM: mediumSeverity.length,
  LOW: lowSeverity.length,
};
```
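
The strict `===` filters above silently skip any violation whose severity is spelled differently (for example lowercase), so `total` can exceed the sum of the three buckets. One possible alternative, sketched under the assumption that case-insensitive matching is acceptable, counts in a single pass and tracks unrecognized values. `countBySeverity`, the `OTHER` bucket, and the sample data are hypothetical.

```javascript
// Sample violations; fields other than `severity` are illustrative.
const sampleViolations = [
  { severity: 'HIGH', dimension: 'determinism' },
  { severity: 'LOW', dimension: 'performance' },
  { severity: 'high', dimension: 'isolation' }, // lowercase: missed by a strict === 'HIGH' filter
  { severity: 'MEDIUM', dimension: 'maintainability' },
];

// Hypothetical helper: one pass, case-insensitive, unknown values counted under OTHER.
function countBySeverity(violations) {
  const counts = { total: violations.length, HIGH: 0, MEDIUM: 0, LOW: 0, OTHER: 0 };
  for (const v of violations) {
    const key = String(v.severity ?? '').toUpperCase();
    if (key === 'HIGH' || key === 'MEDIUM' || key === 'LOW') {
      counts[key] += 1;
    } else {
      counts.OTHER += 1;
    }
  }
  return counts;
}

const severityCounts = countBySeverity(sampleViolations);
console.log(severityCounts);
// → { total: 4, HIGH: 2, MEDIUM: 1, LOW: 1, OTHER: 0 }
```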

---

### 4. Prioritize Recommendations

**Extract recommendations from all dimensions:**

```javascript
const allRecommendations = dimensions.flatMap((dim) =>
  results[dim].recommendations.map((rec) => ({
    dimension: dim,
    recommendation: rec,
    // Recommendations from a dimension scoring below 70 count as high impact.
    impact: results[dim].score < 70 ? 'HIGH' : 'MEDIUM',
  })),
);

// Sort by impact (HIGH first). The comparator returns 0 for equal impact,
// so ties keep their original order, and the copy leaves allRecommendations intact.
const impactRank = { HIGH: 0, MEDIUM: 1 };
const prioritizedRecommendations = [...allRecommendations]
  .sort((a, b) => impactRank[a.impact] - impactRank[b.impact])
  .slice(0, 10); // Top 10 recommendations
```
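
To illustrate HIGH-first ordering, the sketch below sorts a few invented recommendations with a rank-based comparator. Because the comparator returns 0 for entries of equal impact and `Array.prototype.sort` is stable, entries within the same impact level keep their original relative order. The recommendation texts are sample data.

```javascript
// Sample recommendations, invented for illustration.
const sampleRecommendations = [
  { dimension: 'determinism', recommendation: 'Replace hard waits', impact: 'MEDIUM' },
  { dimension: 'isolation', recommendation: 'Reset shared state', impact: 'HIGH' },
  { dimension: 'maintainability', recommendation: 'Split long test', impact: 'MEDIUM' },
  { dimension: 'performance', recommendation: 'Run in parallel', impact: 'HIGH' },
];

// Lower rank sorts first; equal ranks compare as 0 and keep input order.
const impactRank = { HIGH: 0, MEDIUM: 1 };
const ordered = [...sampleRecommendations].sort(
  (a, b) => impactRank[a.impact] - impactRank[b.impact],
);

console.log(ordered.map((r) => r.dimension));
// → [ 'isolation', 'performance', 'determinism', 'maintainability' ]
```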

---

### 5. Create Review Summary Object

**Aggregate all results:**

```javascript
const reviewSummary = {
  overall_score: roundedScore,
  overall_grade: overallGrade,
  // NOTE: getQualityAssessment is not defined in this step file; it must be provided elsewhere.
  quality_assessment: getQualityAssessment(roundedScore),
  dimension_scores: {
    determinism: results.determinism.score,
    isolation: results.isolation.score,
    maintainability: results.maintainability.score,
    performance: results.performance.score,
  },
  dimension_grades: {
    determinism: results.determinism.grade,
    isolation: results.isolation.grade,
    maintainability: results.maintainability.grade,
    performance: results.performance.grade,
  },
  violations_summary: violationSummary,
  all_violations: allViolations,
  high_severity_violations: highSeverity,
  top_10_recommendations: prioritizedRecommendations,
  subagent_execution: 'PARALLEL (4 quality dimensions)',
  performance_gain: '~60% faster than sequential',
};

// Save for Step 4 (report generation)
fs.writeFileSync(`/tmp/tea-test-review-summary-${timestamp}.json`, JSON.stringify(reviewSummary, null, 2), 'utf8');
```
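
The summary object calls `getQualityAssessment`, which this step does not define. Purely as a placeholder, and assuming it maps the rounded score to a short label using the same thresholds as the grade function, it could look like the sketch below. The label wording is hypothetical and is not taken from TEA.

```javascript
// Hypothetical mapping: labels and thresholds are placeholders, not TEA's wording.
const getQualityAssessment = (score) => {
  if (score >= 90) return 'Excellent';
  if (score >= 80) return 'Good';
  if (score >= 70) return 'Acceptable';
  if (score >= 60) return 'Needs improvement';
  return 'Critical issues';
};

console.log(getQualityAssessment(80)); // → Good
console.log(getQualityAssessment(59)); // → Critical issues
```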

---

### 6. Display Summary to User

```
✅ Quality Evaluation Complete (Parallel Execution)

📊 Overall Quality Score: {roundedScore}/100 (Grade: {overallGrade})

📈 Dimension Scores:
- Determinism: {determinism_score}/100 ({determinism_grade})
- Isolation: {isolation_score}/100 ({isolation_grade})
- Maintainability: {maintainability_score}/100 ({maintainability_grade})
- Performance: {performance_score}/100 ({performance_grade})

ℹ️ Coverage is excluded from `test-review` scoring. Use `trace` for coverage analysis and gates.

⚠️ Violations Found:
- HIGH: {high_count} violations
- MEDIUM: {medium_count} violations
- LOW: {low_count} violations
- TOTAL: {total_count} violations

🚀 Performance: Parallel execution ~60% faster than sequential

✅ Ready for report generation (Step 4)
```
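
One way to fill the placeholders in the template above is to render the lines directly from the summary object. The sketch below assumes the field names used for `reviewSummary` in this step; `renderSummary` and the sample values are hypothetical, and the emoji are omitted for brevity.

```javascript
// Hypothetical helper: builds the display text from a reviewSummary-shaped object.
function renderSummary(summary) {
  const scores = summary.dimension_scores;
  const grades = summary.dimension_grades;
  const counts = summary.violations_summary;
  const label = (dim) => dim.charAt(0).toUpperCase() + dim.slice(1);
  const dimensionLines = Object.keys(scores).map(
    (dim) => `- ${label(dim)}: ${scores[dim]}/100 (${grades[dim]})`,
  );
  return [
    `Overall Quality Score: ${summary.overall_score}/100 (Grade: ${summary.overall_grade})`,
    'Dimension Scores:',
    ...dimensionLines,
    'Violations Found:',
    `- HIGH: ${counts.HIGH} violations`,
    `- MEDIUM: ${counts.MEDIUM} violations`,
    `- LOW: ${counts.LOW} violations`,
    `- TOTAL: ${counts.total} violations`,
  ].join('\n');
}

// Sample summary, invented for illustration.
const sampleSummary = {
  overall_score: 80,
  overall_grade: 'B',
  dimension_scores: { determinism: 92, isolation: 85, maintainability: 70, performance: 60 },
  dimension_grades: { determinism: 'A', isolation: 'B', maintainability: 'C', performance: 'D' },
  violations_summary: { total: 6, HIGH: 1, MEDIUM: 2, LOW: 3 },
};
const rendered = renderSummary(sampleSummary);
console.log(rendered);
```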

---

### 7. Save Progress

**Save this step's accumulated work to `{outputFile}`.**

- **If `{outputFile}` does not exist** (first save), create it using the workflow template (if available) with YAML frontmatter:

  ```yaml
  ---
  stepsCompleted: ['step-03f-aggregate-scores']
  lastStep: 'step-03f-aggregate-scores'
  lastSaved: '{date}'
  ---
  ```

  Then write this step's output below the frontmatter.

- **If `{outputFile}` already exists**, update:
  - Add `'step-03f-aggregate-scores'` to `stepsCompleted` array (only if not already present)
  - Set `lastStep: 'step-03f-aggregate-scores'`
  - Set `lastSaved: '{date}'`
  - Append this step's output to the appropriate section of the document.
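
The update rules above can be sketched as a small function over already-parsed frontmatter. This assumes the frontmatter has been loaded into a plain object by some YAML parser, which is not shown. `markStepCompleted`, the earlier step name, and the date value are hypothetical.

```javascript
// Hypothetical helper: applies the update rules to a parsed frontmatter object.
function markStepCompleted(frontmatter, stepName, date) {
  const completed = frontmatter.stepsCompleted ?? [];
  return {
    ...frontmatter,
    // Add the step only if it is not already present.
    stepsCompleted: completed.includes(stepName) ? completed : [...completed, stepName],
    lastStep: stepName,
    lastSaved: date,
  };
}

// Sample frontmatter; 'step-01-example' is an illustrative earlier step.
const existing = { stepsCompleted: ['step-01-example'], lastStep: 'step-01-example' };
const once = markStepCompleted(existing, 'step-03f-aggregate-scores', '2025-01-01');
const twice = markStepCompleted(once, 'step-03f-aggregate-scores', '2025-01-01');

console.log(twice.stepsCompleted);
// → [ 'step-01-example', 'step-03f-aggregate-scores' ]
```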

---

## EXIT CONDITION

Proceed to Step 4 when:

- ✅ All subagent outputs read successfully
- ✅ Overall score calculated
- ✅ Violations aggregated
- ✅ Recommendations prioritized
- ✅ Summary saved to temp file
- ✅ Output displayed to user
- ✅ Progress saved to output document

Load next step: `{nextStepFile}`

---

## 🚨 SYSTEM SUCCESS METRICS

### ✅ SUCCESS:

- All 4 subagent outputs read and parsed
- Overall score calculated with proper weights
- Violations aggregated correctly
- Summary complete and saved

### ❌ FAILURE:

- Failed to read one or more subagent outputs
- Score calculation incorrect
- Summary missing or incomplete

**Master Rule:** Aggregate determinism, isolation, maintainability, and performance only.