|
- ---
- name: 'step-03f-aggregate-scores'
- description: 'Aggregate quality dimension scores into overall 0-100 score'
- nextStepFile: '{skill-root}/steps-c/step-04-generate-report.md'
- outputFile: '{test_artifacts}/test-review.md'
- ---
-
- # Step 3F: Aggregate Quality Scores
-
- ## STEP GOAL
-
- Read outputs from 4 quality subagents, calculate weighted overall score (0-100), and aggregate violations for report generation.
-
- ---
-
- ## MANDATORY EXECUTION RULES
-
- - 📖 Read the entire step file before acting
- - ✅ Speak in `{communication_language}`
- - ✅ Read all 4 subagent outputs
- - ✅ Calculate weighted overall score
- - ✅ Aggregate violations by severity
- - ❌ Do NOT re-evaluate quality (use subagent outputs)
-
- ---
-
- ## EXECUTION PROTOCOLS:
-
- - 🎯 Follow the MANDATORY SEQUENCE exactly
- - 💾 Record outputs before proceeding
- - 📖 Load the next step only when instructed
-
- ---
-
- ## MANDATORY SEQUENCE
-
- ### 1. Read All Subagent Outputs
-
- ```javascript
- const fs = require('fs');
-
- // Use the SAME timestamp generated in Step 3 (do not regenerate).
- const timestamp = subagentContext?.timestamp;
- if (!timestamp) {
-   throw new Error('Missing timestamp from Step 3 context. Pass Step 3 timestamp into Step 3F.');
- }
- const dimensions = ['determinism', 'isolation', 'maintainability', 'performance'];
- const results = {};
-
- // Read and parse each subagent's JSON output file.
- dimensions.forEach((dim) => {
-   const outputPath = `/tmp/tea-test-review-${dim}-${timestamp}.json`;
-   results[dim] = JSON.parse(fs.readFileSync(outputPath, 'utf8'));
- });
- ```
-
- **Verify all succeeded:**
-
- ```javascript
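- // A subagent counts as successful only if it reported a numeric score.
- const failedDimensions = dimensions.filter((dim) => typeof results[dim]?.score !== 'number');
- const allSucceeded = failedDimensions.length === 0;
- if (!allSucceeded) {
-   throw new Error(`One or more quality subagents failed: ${failedDimensions.join(', ')}`);
- }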
- ```
-
- ---
-
- ### 2. Calculate Weighted Overall Score
-
- **Dimension Weights** (based on TEA quality priorities):
-
- ```javascript
- const weights = {
-   determinism: 0.3, // 30% - Reliability and flake prevention
-   isolation: 0.3, // 30% - Parallel safety and independence
-   maintainability: 0.25, // 25% - Readability and long-term health
-   performance: 0.15, // 15% - Speed and execution efficiency
- };
- ```
-
- **Calculate overall score:**
-
- ```javascript
- const overallScore = dimensions.reduce((sum, dim) => {
-   return sum + results[dim].score * weights[dim];
- }, 0);
-
- const roundedScore = Math.round(overallScore);
- ```
-
- **Determine grade:**
-
- ```javascript
- const getGrade = (score) => {
-   if (score >= 90) return 'A';
-   if (score >= 80) return 'B';
-   if (score >= 70) return 'C';
-   if (score >= 60) return 'D';
-   return 'F';
- };
-
- const overallGrade = getGrade(roundedScore);
- ```
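As a worked example of the weighting and grading above, the sketch below applies the same weights and grade thresholds to made-up dimension scores. The `sampleScores` values and the `scoreAndGrade` helper are hypothetical and not part of the workflow; real scores come from the subagent output files.

```javascript
// Weights and grade thresholds copied from this step.
const weights = { determinism: 0.3, isolation: 0.3, maintainability: 0.25, performance: 0.15 };

const getGrade = (score) => {
  if (score >= 90) return 'A';
  if (score >= 80) return 'B';
  if (score >= 70) return 'C';
  if (score >= 60) return 'D';
  return 'F';
};

// Hypothetical helper: combine per-dimension scores into a rounded score and a grade.
function scoreAndGrade(scores) {
  const overall = Object.keys(weights).reduce((sum, dim) => sum + scores[dim] * weights[dim], 0);
  const rounded = Math.round(overall);
  return { rounded, grade: getGrade(rounded) };
}

// Illustrative scores only.
const sampleScores = { determinism: 90, isolation: 80, maintainability: 72, performance: 60 };
const sample = scoreAndGrade(sampleScores);

// 90*0.3 + 80*0.3 + 72*0.25 + 60*0.15 = 27 + 24 + 18 + 9 = 78, which maps to grade C.
console.log(sample);
```

Because determinism and isolation together carry 60% of the weight, a weak score in either one pulls the overall grade down faster than a weak performance score does.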
-
- ---
-
- ### 3. Aggregate Violations by Severity
-
- **Collect all violations from all dimensions:**
-
- ```javascript
- const allViolations = dimensions.flatMap((dim) =>
-   results[dim].violations.map((v) => ({
-     ...v,
-     dimension: dim,
-   })),
- );
-
- // Group by severity
- const highSeverity = allViolations.filter((v) => v.severity === 'HIGH');
- const mediumSeverity = allViolations.filter((v) => v.severity === 'MEDIUM');
- const lowSeverity = allViolations.filter((v) => v.severity === 'LOW');
-
- const violationSummary = {
-   total: allViolations.length,
-   HIGH: highSeverity.length,
-   MEDIUM: mediumSeverity.length,
-   LOW: lowSeverity.length,
- };
- ```
-
- ---
-
- ### 4. Prioritize Recommendations
-
- **Extract recommendations from all dimensions:**
-
- ```javascript
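- const allRecommendations = dimensions.flatMap((dim) =>
-   results[dim].recommendations.map((rec) => ({
-     dimension: dim,
-     recommendation: rec,
-     impact: results[dim].score < 70 ? 'HIGH' : 'MEDIUM',
-   })),
- );
-
- // Sort by impact (HIGH first) on a copy, so allRecommendations is not mutated.
- // The comparator returns 0 for equal impacts, which keeps the original dimension order within each group.
- const impactRank = { HIGH: 0, MEDIUM: 1 };
- const prioritizedRecommendations = [...allRecommendations]
-   .sort((a, b) => impactRank[a.impact] - impactRank[b.impact])
-   .slice(0, 10); // Top 10 recommendations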
- ```
-
- ---
-
- ### 5. Create Review Summary Object
-
- **Aggregate all results:**
-
- ```javascript
- const reviewSummary = {
-   overall_score: roundedScore,
-   overall_grade: overallGrade,
-   // getQualityAssessment is not defined in this step; it must be provided elsewhere in the workflow.
-   quality_assessment: getQualityAssessment(roundedScore),
-
-   dimension_scores: {
-     determinism: results.determinism.score,
-     isolation: results.isolation.score,
-     maintainability: results.maintainability.score,
-     performance: results.performance.score,
-   },
-
-   dimension_grades: {
-     determinism: results.determinism.grade,
-     isolation: results.isolation.grade,
-     maintainability: results.maintainability.grade,
-     performance: results.performance.grade,
-   },
-
-   violations_summary: violationSummary,
-
-   all_violations: allViolations,
-
-   high_severity_violations: highSeverity,
-
-   top_10_recommendations: prioritizedRecommendations,
-
-   subagent_execution: 'PARALLEL (4 quality dimensions)',
-   performance_gain: '~60% faster than sequential',
- };
-
- // Save for Step 4 (report generation)
- fs.writeFileSync(`/tmp/tea-test-review-summary-${timestamp}.json`, JSON.stringify(reviewSummary, null, 2), 'utf8');
- ```
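The summary object calls `getQualityAssessment`, which this step does not define. A minimal sketch follows, assuming the assessment is a short label derived from the same thresholds as `getGrade`. The label strings are illustrative assumptions, not values taken from the TEA workflow.

```javascript
// Hypothetical helper: map a 0-100 score to a short assessment label.
// Thresholds mirror getGrade; the label text itself is an assumption.
const getQualityAssessment = (score) => {
  if (score >= 90) return 'Excellent';
  if (score >= 80) return 'Good';
  if (score >= 70) return 'Acceptable';
  if (score >= 60) return 'Needs Improvement';
  return 'Critical Issues';
};

console.log(getQualityAssessment(78));
```

Reusing the grade thresholds keeps the label and the letter grade from ever disagreeing about the same score.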
-
- ---
-
- ### 6. Display Summary to User
-
- ```
- ✅ Quality Evaluation Complete (Parallel Execution)
-
- 📊 Overall Quality Score: {roundedScore}/100 (Grade: {overallGrade})
-
- 📈 Dimension Scores:
- - Determinism: {determinism_score}/100 ({determinism_grade})
- - Isolation: {isolation_score}/100 ({isolation_grade})
- - Maintainability: {maintainability_score}/100 ({maintainability_grade})
- - Performance: {performance_score}/100 ({performance_grade})
-
- ℹ️ Coverage is excluded from `test-review` scoring. Use `trace` for coverage analysis and gates.
-
- ⚠️ Violations Found:
- - HIGH: {high_count} violations
- - MEDIUM: {medium_count} violations
- - LOW: {low_count} violations
- - TOTAL: {total_count} violations
-
- 🚀 Performance: Parallel execution ~60% faster than sequential
-
- ✅ Ready for report generation (Step 4)
- ```
-
- ---
-
- ### 7. Save Progress
-
- **Save this step's accumulated work to `{outputFile}`.**
-
- - **If `{outputFile}` does not exist** (first save), create it using the workflow template (if available) with YAML frontmatter:
-
- ```yaml
- ---
- stepsCompleted: ['step-03f-aggregate-scores']
- lastStep: 'step-03f-aggregate-scores'
- lastSaved: '{date}'
- ---
- ```
-
- Then write this step's output below the frontmatter.
-
- - **If `{outputFile}` already exists**, update:
-   - Add `'step-03f-aggregate-scores'` to the `stepsCompleted` array (only if not already present)
-   - Set `lastStep: 'step-03f-aggregate-scores'`
-   - Set `lastSaved: '{date}'`
-   - Append this step's output to the appropriate section of the document.
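The update rules for an existing `{outputFile}` amount to an idempotent merge of the frontmatter. The sketch below shows that logic on an already-parsed frontmatter object. `markStepComplete` and the `step-01-example` entry are hypothetical, and parsing or writing the YAML itself is left out.

```javascript
// Hypothetical helper: return updated frontmatter without mutating the input.
function markStepComplete(frontmatter, stepName, date) {
  const completed = frontmatter.stepsCompleted ?? [];
  return {
    ...frontmatter,
    // Add the step only if it is not already present.
    stepsCompleted: completed.includes(stepName) ? completed : [...completed, stepName],
    lastStep: stepName,
    lastSaved: date,
  };
}

// Illustrative starting state; the earlier step name and dates are made up.
const before = { stepsCompleted: ['step-01-example'], lastStep: 'step-01-example', lastSaved: '2024-01-01' };
const once = markStepComplete(before, 'step-03f-aggregate-scores', '2024-01-02');
const twice = markStepComplete(once, 'step-03f-aggregate-scores', '2024-01-02');

console.log(twice.stepsCompleted);
```

Running the helper twice leaves `stepsCompleted` unchanged after the first call, which matches the "only if not already present" rule.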
-
- ---
-
- ## EXIT CONDITION
-
- Proceed to Step 4 when:
-
- - ✅ All subagent outputs read successfully
- - ✅ Overall score calculated
- - ✅ Violations aggregated
- - ✅ Recommendations prioritized
- - ✅ Summary saved to temp file
- - ✅ Output displayed to user
- - ✅ Progress saved to output document
-
- Load next step: `{nextStepFile}`
-
- ---
-
- ## 🚨 SYSTEM SUCCESS METRICS
-
- ### ✅ SUCCESS:
-
- - All 4 subagent outputs read and parsed
- - Overall score calculated with proper weights
- - Violations aggregated correctly
- - Summary complete and saved
-
- ### ❌ FAILURE:
-
- - Failed to read one or more subagent outputs
- - Score calculation incorrect
- - Summary missing or incomplete
-
- **Master Rule:** Aggregate determinism, isolation, maintainability, and performance only.
|