# Test Quality Review - Validation Checklist

Use this checklist to validate that the test quality review workflow completed successfully and all quality criteria were properly evaluated.

---

## Prerequisites

Note: `test-review` is optional and only audits existing tests; it does not generate tests.

Coverage analysis is out of scope for this workflow. Use `trace` for coverage metrics and coverage gate decisions.

### Test File Discovery

- [ ] Test file(s) identified for review (single/directory/suite scope)
- [ ] Test files exist and are readable
- [ ] Test framework detected (Playwright, Jest, Cypress, Vitest, etc.)
- [ ] Test framework configuration found (playwright.config.ts, jest.config.js, etc.)

### Knowledge Base Loading

- [ ] `tea-index.csv` loaded successfully
- [ ] `test-quality.md` loaded (Definition of Done)
- [ ] `fixture-architecture.md` loaded (Pure function → Fixture patterns)
- [ ] `network-first.md` loaded (Route intercept before navigate)
- [ ] `data-factories.md` loaded (Factory patterns)
- [ ] `test-levels-framework.md` loaded (E2E vs API vs Component vs Unit)
- [ ] All other enabled fragments loaded successfully

### Context Gathering

- [ ] Story file discovered or explicitly provided (if available)
- [ ] Test design document discovered or explicitly provided (if available)
- [ ] Acceptance criteria extracted from story (if available)
- [ ] Priority context (P0/P1/P2/P3) extracted from test-design (if available)

---
## Process Steps

### Step 1: Context Loading

- [ ] Review scope determined (single/directory/suite)
- [ ] Test file paths collected
- [ ] Related artifacts discovered (story, test-design)
- [ ] Knowledge base fragments loaded successfully
- [ ] Quality criteria flags read from workflow variables

### Step 2: Test File Parsing

**For Each Test File:**

- [ ] File read successfully
- [ ] File size measured (lines, KB)
- [ ] File structure parsed (describe blocks, it blocks)
- [ ] Test IDs extracted (if present)
- [ ] Priority markers extracted (if present)
- [ ] Imports analyzed
- [ ] Dependencies identified

**Test Structure Analysis:**

- [ ] Describe block count calculated
- [ ] It/test block count calculated
- [ ] BDD structure identified (Given-When-Then)
- [ ] Fixture usage detected
- [ ] Data factory usage detected
- [ ] Network interception patterns identified
- [ ] Assertions counted
- [ ] Waits and timeouts cataloged
- [ ] Conditionals (if/else) detected
- [ ] Try/catch blocks detected
- [ ] Shared state or globals detected
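
As a rough illustration of the structure analysis items above, the counts could be approximated with plain pattern matching over the file text. This is a minimal sketch, not the workflow's actual parser: the `analyzeStructure` function, the `StructureSummary` shape, and the regular expressions are all assumptions made for this example, and an AST-based parser would be more reliable in practice.

```typescript
// Hypothetical summary shape; the field names are illustrative only.
interface StructureSummary {
  describeBlocks: number;
  testBlocks: number;
  conditionals: number;
  tryCatchBlocks: number;
}

// Count non-overlapping matches of a global regex in the source text.
function countMatches(source: string, pattern: RegExp): number {
  return (source.match(pattern) || []).length;
}

// Approximate the structure counts from raw text (comments and strings are not excluded).
function analyzeStructure(source: string): StructureSummary {
  return {
    // Matches describe(...), describe.only(...), and test.describe(...)
    describeBlocks: countMatches(source, /\b(?:test\.)?describe(?:\.\w+)?\s*\(/g),
    // Matches it(...), test(...), and their .only/.skip variants
    testBlocks: countMatches(source, /\b(?:it|test)(?:\.(?:only|skip))?\s*\(/g),
    conditionals: countMatches(source, /\bif\s*\(/g),
    tryCatchBlocks: countMatches(source, /\btry\s*\{/g),
  };
}
```

Because the matching is purely textual, code that appears inside comments or string literals is counted too, so the numbers are best treated as a first pass.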
### Step 3: Quality Criteria Validation

Coverage criteria are intentionally excluded from this checklist.

**For Each Enabled Criterion:**

#### BDD Format (if `check_given_when_then: true`)

- [ ] Given-When-Then structure evaluated
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with line numbers
- [ ] Examples of good/bad patterns noted

#### Test IDs (if `check_test_ids: true`)

- [ ] Test ID presence validated
- [ ] Test ID format checked (e.g., 1.3-E2E-001)
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Missing IDs cataloged
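
The format check could be sketched as a regular expression generalized from the single example `1.3-E2E-001`. The pattern below is an assumption: it reads the ID as two dot-separated numbers, an uppercase level tag, and a three-digit sequence, which may be stricter or looser than the project's real convention. `hasValidTestId` and `findTitlesMissingIds` are hypothetical helper names.

```typescript
// Assumed shape: <number>.<number>-<UPPERCASE_LEVEL>-<three digits>, e.g. 1.3-E2E-001.
const TEST_ID_PATTERN = /\b\d+\.\d+-[A-Z][A-Z0-9]*-\d{3}\b/;

// True when the test title contains something that looks like a test ID.
function hasValidTestId(title: string): boolean {
  return TEST_ID_PATTERN.test(title);
}

// Return the titles that would be cataloged as missing an ID.
function findTitlesMissingIds(titles: string[]): string[] {
  return titles.filter((title) => !hasValidTestId(title));
}
```

Given the titles `"1.3-E2E-001 user can log in"` and `"user can log out"`, only the second would be reported as missing an ID.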
#### Priority Markers (if `check_priority_markers: true`)

- [ ] P0/P1/P2/P3 classification validated
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Missing priorities cataloged

#### Hard Waits (if `check_hard_waits: true`)

- [ ] sleep(), waitForTimeout(), hardcoded delays detected
- [ ] Justification comments checked
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with line numbers and recommended fixes
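
One possible way to detect hard waits and record their line numbers is a line-by-line scan. This sketch assumes that a justification is any `//` comment on the same line or on the line directly above the wait; the checklist does not define what counts as a justification, so that rule and the `findHardWaits` name are hypothetical.

```typescript
interface HardWaitFinding {
  line: number;       // 1-based line number in the test file
  code: string;       // trimmed source line containing the wait
  justified: boolean; // true when a comment accompanies the wait
}

const HARD_WAIT_PATTERN = /\b(?:waitForTimeout|sleep)\s*\(/;

function findHardWaits(source: string): HardWaitFinding[] {
  const lines = source.split("\n");
  const findings: HardWaitFinding[] = [];
  lines.forEach((text, index) => {
    if (!HARD_WAIT_PATTERN.test(text)) return;
    const previous = index > 0 ? lines[index - 1].trim() : "";
    // Assumption: a comment on the same line or the line directly above is a justification.
    const justified = text.includes("//") || previous.startsWith("//");
    findings.push({ line: index + 1, code: text.trim(), justified });
  });
  return findings;
}
```

The same-line check is naive: a URL such as `'http://…'` on the wait line would also be read as a comment, so a reviewer would still need to confirm each justification by hand.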
#### Determinism (if `check_determinism: true`)

- [ ] Conditionals (if/else/switch) detected
- [ ] Try/catch abuse detected
- [ ] Random values (Math.random, Date.now) detected
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes
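
The determinism checks above might be approximated with a small rule table applied to each line. The rule names, the `findNondeterminism` function, and the regular expressions are assumptions for this sketch; try/catch abuse is left out because judging "abuse" needs more than a text match.

```typescript
type DeterminismKind = "conditional" | "random" | "clock";

interface DeterminismFinding {
  line: number; // 1-based line number
  kind: DeterminismKind;
}

// One regex per checklist item; none use the global flag, so test() is stateless.
const DETERMINISM_RULES: Array<{ kind: DeterminismKind; pattern: RegExp }> = [
  { kind: "conditional", pattern: /\b(?:if|switch)\s*\(/ },
  { kind: "random", pattern: /\bMath\.random\s*\(/ },
  { kind: "clock", pattern: /\bDate\.now\s*\(/ },
];

function findNondeterminism(source: string): DeterminismFinding[] {
  const findings: DeterminismFinding[] = [];
  source.split("\n").forEach((text, index) => {
    for (const rule of DETERMINISM_RULES) {
      if (rule.pattern.test(text)) {
        findings.push({ line: index + 1, kind: rule.kind });
      }
    }
  });
  return findings;
}
```

A line can produce several findings, for example `if (Date.now() % 2 === 0)` is reported once as a conditional and once as a clock read.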
#### Isolation (if `check_isolation: true`)

- [ ] Cleanup hooks (afterEach/afterAll) validated
- [ ] Shared state detected
- [ ] Global variable mutations detected
- [ ] Resource cleanup verified
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes

#### Fixture Patterns (if `check_fixture_patterns: true`)

- [ ] Fixtures detected (test.extend)
- [ ] Pure functions validated
- [ ] mergeTests usage checked
- [ ] beforeEach complexity analyzed
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes

#### Data Factories (if `check_data_factories: true`)

- [ ] Factory functions detected
- [ ] Hardcoded data (magic strings/numbers) detected
- [ ] Faker.js or similar usage validated
- [ ] API-first setup pattern checked
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes

#### Network-First (if `check_network_first: true`)

- [ ] page.route() before page.goto() validated
- [ ] Race conditions detected (route after navigate)
- [ ] Network wait patterns checked (`interceptNetworkCall` preferred over ad hoc `waitForResponse`)
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes
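
The route-before-navigate check could be sketched as an ordering test over the lines of a single test body. This example assumes the Playwright fixture is named `page` and that one test body is passed in at a time; `findRoutesAfterNavigation` is a hypothetical helper, not part of any library.

```typescript
interface RouteOrderFinding {
  routeLine: number; // line where page.route() was registered too late
  gotoLine: number;  // line of the first page.goto() that preceded it
}

function findRoutesAfterNavigation(testBody: string): RouteOrderFinding[] {
  let firstGotoLine = 0; // 0 means no navigation has been seen yet
  const findings: RouteOrderFinding[] = [];
  testBody.split("\n").forEach((text, index) => {
    const lineNumber = index + 1;
    if (firstGotoLine === 0 && /\bpage\.goto\s*\(/.test(text)) {
      firstGotoLine = lineNumber;
    } else if (firstGotoLine !== 0 && /\bpage\.route\s*\(/.test(text)) {
      // Registering an intercept after navigation can miss early requests.
      findings.push({ routeLine: lineNumber, gotoLine: firstGotoLine });
    }
  });
  return findings;
}
```

An empty result means every `page.route()` call in the body appears before the first `page.goto()`, which is the ordering the checklist asks for.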
#### Assertions (if `check_assertions: true`)

- [ ] Explicit assertions counted
- [ ] Implicit waits without assertions detected
- [ ] Assertion specificity validated
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes

#### Test Length (if `check_test_length: true`)

- [ ] File line count calculated
- [ ] Threshold comparison (≤300 lines ideal)
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Splitting recommendations generated (if >300 lines)
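
The threshold comparison might look like the sketch below. Only the 300-line ideal comes from the checklist; the boundary between WARN and FAIL is not stated, so the `failAbove = 500` default is purely an assumption, exposed as a parameter so it can be changed. `countLines` and `lengthStatus` are hypothetical names.

```typescript
type Status = "PASS" | "WARN" | "FAIL";

// Count newline-separated lines; a trailing newline adds one empty final line.
function countLines(source: string): number {
  return source === "" ? 0 : source.split("\n").length;
}

// idealMax comes from the checklist (300); failAbove is an assumed cut-off.
function lengthStatus(lineCount: number, idealMax = 300, failAbove = 500): Status {
  if (lineCount <= idealMax) return "PASS";
  return lineCount <= failAbove ? "WARN" : "FAIL";
}
```

With these defaults a 300-line file passes and a 301-line file gets a WARN and a splitting recommendation.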
#### Test Duration (if `check_test_duration: true`)

- [ ] Test complexity analyzed (as proxy for duration if no execution data)
- [ ] Threshold comparison (≤1.5 min target)
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Optimization recommendations generated

#### Flakiness Patterns (if `check_flakiness_patterns: true`)

- [ ] Tight timeouts detected (e.g., { timeout: 1000 })
- [ ] Race conditions detected
- [ ] Timing-dependent assertions detected
- [ ] Retry logic detected
- [ ] Environment-dependent assumptions detected
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes
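
For the tight-timeout item, one possible detector scans for numeric `timeout:` options at or below a cut-off. The checklist gives `{ timeout: 1000 }` as an example of a tight value but no explicit limit, so the `maxTightMs = 1000` default is an assumption, as is the `findTightTimeouts` name. Values written with numeric separators or as named constants are not handled.

```typescript
interface TightTimeoutFinding {
  line: number;      // 1-based line number
  timeoutMs: number; // the literal timeout value found
}

function findTightTimeouts(source: string, maxTightMs = 1000): TightTimeoutFinding[] {
  const findings: TightTimeoutFinding[] = [];
  source.split("\n").forEach((text, index) => {
    // A fresh regex per line keeps the global lastIndex state local.
    const pattern = /\btimeout\s*:\s*(\d+)/g;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(text)) !== null) {
      const timeoutMs = Number(match[1]);
      if (timeoutMs <= maxTightMs) {
        findings.push({ line: index + 1, timeoutMs });
      }
    }
  });
  return findings;
}
```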
---

### Step 4: Quality Score Calculation

**Violation Counting:**

- [ ] Critical (P0) violations counted
- [ ] High (P1) violations counted
- [ ] Medium (P2) violations counted
- [ ] Low (P3) violations counted
- [ ] Violation breakdown by criterion recorded

**Score Calculation:**

- [ ] Starting score: 100
- [ ] Critical violations deducted (-10 each)
- [ ] High violations deducted (-5 each)
- [ ] Medium violations deducted (-2 each)
- [ ] Low violations deducted (-1 each)
- [ ] Bonus points added (max +30):
  - [ ] Excellent BDD structure (+5 if applicable)
  - [ ] Comprehensive fixtures (+5 if applicable)
  - [ ] Comprehensive data factories (+5 if applicable)
  - [ ] Network-first pattern (+5 if applicable)
  - [ ] Perfect isolation (+5 if applicable)
  - [ ] All test IDs present (+5 if applicable)
- [ ] Final score calculated: max(0, min(100, Starting - Violations + Bonus))
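
The scoring rules above can be written out as a short function. The weights, the six +5 bonuses, and the clamping formula come from the checklist; the `ViolationCounts` and `BonusFlags` types and the `qualityScore` name are assumptions made for this sketch.

```typescript
interface ViolationCounts {
  critical: number; // P0, -10 each
  high: number;     // P1, -5 each
  medium: number;   // P2, -2 each
  low: number;      // P3, -1 each
}

interface BonusFlags {
  excellentBdd: boolean;
  comprehensiveFixtures: boolean;
  comprehensiveDataFactories: boolean;
  networkFirst: boolean;
  perfectIsolation: boolean;
  allTestIdsPresent: boolean;
}

const STARTING_SCORE = 100;
const BONUS_POINTS = 5; // six bonuses at +5 each give the +30 maximum

function qualityScore(violations: ViolationCounts, bonuses: BonusFlags): number {
  const deductions =
    violations.critical * 10 + violations.high * 5 + violations.medium * 2 + violations.low * 1;
  const earned = [
    bonuses.excellentBdd,
    bonuses.comprehensiveFixtures,
    bonuses.comprehensiveDataFactories,
    bonuses.networkFirst,
    bonuses.perfectIsolation,
    bonuses.allTestIdsPresent,
  ].filter((flag) => flag).length;
  // max(0, min(100, Starting - Violations + Bonus))
  return Math.max(0, Math.min(100, STARTING_SCORE - deductions + earned * BONUS_POINTS));
}
```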
**Quality Grade:**

- [ ] Grade assigned based on score:
  - 90-100: A+ (Excellent)
  - 80-89: A (Good)
  - 70-79: B (Acceptable)
  - 60-69: C (Needs Improvement)
  - <60: F (Critical Issues)
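
The grade bands translate directly into a lookup. The thresholds and labels are taken from the list above; the `qualityGrade` function name and the returned object shape are assumptions for this sketch.

```typescript
interface Grade {
  letter: string;
  label: string;
}

// Map a 0-100 quality score to its grade band.
function qualityGrade(score: number): Grade {
  if (score >= 90) return { letter: "A+", label: "Excellent" };
  if (score >= 80) return { letter: "A", label: "Good" };
  if (score >= 70) return { letter: "B", label: "Acceptable" };
  if (score >= 60) return { letter: "C", label: "Needs Improvement" };
  return { letter: "F", label: "Critical Issues" };
}
```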
---

### Step 5: Review Report Generation

**Report Sections Created:**

- [ ] **Header Section**:
  - [ ] Test file(s) reviewed listed
  - [ ] Review date recorded
  - [ ] Review scope noted (single/directory/suite)
  - [ ] Quality score and grade displayed
- [ ] **Executive Summary**:
  - [ ] Overall assessment (Excellent/Good/Needs Improvement/Critical)
  - [ ] Key strengths listed (3-5 bullet points)
  - [ ] Key weaknesses listed (3-5 bullet points)
  - [ ] Recommendation stated (Approve/Approve with comments/Request changes/Block)
- [ ] **Quality Criteria Assessment**:
  - [ ] Table with all criteria evaluated
  - [ ] Status for each criterion (PASS/WARN/FAIL)
  - [ ] Violation count per criterion
- [ ] **Critical Issues (Must Fix)**:
  - [ ] P0/P1 violations listed
  - [ ] Code location provided for each (file:line)
  - [ ] Issue explanation clear
  - [ ] Recommended fix provided with code example
  - [ ] Knowledge base reference provided
- [ ] **Recommendations (Should Fix)**:
  - [ ] P2/P3 violations listed
  - [ ] Code location provided for each (file:line)
  - [ ] Issue explanation clear
  - [ ] Recommended improvement provided with code example
  - [ ] Knowledge base reference provided
- [ ] **Best Practices Examples** (if good patterns found):
  - [ ] Good patterns highlighted from tests
  - [ ] Knowledge base fragments referenced
  - [ ] Examples provided for others to follow
- [ ] **Knowledge Base References**:
  - [ ] All fragments consulted listed
  - [ ] Links to detailed guidance provided

---
### Step 6: Optional Outputs Generation

**Inline Comments** (if `generate_inline_comments: true`):

- [ ] Inline comments generated at violation locations
- [ ] Comment format: `// TODO (TEA Review): [Issue] - See test-review-{filename}.md`
- [ ] Comments added to test files (no logic changes)
- [ ] Test files remain valid and executable
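
Generating the inline comments could be sketched as follows. The comment text follows the format given above; everything else is an assumption for illustration, including the `inlineComment` and `annotate` names, the choice to place each comment on its own line above the violation, and treating `{filename}` as whatever string the caller passes in.

```typescript
interface InlineViolation {
  line: number;  // 1-based line number of the violation
  issue: string; // short description used in the comment
}

// Build one comment in the checklist's format.
function inlineComment(issue: string, filename: string): string {
  return `// TODO (TEA Review): ${issue} - See test-review-${filename}.md`;
}

// Insert a comment line above each violation without changing any existing line.
function annotate(source: string, violations: InlineViolation[], filename: string): string {
  const lines = source.split("\n");
  // Work from the bottom up so earlier insertions do not shift later line numbers.
  const sorted = violations.slice().sort((a, b) => b.line - a.line);
  for (const violation of sorted) {
    const target = lines[violation.line - 1] || "";
    const indent = (/^\s*/.exec(target) || [""])[0]; // reuse the target line's indentation
    lines.splice(violation.line - 1, 0, indent + inlineComment(violation.issue, filename));
  }
  return lines.join("\n");
}
```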
**Quality Badge** (if `generate_quality_badge: true`):

- [ ] Badge created with quality score (e.g., "Test Quality: 87/100 (A)")
- [ ] Badge format suitable for README or documentation
- [ ] Badge saved to output folder

**Story Update** (if `append_to_story: true` and story file exists):

- [ ] "Test Quality Review" section created
- [ ] Quality score included
- [ ] Critical issues summarized
- [ ] Link to full review report provided
- [ ] Story file updated successfully

---

### Step 7: Save and Notify

**Outputs Saved:**

- [ ] Review report saved to `{output_file}`
- [ ] Inline comments written to test files (if enabled)
- [ ] Quality badge saved (if enabled)
- [ ] Story file updated (if enabled)
- [ ] All outputs are valid and readable

**Summary Message Generated:**

- [ ] Quality score and grade included
- [ ] Critical issue count stated
- [ ] Recommendation provided (Approve/Request changes/Block)
- [ ] Next steps clarified
- [ ] Message displayed to user

---
## Output Validation

### Review Report Completeness

- [ ] All required sections present
- [ ] No placeholder text or TODOs in report
- [ ] All code locations are accurate (file:line)
- [ ] All code examples are valid and demonstrate fix
- [ ] All knowledge base references are correct

### Review Report Accuracy

- [ ] Quality score matches violation breakdown
- [ ] Grade matches score range
- [ ] Violations correctly categorized by severity (P0/P1/P2/P3)
- [ ] Violations correctly attributed to quality criteria
- [ ] No false positives (violations are legitimate issues)
- [ ] No false negatives (critical issues not missed)

### Review Report Clarity

- [ ] Executive summary is clear and actionable
- [ ] Issue explanations are understandable
- [ ] Recommended fixes are implementable
- [ ] Code examples are correct and runnable
- [ ] Recommendation (Approve/Request changes) is clear

---
## Quality Checks

### Knowledge-Based Validation

- [ ] All feedback grounded in knowledge base fragments
- [ ] Recommendations follow proven patterns
- [ ] No arbitrary or opinion-based feedback
- [ ] Knowledge fragment references accurate and relevant

### Actionable Feedback

- [ ] Every issue includes recommended fix
- [ ] Every fix includes code example
- [ ] Code examples demonstrate correct pattern
- [ ] Fixes reference knowledge base for more detail

### Severity Classification

- [ ] Critical (P0) issues are genuinely critical (hard waits, race conditions, no assertions)
- [ ] High (P1) issues impact maintainability/reliability (missing IDs, hardcoded data)
- [ ] Medium (P2) issues are nice-to-have improvements (long files, missing priorities)
- [ ] Low (P3) issues are minor style/preference (verbose tests)

### Context Awareness

- [ ] Review considers project context (some patterns may be justified)
- [ ] Violations with justification comments noted as acceptable
- [ ] Edge cases acknowledged
- [ ] Recommendations are pragmatic, not dogmatic

---
## Integration Points

### Story File Integration

- [ ] Story file discovered correctly (if available)
- [ ] Acceptance criteria extracted and used for context
- [ ] Test quality section appended to story (if enabled)
- [ ] Link to review report added to story

### Test Design Integration

- [ ] Test design document discovered correctly (if available)
- [ ] Priority context (P0/P1/P2/P3) extracted and used
- [ ] Review validates tests align with prioritization
- [ ] Misalignment flagged (e.g., P0 scenario missing tests)

### Knowledge Base Integration

- [ ] `tea-index.csv` loaded successfully
- [ ] All required fragments loaded
- [ ] Fragments applied correctly to validation
- [ ] Fragment references in report are accurate

---
## Edge Cases and Special Situations

### Empty or Minimal Tests

- [ ] If test file is empty, report notes "No tests found"
- [ ] If test file has only boilerplate, report notes "No meaningful tests"
- [ ] Score reflects lack of content appropriately

### Legacy Tests

- [ ] Legacy tests acknowledged in context
- [ ] Review provides practical recommendations for improvement
- [ ] Recognizes that complete refactor may not be feasible
- [ ] Prioritizes critical issues (flakiness) over style

### Test Framework Variations

- [ ] Review adapts to test framework (Playwright vs Jest vs Cypress)
- [ ] Framework-specific patterns recognized (e.g., Playwright fixtures)
- [ ] Framework-specific violations detected (e.g., Cypress anti-patterns)
- [ ] Knowledge fragments applied appropriately for framework

### Justified Violations

- [ ] Violations with justification comments in code noted as acceptable
- [ ] Justifications evaluated for legitimacy
- [ ] Report acknowledges justified patterns
- [ ] Score not penalized for justified violations

---
## Final Validation

### Review Completeness

- [ ] All enabled quality criteria evaluated
- [ ] All test files in scope reviewed
- [ ] All violations cataloged
- [ ] All recommendations provided
- [ ] Review report is comprehensive

### Review Accuracy

- [ ] Quality score is accurate
- [ ] Violations are correct (no false positives)
- [ ] Critical issues not missed (no false negatives)
- [ ] Code locations are correct
- [ ] Knowledge base references are accurate

### Review Usefulness

- [ ] Feedback is actionable
- [ ] Recommendations are implementable
- [ ] Code examples are correct
- [ ] Review helps developer improve tests
- [ ] Review educates on best practices

### Workflow Complete

- [ ] All checklist items completed
- [ ] All outputs validated and saved
- [ ] User notified with summary
- [ ] Review ready for developer consumption
- [ ] Follow-up actions identified (if any)

---
## Notes

Record any issues, observations, or important context during workflow execution:

- **Test Framework**: [Playwright, Jest, Cypress, etc.]
- **Review Scope**: [single file, directory, full suite]
- **Quality Score**: [0-100 score, letter grade]
- **Critical Issues**: [Count of P0/P1 violations]
- **Recommendation**: [Approve / Approve with comments / Request changes / Block]
- **Special Considerations**: [Legacy code, justified patterns, edge cases]
- **Follow-up Actions**: [Re-review after fixes, pair programming, etc.]