  1. # File Utilities
  2. ## Principle
  3. Read and validate files (CSV, XLSX, PDF, ZIP) with automatic parsing, type-safe results, and download handling. Simplify file operations in Playwright tests with built-in format support and validation helpers.
  4. ## Rationale
  5. Testing file operations in Playwright requires boilerplate:
  6. - Manual download handling
  7. - External parsing libraries for each format
  8. - No validation helpers
  9. - Type-unsafe results
  10. - Repetitive path handling
  11. The `file-utils` module provides:
  12. - **Auto-parsing**: CSV, XLSX, PDF, ZIP automatically parsed
  13. - **Download handling**: Single function for UI or API-triggered downloads
  14. - **Type-safe**: TypeScript interfaces for parsed results
  15. - **Validation helpers**: Row count, header checks, content validation
  16. - **Format support**: Multiple sheet support (XLSX), text extraction (PDF), archive extraction (ZIP)
  17. ## Why Use This Instead of Vanilla Playwright?
  18. | Vanilla Playwright | File Utils |
  19. | ------------------------------------------- | ------------------------------------------------ |
  20. | ~80 lines per CSV flow (download + parse) | ~10 lines end-to-end |
  21. | Manual event orchestration for downloads | Encapsulated in `handleDownload()` |
  22. | Manual path handling and `saveAs` | Returns a ready-to-use file path |
  23. | Manual existence checks and error handling | Centralized in one place via utility patterns |
  24. | Manual CSV parsing config (headers, typing) | `readCSV()` returns `{ data, headers }` directly |
  25. ## Pattern Examples
  26. ### Example 1: UI-Triggered CSV Download
  27. **Context**: A user clicks a button, a CSV file downloads, and the test validates its contents.
  28. **Implementation**:
  29. ```typescript
  30. import { handleDownload, readCSV } from '@seontechnologies/playwright-utils/file-utils';
  31. import path from 'node:path';
  32. const DOWNLOAD_DIR = path.join(__dirname, '../downloads');
  33. test('should download and validate CSV', async ({ page }) => {
  34. const downloadPath = await handleDownload({
  35. page,
  36. downloadDir: DOWNLOAD_DIR,
  37. trigger: () => page.getByTestId('download-button-text/csv').click(),
  38. });
  39. const csvResult = await readCSV({ filePath: downloadPath });
  40. // Access parsed data and headers
  41. const { data, headers } = csvResult.content;
  42. expect(headers).toEqual(['ID', 'Name', 'Email']);
  43. expect(data[0]).toMatchObject({
  44. ID: expect.any(String),
  45. Name: expect.any(String),
  46. Email: expect.any(String),
  47. });
  48. });
  49. ```
  50. **Key Points**:
  51. - `handleDownload` waits for download, returns file path
  52. - `readCSV` auto-parses to `{ headers, data }`
  53. - Type-safe access to parsed content
  54. - Clean up downloads in `afterEach`
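The header and row-count checks above can be folded into a small reusable helper. The following is a minimal sketch, not part of `file-utils`: the `ParsedCsv` interface and the `checkCsvShape` name are hypothetical, and the `{ headers, data }` shape is inferred from the example rather than taken from the library's exported types.

```typescript
// Hypothetical shape inferred from Example 1; not the library's exported type.
interface ParsedCsv {
  headers: string[];
  data: Array<Record<string, unknown>>;
}

// Hypothetical helper: collects problems instead of throwing on the first one,
// so a single assertion can report every mismatch.
function checkCsvShape(csv: ParsedCsv, expectedHeaders: string[], minRows = 1): string[] {
  const problems: string[] = [];
  const missing = expectedHeaders.filter((header) => !csv.headers.includes(header));
  if (missing.length > 0) {
    problems.push(`missing headers: ${missing.join(', ')}`);
  }
  if (csv.data.length < minRows) {
    problems.push(`expected at least ${minRows} row(s), got ${csv.data.length}`);
  }
  return problems;
}

// Sample data standing in for `csvResult.content`
const sample: ParsedCsv = {
  headers: ['ID', 'Name', 'Email'],
  data: [{ ID: '1', Name: 'Ada', Email: 'ada@example.com' }],
};
const csvProblems = checkCsvShape(sample, ['ID', 'Name', 'Email']);
```

In a test this would typically be asserted as `expect(checkCsvShape(csvResult.content, ['ID', 'Name', 'Email'])).toEqual([])`, assuming the parsed content matches the shape sketched here.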
  55. ### Example 2: XLSX with Multiple Sheets
  56. **Context**: Excel file with multiple sheets (e.g., Summary, Details, Errors).
  57. **Implementation**:
  58. ```typescript
  59. import { handleDownload, readXLSX } from '@seontechnologies/playwright-utils/file-utils';
  60. test('should read multi-sheet XLSX', async ({ page }) => {
  61. const downloadPath = await handleDownload({
  62. page,
  63. downloadDir: DOWNLOAD_DIR,
  64. trigger: () => page.click('[data-testid="export-xlsx"]'),
  65. });
  66. const xlsxResult = await readXLSX({ filePath: downloadPath });
  67. // Verify worksheet structure
  68. expect(xlsxResult.content.worksheets.length).toBeGreaterThan(0);
  69. const worksheet = xlsxResult.content.worksheets[0];
  70. expect(worksheet).toBeDefined();
  71. expect(worksheet).toHaveProperty('name');
  72. // Access sheet data
  73. const sheetData = worksheet?.data;
  74. expect(Array.isArray(sheetData)).toBe(true);
  75. // Use type assertion for type safety
  76. const firstRow = sheetData![0] as Record<string, unknown>;
  77. expect(firstRow).toHaveProperty('id');
  78. });
  79. ```
  80. **Key Points**:
  81. - `worksheets` array with `name` and `data` properties
  82. - Access sheets by name
  83. - Each sheet has its own headers and data
  84. - Type-safe sheet iteration
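Example 2 reads the first worksheet by index. Looking a sheet up by name could be sketched as below; the `Worksheet` interface and `getSheet` helper are hypothetical and only assume the `name` and `data` properties mentioned in the key points.

```typescript
// Hypothetical shape based on the `name` / `data` properties used in Example 2.
interface Worksheet {
  name: string;
  data: Array<Record<string, unknown>>;
}

// Hypothetical helper: fails with a message that lists the sheets that do exist.
function getSheet(worksheets: Worksheet[], name: string): Worksheet {
  const sheet = worksheets.find((candidate) => candidate.name === name);
  if (!sheet) {
    const available = worksheets.map((candidate) => candidate.name).join(', ');
    throw new Error(`Sheet "${name}" not found. Available: ${available}`);
  }
  return sheet;
}

// Sample data standing in for `xlsxResult.content.worksheets`
const worksheets: Worksheet[] = [
  { name: 'Summary', data: [{ id: 1, total: 3 }] },
  { name: 'Errors', data: [] },
];
const summary = getSheet(worksheets, 'Summary');
```

Throwing with the list of available names tends to make a failed export easier to diagnose than an `undefined` worksheet further down the test.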
  85. ### Example 3: PDF Text Extraction
  86. **Context**: Validate PDF report contains expected content.
  87. **Implementation**:
  88. ```typescript
  89. import { handleDownload, readPDF } from '@seontechnologies/playwright-utils/file-utils';
  90. test('should validate PDF report', async ({ page }) => {
  91. const downloadPath = await handleDownload({
  92. page,
  93. downloadDir: DOWNLOAD_DIR,
  94. trigger: () => page.getByTestId('download-button-Text-based PDF Document').click(),
  95. });
  96. const pdfResult = await readPDF({ filePath: downloadPath });
  97. // content is extracted text from all pages
  98. expect(pdfResult.pagesCount).toBe(1);
  99. expect(pdfResult.fileName).toContain('.pdf');
  100. expect(pdfResult.content).toContain('All you need is the free Adobe Acrobat Reader');
  101. });
  102. ```
  103. **PDF Reader Options:**
  104. ```typescript
  105. const result = await readPDF({
  106. filePath: '/path/to/document.pdf',
  107. mergePages: false, // Keep pages separate (default: true)
  108. debug: true, // Enable debug logging
  109. maxPages: 10, // Limit processing to first 10 pages
  110. });
  111. ```
  112. **Important Limitation - Vector-based PDFs:**
  113. Text extraction may fail for PDFs that store text as vector graphics (e.g., those generated by jsPDF):
  114. ```typescript
  115. // Vector-based PDF example (extraction fails gracefully)
  116. const pdfResult = await readPDF({ filePath: downloadPath });
  117. expect(pdfResult.pagesCount).toBe(1);
  118. expect(pdfResult.info.extractionNotes).toContain('Text extraction from vector-based PDFs is not supported.');
  119. ```
  120. Such PDFs will have:
  121. - `textExtractionSuccess: false`
  122. - `isVectorBased: true`
  123. - Explanatory message in `extractionNotes`
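A test that may receive either kind of PDF can branch on these flags before asserting on text. This is a minimal sketch under the assumption that all three fields live on `info` (as the `extractionNotes` example suggests); the `PdfExtractionInfo` type and `classifyPdfExtraction` function are hypothetical names.

```typescript
// Hypothetical view of the metadata fields listed above.
interface PdfExtractionInfo {
  textExtractionSuccess?: boolean;
  isVectorBased?: boolean;
  extractionNotes?: string;
}

type PdfTextStatus = 'ok' | 'vector-based' | 'failed';

// Hypothetical helper: decides whether text assertions are meaningful.
function classifyPdfExtraction(info: PdfExtractionInfo | undefined): PdfTextStatus {
  if (info?.isVectorBased) return 'vector-based';
  if (info?.textExtractionSuccess === false) return 'failed';
  return 'ok';
}

const vectorStatus = classifyPdfExtraction({
  textExtractionSuccess: false,
  isVectorBased: true,
  extractionNotes: 'Text extraction from vector-based PDFs is not supported.',
});
```

When the status is `'vector-based'`, a test could fall back to checking `pagesCount` and `fileName` only, instead of failing on an empty `content`.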
  124. ### Example 4: ZIP Archive Validation
  125. **Context**: Validate that a ZIP archive contains the expected files and extract a specific file.
  126. **Implementation**:
  127. ```typescript
  128. import { handleDownload, readZIP } from '@seontechnologies/playwright-utils/file-utils';
  129. test('should validate ZIP archive', async ({ page }) => {
  130. const downloadPath = await handleDownload({
  131. page,
  132. downloadDir: DOWNLOAD_DIR,
  133. trigger: () => page.click('[data-testid="download-backup"]'),
  134. });
  135. const zipResult = await readZIP({ filePath: downloadPath });
  136. // Check file list
  137. expect(Array.isArray(zipResult.content.entries)).toBe(true);
  138. expect(zipResult.content.entries).toContain('Case_53125_10-19-22_AM/Case_53125_10-19-22_AM_case_data.csv');
  139. // Extract specific file
  140. const targetFile = 'Case_53125_10-19-22_AM/Case_53125_10-19-22_AM_case_data.csv';
  141. const zipWithExtraction = await readZIP({
  142. filePath: downloadPath,
  143. fileToExtract: targetFile,
  144. });
  145. // Access extracted file buffer
  146. const extractedFiles = zipWithExtraction.content.extractedFiles || {};
  147. const fileBuffer = extractedFiles[targetFile];
  148. expect(fileBuffer).toBeInstanceOf(Buffer);
  149. expect(fileBuffer?.length).toBeGreaterThan(0);
  150. });
  151. ```
  152. **Key Points**:
  153. - `content.entries` lists all files in archive
  154. - `fileToExtract` extracts specific files to Buffer
  155. - Validate archive structure
  156. - Read and parse individual files from ZIP
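Validating archive structure usually means checking several expected paths at once. The sketch below uses a hypothetical `missingEntries` helper; because this document shows entries both as plain path strings and as objects with a `name`, the helper accepts either form as an assumption rather than a statement about the library's actual type.

```typescript
// Hypothetical entry type: either a path string or an object carrying a `name`.
type ZipEntry = string | { name: string };

// Hypothetical helper: returns the expected paths that are absent from the archive.
function missingEntries(entries: ZipEntry[], expected: string[]): string[] {
  const names = new Set(entries.map((entry) => (typeof entry === 'string' ? entry : entry.name)));
  return expected.filter((path) => !names.has(path));
}

// Sample data standing in for `zipResult.content.entries`
const sampleEntries: ZipEntry[] = ['backup/data.csv', { name: 'backup/readme.txt' }];
const missing = missingEntries(sampleEntries, ['backup/data.csv', 'backup/manifest.json']);
```

Asserting `expect(missing).toEqual([])` reports every absent file in one failure message.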
  157. ### Example 5: API-Triggered Download
  158. **Context**: API endpoint returns file download (not UI click).
  159. **Implementation**:
  160. ```typescript
  161. test('should download via API', async ({ page, request }) => {
  162. const downloadPath = await handleDownload({
  163. page, // Still need page for download events
  164. downloadDir: DOWNLOAD_DIR,
  165. trigger: async () => {
  166. const response = await request.get('/api/export/csv', {
  167. headers: { Authorization: 'Bearer token' },
  168. });
  169. if (!response.ok()) {
  170. throw new Error(`Export failed: ${response.status()}`);
  171. }
  172. },
  173. });
  174. const { content } = await readCSV({ filePath: downloadPath });
  175. expect(content.data).toHaveLength(100);
  176. });
  177. ```
  178. **Key Points**:
  179. - `trigger` can be async API call
  180. - API must return `Content-Disposition` header
  181. - Still need `page` for download events
  182. - Works with authenticated endpoints
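Since the endpoint must send a `Content-Disposition` header, it can help to assert on the filename it advertises. The following is a simplified sketch with a hypothetical `filenameFromContentDisposition` function; it handles the common quoted, unquoted, and `filename*=UTF-8''...` forms, and is not a complete RFC 6266 parser.

```typescript
// Hypothetical helper: extracts the advertised filename, or null if none is present.
function filenameFromContentDisposition(header: string): string | null {
  // Extended form, e.g. filename*=UTF-8''report%20Q1.csv (takes precedence when present)
  const extended = /filename\*\s*=\s*([^']*)'[^']*'([^;]+)/i.exec(header);
  const encoded = extended?.[2];
  if (encoded) {
    try {
      return decodeURIComponent(encoded.trim());
    } catch {
      // Malformed percent-encoding: fall back to the plain form below
    }
  }
  // Plain form, quoted or unquoted: filename="export.csv" or filename=export.csv
  const plain = /filename\s*=\s*(?:"([^"]*)"|([^;]+))/i.exec(header);
  if (!plain) return null;
  const value = plain[1] ?? plain[2];
  return value ? value.trim() : null;
}

const quoted = filenameFromContentDisposition('attachment; filename="export.csv"');
const extendedName = filenameFromContentDisposition("attachment; filename*=UTF-8''report%20Q1.csv");
```

With Playwright's `APIResponse`, the header value would come from `response.headers()['content-disposition']`.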
  183. ### Example 6: Reading CSV from Buffer (ZIP extraction)
  184. **Context**: Read CSV content directly from a Buffer (e.g., extracted from ZIP).
  185. **Implementation**:
  186. ```typescript
  187. // Read from a Buffer (e.g., extracted from a ZIP)
  188. const zipResult = await readZIP({
  189. filePath: 'archive.zip',
  190. fileToExtract: 'data.csv',
  191. });
  192. const fileBuffer = zipResult.content.extractedFiles?.['data.csv'];
  193. const csvFromBuffer = await readCSV({ content: fileBuffer });
  194. // Read from a string
  195. const csvString = 'name,age\nJohn,30\nJane,25';
  196. const csvFromString = await readCSV({ content: csvString });
  197. const { data, headers } = csvFromString.content;
  198. expect(headers).toContain('name');
  199. expect(headers).toContain('age');
  200. ```
  201. ## API Reference
  202. ### CSV Reader Options
  203. | Option | Type | Default | Description |
  204. | -------------- | ------------------ | -------- | -------------------------------------- |
  205. | `filePath` | `string` | - | Path to CSV file (mutually exclusive with `content`) |
  206. | `content` | `string \| Buffer` | - | Direct content (mutually exclusive with `filePath`) |
  207. | `delimiter` | `string \| 'auto'` | `','` | Value separator, auto-detect if 'auto' |
  208. | `encoding` | `string` | `'utf8'` | File encoding |
  209. | `parseHeaders` | `boolean` | `true` | Use first row as headers |
  210. | `trim` | `boolean` | `true` | Trim whitespace from values |
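To illustrate what `delimiter: 'auto'` implies, here is one naive way delimiter detection can work: count candidate separators in the first line and pick the most frequent. This is an illustrative sketch only, not the library's algorithm; `detectDelimiter` and the candidate list are assumptions, and quoted fields are ignored.

```typescript
// Candidate separators to try; the list itself is an assumption.
const CANDIDATE_DELIMITERS = [',', ';', '\t', '|'] as const;

// Naive detection: the candidate that occurs most often in the first line wins.
// Falls back to ',' when no candidate appears at all.
function detectDelimiter(text: string): string {
  const firstLine = text.split(/\r?\n/, 1)[0] ?? '';
  let best: string = ',';
  let bestCount = 0;
  for (const candidate of CANDIDATE_DELIMITERS) {
    const count = firstLine.split(candidate).length - 1;
    if (count > bestCount) {
      best = candidate;
      bestCount = count;
    }
  }
  return best;
}

const detected = detectDelimiter('name;age\nJohn;30\nJane;25');
```

Passing an explicit `delimiter` remains the safer choice when the export format is known, since header text containing commas can mislead a heuristic like this one.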
  211. ### XLSX Reader Options
  212. | Option | Type | Description |
  213. | ----------- | -------- | ------------------------------ |
  214. | `filePath` | `string` | Path to XLSX file |
  215. | `sheetName` | `string` | Name of sheet to set as active |
  216. ### PDF Reader Options
  217. | Option | Type | Default | Description |
  218. | ------------ | --------- | ------- | --------------------------- |
  219. | `filePath` | `string` | - | Path to PDF file (required) |
  220. | `mergePages` | `boolean` | `true` | Merge text from all pages |
  221. | `maxPages` | `number` | - | Maximum pages to extract |
  222. | `debug` | `boolean` | `false` | Enable debug logging |
  223. ### ZIP Reader Options
  224. | Option | Type | Description |
  225. | --------------- | -------- | ---------------------------------- |
  226. | `filePath` | `string` | Path to ZIP file |
  227. | `fileToExtract` | `string` | Specific file to extract to Buffer |
  228. ### Return Values
  229. #### CSV Reader Return Value
  230. ```typescript
  231. {
  232. content: {
  233. data: Array<Array<string | number>>, // Parsed rows (excludes header row if parseHeaders: true)
  234. headers: string[] | null // Column headers (null if parseHeaders: false)
  235. }
  236. }
  237. ```
  238. #### XLSX Reader Return Value
  239. ```typescript
  240. {
  241. content: {
  242. worksheets: Array<{
  243. name: string; // Sheet name
  244. rows: Array<Array<any>>; // All rows including headers
  245. headers?: string[]; // First row as headers (if present)
  246. }>;
  247. }
  248. }
  249. ```
  250. #### PDF Reader Return Value
  251. ```typescript
  252. {
  253. content: string, // Extracted text (merged or per-page based on mergePages)
  254. pagesCount: number, // Total pages in PDF
  255. fileName?: string, // Original filename if available
  256. info?: Record<string, any> // PDF metadata (author, title, etc.)
  257. }
  258. ```
  259. > **Note**: When `mergePages: false`, `content` is an array of strings (one per page). When `maxPages` is set, only that many pages are extracted.
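Because `content` can be either a single string or an array of page strings, assertions are easier to write against a normalized form. A minimal sketch with hypothetical helper names (`toPageTexts`, `pagesContaining`):

```typescript
// Hypothetical helper: always returns one string per page.
// A merged string is treated as a single "page".
function toPageTexts(content: string | string[]): string[] {
  return Array.isArray(content) ? content : [content];
}

// Hypothetical helper: 1-based page numbers whose text contains the needle.
function pagesContaining(content: string | string[], needle: string): number[] {
  return toPageTexts(content)
    .map((text, index) => (text.includes(needle) ? index + 1 : 0))
    .filter((pageNumber) => pageNumber > 0);
}

const perPage = pagesContaining(['Intro', 'Total: 42', 'Total: 7'], 'Total');
const merged = pagesContaining('Intro Total: 42', 'Total');
```

For merged content the result can only ever be `[1]` or `[]`, so page-level assertions require `mergePages: false`.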
  260. #### ZIP Reader Return Value
  261. ```typescript
  262. {
  263. content: {
  264. entries: Array<{
  265. name: string, // File/directory path within ZIP
  266. size: number, // Uncompressed size in bytes
  267. isDirectory: boolean // True for directories
  268. }>,
  269. extractedFiles: Record<string, Buffer | string> // Extracted file contents by path
  270. }
  271. }
  272. ```
  273. > **Note**: When `fileToExtract` is specified, only that file appears in `extractedFiles`.
  274. ## Download Cleanup Pattern
  275. ```typescript
  276. test.afterEach(async () => {
  277. // Clean up downloaded files (`fs.remove` is provided by fs-extra, not node:fs)
  278. await fs.remove(DOWNLOAD_DIR);
  279. });
  280. ```
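If `fs-extra` is not available in the project, the same cleanup can be done with Node's built-in `fs` module. The sketch below uses a hypothetical `cleanDownloadDir` helper and demonstrates it against a throwaway temp directory rather than a real download folder.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Hypothetical helper: removes the directory and everything in it.
// `force: true` makes this a no-op when the directory does not exist.
function cleanDownloadDir(dir: string): void {
  fs.rmSync(dir, { recursive: true, force: true });
}

// Demonstration against a temporary directory
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'downloads-'));
fs.writeFileSync(path.join(demoDir, 'export.csv'), 'name,age\nJohn,30');
cleanDownloadDir(demoDir);
const stillExists = fs.existsSync(demoDir);
```

In a Playwright suite the helper would be called from `test.afterEach(() => cleanDownloadDir(DOWNLOAD_DIR))`.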
  281. ## Comparison with Vanilla Playwright
  282. Vanilla Playwright (real test) snippet:
  283. ```typescript
  284. // ~80 lines of boilerplate!
  285. const [download] = await Promise.all([page.waitForEvent('download'), page.getByTestId('download-button-CSV Export').click()]);
  286. const failure = await download.failure();
  287. expect(failure).toBeNull();
  288. const filePath = testInfo.outputPath(download.suggestedFilename());
  289. await download.saveAs(filePath);
  290. await expect
  291. .poll(
  292. async () => {
  293. try {
  294. await fs.access(filePath);
  295. return true;
  296. } catch {
  297. return false;
  298. }
  299. },
  300. { timeout: 5000, intervals: [100, 200, 500] },
  301. )
  302. .toBe(true);
  303. const csvContent = await fs.readFile(filePath, 'utf-8');
  304. const parseResult = parse(csvContent, {
  305. header: true,
  306. skipEmptyLines: true,
  307. dynamicTyping: true,
  308. transformHeader: (header: string) => header.trim(),
  309. });
  310. if (parseResult.errors.length > 0) {
  311. throw new Error(`CSV parsing errors: ${JSON.stringify(parseResult.errors)}`);
  312. }
  313. const data = parseResult.data as Array<Record<string, unknown>>;
  314. const headers = parseResult.meta.fields || [];
  315. ```
  316. With File Utils, the same flow becomes:
  317. ```typescript
  318. const downloadPath = await handleDownload({
  319. page,
  320. downloadDir: DOWNLOAD_DIR,
  321. trigger: () => page.getByTestId('download-button-text/csv').click(),
  322. });
  323. const { data, headers } = (await readCSV({ filePath: downloadPath })).content;
  324. ```
  325. ## Related Fragments
  326. - `overview.md` - Installation and imports
  327. - `api-request.md` - API-triggered downloads
  328. - `recurse.md` - Poll for file generation completion
  329. ## Anti-Patterns
  330. **DON'T leave downloads in place:**
  331. ```typescript
  332. test('creates file', async () => {
  333. await handleDownload({ ... })
  334. // File left in downloads folder
  335. })
  336. ```
  337. **DO clean up after tests:**
  338. ```typescript
  339. test.afterEach(async () => {
  340. await fs.remove(DOWNLOAD_DIR);
  341. });
  342. ```