File & Document Processing Nodes

Work with files, PDFs, and images in your workflows


Read uploads, generate documents, parse content, and transform images: everything you need to handle files in your workflows.


File Operations Node

When to use: When you need to read, write, copy, or delete files from disk or cloud storage.

The File Operations Node gives you filesystem and cloud storage access: read files for processing, write results to disk, copy or move files, and list directories.

Configuration

Configuration:
  operation: read | write | copy | move | delete | list
  path: "/data/{{ input.filename }}"
  cloud_storage: s3 | gcs | azure_blob  # Optional

File Reading Examples

Example 1: Read CSV File

When to use: Process uploaded CSV files for import, analysis, etc.

Incoming data:

{
  "filename": "customers.csv"
}

File Operations Configuration:

operation: read
path: "/uploads/{{ input.filename }}"

Output:

{
  "content": "name,email,phone\nAlice,alice@example.com,555-0101\nBob,bob@example.com,555-0102\n...",
  "file_size_bytes": 1024,
  "path": "/uploads/customers.csv"
}

Connect the output to a Data Parser node to turn the CSV string into an array of objects, then Loop over the array to process each row!
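To see what the Data Parser step produces, here is a minimal Python sketch using the standard csv module. It assumes the content field holds the whole file as one string with a header row; parse_csv_content is a hypothetical helper for illustration, not part of the node.

```python
import csv
import io


def parse_csv_content(content: str) -> list[dict[str, str]]:
    """Turn a CSV string (header row first) into a list of row dicts."""
    # DictReader uses the first row as the keys for every following row.
    reader = csv.DictReader(io.StringIO(content))
    return [dict(row) for row in reader]


content = "name,email,phone\nAlice,alice@example.com,555-0101\nBob,bob@example.com,555-0102\n"
rows = parse_csv_content(content)
for row in rows:
    # Each dict is roughly what a Loop node would hand to the steps inside it.
    print(row["name"], row["email"])
```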

Example 2: Read JSON Configuration File

operation: read
path: "/config/{{ input.config_name }}.json"

Perfect for loading environment-specific configs, feature flags, rules, etc.
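Because config_name comes from incoming data, it is worth making sure it cannot point outside the config folder. Here is a minimal Python sketch of the same read, assuming configs live as <name>.json in a single directory; load_config and the production.json file are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def load_config(config_dir: str, config_name: str) -> dict:
    """Load <config_dir>/<config_name>.json, refusing names that escape config_dir."""
    base = Path(config_dir).resolve()
    target = (base / f"{config_name}.json").resolve()
    # Names like "../secrets" resolve outside the config directory; reject them.
    if base not in target.parents:
        raise ValueError(f"config name escapes {base}: {config_name!r}")
    with target.open(encoding="utf-8") as f:
        return json.load(f)


# Usage with a throwaway directory standing in for /config.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "production.json").write_text('{"feature_x": true}', encoding="utf-8")
    config = load_config(tmp, "production")
    try:
        load_config(tmp, "../etc/passwd")
        traversal_blocked = False
    except ValueError:
        traversal_blocked = True
```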

Example 3: Read File from S3

operation: read
path: "s3://my-bucket/documents/{{ input.document_id }}.pdf"
cloud_storage: s3
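The node resolves the s3:// path for you. If you ever need the bucket and key separately (for example, to call a storage SDK from a Code node), this sketch shows how such a path splits. split_s3_uri and the DOC-42 ID are hypothetical.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/key/path' into ('bucket', 'key/path')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    # netloc is the bucket; the path keeps a leading slash that S3 keys don't use.
    # Note: keys containing '?' or '#' would be split off by urlparse and need extra handling.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_uri("s3://my-bucket/documents/DOC-42.pdf")
print(bucket, key)
```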

Example 4: Read Multiple Files (List + Loop)

operation: list
path: "/data/{{ input.date }}"
pattern: "*.json"

Output:

{
  "files": [
    "/data/2025-02-10/file1.json",
    "/data/2025-02-10/file2.json",
    "/data/2025-02-10/file3.json"
  ],
  "count": 3
}

Pass the files array to a Loop node to process each file!
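Conceptually, list with a pattern is a directory listing filtered by a glob. A Python sketch, assuming the pattern matches file names (not full paths) and that results are sorted; the node's actual ordering isn't specified on this page, and list_matching is hypothetical.

```python
import fnmatch
import os
import tempfile


def list_matching(directory: str, pattern: str) -> dict:
    """Mimic the list operation: full paths of files in `directory` matching `pattern`."""
    names = sorted(
        name
        for name in os.listdir(directory)
        # Keep regular files whose name matches the glob, e.g. "*.json".
        if fnmatch.fnmatch(name, pattern) and os.path.isfile(os.path.join(directory, name))
    )
    files = [os.path.join(directory, name) for name in names]
    return {"files": files, "count": len(files)}


# Usage with a throwaway directory standing in for /data/<date>.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["file1.json", "file2.json", "notes.txt"]:
        open(os.path.join(tmp, name), "w").close()
    result = list_matching(tmp, "*.json")
    names = [os.path.basename(p) for p in result["files"]]
```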


File Writing Examples

Example 1: Generate and Save Report

When to use: Create files for download, archiving, or sharing.

Incoming data:

{
  "report_id": "RPT-001",
  "content": "Sales Report for Q1 2025\n\nTotal: $500,000\n..."
}

File Operations Configuration:

operation: write
path: "/reports/{{ input.report_id }}.txt"
content: "{{ input.content }}"

Creates /reports/RPT-001.txt for download!

Example 2: Save JSON to File

operation: write
path: "/exports/{{ input.export_id }}.json"
content: "{{ input.data | json }}"

Example 3: Write to Cloud (S3)

operation: write
path: "s3://backups/{{ input.backup_date }}/data.json"
cloud_storage: s3
content: "{{ input.data | json }}"

Perfect for automated backups!
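If a downstream step might read a file while it is still being written, a common safeguard is to write to a temporary file and rename it into place. Whether the node does this internally isn't stated here; the sketch below shows the technique in Python, with write_json_atomic and EXP-7 as hypothetical names.

```python
import json
import os
import tempfile


def write_json_atomic(path: str, data) -> None:
    """Write `data` as JSON so readers never see a half-written file."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    # Write to a temp file in the same directory, then rename it over the target.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2)
        # Atomic on POSIX when source and target are on the same filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # don't leave the temp file behind on failure
        raise


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "exports", "EXP-7.json")
    write_json_atomic(target, {"rows": 3})
    with open(target, encoding="utf-8") as f:
        saved = json.load(f)
    leftovers = os.listdir(os.path.join(tmp, "exports"))
```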


File Movement Examples

Example 1: Move Processed File

operation: move
path: "/inbox/{{ input.filename }}"
destination: "/processed/{{ input.filename }}"

Once a file has been processed, move it to the processed folder so it isn't picked up again.

Example 2: Copy File

operation: copy
path: "/source/template.docx"
destination: "/output/{{ input.document_id }}.docx"

Creates a new file from a template!
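The move and copy operations map directly onto standard filesystem calls. A Python sketch, assuming local paths; archive_processed and copy_template are hypothetical helpers that mirror the two examples above.

```python
import os
import shutil
import tempfile


def archive_processed(inbox: str, processed: str, filename: str) -> str:
    """Move inbox/filename to processed/filename and return the new path."""
    os.makedirs(processed, exist_ok=True)
    return shutil.move(os.path.join(inbox, filename), os.path.join(processed, filename))


def copy_template(template_path: str, output_dir: str, document_id: str) -> str:
    """Copy a template to output_dir/<document_id>, keeping the template's extension."""
    os.makedirs(output_dir, exist_ok=True)
    ext = os.path.splitext(template_path)[1]
    # copy2 also preserves timestamps; the template itself stays in place.
    return shutil.copy2(template_path, os.path.join(output_dir, f"{document_id}{ext}"))


with tempfile.TemporaryDirectory() as tmp:
    inbox, processed = os.path.join(tmp, "inbox"), os.path.join(tmp, "processed")
    os.makedirs(inbox)
    with open(os.path.join(inbox, "customers.csv"), "w") as f:
        f.write("name\nAlice\n")
    moved_to = archive_processed(inbox, processed, "customers.csv")
    moved_ok = os.path.exists(moved_to) and not os.listdir(inbox)

    template = os.path.join(tmp, "template.docx")
    open(template, "wb").close()
    copied_to = copy_template(template, os.path.join(tmp, "output"), "DOC-1")
    copy_ok = os.path.basename(copied_to) == "DOC-1.docx" and os.path.exists(template)
```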


Common File Patterns

Pattern 1: Upload, Read, Process

Start (file upload)
  ↓
File Operations (read)
  ↓
Data Parser (parse CSV/JSON)
  ↓
Loop (for each record)
  ├─ Transform
  ├─ Validate
  ├─ Database Insert
  └─ Next
     ↓
     File Operations (move to processed folder)
     ↓
     Notification (email summary)

Pattern 2: Generate and Download

Start (user request)
  ↓
AI Agent or Code (generate content)
  ↓
File Operations (write to disk)
  ↓
HTTP Response (send file to user)

PDF Operations Node

When to use: When you need to generate, merge, split, parse, or encrypt PDFs.

The PDF Operations Node covers common PDF tasks: creating documents from templates, extracting text, merging multiple PDFs, and more.

Configuration

Configuration:
  operation: generate | parse | merge | split | extract | encrypt
  template: "invoice_template"
  data: "{{ input.invoice_data }}"

PDF Generation Examples

Example 1: Generate Invoice from Template

When to use: Create invoices, receipts, reports, contracts, or any other PDF document.

Incoming data:

{
  "invoice_id": "INV-001",
  "customer_name": "Alice Johnson",
  "items": [
    { "name": "Widget", "price": 25.00, "qty": 2 },
    { "name": "Gadget", "price": 50.00, "qty": 1 }
  ],
  "total": 100.00,
  "due_date": "2025-03-10"
}

PDF Operations Configuration:

operation: generate
template: "invoice_template"
data:
  invoice_id: "{{ input.invoice_id }}"
  customer_name: "{{ input.customer_name }}"
  items: "{{ input.items }}"
  total: "{{ input.total }}"
  due_date: "{{ input.due_date }}"
output_format: pdf

Output:

{
  "pdf_data": "base64-encoded-pdf",
  "filename": "invoice_INV-001.pdf"
}

Save to disk or email directly!
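The output carries the PDF as base64 text, so saving it means decoding first. A sketch assuming pdf_data is standard base64 (as the output above suggests); save_pdf is hypothetical, and the payload in the usage lines is a placeholder, not a real PDF.

```python
import base64
import tempfile
from pathlib import Path


def save_pdf(output: dict, directory: str) -> Path:
    """Decode the node's base64 `pdf_data` and write it under its `filename`."""
    pdf_bytes = base64.b64decode(output["pdf_data"], validate=True)
    # PDF files normally begin with the "%PDF-" header; catch bad payloads early.
    if not pdf_bytes.startswith(b"%PDF-"):
        raise ValueError("decoded data is not a PDF")
    target = Path(directory) / Path(output["filename"]).name  # drop any directory parts
    target.write_bytes(pdf_bytes)
    return target


# Placeholder payload: just a PDF header, not a complete document.
fake = {
    "pdf_data": base64.b64encode(b"%PDF-1.7\n...").decode("ascii"),
    "filename": "invoice_INV-001.pdf",
}
with tempfile.TemporaryDirectory() as tmp:
    saved = save_pdf(fake, tmp)
    saved_name, saved_head = saved.name, saved.read_bytes()[:5]
```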

Example 2: Generate from HTML

operation: generate
source_type: html
html_content: |
  <html>
    <h1>{{ invoice_id }}</h1>
    <p>Customer: {{ customer_name }}</p>
    <table>
      {% for item in items %}
      <tr><td>{{ item.name }}</td><td>${{ item.price }}</td></tr>
      {% endfor %}
    </table>
    <p>Total: ${{ total }}</p>
  </html>
output_format: pdf
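The {% for %} loop emits one table row per item. The stdlib sketch below builds the same markup in Python and HTML-escapes every value, since customer and item names come from input data; whether the node's template engine escapes automatically is not stated here. render_invoice_html is hypothetical, and formatting prices to two decimals is an added choice.

```python
from html import escape


def render_invoice_html(invoice_id: str, customer_name: str, items: list[dict], total: float) -> str:
    """Stdlib-only equivalent of the template above, escaping every text value."""
    # One <tr> per item, like the {% for item in items %} loop.
    rows = "\n".join(
        f"<tr><td>{escape(item['name'])}</td><td>${item['price']:.2f}</td></tr>"
        for item in items
    )
    return (
        "<html>\n"
        f"<h1>{escape(invoice_id)}</h1>\n"
        f"<p>Customer: {escape(customer_name)}</p>\n"
        f"<table>\n{rows}\n</table>\n"
        f"<p>Total: ${total:.2f}</p>\n"
        "</html>"
    )


items = [
    {"name": "Widget", "price": 25.00, "qty": 2},
    {"name": "Gadget", "price": 50.00, "qty": 1},
]
html_out = render_invoice_html("INV-001", "Alice Johnson", items, 100.00)
```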

PDF Parsing and Manipulation Examples

Example 1: Extract Text from PDF

When to use: Extract information from uploaded documents such as resumes, contracts, and forms.

Incoming data:

{
  "pdf_file": "resume.pdf"
}

PDF Operations Configuration:

operation: extract
source: "/uploads/{{ input.pdf_file }}"
extract_type: text

Output:

{
  "text": "Alice Johnson\nphone: 555-0101\nemail: alice@example.com\n...",
  "page_count": 1
}

Pass to AI Agent for extraction/analysis!
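For simple, predictable fields, a regular expression can pull values out of the extracted text before (or instead of) an AI Agent call. A sketch assuming text shaped like the output above; extract_contacts is hypothetical, and the patterns are deliberately narrow and would need widening for real resumes (international phone formats, for instance).

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\b\d{3}-\d{4}\b")  # only the short 555-0101 style used in this example


def extract_contacts(text: str) -> dict:
    """Pull the first email address and phone number out of extracted PDF text."""
    email = EMAIL_RE.search(text)
    phone = PHONE_RE.search(text)
    return {
        "email": email.group(0) if email else None,
        "phone": phone.group(0) if phone else None,
    }


text = "Alice Johnson\nphone: 555-0101\nemail: alice@example.com\n"
print(extract_contacts(text))
```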

Example 2: Parse Structured PDF

operation: parse
source: "/uploads/contract.pdf"
structure: form
extract_fields:
  - signature_field
  - date_field
  - amount_field

Example 3: Merge Multiple PDFs

When to use: Combine multiple PDFs, such as contracts, documents, or reports, into one file.

Incoming data:

{
  "pdfs": ["cover.pdf", "chapter1.pdf", "chapter2.pdf"]
}

PDF Operations Configuration:

operation: merge
files:
  - "/templates/{{ input.pdfs[0] }}"
  - "/templates/{{ input.pdfs[1] }}"
  - "/templates/{{ input.pdfs[2] }}"
output_format: pdf

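Creates one merged PDF! Note that indexing input.pdfs[0] through input.pdfs[2] only works when exactly three files are supplied.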
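When the number of input PDFs varies, one option is to build the full path list in an upstream Code node and pass it to files as a single value. That files accepts a templated list is an assumption, not something this page confirms. The list-building itself is simple; merge_file_list is hypothetical.

```python
def merge_file_list(pdfs: list[str], base_dir: str = "/templates") -> list[str]:
    """Build the `files` list for a merge, keeping input order, for any number of PDFs."""
    if not pdfs:
        raise ValueError("nothing to merge")
    return [f"{base_dir.rstrip('/')}/{name}" for name in pdfs]


files = merge_file_list(["cover.pdf", "chapter1.pdf", "chapter2.pdf"])
```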

Example 4: Split PDF

operation: split
source: "/documents/large_document.pdf"
split_strategy: by_page
output_path: "/documents/split/"

Writes one PDF per page to /documents/split/: split_1.pdf, split_2.pdf, and so on.

Example 5: Encrypt PDF

When to use: Protect sensitive documents with passwords.

operation: encrypt
source: "/documents/confidential.pdf"
owner_password: "{{ credentials.pdf_master_password }}"
user_password: "{{ input.user_password }}"
permissions:
  - no_print
  - no_copy

Common PDF Patterns

Pattern 1: Generate and Email

Start (order confirmed)
  ↓
PDF Operations (generate invoice)
  ↓
File Operations (save to S3)
  ↓
Email (send PDF to customer)

Pattern 2: Extract and Classify

Start (PDF upload)
  ↓
PDF Operations (extract text)
  ↓
AI Agent (classify document type)
  ↓
Switch (route by type)
  ├─ Contract → Legal team
  ├─ Invoice → Accounting
  └─ Report → Managers

Pattern 3: Multi-Step Document Creation

Start
  ├─ PDF Generate (cover page)
  ├─ PDF Generate (chapter 1)
  ├─ PDF Generate (chapter 2)
  └─ PDF Generate (appendix)
     ↓
     Merge (combine all)
     ↓
     Encrypt (add password)
     ↓
     Email

Image Processing Node

When to use: When you need to resize, crop, compress, rotate, watermark, or convert images.

The Image Processing Node handles common image transformations: optimizing for the web, creating thumbnails, adding branding, and more.

Configuration

Configuration:
  operation: resize | crop | compress | rotate | watermark | convert
  width: 800
  height: 600
  output_format: png | jpg | webp

Image Examples

Example 1: Generate Thumbnail

When to use: Create small previews for galleries, lists, etc.

Incoming data:

{
  "image_path": "/uploads/product-photo.jpg"
}

Image Processing Configuration:

operation: resize
source: "{{ input.image_path }}"
width: 300
height: 300
quality: 80
output_format: webp
output_path: "/thumbnails/product-photo-thumb.webp"

Creates an optimized thumbnail!
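This page doesn't say whether resize stretches, crops, or fits the image into 300x300. If you want to keep the original aspect ratio, one approach is to compute the largest size that fits the box and pass that as width and height. A sketch; fit_within is hypothetical and deliberately never upscales.

```python
def fit_within(src_w: int, src_h: int, max_w: int, max_h: int) -> tuple[int, int]:
    """Largest size that fits in max_w x max_h while keeping the source aspect ratio."""
    # The smaller ratio wins so both sides fit; capping at 1.0 avoids upscaling.
    scale = min(max_w / src_w, max_h / src_h, 1.0)
    return max(1, round(src_w * scale)), max(1, round(src_h * scale))


print(fit_within(4000, 3000, 300, 300))  # (300, 225)
```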

Example 2: Batch Image Optimization

Start (with array of images)
  ↓
Loop (for each image)
  ├─ Image Processing (resize to 1200px)
  ├─ Image Processing (compress, quality 75)
  ├─ Image Processing (convert to WebP)
  ├─ File Operations (save optimized version)
  └─ Next

Perfect for website optimization!

Example 3: Add Watermark

operation: watermark
source: "/photos/event-photo.jpg"
watermark_image: "/assets/logo-watermark.png"
position: bottom-right
opacity: 0.7
output_path: "/watermarked/event-photo.jpg"
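Placing a watermark at bottom-right comes down to simple arithmetic on the two image sizes. A sketch of that calculation, assuming a fixed pixel margin; watermark_origin, the 20 px margin, and the position names other than bottom-right are hypothetical.

```python
def watermark_origin(image_size, mark_size, position="bottom-right", margin=20):
    """Top-left (x, y) at which to paste the watermark for a given position."""
    img_w, img_h = image_size
    mark_w, mark_h = mark_size
    positions = {
        "top-left": (margin, margin),
        "top-right": (img_w - mark_w - margin, margin),
        "bottom-left": (margin, img_h - mark_h - margin),
        "bottom-right": (img_w - mark_w - margin, img_h - mark_h - margin),
        "center": ((img_w - mark_w) // 2, (img_h - mark_h) // 2),
    }
    x, y = positions[position]
    # A negative coordinate means the watermark (plus margin) doesn't fit.
    if x < 0 or y < 0:
        raise ValueError("watermark is larger than the image")
    return x, y


origin = watermark_origin((1920, 1080), (200, 100))
```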

Example 4: Crop Image

operation: crop
source: "/images/full-photo.jpg"
x: 100
y: 50
width: 500
height: 300
output_path: "/images/cropped.jpg"
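Here x and y are presumably the top-left corner of the crop, in pixels from the image's top-left. Validating the box against the source size before cropping avoids confusing failures. The sketch returns a (left, upper, right, lower) box, the convention Pillow's Image.crop uses; crop_box is hypothetical.

```python
def crop_box(image_w, image_h, x, y, width, height):
    """Return (left, upper, right, lower) for a crop, rejecting boxes outside the image."""
    if width <= 0 or height <= 0:
        raise ValueError("crop width and height must be positive")
    if x < 0 or y < 0 or x + width > image_w or y + height > image_h:
        raise ValueError(f"crop {width}x{height} at ({x},{y}) exceeds {image_w}x{image_h} image")
    return (x, y, x + width, y + height)


box = crop_box(1920, 1080, 100, 50, 500, 300)
try:
    crop_box(400, 300, 100, 50, 500, 300)  # 100 + 500 > 400, so this must fail
    out_of_bounds_rejected = False
except ValueError:
    out_of_bounds_rejected = True
```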

Example 5: Rotate and Convert Format

operation: rotate
source: "/images/photo.jpg"
degrees: 90
output_format: png
output_path: "/images/photo-rotated.png"

Image Processing Patterns

Pattern 1: Batch Image Processing

Start (photos uploaded)
  ↓
Loop (for each photo)
  ├─ Image Processing (resize to standard size)
  ├─ Image Processing (compress)
  ├─ Image Processing (convert to WebP)
  ├─ File Operations (save)
  └─ Database (record new path)
     ↓
     Notification (done)

Pattern 2: Generate Social Media Assets

Start (user uploads photo)
  ├─ Image Processing (resize to 1080x1080 for Instagram)
  ├─ Image Processing (resize to 1200x630 for Facebook)
  ├─ Image Processing (resize to 1024x512 for Twitter)
  ├─ File Operations (save all versions)
  ↓
  Email (download links)

Pattern 3: Smart Image with OCR

Start (document image)
  ↓
Image Processing (optimize contrast)
  ↓
File Operations (save processed image)
  ↓
AI Agent (read text via OCR)
  ↓
Data Transform (extract structured data)
  ↓
Database (store extracted info)

File Processing Best Practices

Security

  1. Validate file types - check the type detected from the file's content, not just the extension
  2. Scan for malware - use third-party scanning service before processing
  3. Set file size limits - prevent DoS via huge uploads
  4. Use encryption for sensitive files
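For item 1, one way to check content rather than the extension is to compare the file's first bytes against known signatures. A minimal sketch covering PDF, PNG, and JPEG only; the 10 MiB limit and the function names are hypothetical, and this is not a substitute for malware scanning.

```python
from typing import Optional

MAGIC_NUMBERS = {
    b"%PDF-": "application/pdf",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
}
MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # hypothetical 10 MiB limit


def sniff_type(data: bytes) -> Optional[str]:
    """Guess the MIME type from leading bytes; None if no known signature matches."""
    for magic, mime in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return mime
    return None


def validate_upload(filename: str, data: bytes, allowed: set) -> str:
    """Reject oversized files and files whose content isn't an allowed type."""
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError(f"{filename}: file too large")
    mime = sniff_type(data)
    if mime not in allowed:
        raise ValueError(f"{filename}: content is {mime or 'unknown'}, not an allowed type")
    return mime


pdf_ok = validate_upload("report.pdf", b"%PDF-1.7 ...", {"application/pdf"})
try:
    # A PDF renamed to .jpg still sniffs as a PDF, so an image-only rule rejects it.
    validate_upload("photo.jpg", b"%PDF-1.7 ...", {"image/jpeg", "image/png"})
    disguised_rejected = False
except ValueError:
    disguised_rejected = True
```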

Performance

  1. Process in background - don't block waiting for file operations
  2. Optimize images - compress before storing (can save 60-80% space, depending on format and source images)
  3. Use cloud storage - S3/GCS for large files instead of local disk
  4. Cache PDFs - don't regenerate identical PDFs
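For item 4, a cache keyed on the template name and the exact input data lets identical requests reuse an earlier PDF. A sketch with an in-memory dict standing in for real storage; pdf_cache_key, generate_pdf_cached, and the render callback are hypothetical.

```python
import hashlib
import json

_cache: dict[str, bytes] = {}


def pdf_cache_key(template: str, data: dict) -> str:
    """Stable key for a (template, data) pair; identical inputs give identical keys."""
    # sort_keys and fixed separators make the JSON independent of dict ordering.
    canonical = json.dumps({"template": template, "data": data}, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def generate_pdf_cached(template, data, render):
    """Return a cached PDF if one exists, otherwise render and store it."""
    key = pdf_cache_key(template, data)
    if key not in _cache:
        _cache[key] = render(template, data)  # only render on a cache miss
    return _cache[key]


calls = []


def fake_render(template, data):
    calls.append(template)
    return b"%PDF-placeholder"


generate_pdf_cached("invoice_template", {"invoice_id": "INV-001", "total": 100.0}, fake_render)
generate_pdf_cached("invoice_template", {"total": 100.0, "invoice_id": "INV-001"}, fake_render)
```

If templates can change, include a template version in the key so edits invalidate old entries.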

Reliability

  1. Add retry logic - file operations can fail (disk full, permissions)
  2. Validate output - check file was created/readable
  3. Log operations - track what files were created/modified
  4. Clean up temp files - don't leave processed files lying around
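For item 1, retries usually use exponential backoff so a struggling disk or bucket isn't hammered. A sketch in Python; with_retries and its defaults are hypothetical.

```python
import random
import time


def with_retries(operation, attempts=3, base_delay=0.5, retry_on=(OSError,)):
    """Call operation(); on a listed error, wait and retry with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # base_delay, 2x, 4x, ... plus a little jitter so parallel runs don't retry in lockstep.
            time.sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay / 10))


attempts_made = []


def flaky_write():
    attempts_made.append(1)
    if len(attempts_made) < 3:
        raise OSError("disk busy")
    return "written"


result = with_retries(flaky_write, attempts=3, base_delay=0.01)
```

Retrying only makes sense for transient errors. A permission error (a subclass of OSError) won't fix itself, so narrow retry_on to what your storage actually raises.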

Next Steps