Scraping web pages from n8n with a self-hosted Firecrawl

Docker environment

services:
  n8n:
    image: n8nio/n8n
    container_name: n8n
    restart: unless-stopped
    ports:
      - "5678:5678"
    environment:
      - N8N_BASIC_AUTH_ACTIVE=true
      - N8N_BASIC_AUTH_USER=admin
      - N8N_BASIC_AUTH_PASSWORD=your_password
      - N8N_HOST=localhost
    volumes:
      - D:\Software\n8n_firecrawl\n8n_data:/home/node/.n8n
      - D:\Software\n8n_firecrawl\LocalFile:/LocalFile
    networks:
      - n8n_network

  firecrawl:
    image: ghcr.io/firecrawl/firecrawl:latest
    container_name: firecrawl
    restart: unless-stopped
    depends_on:
      - redis
      - nuq-postgres
      - playwright-service
    ports:
      - "3000:3002"
    environment:
      - HOST=0.0.0.0
      - PORT=3002
      - REDIS_URL=redis://redis:6379
      - REDIS_RATE_LIMIT_URL=redis://redis:6379
      - PLAYWRIGHT_MICROSERVICE_URL=http://playwright-service:3000/scrape
      - NUQ_DATABASE_URL=postgres://postgres:postgres@host.docker.internal:5432/postgres
      - USE_DB_AUTHENTICATION=false
      - OPENAI_API_KEY=123456789
    extra_hosts:
      - "host.docker.internal:host-gateway"
    networks:
      - n8n_network

  redis:
    image: redis:7-alpine
    container_name: redis
    restart: unless-stopped
    networks:
      - n8n_network

  nuq-postgres:
    image: ghcr.io/firecrawl/nuq-postgres:latest
    container_name: nuq-postgres
    restart: unless-stopped
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
      - POSTGRES_DB=postgres
    ports:
      - "5432:5432"
    networks:
      - n8n_network

  playwright-service:
    image: ghcr.io/firecrawl/playwright-service:latest
    container_name: playwright-service
    restart: unless-stopped
    environment:
      - PORT=3000
    ports:
      - "3001:3000"
    networks:
      - n8n_network

volumes:
  n8n_data:

networks:
  n8n_network:
    driver: bridge
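
The "3000:3002" port mapping above means Firecrawl is reached at a different address depending on where the caller runs. A minimal sketch of that rule, assuming the compose file is used unchanged; the helper name firecrawlBaseUrl is hypothetical and not part of n8n or Firecrawl:

```javascript
// Hypothetical helper: choose the Firecrawl base URL for the caller's location.
// Assumes the compose file above: service name "firecrawl", container port 3002,
// published on host port 3000.
function firecrawlBaseUrl(insideComposeNetwork) {
  return insideComposeNetwork
    ? 'http://firecrawl:3002'   // n8n and other containers on n8n_network
    : 'http://localhost:3000';  // a browser or script on the Docker host
}

console.log(firecrawlBaseUrl(true) + '/v1/scrape');
console.log(firecrawlBaseUrl(false) + '/v1/scrape');
```

Inside n8n the first form applies, because the request is sent from the n8n container rather than from the host.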

n8n workflow

HTTP Request node

  • URL: http://firecrawl:3002/v1/scrape (requests from inside the container network use port 3002)
  • Method: POST
  • Headers: Content-Type: application/json
  • Body (JSON): {"url":"https://mechdoglab.cn/note/539.html","formats":["markdown"]}
  • Response: select JSON; later nodes read the content from data.markdown
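
The same request can be expressed in plain JavaScript, which is handy for testing the endpoint outside n8n. This is only a sketch: the function names buildScrapeRequest and extractMarkdown are hypothetical, and the response shape (data.markdown) is taken from the settings above rather than from the Firecrawl reference.

```javascript
// Hypothetical helpers mirroring the HTTP Request node settings above.
function buildScrapeRequest(pageUrl) {
  return {
    endpoint: 'http://firecrawl:3002/v1/scrape',
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ url: pageUrl, formats: ['markdown'] }),
    },
  };
}

// Pull the markdown out of a parsed response; returns '' when it is absent.
function extractMarkdown(responseJson) {
  return responseJson?.data?.markdown ?? '';
}

// Usage (needs Node 18+ for the global fetch and a reachable Firecrawl):
// const { endpoint, init } = buildScrapeRequest('https://mechdoglab.cn/note/539.html');
// const md = extractMarkdown(await (await fetch(endpoint, init)).json());
```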

Function node (converts the markdown to binary data)

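// Read the markdown returned by Firecrawl (empty string if it is missing).
const md = $json.data?.markdown || '';
// n8n expects binary payloads as base64 strings.
const data = Buffer.from(md, 'utf8').toString('base64');
// Each returned item carries a json property alongside the binary data.
return [{ json: {}, binary: { data: { data, fileName: '539.md', mimeType: 'text/markdown' } } }];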
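
The conversion can be checked outside n8n. The sketch below assumes the binary property carries base64 text, which is the form n8n documents for binary data; the function name toBinaryItem is hypothetical.

```javascript
// Hypothetical standalone version of the Function node logic.
function toBinaryItem(markdown, fileName) {
  return {
    json: {},
    binary: {
      data: {
        data: Buffer.from(markdown, 'utf8').toString('base64'),
        fileName,
        mimeType: 'text/markdown',
      },
    },
  };
}

// Decoding the payload gives back the original text, including non-ASCII characters.
const item = toBinaryItem('# 标题\n\nhello', '539.md');
const decoded = Buffer.from(item.binary.data.data, 'base64').toString('utf8');
console.log(decoded === '# 标题\n\nhello');
```

The property name data under binary is what the Write Binary File node is later told to read.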

Write Binary File node

- Property: data
- Path: /LocalFile/firecrawl/539.md (make sure the directory exists)
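
If the directory is not created by hand, it can be created from code. A minimal sketch, assuming it runs as a standalone Node.js script inside the n8n container (n8n code nodes only allow built-in modules such as fs when NODE_FUNCTION_ALLOW_BUILTIN permits them); the helper name ensureParentDir is hypothetical.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: create the parent directory of a target file if missing.
function ensureParentDir(filePath) {
  const dir = path.dirname(filePath);
  fs.mkdirSync(dir, { recursive: true }); // no error if it already exists
  return dir;
}

// Usage inside the container:
// ensureParentDir('/LocalFile/firecrawl/539.md');
```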

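Configuration steps

Prepare the directory

  • On the host, create D:\Software\n8n_firecrawl\LocalFile\firecrawl if it does not exist. The compose file mounts D:\Software\n8n_firecrawl\LocalFile at /LocalFile inside the n8n container.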

Create the workflow

  • Add a Manual Trigger node
  • Add an HTTP Request node
    • Method: POST
    • URL: http://firecrawl:3002/v1/scrape (port 3002 when called from inside the container network)
    • Response: select JSON
    • Enable Send Body as JSON (or set jsonParameters: true)
    • Body: {"url":"https://mechdoglab.cn/note/539.html","formats":["markdown"]}
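  • Add a Function node with this code:
      // Read the markdown returned by Firecrawl (empty string if it is missing).
      const md = $json.data?.markdown || '';
      // n8n expects binary payloads as base64 strings.
      const data = Buffer.from(md, 'utf8').toString('base64');
      // Each returned item carries a json property alongside the binary data.
      return [{ json: {}, binary: { data: { data, fileName: '539.md', mimeType: 'text/markdown' } } }];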
  • Add a Write Binary File node:
    • Binary Property: data
    • File Path: /LocalFile/firecrawl/539.md
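
The file name 539.md is hard-coded in the steps above. To scrape other pages, the name could be derived from the URL instead. A sketch only: the helper fileNameFromUrl is hypothetical and is not part of the shared workflow.

```javascript
// Hypothetical helper: turn a page URL into a markdown file name.
function fileNameFromUrl(pageUrl) {
  const { pathname } = new URL(pageUrl);
  // Last non-empty path segment, or "index" for a bare domain.
  const last = pathname.split('/').filter(Boolean).pop() || 'index';
  // Swap the original extension (if any) for .md.
  return last.replace(/\.[^.]+$/, '') + '.md';
}

console.log(fileNameFromUrl('https://mechdoglab.cn/note/539.html')); // 539.md
```

The result can be used both as fileName in the Function node and as the last segment of the Write Binary File path.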

Workflow download

Link: https://pan.baidu.com/s/1N80f1lMoiBulhDty22mH8w?pwd=w1qm

Extraction code: w1qm