本文研究一下基于 n8n (https://docs.n8n.io/) 搭建一个生产级别的知识库问答系统（通常称为 RAG - Retrieval Augmented Generation）。这个过程需要考虑数据的持续导入（Ingestion）、向量化质量、检索准确度以及会话记忆。官方参考教程见这里。

为了实现生产级别，需要将系统拆分为两个独立的核心工作流（Workflows）：

数据入库工作流 (ETL Pipeline): 负责将文档（PDF, 文本, 网页）读取、清洗、分块（Chunking）、向量化并存入向量数据库。
问答检索工作流 (QA Chat Pipeline): 负责接收用户问题，在向量库中检索相关片段，结合上下文记忆，通过 LLM 生成回答。

准备工作

在开始之前，需要准备向量数据库: 生产环境推荐使用 Pinecone (云原生，易维护) 或 Qdrant (性能强，可自托管)或。本教程以 Pinecone 为例：在 Pinecone 控制台创建一个 Index，Dimension 设置为 1536 (对应 OpenAI text-embedding-3-small/large)。

第一步：搭建数据入库工作流

这个工作流的目标是将知识（如公司手册、技术文档）变成机器可读的向量。

核心节点逻辑：

Trigger: 手动触发或 Webhook 触发（例如当文件上传到 Google Drive 时）。
File Loader: 读取文件（使用 Default Data Loader 或特定的 PDF Loader）。
Text Splitter: 关键步骤。生产环境不能把整本书塞进去。使用 Recursive Character Text Splitter。
- Chunk Size: 建议 500-1000 tokens。
- Chunk Overlap: 建议 100-200 tokens（防止上下文在切分处丢失）。
Embeddings: 使用 OpenAI Embeddings 节点。
Vector Store: 使用 Pinecone 节点，操作选择 “Upsert” (插入/更新)。

工作流 1 JSON (数据入库)

可以将以下 JSON 代码复制并粘贴到 n8n 编辑器中：

{
  "name": "My workflow 9",
  "nodes": [
    {
      "parameters": {
        "mode": "insert",
        "pineconeIndex": {
          "__rl": true,
          "value": "demo-index",
          "mode": "list",
          "cachedResultName": "demo-index"
        },
        "options": {}
      },
      "id": "142a7a1a-7edc-41ab-8761-a8f7a66cfedb",
      "name": "Pinecone Vector Store",
      "type": "@n8n/n8n-nodes-langchain.vectorStorePinecone",
      "typeVersion": 1,
      "position": [
        288,
        128
      ],
      "credentials": {
        "pineconeApi": {
          "id": "CmVFOAd6rUcWi5p5",
          "name": "PineconeApi account"
        }
      }
    },
    {
      "parameters": {
        "model": "text-embedding-3-small",
        "options": {}
      },
      "id": "13493271-4db9-4789-82c3-ab7de39e4aef",
      "name": "OpenAI Embeddings",
      "type": "@n8n/n8n-nodes-langchain.embeddingsOpenAi",
      "typeVersion": 1,
      "position": [
        368,
        304
      ],
      "credentials": {
        "openAiApi": {
          "id": "xfOE2dYd3ERD65Dr",
          "name": "OpenAi account"
        }
      }
    },
    {
      "parameters": {
        "dataType": "binary",
        "options": {}
      },
      "id": "4ae3651c-49d1-48ae-9063-814ff8b03ccc",
      "name": "Default Data Loader",
      "type": "@n8n/n8n-nodes-langchain.documentDefaultDataLoader",
      "typeVersion": 1,
      "position": [
        608,
        240
      ]
    },
    {
      "parameters": {
        "chunkOverlap": 200,
        "options": {}
      },
      "type": "@n8n/n8n-nodes-langchain.textSplitterRecursiveCharacterTextSplitter",
      "typeVersion": 1,
      "position": [
        720,
        352
      ],
      "id": "2f39656b-2c2c-419b-a218-9bcb9db198d7",
      "name": "Recursive Character Text Splitter"
    },
    {
      "parameters": {
        "formTitle": "Upload",
        "formFields": {
          "values": [
            {
              "fieldLabel": "Upload",
              "fieldType": "file"
            }
          ]
        },
        "options": {}
      },
      "type": "n8n-nodes-base.formTrigger",
      "typeVersion": 2.3,
      "position": [
        -16,
        144
      ],
      "id": "7a1baaca-e371-4877-896f-4d0cadbea6ae",
      "name": "On form submission",
      "webhookId": "5715f7b8-b54c-4fbd-a5ad-1821ab1d2d12"
    }
  ],
  "pinData": {},
  "connections": {
    "OpenAI Embeddings": {
      "ai_embedding": [
        [
          {
            "node": "Pinecone Vector Store",
            "type": "ai_embedding",
            "index": 0
          }
        ]
      ]
    },
    "Default Data Loader": {
      "ai_document": [
        [
          {
            "node": "Pinecone Vector Store",
            "type": "ai_document",
            "index": 0
          }
        ]
      ]
    },
    "Recursive Character Text Splitter": {
      "ai_textSplitter": [
        [
          {
            "node": "Default Data Loader",
            "type": "ai_textSplitter",
            "index": 0
          }
        ]
      ]
    },
    "On form submission": {
      "main": [
        [
          {
            "node": "Pinecone Vector Store",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  },
  "active": false,
  "settings": {
    "executionOrder": "v1"
  },
  "versionId": "c4f2b8c5-9746-4d4f-91cd-741aa8a74c30",
  "meta": {
    "templateCredsSetupCompleted": true,
    "instanceId": "24e8ce0d76b3b71be4393c3e9c1e6f0c2fe2f9b84660c2d47973ae9ce115bd19"
  },
  "id": "XcGAC5iFRXoNBH8a",
  "tags": []
}

第二步：搭建问答检索工作流

这是用户交互的核心。在生产环境中，需要处理“内存”（记住之前的对话）和“系统提示词”（控制 AI 的行为）。

系统提示词:

在 AI agent 的参数中，务必设置 System Message，例如：

“你是一个专业的企业知识库助手。请仅根据以下提供的上下文内容回答用户的问题。如果上下文中没有答案，请直接回答‘我无法在知识库中找到相关信息’，不要编造答案。”

工作流 2 JSON (知识库问答)

{
  "name": "My workflow 10",
  "nodes": [
    {
      "parameters": {
        "options": {}
      },
      "id": "63f18a05-7a6c-49b6-9e32-3005d23311f7",
      "name": "When chat message received",
      "type": "@n8n/n8n-nodes-langchain.chatTrigger",
      "typeVersion": 1.1,
      "position": [
        -448,
        48
      ],
      "webhookId": "your-webhook-id"
    },
    {
      "parameters": {
        "model": "gpt-4o",
        "options": {
          "temperature": 0.1
        }
      },
      "id": "341fe7a9-e97d-493d-a8e9-94008bd51d07",
      "name": "OpenAI Chat Model",
      "type": "@n8n/n8n-nodes-langchain.lmChatOpenAi",
      "typeVersion": 1,
      "position": [
        -272,
        288
      ],
      "credentials": {
        "openAiApi": {
          "id": "xfOE2dYd3ERD65Dr",
          "name": "OpenAi account"
        }
      }
    },
    {
      "parameters": {
        "model": "text-embedding-3-small",
        "options": {}
      },
      "id": "db53abd0-dfb2-4deb-8f5c-381c1829637b",
      "name": "OpenAI Embeddings",
      "type": "@n8n/n8n-nodes-langchain.embeddingsOpenAi",
      "typeVersion": 1,
      "position": [
        -16,
        416
      ],
      "credentials": {
        "openAiApi": {
          "id": "xfOE2dYd3ERD65Dr",
          "name": "OpenAi account"
        }
      }
    },
    {
      "parameters": {
        "options": {
          "systemMessage": "你是一个专业的企业知识库助手。请从知识库中获取信息回答用户的问题。如果知识库中没有答案，请直接回答‘我无法在知识库中找到相关信息’，不要编造答案。"
        }
      },
      "type": "@n8n/n8n-nodes-langchain.agent",
      "typeVersion": 2.2,
      "position": [
        -208,
        80
      ],
      "id": "813598a1-f093-4b04-b757-117ace3bc014",
      "name": "AI Agent"
    },
    {
      "parameters": {
        "mode": "retrieve-as-tool",
        "toolDescription": "work with your data in Pinecone vector store",
        "pineconeIndex": {
          "__rl": true,
          "value": "demo-index",
          "mode": "list",
          "cachedResultName": "demo-index"
        },
        "options": {}
      },
      "type": "@n8n/n8n-nodes-langchain.vectorStorePinecone",
      "typeVersion": 1.3,
      "position": [
        32,
        224
      ],
      "id": "e23d5f78-5d1f-4f03-9e86-c5ca6035bfc6",
      "name": "Pinecone Vector Store",
      "credentials": {
        "pineconeApi": {
          "id": "CmVFOAd6rUcWi5p5",
          "name": "PineconeApi account"
        }
      }
    },
    {
      "parameters": {},
      "type": "@n8n/n8n-nodes-langchain.memoryBufferWindow",
      "typeVersion": 1.3,
      "position": [
        -160,
        336
      ],
      "id": "7a7b4353-3c45-4a79-9944-6353f62c1b04",
      "name": "Simple Memory"
    }
  ],
  "pinData": {},
  "connections": {
    "OpenAI Embeddings": {
      "ai_embedding": [
        [
          {
            "node": "Pinecone Vector Store",
            "type": "ai_embedding",
            "index": 0
          }
        ]
      ]
    },
    "OpenAI Chat Model": {
      "ai_languageModel": [
        [
          {
            "node": "AI Agent",
            "type": "ai_languageModel",
            "index": 0
          }
        ]
      ]
    },
    "When chat message received": {
      "main": [
        [
          {
            "node": "AI Agent",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Pinecone Vector Store": {
      "ai_tool": [
        [
          {
            "node": "AI Agent",
            "type": "ai_tool",
            "index": 0
          }
        ]
      ]
    },
    "Simple Memory": {
      "ai_memory": [
        [
          {
            "node": "AI Agent",
            "type": "ai_memory",
            "index": 0
          }
        ]
      ]
    }
  },
  "active": false,
  "settings": {
    "executionOrder": "v1"
  },
  "versionId": "0caef26b-41ef-4244-86e2-15ea3ba34853",
  "meta": {
    "templateCredsSetupCompleted": true,
    "instanceId": "24e8ce0d76b3b71be4393c3e9c1e6f0c2fe2f9b84660c2d47973ae9ce115bd19"
  },
  "id": "b8PK3JswxoacMWk5",
  "tags": []
}

第三步：生产环境的关键配置

要达到“生产级别”，仅仅跑通是不够的，必须注意以下几点：

元数据过滤 (Metadata Filtering):
- 场景: 你的知识库包含“销售部”和“技术部”的文档。
- 做法: 在入库工作流中，加载文档时，给每个文档打上 Tag (如 { "department": "sales" })。
- 检索: 在 QA 工作流 的 Pinecone 节点中，开启 Metadata Filter，根据用户身份只检索对应部门的文档，防止权限泄露。
Top K 与阈值 (Top K & Threshold):
- 在检索节点的配置中，Top K 决定了抓取多少个片段给 AI。通常设置为 3-5。
- 如果支持，设置相似度阈值（Score Threshold）。如果检索出的内容相似度低于 0.7，则认为无关，直接不给 AI，避免 AI 强行胡说。
错误处理 (Error Handling):
- 在 n8n 中，为关键节点（如 OpenAI 请求、数据库请求）设置 “Error Output” 或者在工作流设置中配置 “Error Workflow”，以便在 API 挂掉时通知管理员，而不是让前端一直转圈。
LLM 参数微调:
- Temperature: 设为 0 或 0.1。知识库问答要求严谨，不需要 AI 发挥创造力。
文档更新策略:
- 考虑如何更新文档。最简单的生产策略是：当文档更新时，先根据文档 ID 在 Pinecone 中删除旧的向量，然后再插入新的向量，防止重复。

数字旗手

使用n8n搭建生产级别的知识库问答系统