Evaluating outputs
Contents
Evaluation means defining what a good answer looks like before trusting the system. Criteria may include factual support, completeness, clarity, structure, safety, and usefulness for the target audience. A "generator + reviewer" loop is the simplest eval pattern. A good rubric has orthogonal criteria — accuracy, completeness, support-by-source, tone — and each is scored independently. Summing them gives a noisy signal; looking at per-criterion failure modes gives a useful one.
Example: A methods summary is scored against the source on {accuracy, missing details, jargon level, supported-by-text}. If the score is below 3/4, the reviewer sends it back with feedback.
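The generator + reviewer loop described above can be sketched as follows. This is a minimal illustration, not a finished implementation: `generate` and `review` are placeholder stubs standing in for two separate model calls (the reviewer ideally a stronger model than the generator), and the criteria names mirror the example rubric.

```python
# Sketch of a generator + reviewer eval loop with a retry cap.
# generate() and review() are placeholder stubs; a real version
# would prompt two different LLMs.

CRITERIA = ["accuracy", "missing_details", "jargon_level", "supported_by_text"]
MAX_RETRIES = 3  # retry cap so the loop always terminates


def generate(source: str, feedback: str = "") -> str:
    # Stub generator: a real version would prompt an LLM,
    # including any reviewer feedback from the previous attempt.
    return f"Summary of: {source[:40]}"


def review(summary: str, source: str) -> dict:
    # Stub reviewer: scores each rubric criterion independently
    # (pass/fail). A real version would prompt a stronger model
    # with the rubric and the source text.
    return {c: True for c in CRITERIA}


def eval_loop(source: str, threshold: int = 3):
    log = []  # per-run, per-criterion scores for later failure analysis
    feedback = ""
    summary = ""
    for _attempt in range(MAX_RETRIES):
        summary = generate(source, feedback)
        scores = review(summary, source)
        log.append(scores)
        if sum(scores.values()) >= threshold:
            return summary, log
        # Send back the *failing criteria* as feedback,
        # not just a total score.
        feedback = "Failed: " + ", ".join(c for c, ok in scores.items() if not ok)
    return summary, log
```

Because each criterion is logged separately, the `log` list supports the per-criterion failure analysis the rubric is designed for, rather than a single summed score.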
Did you understand this?
- [ ] Your rubric has ≥3 orthogonal criteria written BEFORE any output
- [ ] Reviewer uses a different (stronger) model than the generator
- [ ] The loop has a retry cap
- [ ] 5 runs are logged with per-criterion scores
- [ ] You can explain in one sentence what you learned that you would tell a labmate tomorrow
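Once several runs are logged with per-criterion scores, the useful signal comes from counting failures per criterion rather than summing scores. A minimal sketch, using made-up illustrative logs (each run is a dict of criterion to pass/fail):

```python
# Aggregate per-criterion failure rates across logged runs.
# The run data below is invented for illustration only.
from collections import Counter

runs = [
    {"accuracy": True,  "completeness": False, "supported_by_source": True,  "tone": True},
    {"accuracy": True,  "completeness": False, "supported_by_source": False, "tone": True},
    {"accuracy": False, "completeness": False, "supported_by_source": True,  "tone": True},
    {"accuracy": True,  "completeness": True,  "supported_by_source": True,  "tone": True},
    {"accuracy": True,  "completeness": False, "supported_by_source": True,  "tone": True},
]

# Count how often each criterion failed across all runs.
failures = Counter(c for run in runs for c, ok in run.items() if not ok)
for criterion, n in failures.most_common():
    print(f"{criterion}: failed {n}/{len(runs)} runs")
```

Here the summed scores look similar across runs, but the breakdown shows one dominant failure mode (completeness), which tells you exactly what to fix in the generator prompt.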
Try it out directly
These links open the running demos at n8n.32dots.de and dify.32dots.de.
Ask the tutor about this card
Socratic: the tutor answers with guiding questions instead of ready-made answers; you work out the solution yourself.
Ask your first question about this card below.