⚗️ 32dots Learn is an experimental prototype; content and features may change at short notice.
Card 19 · Chapter safety

Evaluating outputs

n8n · medium · 55 min

Content

Evaluation means defining what a good answer looks like before trusting the system. Criteria may include factual support, completeness, clarity, structure, safety, and usefulness for the target audience. A "generator + reviewer" loop is the simplest eval pattern. A good rubric has orthogonal criteria — accuracy, completeness, support-by-source, tone — and each is scored independently. Summing them gives a noisy signal; looking at per-criterion failure modes gives a useful one.

Example: A methods summary is scored against the source on {accuracy, missing details, jargon level, supported-by-text}. If fewer than 3 of the 4 criteria pass, the reviewer sends it back with feedback.
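The generator + reviewer loop above can be sketched in a few lines. This is a minimal illustration, not a real pipeline: `generate` and `review` are hypothetical stubs standing in for two model calls (in practice the reviewer would be a different, stronger model), and the stub reviewer is hard-wired to fail two criteria on the first draft so the retry path is visible.

```python
# Sketch of a generator + reviewer eval loop with a retry cap.
# `generate` and `review` are placeholder stubs, not real model calls.

RUBRIC = ["accuracy", "missing_details", "jargon_level", "supported_by_text"]
MAX_RETRIES = 3      # retry cap: the loop always terminates
PASS_THRESHOLD = 3   # "score < 3/4" sends the draft back

def generate(source, feedback=None):
    # Placeholder generator: a real one would call an LLM,
    # passing the reviewer's feedback on retries.
    return f"summary of {source}" + (" (revised)" if feedback else "")

def review(draft, source):
    # Placeholder reviewer: scores each criterion independently.
    # Stub behavior: first draft fails two criteria, revised drafts pass all.
    revised = "(revised)" in draft
    failing = set() if revised else {"accuracy", "supported_by_text"}
    return {c: c not in failing for c in RUBRIC}

def eval_loop(source):
    feedback = None
    for attempt in range(1, MAX_RETRIES + 1):
        draft = generate(source, feedback)
        scores = review(draft, source)
        failed = [c for c in RUBRIC if not scores[c]]
        print(f"run {attempt}: {scores}")        # log per-criterion scores
        if sum(scores.values()) >= PASS_THRESHOLD:
            return draft, scores                 # good enough: stop
        feedback = "fix: " + ", ".join(failed)   # send back with feedback
    return draft, scores                         # give up after the cap

draft, scores = eval_loop("methods section")
```

Note that the loop returns the per-criterion dict, not a summed score: as the card says, the sum is a noisy signal, while the list of failing criteria tells you what to fix.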

✓ SELF-CHECK

Did you get this?

  • [ ] Your rubric has ≥3 orthogonal criteria written BEFORE any output
  • [ ] Reviewer uses a different (stronger) model than the generator
  • [ ] The loop has a retry cap
  • [ ] 5 runs are logged with per-criterion scores
  • [ ] You can explain in one sentence what you learned that you would tell a labmate tomorrow
🔗 LIVE-DEMO

Try it out directly

For the assistant you'd most like to trust, what would the 4 rubric items be — and which one are you least sure how to measure?
💬 AI TUTOR

Ask the tutor about this card

Socratic: the tutor responds with guiding questions instead of ready-made answers; you work out the solution yourself.

Ask a first question about this card below.