Evaluating outputs
Contents
Evaluation means defining what a good answer looks like before trusting the system. Criteria may include factual support, completeness, clarity, structure, safety, and usefulness for the target audience. A "generator + reviewer" loop is the simplest eval pattern. A good rubric has orthogonal criteria — accuracy, completeness, support-by-source, tone — and each is scored independently. Summing them gives a noisy signal; looking at per-criterion failure modes gives a useful one.
Example: A methods summary is scored against the source on {accuracy, missing details, jargon level, supported-by-text}. If the score is below 3/4, the reviewer sends it back with feedback.
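The generator + reviewer loop described above can be sketched as follows. This is a minimal illustration, not a finished implementation: `generate` and `review` are placeholder stubs standing in for two separate model calls (the reviewer ideally a stronger model than the generator), and the criteria names mirror the example rubric.

```python
# Sketch of a generator + reviewer eval loop with a retry cap.
# generate() and review() are placeholder stubs; a real version
# would prompt two different LLMs.

CRITERIA = ["accuracy", "missing_details", "jargon_level", "supported_by_text"]
MAX_RETRIES = 3  # retry cap so the loop always terminates


def generate(source: str, feedback: str = "") -> str:
    # Stub generator: a real version would prompt an LLM,
    # including any reviewer feedback from the previous attempt.
    return f"Summary of: {source[:40]}"


def review(summary: str, source: str) -> dict:
    # Stub reviewer: scores each rubric criterion independently
    # (pass/fail). A real version would prompt a stronger model
    # with the rubric and the source text.
    return {c: True for c in CRITERIA}


def eval_loop(source: str, threshold: int = 3):
    log = []  # per-run, per-criterion scores for later failure analysis
    feedback = ""
    summary = ""
    for _attempt in range(MAX_RETRIES):
        summary = generate(source, feedback)
        scores = review(summary, source)
        log.append(scores)
        if sum(scores.values()) >= threshold:
            return summary, log
        # Send back the *failing criteria* as feedback,
        # not just a total score.
        feedback = "Failed: " + ", ".join(c for c, ok in scores.items() if not ok)
    return summary, log
```

Because each criterion is logged separately, the `log` list supports the per-criterion failure analysis the rubric is designed for, rather than a single summed score.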
Did you understand this?
- [ ] Your rubric has ≥3 orthogonal criteria written BEFORE any output
- [ ] Reviewer uses a different (stronger) model than the generator
- [ ] The loop has a retry cap
- [ ] 5 runs are logged with per-criterion scores
- [ ] You can explain in one sentence what you learned that you would tell a labmate tomorrow
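Once several runs are logged with per-criterion scores, the useful signal comes from counting failures per criterion rather than summing scores. A minimal sketch, using made-up illustrative logs (each run is a dict of criterion to pass/fail):

```python
# Aggregate per-criterion failure rates across logged runs.
# The run data below is invented for illustration only.
from collections import Counter

runs = [
    {"accuracy": True,  "completeness": False, "supported_by_source": True,  "tone": True},
    {"accuracy": True,  "completeness": False, "supported_by_source": False, "tone": True},
    {"accuracy": False, "completeness": False, "supported_by_source": True,  "tone": True},
    {"accuracy": True,  "completeness": True,  "supported_by_source": True,  "tone": True},
    {"accuracy": True,  "completeness": False, "supported_by_source": True,  "tone": True},
]

# Count how often each criterion failed across all runs.
failures = Counter(c for run in runs for c, ok in run.items() if not ok)
for criterion, n in failures.most_common():
    print(f"{criterion}: failed {n}/{len(runs)} runs")
```

Here the summed scores look similar across runs, but the breakdown shows one dominant failure mode (completeness), which tells you exactly what to fix in the generator prompt.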
Try it out directly
These links open the running demos at n8n.32dots.de and dify.32dots.de.
Ask the tutor about this card
Socratic: the tutor answers with guiding questions instead of ready-made answers; you work out the solution yourself.
Ask your first question about this card below.