Your own literature review pipeline
Run a literature review on a topic you actually care about
You have already seen this pipeline in Session 8 — esearch → efetch → AI extract → filter → format. Today you run it for real, against a question from your own field, and judge whether the output is good enough to trust.
- Go to Downloads (curriculum.32dots.de/share) and re-import 'Session 8 — Multi-stage literature pipeline' if it's not already in your n8n.
- Pick a narrow scientific question from your own work — e.g. 'CRISPR screens in primary T cells', 'gut microbiome and Parkinson's', 'single-cell RNA-seq of glioblastoma'. Specific beats broad.
- Open the first node and change the PubMed query to your question. Keep retmax at 8 for the first run (the sketch after this list shows where these two settings live).
- Execute the workflow. Watch the structured extraction fill in column by column: method, sample size, key finding, limitation.
- Read three of the rows critically. Find at least one place where the AI got the method wrong, missed a limitation, or invented a number.
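To see what "change the PubMed query" actually touches, here is a minimal sketch of the two NCBI E-utilities URLs the first two stages are built on. The Session 8 workflow most likely uses HTTP Request nodes rather than code, and the query and PMIDs below are placeholders; only the parameter names (db, term, retmax, retmode, id, rettype) are standard E-utilities parameters.

```javascript
// Sketch: the esearch and efetch calls behind stages 1 and 2.
const query = 'CRISPR screens in primary T cells'; // your question from step 2
const retmax = 8;                                  // keep small for the first run

const esearchUrl =
  'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi' +
  `?db=pubmed&term=${encodeURIComponent(query)}&retmax=${retmax}&retmode=json`;

// esearch returns PMIDs only; efetch turns them into abstracts:
const pmids = ['38000001', '38000002'];            // placeholder IDs from esearch
const efetchUrl =
  'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi' +
  `?db=pubmed&id=${pmids.join(',')}&rettype=abstract&retmode=text`;

return [{ json: { esearchUrl, efetchUrl } }];      // n8n Code node output shape
```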
Why this pipeline beats a single prompt — and where it still fails
The five-stage shape is not decorative. Each stage exists because doing the whole thing in one LLM call produces hallucinated PMIDs, dropped limitations, and inconsistent JSON. Look at each stage and ask: what would break if I removed it?
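To make "what would break" concrete, here is a sketch of the kind of checks the Filter stage can do that an LLM call cannot be trusted to do. Every rule is a plain predicate, so it never hallucinates; the field names mirror the extraction columns above but may differ from the actual workflow.

```javascript
// Sketch of a deterministic Filter stage as an n8n Code node.
// Each rule is a plain predicate: no model, no randomness, no hallucination.
const rows = $input.all().map(item => item.json);

const kept = rows.filter(row =>
  /^\d+$/.test(String(row.pmid ?? '')) &&      // PMID must be purely numeric
  typeof row.method === 'string' &&
  row.method.trim().length > 0 &&              // extraction must not be empty
  row.limitation !== undefined                 // limitation column must exist
);

// Without this stage, malformed extractions flow straight into the table.
return kept.map(row => ({ json: row }));
```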
Probe questions
- If you remove the Filter node, what kind of garbage shows up in the table?
- Why is the extraction step the only AI step? What goes wrong if you let the AI also choose which papers to include?
- Where would you add a second LLM call — and what would justify the extra cost?
Tighten the pipeline for your domain
A generic pipeline is a demo. A pipeline tuned to your field is a tool. Pick one of these extensions and ship it.
Task: Choose ONE concrete extension and implement it end-to-end. Keep the change small (one node added or modified) and test that it works on at least three papers. Sketches for the options follow the list.
- Option A — Add a domain filter. In the Filter node, additionally drop rows where method does not contain one of your domain's keywords (e.g. 'qPCR', 'flow cytometry', 'scRNA-seq', 'mass spec'). Aim for precision over recall.
- Option B — Add a 'sample size' threshold. Drop rows where sample size is below a number that makes sense for your field (n<10 for cell-line work, n<50 for observational clinical studies, etc.). Decide the cutoff before you look at the results.
- Option C — Add a second extraction field. Pick one: 'cell line / organism', 'statistical test', 'effect size', 'control group description'. Update the JSON schema in the AI Extract node and the table header in the Format node.
- Option D — Replace the AI Extract prompt with a stricter version that refuses to answer when the abstract does not contain the requested field — and verify it actually refuses on a few abstracts where the field is genuinely absent.
- Run the modified pipeline on the same query you used in the USE phase. Compare the before/after tables side by side.
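A minimal sketch of Options A and B as a single Code node, assuming the extraction columns are named method and sample_size; match both the field names and the keyword list to your actual schema and domain.

```javascript
// Sketch for Option A (domain keyword filter) and Option B (sample-size cutoff).
// Field names (method, sample_size) are assumptions; match them to your schema.
const DOMAIN_KEYWORDS = ['qPCR', 'flow cytometry', 'scRNA-seq', 'mass spec'];
const MIN_N = 10;                         // decide this BEFORE looking at results

const kept = $input.all().filter(item => {
  const { method = '', sample_size } = item.json;

  // Option A: keep only rows whose method mentions a domain keyword.
  const inDomain = DOMAIN_KEYWORDS.some(kw =>
    method.toLowerCase().includes(kw.toLowerCase())
  );

  // Option B: drop rows below the threshold; non-numeric sizes fail closed.
  const n = Number(sample_size);
  const bigEnough = Number.isFinite(n) && n >= MIN_N;

  return inDomain && bigEnough;
});

return kept;
```

Failing closed on non-numeric sample sizes (dropping them rather than keeping them) is the precision-over-recall choice Option A asks for.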
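For Options C and D, the change lives in the AI Extract node's configuration rather than in a Code node. Here is a sketch of what the extended schema and stricter prompt might look like, assuming the node accepts a JSON schema; the field names and prompt wording are illustrative, not the workflow's exact config.

```javascript
// Sketch for Option C (extra field) and Option D (strict prompt).
// Schema shape and prompt wording are illustrative, not the workflow's config.
const extractionSchema = {
  type: 'object',
  properties: {
    method:      { type: 'string' },
    sample_size: { type: ['number', 'null'] },
    key_finding: { type: 'string' },
    limitation:  { type: ['string', 'null'] },
    organism:    { type: ['string', 'null'] }, // Option C: the new field
  },
  required: ['method', 'sample_size', 'key_finding', 'limitation', 'organism'],
};

// Option D: instruct refusal instead of guessing, then verify it on abstracts
// where the field is genuinely absent.
const systemPrompt = `Extract the fields defined by the schema from the abstract.
If the abstract does not state a field, return null for it. Never guess,
never infer a number that is not written in the text.`;

return [{ json: { extractionSchema, systemPrompt } }];
```

If you take Option C, remember to also add the new column to the Format node's table header, or the field will be extracted and then silently dropped.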
Did you understand this?
- I ran the pipeline against a question from my own field, not the demo one.
- I found at least one extraction error and can describe what went wrong.
- I can explain why search and filter are deterministic and extraction is the only AI step.
- I implemented one extension and verified it changes the output on three papers.
Ask the tutor about this card
Socratic: the tutor answers with guiding questions instead of ready-made answers; you work out the solution yourself.
Ask your first question about this card below.