⚗️ 32dots Learn is an experimental prototype; content and features may change at short notice.
Card 08 · Chapter: tools

Multi-stage literature pipeline

n8n medium 90 min
🟢 USE — Run first
0 - 15 min

Run a five-stage literature pipeline — query to comparison table

This pipeline takes a research question and returns a formatted Markdown comparison table of papers — title, year, method, key finding, and limitation. Fully automated from query to structured output.

  1. Go to Downloads (curriculum.32dots.de/share) and download 'Session 8 — Multi-stage literature pipeline'.
  2. In n8n: ⋯ → Import from file. Open the chat panel.
  3. Type: 'mTOR inhibitor resistance mechanisms in breast cancer'.
  4. Wait — the pipeline runs 5 stages (watch the execution log on the right as each node lights up).
  5. Read the Markdown table in the response. Check: does the AI correctly identify methods and limitations?
  6. Run again with your own research topic.
  7. Click into the 'Stage 3 — AI Extract' node in the execution log. Read the raw JSON it returned.
Done signal: You see a comparison table with at least 3-4 papers. You can identify which stage is Stage 3 (AI extraction) and describe what it does.
🔵 UNDERSTAND — Look inside
15 - 60 min

Five-stage pipeline design

Open the canvas. Each stage has a single, testable responsibility. A well-designed pipeline makes it obvious where something went wrong — you can run any stage in isolation to debug it.

🔎 Stage 1 — Search
Code node + PubMed esearch
Converts your question to a URL-encoded PubMed query. Retrieves up to 8 PMIDs. Returns the first 5 for processing (to keep the AI context manageable).
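The query-building logic of Stage 1 can be sketched as a small function. This is a sketch under assumed names, not the workflow's actual Code node, but the esearch endpoint and parameters are the real NCBI E-utilities interface:

```javascript
// Build a URL-encoded PubMed esearch URL from a research question.
// URLSearchParams handles the encoding of spaces and special characters.
function buildEsearchUrl(question, retmax = 8) {
  const base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi';
  const params = new URLSearchParams({
    db: 'pubmed',
    term: question,
    retmax: String(retmax), // retrieve up to 8 PMIDs
    retmode: 'json',
  });
  return `${base}?${params.toString()}`;
}
```

In the pipeline, only the first 5 of the returned PMIDs are passed on, which keeps the AI context in Stage 3 manageable.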
📥 Stage 2 — Fetch
Code node + PubMed efetch
Builds the efetch URL from the IDs. Retrieves full plain-text abstracts. The response can be several thousand characters — truncated to 10,000 for the AI context window.
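Stage 2 has two jobs, which a sketch (assumed names; the real efetch parameters are from the NCBI E-utilities API) makes explicit:

```javascript
// Join the PMIDs from Stage 1 into a single efetch URL that returns
// plain-text abstracts.
function buildEfetchUrl(pmids) {
  const base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';
  return `${base}?db=pubmed&id=${pmids.join(',')}&rettype=abstract&retmode=text`;
}

// Truncate the combined abstracts so they fit the AI context window.
function truncateForContext(text, limit = 10000) {
  return text.length > limit ? text.slice(0, limit) : text;
}
```

Truncating before Stage 3, rather than letting the AI node fail on an oversized input, keeps the failure mode predictable.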
🧩 Stage 3 — AI Extract
AI Agent (typeVersion 1.7) — structured extraction
System prompt instructs the AI to extract 5 fields per paper: title, year, method, key finding, limitation. Returns a JSON array. Temperature 0.2 for consistent structured output.
📊 Stage 4+5 — Filter and Format
Code node — JSON parsing, filtering, Markdown formatting
Parses the AI's JSON output. Stage 4: filters out papers where title or finding is null (extraction failed). Stage 5: formats surviving records as a Markdown table with 5 columns.
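The parse-filter-format chain might look roughly like this. Field names such as `key_finding` and the fence-stripping step are assumptions; compare against the actual Code node in the workflow:

```javascript
// Parse the AI's output, drop failed extractions, emit a Markdown table.
function formatPapers(aiOutput) {
  let papers;
  try {
    // Models sometimes wrap JSON in Markdown code fences; strip them first.
    papers = JSON.parse(aiOutput.replace(/```json|```/g, '').trim());
  } catch (e) {
    return 'Error: could not parse AI output as JSON.';
  }
  // Stage 4: filter out papers where extraction failed.
  const kept = papers.filter(p => p.title != null && p.key_finding != null);
  // Stage 5: five-column Markdown table.
  const header =
    '| Title | Year | Method | Key finding | Limitation |\n|---|---|---|---|---|';
  const rows = kept.map(p =>
    `| ${p.title} | ${p.year ?? 'n/a'} | ${p.method ?? 'n/a'} | ${p.key_finding} | ${p.limitation ?? 'n/a'} |`
  );
  return [header, ...rows].join('\n');
}
```

Note the two distinct failure modes: unparseable JSON aborts with an error string, while a paper with missing fields is silently dropped, which is exactly what the first probe question below asks you to find in the real node.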
Keep each stage's responsibility obvious and testable. You can copy one abstract into the Stage 3 AI Agent and run it alone to check extraction quality. Separation of concerns is not just good engineering — it is good scientific workflow design.

Probe questions

  • What happens if the AI returns slightly malformed JSON in Stage 3? Open the Code node and find where this is handled.
  • The extraction prompt asks for 5 fields. What happens if an abstract does not mention methodology? Is the result filtered out?
  • How would you extend Stage 5 to also produce a BibTeX citation file alongside the Markdown table?
🟠 BUILD — Make it yours
60 - 90 min

Add a sixth stage: citation counts

Extend the pipeline to retrieve citation counts from Semantic Scholar for each paper.

Task: After Stage 2 (Fetch), add a Semantic Scholar API call that retrieves citation counts, then incorporate them into the Stage 5 table.

  1. After Stage 2 — Fetch (PubMed efetch), add an HTTP Request: GET https://api.semanticscholar.org/graph/v1/paper/PMID:{pmid}?fields=citationCount — start with one PMID ($('Stage 2 — Fetch').first().json.ids.split(',')[0]).
  2. Add a Set node that extracts citationCount and passes it forward alongside the abstracts.
  3. Update the Stage 3 AI Extract system prompt: add a 'citations' field to the requested JSON (pass the count as context).
  4. Update the Stage 4+5 Code node to include a Citations column in the Markdown table.
  5. Test with a well-known paper. Does the count match Google Scholar?
  6. Test with a paper from 2024. What happens when citation data is not yet available?
Deliverable: Screenshot of a comparison table with a Citations column, plus a one-sentence note on what happened with the newest paper.
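The core of the new stage can be sketched with two helpers. The `citationCount` field and the `PMID:` identifier prefix are real parts of the Semantic Scholar Graph API; the helper names and the null-handling strategy are assumptions for illustration:

```javascript
// Build the Semantic Scholar lookup URL for one PMID.
function buildCitationUrl(pmid) {
  return `https://api.semanticscholar.org/graph/v1/paper/PMID:${pmid}?fields=citationCount`;
}

// Papers too new to be indexed may lack a citation count; default to null
// so downstream stages can flag "no data yet" instead of crashing.
function extractCitationCount(apiResponse) {
  return apiResponse && typeof apiResponse.citationCount === 'number'
    ? apiResponse.citationCount
    : null;
}
```

Handling the missing-count case explicitly is what step 6 probes: a 2024 paper should produce a readable gap in the Citations column, not a pipeline error.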
✓ SELF-CHECK

Have you understood this?

  • I can describe all five stages and the single responsibility of each.
  • I understand how the Code node handles malformed AI JSON output (Stage 4+5).
  • I added a sixth API call and integrated its data into the final table.
Your pipeline runs in about 60 seconds for 5 papers. How would you adapt it to run nightly on a saved PubMed search and send you a Mattermost message when new papers appear — without triggering it manually?
💬 AI TUTOR

Ask the tutor about this card

Socratic mode: the tutor answers with guiding questions rather than ready-made solutions; you work out the answer yourself.

Ask your first question about this card below.