⚗️ 32dots Learn is an experimental prototype; content and features may change at short notice.
Card 08 · Chapter: tools

Multi-stage literature pipeline

n8n medium 90 min
🟢 USE — Run first
0 - 15 min

Run a five-stage literature pipeline — query to comparison table

This pipeline takes a research question and returns a formatted Markdown comparison table of papers — title, year, method, key finding, and limitation. Fully automated from query to structured output.

  1. Go to Downloads (curriculum.32dots.de/share) and download 'Session 8 — Multi-stage literature pipeline'.
  2. In n8n: ⋯ → Import from file. Open the chat panel.
  3. Type: 'mTOR inhibitor resistance mechanisms in breast cancer'.
  4. Wait — the pipeline runs 5 stages (watch the execution log on the right as each node lights up).
  5. Read the Markdown table in the response. Check: does the AI correctly identify methods and limitations?
  6. Run again with your own research topic.
  7. Click into the 'Stage 3 — AI Extract' node in the execution log. Read the raw JSON it returned.
Done signal: You see a comparison table with at least 3-4 papers. You can identify which stage is Stage 3 (AI extraction) and describe what it does.
🔵 UNDERSTAND — Look inside
15 - 60 min

Five-stage pipeline design

Open the canvas. Each stage has a single, testable responsibility. A well-designed pipeline makes it obvious where something went wrong — you can run any stage in isolation to debug it.

🔎 Stage 1 — Search
Code node + PubMed esearch
Converts your question to a URL-encoded PubMed query. Retrieves up to 8 PMIDs. Returns the first 5 for processing (to keep the AI context manageable).
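The query-building logic of Stage 1 can be sketched as a small function. This is a sketch under assumed names, not the workflow's actual Code node, but the esearch endpoint and parameters are the real NCBI E-utilities interface:

```javascript
// Build a URL-encoded PubMed esearch URL from a research question.
// URLSearchParams handles the encoding of spaces and special characters.
function buildEsearchUrl(question, retmax = 8) {
  const base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi';
  const params = new URLSearchParams({
    db: 'pubmed',
    term: question,
    retmax: String(retmax), // retrieve up to 8 PMIDs
    retmode: 'json',
  });
  return `${base}?${params.toString()}`;
}
```

In the pipeline, only the first 5 of the returned PMIDs are passed on, which keeps the AI context in Stage 3 manageable.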
📥 Stage 2 — Fetch
Code node + PubMed efetch
Builds the efetch URL from the IDs. Retrieves full plain-text abstracts. The response can be several thousand characters — truncated to 10,000 for the AI context window.
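Stage 2 has two jobs, which a sketch (assumed names; the real efetch parameters are from the NCBI E-utilities API) makes explicit:

```javascript
// Join the PMIDs from Stage 1 into a single efetch URL that returns
// plain-text abstracts.
function buildEfetchUrl(pmids) {
  const base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';
  return `${base}?db=pubmed&id=${pmids.join(',')}&rettype=abstract&retmode=text`;
}

// Truncate the combined abstracts so they fit the AI context window.
function truncateForContext(text, limit = 10000) {
  return text.length > limit ? text.slice(0, limit) : text;
}
```

Truncating before Stage 3, rather than letting the AI node fail on an oversized input, keeps the failure mode predictable.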
🧩 Stage 3 — AI Extract
AI Agent (typeVersion 1.7) — structured extraction
System prompt instructs the AI to extract 5 fields per paper: title, year, method, key finding, limitation. Returns a JSON array. Temperature 0.2 for consistent structured output.
📊 Stage 4+5 — Filter and Format
Code node — JSON parsing, filtering, Markdown formatting
Parses the AI's JSON output. Stage 4: filters out papers where title or finding is null (extraction failed). Stage 5: formats surviving records as a Markdown table with 5 columns.
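The parse-filter-format chain might look roughly like this. Field names such as `key_finding` and the fence-stripping step are assumptions; compare against the actual Code node in the workflow:

```javascript
// Parse the AI's output, drop failed extractions, emit a Markdown table.
function formatPapers(aiOutput) {
  let papers;
  try {
    // Models sometimes wrap JSON in Markdown code fences; strip them first.
    papers = JSON.parse(aiOutput.replace(/```json|```/g, '').trim());
  } catch (e) {
    return 'Error: could not parse AI output as JSON.';
  }
  // Stage 4: filter out papers where extraction failed.
  const kept = papers.filter(p => p.title != null && p.key_finding != null);
  // Stage 5: five-column Markdown table.
  const header =
    '| Title | Year | Method | Key finding | Limitation |\n|---|---|---|---|---|';
  const rows = kept.map(p =>
    `| ${p.title} | ${p.year ?? 'n/a'} | ${p.method ?? 'n/a'} | ${p.key_finding} | ${p.limitation ?? 'n/a'} |`
  );
  return [header, ...rows].join('\n');
}
```

Note the two distinct failure modes: unparseable JSON aborts with an error string, while a paper with missing fields is silently dropped, which is exactly what the first probe question below asks you to find in the real node.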
Keep each stage's responsibility obvious and testable. You can copy one abstract into the Stage 3 AI Agent and run it alone to check extraction quality. Separation of concerns is not just good engineering — it is good scientific workflow design.

Probe questions

  • What happens if the AI returns slightly malformed JSON in Stage 3? Open the Code node and find where this is handled.
  • The extraction prompt asks for 5 fields. What happens if an abstract does not mention methodology? Is the result filtered out?
  • How would you extend Stage 5 to also produce a BibTeX citation file alongside the Markdown table?
🟠 BUILD — Make it yours
60 - 90 min

Add a sixth stage: citation counts

Extend the pipeline to retrieve citation counts from Semantic Scholar for each paper.

Task: After Stage 2 (Fetch), add a Semantic Scholar API call that retrieves citation counts, then incorporate them into the Stage 5 table.

  1. After Stage 2 — Fetch (PubMed efetch), add an HTTP Request: GET https://api.semanticscholar.org/graph/v1/paper/PMID:{pmid}?fields=citationCount — start with one PMID ($('Stage 2 — Fetch').first().json.ids.split(',')[0]).
  2. Add a Set node that extracts citationCount and passes it forward alongside the abstracts.
  3. Update the Stage 3 AI Extract system prompt: add a 'citations' field to the requested JSON (pass the count as context).
  4. Update the Stage 4+5 Code node to include a Citations column in the Markdown table.
  5. Test with a well-known paper. Does the count match Google Scholar?
  6. Test with a paper from 2024. What happens when citation data is not yet available?
Deliverable: Screenshot of a comparison table with a Citations column, plus a one-sentence note on what happened with the newest paper.
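The core of the new stage can be sketched with two helpers. The `citationCount` field and the `PMID:` identifier prefix are real parts of the Semantic Scholar Graph API; the helper names and the null-handling strategy are assumptions for illustration:

```javascript
// Build the Semantic Scholar lookup URL for one PMID.
function buildCitationUrl(pmid) {
  return `https://api.semanticscholar.org/graph/v1/paper/PMID:${pmid}?fields=citationCount`;
}

// Papers too new to be indexed may lack a citation count; default to null
// so downstream stages can flag "no data yet" instead of crashing.
function extractCitationCount(apiResponse) {
  return apiResponse && typeof apiResponse.citationCount === 'number'
    ? apiResponse.citationCount
    : null;
}
```

Handling the missing-count case explicitly is what step 6 probes: a 2024 paper should produce a readable gap in the Citations column, not a pipeline error.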
✓ SELF-CHECK

Have you understood this?

  • I can describe all five stages and the single responsibility of each.
  • I understand how the Code node handles malformed AI JSON output (Stage 4+5).
  • I added a sixth API call and integrated its data into the final table.
Your pipeline runs in about 60 seconds for 5 papers. How would you adapt it to run nightly on a saved PubMed search and send you a Mattermost message when new papers appear — without triggering it manually?
💬 AI TUTOR

Ask the tutor about this card

Socratic mode: the tutor answers with guiding questions rather than ready-made solutions; you work out the answer yourself.

Ask your first question about this card below.