Card 05 · Chapter tools

Pulling data from scientific databases

n8n · medium · 90 min

🟢 USE — Run first

0 - 15 min

Query PubMed with natural language — no credentials needed

PubMed's API is free and needs no authentication. You will import a workflow that takes your research question, searches PubMed, and returns summarised paper abstracts.

Go to Downloads (curriculum.32dots.de/share) and download 'Session 5 — Pulling data from scientific databases'.
In n8n: ⋯ → Import from file. Open the chat panel.
Type: 'Find papers about CRISPR base editing from 2024'.
Wait for the response — the workflow makes two API calls before the AI answers.
Type: 'What are the main limitations mentioned in those papers?'
Type: 'Find me papers about mRNA vaccine immunogenicity'. Note: a new search runs.
Look at the execution log on the right — click each node to see its output at each stage.

Done signal: You see AI-summarised paper lists for both queries, and you can explain what each node in the execution log produced.

🔵 UNDERSTAND — Look inside

15 - 60 min

The two-step PubMed API pattern

PubMed uses a two-endpoint pattern: search for IDs first, then fetch content. Other scientific databases such as UniProt, ChEMBL, and Semantic Scholar follow the same search-then-retrieve logic, although their search endpoints can often return record fields directly.

🔧 Build Search URL

Code node

Takes your chatInput and builds the full esearch URL, URL-encoding the query with JavaScript's encodeURIComponent(q). Returns the URL and the original search term as separate fields.
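A minimal sketch of what such a Code node might contain. The helper name buildSearchUrl, the output field names searchUrl and searchTerm, and the default of retmax=5 are assumptions for illustration; the imported workflow's actual code may differ. The esearch endpoint and the db, term, retmode, and retmax parameters belong to NCBI's E-utilities.

```javascript
// Hypothetical helper. In an n8n Code node the input would typically come from
// $input.first().json.chatInput and the result be returned as [{ json: result }].
const ESEARCH_BASE = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi';

function buildSearchUrl(chatInput, retmax = 5) {
  // Normalise the input so a missing or padded message does not break the URL.
  const q = String(chatInput ?? '').trim();
  // encodeURIComponent escapes spaces, '&', '?', and similar characters,
  // so the query cannot spill into other URL parameters.
  const searchUrl =
    `${ESEARCH_BASE}?db=pubmed&retmode=json&retmax=${retmax}` +
    `&term=${encodeURIComponent(q)}`;
  // Keep the raw term alongside the URL so later nodes can show it to the user.
  return { searchUrl, searchTerm: q };
}

console.log(buildSearchUrl('CRISPR base editing 2024').searchUrl);
```

Note that retmode=json is set explicitly: esearch answers in XML unless told otherwise.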

🔎 Search PubMed (esearch)

HTTP Request

Calls PubMed's esearch endpoint. Returns a JSON object (esearch defaults to XML, so the URL must request retmode=json) whose esearchresult.idlist field is a list of PMIDs matching your query. No abstracts yet, just identifiers.

⚙️ Extract IDs

Code node

Parses the esearch JSON, takes the idlist array, joins the IDs with commas, and builds the efetch URL. If no IDs are found, it returns an empty placeholder.
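The extraction step could look roughly like the sketch below. The function name extractIds, the output fields ids, fetchUrl, and found, and the exact shape of the empty placeholder are assumptions; only the esearchresult.idlist path and the efetch parameters come from the E-utilities API. The PMIDs in the usage line are made-up placeholders.

```javascript
// Hypothetical helper. In an n8n Code node the parsed esearch response would
// typically be read from $input.first().json.
const EFETCH_BASE = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';

function extractIds(esearchJson) {
  // Optional chaining guards against error responses that lack esearchresult.
  const idlist = esearchJson?.esearchresult?.idlist ?? [];
  if (idlist.length === 0) {
    // Assumed placeholder shape: downstream nodes can branch on `found`.
    return { ids: '', fetchUrl: null, found: false };
  }
  // efetch accepts several PMIDs in one call as a comma-separated list.
  const ids = idlist.join(',');
  const fetchUrl =
    `${EFETCH_BASE}?db=pubmed&id=${ids}&rettype=abstract&retmode=text`;
  return { ids, fetchUrl, found: true };
}

// Made-up PMIDs, for illustration only.
console.log(extractIds({ esearchresult: { idlist: ['111', '222'] } }));
```

Batching all IDs into one efetch call keeps the workflow at two requests per question, which matters for the rate limit.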

📄 Fetch Abstracts (efetch)

HTTP Request

Calls efetch with rettype=abstract&retmode=text. Returns all abstracts as a single plain-text blob. The node is configured to expect a text response, not JSON.

🧠 AI Research Assistant

AI Agent (typeVersion 1.7)

Receives the abstract text (truncated to 8,000 chars) and the user question. System prompt: answer from the abstracts, list key papers and findings.
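The truncation keeps the prompt size bounded no matter how many abstracts come back. A minimal sketch of that step, assuming a hypothetical helper prepareContext and hypothetical field names; the workflow's own implementation may differ:

```javascript
// Limit taken from the card: abstracts are cut to 8,000 characters.
const MAX_CONTEXT_CHARS = 8000;

// Hypothetical helper: bundles the (possibly shortened) abstracts with the question.
function prepareContext(abstractText, question, maxChars = MAX_CONTEXT_CHARS) {
  const text = String(abstractText ?? '');
  const truncated = text.length > maxChars;
  return {
    // slice counts UTF-16 code units and may cut in the middle of an abstract.
    abstracts: truncated ? text.slice(0, maxChars) : text,
    // Flag lets the prompt or a later node mention that material was dropped.
    truncated,
    question,
  };
}

const ctx = prepareContext('x'.repeat(9000), 'What are the main limitations?');
console.log(ctx.abstracts.length, ctx.truncated);
```

A hard cut is the simplest option. It also means that papers near the end of the efetch output may never reach the model, which is worth keeping in mind when you raise retmax.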

🧵 Simple Memory

Session memory

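Correct session key: $('When chat message received').first().json.sessionId. The explicit node reference is required here because Build Search URL sits between the trigger and the agent, so the item that reaches the agent no longer carries the trigger's sessionId field.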

Scientific databases are not magic: they are REST APIs returning structured data. The pattern is typically: search → get IDs → fetch details → extract fields. Once you know this pattern, connecting to UniProt, ChEMBL, or Semantic Scholar works in much the same way.
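Outside n8n, the search → get IDs → fetch details steps can be sketched as one function. This is an illustration under assumptions, not the workflow's code: the name searchThenFetch and its options are hypothetical, and fetchFn is injectable so the logic can be tried with a stub instead of live network calls.

```javascript
// Hypothetical helper showing the two-step pattern against PubMed's E-utilities.
// fetchFn defaults to the global fetch available in Node 18+ and in browsers.
async function searchThenFetch(term, { retmax = 5, fetchFn = fetch } = {}) {
  const base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils';

  // Step 1: search. The response contains identifiers only, no content.
  const searchRes = await fetchFn(
    `${base}/esearch.fcgi?db=pubmed&retmode=json&retmax=${retmax}` +
      `&term=${encodeURIComponent(term)}`
  );
  const ids = (await searchRes.json()).esearchresult?.idlist ?? [];
  if (ids.length === 0) return { ids, abstracts: '' };

  // Step 2: fetch the details for exactly those identifiers, as plain text.
  const fetchRes = await fetchFn(
    `${base}/efetch.fcgi?db=pubmed&id=${ids.join(',')}` +
      `&rettype=abstract&retmode=text`
  );
  return { ids, abstracts: await fetchRes.text() };
}

// Example call (performs real network requests, so it is left commented out):
// searchThenFetch('mRNA vaccine immunogenicity').then((r) => console.log(r.ids));
```

For another database you would swap the two URLs and the path to the ID list; the control flow stays the same.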

Probing questions

What is the NCBI rate limit for unauthenticated API calls? Where in the workflow would you add a delay to avoid hitting it?
Change retmax=5 to retmax=20 in the Build Search URL Code node. What happens to response quality vs. cost?
UniProt also has a REST API. What would the esearch-equivalent URL look like to find all human proteins involved in apoptosis?

🟠 BUILD — Make it yours

60 - 90 min

Extend to a second database

Add a second HTTP call to combine PubMed with another scientific database.

Task: After the abstracts are fetched and before the AI answers, add a Semantic Scholar API call to retrieve citation counts for the top papers, and include that data in the response.

Note the PMIDs in the Extract IDs node output.
After the Fetch Abstracts node, add an HTTP Request: GET https://api.semanticscholar.org/graph/v1/paper/PMID:{pmid}?fields=citationCount
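For simplicity, do this for just the first PMID. Because this HTTP Request sits after Fetch Abstracts, $json there holds the efetch text, so reference the Extract IDs output explicitly, e.g. $('Extract IDs').first().json.ids.split(',')[0] (assuming the node is named Extract IDs and outputs a comma-separated ids field).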
Add a Set node: citationCount = $json.citationCount, pmid = the PMID you sent in the request. ($json.paperId is Semantic Scholar's own paper identifier, not the PMID.)
Update the Prepare Context Set node to include the citation count.
Update the AI system prompt to mention citation counts in the summary.
Test: does the AI now include citation data in its response?
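The URL-building and Set-node steps above can be sketched in plain JavaScript. The helper names buildCitationRequest and toCitationRecord are hypothetical; the URL format is the one given in the steps. The sketch carries the PMID forward from the request rather than reading it from the response, and the sample values are made up.

```javascript
// Hypothetical helper: picks the first PMID from a comma-separated list and
// builds the Semantic Scholar lookup URL for it.
function buildCitationRequest(idsCsv) {
  const pmid = String(idsCsv ?? '').split(',')[0].trim();
  // No PMID means no lookup; the caller can skip the HTTP Request.
  if (!pmid) return null;
  return {
    pmid,
    url: `https://api.semanticscholar.org/graph/v1/paper/PMID:${pmid}?fields=citationCount`,
  };
}

// Hypothetical helper: mirrors the Set node by keeping only the fields the
// AI prompt needs. A missing count becomes null instead of undefined.
function toCitationRecord(pmid, s2Response) {
  return { pmid, citationCount: s2Response?.citationCount ?? null };
}

// Made-up values, for illustration only.
const req = buildCitationRequest('111,222');
console.log(req.url);
console.log(toCitationRecord(req.pmid, { paperId: 'abc', citationCount: 7 }));
```

Returning null for a missing count lets the system prompt tell the model to say "citation count unavailable" instead of inventing a number.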

Deliverable: Screenshot of a workflow run that includes citation count data in the AI's response.

✓ SELF-CHECK

Did you understand this?

I can explain the two-step PubMed pattern: esearch for IDs, efetch for content.
I understand why the session key requires the full $('When chat message received') expression.
I connected to a second scientific API and included its data in the output.

Your workflow runs a fresh PubMed query on every message. What would you need to change to cache results — so repeat queries for the same topic don't burn rate limits?
