Card 05 · Chapter: Tools
Pulling data from scientific databases
🟢 USE — Run first
0 - 15 min
Query PubMed with natural language — no credentials needed
PubMed's API (NCBI E-utilities) is free and works without authentication; an API key is optional. You will import a workflow that takes your research question, searches PubMed, and returns summarised paper abstracts.
- Go to Downloads (curriculum.32dots.de/share) and download 'Session 5 — Pulling data from scientific databases'.
- In n8n: ⋯ → Import from file. Open the chat panel.
- Type: 'Find papers about CRISPR base editing from 2024'.
- Wait for the response — the workflow makes two API calls before the AI answers.
- Type: 'What are the main limitations mentioned in those papers?'
- Type: 'Find me papers about mRNA vaccine immunogenicity'. Note: a new search runs.
- Look at the execution log on the right — click each node to see its output at each stage.
Done signal: You see AI-summarised paper lists for both queries. You can explain what each node in the execution log produced.
🔵 UNDERSTAND — Look inside
15 - 60 min
The two-step PubMed API pattern
PubMed uses a two-endpoint pattern: search for IDs first, then fetch content. Many scientific databases, including UniProt, ChEMBL, and Semantic Scholar, expose comparable search and fetch-by-ID REST endpoints, although some return full records directly from the search call.
🔧 Build Search URL
Code node
Takes your chatInput and builds the full esearch URL with URL-encoded query. JavaScript: encodeURIComponent(q). Returns the URL and original search term as separate fields.
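A minimal sketch of what this step might look like as plain JavaScript. The function name buildSearchUrl, the retmax default of 5, and the output field names url and searchTerm are assumptions for illustration; the imported workflow may name things differently. The base URL is NCBI's public esearch endpoint.

```javascript
// Hypothetical helper mirroring the described Code node; not copied from the workflow.
const ESEARCH_BASE = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi';

function buildSearchUrl(chatInput, retmax = 5) {
  const q = String(chatInput).trim();
  // encodeURIComponent escapes spaces, quotes, and '&' so the query cannot break the URL.
  const url =
    `${ESEARCH_BASE}?db=pubmed&retmode=json&retmax=${retmax}` +
    `&term=${encodeURIComponent(q)}`;
  // Return the URL and the original term as separate fields.
  return { url, searchTerm: q };
}

const built = buildSearchUrl('CRISPR base editing 2024');
console.log(built.url);
```

Inside an n8n Code node, the same logic would typically end by returning the fields wrapped as an item, for example [{ json: { url, searchTerm } }].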
🔎 Search PubMed (esearch)
HTTP Request
Calls PubMed's esearch endpoint. With retmode=json it returns a JSON object whose esearchresult.idlist holds the PMIDs matching your query. No abstracts yet, just identifiers.
⚙️ Extract IDs
Code node
Parses the esearch JSON, takes the idlist array, joins IDs with commas, and builds the efetch URL. If no IDs found, returns an empty placeholder.
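The extraction step can be sketched roughly as follows. The function name extractIds, the output fields (ids, fetchUrl, found), and the use of null as the empty placeholder are assumptions; the PMIDs in the sample are made-up placeholders, not real papers.

```javascript
// Hypothetical helper mirroring the described Extract IDs node.
const EFETCH_BASE = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';

function extractIds(esearchJson) {
  // esearch (retmode=json) nests the PMID array under esearchresult.idlist.
  const idlist = esearchJson?.esearchresult?.idlist ?? [];
  if (idlist.length === 0) {
    // Empty placeholder so downstream nodes can detect "no results".
    return { ids: '', fetchUrl: null, found: 0 };
  }
  const ids = idlist.join(',');
  const fetchUrl =
    `${EFETCH_BASE}?db=pubmed&id=${ids}&rettype=abstract&retmode=text`;
  return { ids, fetchUrl, found: idlist.length };
}

// Placeholder response in the shape esearch returns.
const sample = { esearchresult: { count: '2', idlist: ['11111111', '22222222'] } };
const extracted = extractIds(sample);
console.log(extracted.fetchUrl);
```

Handling the empty case explicitly keeps the efetch call from running with a blank id parameter.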
📄 Fetch Abstracts (efetch)
HTTP Request
Calls efetch with rettype=abstract&retmode=text. Returns all abstracts as a single plain-text blob. Configured to receive text response, not JSON.
🧠 AI Research Assistant
AI Agent (typeVersion 1.7)
Receives the abstract text (truncated to 8,000 chars) and the user question. System prompt: answer from the abstracts, list key papers and findings.
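One way the 8,000-character truncation could be done before the text reaches the agent is sketched below. The function name prepareContext and the truncated flag are assumptions for illustration; the workflow may truncate inside an expression instead.

```javascript
// Hypothetical helper: caps the efetch text blob before it is sent to the model.
const MAX_CONTEXT_CHARS = 8000;

function prepareContext(abstractText, question, maxChars = MAX_CONTEXT_CHARS) {
  const text = String(abstractText ?? '');
  const truncated = text.length > maxChars;
  // slice keeps the first maxChars characters; later abstracts may be cut off.
  const context = truncated ? text.slice(0, maxChars) : text;
  return { context, truncated, question: String(question) };
}

const prepared = prepareContext('x'.repeat(10000), 'What are the main limitations?');
console.log(prepared.context.length, prepared.truncated);
```

A fixed character cap is simple and keeps token cost predictable, at the price of possibly cutting the last abstract mid-sentence.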
🧵 Simple Memory
Session memory
Session key: $('When chat message received').first().json.sessionId. The explicit node reference is required here because Build Search URL sits between the trigger and the agent, so the sessionId is no longer guaranteed to be on the incoming item.
Scientific databases are not magic: they are REST APIs returning structured data. The typical pattern is search → get IDs → fetch details → extract fields. Once you know this pattern, connecting to UniProt, ChEMBL, or Semantic Scholar follows much the same steps, with different endpoints and field names.
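The four stages can be sketched as one small function, shown here for PubMed. The name searchThenFetch and the injected httpGet parameter are assumptions made so the flow can run without network access; the stub returns canned placeholder data, not real PubMed output.

```javascript
// Hypothetical end-to-end sketch: search -> get IDs -> fetch details -> extract fields.
async function searchThenFetch(term, httpGet) {
  const base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils';
  // 1. Search: ask esearch for matching records.
  const searchRaw = await httpGet(
    `${base}/esearch.fcgi?db=pubmed&retmode=json&retmax=5&term=${encodeURIComponent(term)}`
  );
  // 2. Get IDs: pull the PMID list out of the JSON response.
  const ids = JSON.parse(searchRaw).esearchresult?.idlist ?? [];
  if (ids.length === 0) return { ids, abstracts: '' };
  // 3. Fetch details: request the abstracts for those IDs as plain text.
  const raw = await httpGet(
    `${base}/efetch.fcgi?db=pubmed&id=${ids.join(',')}&rettype=abstract&retmode=text`
  );
  // 4. Extract fields: here only whitespace trimming, since efetch returns one text blob.
  return { ids, abstracts: raw.trim() };
}

// Stub transport with canned responses (placeholder data).
async function fakeHttpGet(url) {
  return url.includes('esearch')
    ? JSON.stringify({ esearchresult: { idlist: ['11111111'] } })
    : '1. Placeholder abstract text.\n';
}

const pipelineResult = searchThenFetch('mRNA vaccine immunogenicity', fakeHttpGet);
pipelineResult.then((r) => console.log(r.ids, r.abstracts));
```

For a live run on Node 18 or later, httpGet could be replaced with (url) => fetch(url).then((res) => res.text()).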
Probe questions
- What is the NCBI rate limit for unauthenticated API calls? Where in the workflow would you add a delay to avoid hitting it?
- Change retmax=5 to retmax=20 in the Build Search URL Code node. What happens to response quality vs. cost?
- UniProt also has a REST API. What would the esearch-equivalent URL look like to find all human proteins involved in apoptosis?
🟠 BUILD — Make it yours
60 - 90 min
Extend to a second database
Add a second HTTP call to combine PubMed with another scientific database.
Task: Before the AI agent runs, add a Semantic Scholar API call that retrieves citation counts for the top papers, and pass that data to the agent so it appears in the response.
- Note the PMIDs in the Extract IDs node output.
- After the Fetch Abstracts node, add an HTTP Request: GET https://api.semanticscholar.org/graph/v1/paper/PMID:{pmid}?fields=citationCount
- For simplicity, do this for just the first PMID (use $json.ids.split(',')[0]).
- Add a Set node: citationCount = $json.citationCount. For pmid, reuse the PMID you sent in the request; $json.paperId is Semantic Scholar's own paper ID, not the PMID.
- Update the Prepare Context Set node to include the citation count.
- Update the AI system prompt to mention citation counts in the summary.
- Test: does the AI now include citation data in its response?
Deliverable: Screenshot of a workflow run that includes citation count data in the AI's response.
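The citation lookup in the steps above can be sketched as two small helpers. The names buildCitationUrl and toCitationRecord are assumptions for illustration, and the PMIDs and response values in the example are placeholders. The endpoint and the PMID: prefix are the ones given in the task.

```javascript
// Hypothetical helpers for the Semantic Scholar step; not part of the imported workflow.
const S2_BASE = 'https://api.semanticscholar.org/graph/v1/paper';

function buildCitationUrl(idsCsv) {
  // Use only the first PMID from the comma-separated list, as the task suggests.
  const firstPmid = String(idsCsv).split(',')[0].trim();
  if (!firstPmid) return null; // no IDs, so skip the lookup
  return {
    pmid: firstPmid,
    url: `${S2_BASE}/PMID:${firstPmid}?fields=citationCount`,
  };
}

function toCitationRecord(pmid, s2Response) {
  // Carry the PMID over from the PubMed side; the response's paperId is
  // Semantic Scholar's own identifier and is deliberately not used here.
  return { pmid, citationCount: s2Response?.citationCount ?? null };
}

const lookup = buildCitationUrl('11111111,22222222');
const record = toCitationRecord(lookup.pmid, { paperId: 'abc123', citationCount: 12 });
console.log(lookup.url, record);
```

Returning null for a missing count, rather than 0, lets the system prompt distinguish "not found" from "never cited".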
✓ SELF-CHECK
Did you understand this?
- I can explain the two-step PubMed pattern: esearch for IDs, efetch for content.
- I understand why the session key requires the full $('When chat message received') expression.
- I connected to a second scientific API and included its data in the output.
Your workflow runs a fresh PubMed query on every message. What would you need to change to cache results — so repeat queries for the same topic don't burn rate limits?
💬 AI TUTOR
Ask the tutor about this card
Socratic: the tutor responds with guiding questions instead of ready-made answers, so you work out the solution yourself.