⚗️ 32dots Learn is an experimental prototype — content and features can change at short notice.
Card 02 · Chapter getting-started

Voice to Lab AI via Mattermost

n8n · easy · 90 min
🟢 USE — Run first
0 - 15 min

Talk to the lab — get an answer in the chat

Today the AI lives where you already talk. Mattermost (#lab-ai) is the chat surface where you post the question and read the reply; n8n is where the workflow that powers the bot lives. You'll try the bot in Mattermost first, then open the workflow in n8n to see how Whisper + the AI Agent are wired together.

  1. Open https://mattermost.32dots.de in a second tab, log in with the same email and the password cos2026, and join the channel #lab-ai if you are not already in it.
  2. Type any lab question in #lab-ai, e.g. 'What is PCR in two sentences?' and send it.
  3. Wait 2–5 seconds. The lab-bot replies in the channel with your question quoted and the answer below.
  4. On your phone or laptop: tap the microphone icon in Mattermost's message box, record a question out loud, send the voice clip.
  5. Watch the bot transcribe your speech and answer the transcribed question.
  6. Now click 'Copy credentials + open workflow in n8n' above — that's where the Mattermost webhook, Whisper call, AI Agent, Groq model, Simple Memory, and Post-to-Mattermost step are all wired together. You'll adapt it in the Build phase.
Done signal: You see a reply from lab-bot with '> **Input (text):**' or '> **Input (voice):**' followed by an answer — for both a typed and a spoken question.
🔵 UNDERSTAND — Look inside
15 - 60 min

One agent, two input modes — voice and text through a single pipeline

Last session you met the AI Agent: prompt, model, memory, tools. Today the same anatomy shows up again, but now humans talk to it through Mattermost and can speak instead of type. Whisper sits in front of the Agent and turns voice into text; the Agent answers; a post-process step drops the reply back into the channel. Same Agent shape, wider surface.

📥 Mattermost Webhook
Trigger
An outgoing webhook in Mattermost fires on every new post in #lab-ai. It POSTs user_id, text, channel_id, and any attached file_ids to n8n. A text-only post arrives with text filled in and file_ids empty. A voice note arrives with text empty and file_ids containing the audio file.
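For orientation, here is a minimal TypeScript sketch of the payload shape this card describes. The exact field set depends on how the outgoing webhook is configured in your Mattermost instance, so treat the interface as an assumption, not a spec:

```typescript
// Sketch of the webhook payload as described above (assumed shape).
interface LabAiWebhookPayload {
  user_id: string;
  channel_id: string;
  text: string;        // filled for typed posts, empty for voice notes
  file_ids: string[];  // empty for typed posts, audio file id(s) for voice notes
}

// Routing between the two input modes is a single check:
function isVoicePost(p: LabAiWebhookPayload): boolean {
  return p.text.trim() === "" && p.file_ids.length > 0;
}
```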
🎙️ Whisper STT (Groq)
Pre-process (voice path)
Before the Agent sees anything, voice has to become text. The workflow fetches the audio file from Mattermost (bot token) and POSTs it to Groq's whisper-large-v3 transcription endpoint. Groq returns the transcript. Text-only posts skip this box entirely. Either way, by the time the Agent is called, the input is just a string.
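Outside n8n, the two HTTP calls of the voice path look roughly like this Node 18+ sketch. `MATTERMOST_URL`, the token variables, and the file name are placeholders; the endpoint and model name follow Groq's OpenAI-compatible API:

```typescript
// Rough sketch of the voice path: fetch the audio from Mattermost,
// then POST it to Groq for transcription. Not the actual workflow code.
async function transcribeVoiceNote(fileId: string): Promise<string> {
  // 1. Fetch the raw audio file from Mattermost with the bot token.
  const audioRes = await fetch(
    `${process.env.MATTERMOST_URL}/api/v4/files/${fileId}`,
    { headers: { Authorization: `Bearer ${process.env.MATTERMOST_BOT_TOKEN}` } },
  );
  const audio = await audioRes.blob();

  // 2. Send it to Groq's OpenAI-compatible transcription endpoint.
  const form = new FormData();
  form.append("file", audio, "voice-note.m4a"); // file name is illustrative
  form.append("model", "whisper-large-v3");
  const groqRes = await fetch(
    "https://api.groq.com/openai/v1/audio/transcriptions",
    {
      method: "POST",
      headers: { Authorization: `Bearer ${process.env.GROQ_API_KEY}` },
      body: form,
    },
  );
  const { text } = (await groqRes.json()) as { text: string };
  return text; // by now the input is just a string, same as a typed post
}
```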
🧠 AI Agent
Orchestrator
Exact same node type as Card 01 — but now the 'text' input is the transcript from Whisper (for voice) or the typed message (for text). The system prompt tells it that it is the COS Lab AI: be concise, infer intent charitably when the input is messy transcribed speech, refuse clinical advice, and reply in the user's language.
⚡ Groq gpt-oss-120b
Chat Model
Same model as Card 01, plugged into the Agent's Model port. OpenAI open-weights: 117B parameters total, 5.1B active via MoE, Apache 2.0 licensed. Temperature 0.3 for consistent lab answers. All students share one Groq API key — no one needs their own.
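Condensed to a single call, here is roughly what happens behind the Agent's Model port, assuming Groq's OpenAI-compatible chat endpoint. The model id string and the shortened system prompt are assumptions for illustration; the authoritative prompt lives in the AI Agent node:

```typescript
// Illustrative sketch of the chat call: temperature 0.3, shared key,
// system prompt condensed from the AI Agent description above.
async function askLabAi(input: string): Promise<string> {
  const res = await fetch("https://api.groq.com/openai/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.GROQ_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "openai/gpt-oss-120b", // assumed Groq model id
      temperature: 0.3,             // low temperature for consistent lab answers
      messages: [
        {
          role: "system",
          content:
            "You are the COS Lab AI. Be concise. The input may be messy " +
            "transcribed speech: infer intent charitably. Refuse clinical " +
            "advice. Reply in the user's language.",
        },
        { role: "user", content: input },
      ],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content as string;
}
```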
🧵 Simple Memory (per channel)
Short-term memory
Same node as Card 01, but with one crucial change: the sessionKey is set to the Mattermost channel_id, not the default. Every channel gets its own rolling window of the last 8 turns. Two students in the same channel share memory; two students in different channels do not — so everyone can run their own thread without collisions.
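As a toy model of what 'keyed by channel' means (the Simple Memory node does this for you, and a 'turn' here is a single message, a simplification of the node's user/assistant pairs):

```typescript
// Toy sketch: one rolling window per session key. The key decides the
// memory scope: channel_id here, user_id for a 1:1 DM bot, thread_id
// for a Slack thread.
type Turn = { role: "user" | "assistant"; content: string };
const memory = new Map<string, Turn[]>();

function remember(sessionKey: string, turn: Turn, windowSize = 8): Turn[] {
  const history = memory.get(sessionKey) ?? [];
  history.push(turn);
  const trimmed = history.slice(-windowSize); // keep only the last 8 turns
  memory.set(sessionKey, trimmed);
  return trimmed;
}

// Two posts in the same channel share history; another channel starts empty.
remember("channel_abc", { role: "user", content: "What is PCR?" });
```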
📤 Post to Mattermost
Post-process (reply)
The Agent's .output field is stitched into a markdown reply that quotes your original input (so you can see what Whisper heard). An HTTP POST to Mattermost's /api/v4/posts with the bot token drops that reply back into the same channel. Everything the user sees happens inside Mattermost.
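As a plain HTTP call, the reply step might look like this sketch; the message format matches the Done signal from the USE phase, and `MATTERMOST_URL` plus the token variable are placeholders:

```typescript
// Sketch of the post-process step: quote the original input, append the
// Agent's answer, POST to Mattermost's /api/v4/posts with the bot token.
async function postReply(
  channelId: string,
  mode: "text" | "voice",
  input: string,
  answer: string,
): Promise<void> {
  const message = `> **Input (${mode}):** ${input}\n\n${answer}`;
  await fetch(`${process.env.MATTERMOST_URL}/api/v4/posts`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.MATTERMOST_BOT_TOKEN}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ channel_id: channelId, message }),
  });
}
```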
🔀 Multimodal input — why pre-process matters
Concept
gpt-oss-120b is text-only. Voice, images, PDFs, and video all have to be turned into text (or some other representation the model accepts) BEFORE the Agent. That's what pre-processing is. Add a vision model to read images, an OCR step for PDFs, a transcriber for audio — the Agent in the middle stays the same. This is why we draw the pre-process box separately: changing it does not change the Agent.
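The whole principle fits into one hypothetical function: pre-processing collapses every modality to a string before the Agent runs, and adding a modality only adds a branch here. The PDF and image branches are placeholders:

```typescript
// Reuses the Whisper sketch from earlier (declared here for completeness).
declare function transcribeVoiceNote(fileId: string): Promise<string>;

// Every input modality is reduced to a string BEFORE the Agent is called.
type Incoming =
  | { kind: "text"; text: string }
  | { kind: "voice"; fileId: string };
// Hypothetical future branches, to show where new modalities would go:
//   | { kind: "pdf"; fileId: string }   -> OCR / text-extraction step
//   | { kind: "image"; fileId: string } -> vision-model caption step

async function preprocess(input: Incoming): Promise<string> {
  switch (input.kind) {
    case "text":
      return input.text;
    case "voice":
      return transcribeVoiceNote(input.fileId);
  }
}
```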
🗝️ Memory keyed by channel
Design principle
Memory defaults to a single global session — fine for one-user demos, wrong for shared chat. Always key the session by whatever scope the conversation is in: Mattermost channel_id here, user_id for a 1:1 bot, thread_id for a Slack thread. Forgetting to set this is the most common cause of 'the bot thinks I am someone else'.
The Agent is the same shape you learned in Card 01. What changes between sessions is the plumbing around it: where input comes from (chat box → Mattermost channel), how input is prepared (typed → transcribed), where output lands (chat pane → channel post). Swap voice for images and you only touch pre-process. Swap Mattermost for Slack and you only touch trigger + post-process. A good Agent is portable because you isolated its context from the IO.

Probe questions

  • Open the AI Agent node. Read its system prompt. Find the sentence that specifically addresses voice input. Why is that sentence there — and what would break without it?
  • Open the Simple Memory node. The sessionKey is set to the Mattermost channel_id. What would break if you left it as the default — and what is the correct key for a 1:1 DM bot instead of a channel bot?
  • Open the Whisper HTTP Request node. It POSTs to Groq's audio/transcriptions endpoint, while the Agent uses Groq's chat/completions. Why are these two different endpoints even though both are hosted by Groq?
  • Now imagine you need to accept PDFs too. Where in this workflow would the PDF-reading step go — and which existing nodes would you leave completely untouched?
🟠 BUILD — Make it yours
60 - 90 min

Extend the agent — personality, memory scope, one new capability

You will duplicate the workflow and make three changes: (1) rewrite the system prompt so the bot speaks in your voice, (2) verify memory is channel-scoped by running a two-turn conversation, and (3) add one small capability — pick a filter, a new pre-process step, or a new system-prompt rule.

Task: Duplicate the shared workflow. Personalise the Agent. Prove memory works. Add one small upgrade of your choice (details in the steps).

  1. In n8n, open 'cos2-voice-to-lab-ai'. Click ⋯ → Duplicate. Work in your copy only.
  2. Open the AI Agent node. Rewrite the system prompt using the four-part anatomy from Card 01 (role / constraints / style / output format). Add one constraint specific to your field, e.g. 'Always mention at least one relevant technique when discussing protocols.' Keep under 150 words.
  3. Save. Open #lab-ai in Mattermost. Post a question (text or voice — voice comes from the microphone icon in the message bar). Wait for the reply. Then post a follow-up that only makes sense if the bot remembers turn 1 (e.g. 'What about the second step you mentioned?'). Confirm it answers without you re-explaining.
  4. Now post that same follow-up in a DIFFERENT Mattermost channel. The bot should have NO memory of the previous conversation there — that proves memory is correctly scoped to channel_id.
  5. Pick ONE small upgrade: (a) add an IF node in front of the Agent that drops messages containing a forbidden word of your choice (see the sketch after this list); or (b) add a second pre-process step, e.g. detect the language of the input and pass it as context to the Agent; or (c) change the system prompt so the bot always ends its reply with a specific structured footer (⚠ Uncertainty / 📚 Suggested reading / 🔗 Related concept).
  6. Send five questions. Confirm your upgrade fires on every one.
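If you pick upgrade (a), the gate logic is tiny; this hypothetical sketch shows it as plain TypeScript (in n8n you would express the same condition directly in the IF node):

```typescript
// Sketch of a forbidden-word gate in front of the Agent.
const FORBIDDEN = ["password", "secret"]; // example words, pick your own

function shouldDrop(text: string): boolean {
  const lower = text.toLowerCase();
  return FORBIDDEN.some((word) => lower.includes(word));
}
```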
Deliverable: Screenshots of (1) your Agent system prompt, (2) the two-turn memory test in one channel, (3) the empty-memory proof from the second channel, and (4) three replies showing your chosen upgrade in action.
✓ SELF-CHECK

Have you understood this?

  • I posted both a typed and a spoken question in Mattermost and got replies from the Lab AI.
  • I can point at the Agent's three input ports in the diagram and name which sub-node plugs into each.
  • I can explain why Whisper is a separate pre-process step and not a tool of the Agent.
  • I verified memory is scoped to the Mattermost channel by testing across two different channels.
  • I duplicated the workflow, rewrote the system prompt in four parts, and added one small upgrade.
  • I can list one real-world change to this workflow (new input type, new output destination) and name which nodes I would touch.
🔗 LIVE DEMO

Try it out directly

This session integrated three services. Real labs run on ten or more. If your PI asked you to add Slack support alongside Mattermost tomorrow, what is the smallest change to this workflow that would make it work? What is the biggest risk?
💬 AI TUTOR

Ask the tutor about this card

Socratic: the tutor answers with guiding questions instead of ready-made answers — you work out the solution yourself.

Ask your first question about this card below.