GitHub Repository: bauer-jan/whatsapp-agent

James is a lightweight AI assistant that runs on your device. You chat with it through a WhatsApp self-chat, and it connects via WhatsApp Web using a QR code. James can schedule tasks and access other systems through MCP (Model Context Protocol), including email, the internet, services, and databases.

Similar to OpenClaw, James includes structured system files such as SOUL.md, USER.md, and HEARTBEAT.md to create an identity for both the user and the agent.

Built with Strands Agents and neonize (whatsmeow Python bindings).

MCP Servers

The agent can be extended with external tools by registering MCP servers in config.yaml. Tools are discovered automatically at startup and injected into the existing permission pipeline alongside native tools.

mcp_servers:
  - name: "filesystem"
    transport: "stdio"
    command: "npx"
    args: ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
    role: "admin"

  - name: "weather"
    transport: "http"
    url: "http://localhost:3001/mcp"
    role: "public"

Each server entry requires a unique name and a transport type: stdio for local processes or http for remote servers. stdio servers take a command (and optionally args and env), while http servers take a url. The optional role field defaults to admin. Tools from admin servers are available only to the admin user, while tools from public servers are available to everyone, following the same rules as native tools.
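The rules above can be sketched as a small validation step. This is an illustrative sketch, not the project's loader: McpServerConfig and parse_mcp_servers are hypothetical names, and it assumes admin and public are the only valid roles.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

VALID_TRANSPORTS = {"stdio", "http"}
VALID_ROLES = {"admin", "public"}  # assumption: no other roles exist


@dataclass
class McpServerConfig:
    """Hypothetical in-memory form of one mcp_servers entry."""
    name: str
    transport: str
    role: str = "admin"
    command: Optional[str] = None
    args: List[str] = field(default_factory=list)
    env: Dict[str, str] = field(default_factory=dict)
    url: Optional[str] = None


def parse_mcp_servers(entries: List[dict]) -> List[McpServerConfig]:
    """Validate raw entries (e.g. loaded from config.yaml) against the rules above."""
    servers = []
    seen_names = set()
    for entry in entries:
        name = entry.get("name")
        if not name:
            raise ValueError("every MCP server needs a name")
        if name in seen_names:
            raise ValueError(f"duplicate MCP server name: {name!r}")
        seen_names.add(name)

        transport = entry.get("transport")
        if transport not in VALID_TRANSPORTS:
            raise ValueError(f"{name}: transport must be 'stdio' or 'http'")

        role = entry.get("role", "admin")  # role is optional and defaults to admin
        if role not in VALID_ROLES:
            raise ValueError(f"{name}: unknown role {role!r}")

        # stdio servers need a command, http servers need a url.
        if transport == "stdio" and not entry.get("command"):
            raise ValueError(f"{name}: stdio servers need a command")
        if transport == "http" and not entry.get("url"):
            raise ValueError(f"{name}: http servers need a url")

        servers.append(McpServerConfig(
            name=name,
            transport=transport,
            role=role,
            command=entry.get("command"),
            args=list(entry.get("args", [])),
            env=dict(entry.get("env", {})),
            url=entry.get("url"),
        ))
    return servers
```

Feeding it the two entries from the example config would yield a filesystem server with role admin and a weather server with role public.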

If a server fails to start, the agent logs the error and continues with the remaining servers. If no MCP servers are configured, the agent runs with its native tools only.
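A minimal sketch of that fail-soft startup, assuming each server is represented by a callable that connects and returns its tool names. The start_servers function and the starter callables are hypothetical stand-ins, not the project's API.

```python
import logging
from typing import Callable, Dict, List

logger = logging.getLogger("mcp")


def start_servers(starters: Dict[str, Callable[[], List[str]]]) -> Dict[str, List[str]]:
    """Start each server; log and skip the ones that fail instead of aborting."""
    tools_by_server: Dict[str, List[str]] = {}
    for name, start in starters.items():
        try:
            tools_by_server[name] = start()
        except Exception:
            # One broken server must not take down the others.
            logger.exception("MCP server %r failed to start; skipping it", name)
    return tools_by_server
```

With an empty starters mapping the function returns an empty dict, which matches the case where no MCP servers are configured.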

Configuration

admin_phone: "5511999999999"    # Your phone number (digits only, include country prefix, e.g. 49XXXX for Germany)
response_mode: "whitelist"      # all | admin_only | whitelist
whitelist:                      # phone numbers or group JIDs
  # - "5522888888888"
  # - "120363001234567890@g.us"
poll_interval: 5.0
persona_dir: "persona/"
session_storage_dir: "sessions/"
log_level: "INFO"
log_file: "agent.log"

The response_mode controls who gets a reply: all replies to everyone (not recommended — any number triggers LLM calls), admin_only restricts replies to just you, and whitelist opens it up to you plus listed numbers and groups.
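The three modes boil down to a small decision function. The sketch below is an assumption about how such a check could look: should_reply is a hypothetical name, and senders are assumed to be identified by the same digits-only strings used in the config, with groups identified by their JID.

```python
from typing import Collection, Optional

VALID_MODES = {"all", "admin_only", "whitelist"}


def should_reply(
    mode: str,
    admin_phone: str,
    whitelist: Collection[str],
    sender: str,
    group_jid: Optional[str] = None,
) -> bool:
    """Return True if a message from `sender` (optionally in a group) gets a reply."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown response_mode: {mode!r}")
    if mode == "all":
        return True  # any number triggers LLM calls
    if sender == admin_phone:
        return True  # the admin is allowed in every mode
    if mode == "admin_only":
        return False
    # whitelist mode: listed numbers, or messages in listed groups
    return sender in whitelist or (group_jid is not None and group_jid in whitelist)
```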

Design Decisions

Session-per-number isolation — Each phone number gets its own Strands agent instance with separate conversation history via FileSessionManager. No cross-contamination between conversations.

Agent decides when to respond — The agent receives messages as context and explicitly calls reply or write_message tools to send. If it has nothing to say, it stays silent. No auto-forwarding.

It becomes someone — On first run, the agent starts a conversation with you (BOOTSTRAP.md) to figure out its name, personality, and vibe. Then it writes its own SOUL.md. From that point on, it has a persistent identity.

Filesystem as Persona — The agent’s identity and behavior are plain markdown files on disk. No database. The agent reads them on every message and can update them at runtime.
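As a rough illustration of the filesystem-as-persona idea, the sketch below rebuilds a prompt from the markdown files on each call. The load_persona function, the file order, and the way sections are joined are assumptions for illustration, not the project's implementation.

```python
from pathlib import Path

# Assumption: these are the files that make up the persona prompt.
PERSONA_FILES = ("SOUL.md", "USER.md", "HEARTBEAT.md")


def load_persona(persona_dir: str) -> str:
    """Read the persona files fresh from disk and join them into one prompt."""
    sections = []
    for filename in PERSONA_FILES:
        path = Path(persona_dir) / filename
        if path.is_file():  # missing files are skipped, e.g. before bootstrap
            text = path.read_text(encoding="utf-8").strip()
            sections.append(f"## {filename}\n\n{text}")
    return "\n\n".join(sections)
```

Because nothing is cached, an edit the agent makes to SOUL.md at runtime shows up the next time load_persona is called.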

Per-task heartbeat scheduling — Background tasks defined in HEARTBEAT.md run on individual intervals ([every N min] syntax). The agent can check in or do anything else autonomously while nobody’s talking to it.
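A minimal sketch of per-task scheduling, assuming each task sits on its own line in HEARTBEAT.md and starts with an [every N min] marker. The exact file layout is not specified here, so the regex, HeartbeatTask, and run_due_tasks are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed line shape: "- [every 30 min] Check the inbox"
TASK_PATTERN = re.compile(r"\[every\s+(\d+)\s*min\]\s*(.+)", re.IGNORECASE)


@dataclass
class HeartbeatTask:
    description: str
    interval_seconds: int
    last_run: Optional[float] = None  # None means the task has never run

    def is_due(self, now: float) -> bool:
        # Assumption: a task that has never run is due on the first tick.
        return self.last_run is None or now - self.last_run >= self.interval_seconds


def parse_heartbeat(markdown: str) -> List[HeartbeatTask]:
    """Collect every line that carries an [every N min] marker."""
    tasks = []
    for line in markdown.splitlines():
        match = TASK_PATTERN.search(line)
        if match and int(match.group(1)) > 0:
            tasks.append(HeartbeatTask(match.group(2).strip(), int(match.group(1)) * 60))
    return tasks


def run_due_tasks(tasks: List[HeartbeatTask], now: float) -> List[str]:
    """Mark due tasks as run and return their descriptions for the agent to act on."""
    due = []
    for task in tasks:
        if task.is_due(now):
            task.last_run = now
            due.append(task.description)
    return due
```

A loop that wakes up every poll_interval seconds could call run_due_tasks with the current time and hand each returned description to the agent as a prompt.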

Tools

Tools are injected based on who’s talking to the agent via ToolManager.

The admin gets all tools: reply to respond to the current conversation (with baked-in target via closure), write_message to send to any phone number or group, lookup_contact to search contacts and groups by name or number, and update_soul / update_user_profile / update_heartbeat to edit persona files.

Public users get a reduced set: reply to respond to the current conversation, and read_message to acknowledge conversation context. The agent cannot call admin tools in a public session — ToolManager resolves the sender’s role and injects only the permitted set.
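The role-based injection and the closure-bound reply target can be sketched as follows. SketchToolManager is a hypothetical stand-in for the project's ToolManager, the send callable is an assumed transport hook, and lookup_contact and the update_* tools are left out for brevity.

```python
from typing import Callable, Dict

SendFn = Callable[[str, str], None]  # assumed hook: (recipient, text) -> None


def make_reply(target: str, send: SendFn) -> Callable[[str], str]:
    """Build a reply tool whose recipient is fixed by the closure."""
    def reply(text: str) -> str:
        send(target, text)  # the model cannot redirect this message
        return f"sent to {target}"
    return reply


class SketchToolManager:
    def __init__(self, admin_phone: str, send: SendFn) -> None:
        self.admin_phone = admin_phone
        self.send = send

    def role_for(self, sender: str) -> str:
        return "admin" if sender == self.admin_phone else "public"

    def tools_for(self, sender: str) -> Dict[str, Callable]:
        """Return only the tools the sender's role is allowed to use."""
        tools: Dict[str, Callable] = {"reply": make_reply(sender, self.send)}
        if self.role_for(sender) == "admin":
            def write_message(recipient: str, text: str) -> str:
                self.send(recipient, text)
                return f"sent to {recipient}"
            tools["write_message"] = write_message
        else:
            # Placeholder for the read_message tool described above.
            tools["read_message"] = lambda: "acknowledged"
        return tools
```

Because the public tool set never contains write_message, a public session has no way to call it, regardless of what the model outputs.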