Building AI Teammates for Social Voice Games: A Practical Implementation Guide Using Goose Goose Duck
Introduction: Solving the Player Matching Challenge in Social Games
Social deduction games like Goose Goose Duck and Werewolf have captured millions of players worldwide, offering thrilling experiences of deception, deduction, and social interaction. However, these games face persistent challenges that limit player enjoyment: difficulty gathering enough participants, teammates going AFK during critical moments, and poor new player experiences due to skill gaps.
AI teammates present an elegant solution to these longstanding problems. By filling empty slots, providing practice partners, and lowering entry barriers, AI players enable high-quality gaming sessions anytime, anywhere. This comprehensive guide explores how to integrate AI teammates into Android-based social voice games using ZEGO AI Agent technology, with Goose Goose Duck serving as our primary example.
Technical Architecture: The Overlay Approach
Non-Invasive Integration Design
The AI Agent layer operates as an external overlay on top of existing game architecture, requiring no modifications to core game logic. This design philosophy ensures minimal disruption to existing codebases while enabling rapid AI integration.
┌─────────────────────────────────────────────────────────┐
│ Android Game Client │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────┐ │
│ │ Game Logic │ │ Game State │ │ UI Presentation │ │
│ │ (Original) │ │ Machine │ │ (Roles/Voice) │ │
│ └──────┬──────┘ └──────┬──────┘ └────────┬────────┘ │
│ │ │ │ │
│ ┌──────┴────────────────┴───────────────────┴──────┐ │
│ │ AI Agent Adaptation Layer (New) │ │
│ │ • State Sync Interface • Voice Data Forwarding │ │
│ │ • AI Command Parsing │ │
│ └─────────────────────┬─────────────────────────────┘ │
└────────────────────────┼──────────────────────────────────┘
│
┌──────────┴──────────┐
│ ZEGO AI Agent │
│ LLM + TTS + ASR │
└─────────────────────┘

This architecture separates concerns cleanly: the game handles gameplay mechanics while the AI Agent manages intelligent behavior, voice processing, and natural language interactions.
Core Module Breakdown
RTC Voice Module: Built on ZEGO Express SDK, this component enables real-time voice communication between human players and AI teammates. Critical features include:
- ANS (Active Noise Suppression): Filters background noise for crystal-clear communication
- AEC (Acoustic Echo Cancellation): Prevents audio feedback loops in multi-player scenarios
- VAD (Voice Activity Detection): Identifies when participants are speaking, enabling natural conversation flow
AI Agent Module: Manages the complete intelligent agent lifecycle:
- LLM (Large Language Model): Handles conversation understanding, reasoning decisions, and dialogue generation
- TTS (Text-to-Speech): Converts AI responses into natural-sounding voice output
- ASR (Automatic Speech Recognition): Transcribes player speech into text for LLM comprehension
- State Synchronization: Injects game state information (rounds, roles, voting status) into AI context
SDK Integration Approach:
- Android clients integrate the ZEGO Express SDK, which officially supports Android
- AI Agent communicates via server-side API calls
- Android communicates with business backend through HTTP/HTTPS
- Business backend handles agent registration, instance creation, and game state forwarding
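The client-to-backend leg of this flow is defined by each game's own backend, not by ZEGO, so the request shape below is purely an illustrative assumption: a minimal payload an Android client might POST when asking the business backend to spawn an AI teammate.

```java
// Hypothetical request body for asking the business backend to create an AI
// teammate. Field names are illustrative assumptions, not a fixed schema.
class AddAiTeammateRequest {
    private final String roomId;
    private final String agentId;
    private final String aiUserId;

    public AddAiTeammateRequest(String roomId, String agentId, String aiUserId) {
        this.roomId = roomId;
        this.agentId = agentId;
        this.aiUserId = aiUserId;
    }

    // Serialized by hand to keep the sketch dependency-free; a real client
    // would use Gson, Moshi, or org.json instead.
    public String toJson() {
        return String.format(
            "{\"roomId\":\"%s\",\"agentId\":\"%s\",\"aiUserId\":\"%s\"}",
            roomId, agentId, aiUserId);
    }
}
```

The backend would then call ZEGO's server APIs (agent registration, instance creation) on the client's behalf, keeping the ServerSecret off the device.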
AI Teammate Functionality Breakdown
Character Configuration Through System Prompts
AI teammate personas are precisely controlled through carefully crafted SystemPrompts. Here's a comprehensive example for a "Duck" (antagonist) role in Goose Goose Duck:
public static String getDuckSystemPrompt(String playerName) {
return String.format(
"You are a Duck faction player in Goose Goose Duck, your name is %s.\n\n" +
"[Character Identity]\n" +
"- You are on the villain team, goal is to eliminate all Goose faction players\n" +
"- You know other Ducks' identities (teammates), but don't expose them\n" +
"- You need to disguise as a Goose to gain trust\n\n" +
"[Personality Traits]\n" +
"- Cunning, cautious, skilled at disguise\n" +
"- Speech creates confusion, redirects suspicion to others\n" +
"- When questioned, calmly defends; when necessary, sacrifices teammates for self-preservation\n\n" +
"[Speech Strategy]\n" +
"- Opening phase: Observe primarily, occasionally agree with others\n" +
"- Mid-game phase: Lead discussions, direct suspicion toward Goose or neutral players\n" +
"- Late-game phase: If suspected, create 'I'm a white Goose' illusion\n\n" +
"[Output Requirements]\n" +
"- Keep each statement to 2-3 sentences\n" +
"- Use conversational expressions, speak like real players\n" +
"- Can say things like 'I think xx seems suspicious' 'I was doing task at xxx'\n" +
"- Never say 'I am AI' or expose game mechanics",
playerName
);
}

Configuration Key Points:
- Character Identity: Clearly defines faction, objectives, and information boundaries (AI doesn't know other players' true identities)
- Personality Traits: Determines AI behavioral style (aggressive/conservative/cunning/honest)
- Speech Strategy: Adjusts behavior patterns based on game phase (opening/mid-game/late-game)
- Output Requirements: Controls statement length and language style to simulate human players
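The same four-section template adapts directly to the crew side. A hypothetical Goose-faction counterpart (not taken from any official sample, just an illustration of the pattern) might look like:

```java
class GoosePrompts {
    // Hypothetical Goose (crew) counterpart to the Duck prompt above,
    // following the same four-section template.
    public static String getGooseSystemPrompt(String playerName) {
        return String.format(
            "You are a Goose faction player in Goose Goose Duck, your name is %s.\n\n" +
            "[Character Identity]\n" +
            "- You are on the crew team; your goal is to finish tasks and vote out all Ducks\n" +
            "- You do NOT know anyone else's true identity\n\n" +
            "[Personality Traits]\n" +
            "- Honest, observant, willing to share what you saw\n\n" +
            "[Speech Strategy]\n" +
            "- Report where you were and which tasks you completed\n" +
            "- Cross-check other players' alibis for contradictions\n\n" +
            "[Output Requirements]\n" +
            "- Keep each statement to 2-3 sentences, conversational tone\n" +
            "- Never say 'I am AI' or expose game mechanics",
            playerName);
    }
}
```

Note the key asymmetry: the Goose prompt withholds all identity information, while the Duck prompt grants knowledge of teammates, which is exactly the information boundary described above.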
Speech Logic and Trigger Mechanisms
Trigger Timing:
- Turn-Based Speaking: Game state machine detects when it's the AI player's turn, calls LLM interface proactively
- Free Discussion Phase: After player statements, AI judges whether to interject based on context (using interruption mechanisms)
- Emergency Events: AI speaks proactively when discovering bodies or triggering emergency tasks
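For the free discussion phase, it helps to gate the decision cheaply before ever calling the LLM. The pre-filter below is a hypothetical sketch (the keyword list and policy are our assumptions, not part of the AI Agent API): it only wakes the AI when it is directly addressed and accused.

```java
import java.util.Locale;

class InterjectionPolicy {
    // Hypothetical pre-filter for the free discussion phase: only wake the
    // LLM when the AI is named AND the utterance sounds like an accusation,
    // so the AI does not respond to every line of chatter.
    public static boolean shouldInterject(String aiPlayerName, String transcript) {
        String t = transcript.toLowerCase(Locale.ROOT);
        boolean mentioned = t.contains(aiPlayerName.toLowerCase(Locale.ROOT));
        boolean accusation = t.contains("suspicious") || t.contains("vote out")
                || t.contains("it was");
        return mentioned && accusation; // addressed and accused -> defend
    }
}
```

Anything that passes the filter is forwarded to the LLM with the usual context; everything else simply updates the AI's message history silently.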
Real-Time Context Injection:
public List<AIAgentMessage> buildContext(GameState state, String aiPlayerId) {
List<AIAgentMessage> messages = new ArrayList<>();
// 1. Inject system prompt (character setting)
messages.add(new AIAgentMessage("system", getDuckSystemPrompt(aiPlayerId)));
// 2. Inject game state (current round, alive players, etc.)
String alivePlayers = String.join(",", state.getAlivePlayers());
messages.add(new AIAgentMessage("user",
String.format("[Game State] Current Round: %d, Alive Players: %s, Events After Your Last Statement: %s",
state.getRound(), alivePlayers, state.getLastEvents())));
// 3. Inject historical speech records (recent N messages)
for (ChatMessage chat : state.getRecentChats()) {
String role = chat.getPlayerId().equals(aiPlayerId) ? "assistant" : "user";
messages.add(new AIAgentMessage(role,
String.format("%s: %s", chat.getPlayerName(), chat.getContent())));
}
return messages;
}

Speech Control Strategies:
- Statement Length: Prompts require AI to limit statements to 2-3 sentences, avoiding lengthy monologues
- Speech Timing: Uses AI Agent's VADSilenceSegmentation parameter, setting 500ms silence threshold to ensure AI speaks after players finish
- Interruption Handling: Enables voice interruption, AI immediately stops speaking when players urgently interject
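The 2-3 sentence limit is enforced only by the prompt, and LLMs occasionally ignore it. A defensive post-processing guard (an assumption on our part, not part of the AI Agent API) can truncate overlong replies before they reach TTS:

```java
class SpeechGuard {
    // Defensive post-processing: even with the prompt's 2-3 sentence rule,
    // LLMs sometimes ramble, so cap the reply at maxSentences before TTS.
    public static String limitSentences(String reply, int maxSentences) {
        // Split on sentence-final punctuation while keeping the delimiter.
        String[] parts = reply.split("(?<=[.!?])\\s+");
        if (parts.length <= maxSentences) return reply;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < maxSentences; i++) {
            if (i > 0) sb.append(' ');
            sb.append(parts[i]);
        }
        return sb.toString();
    }
}
```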
Listening and Understanding: Processing Human Speech
Voice-to-Text Flow:
// On Android, listen to remote RTC audio streams
public void onRemoteAudioFrame(String streamId, AudioFrame frame) {
    // Forward voice data to the AI Agent for ASR recognition;
    // the AI Agent automatically pushes recognized text to the LLM
}

// Receive ASR results from the AI Agent (via callback)
public void onASRResult(String playerId, String text) {
    // Update the player's speech content in the game state
    gameState.addChatLog(playerId, text);
    // Notify all AI players to update their context
    for (AIPlayer ai : aiPlayers) {
        updateAIContext(ai.getInstanceId(), gameState);
    }
}

Decision Update Flow:
- Player voice → RTC transmission → AI Agent ASR recognition → Text content
- Text content injected into AI's MessageHistory context
- LLM re-analyzes situation based on new information, updates internal reasoning state
- When AI needs to speak, LLM generates responses based on latest situation
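Step 2's context injection pairs naturally with a sliding window, mirroring the WindowSize setting used later when creating agent instances. A minimal sketch of trimming the history on the game side before rebuilding the AI's context:

```java
import java.util.List;

class ContextWindow {
    // Client-side mirror of the WindowSize idea: keep only the most recent
    // N chat entries when rebuilding the AI's context so the LLM request
    // stays within budget. Generic over the game's message type.
    public static <T> List<T> keepRecent(List<T> history, int windowSize) {
        int from = Math.max(0, history.size() - windowSize);
        return history.subList(from, history.size());
    }
}
```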
Voting and Action Logic
Voting Based on Situation Analysis:
AI voting isn't random—it's based on LLM reasoning:
public void requestAIVote(String aiInstanceId, GameState state, AIVoteCallback callback) {
// Build voting request prompt
String alivePlayers = String.join(",", state.getAlivePlayers());
String votePrompt = String.format(
"Now entering voting phase, you need to vote to eliminate a player.\n" +
"Alive Players: %s\n" +
"Previous Round Speech Summary: %s\n" +
"Your Suspect: Analyze who seems most suspicious based on speeches\n\n" +
"Please select one player from below for voting, return only player ID: %s",
alivePlayers, state.getChatSummary(), alivePlayers
);
// Call AI Agent's proactive LLM interface
aiAgentClient.sendLLMRequest(aiInstanceId, votePrompt, new AIAgentCallback() {
@Override
public void onSuccess(String response) {
// Parse returned player ID
String votedPlayerId = parseVoteResponse(response);
callback.onResult(new VoteResult(aiInstanceId, votedPlayerId));
}
@Override
public void onError(Exception e) {
callback.onError(e);
}
});
}

State Machine Flow:

Daytime Speech → Voting Phase → Nighttime Actions → Daytime Speech
      ↓              ↓                 ↓
 AI Listening     AI Voting        AI Skills
Player Speech     Based on        (Eliminate/
Update Context    Reasoning      Investigate etc.)
                                  Select Target

Development Workflow and Implementation Code
Preparation: ZEGO Console Configuration
Step 1: Create Project and Obtain AppID
- Log into ZEGO Console, click "Create Project"
- Select "Real-Time Interactive AI Agent" service, record generated AppID and AppSign
Step 2: Enable AI Agent Service
- In project management page, find "Real-Time Interactive AI Agent" module
- Click "Enable Now", complete service activation (new users get free trial)
Step 3: Obtain ServerSecret
- Navigate to "Project Configuration" → "Key Management"
- Copy ServerSecret for server-side API call signature generation
Step 4: Configure LLM and TTS (Optional)
- In "AI Agent Configuration" page, configure default LLM and TTS parameters
- Supports multiple vendors: Volcano Engine, MiniMax, Alibaba Cloud, etc.
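One piece of plumbing worth settling up front: the manager class in the next section calls a generateSignature() helper without defining it. ZEGO's server-side APIs are commonly documented as signing requests with md5(AppId + SignatureNonce + ServerSecret + Timestamp), hex-encoded lowercase. Treat the sketch below as a starting point and verify the exact query parameters (including SignatureNonce, SignatureMethod, and SignatureVersion) against the current API reference.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

class ZegoSignature {
    // Sketch of the generateSignature() helper used by the manager class
    // below: md5(AppId + SignatureNonce + ServerSecret + Timestamp), as
    // commonly documented for ZEGO server APIs. Verify before shipping.
    public static String generate(String appId, String signatureNonce,
                                  String serverSecret, long timestamp) {
        try {
            String raw = appId + signatureNonce + serverSecret + timestamp;
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(raw.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is always available
        }
    }

    // Random 8-byte nonce, hex-encoded, sent alongside the signature.
    public static String newNonce() {
        byte[] bytes = new byte[8];
        new SecureRandom().nextBytes(bytes);
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) hex.append(String.format("%02x", b));
        return hex.toString();
    }
}
```

Because this computation needs the ServerSecret, it belongs on the business backend, never in the Android client.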
AI Agent Initialization and Registration
public class ZegoAIAgentManager {
private static final String API_BASE = "https://aigc-aiagent-api.zegotech.cn";
private static final String TAG = "ZegoAIAgentManager";
private static final MediaType JSON = MediaType.get("application/json; charset=utf-8");
private final String appId;
private final String serverSecret;
private final OkHttpClient httpClient;
public ZegoAIAgentManager(String appId, String serverSecret) {
this.appId = appId;
this.serverSecret = serverSecret;
this.httpClient = new OkHttpClient.Builder()
.connectTimeout(10, TimeUnit.SECONDS)
.readTimeout(10, TimeUnit.SECONDS)
.build();
}
// Register AI agent (typically completed on server-side, Android calls business backend)
public void registerAgent(String agentId, String agentName, String systemPrompt,
AgentCallback callback) {
String timestamp = getTimestamp();
String signature = generateSignature();
String url = String.format(
"%s?Action=RegisterAgent&AppId=%s&Timestamp=%s&Signature=%s",
API_BASE, appId, timestamp, signature
);
try {
JSONObject body = new JSONObject();
body.put("Name", agentName);
// Configure LLM
JSONObject llm = new JSONObject();
llm.put("Url", "https://ark.cn-beijing.volces.com/api/v3/chat/completions");
llm.put("ApiKey", "your_api_key");
llm.put("Model", "doubao-1-5-pro-32k-250115");
llm.put("SystemPrompt", systemPrompt);
llm.put("Temperature", 0.7);
llm.put("TopP", 0.9);
body.put("LLM", llm);
// Configure TTS
JSONObject tts = new JSONObject();
tts.put("Vendor", "ByteDance");
JSONObject ttsParams = new JSONObject();
JSONObject ttsApp = new JSONObject();
ttsApp.put("appid", "your_tts_appid");
ttsApp.put("token", "your_tts_token");
ttsApp.put("cluster", "volcano_tts");
ttsParams.put("app", ttsApp);
JSONObject ttsAudio = new JSONObject();
ttsAudio.put("voice_type", "zh_female_wanwanxiaohe_moon_bigtts");
ttsAudio.put("speed_ratio", 1.0);
ttsParams.put("audio", ttsAudio);
tts.put("Params", ttsParams);
body.put("TTS", tts);
// Configure ASR
JSONObject asr = new JSONObject();
asr.put("VADSilenceSegmentation", 500); // 500ms silence segmentation
asr.put("VADMinSpeechDuration", 100); // Minimum 100ms counts as valid speech
body.put("ASR", asr);
RequestBody requestBody = RequestBody.create(body.toString(), JSON);
Request request = new Request.Builder()
.url(url)
.post(requestBody)
.build();
httpClient.newCall(request).enqueue(new Callback() {
@Override
public void onFailure(Call call, IOException e) {
Log.e(TAG, "Registration failed: " + e.getMessage());
callback.onError(e);
}
@Override
public void onResponse(Call call, Response response) throws IOException {
if (response.isSuccessful()) {
Log.i(TAG, "Agent " + agentId + " registered successfully");
callback.onSuccess(agentId);
} else {
Log.e(TAG, "Registration failed: " + response.body().string());
callback.onError(new Exception("Registration failed: " + response.code()));
}
}
});
} catch (JSONException e) {
callback.onError(e);
}
}
}

Creating AI Teammate Instances
public void createAIAgentInstance(String agentId, String roomId, String aiUserId,
CreateInstanceCallback callback) {
String timestamp = getTimestamp();
String signature = generateSignature();
String url = String.format(
"%s?Action=CreateAgentInstance&AppId=%s&Timestamp=%s&Signature=%s",
API_BASE, appId, timestamp, signature
);
try {
JSONObject body = new JSONObject();
body.put("AgentId", agentId);
// Configure RTC
JSONObject rtc = new JSONObject();
rtc.put("RoomId", roomId);
rtc.put("UserId", aiUserId); // AI player's UserId
rtc.put("StreamId", aiUserId + "_main"); // AI's stream ID
body.put("RTC", rtc);
// Configure message history
JSONObject messageHistory = new JSONObject();
messageHistory.put("SyncMode", 1); // Use MessageHistory mode
messageHistory.put("Messages", new org.json.JSONArray()); // Initial empty context
messageHistory.put("WindowSize", 20); // Use recent 20 messages for each LLM call
body.put("MessageHistory", messageHistory);
// Advanced configuration
JSONObject advancedConfig = new JSONObject();
advancedConfig.put("MaxIdleTime", 300); // Auto-destroy after 300 seconds of inactivity
advancedConfig.put("InterruptMode", 0); // Enable voice interruption
body.put("AdvancedConfig", advancedConfig);
RequestBody requestBody = RequestBody.create(body.toString(), JSON);
Request request = new Request.Builder()
.url(url)
.post(requestBody)
.build();
httpClient.newCall(request).enqueue(new Callback() {
@Override
public void onFailure(Call call, IOException e) {
Log.e(TAG, "Create instance failed: " + e.getMessage());
callback.onError(e);
}
@Override
public void onResponse(Call call, Response response) throws IOException {
if (response.isSuccessful()) {
Log.i(TAG, "AI instance created successfully, joined room " + roomId);
callback.onSuccess();
} else {
Log.e(TAG, "Create instance failed: " + response.body().string());
callback.onError(new Exception("Create failed: " + response.code()));
}
}
});
} catch (JSONException e) {
callback.onError(e);
}
}

Complete Speech Invocation Chain
public void triggerAISpeak(String aiInstanceId, GameState state) {
// 1. Build current context (including latest game state)
List<AIAgentMessage> contextMessages = buildContext(state, aiInstanceId);
// 2. Update AI context
updateAIContext(aiInstanceId, contextMessages, new SimpleCallback() {
@Override
public void onSuccess() {
// 3. Trigger AI speech (call LLM to generate response)
String timestamp = getTimestamp();
String signature = generateSignature();
String url = String.format(
"%s?Action=SendAgentInstanceLLM&AppId=%s&Timestamp=%s&Signature=%s",
API_BASE, appId, timestamp, signature
);
try {
JSONObject body = new JSONObject();
body.put("InstanceId", aiInstanceId);
body.put("Prompt", "Now it's your turn to speak, please share your thoughts based on current situation.");
body.put("AddToHistory", true); // Add response to context history
RequestBody requestBody = RequestBody.create(body.toString(), JSON);
Request request = new Request.Builder()
.url(url)
.post(requestBody)
.build();
httpClient.newCall(request).enqueue(new Callback() {
@Override
public void onFailure(Call call, IOException e) {
Log.e(TAG, "Trigger AI speech failed: " + e.getMessage());
}
@Override
public void onResponse(Call call, Response response) {
// AI Agent automatically completes:
// 1. LLM generates response → 2. TTS synthesizes voice → 3. Push stream via RTC for playback
// Android only needs to pull AI's audio stream to hear AI speech
Log.i(TAG, "AI speech triggered successfully");
}
});
} catch (JSONException e) {
Log.e(TAG, "Trigger AI speech failed: " + e.getMessage());
}
}
@Override
public void onError(Exception e) {
Log.e(TAG, "Update AI context failed: " + e.getMessage());
}
});
}

Human Player Speech Monitoring and State Updates
public class GameVoiceManager {
private ZegoExpressEngine engine;
private ZegoAIAgentManager aiAgentManager;
private GameState gameState;
private UIManager uiManager;
private static final String TAG = "GameVoiceManager";
public void init(Context context, long appId, String appSign) {
// Initialize ZEGO Express SDK
ZegoEngineProfile profile = new ZegoEngineProfile();
profile.appID = appId;
profile.appSign = appSign;
profile.scenario = ZegoScenario.GENERAL;
profile.application = context.getApplicationContext();
ZegoExpressEngine.createEngine(profile, new IZegoEventHandler() {
@Override
public void onRoomStateUpdate(String roomID, ZegoRoomState state, int errorCode, JSONObject extendedData) {
Log.i(TAG, "Room state update: " + roomID + ", state: " + state);
}
});
engine = ZegoExpressEngine.getEngine();
// Register AI Agent related callbacks
registerAIAgentCallbacks();
}
// Register AI Agent callbacks
private void registerAIAgentCallbacks() {
// AI starts speaking callback (via business backend callback or RTC SEI message)
onAgentSpeakStart = instanceId -> {
// Update UI: show AI is speaking
uiManager.showSpeakingIndicator(instanceId);
};
// AI ends speaking callback
onAgentSpeakEnd = instanceId -> {
// Update UI: hide speaking indicator
uiManager.hideSpeakingIndicator(instanceId);
};
// Receive AI subtitles (for in-game chat box display)
onAgentSubtitle = (instanceId, text) -> {
// Add AI speech content to game chat records
gameState.addChatLog(instanceId, text);
uiManager.updateChatBox(instanceId, text);
};
// Human player speech recognition results (forwarded via business backend)
onPlayerASRResult = (playerId, text) -> {
// Update game state
gameState.addChatLog(playerId, text);
// Notify all AI players to update context
if (aiAgentManager != null) {
aiAgentManager.broadcastToAIs(gameState);
}
};
}
// Player joins room and pulls AI voice streams
public void joinRoom(String roomId, String userId) {
ZegoUser user = new ZegoUser(userId);
ZegoRoomConfig config = new ZegoRoomConfig();
config.maxMemberCount = 16;
engine.loginRoom(roomId, user, config);
engine.startPublishingStream(userId + "_main");
// Pull all AI player voice streams
if (gameState != null && gameState.getAiPlayers() != null) {
for (AIPlayer aiPlayer : gameState.getAiPlayers()) {
engine.startPlayingStream(aiPlayer.getUserId() + "_main");
}
}
}
}

Extension Possibilities
Expanding from Goose Goose Duck to Werewolf primarily involves character configuration differences:
- Richer Character Roles: Werewolf features more diverse roles (Seer, Witch, Hunter, etc.), requiring SystemPrompt additions for skill usage logic
- Multi-AI Interaction: Creating multiple agent instances enables AI players to naturally communicate with each other through RTC voice
- Multimodal Upgrades: Adding digital human avatars to AI through ZEGO Digital Human SDK creates visual AI teammates, further enhancing immersion
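For the first point, the SystemPrompt template from earlier extends naturally to skill roles. A hypothetical Seer prompt (illustrative only) adds a [Skill Usage] section on top of the familiar structure:

```java
class WerewolfPrompts {
    // Hypothetical extension of the Duck/Goose prompt template to a Werewolf
    // skill role; the [Skill Usage] section is the main addition.
    public static String getSeerSystemPrompt(String playerName) {
        return String.format(
            "You are the Seer in Werewolf, your name is %s.\n\n" +
            "[Character Identity]\n" +
            "- You are on the village team; each night you may inspect one player's faction\n\n" +
            "[Skill Usage]\n" +
            "- Prioritize inspecting vocal or suspicious players\n" +
            "- Weigh when to reveal your role: revealing early guides votes " +
            "but makes you a werewolf target\n\n" +
            "[Output Requirements]\n" +
            "- Keep each statement to 2-3 sentences, conversational tone\n" +
            "- Never say 'I am AI' or expose game mechanics",
            playerName);
    }
}
```

Night actions such as the inspection itself can then reuse the requestAIVote pattern shown earlier, with a prompt asking the LLM to return a single target player ID.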
Conclusion: Transforming Social Gaming with AI
Through ZEGO AI Agent, Android developers can rapidly integrate AI teammates into Goose Goose Duck, Werewolf, and similar social voice games without modifying original game logic.
Core Benefits Include:
- Solving Player Matching Challenges: AI fills empty slots, ensuring games can start anytime
- 24/7 Practice Partners: Players can practice and improve skills regardless of time zones or player availability
- Lowering New Player Barriers: AI teammates provide patient guidance for newcomers learning game mechanics
- Enhanced Game Fun: AI players add unpredictability and variety to each gaming session
ZEGO provides complete SDK documentation, example code, and technical support. Developers can visit the official website for detailed integration guides, opening a new chapter in AI-enhanced social gaming experiences.
The future of social gaming isn't human versus AI—it's humans enhanced by AI, creating richer, more accessible, and more engaging experiences for players worldwide.