Building AI Teammates for Social Voice Games: A Practical Implementation Guide Using Goose Goose Duck
Introduction
Social deduction games like Goose Goose Duck and Werewolf have captured the hearts of millions of players worldwide. These games thrive on human interaction, psychological manipulation, and the thrill of uncovering hidden identities. However, players frequently encounter frustrating obstacles: insufficient player counts to start a match, teammates going AFK during critical moments, and steep learning curves that discourage newcomers from continuing.
The integration of AI teammates presents an elegant solution to these persistent challenges. AI-powered players can fill empty slots in lobbies, provide practice opportunities for beginners, and ensure that matches proceed smoothly even when human players disconnect. This comprehensive guide demonstrates how to rapidly integrate AI teammate functionality into Android-based social voice games using ZEGO AI Agent technology, with Goose Goose Duck serving as our primary example.
Technical Architecture Overview
Overlay Architecture Design
The AI Agent layer employs an overlay architecture that sits atop existing game infrastructure without requiring modifications to core game logic. This design philosophy ensures minimal disruption to established codebases while enabling rapid feature deployment.
The architecture consists of three primary layers working in harmony. The Android game client maintains its original game logic layer, game state machine handling turns and voting phases, and UI presentation layer displaying roles and speech interfaces. These components remain untouched, preserving existing functionality and reducing integration risk.
Bridging the game client and AI services is the AI Agent adaptation layer, a newly introduced component responsible for state synchronization interfaces, voice data forwarding, and AI command parsing. This layer translates game events into AI-understandable formats and converts AI decisions back into game actions.
At the top of the stack, ZEGO AI Agent provides the intelligence backbone, combining Large Language Models for reasoning and dialogue generation, Text-to-Speech for natural voice output, and Automatic Speech Recognition for understanding player communications.
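The adaptation layer's responsibilities can be summarized in a small contract. The following sketch is purely illustrative and not part of the ZEGO SDK; every name here (`AIAgentAdapter`, `GameAction`, the method signatures) is an assumption:

```java
import java.util.List;

// Hypothetical data type for an AI decision translated into a game action
class GameAction {
    final String type;   // e.g. "VOTE", "SPEAK"
    final String target; // e.g. a player ID, or null
    GameAction(String type, String target) { this.type = type; this.target = target; }
}

// Hypothetical contract for the AI Agent adaptation layer: it pushes game
// state toward the AI service and translates AI output back into actions.
interface AIAgentAdapter {
    // State synchronization: serialize round/role/voting info for the AI's context
    String serializeGameState(int round, List<String> alivePlayers, String phase);

    // Voice data forwarding: hand a captured audio frame to the AI service for ASR
    void forwardAudioFrame(String streamId, byte[] pcmFrame);

    // AI command parsing: convert an LLM reply like "VOTE player_3" into an action
    default GameAction parseAICommand(String llmResponse) {
        String[] parts = llmResponse.trim().split("\\s+", 2);
        String target = parts.length > 1 ? parts[1] : null;
        return new GameAction(parts[0].toUpperCase(), target);
    }
}
```

Keeping this contract separate from the game state machine is what lets the rest of the client remain untouched.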
Core Module Specifications
The RTC (Real-Time Communication) voice module leverages ZEGO Express SDK to facilitate real-time voice conversations between human players and AI teammates. This module incorporates sophisticated audio processing capabilities including AI-powered noise suppression (ANS) that filters out background noise, acoustic echo cancellation (AEC) preventing audio feedback loops, and voice activity detection (VAD) that identifies when speakers are actively talking versus remaining silent. These features collectively ensure crystal-clear audio quality even in challenging multi-speaker scenarios typical of social deduction games.
The AI Agent module manages the complete lifecycle of intelligent game participants. The Large Language Model component handles conversation understanding, strategic reasoning, decision-making processes, and dialogue generation that mimics human player behavior. Text-to-Speech technology converts AI-generated text responses into natural-sounding voice output, while Automatic Speech Recognition transforms player voice inputs into text for LLM comprehension. The state synchronization subsystem continuously injects game state information including current round numbers, player roles, and voting status into the AI's contextual understanding.
SDK integration follows a hybrid approach. The Android client integrates ZEGO Express SDK directly, which already supports Android platforms comprehensively. AI Agent communication occurs through server-side API calls, with the Android client communicating via HTTP/HTTPS protocols to the business backend. The business backend assumes responsibility for registering intelligent agents, creating agent instances, and forwarding real-time game state updates to AI systems.
AI Teammate Functionality Breakdown
Character Configuration and Persona Design
AI teammate personalities are meticulously controlled through SystemPrompt configurations that define behavior patterns, knowledge boundaries, and communication styles. The following example demonstrates a comprehensive SystemPrompt configuration for a Duck role in Goose Goose Duck:
public static String getDuckSystemPrompt(String playerName) {
return String.format(
"You are a Duck faction player in Goose Goose Duck, your name is %s.\n\n" +
"[Character Identity]\n" +
"- You belong to the villain faction, your objective is to eliminate all Goose faction players\n" +
"- You know the identities of other Ducks (your teammates), but never expose them\n" +
"- You must disguise yourself as a Goose to gain trust\n\n" +
"[Personality Traits]\n" +
"- Cunning and cautious, skilled at deception\n" +
"- Creates confusion during discussions, redirects suspicion to others\n" +
"- Remains calm when questioned,必要时 sacrifices teammates for self-preservation\n\n" +
"[Speaking Strategy]\n" +
"- Opening phase: Observe primarily, occasionally agree with others' viewpoints\n" +
"- Mid-game phase: Lead discussions, direct suspicion toward Goose players or neutral roles\n" +
"- Late-game phase: If suspected, create 'I'm a loyal Goose' false impression\n\n" +
"[Output Requirements]\n" +
"- Keep each statement to 2-3 sentences\n" +
"- Use conversational expressions like real players\n" +
"- Can say things like 'I think XX seems suspicious' or 'I was doing tasks at XXX'\n" +
"- Never reveal 'I am AI' or expose game mechanics",
playerName
);
}

This configuration addresses several critical dimensions. Character identity establishes faction alignment, victory conditions, and information boundaries: the AI knows certain information but must act within defined constraints. Personality traits determine behavioral style, whether aggressive, conservative, cunning, or straightforward. Speaking strategy adapts behavior to game phases, recognizing that early-game caution differs from late-game desperation. Output requirements control statement length and linguistic style so that AI communications resemble authentic human player interactions.
Speech Logic and Timing Control
AI speech triggering mechanisms respond to specific game state transitions. When the game state machine detects that it's the AI player's turn to speak, it automatically invokes the LLM interface to generate appropriate responses. During free discussion phases, the AI monitors conversation flow and determines optimal moments to interject using sophisticated interruption mechanisms that mirror natural human conversation patterns. Emergency events such as discovering bodies or triggering urgent tasks prompt immediate AI responses that maintain immersion.
Real-time context injection ensures AI decisions reflect current game conditions. The context building process begins by injecting system prompts containing role definitions and behavioral guidelines. Game state information including current round numbers, alive player lists, and recent events provides situational awareness. Historical speech records from the last N exchanges establish conversation continuity and enable the AI to reference previous statements appropriately.
public List<AIAgentMessage> buildContext(GameState state, String aiPlayerId) {
List<AIAgentMessage> messages = new ArrayList<>();
// 1. Inject system prompt (role definition)
messages.add(new AIAgentMessage("system", getDuckSystemPrompt(aiPlayerId)));
// 2. Inject game state (current round, alive players, etc.)
String alivePlayers = String.join(",", state.getAlivePlayers());
messages.add(new AIAgentMessage("user",
String.format("[Game State] Current Round: %d, Alive Players: %s, Events After Your Last Turn: %s",
state.getRound(), alivePlayers, state.getLastEvents())));
// 3. Inject historical speech records (recent N entries)
for (ChatMessage chat : state.getRecentChats()) {
String role = chat.getPlayerId().equals(aiPlayerId) ? "assistant" : "user";
messages.add(new AIAgentMessage(role,
String.format("%s: %s", chat.getPlayerName(), chat.getContent())));
}
return messages;
}

Speech control strategies employ multiple techniques for natural interaction. Statement length constraints enforced through prompts keep AI responses concise, typically 2-3 sentences, to avoid monotonous monologues. Speech timing leverages the AI Agent's VAD (Voice Activity Detection) silence segmentation parameters, configured with a 500ms silence threshold so the AI speaks only after players complete their statements. Interruption handling enables the voice interruption feature, allowing the AI to cease speaking immediately when a player interjects, mimicking natural conversation dynamics.
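The production VAD runs inside the AI Agent service, but the 500ms silence-segmentation idea can be illustrated with a toy energy-based gate. The class name, threshold, and frame size below are all assumptions for illustration, not ZEGO APIs:

```java
// Illustrative only: declares "end of utterance" after 500 ms of
// consecutive low-energy audio frames. The real VAD lives server-side.
class SilenceSegmenter {
    private static final double ENERGY_THRESHOLD = 500.0; // tune per device/mic
    private static final int SILENCE_MS = 500;            // mirrors VADSilenceSegmentation
    private final int frameMs;
    private int silentMs = 0;

    SilenceSegmenter(int frameMs) { this.frameMs = frameMs; }

    // Returns true once the speaker has been silent for >= 500 ms,
    // i.e. it is now reasonable for the AI to start speaking.
    boolean onFrame(short[] pcm) {
        long sum = 0;
        for (short s : pcm) sum += (long) s * s;
        double meanEnergy = (double) sum / pcm.length;
        silentMs = (meanEnergy < ENERGY_THRESHOLD) ? silentMs + frameMs : 0;
        return silentMs >= SILENCE_MS;
    }
}
```

Any loud frame resets the counter, which is why the AI never cuts a player off mid-sentence under this scheme.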
Listening and Comprehension Mechanisms
AI systems acquire human player voice inputs and update decisions through sophisticated audio processing pipelines. The voice-to-text conversion process begins with RTC voice stream monitoring within the game client. Audio frames captured from remote players are forwarded to the AI Agent for ASR recognition. The AI Agent automatically converts recognized speech to text and pushes results to the LLM for understanding.
// Monitor RTC voice streams in the Android client
public void onRemoteAudioFrame(String streamId, AudioFrame frame) {
    // Forward voice data to AI Agent for ASR recognition;
    // the AI Agent automatically pushes recognized text to the LLM
}

// Receive ASR results from AI Agent (via callback)
public void onASRResult(String playerId, String text) {
    // Update player speech content in the game state
    gameState.addChatLog(playerId, text);
    // Notify all AI players to update their context
    for (AIPlayer ai : aiPlayers) {
        updateAIContext(ai.getInstanceId(), gameState);
    }
}

The decision update workflow follows a systematic progression. Player voice transmits through the RTC infrastructure to the AI Agent's ASR, producing text content. This text is injected into the AI's MessageHistory context, maintaining conversation continuity. The LLM reanalyzes the situation based on the new information, updating its internal reasoning state. When AI speech becomes necessary, the LLM generates responses grounded in the latest situational understanding.
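The MessageHistory context must stay bounded; the instance-creation example in this guide uses a WindowSize of 20 for the same reason. A simple trimming helper, with assumed names, might look like this:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative sketch: retain only the most recent N chat messages so the
// LLM context window stays bounded. Class and method names are assumptions.
class BoundedChatHistory {
    private final int windowSize;
    private final Deque<String> messages = new ArrayDeque<>();

    BoundedChatHistory(int windowSize) { this.windowSize = windowSize; }

    void add(String playerName, String content) {
        messages.addLast(playerName + ": " + content);
        while (messages.size() > windowSize) {
            messages.removeFirst(); // drop the oldest exchange
        }
    }

    List<String> snapshot() { return new ArrayList<>(messages); }
}
```

Dropping the oldest exchanges first preserves the statements most likely to matter for the AI's next turn.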
Voting and Action Systems
AI voting decisions emerge from reasoned analysis rather than random selection, leveraging LLM inference capabilities. The voting request process constructs prompts that provide comprehensive situational context:
public void requestAIVote(String aiInstanceId, GameState state, AIVoteCallback callback) {
// Build voting request prompt
String alivePlayers = String.join(",", state.getAlivePlayers());
String votePrompt = String.format(
"Voting phase initiated, you must vote to eliminate one player.\n" +
"Alive Players: %s\n" +
"Previous Round Speech Summary: %s\n" +
"Your Suspects: Analyze who appears most suspicious based on speeches\n\n" +
"Select one player from the following for voting, return only player ID: %s",
alivePlayers, state.getChatSummary(), alivePlayers
);
// Call AI Agent's active LLM request interface
aiAgentClient.sendLLMRequest(aiInstanceId, votePrompt, new AIAgentCallback() {
@Override
public void onSuccess(String response) {
// Parse returned player ID
String votedPlayerId = parseVoteResponse(response);
callback.onResult(new VoteResult(aiInstanceId, votedPlayerId));
}
@Override
public void onError(Exception e) {
callback.onError(e);
}
});
}

State machine transitions coordinate AI behavior across game phases. During daytime speech phases, the AI listens to player statements and updates its contextual understanding. Voting phases trigger AI voting logic based on accumulated reasoning. Night action phases enable AI skill usage, including elimination attempts or investigation actions depending on the assigned role.
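The `parseVoteResponse` helper referenced in the voting code above is not shown. A defensive sketch might match the LLM's free-form reply against the known alive-player IDs, since models do not always return "only the player ID" as instructed (the class name and null-fallback policy are assumptions):

```java
import java.util.List;

// Illustrative: extract a voted player ID from a free-form LLM reply.
class VoteParser {
    static String parseVoteResponse(String response, List<String> alivePlayers) {
        if (response == null) return null;
        String trimmed = response.trim();
        // Fast path: the model obeyed and returned a bare player ID
        if (alivePlayers.contains(trimmed)) return trimmed;
        // Fallback: find the first alive player mentioned anywhere in the reply
        for (String playerId : alivePlayers) {
            if (trimmed.contains(playerId)) return playerId;
        }
        return null; // caller should re-prompt, or pick a fallback vote
    }
}
```

Restricting matches to the alive-player list also prevents the AI from casting an invalid vote for an eliminated player.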
Development Workflow and Implementation
Preparation: ZEGO Console Configuration
Before beginning development, complete the following configurations in the ZEGO console. First, create a project and obtain AppID by logging into the ZEGO console, clicking "Create Project," selecting the "Real-Time Interactive AI Agent" service, and recording the generated AppID and AppSign credentials.
Next, activate the AI Agent service by navigating to the project management page, locating the "Real-Time Interactive AI Agent" module, and clicking "Activate Now." New users can access free trial periods for evaluation purposes.
Obtain ServerSecret by entering "Project Configuration" → "Key Management" and copying the ServerSecret value, which enables server-side API call signature generation for secure communications.
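The `generateSignature()` stub later in this guide can be filled in server-side along these lines. Our reading of ZEGO's server API docs is an MD5 over AppId + SignatureNonce + ServerSecret + Timestamp, emitted as lowercase hex; verify the exact concatenation order and URL parameters against the current signature documentation before shipping. This must run on the backend so the ServerSecret never reaches the client:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;

// Sketch of server-side request signing; verify field order against ZEGO docs.
class ZegoSigner {
    // Random hex nonce; sent alongside the signature as SignatureNonce
    static String hexNonce() {
        byte[] nonce = new byte[8];
        new SecureRandom().nextBytes(nonce);
        return toHex(nonce);
    }

    static String sign(long appId, String nonce, String serverSecret, long timestamp) {
        try {
            String raw = appId + nonce + serverSecret + timestamp;
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return toHex(md5.digest(raw.getBytes(StandardCharsets.UTF_8)));
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02x", b));
        return sb.toString();
    }
}
```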
Optionally configure LLM and TTS settings in the "AI Agent Configuration" page. The system supports multiple vendors including Volcano Engine, MiniMax, and Alibaba Cloud, allowing customization based on performance requirements and cost considerations.
AI Agent Initialization and Registration
The following Android code demonstrates AI Agent registration implementation:
import android.util.Log;
import org.json.JSONException;
import org.json.JSONObject;
import java.io.IOException;
import java.util.concurrent.TimeUnit;
import okhttp3.Call;
import okhttp3.Callback;
import okhttp3.MediaType;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.RequestBody;
import okhttp3.Response;
public class ZegoAIAgentManager {
private static final String API_BASE = "https://aigc-aiagent-api.zegotech.cn";
private static final String TAG = "ZegoAIAgentManager";
private static final MediaType JSON = MediaType.get("application/json; charset=utf-8");
private final String appId;
private final String serverSecret;
private final OkHttpClient httpClient;
public ZegoAIAgentManager(String appId, String serverSecret) {
this.appId = appId;
this.serverSecret = serverSecret;
this.httpClient = new OkHttpClient.Builder()
.connectTimeout(10, TimeUnit.SECONDS)
.readTimeout(10, TimeUnit.SECONDS)
.build();
}
// Register AI agent (typically completed on server-side, Android calls business backend)
public void registerAgent(String agentId, String agentName, String systemPrompt,
AgentCallback callback) {
String timestamp = getTimestamp();
String signature = generateSignature();
String url = String.format(
"%s?Action=RegisterAgent&AppId=%s&Timestamp=%s&Signature=%s",
API_BASE, appId, timestamp, signature
);
try {
JSONObject body = new JSONObject();
body.put("AgentId", agentId); // identify the agent being registered
body.put("Name", agentName);
// Configure LLM settings
JSONObject llm = new JSONObject();
llm.put("Url", "https://ark.cn-beijing.volces.com/api/v3/chat/completions");
llm.put("ApiKey", "your_api_key");
llm.put("Model", "doubao-1-5-pro-32k-250115");
llm.put("SystemPrompt", systemPrompt);
llm.put("Temperature", 0.7);
llm.put("TopP", 0.9);
body.put("LLM", llm);
// Configure TTS settings
JSONObject tts = new JSONObject();
tts.put("Vendor", "ByteDance");
JSONObject ttsParams = new JSONObject();
JSONObject ttsApp = new JSONObject();
ttsApp.put("appid", "your_tts_appid");
ttsApp.put("token", "your_tts_token");
ttsApp.put("cluster", "volcano_tts");
ttsParams.put("app", ttsApp);
JSONObject ttsAudio = new JSONObject();
ttsAudio.put("voice_type", "zh_female_wanwanxiaohe_moon_bigtts");
ttsAudio.put("speed_ratio", 1.0);
ttsParams.put("audio", ttsAudio);
tts.put("Params", ttsParams);
body.put("TTS", tts);
// Configure ASR settings
JSONObject asr = new JSONObject();
asr.put("VADSilenceSegmentation", 500);
asr.put("VADMinSpeechDuration", 100);
body.put("ASR", asr);
RequestBody requestBody = RequestBody.create(body.toString(), JSON);
Request request = new Request.Builder()
.url(url)
.post(requestBody)
.build();
httpClient.newCall(request).enqueue(new Callback() {
@Override
public void onFailure(Call call, IOException e) {
Log.e(TAG, "Registration failed: " + e.getMessage());
callback.onError(e);
}
@Override
public void onResponse(Call call, Response response) throws IOException {
if (response.isSuccessful()) {
Log.i(TAG, "Agent " + agentId + " registered successfully");
callback.onSuccess(agentId);
} else {
Log.e(TAG, "Registration failed: " + response.body().string());
callback.onError(new Exception("Registration failed: " + response.code()));
}
}
});
} catch (JSONException e) {
callback.onError(e);
}
}
private String getTimestamp() {
return String.valueOf(System.currentTimeMillis() / 1000);
}
private String generateSignature() {
// Signature generation logic (refer to ZEGO signature documentation)
// Should be completed on server-side to avoid exposing serverSecret
return "signature";
}
public interface AgentCallback {
void onSuccess(String agentId);
void onError(Exception e);
}
}

Creating AI Teammate Instances
AI agent instance creation joins the AI to the RTC room:
public void createAIAgentInstance(String agentId, String roomId, String aiUserId,
CreateInstanceCallback callback) {
String timestamp = getTimestamp();
String signature = generateSignature();
String url = String.format(
"%s?Action=CreateAgentInstance&AppId=%s&Timestamp=%s&Signature=%s",
API_BASE, appId, timestamp, signature
);
try {
JSONObject body = new JSONObject();
body.put("AgentId", agentId);
// Configure RTC settings
JSONObject rtc = new JSONObject();
rtc.put("RoomId", roomId);
rtc.put("UserId", aiUserId);
rtc.put("StreamId", aiUserId + "_main");
body.put("RTC", rtc);
// Configure message history
JSONObject messageHistory = new JSONObject();
messageHistory.put("SyncMode", 1);
messageHistory.put("Messages", new org.json.JSONArray());
messageHistory.put("WindowSize", 20);
body.put("MessageHistory", messageHistory);
// Configure advanced settings
JSONObject advancedConfig = new JSONObject();
advancedConfig.put("MaxIdleTime", 300);
advancedConfig.put("InterruptMode", 0);
body.put("AdvancedConfig", advancedConfig);
RequestBody requestBody = RequestBody.create(body.toString(), JSON);
Request request = new Request.Builder()
.url(url)
.post(requestBody)
.build();
httpClient.newCall(request).enqueue(new Callback() {
@Override
public void onFailure(Call call, IOException e) {
Log.e(TAG, "Instance creation failed: " + e.getMessage());
callback.onError(e);
}
@Override
public void onResponse(Call call, Response response) throws IOException {
if (response.isSuccessful()) {
Log.i(TAG, "AI instance created successfully, joined room " + roomId);
callback.onSuccess();
} else {
Log.e(TAG, "Instance creation failed: " + response.body().string());
callback.onError(new Exception("Creation failed: " + response.code()));
}
}
});
} catch (JSONException e) {
callback.onError(e);
}
}

Complete Speech Invocation Chain
Active AI speech triggering occurs through the game state machine:
public void triggerAISpeak(String aiInstanceId, GameState state) {
// 1. Build current context (including latest game state)
List<AIAgentMessage> contextMessages = buildContext(state, aiInstanceId);
// 2. Update AI context
updateAIContext(aiInstanceId, contextMessages, new SimpleCallback() {
@Override
public void onSuccess() {
// 3. Trigger AI speech (call LLM to generate response)
String timestamp = getTimestamp();
String signature = generateSignature();
String url = String.format(
"%s?Action=SendAgentInstanceLLM&AppId=%s&Timestamp=%s&Signature=%s",
API_BASE, appId, timestamp, signature
);
try {
JSONObject body = new JSONObject();
body.put("InstanceId", aiInstanceId);
body.put("Prompt", "It's your turn to speak, express your views based on current situation.");
body.put("AddToHistory", true);
RequestBody requestBody = RequestBody.create(body.toString(), JSON);
Request request = new Request.Builder()
.url(url)
.post(requestBody)
.build();
httpClient.newCall(request).enqueue(new Callback() {
@Override
public void onFailure(Call call, IOException e) {
Log.e(TAG, "AI speech trigger failed: " + e.getMessage());
}
@Override
public void onResponse(Call call, Response response) {
// AI Agent automatically completes:
// 1. LLM generates response → 2. TTS synthesizes voice → 3. RTC streams audio
// Android client only needs to pull AI audio stream to hear AI speech
Log.i(TAG, "AI speech triggered successfully");
}
});
} catch (JSONException e) {
Log.e(TAG, "AI speech trigger failed: " + e.getMessage());
}
}
@Override
public void onError(Exception e) {
Log.e(TAG, "AI context update failed: " + e.getMessage());
}
});
}

Human Player Speech Monitoring and State Updates
The GameVoiceManager class orchestrates voice interactions:
public class GameVoiceManager {
private ZegoExpressEngine engine;
private ZegoAIAgentManager aiAgentManager;
private GameState gameState;
private UIManager uiManager;
private static final String TAG = "GameVoiceManager";
// Callback fields assigned in registerAIAgentCallbacks() below
private java.util.function.Consumer<String> onAgentSpeakStart;
private java.util.function.Consumer<String> onAgentSpeakEnd;
private java.util.function.BiConsumer<String, String> onAgentSubtitle;
private java.util.function.BiConsumer<String, String> onPlayerASRResult;
public void init(Context context, long appId, String appSign) {
// Initialize ZEGO Express SDK
ZegoEngineProfile profile = new ZegoEngineProfile();
profile.appID = appId;
profile.appSign = appSign;
profile.scenario = ZegoScenario.GENERAL;
profile.application = context.getApplicationContext();
ZegoExpressEngine.createEngine(profile, new IZegoEventHandler() {
@Override
public void onRoomStateUpdate(String roomID, ZegoRoomState state,
int errorCode, JSONObject extendedData) {
Log.i(TAG, "Room state update: " + roomID + ", state: " + state);
}
});
engine = ZegoExpressEngine.getEngine();
registerAIAgentCallbacks();
}
private void registerAIAgentCallbacks() {
// AI starts speaking callback
onAgentSpeakStart = instanceId -> {
uiManager.showSpeakingIndicator(instanceId);
};
// AI ends speaking callback
onAgentSpeakEnd = instanceId -> {
uiManager.hideSpeakingIndicator(instanceId);
};
// Receive AI subtitles (for in-game chat box display)
onAgentSubtitle = (instanceId, text) -> {
gameState.addChatLog(instanceId, text);
uiManager.updateChatBox(instanceId, text);
};
// Human player speech recognition results
onPlayerASRResult = (playerId, text) -> {
gameState.addChatLog(playerId, text);
if (aiAgentManager != null) {
aiAgentManager.broadcastToAIs(gameState);
}
};
}
public void joinRoom(String roomId, String userId) {
ZegoUser user = new ZegoUser(userId);
ZegoRoomConfig config = new ZegoRoomConfig();
config.maxMemberCount = 16;
engine.loginRoom(roomId, user, config);
engine.startPublishingStream(userId + "_main");
// Pull all AI player voice streams
if (gameState != null && gameState.getAiPlayers() != null) {
for (AIPlayer aiPlayer : gameState.getAiPlayers()) {
engine.startPlayingStream(aiPlayer.getUserId() + "_main");
}
}
}
}

Extension Possibilities
Expanding from Goose Goose Duck to Werewolf primarily involves role configuration differences. Werewolf features richer character varieties including Seer, Witch, Hunter, and Guard roles, requiring enhanced SystemPrompt configurations that incorporate skill usage logic and special ability interactions.
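Extending the Duck prompt pattern shown earlier to a Werewolf Seer might look like the following. The wording is an illustrative starting point, not a tuned prompt, and should be refined through playtesting:

```java
// Illustrative Seer prompt following the same structure as the Duck
// example earlier in this guide: identity, skill logic, output rules.
public class WerewolfPrompts {
    public static String getSeerSystemPrompt(String playerName) {
        return String.format(
            "You are the Seer in Werewolf, your name is %s.\n\n" +
            "[Character Identity]\n" +
            "- You belong to the Villager faction; each night you may check one player's true identity\n" +
            "- Your check results are certain, but claiming Seer too early makes you a Werewolf target\n\n" +
            "[Skill Usage Logic]\n" +
            "- Night phase: check the most suspicious unverified player\n" +
            "- Day phase: decide whether to reveal yourself based on how endangered you are\n" +
            "- If you verified a Werewolf, steer suspicion toward them without exposing your role prematurely\n\n" +
            "[Output Requirements]\n" +
            "- Keep each statement to 2-3 sentences\n" +
            "- Use conversational expressions like real players\n" +
            "- Never reveal 'I am AI' or expose game mechanics",
            playerName
        );
    }
}
```

Roles with active skills (Witch, Hunter, Guard) follow the same template, with the [Skill Usage Logic] section carrying the role-specific decision rules.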
Multi-AI interaction becomes possible through creating multiple intelligent agent instances, enabling AI players to communicate naturally with each other through RTC voice channels. This creates dynamic group conversations that mirror human player interactions.
Multimodal upgrades can introduce digital human avatars for AI teammates. Integrating ZEGO Digital Human SDK enables visual AI teammates that significantly enhance immersion, combining voice interactions with expressive facial animations and body language.
Conclusion
Through ZEGO AI Agent integration, Android developers can rapidly incorporate AI teammate functionality into social voice games like Goose Goose Duck and Werewolf without modifying original game logic. The overlay architecture preserves existing codebases while enabling powerful new capabilities.
Core benefits include solving player count challenges, providing 24/7 practice partners, lowering barriers for newcomers, and enhancing overall game enjoyment. ZEGO provides comprehensive SDK documentation, example code, and technical support. Developers can access detailed integration guides through the official ZEGO website, opening new chapters in AI-enhanced social gaming experiences.
The future of social deduction games lies in seamless human-AI collaboration, where intelligent teammates enhance rather than replace human interaction. By following the implementation patterns described in this guide, developers can create more accessible, engaging, and resilient gaming experiences that adapt to player needs while maintaining the psychological depth that makes social deduction games compelling.