Building AI Teammates for Social Voice Games: A Practical Implementation Guide Using Goose Goose Duck
Introduction: Solving the Player Matching Challenge in Social Games
Social deduction games like Goose Goose Duck and Werewolf have captured millions of players worldwide, offering thrilling experiences of deception, deduction, and social interaction. However, these games face persistent challenges that limit player enjoyment: difficulty gathering enough participants, teammates going AFK during critical moments, and poor new player experiences due to skill gaps.
AI teammates present an elegant solution to these longstanding problems. By filling empty slots, providing practice partners, and lowering entry barriers, AI players enable high-quality gaming sessions anytime, anywhere. This comprehensive guide explores how to integrate AI teammates into Android-based social voice games using ZEGO AI Agent technology, with Goose Goose Duck serving as our primary example.
Technical Architecture: The Overlay Approach
Non-Invasive Integration Design
The AI Agent layer operates as an external overlay on top of existing game architecture, requiring no modifications to core game logic. This design philosophy ensures minimal disruption to existing codebases while enabling rapid AI integration.
┌─────────────────────────────────────────────────────────┐
│ Android Game Client │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────┐ │
│ │ Game Logic │ │ Game State │ │ UI Presentation │ │
│ │ (Original) │ │ Machine │ │ (Roles/Voice) │ │
│ └──────┬──────┘ └──────┬──────┘ └────────┬────────┘ │
│ │ │ │ │
│ ┌──────┴────────────────┴───────────────────┴──────┐ │
│ │ AI Agent Adaptation Layer (New) │ │
│ │ • State Sync Interface • Voice Data Forwarding │ │
│ │ • AI Command Parsing │ │
│ └─────────────────────┬─────────────────────────────┘ │
└────────────────────────┼──────────────────────────────────┘
│
┌──────────┴──────────┐
│ ZEGO AI Agent │
│ LLM + TTS + ASR │
└─────────────────────┘

This architecture separates concerns cleanly: the game handles gameplay mechanics while the AI Agent manages intelligent behavior, voice processing, and natural language interactions.
Core Module Breakdown
RTC Voice Module: Built on ZEGO Express SDK, this component enables real-time voice communication between human players and AI teammates. Critical features include:
- ANS (Active Noise Suppression): Filters background noise for crystal-clear communication
- AEC (Acoustic Echo Cancellation): Prevents audio feedback loops in multi-player scenarios
- VAD (Voice Activity Detection): Identifies when participants are speaking, enabling natural conversation flow
AI Agent Module: Manages the complete intelligent agent lifecycle:
- LLM (Large Language Model): Handles conversation understanding, reasoning decisions, and dialogue generation
- TTS (Text-to-Speech): Converts AI responses into natural-sounding voice output
- ASR (Automatic Speech Recognition): Transcribes player speech into text for LLM comprehension
- State Synchronization: Injects game state information (rounds, roles, voting status) into AI context
SDK Integration Approach:
- Android clients integrate the ZEGO Express SDK, which officially supports Android
- AI Agent communicates via server-side API calls
- Android communicates with business backend through HTTP/HTTPS
- Business backend handles agent registration, instance creation, and game state forwarding
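The client-to-backend leg of this flow is defined by each game's own backend, not by ZEGO, so the request shape below is purely an illustrative assumption: a minimal payload an Android client might POST when asking the business backend to spawn an AI teammate.

```java
// Hypothetical request body for asking the business backend to create an AI
// teammate. Field names are illustrative assumptions, not a fixed schema.
class AddAiTeammateRequest {
    private final String roomId;
    private final String agentId;
    private final String aiUserId;

    public AddAiTeammateRequest(String roomId, String agentId, String aiUserId) {
        this.roomId = roomId;
        this.agentId = agentId;
        this.aiUserId = aiUserId;
    }

    // Serialized by hand to keep the sketch dependency-free; a real client
    // would use Gson, Moshi, or org.json instead.
    public String toJson() {
        return String.format(
            "{\"roomId\":\"%s\",\"agentId\":\"%s\",\"aiUserId\":\"%s\"}",
            roomId, agentId, aiUserId);
    }
}
```

The backend would then call ZEGO's server APIs (agent registration, instance creation) on the client's behalf, keeping the ServerSecret off the device.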
AI Teammate Functionality Breakdown
Character Configuration Through System Prompts
AI teammate personas are precisely controlled through carefully crafted SystemPrompts. Here's a comprehensive example for a "Duck" (antagonist) role in Goose Goose Duck:
public static String getDuckSystemPrompt(String playerName) {
return String.format(
"You are a Duck faction player in Goose Goose Duck, your name is %s.\n\n" +
"[Character Identity]\n" +
"- You are on the villain team, goal is to eliminate all Goose faction players\n" +
"- You know other Ducks' identities (teammates), but don't expose them\n" +
"- You need to disguise as a Goose to gain trust\n\n" +
"[Personality Traits]\n" +
"- Cunning, cautious, skilled at disguise\n" +
"- Speech creates confusion, redirects suspicion to others\n" +
"- When questioned, calmly defends; when necessary, sacrifices teammates for self-preservation\n\n" +
"[Speech Strategy]\n" +
"- Opening phase: Observe primarily, occasionally agree with others\n" +
"- Mid-game phase: Lead discussions, direct suspicion toward Goose or neutral players\n" +
"- Late-game phase: If suspected, create 'I'm a white Goose' illusion\n\n" +
"[Output Requirements]\n" +
"- Keep each statement to 2-3 sentences\n" +
"- Use conversational expressions, speak like real players\n" +
"- Can say things like 'I think xx seems suspicious' 'I was doing task at xxx'\n" +
"- Never say 'I am AI' or expose game mechanics",
playerName
);
}

Configuration Key Points:
- Character Identity: Clearly defines faction, objectives, and information boundaries (AI doesn't know other players' true identities)
- Personality Traits: Determines AI behavioral style (aggressive/conservative/cunning/honest)
- Speech Strategy: Adjusts behavior patterns based on game phase (opening/mid-game/late-game)
- Output Requirements: Controls statement length and language style to simulate human players
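The same four-section template adapts directly to the crew side. A hypothetical Goose-faction counterpart (not taken from any official sample, just an illustration of the pattern) might look like:

```java
class GoosePrompts {
    // Hypothetical Goose (crew) counterpart to the Duck prompt above,
    // following the same four-section template.
    public static String getGooseSystemPrompt(String playerName) {
        return String.format(
            "You are a Goose faction player in Goose Goose Duck, your name is %s.\n\n" +
            "[Character Identity]\n" +
            "- You are on the crew team; your goal is to finish tasks and vote out all Ducks\n" +
            "- You do NOT know anyone else's true identity\n\n" +
            "[Personality Traits]\n" +
            "- Honest, observant, willing to share what you saw\n\n" +
            "[Speech Strategy]\n" +
            "- Report where you were and which tasks you completed\n" +
            "- Cross-check other players' alibis for contradictions\n\n" +
            "[Output Requirements]\n" +
            "- Keep each statement to 2-3 sentences, conversational tone\n" +
            "- Never say 'I am AI' or expose game mechanics",
            playerName);
    }
}
```

Note the key asymmetry: the Goose prompt withholds all identity information, while the Duck prompt grants knowledge of teammates, which is exactly the information boundary described above.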
Speech Logic and Trigger Mechanisms
Trigger Timing:
- Turn-Based Speaking: Game state machine detects when it's the AI player's turn, calls LLM interface proactively
- Free Discussion Phase: After player statements, AI judges whether to interject based on context (using interruption mechanisms)
- Emergency Events: AI speaks proactively when discovering bodies or triggering emergency tasks
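For the free discussion phase, it helps to gate the decision cheaply before ever calling the LLM. The pre-filter below is a hypothetical sketch (the keyword list and policy are our assumptions, not part of the AI Agent API): it only wakes the AI when it is directly addressed and accused.

```java
import java.util.Locale;

class InterjectionPolicy {
    // Hypothetical pre-filter for the free discussion phase: only wake the
    // LLM when the AI is named AND the utterance sounds like an accusation,
    // so the AI does not respond to every line of chatter.
    public static boolean shouldInterject(String aiPlayerName, String transcript) {
        String t = transcript.toLowerCase(Locale.ROOT);
        boolean mentioned = t.contains(aiPlayerName.toLowerCase(Locale.ROOT));
        boolean accusation = t.contains("suspicious") || t.contains("vote out")
                || t.contains("it was");
        return mentioned && accusation; // addressed and accused -> defend
    }
}
```

Anything that passes the filter is forwarded to the LLM with the usual context; everything else simply updates the AI's message history silently.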
Real-Time Context Injection:
public List<AIAgentMessage> buildContext(GameState state, String aiPlayerId) {
List<AIAgentMessage> messages = new ArrayList<>();
// 1. Inject system prompt (character setting)
messages.add(new AIAgentMessage("system", getDuckSystemPrompt(aiPlayerId)));
// 2. Inject game state (current round, alive players, etc.)
String alivePlayers = String.join(",", state.getAlivePlayers());
messages.add(new AIAgentMessage("user",
String.format("[Game State] Current Round: %d, Alive Players: %s, Events After Your Last Statement: %s",
state.getRound(), alivePlayers, state.getLastEvents())));
// 3. Inject historical speech records (recent N messages)
for (ChatMessage chat : state.getRecentChats()) {
String role = chat.getPlayerId().equals(aiPlayerId) ? "assistant" : "user";
messages.add(new AIAgentMessage(role,
String.format("%s: %s", chat.getPlayerName(), chat.getContent())));
}
return messages;
}

Speech Control Strategies:
- Statement Length: Prompts require AI to limit statements to 2-3 sentences, avoiding lengthy monologues
- Speech Timing: Uses AI Agent's VADSilenceSegmentation parameter, setting 500ms silence threshold to ensure AI speaks after players finish
- Interruption Handling: Enables voice interruption, AI immediately stops speaking when players urgently interject
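The 2-3 sentence limit is enforced only by the prompt, and LLMs occasionally ignore it. A defensive post-processing guard (an assumption on our part, not part of the AI Agent API) can truncate overlong replies before they reach TTS:

```java
class SpeechGuard {
    // Defensive post-processing: even with the prompt's 2-3 sentence rule,
    // LLMs sometimes ramble, so cap the reply at maxSentences before TTS.
    public static String limitSentences(String reply, int maxSentences) {
        // Split on sentence-final punctuation while keeping the delimiter.
        String[] parts = reply.split("(?<=[.!?])\\s+");
        if (parts.length <= maxSentences) return reply;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < maxSentences; i++) {
            if (i > 0) sb.append(' ');
            sb.append(parts[i]);
        }
        return sb.toString();
    }
}
```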
Listening and Understanding: Processing Human Speech
Voice-to-Text Flow:
// On Android, listen to remote RTC audio streams
public void onRemoteAudioFrame(String streamId, AudioFrame frame) {
    // Forward voice data to the AI Agent for ASR recognition;
    // the AI Agent automatically pushes recognized text to the LLM
}

// Receive ASR results from the AI Agent (via callback)
public void onASRResult(String playerId, String text) {
    // Update the player's speech content in the game state
    gameState.addChatLog(playerId, text);
    // Notify all AI players to update their context
    for (AIPlayer ai : aiPlayers) {
        updateAIContext(ai.getInstanceId(), gameState);
    }
}

Decision Update Flow:
- Player voice → RTC transmission → AI Agent ASR recognition → Text content
- Text content injected into AI's MessageHistory context
- LLM re-analyzes situation based on new information, updates internal reasoning state
- When AI needs to speak, LLM generates responses based on latest situation
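Step 2's context injection pairs naturally with a sliding window, mirroring the WindowSize setting used later when creating agent instances. A minimal sketch of trimming the history on the game side before rebuilding the AI's context:

```java
import java.util.List;

class ContextWindow {
    // Client-side mirror of the WindowSize idea: keep only the most recent
    // N chat entries when rebuilding the AI's context so the LLM request
    // stays within budget. Generic over the game's message type.
    public static <T> List<T> keepRecent(List<T> history, int windowSize) {
        int from = Math.max(0, history.size() - windowSize);
        return history.subList(from, history.size());
    }
}
```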
Voting and Action Logic
Voting Based on Situation Analysis:
AI voting isn't random—it's based on LLM reasoning:
public void requestAIVote(String aiInstanceId, GameState state, AIVoteCallback callback) {
// Build voting request prompt
String alivePlayers = String.join(",", state.getAlivePlayers());
String votePrompt = String.format(
"Now entering voting phase, you need to vote to eliminate a player.\n" +
"Alive Players: %s\n" +
"Previous Round Speech Summary: %s\n" +
"Your Suspect: Analyze who seems most suspicious based on speeches\n\n" +
"Please select one player from below for voting, return only player ID: %s",
alivePlayers, state.getChatSummary(), alivePlayers
);
// Call AI Agent's proactive LLM interface
aiAgentClient.sendLLMRequest(aiInstanceId, votePrompt, new AIAgentCallback() {
@Override
public void onSuccess(String response) {
// Parse returned player ID
String votedPlayerId = parseVoteResponse(response);
callback.onResult(new VoteResult(aiInstanceId, votedPlayerId));
}
@Override
public void onError(Exception e) {
callback.onError(e);
}
});
}

State Machine Flow:

Daytime Speech → Voting Phase → Nighttime Actions → Daytime Speech
      ↓              ↓                 ↓
 AI Listening     AI Voting        AI Skills
Player Speech     Based on        (Eliminate/
Update Context    Reasoning      Investigate etc.)
                                  Select Target

Development Workflow and Implementation Code
Preparation: ZEGO Console Configuration
Step 1: Create Project and Obtain AppID
- Log into ZEGO Console, click "Create Project"
- Select "Real-Time Interactive AI Agent" service, record generated AppID and AppSign
Step 2: Enable AI Agent Service
- In project management page, find "Real-Time Interactive AI Agent" module
- Click "Enable Now", complete service activation (new users get free trial)
Step 3: Obtain ServerSecret
- Navigate to "Project Configuration" → "Key Management"
- Copy ServerSecret for server-side API call signature generation
Step 4: Configure LLM and TTS (Optional)
- In "AI Agent Configuration" page, configure default LLM and TTS parameters
- Supports multiple vendors: Volcano Engine, MiniMax, Alibaba Cloud, etc.
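One piece of plumbing worth settling up front: the manager class in the next section calls a generateSignature() helper without defining it. ZEGO's server-side APIs are commonly documented as signing requests with md5(AppId + SignatureNonce + ServerSecret + Timestamp), hex-encoded lowercase. Treat the sketch below as a starting point and verify the exact query parameters (including SignatureNonce, SignatureMethod, and SignatureVersion) against the current API reference.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

class ZegoSignature {
    // Sketch of the generateSignature() helper used by the manager class
    // below: md5(AppId + SignatureNonce + ServerSecret + Timestamp), as
    // commonly documented for ZEGO server APIs. Verify before shipping.
    public static String generate(String appId, String signatureNonce,
                                  String serverSecret, long timestamp) {
        try {
            String raw = appId + signatureNonce + serverSecret + timestamp;
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(raw.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is always available
        }
    }

    // Random 8-byte nonce, hex-encoded, sent alongside the signature.
    public static String newNonce() {
        byte[] bytes = new byte[8];
        new SecureRandom().nextBytes(bytes);
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) hex.append(String.format("%02x", b));
        return hex.toString();
    }
}
```

Because this computation needs the ServerSecret, it belongs on the business backend, never in the Android client.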
AI Agent Initialization and Registration
public class ZegoAIAgentManager {
private static final String API_BASE = "https://aigc-aiagent-api.zegotech.cn";
private static final String TAG = "ZegoAIAgentManager";
private static final MediaType JSON = MediaType.get("application/json; charset=utf-8");
private final String appId;
private final String serverSecret;
private final OkHttpClient httpClient;
public ZegoAIAgentManager(String appId, String serverSecret) {
this.appId = appId;
this.serverSecret = serverSecret;
this.httpClient = new OkHttpClient.Builder()
.connectTimeout(10, TimeUnit.SECONDS)
.readTimeout(10, TimeUnit.SECONDS)
.build();
}
// Register AI agent (typically completed on server-side, Android calls business backend)
public void registerAgent(String agentId, String agentName, String systemPrompt,
AgentCallback callback) {
String timestamp = getTimestamp();
String signature = generateSignature();
String url = String.format(
"%s?Action=RegisterAgent&AppId=%s&Timestamp=%s&Signature=%s",
API_BASE, appId, timestamp, signature
);
try {
JSONObject body = new JSONObject();
body.put("Name", agentName);
// Configure LLM
JSONObject llm = new JSONObject();
llm.put("Url", "https://ark.cn-beijing.volces.com/api/v3/chat/completions");
llm.put("ApiKey", "your_api_key");
llm.put("Model", "doubao-1-5-pro-32k-250115");
llm.put("SystemPrompt", systemPrompt);
llm.put("Temperature", 0.7);
llm.put("TopP", 0.9);
body.put("LLM", llm);
// Configure TTS
JSONObject tts = new JSONObject();
tts.put("Vendor", "ByteDance");
JSONObject ttsParams = new JSONObject();
JSONObject ttsApp = new JSONObject();
ttsApp.put("appid", "your_tts_appid");
ttsApp.put("token", "your_tts_token");
ttsApp.put("cluster", "volcano_tts");
ttsParams.put("app", ttsApp);
JSONObject ttsAudio = new JSONObject();
ttsAudio.put("voice_type", "zh_female_wanwanxiaohe_moon_bigtts");
ttsAudio.put("speed_ratio", 1.0);
ttsParams.put("audio", ttsAudio);
tts.put("Params", ttsParams);
body.put("TTS", tts);
// Configure ASR
JSONObject asr = new JSONObject();
asr.put("VADSilenceSegmentation", 500); // 500ms silence segmentation
asr.put("VADMinSpeechDuration", 100); // Minimum 100ms counts as valid speech
body.put("ASR", asr);
RequestBody requestBody = RequestBody.create(body.toString(), JSON);
Request request = new Request.Builder()
.url(url)
.post(requestBody)
.build();
httpClient.newCall(request).enqueue(new Callback() {
@Override
public void onFailure(Call call, IOException e) {
Log.e(TAG, "Registration failed: " + e.getMessage());
callback.onError(e);
}
@Override
public void onResponse(Call call, Response response) throws IOException {
if (response.isSuccessful()) {
Log.i(TAG, "Agent " + agentId + " registered successfully");
callback.onSuccess(agentId);
} else {
Log.e(TAG, "Registration failed: " + response.body().string());
callback.onError(new Exception("Registration failed: " + response.code()));
}
}
});
} catch (JSONException e) {
callback.onError(e);
}
}
}

Creating AI Teammate Instances
public void createAIAgentInstance(String agentId, String roomId, String aiUserId,
CreateInstanceCallback callback) {
String timestamp = getTimestamp();
String signature = generateSignature();
String url = String.format(
"%s?Action=CreateAgentInstance&AppId=%s&Timestamp=%s&Signature=%s",
API_BASE, appId, timestamp, signature
);
try {
JSONObject body = new JSONObject();
body.put("AgentId", agentId);
// Configure RTC
JSONObject rtc = new JSONObject();
rtc.put("RoomId", roomId);
rtc.put("UserId", aiUserId); // AI player's UserId
rtc.put("StreamId", aiUserId + "_main"); // AI's stream ID
body.put("RTC", rtc);
// Configure message history
JSONObject messageHistory = new JSONObject();
messageHistory.put("SyncMode", 1); // Use MessageHistory mode
messageHistory.put("Messages", new org.json.JSONArray()); // Initial empty context
messageHistory.put("WindowSize", 20); // Use recent 20 messages for each LLM call
body.put("MessageHistory", messageHistory);
// Advanced configuration
JSONObject advancedConfig = new JSONObject();
advancedConfig.put("MaxIdleTime", 300); // Auto-destroy after 300 seconds of inactivity
advancedConfig.put("InterruptMode", 0); // Enable voice interruption
body.put("AdvancedConfig", advancedConfig);
RequestBody requestBody = RequestBody.create(body.toString(), JSON);
Request request = new Request.Builder()
.url(url)
.post(requestBody)
.build();
httpClient.newCall(request).enqueue(new Callback() {
@Override
public void onFailure(Call call, IOException e) {
Log.e(TAG, "Create instance failed: " + e.getMessage());
callback.onError(e);
}
@Override
public void onResponse(Call call, Response response) throws IOException {
if (response.isSuccessful()) {
Log.i(TAG, "AI instance created successfully, joined room " + roomId);
callback.onSuccess();
} else {
Log.e(TAG, "Create instance failed: " + response.body().string());
callback.onError(new Exception("Create failed: " + response.code()));
}
}
});
} catch (JSONException e) {
callback.onError(e);
}
}

Complete Speech Invocation Chain
public void triggerAISpeak(String aiInstanceId, GameState state) {
// 1. Build current context (including latest game state)
List<AIAgentMessage> contextMessages = buildContext(state, aiInstanceId);
// 2. Update AI context
updateAIContext(aiInstanceId, contextMessages, new SimpleCallback() {
@Override
public void onSuccess() {
// 3. Trigger AI speech (call LLM to generate response)
String timestamp = getTimestamp();
String signature = generateSignature();
String url = String.format(
"%s?Action=SendAgentInstanceLLM&AppId=%s&Timestamp=%s&Signature=%s",
API_BASE, appId, timestamp, signature
);
try {
JSONObject body = new JSONObject();
body.put("InstanceId", aiInstanceId);
body.put("Prompt", "Now it's your turn to speak, please share your thoughts based on current situation.");
body.put("AddToHistory", true); // Add response to context history
RequestBody requestBody = RequestBody.create(body.toString(), JSON);
Request request = new Request.Builder()
.url(url)
.post(requestBody)
.build();
httpClient.newCall(request).enqueue(new Callback() {
@Override
public void onFailure(Call call, IOException e) {
Log.e(TAG, "Trigger AI speech failed: " + e.getMessage());
}
@Override
public void onResponse(Call call, Response response) {
// AI Agent automatically completes:
// 1. LLM generates response → 2. TTS synthesizes voice → 3. Push stream via RTC for playback
// Android only needs to pull AI's audio stream to hear AI speech
Log.i(TAG, "AI speech triggered successfully");
}
});
} catch (JSONException e) {
Log.e(TAG, "Trigger AI speech failed: " + e.getMessage());
}
}
@Override
public void onError(Exception e) {
Log.e(TAG, "Update AI context failed: " + e.getMessage());
}
});
}

Human Player Speech Monitoring and State Updates
public class GameVoiceManager {
private ZegoExpressEngine engine;
private ZegoAIAgentManager aiAgentManager;
private GameState gameState;
private UIManager uiManager;
private static final String TAG = "GameVoiceManager";
public void init(Context context, long appId, String appSign) {
// Initialize ZEGO Express SDK
ZegoEngineProfile profile = new ZegoEngineProfile();
profile.appID = appId;
profile.appSign = appSign;
profile.scenario = ZegoScenario.GENERAL;
profile.application = context.getApplicationContext();
ZegoExpressEngine.createEngine(profile, new IZegoEventHandler() {
@Override
public void onRoomStateUpdate(String roomID, ZegoRoomState state, int errorCode, JSONObject extendedData) {
Log.i(TAG, "Room state update: " + roomID + ", state: " + state);
}
});
engine = ZegoExpressEngine.getEngine();
// Register AI Agent related callbacks
registerAIAgentCallbacks();
}
// Register AI Agent callbacks
private void registerAIAgentCallbacks() {
// AI starts speaking callback (via business backend callback or RTC SEI message)
onAgentSpeakStart = instanceId -> {
// Update UI: show AI is speaking
uiManager.showSpeakingIndicator(instanceId);
};
// AI ends speaking callback
onAgentSpeakEnd = instanceId -> {
// Update UI: hide speaking indicator
uiManager.hideSpeakingIndicator(instanceId);
};
// Receive AI subtitles (for in-game chat box display)
onAgentSubtitle = (instanceId, text) -> {
// Add AI speech content to game chat records
gameState.addChatLog(instanceId, text);
uiManager.updateChatBox(instanceId, text);
};
// Human player speech recognition results (forwarded via business backend)
onPlayerASRResult = (playerId, text) -> {
// Update game state
gameState.addChatLog(playerId, text);
// Notify all AI players to update context
if (aiAgentManager != null) {
aiAgentManager.broadcastToAIs(gameState);
}
};
}
// Player joins room and pulls AI voice streams
public void joinRoom(String roomId, String userId) {
ZegoUser user = new ZegoUser(userId);
ZegoRoomConfig config = new ZegoRoomConfig();
config.maxMemberCount = 16;
engine.loginRoom(roomId, user, config);
engine.startPublishingStream(userId + "_main");
// Pull all AI player voice streams
if (gameState != null && gameState.getAiPlayers() != null) {
for (AIPlayer aiPlayer : gameState.getAiPlayers()) {
engine.startPlayingStream(aiPlayer.getUserId() + "_main");
}
}
}
}

Extension Possibilities
Expanding from Goose Goose Duck to Werewolf primarily involves character configuration differences:
- Richer Character Roles: Werewolf features more diverse roles (Seer, Witch, Hunter, etc.), requiring SystemPrompt additions for skill usage logic
- Multi-AI Interaction: Creating multiple agent instances enables AI players to naturally communicate with each other through RTC voice
- Multimodal Upgrades: Adding digital human avatars to AI through ZEGO Digital Human SDK creates visual AI teammates, further enhancing immersion
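For the first point, the SystemPrompt template from earlier extends naturally to skill roles. A hypothetical Seer prompt (illustrative only) adds a [Skill Usage] section on top of the familiar structure:

```java
class WerewolfPrompts {
    // Hypothetical extension of the Duck/Goose prompt template to a Werewolf
    // skill role; the [Skill Usage] section is the main addition.
    public static String getSeerSystemPrompt(String playerName) {
        return String.format(
            "You are the Seer in Werewolf, your name is %s.\n\n" +
            "[Character Identity]\n" +
            "- You are on the village team; each night you may inspect one player's faction\n\n" +
            "[Skill Usage]\n" +
            "- Prioritize inspecting vocal or suspicious players\n" +
            "- Weigh when to reveal your role: revealing early guides votes " +
            "but makes you a werewolf target\n\n" +
            "[Output Requirements]\n" +
            "- Keep each statement to 2-3 sentences, conversational tone\n" +
            "- Never say 'I am AI' or expose game mechanics",
            playerName);
    }
}
```

Night actions such as the inspection itself can then reuse the requestAIVote pattern shown earlier, with a prompt asking the LLM to return a single target player ID.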
Conclusion: Transforming Social Gaming with AI
Through ZEGO AI Agent, Android developers can rapidly integrate AI teammates into Goose Goose Duck, Werewolf, and similar social voice games without modifying original game logic.
Core Benefits Include:
- Solving Player Matching Challenges: AI fills empty slots, ensuring games can start anytime
- 24/7 Practice Partners: Players can practice and improve skills regardless of time zones or player availability
- Lowering New Player Barriers: AI teammates provide patient guidance for newcomers learning game mechanics
- Enhanced Game Fun: AI players add unpredictability and variety to each gaming session
ZEGO provides complete SDK documentation, example code, and technical support. Developers can visit the official website for detailed integration guides, opening a new chapter in AI-enhanced social gaming experiences.
The future of social gaming isn't human versus AI—it's humans enhanced by AI, creating richer, more accessible, and more engaging experiences for players worldwide.