Supported Engines
1. Baidu Intelligent Cloud
- Engine Code: baidu
- Official Website: https://cloud.baidu.com/
- Documentation: https://cloud.baidu.com/doc/SPEECH/index.html
2. Xunfei Open Platform
- Engine Code: xunfei
- Official Website: https://www.xfyun.cn/
- Documentation: https://www.xfyun.cn/doc/
3. Volcano Engine
- Engine Code: volcano
- Official Website: https://www.volcengine.com/
- Documentation: https://www.volcengine.com/docs/6561/79817
4. Alibaba Cloud
- Engine Code: aliyun
- Official Website: https://www.aliyun.com/
- Documentation: https://help.aliyun.com/product/30413.html
Required Configuration
Baidu Intelligent Cloud
ASR (Speech-to-Text)
The following environment variables need to be configured:
- SUPERUN_BAIDU_API_KEY: API Key
- SUPERUN_BAIDU_SECRET_KEY: Secret Key
TTS (Text-to-Speech)
The following environment variables need to be configured:
- SUPERUN_BAIDU_API_KEY: API Key
- SUPERUN_BAIDU_SECRET_KEY: Secret Key
Voice codes:
- 0: Du Xiaoyu (Female)
- 1: Du Xiaomei (Male)
- 3: Du Xiaoyao (Female)
- 4: Du Yaya (Male)
Xunfei Open Platform
ASR (Speech-to-Text)
The following environment variables need to be configured:
- SUPERUN_XUNFEI_APP_ID: App ID
- SUPERUN_XUNFEI_API_KEY: API Key
- SUPERUN_XUNFEI_API_SECRET: API Secret
TTS (Text-to-Speech)
The following environment variables need to be configured:
- SUPERUN_XUNFEI_APP_ID: App ID
- SUPERUN_XUNFEI_API_KEY: API Key
- SUPERUN_XUNFEI_API_SECRET: API Secret
Voice codes:
- xiaoyan: Xunfei Xiaoyan (Female)
- xiaoyu: Xunfei Xiaoyu (Male)
- xiaomei: Xunfei Xiaomei (Female)
- xiaoqi: Xunfei Xiaoqi (Male)
Volcano Engine
ASR (Speech-to-Text)
The following environment variables need to be configured:
- SUPERUN_VOLCANO_APP_ID: App ID
- SUPERUN_VOLCANO_ACCESS_TOKEN: Access Token
- SUPERUN_VOLCANO_SECRET_KEY: Secret Key (for WebSocket authentication)
- SUPERUN_VOLCANO_ASR_CLUSTER: ASR Cluster (optional, default: volcengine_input_common)
TTS (Text-to-Speech)
The following environment variables need to be configured:
- SUPERUN_VOLCANO_APP_ID: App ID
- SUPERUN_VOLCANO_ACCESS_TOKEN: Access Token
Voice codes:
- BV700_V2_streaming: Fresh Female Voice
- BV001_V2_streaming: General Male Voice
- BV705_streaming: Sweet Female Voice
- BV701_V2_streaming: Rich Male Voice
Alibaba Cloud
ASR (Speech-to-Text)
The following environment variables need to be configured:
- SUPERUN_ALIYUN_ACCESS_KEY_ID: Access Key ID
- SUPERUN_ALIYUN_ACCESS_KEY_SECRET: Access Key Secret
- SUPERUN_ALIYUN_APP_KEY: App Key
TTS (Text-to-Speech)
The following environment variables need to be configured:
- SUPERUN_ALIYUN_ACCESS_KEY_ID: Access Key ID
- SUPERUN_ALIYUN_ACCESS_KEY_SECRET: Access Key Secret
- SUPERUN_ALIYUN_APP_KEY: App Key
Voice codes:
- aixia: Aixia (Female)
- aiwei: Aiwei (Male)
- aida: Aida (Female)
- kenny: Kenny (Male)
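The required ASR variables listed above can be collected into one lookup table so that a missing credential is reported before any engine is called. The following is a minimal sketch, not the project's actual code: the names REQUIRED_ASR_ENV and missingEnvVars are hypothetical, and the environment reader is passed in as a function so the snippet does not depend on a particular runtime.

```typescript
// Engine codes as listed under "Supported Engines".
type EngineCode = "baidu" | "xunfei" | "volcano" | "aliyun";

// Required ASR variables per engine, copied from the lists above.
// SUPERUN_VOLCANO_ASR_CLUSTER is optional, so it is not listed here.
const REQUIRED_ASR_ENV: Record<EngineCode, string[]> = {
  baidu: ["SUPERUN_BAIDU_API_KEY", "SUPERUN_BAIDU_SECRET_KEY"],
  xunfei: [
    "SUPERUN_XUNFEI_APP_ID",
    "SUPERUN_XUNFEI_API_KEY",
    "SUPERUN_XUNFEI_API_SECRET",
  ],
  volcano: [
    "SUPERUN_VOLCANO_APP_ID",
    "SUPERUN_VOLCANO_ACCESS_TOKEN",
    "SUPERUN_VOLCANO_SECRET_KEY",
  ],
  aliyun: [
    "SUPERUN_ALIYUN_ACCESS_KEY_ID",
    "SUPERUN_ALIYUN_ACCESS_KEY_SECRET",
    "SUPERUN_ALIYUN_APP_KEY",
  ],
};

// Hypothetical helper: returns the names of required variables that are
// unset or empty for the given engine.
function missingEnvVars(
  engine: EngineCode,
  getEnv: (name: string) => string | undefined,
): string[] {
  return REQUIRED_ASR_ENV[engine].filter((name) => !getEnv(name));
}
```

Inside a Supabase Edge Function the reader would typically be `(name) => Deno.env.get(name)`; in a unit test it can be a plain object lookup.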
Configuration Method
Supabase Edge Functions (Production Environment)
Configure the environment variables in the Supabase project (Edge Function Secrets).
Code Implementation Architecture
Frontend Components
ASR Module (Speech-to-Text)
TTS Module (Text-to-Speech)
Engine Selector
Backend Implementation (Supabase Edge Functions)
ASR Conversion Service
File Location: supabase/functions/asr-convert/index.ts
Core Logic:
- Select the corresponding engine implementation based on the engine parameter
- Read the corresponding API credentials from environment variables
- Call the selected engine’s ASR API
- Return standardized recognition results
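The selection step in the list above can be sketched as a small dispatch table. This is an illustration under assumptions rather than the contents of asr-convert/index.ts: the AsrHandler signature, the EnvReader type, and selectAsrHandler are hypothetical names, and the per-engine handlers are left to the caller.

```typescript
type AsrEngine = "baidu" | "xunfei" | "volcano" | "aliyun";

// Standardized result shape described in this document.
interface AsrResult {
  text: string;
  confidence: number;
  duration: number;
}

// Hypothetical types: a function that reads credentials, and a per-engine
// handler that receives Base64 audio and returns a standardized result.
type EnvReader = (name: string) => string | undefined;
type AsrHandler = (audioBase64: string, env: EnvReader) => Promise<AsrResult>;

// Pick the handler for the requested engine, rejecting unknown engine codes
// instead of silently falling back to a default.
function selectAsrHandler(
  engine: string,
  handlers: Record<AsrEngine, AsrHandler>,
): AsrHandler {
  if (!Object.prototype.hasOwnProperty.call(handlers, engine)) {
    throw new Error(`Unsupported ASR engine: ${engine}`);
  }
  return handlers[engine as AsrEngine];
}
```

Rejecting unknown engine codes early keeps a typo in the engine parameter from turning into a confusing credential error later in the request.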
TTS Conversion Service
File Location: supabase/functions/tts-convert/index.ts
Core Logic:
- Select the corresponding engine implementation based on the engine parameter
- Read the corresponding API credentials from environment variables
- Map the voice parameter to the selected engine’s voice code
- Call the selected engine’s TTS API
- Return base64 encoded audio data
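The voice-mapping step can be sketched as a nested lookup table. The voice codes below are taken from the engine lists in this document, but which unified ID maps to which code is an assumption made for illustration, as are the names VOICE_CODES and resolveVoiceCode.

```typescript
type TtsEngine = "baidu" | "xunfei" | "volcano" | "aliyun";

// Assumed pairing of unified voice IDs with engine voice codes.
// The real mapping used by the backend may differ.
const VOICE_CODES: Record<TtsEngine, Record<string, string>> = {
  baidu: { female_1: "0", male_1: "1" },
  xunfei: { female_1: "xiaoyan", male_1: "xiaoyu" },
  volcano: { female_1: "BV700_V2_streaming", male_1: "BV001_V2_streaming" },
  aliyun: { female_1: "aixia", male_1: "aiwei" },
};

// Hypothetical helper: translate a unified voice ID into the engine's code,
// failing loudly when no mapping exists.
function resolveVoiceCode(engine: TtsEngine, voiceId: string): string {
  const table = VOICE_CODES[engine];
  if (!Object.prototype.hasOwnProperty.call(table, voiceId)) {
    throw new Error(`No ${engine} voice mapped for "${voiceId}"`);
  }
  return table[voiceId];
}
```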
Baidu ASR Common Errors and Solutions
Error Code 3311: param rate invalid
This is the most common error, usually caused by one of the following:
| Issue | Solution |
|---|---|
| Token placement error | Put the token in the request body, not in the URL parameters |
| cuid duplication | Send cuid only in the request body; do not repeat it in the URL |
| Using dev_pid | Do not send the dev_pid parameter; let Baidu auto-detect the language |
| rate type error | Ensure rate is a number, not a string |
| len calculation error | len must be the actual byte length of the WAV file |
Correct len Parameter Calculation
Calculate the actual byte length from the Base64 string (the size of the decoded audio, not the length of the string itself).
Frontend Audio Processing Points
1. Recording Format
Browsers usually record in webm/opus.
2. Must Resample to 16kHz (Baidu Requirement)
3. Convert to 16bit PCM
4. Add WAV Header (44 bytes)
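Steps 2 to 4 can be sketched as two small functions. This is a minimal illustration, assuming the recording has already been decoded to mono Float32 samples (in a browser, for example, with AudioContext.decodeAudioData). The function names are hypothetical, and the resampler uses plain linear interpolation with no low-pass filter, which is simple but can introduce aliasing.

```typescript
// Step 2: resample mono samples to 16 kHz using linear interpolation.
function resampleTo16k(input: Float32Array, inputRate: number): Float32Array {
  const targetRate = 16000;
  if (inputRate === targetRate) return input;
  const ratio = inputRate / targetRate;
  const outLength = Math.floor(input.length / ratio);
  const output = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio;
    const left = Math.min(Math.floor(pos), input.length - 1);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    output[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return output;
}

// Steps 3 and 4: convert samples in [-1, 1] to 16-bit PCM and prepend the
// 44-byte RIFF/WAVE header (PCM, mono, little-endian).
function encodeWav16(samples: Float32Array, sampleRate: number = 16000): Uint8Array {
  const dataBytes = samples.length * 2;
  const buffer = new ArrayBuffer(44 + dataBytes);
  const view = new DataView(buffer);
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };
  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataBytes, true); // file size minus 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size
  view.setUint16(20, 1, true); // audio format: PCM
  view.setUint16(22, 1, true); // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true); // byte rate
  view.setUint16(32, 2, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataBytes, true);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, Math.round(s < 0 ? s * 0x8000 : s * 0x7fff), true);
  }
  return new Uint8Array(buffer);
}
```

The length of the returned Uint8Array is the byte length of the WAV file, which is the value the len parameter should carry.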
Environment Variable Configuration
Configure the variables in Supabase Edge Function Secrets.
Debugging Checklist
When encountering a 3311 error, check in order:
- ✅ Is the token in the request body (not a URL parameter)?
- ✅ Is rate a number (typeof rate === 'number')?
- ✅ Is len equal to the actual size of the WAV file?
- ✅ Has the dev_pid parameter been removed?
- ✅ Is the sample rate in the WAV header 16000?
- ✅ Is the audio duration within the 0.5-60 second range?
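The checklist above can be automated before the request is sent. The sketch below is illustrative: the BaiduAsrBody interface and find3311Causes are hypothetical names, the body fields follow the parameters mentioned in this section, and base64ByteLength assumes standard padded Base64 with no whitespace.

```typescript
// Assumed request-body shape, limited to the fields discussed above.
interface BaiduAsrBody {
  token: string;
  rate: unknown; // typed loosely so a string value can be detected
  len: number;
  speech: string; // Base64-encoded WAV
  dev_pid?: unknown;
}

// Decoded byte length of a padded Base64 string, without decoding it.
function base64ByteLength(b64: string): number {
  const padding = b64.endsWith("==") ? 2 : b64.endsWith("=") ? 1 : 0;
  return (b64.length * 3) / 4 - padding;
}

// Returns one message per failed checklist item; an empty array means
// every check passed.
function find3311Causes(body: Partial<BaiduAsrBody>, wav: Uint8Array): string[] {
  const problems: string[] = [];
  if (!body.token) problems.push("token is missing from the request body");
  if (typeof body.rate !== "number") problems.push("rate is not a number");
  if (body.len !== wav.length) problems.push("len differs from the WAV size");
  if (body.speech !== undefined && base64ByteLength(body.speech) !== wav.length) {
    problems.push("speech does not decode to the WAV size");
  }
  if ("dev_pid" in body) problems.push("dev_pid is present");
  if (wav.length < 44) {
    problems.push("WAV is shorter than its 44-byte header");
    return problems;
  }
  const view = new DataView(wav.buffer, wav.byteOffset, wav.byteLength);
  if (view.getUint32(24, true) !== 16000) {
    problems.push("WAV header sample rate is not 16000");
  }
  // Duration = data bytes / byte rate (header offset 28).
  const seconds = (wav.length - 44) / view.getUint32(28, true);
  if (!(seconds >= 0.5 && seconds <= 60)) {
    problems.push("audio duration is outside 0.5-60 seconds");
  }
  return problems;
}
```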
Complete Request Example
Correct ✓:
Technical Points
ASR (Speech-to-Text)
- Unified Audio Format: All engines use WAV format, 16kHz sample rate, mono channel
- Base64 Encoding: Audio data converted to base64 in frontend before passing to backend
- Protocol Differences:
  - Baidu, Alibaba Cloud: REST API
  - Xunfei, Volcano Engine: WebSocket protocol
- Standardized Results: Unified return format { text, confidence, duration }
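As an illustration of the standardized result, the sketch below normalizes a Baidu-style response into { text, confidence, duration }. It rests on several assumptions: the BaiduAsrResponse shape covers only the fields used here, the duration is supplied by the caller in seconds, and the confidence value is a fixed placeholder because this sketch has no per-utterance score to read. The function name is hypothetical.

```typescript
interface AsrResult {
  text: string;
  confidence: number;
  duration: number;
}

// Assumed subset of the engine response used by this sketch.
interface BaiduAsrResponse {
  err_no: number;
  err_msg: string;
  result?: string[];
}

// Convert an engine-specific response into the unified format, turning
// engine error codes into exceptions.
function normalizeBaiduResult(
  res: BaiduAsrResponse,
  durationSeconds: number,
): AsrResult {
  if (res.err_no !== 0) {
    throw new Error(`Baidu ASR error ${res.err_no}: ${res.err_msg}`);
  }
  return {
    text: (res.result ?? []).join(""),
    confidence: 1, // placeholder: no confidence score is read in this sketch
    duration: durationSeconds,
  };
}
```

Each engine would get its own normalizer of this kind, so the frontend only ever sees one result shape.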
TTS (Text-to-Speech)
- Voice Mapping: Frontend uses unified voice IDs (female_1, male_1, etc.); the backend maps them to each engine’s actual voice codes
- Parameter Conversion:
  - Speed: Frontend range 0.5-2.0x, converted to each engine’s own range
  - Volume: Frontend range 0-100%, converted to each engine’s own range
- Output Format: All engines uniformly return MP3 format base64 encoded audio
- Protocol Differences:
  - Baidu, Alibaba Cloud, Volcano Engine: REST API
  - Xunfei: WebSocket protocol (needs to receive multiple audio chunks)
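The parameter conversion described above can be sketched as a range mapping. This is one possible design, not necessarily the one the backend uses: speed is mapped piecewise so that 1.0x always lands on the engine's normal value, and volume is mapped linearly. The EngineRange type and both function names are hypothetical, and the target ranges must be taken from each engine's own documentation.

```typescript
// Target range of an engine parameter; `normal` is the engine's default.
interface EngineRange {
  min: number;
  normal: number;
  max: number;
}

// Linear interpolation of value from [inMin, inMax] to [outMin, outMax].
function lerp(
  value: number,
  inMin: number,
  inMax: number,
  outMin: number,
  outMax: number,
): number {
  return outMin + ((value - inMin) / (inMax - inMin)) * (outMax - outMin);
}

// Map frontend speed (0.5x to 2.0x) so that 1.0x hits target.normal.
function convertSpeed(speed: number, target: EngineRange): number {
  const s = Math.min(2.0, Math.max(0.5, speed));
  return s <= 1
    ? lerp(s, 0.5, 1, target.min, target.normal)
    : lerp(s, 1, 2, target.normal, target.max);
}

// Map frontend volume (0 to 100 percent) linearly onto the engine range.
function convertVolume(percent: number, target: { min: number; max: number }): number {
  const p = Math.min(100, Math.max(0, percent));
  return lerp(p, 0, 100, target.min, target.max);
}
```

For engines that accept only integers, the result can be passed through Math.round before it is sent.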
Testing Recommendations
- API Credential Testing: Ensure all environment variables are correctly configured
- Audio Format Testing: Test different audio file formats (WAV, MP3, M4A)
- Duration Limit Testing: Pay special attention to Alibaba Cloud’s 60-second limit
- Error Handling Testing: Test network errors, API errors, and other exceptional cases
- Concurrency Testing: Test multiple users using different engines simultaneously
Notes
- Cost Control: Each engine has its own billing rules; monitor API call volume
- Rate Limits: Each engine limits call frequency; stay within those limits
- Audio Size: Limit the size of uploaded audio files (e.g., 10MB)
- Timeout Settings: Set a reasonable timeout for WebSocket connections (e.g., 30 seconds)
- Error Logging: Record detailed error information for troubleshooting
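The audio-size and timeout notes above can be sketched as two small guards. The 10MB and 30-second values are the examples given in the notes, not fixed requirements, and the names isAudioTooLarge and withTimeout are hypothetical. The size check estimates the decoded size from the Base64 length rather than decoding the audio.

```typescript
const MAX_AUDIO_BYTES = 10 * 1024 * 1024; // example limit from the notes above

// Estimate the decoded size of Base64 audio (about 3/4 of the string
// length) and compare it with the limit.
function isAudioTooLarge(
  audioBase64: string,
  maxBytes: number = MAX_AUDIO_BYTES,
): boolean {
  return Math.floor((audioBase64.length * 3) / 4) > maxBytes;
}

// Reject if `work` has not settled within `ms` milliseconds. The timer is
// cleared as soon as `work` settles, so it never fires late.
function withTimeout<T>(work: Promise<T>, ms: number = 30000): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Timed out after ${ms} ms`)),
      ms,
    );
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}
```

Note that withTimeout only stops waiting; it does not close the underlying WebSocket, so the caller should still close the connection when the timeout fires.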

