AGI-SAC Production Deployment Guide¶
This guide covers production deployment of AGI-SAC simulations and federation infrastructure.
Table of Contents¶
- Prerequisites
- Installation Options
- Configuration
- Standalone Deployment
- Federation Deployment
- Docker Deployment
- Monitoring & Health Checks
- Performance Tuning
- Troubleshooting
- Security Considerations
Prerequisites¶
System Requirements¶
Minimum (Development/Testing): - CPU: 4 cores - RAM: 8 GB - Python: 3.9+ - OS: Linux, macOS, Windows
Recommended (Production): - CPU: 16+ cores - RAM: 64 GB - GPU: NVIDIA with CUDA support (optional, for large simulations) - Python: 3.10 or 3.11 - OS: Ubuntu 22.04 LTS or similar
Large-Scale (Research): - CPU: 32+ cores - RAM: 128+ GB - GPU: Multiple NVIDIA A100/H100 (for GPU acceleration) - Python: 3.11 - Network: Low-latency for federation mode
Software Dependencies¶
# Ubuntu/Debian
sudo apt update
sudo apt install -y python3.11 python3.11-venv python3-pip build-essential
# RHEL/CentOS
sudo yum install -y python311 python311-devel gcc
# macOS
brew install python@3.11
Installation Options¶
Option 1: Basic Installation¶
For simple simulations without federation or chaos testing:
# Create virtual environment
python3.11 -m venv agisa-env
source agisa-env/bin/activate # Windows: agisa-env\Scripts\activate
# Install AGI-SAC
pip install agisa-sac
# Verify installation
agisa-sac --help
Option 2: Full Installation¶
All features including federation, chaos testing, and GCP integration:
pip install agisa-sac[all]
# Verify all CLIs are available
agisa-sac --help
agisa-federation --help
agisa-chaos --help
Option 3: Feature-Specific Installation¶
Install only needed features:
# Production monitoring (Prometheus metrics + resource monitoring)
pip install agisa-sac[monitoring]
# Federation server only
pip install agisa-sac[federation]
# Chaos engineering only
pip install agisa-sac[chaos]
# Google Cloud Platform integration
pip install agisa-sac[gcp]
# Visualization tools
pip install agisa-sac[visualization]
# Development tools
pip install agisa-sac[dev]
Option 4: From Source¶
For development or latest features:
git clone https://github.com/topstolenname/agisa_sac.git
cd agisa_sac
python3.11 -m venv .venv
source .venv/bin/activate
pip install -e ".[all]"
Configuration¶
Environment Variables¶
Create a .env file or export environment variables:
# Logging
export LOG_LEVEL=INFO # DEBUG, INFO, WARNING, ERROR
export AGISA_LOG_FILE=/var/log/agisa-sac/simulation.log
# GCP Integration (optional)
export GCP_PROJECT_ID=your-project-id
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
# Federation (optional)
export AGISA_FEDERATION_HOST=0.0.0.0
export AGISA_FEDERATION_PORT=8000
# Performance
export OMP_NUM_THREADS=16 # OpenMP threads
export CUDA_VISIBLE_DEVICES=0,1 # GPU devices
Configuration File Template¶
Create config/production.json:
{
"num_agents": 500,
"num_epochs": 200,
"random_seed": 42,
"use_gpu": true,
"agent_capacity": 100,
"use_semantic": true,
"tda_max_dimension": 1,
"tda_run_frequency": 10,
"community_check_frequency": 20,
"epoch_log_frequency": 10,
"personalities": []
}
Logging Configuration¶
Console Logging (Development):
agisa-sac run --preset default --log-level INFO
File Logging (Production):
agisa-sac run \
--preset large \
--log-level INFO \
--log-file /var/log/agisa-sac/simulation.log
JSON Structured Logging (Production):
agisa-sac run \
--preset large \
--json-logs \
--log-file /var/log/agisa-sac/simulation.json
Standalone Deployment¶
Basic Simulation¶
# Run with preset
agisa-sac run --preset medium
# Run with custom config
agisa-sac run --config config/production.json
Long-Running Simulation¶
Use nohup or screen for background execution:
# Using nohup
nohup agisa-sac run \
--preset large \
--log-file simulation.log \
--json-logs \
> output.log 2>&1 &
# Using screen
screen -S agisa-simulation
agisa-sac run --preset large --json-logs
# Detach: Ctrl+A, D
# Reattach: screen -r agisa-simulation
Systemd Service¶
Create /etc/systemd/system/agisa-sac.service:
[Unit]
Description=AGI-SAC Simulation Service
After=network.target
[Service]
Type=simple
User=agisa
Group=agisa
WorkingDirectory=/opt/agisa-sac
Environment="LOG_LEVEL=INFO"
ExecStart=/opt/agisa-sac/.venv/bin/agisa-sac run \
--config /etc/agisa-sac/production.json \
--json-logs \
--log-file /var/log/agisa-sac/simulation.json
Restart=on-failure
RestartSec=10s
[Install]
WantedBy=multi-user.target
Enable and start:
sudo systemctl daemon-reload
sudo systemctl enable agisa-sac
sudo systemctl start agisa-sac
sudo systemctl status agisa-sac
Federation Deployment¶
Single-Node Federation Server¶
agisa-federation server --host 0.0.0.0 --port 8000 --verbose
Production Federation Server¶
With Systemd¶
Create /etc/systemd/system/agisa-federation.service:
[Unit]
Description=AGI-SAC Federation Server
After=network.target
[Service]
Type=simple
User=agisa
Group=agisa
WorkingDirectory=/opt/agisa-sac
Environment="LOG_LEVEL=INFO"
ExecStart=/opt/agisa-sac/.venv/bin/agisa-federation server \
--host 0.0.0.0 \
--port 8000
Restart=always
RestartSec=5s
[Install]
WantedBy=multi-user.target
Enable and start:
sudo systemctl daemon-reload
sudo systemctl enable agisa-federation
sudo systemctl start agisa-federation
Behind Nginx Reverse Proxy¶
Install Nginx:
sudo apt install nginx
Create /etc/nginx/sites-available/agisa-federation:
upstream agisa_federation {
server 127.0.0.1:8000;
}
server {
listen 80;
server_name federation.example.com;
# Redirect HTTP to HTTPS
return 301 https://$server_name$request_uri;
}
server {
listen 443 ssl http2;
server_name federation.example.com;
ssl_certificate /etc/letsencrypt/live/federation.example.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/federation.example.com/privkey.pem;
location / {
proxy_pass http://agisa_federation;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# WebSocket support
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
# Timeouts
proxy_connect_timeout 60s;
proxy_send_timeout 60s;
proxy_read_timeout 60s;
}
location /health {
proxy_pass http://agisa_federation;
access_log off;
}
}
Enable site:
sudo ln -s /etc/nginx/sites-available/agisa-federation /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx
Docker Deployment¶
Build Image¶
docker build -t agisa-sac:1.0.0-alpha .
Run Simulation in Docker¶
docker run \
--name agisa-simulation \
--rm \
-v $(pwd)/config:/config \
-v $(pwd)/logs:/logs \
agisa-sac:1.0.0-alpha \
agisa-sac run --config /config/production.json --log-file /logs/simulation.log
Run Federation Server in Docker¶
docker run \
--name agisa-federation \
-d \
-p 8000:8000 \
--restart unless-stopped \
agisa-sac:1.0.0-alpha \
agisa-federation server --host 0.0.0.0 --port 8000
Docker Compose¶
Create docker-compose.yml:
version: '3.8'
services:
federation-server:
image: agisa-sac:1.0.0-alpha
container_name: agisa-federation
command: agisa-federation server --host 0.0.0.0 --port 8000
ports:
- "8000:8000"
environment:
- LOG_LEVEL=INFO
volumes:
- ./logs:/logs
restart: unless-stopped
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
interval: 30s
timeout: 10s
retries: 3
start_period: 10s
simulation:
image: agisa-sac:1.0.0-alpha
container_name: agisa-simulation
command: agisa-sac run --config /config/production.json --json-logs --log-file /logs/simulation.json
depends_on:
- federation-server
environment:
- LOG_LEVEL=INFO
volumes:
- ./config:/config
- ./logs:/logs
restart: on-failure
Start services:
docker-compose up -d
docker-compose logs -f
Monitoring & Health Checks¶
Health Check Endpoints¶
# Check federation server health
curl http://localhost:8000/health
# Expected response
{
"status": "healthy",
"service": "agisa-sac-federation",
"timestamp": "2025-11-08T12:34:56.789Z",
"registered_nodes": 42,
"uptime_seconds": 3600.5,
"identity_initialized": true,
"version": "1.0.0-alpha"
}
Log Monitoring¶
Tail Logs:
tail -f /var/log/agisa-sac/simulation.log
Parse JSON Logs:
cat /var/log/agisa-sac/simulation.json | jq '.level' | sort | uniq -c
Filter Errors:
cat /var/log/agisa-sac/simulation.json | jq 'select(.level == "ERROR")'
Process Monitoring¶
Check Running Processes:
ps aux | grep agisa
Monitor Resource Usage:
top -p $(pgrep -f agisa-sac)
htop -p $(pgrep -f agisa-sac)
Watch System Resources:
watch -n 1 'nvidia-smi' # GPU monitoring
watch -n 1 'free -h' # Memory monitoring
Prometheus Metrics¶
AGI-SAC includes built-in Prometheus metrics for production monitoring.
Enable Monitoring:
# Install monitoring dependencies (prometheus-client + psutil)
pip install agisa-sac[monitoring]
Optional Dependencies
Metrics collection gracefully degrades when dependencies are unavailable:
- Without
prometheus-client: All metrics collection is disabled - Without
psutil: System resource metrics are unavailable, other metrics continue to work
The simulation will run normally in all cases.
Available Metrics:
| Metric Name | Type | Description |
|---|---|---|
| Simulation Metrics | ||
agisa_simulation_duration_seconds |
Histogram | Time spent per simulation epoch |
agisa_simulation_epochs_total |
Counter | Total epochs completed |
agisa_simulation_errors_total |
Counter | Total simulation errors by type |
| Agent Metrics | ||
agisa_agent_count |
Gauge | Current number of active agents |
agisa_agent_interactions_total |
Counter | Total agent interactions |
agisa_agent_state_changes_total |
Counter | Agent state changes by type |
| Memory Metrics | ||
agisa_memory_operations_total |
Counter | Memory operations by type (read/write/delete) |
agisa_memory_size_bytes |
Gauge | Memory usage in bytes by type |
agisa_memory_items_count |
Gauge | Number of items in memory stores by type |
| TDA Metrics | ||
agisa_tda_persistence_features |
Gauge | Topological features by dimension (β₀, β₁, β₂) |
agisa_tda_computation_duration_seconds |
Histogram | TDA computation time |
| Social Graph Metrics | ||
agisa_social_graph_edges |
Gauge | Number of edges in social graph |
agisa_social_graph_density |
Gauge | Density of the social graph (0-1) |
agisa_social_clustering_coefficient |
Gauge | Average clustering coefficient |
| System Resource Metrics | ||
agisa_system_cpu_percent |
Gauge | CPU usage percentage |
agisa_system_memory_bytes |
Gauge | Memory usage in bytes (rss/vms) |
agisa_system_memory_percent |
Gauge | Memory usage percentage |
| Federation Metrics | ||
agisa_federation_nodes_count |
Gauge | Number of federation nodes |
agisa_federation_messages_total |
Counter | Federation messages by type |
agisa_federation_sync_duration_seconds |
Histogram | Federation synchronization time |
| Consciousness Metrics | ||
agisa_consciousness_phi |
Gauge | Integrated information (Φ) |
agisa_consciousness_recursive_depth |
Gauge | Meta-cognitive recursion depth |
| Ethical Metrics | ||
agisa_ethics_coexistence_score |
Gauge | Harmony/coexistence score (0-1) |
agisa_ethics_violations_total |
Counter | Ethical violations by type |
Expose Metrics Endpoint:
from agisa_sac.utils.metrics import get_metrics
from fastapi import FastAPI, Response
app = FastAPI()
@app.get("/metrics")
async def metrics():
metrics_data = get_metrics().get_metrics()
content_type = get_metrics().get_content_type()
return Response(content=metrics_data, media_type=content_type)
Prometheus Configuration:
# prometheus.yml
global:
scrape_interval: 15s
evaluation_interval: 15s
scrape_configs:
- job_name: 'agisa-sac'
static_configs:
- targets: ['localhost:8000']
metrics_path: '/metrics'
Grafana Dashboard:
Query examples for visualization:
# Average epoch duration
rate(agisa_simulation_duration_seconds_sum[5m]) / rate(agisa_simulation_duration_seconds_count[5m])
# Agent interactions per second
rate(agisa_agent_interactions_total[1m])
# TDA features over time
agisa_tda_persistence_features
# System resource usage
agisa_system_cpu_percent
agisa_system_memory_percent
Performance Tuning¶
CPU Optimization¶
# Set CPU affinity
taskset -c 0-15 agisa-sac run --preset large
# Increase process priority
nice -n -10 agisa-sac run --preset large
Memory Optimization¶
# Increase memory limits (systemd)
[Service]
MemoryMax=96G
MemoryHigh=80G
GPU Optimization¶
# Specify GPU devices
export CUDA_VISIBLE_DEVICES=0,1,2,3
# Run with GPU acceleration
agisa-sac run --preset large --gpu
# Monitor GPU usage
nvidia-smi -l 1
Disk I/O Optimization¶
# Use SSD for checkpoints
agisa-sac run --config production.json --output-dir /mnt/nvme/agisa-sac
Troubleshooting¶
Common Issues¶
Import Errors¶
Problem: ModuleNotFoundError: No module named 'agisa_sac'
Solution:
pip install --force-reinstall agisa-sac[all]
Memory Errors¶
Problem: MemoryError during large simulations
Solution:
# Reduce agent count or epochs
agisa-sac run --preset medium --agents 100
# Enable memory-efficient mode (if available)
# Or increase system swap
sudo fallocate -l 32G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
GPU Not Detected¶
Problem: use_gpu=True but simulation runs on CPU
Solution:
# Verify CUDA installation
python -c "import torch; print(torch.cuda.is_available())"
# Install PyTorch with CUDA
pip install torch --index-url https://download.pytorch.org/whl/cu118
Federation Server Connection Refused¶
Problem: Cannot connect to federation server
Solution:
# Check if server is running
systemctl status agisa-federation
# Check port is listening
sudo netstat -tulpn | grep 8000
# Check firewall
sudo ufw status
sudo ufw allow 8000/tcp
Debug Mode¶
Enable verbose logging:
agisa-sac run --preset default --log-level DEBUG -v
Log Analysis¶
# Count log levels
grep -oP '"level":\s*"\K[^"]+' simulation.json | sort | uniq -c
# Find errors
grep ERROR simulation.log | tail -20
# Track simulation progress
grep "Epoch.*completed" simulation.log | tail -10
Security Considerations¶
Network Security¶
- Use HTTPS/TLS for federation servers
- Implement rate limiting on endpoints
- Use firewall rules to restrict access
- Enable authentication for production endpoints
Authentication¶
Implement token-based auth for federation:
# In your federation client
headers = {
"Authorization": f"Bearer {token}",
"X-API-Key": api_key
}
Data Protection¶
- Encrypt sensitive simulation data at rest
- Use secure channels for inter-node communication
- Implement audit logging for all API calls
- Regular backup of simulation checkpoints
Resource Limits¶
# In systemd service
[Service]
CPUQuota=80%
MemoryMax=64G
TasksMax=1000
Backup & Recovery¶
State Checkpointing¶
# In simulation code
orchestrator.save_state(
"checkpoints/simulation_epoch_100.pkl",
include_memory_embeddings=True
)
Automated Backups¶
#!/bin/bash
# backup-agisa.sh
BACKUP_DIR="/backups/agisa-sac"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
# Backup configuration
cp -r /etc/agisa-sac "$BACKUP_DIR/config_$TIMESTAMP"
# Backup logs
cp -r /var/log/agisa-sac "$BACKUP_DIR/logs_$TIMESTAMP"
# Backup simulation state
cp -r /opt/agisa-sac/checkpoints "$BACKUP_DIR/checkpoints_$TIMESTAMP"
# Cleanup old backups (keep last 7 days)
find "$BACKUP_DIR" -type d -mtime +7 -exec rm -rf {} +
Add to crontab:
0 */6 * * * /opt/agisa-sac/scripts/backup-agisa.sh
Performance Benchmarks¶
Expected Performance¶
| Configuration | Agents | Epochs | Time (CPU) | Time (GPU) | Memory |
|---|---|---|---|---|---|
| quick_test | 10 | 20 | ~2 min | ~1 min | 2 GB |
| default | 30 | 50 | ~15 min | ~8 min | 8 GB |
| medium | 100 | 100 | ~2 hours | ~45 min | 16 GB |
| large | 500 | 200 | ~24 hours | ~6 hours | 64 GB |
Benchmarks on Intel Xeon 16-core, 64GB RAM, NVIDIA A100
Support¶
For deployment issues:
- Check troubleshooting section
- Review logs with
--log-level DEBUG - Search existing issues: https://github.com/topstolenname/agisa_sac/issues
- Create new issue with logs and configuration
Next Steps¶
After deployment:
- Set up monitoring dashboards (Grafana)
- Configure alerting (Prometheus)
- Implement chaos testing
- Scale federation to multiple nodes
- Optimize for your specific use case
See TODO.md for roadmap and upcoming features.