Component Architecture
This document specifies each component of the Federated Learning Platform: its responsibilities, its interfaces, and its implementation details.
Frontend Components
Next.js Application
Technology: Next.js 15 + TypeScript + Tailwind CSS
graph TB
subgraph "Frontend Architecture"
APP[App Router<br/>Next.js 15]
subgraph "Pages"
HOME[Home Page]
LOGIN[Login/Register]
DASHBOARD[Dashboard]
TRAINING[Training Control]
ANSIBLE[Ansible Management]
PROJECTS[Project Management]
end
subgraph "Components"
LAYOUT[Layout Components]
UI[UI Components]
FORMS[Form Components]
CHARTS[Chart Components]
NOTIFICATIONS[Notification System]
end
subgraph "State Management"
AUTH_CTX[Auth Context]
PROJECT_CTX[Project Context]
NOTIFICATION_CTX[Notification Context]
WS_HOOKS[WebSocket Hooks]
end
subgraph "Services"
API_CLIENT[API Client]
WS_MANAGER[WebSocket Manager]
AUTH_SERVICE[Auth Service]
end
end
APP --> HOME
APP --> LOGIN
APP --> DASHBOARD
APP --> TRAINING
APP --> ANSIBLE
APP --> PROJECTS
HOME --> LAYOUT
DASHBOARD --> UI
TRAINING --> FORMS
DASHBOARD --> CHARTS
APP --> NOTIFICATIONS
LAYOUT --> AUTH_CTX
TRAINING --> PROJECT_CTX
NOTIFICATIONS --> NOTIFICATION_CTX
TRAINING --> WS_HOOKS
AUTH_CTX --> API_CLIENT
WS_HOOKS --> WS_MANAGER
AUTH_CTX --> AUTH_SERVICE
Key Responsibilities:
- User interface rendering and interaction
- Real-time updates via WebSocket connections
- Authentication state management
- Project and configuration management
- Training job monitoring and control
API Integration:
- REST API calls to FastAPI backend
- WebSocket connections for real-time updates
- File upload handling for ML project configurations
Component Specifications
Authentication Context
interface AuthContext {
  user: User | null;
  token: string | null;
  login: (credentials: LoginCredentials) => Promise<void>;
  logout: () => void;
  register: (userData: RegisterData) => Promise<void>;
  isAuthenticated: boolean;
}
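WebSocket Manager
interface WebSocketManager {
  connect: (url: string) => void;
  disconnect: () => void;
  // Register a callback for a named event; the payload type is not known up front
  subscribe: (event: string, callback: (data: unknown) => void) => void;
  // Remove the callbacks registered for a named event
  unsubscribe: (event: string) => void;
  send: (event: string, data: unknown) => void;
  isConnected: boolean;
}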
Backend Components
FastAPI Application
Technology: FastAPI + Python 3.10 + MongoDB
graph TB
subgraph "FastAPI Backend"
MAIN[main.py<br/>Application Entry]
subgraph "Routes"
AUTH_ROUTE[Auth Routes]
TRAINING_ROUTE[Training Routes]
NODES_ROUTE[Nodes Routes]
ANSIBLE_ROUTE[Ansible Routes]
PROJECT_ROUTE[Project Routes]
WS_ROUTE[WebSocket Routes]
end
subgraph "Services"
AUTH_SVC[Auth Service]
TRAINING_SVC[Training Service]
NODE_SVC[Node Service]
ANSIBLE_SVC[Ansible Service]
PROJECT_SVC[Project Service]
WS_SVC[WebSocket Service]
end
subgraph "Models"
USER_MODEL[User Model]
CONFIG_MODEL[Config Model]
TRAINING_MODEL[Training Model]
NODE_MODEL[Node Model]
end
subgraph "Database"
MONGO[MongoDB]
COLLECTIONS[Collections]
end
subgraph "External"
FLOWER[Flower Services]
ANSIBLE_EXT[Ansible Playbooks]
OTEL[OpenTelemetry]
end
end
MAIN --> AUTH_ROUTE
MAIN --> TRAINING_ROUTE
MAIN --> NODES_ROUTE
MAIN --> ANSIBLE_ROUTE
MAIN --> PROJECT_ROUTE
MAIN --> WS_ROUTE
AUTH_ROUTE --> AUTH_SVC
TRAINING_ROUTE --> TRAINING_SVC
NODES_ROUTE --> NODE_SVC
ANSIBLE_ROUTE --> ANSIBLE_SVC
PROJECT_ROUTE --> PROJECT_SVC
WS_ROUTE --> WS_SVC
AUTH_SVC --> USER_MODEL
ANSIBLE_SVC --> CONFIG_MODEL
TRAINING_SVC --> TRAINING_MODEL
NODE_SVC --> NODE_MODEL
USER_MODEL --> MONGO
CONFIG_MODEL --> MONGO
TRAINING_MODEL --> MONGO
NODE_MODEL --> MONGO
MONGO --> COLLECTIONS
TRAINING_SVC --> FLOWER
ANSIBLE_SVC --> ANSIBLE_EXT
AUTH_SVC --> OTEL
TRAINING_SVC --> OTEL
Key Responsibilities:
- REST API endpoint management
- Authentication and authorization
- Database operations and data persistence
- Federated learning orchestration
- Infrastructure automation via Ansible
- Real-time communication via WebSockets
Service Specifications
Authentication Service
class AuthService:
    # Interface only; method bodies are elided
    async def authenticate_user(self, credentials: LoginCredentials) -> User: ...
    async def create_user(self, user_data: UserCreate) -> User: ...
    async def generate_token(self, user: User) -> str: ...
    async def verify_token(self, token: str) -> User: ...
    async def refresh_token(self, refresh_token: str) -> str: ...
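The `generate_token` and `verify_token` pair can be illustrated with a minimal sketch of HS256 token signing, assuming the tokens are JWTs as the Security Patterns section suggests. It uses only the standard library to show the mechanics; a real service would more likely rely on an established JWT library. `SECRET_KEY`, the claim set, and the simplified signatures (a username instead of a `User`) are assumptions made for illustration.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET_KEY = b"change-me"  # hypothetical; load from the environment in practice


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url, restoring the padding that JWTs strip."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: bytes) -> bytes:
    """HMAC-SHA256 over the 'header.payload' bytes."""
    return hmac.new(SECRET_KEY, signing_input, hashlib.sha256).digest()


def generate_token(username: str, ttl_seconds: int = 3600) -> str:
    """Build an HS256-signed token carrying a subject and an expiry."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    claims = {"sub": username, "exp": int(time.time()) + ttl_seconds}
    payload = _b64url(json.dumps(claims).encode())
    signature = _sign(f"{header}.{payload}".encode("ascii"))
    return f"{header}.{payload}.{_b64url(signature)}"


def verify_token(token: str) -> str:
    """Return the subject if signature and expiry check out, else raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header, payload, signature = parts
    expected = _sign(f"{header}.{payload}".encode("ascii"))
    # Constant-time comparison avoids leaking how many bytes matched
    if not hmac.compare_digest(expected, _b64url_decode(signature)):
        raise ValueError("invalid signature")
    claims = json.loads(_b64url_decode(payload))
    if claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims["sub"]
```

Because the token is self-contained, verification needs no database lookup, which is what makes the scheme stateless.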
Training Service
class TrainingService:
    # Interface only; method bodies are elided
    async def start_training(self, config: TrainingConfig) -> TrainingJob: ...
    async def stop_training(self, job_id: str) -> bool: ...
    async def get_training_status(self, job_id: str) -> TrainingStatus: ...
    async def get_training_metrics(self, job_id: str) -> TrainingMetrics: ...
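One way to picture the job lifecycle behind this interface is an in-memory sketch built on `asyncio` tasks. Everything here is hypothetical: `InMemoryTrainingService` keeps jobs in a dict rather than MongoDB, each round is a placeholder `sleep`, and a stopped job is recorded as `failed` only because the documented status set (`pending|running|completed|failed`) has no separate cancelled state.

```python
import asyncio
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TrainingJob:
    job_id: str
    rounds: int
    status: str = "pending"  # pending | running | completed | failed
    current_round: int = 0
    task: Optional[asyncio.Task] = field(default=None, repr=False)


class InMemoryTrainingService:
    """Hypothetical stand-in that tracks jobs in a dict instead of MongoDB."""

    def __init__(self) -> None:
        self._jobs: dict[str, TrainingJob] = {}

    async def start_training(self, rounds: int) -> TrainingJob:
        job = TrainingJob(job_id=uuid.uuid4().hex, rounds=rounds)
        self._jobs[job.job_id] = job
        # Run in the background so the caller gets the job handle immediately
        job.task = asyncio.create_task(self._run(job))
        return job

    async def _run(self, job: TrainingJob) -> None:
        job.status = "running"
        for round_number in range(1, job.rounds + 1):
            await asyncio.sleep(0)  # placeholder for one federated round
            job.current_round = round_number
        job.status = "completed"

    async def stop_training(self, job_id: str) -> bool:
        job = self._jobs.get(job_id)
        if job is None or job.task is None or job.task.done():
            return False  # nothing to stop
        job.task.cancel()
        try:
            await job.task
        except asyncio.CancelledError:
            pass
        job.status = "failed"  # assumption: no dedicated "cancelled" status
        return True

    async def get_training_status(self, job_id: str) -> str:
        return self._jobs[job_id].status
```

Returning the job before the work finishes is what lets a REST handler respond quickly while progress is reported separately, for example over WebSockets.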
Ansible Service
class AnsibleService:
    # Interface only; method bodies are elided
    async def deploy_configuration(self, config: AnsibleConfig) -> DeploymentJob: ...
    async def execute_playbook(self, playbook: str, inventory: str) -> PlaybookResult: ...
    async def get_deployment_status(self, job_id: str) -> DeploymentStatus: ...
    async def manage_inventory(self, inventory_data: InventoryData) -> bool: ...
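As a rough illustration of `execute_playbook`, the sketch below shells out to the `ansible-playbook` CLI without blocking the event loop. The fields of `PlaybookResult` and the helper `build_playbook_command` are assumptions; the source names the result type but does not define it, and the actual service may drive Ansible differently.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class PlaybookResult:
    # Hypothetical fields; the source does not define this type
    return_code: int
    stdout: str
    stderr: str


def build_playbook_command(playbook: str, inventory: str) -> list[str]:
    """Assemble the CLI call: ansible-playbook -i <inventory> <playbook>."""
    return ["ansible-playbook", "-i", inventory, playbook]


async def execute_playbook(playbook: str, inventory: str) -> PlaybookResult:
    """Run the playbook as a subprocess and capture its output."""
    process = await asyncio.create_subprocess_exec(
        *build_playbook_command(playbook, inventory),
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, stderr = await process.communicate()
    return PlaybookResult(process.returncode, stdout.decode(), stderr.decode())
```

With the file names from the Ansible Architecture section, a call might look like `await execute_playbook("deploy.yml", "devices.ini")`. Passing arguments as a list rather than a shell string avoids shell injection through user-supplied file names.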
Federated Learning Components
Flower Framework Integration
graph TB
subgraph "Flower Architecture"
SUPERLINK[Superlink<br/>Communication Hub]
subgraph "Server Side"
AGGREGATOR[Aggregator<br/>ServerApp]
SERVER_LOGIC[Server Logic<br/>server_app.py]
MODEL_AGG[Model Aggregation]
end
subgraph "Client Side"
SUPERNODE1[Supernode 1<br/>Partition 0]
SUPERNODE2[Supernode 2<br/>Partition 1]
CLIENT1[ClientApp 1<br/>client_app.py]
CLIENT2[ClientApp 2<br/>client_app.py]
CLIENTN[ClientApp N<br/>client_app.py]
end
subgraph "Monitoring"
MONITOR[Monitor Stats<br/>monitor_stats.py]
OTEL_FL[OpenTelemetry<br/>FL Metrics]
end
subgraph "Data"
DATASETS[Federated Datasets<br/>FashionMNIST]
PARTITIONS[Data Partitions]
end
end
SUPERLINK --> AGGREGATOR
AGGREGATOR --> SERVER_LOGIC
SERVER_LOGIC --> MODEL_AGG
SUPERLINK --> SUPERNODE1
SUPERLINK --> SUPERNODE2
SUPERNODE1 --> CLIENT1
SUPERNODE1 --> CLIENT2
SUPERNODE2 --> CLIENTN
CLIENT1 --> MONITOR
CLIENT2 --> MONITOR
CLIENTN --> MONITOR
MONITOR --> OTEL_FL
CLIENT1 --> DATASETS
CLIENT2 --> DATASETS
CLIENTN --> DATASETS
DATASETS --> PARTITIONS
Federated Learning Workflow
sequenceDiagram
participant Orchestrator
participant Superlink
participant Aggregator
participant Supernode
participant Client
participant Monitor
Orchestrator->>Superlink: Initialize FL Session
Superlink->>Aggregator: Start Server App
Aggregator->>Aggregator: Initialize Global Model
Superlink->>Supernode: Connect Supernodes
Supernode->>Client: Start Client Apps
Client->>Client: Load Local Data
loop Training Rounds
Aggregator->>Supernode: Send Global Model
Supernode->>Client: Distribute Model
Client->>Client: Local Training
Client->>Monitor: Log Metrics
Client->>Supernode: Send Updates
Supernode->>Aggregator: Forward Updates
Aggregator->>Aggregator: Aggregate Updates into Global Model
Monitor->>Orchestrator: Report Progress
end
Aggregator->>Superlink: Training Complete
Superlink->>Orchestrator: Final Results
Component Specifications
Server App (Aggregator)
from flwr.server.strategy import FedAvg  # Flower's built-in federated averaging strategy

class ServerApp:
    def __init__(self, config: ServerConfig):
        self.model = initialize_model()
        self.strategy = FedAvg()

    def fit(self, parameters, config):
        # Aggregate client updates
        # Return updated global model
        raise NotImplementedError

    def evaluate(self, parameters, config):
        # Evaluate global model
        # Return metrics
        raise NotImplementedError
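The aggregation that the FedAvg strategy performs can be sketched as a weighted average, where each client's parameters count in proportion to the number of examples it trained on. This is a plain-Python illustration with flat lists of floats, not Flower's implementation; `federated_average` and its input shape are names chosen here.

```python
from typing import Sequence


def federated_average(
    updates: Sequence[tuple[Sequence[float], int]],
) -> list[float]:
    """Average client weight vectors, weighted by each client's example count.

    `updates` holds one (weights, num_examples) pair per client.
    """
    if not updates:
        raise ValueError("at least one client update is required")
    total_examples = sum(num_examples for _, num_examples in updates)
    if total_examples <= 0:
        raise ValueError("clients must report a positive number of examples")

    size = len(updates[0][0])
    averaged = [0.0] * size
    for weights, num_examples in updates:
        if len(weights) != size:
            raise ValueError("all clients must send vectors of the same length")
        share = num_examples / total_examples  # this client's contribution
        for index, value in enumerate(weights):
            averaged[index] += value * share
    return averaged
```

For example, a client with 3 examples pulls the result three times as hard as a client with 1 example, so `[1.0, 2.0]` and `[3.0, 4.0]` combine to `[2.5, 3.5]`.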
Client App
class ClientApp:
    def __init__(self, client_id: str, partition_id: int):
        self.client_id = client_id
        self.data = load_partition(partition_id)
        self.model = initialize_model()

    def fit(self, parameters, config):
        # Train local model
        # Return model updates
        raise NotImplementedError

    def evaluate(self, parameters, config):
        # Evaluate local model
        # Return local metrics
        raise NotImplementedError
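The source does not show how `load_partition` selects a client's data. As one possible reading, the sketch below splits a dataset into contiguous, near-equal index ranges so that each `partition_id` maps to a disjoint slice. `partition_indices` is a hypothetical helper; the project may well delegate this to a dataset library and may use a different partitioning scheme.

```python
def partition_indices(num_samples: int, num_partitions: int, partition_id: int) -> range:
    """Return the sample indices owned by one partition.

    Samples are split into contiguous blocks; when the division is uneven,
    the first `num_samples % num_partitions` partitions get one extra sample.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    if not 0 <= partition_id < num_partitions:
        raise ValueError("partition_id out of range")

    base, extra = divmod(num_samples, num_partitions)
    start = partition_id * base + min(partition_id, extra)
    stop = start + base + (1 if partition_id < extra else 0)
    return range(start, stop)
```

With 10 samples and 3 partitions this yields index ranges of sizes 4, 3 and 3 that together cover every sample exactly once.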
Database Components
MongoDB Schema Design
erDiagram
USERS {
ObjectId _id
string username
string email
string password_hash
datetime created_at
datetime updated_at
boolean is_active
array roles
}
PROJECTS {
ObjectId _id
string name
string description
ObjectId owner_id
datetime created_at
datetime updated_at
object configuration
string status
}
CONFIGS {
ObjectId _id
string name
string type
ObjectId project_id
ObjectId user_id
object config_data
string file_path
datetime created_at
datetime updated_at
}
TRAINING_JOBS {
ObjectId _id
string job_id
ObjectId project_id
ObjectId user_id
string status
datetime started_at
datetime completed_at
object metrics
object configuration
}
ANSIBLE_JOBS {
ObjectId _id
string job_id
string playbook
string inventory
string status
datetime started_at
datetime completed_at
object result
array logs
}
USERS ||--o{ PROJECTS : owns
USERS ||--o{ CONFIGS : creates
USERS ||--o{ TRAINING_JOBS : initiates
USERS ||--o{ ANSIBLE_JOBS : executes
PROJECTS ||--o{ CONFIGS : contains
PROJECTS ||--o{ TRAINING_JOBS : runs
Collection Specifications
Users Collection
{
  "_id": ObjectId,
  "username": "string",
  "email": "string",
  "password_hash": "string",
  "created_at": ISODate,
  "updated_at": ISODate,
  "is_active": boolean,
  "roles": ["user", "admin"],
  "profile": {
    "first_name": "string",
    "last_name": "string",
    "organization": "string"
  }
}
Training Jobs Collection
{
  "_id": ObjectId,
  "job_id": "string",
  "project_id": ObjectId,
  "user_id": ObjectId,
  "status": "pending|running|completed|failed",
  "started_at": ISODate,
  "completed_at": ISODate,
  "configuration": {
    "rounds": number,
    "clients": number,
    "model_type": "string"
  },
  "metrics": {
    "accuracy": number,
    "loss": number,
    "round_metrics": []
  },
  "logs": []
}
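To make the training job schema concrete, here is a small sketch of how such a document might evolve as rounds complete. It works on plain dicts and leaves out `_id`, `project_id` and `user_id`, which would be ObjectIds in MongoDB. The shape of each `round_metrics` entry and both helper names are assumptions, since the schema only shows an empty array.

```python
from datetime import datetime, timezone
from typing import Any


def new_training_job(job_id: str, rounds: int, clients: int, model_type: str) -> dict[str, Any]:
    """Build a job document in its initial 'pending' state."""
    return {
        "job_id": job_id,
        "status": "pending",
        "started_at": None,
        "completed_at": None,
        "configuration": {"rounds": rounds, "clients": clients, "model_type": model_type},
        "metrics": {"accuracy": None, "loss": None, "round_metrics": []},
        "logs": [],
    }


def record_round(job: dict[str, Any], accuracy: float, loss: float) -> None:
    """Append one round's metrics and advance the job status accordingly."""
    now = datetime.now(timezone.utc)
    if job["status"] == "pending":
        job["status"] = "running"
        job["started_at"] = now

    history = job["metrics"]["round_metrics"]
    history.append({"round": len(history) + 1, "accuracy": accuracy, "loss": loss})
    # Top-level metrics mirror the most recent round
    job["metrics"]["accuracy"] = accuracy
    job["metrics"]["loss"] = loss

    if len(history) >= job["configuration"]["rounds"]:
        job["status"] = "completed"
        job["completed_at"] = now
```

Keeping the latest values at the top level of `metrics` lets a dashboard read current accuracy and loss without scanning the per-round history.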
Observability Components
OpenTelemetry Integration
graph TB
subgraph "Application Services"
FRONTEND[Frontend<br/>Browser Telemetry]
BACKEND[Backend<br/>FastAPI Telemetry]
FL_SERVER[FL Server<br/>Aggregator Telemetry]
FL_CLIENT[FL Clients<br/>Client Telemetry]
end
subgraph "OpenTelemetry Collector"
RECEIVER[OTLP Receiver<br/>gRPC + HTTP]
PROCESSOR[Processors<br/>Batch + Memory Limiter]
EXPORTER[Exporters<br/>Tempo + Prometheus]
end
subgraph "Observability Backend"
TEMPO[Tempo<br/>Distributed Tracing]
GRAFANA[Grafana<br/>Visualization]
PROMETHEUS[Prometheus<br/>Metrics Storage]
end
FRONTEND --> RECEIVER
BACKEND --> RECEIVER
FL_SERVER --> RECEIVER
FL_CLIENT --> RECEIVER
RECEIVER --> PROCESSOR
PROCESSOR --> EXPORTER
EXPORTER --> TEMPO
EXPORTER --> PROMETHEUS
TEMPO --> GRAFANA
PROMETHEUS --> GRAFANA
Telemetry Specifications
Trace Instrumentation
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Initialize tracing and batch-export spans to the collector over OTLP/gRPC
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)

# Instrument FastAPI endpoints (`app` is the FastAPI instance from main.py)
@app.post("/training/start")
async def start_training(config: TrainingConfig):
    with tracer.start_as_current_span("start_training") as span:
        span.set_attribute("training.config", str(config))
        # Training logic
        return result
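Metrics Collection
from opentelemetry import metrics
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader

# Initialize metrics and periodically export them to the collector over OTLP/gRPC
reader = PeriodicExportingMetricReader(OTLPMetricExporter())
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))
meter = metrics.get_meter(__name__)

# Create metrics
training_duration = meter.create_histogram(
    "training_duration_seconds",
    unit="s",
    description="Duration of training rounds",
)
active_clients = meter.create_up_down_counter(
    "active_clients_total",
    description="Number of active FL clients",
)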
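Recording a duration into a histogram like `training_duration` usually means timing a block of code and handing the elapsed seconds to the instrument. The helper below is a minimal sketch of that pattern using only the standard library; `timed` is a hypothetical name, and it accepts any callable that takes a float so it can be exercised without a telemetry backend.

```python
import time
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def timed(record: Callable[[float], None]) -> Iterator[None]:
    """Measure the wrapped block and pass the elapsed seconds to `record`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so failed rounds are still measured
        record(time.perf_counter() - start)


# Stand-in sink for demonstration; in the service this would be the histogram
durations: list[float] = []
with timed(durations.append):
    sum(range(1000))  # placeholder for one training round
```

In the backend the sink would presumably be the histogram's `record` method, as in `with timed(training_duration.record): ...`, while `active_clients.add(1)` and `active_clients.add(-1)` would track clients joining and leaving.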
Infrastructure Components
Docker Architecture
graph TB
subgraph "Docker Compose Services"
subgraph "Orchestrator Profile"
FRONTEND_C[frontend<br/>Next.js Container]
BACKEND_C[backend-fastapi<br/>FastAPI Container]
MONGO_C[mongodb<br/>Database Container]
OTEL_C[otel-collector<br/>Telemetry Container]
TEMPO_C[tempo<br/>Tracing Container]
GRAFANA_C[grafana<br/>Monitoring Container]
end
subgraph "Aggregator Profile"
SUPERLINK_C[superlink<br/>Flower Hub Container]
AGGREGATOR_C[aggregator<br/>FL Server Container]
end
subgraph "Client Profile"
SUPERNODE_C[supernode<br/>FL Node Container]
CLIENT_C[clientapp<br/>FL Client Container]
INFERENCE_C[inference<br/>Model Inference Container]
end
end
subgraph "Docker Networks"
FL_NETWORK[federated-learning-network<br/>Bridge Network]
end
subgraph "Docker Volumes"
MONGO_VOL[mongodb_data<br/>Persistent Volume]
GRAFANA_VOL[grafana_data<br/>Dashboard Storage]
TEMPO_VOL[tempo_data<br/>Trace Storage]
FL_VOL[fl_data<br/>Model Storage]
end
FRONTEND_C --> FL_NETWORK
BACKEND_C --> FL_NETWORK
MONGO_C --> FL_NETWORK
SUPERLINK_C --> FL_NETWORK
AGGREGATOR_C --> FL_NETWORK
CLIENT_C --> FL_NETWORK
MONGO_C --> MONGO_VOL
GRAFANA_C --> GRAFANA_VOL
TEMPO_C --> TEMPO_VOL
AGGREGATOR_C --> FL_VOL
Ansible Architecture
graph TB
subgraph "Ansible Control Node"
CONTROL[Ansible Controller<br/>Orchestrator Machine]
INVENTORY[Inventory<br/>devices.ini]
PLAYBOOKS[Playbooks<br/>deploy.yml, start.yml, stop.yml]
end
subgraph "Ansible Roles"
COMMON[Common Role<br/>Base Setup]
AGGREGATOR_ROLE[Aggregator Role<br/>FL Server Setup]
CLIENT_ROLE[Client Role<br/>FL Client Setup]
end
subgraph "Target Nodes"
AGG_NODE[Aggregator Node<br/>Remote Machine]
CLIENT_NODE1[Client Node 1<br/>Remote Machine]
CLIENT_NODE2[Client Node 2<br/>Remote Machine]
CLIENT_NODEN[Client Node N<br/>Remote Machine]
end
CONTROL --> INVENTORY
CONTROL --> PLAYBOOKS
PLAYBOOKS --> COMMON
PLAYBOOKS --> AGGREGATOR_ROLE
PLAYBOOKS --> CLIENT_ROLE
COMMON --> AGG_NODE
COMMON --> CLIENT_NODE1
COMMON --> CLIENT_NODE2
COMMON --> CLIENT_NODEN
AGGREGATOR_ROLE --> AGG_NODE
CLIENT_ROLE --> CLIENT_NODE1
CLIENT_ROLE --> CLIENT_NODE2
CLIENT_ROLE --> CLIENT_NODEN
Component Communication Patterns
Synchronous Communication
- REST APIs: HTTP/HTTPS for request-response patterns
- gRPC: High-performance communication for FL components
- Database Queries: Direct MongoDB connections
Asynchronous Communication
- WebSockets: Real-time bidirectional communication
- Event Streaming: Future enhancement with message queues
- Background Tasks: Async task processing with FastAPI
Security Patterns
- JWT Authentication: Stateless token-based auth
- TLS Encryption: All external communication encrypted
- Network Isolation: Docker network segmentation
- Secret Management: Environment-based configuration
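Environment-based secret management can be sketched as a settings loader that reads required values from the process environment and fails fast when one is missing. The variable names `JWT_SECRET` and `MONGODB_URI`, the default URI, and the `Settings` class are all assumptions for illustration; the source does not name the platform's actual variables.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    mongodb_uri: str
    jwt_secret: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from the environment, refusing to start without secrets."""
    secret = env.get("JWT_SECRET")  # hypothetical variable name
    if not secret:
        raise RuntimeError("JWT_SECRET must be set")
    return Settings(
        # Hypothetical variable name and local-development default
        mongodb_uri=env.get("MONGODB_URI", "mongodb://localhost:27017"),
        jwt_secret=secret,
    )
```

Accepting the environment as a parameter keeps the loader easy to test, and raising at startup surfaces a missing secret before any request is served.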
Next: Continue to Data Flow to understand how data moves through the system.