
Component Architecture

This document provides detailed specifications for each component in the Federated Learning Platform, including their responsibilities, interfaces, and implementation details.

Frontend Components

Next.js Application

Technology: Next.js 15 + TypeScript + Tailwind CSS

graph TB
    subgraph "Frontend Architecture"
        APP[App Router<br/>Next.js 15]

        subgraph "Pages"
            HOME[Home Page]
            LOGIN[Login/Register]
            DASHBOARD[Dashboard]
            TRAINING[Training Control]
            ANSIBLE[Ansible Management]
            PROJECTS[Project Management]
        end

        subgraph "Components"
            LAYOUT[Layout Components]
            UI[UI Components]
            FORMS[Form Components]
            CHARTS[Chart Components]
            NOTIFICATIONS[Notification System]
        end

        subgraph "State Management"
            AUTH_CTX[Auth Context]
            PROJECT_CTX[Project Context]
            NOTIFICATION_CTX[Notification Context]
            WS_HOOKS[WebSocket Hooks]
        end

        subgraph "Services"
            API_CLIENT[API Client]
            WS_MANAGER[WebSocket Manager]
            AUTH_SERVICE[Auth Service]
        end
    end

    APP --> HOME
    APP --> LOGIN
    APP --> DASHBOARD
    APP --> TRAINING
    APP --> ANSIBLE
    APP --> PROJECTS

    HOME --> LAYOUT
    DASHBOARD --> UI
    TRAINING --> FORMS
    DASHBOARD --> CHARTS
    APP --> NOTIFICATIONS

    LAYOUT --> AUTH_CTX
    TRAINING --> PROJECT_CTX
    NOTIFICATIONS --> NOTIFICATION_CTX
    TRAINING --> WS_HOOKS

    AUTH_CTX --> API_CLIENT
    WS_HOOKS --> WS_MANAGER
    AUTH_CTX --> AUTH_SERVICE

Key Responsibilities:

  • User interface rendering and interaction
  • Real-time updates via WebSocket connections
  • Authentication state management
  • Project and configuration management
  • Training job monitoring and control

API Integration:

  • REST API calls to the FastAPI backend
  • WebSocket connections for real-time updates
  • File upload handling for ML project configurations

Component Specifications

Authentication Context

interface AuthContext {
  user: User | null;
  token: string | null;
  login: (credentials: LoginCredentials) => Promise<void>;
  logout: () => void;
  register: (userData: RegisterData) => Promise<void>;
  isAuthenticated: boolean;
}

WebSocket Manager

interface WebSocketManager {
  connect: (url: string) => void;
  disconnect: () => void;
  subscribe: (event: string, callback: (data: unknown) => void) => void;
  unsubscribe: (event: string) => void; // removes every callback registered for the event
  send: (event: string, data: unknown) => void;
  isConnected: boolean;
}
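The subscribe/unsubscribe/send contract above implies that every message carries an event name that selects which callbacks run. The backend WebSocket service needs the same routing. The sketch below shows that routing in Python. It assumes a JSON envelope of the form {"event": ..., "data": ...}. The envelope shape and the `EventRouter` name are hypothetical and do not come from the platform's code.

```python
import json
from collections import defaultdict
from typing import Any, Callable, Dict, List

Callback = Callable[[Any], None]


class EventRouter:
    """Routes JSON envelopes {"event": str, "data": ...} to subscribed callbacks."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, event: str, callback: Callback) -> None:
        self._subscribers[event].append(callback)

    def unsubscribe(self, event: str) -> None:
        # Like the interface above, this drops every callback for the event.
        self._subscribers.pop(event, None)

    def encode(self, event: str, data: Any) -> str:
        # The frame that send() would put on the wire.
        return json.dumps({"event": event, "data": data})

    def dispatch(self, raw_message: str) -> int:
        """Decode one incoming frame, invoke its callbacks, and return how many ran."""
        message = json.loads(raw_message)
        # .get() on a defaultdict does not create empty entries for unknown events.
        callbacks = self._subscribers.get(message.get("event"), [])
        for callback in list(callbacks):
            callback(message.get("data"))
        return len(callbacks)
```

Frames for events that nobody subscribed to are silently ignored. That keeps the router tolerant of new server-side event types.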

Backend Components

FastAPI Application

Technology: FastAPI + Python 3.10 + MongoDB

graph TB
    subgraph "FastAPI Backend"
        MAIN[main.py<br/>Application Entry]

        subgraph "Routes"
            AUTH_ROUTE[Auth Routes]
            TRAINING_ROUTE[Training Routes]
            NODES_ROUTE[Nodes Routes]
            ANSIBLE_ROUTE[Ansible Routes]
            PROJECT_ROUTE[Project Routes]
            WS_ROUTE[WebSocket Routes]
        end

        subgraph "Services"
            AUTH_SVC[Auth Service]
            TRAINING_SVC[Training Service]
            NODE_SVC[Node Service]
            ANSIBLE_SVC[Ansible Service]
            PROJECT_SVC[Project Service]
            WS_SVC[WebSocket Service]
        end

        subgraph "Models"
            USER_MODEL[User Model]
            CONFIG_MODEL[Config Model]
            TRAINING_MODEL[Training Model]
            NODE_MODEL[Node Model]
        end

        subgraph "Database"
            MONGO[MongoDB]
            COLLECTIONS[Collections]
        end

        subgraph "External"
            FLOWER[Flower Services]
            ANSIBLE_EXT[Ansible Playbooks]
            OTEL[OpenTelemetry]
        end
    end

    MAIN --> AUTH_ROUTE
    MAIN --> TRAINING_ROUTE
    MAIN --> NODES_ROUTE
    MAIN --> ANSIBLE_ROUTE
    MAIN --> PROJECT_ROUTE
    MAIN --> WS_ROUTE

    AUTH_ROUTE --> AUTH_SVC
    TRAINING_ROUTE --> TRAINING_SVC
    NODES_ROUTE --> NODE_SVC
    ANSIBLE_ROUTE --> ANSIBLE_SVC
    PROJECT_ROUTE --> PROJECT_SVC
    WS_ROUTE --> WS_SVC

    AUTH_SVC --> USER_MODEL
    ANSIBLE_SVC --> CONFIG_MODEL
    TRAINING_SVC --> TRAINING_MODEL
    NODE_SVC --> NODE_MODEL

    USER_MODEL --> MONGO
    CONFIG_MODEL --> MONGO
    TRAINING_MODEL --> MONGO
    NODE_MODEL --> MONGO
    MONGO --> COLLECTIONS

    TRAINING_SVC --> FLOWER
    ANSIBLE_SVC --> ANSIBLE_EXT
    AUTH_SVC --> OTEL
    TRAINING_SVC --> OTEL

Key Responsibilities:

  • REST API endpoint management
  • Authentication and authorization
  • Database operations and data persistence
  • Federated learning orchestration
  • Infrastructure automation via Ansible
  • Real-time communication via WebSockets

Service Specifications

Authentication Service

from __future__ import annotations  # model types are defined elsewhere in the backend

class AuthService:
    async def authenticate_user(self, credentials: LoginCredentials) -> User: ...
    async def create_user(self, user_data: UserCreate) -> User: ...
    async def generate_token(self, user: User) -> str: ...
    async def verify_token(self, token: str) -> User: ...
    async def refresh_token(self, refresh_token: str) -> str: ...
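The page names JWT as the auth mechanism but does not give a signing algorithm, token lifetime, or claim set. The following is a standard-library sketch of what generate_token and verify_token could do. The HS256 algorithm, the 900-second TTL, and the `sub`/`iat`/`exp` claims are assumptions. A production service would normally use a maintained library such as PyJWT.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(raw: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def generate_token(subject: str, secret: bytes, ttl_seconds: int = 900,
                   now: Optional[float] = None) -> str:
    """Build an HS256-signed JWT carrying the user id (sub) and an expiry (exp)."""
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "iat": issued, "exp": issued + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode()) for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Return the payload if signature, algorithm and expiry are valid; raise ValueError otherwise."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many signature bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    payload = json.loads(_b64url_decode(payload_b64))
    if payload["exp"] <= (time.time() if now is None else now):
        raise ValueError("token expired")
    return payload
```

Because the token is self-contained, verify_token needs no database lookup. That is what makes the auth "stateless". The trade-off is that a leaked token stays valid until `exp`.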

Training Service

from __future__ import annotations  # model types are defined elsewhere in the backend

class TrainingService:
    async def start_training(self, config: TrainingConfig) -> TrainingJob: ...
    async def stop_training(self, job_id: str) -> bool: ...
    async def get_training_status(self, job_id: str) -> TrainingStatus: ...
    async def get_training_metrics(self, job_id: str) -> TrainingMetrics: ...
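The Training Jobs collection allows four status values: pending, running, completed, failed. A training service has to guard which changes between them are legal before writing to MongoDB. Below is a sketch of such a guard. The transition table is an assumption (completed and failed treated as terminal). The page does not say which status stop_training records.

```python
from enum import Enum
from typing import Dict, Set


class JobStatus(str, Enum):
    # Values taken from the "status" field of the Training Jobs collection.
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumed lifecycle: jobs start pending, run, then end in a terminal state.
ALLOWED_TRANSITIONS: Dict[JobStatus, Set[JobStatus]] = {
    JobStatus.PENDING: {JobStatus.RUNNING, JobStatus.FAILED},
    JobStatus.RUNNING: {JobStatus.COMPLETED, JobStatus.FAILED},
    JobStatus.COMPLETED: set(),
    JobStatus.FAILED: set(),
}


def transition(current: str, new: str) -> JobStatus:
    """Validate a status change; raises ValueError for unknown or illegal statuses."""
    current_status, new_status = JobStatus(current), JobStatus(new)
    if new_status not in ALLOWED_TRANSITIONS[current_status]:
        raise ValueError(f"illegal transition {current_status.value} -> {new_status.value}")
    return new_status
```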

Ansible Service

from __future__ import annotations  # model types are defined elsewhere in the backend

class AnsibleService:
    async def deploy_configuration(self, config: AnsibleConfig) -> DeploymentJob: ...
    async def execute_playbook(self, playbook: str, inventory: str) -> PlaybookResult: ...
    async def get_deployment_status(self, job_id: str) -> DeploymentStatus: ...
    async def manage_inventory(self, inventory_data: InventoryData) -> bool: ...
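One way execute_playbook could drive Ansible from an async FastAPI service is to run the ansible-playbook CLI as a subprocess so the event loop is not blocked. The sketch below follows that approach. The optional `extra_vars` parameter is an assumption that is not part of the signature above. The function returns a plain (exit code, output) tuple because PlaybookResult is not defined on this page.

```python
import asyncio
import json
from typing import List, Optional, Tuple


def build_playbook_command(playbook: str, inventory: str,
                           extra_vars: Optional[dict] = None) -> List[str]:
    """Assemble the ansible-playbook argv; an argv list avoids shell quoting issues."""
    command = ["ansible-playbook", "-i", inventory, playbook]
    if extra_vars:
        # --extra-vars accepts a JSON object.
        command += ["--extra-vars", json.dumps(extra_vars)]
    return command


async def execute_playbook(playbook: str, inventory: str,
                           extra_vars: Optional[dict] = None) -> Tuple[int, str]:
    """Run the playbook without blocking the event loop; return (exit code, combined output)."""
    process = await asyncio.create_subprocess_exec(
        *build_playbook_command(playbook, inventory, extra_vars),
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,  # merge stderr so the logs keep their order
    )
    output, _ = await process.communicate()
    return process.returncode, output.decode(errors="replace")
```

For example, `await execute_playbook("deploy.yml", "devices.ini")` runs the deploy playbook against the inventory named in the Ansible architecture below. The captured output could then be stored in the `logs` array of an ANSIBLE_JOBS document.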

Federated Learning Components

Flower Framework Integration

graph TB
    subgraph "Flower Architecture"
        SUPERLINK[Superlink<br/>Communication Hub]

        subgraph "Server Side"
            AGGREGATOR[Aggregator<br/>ServerApp]
            SERVER_LOGIC[Server Logic<br/>server_app.py]
            MODEL_AGG[Model Aggregation]
        end

        subgraph "Client Side"
            SUPERNODE1[Supernode 1<br/>Partition 0]
            SUPERNODE2[Supernode 2<br/>Partition 1]

            CLIENT1[ClientApp 1<br/>client_app.py]
            CLIENT2[ClientApp 2<br/>client_app.py]
            CLIENTN[ClientApp N<br/>client_app.py]
        end

        subgraph "Monitoring"
            MONITOR[Monitor Stats<br/>monitor_stats.py]
            OTEL_FL[OpenTelemetry<br/>FL Metrics]
        end

        subgraph "Data"
            DATASETS[Federated Datasets<br/>FashionMNIST]
            PARTITIONS[Data Partitions]
        end
    end

    SUPERLINK --> AGGREGATOR
    AGGREGATOR --> SERVER_LOGIC
    SERVER_LOGIC --> MODEL_AGG

    SUPERLINK --> SUPERNODE1
    SUPERLINK --> SUPERNODE2

    SUPERNODE1 --> CLIENT1
    SUPERNODE1 --> CLIENT2
    SUPERNODE2 --> CLIENTN

    CLIENT1 --> MONITOR
    CLIENT2 --> MONITOR
    CLIENTN --> MONITOR
    MONITOR --> OTEL_FL

    CLIENT1 --> DATASETS
    CLIENT2 --> DATASETS
    CLIENTN --> DATASETS
    DATASETS --> PARTITIONS

Federated Learning Workflow

sequenceDiagram
    participant Orchestrator
    participant Superlink
    participant Aggregator
    participant Supernode
    participant Client
    participant Monitor

    Orchestrator->>Superlink: Initialize FL Session
    Superlink->>Aggregator: Start Server App
    Aggregator->>Aggregator: Initialize Global Model

    Superlink->>Supernode: Connect Supernodes
    Supernode->>Client: Start Client Apps
    Client->>Client: Load Local Data

    loop Training Rounds
        Aggregator->>Supernode: Send Global Model
        Supernode->>Client: Distribute Model
        Client->>Client: Local Training
        Client->>Monitor: Log Metrics
        Client->>Supernode: Send Updates
        Supernode->>Aggregator: Forward Updates
        Aggregator->>Aggregator: Update Global Model
        Monitor->>Orchestrator: Report Progress
    end

    Aggregator->>Superlink: Training Complete
    Superlink->>Orchestrator: Final Results

Component Specifications

Server App (Aggregator)

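from __future__ import annotations  # ServerConfig is defined elsewhere in the project

from flwr.server.strategy import FedAvg


class ServerApp:
    """Conceptual outline of the aggregator role (not Flower's own flwr.server.ServerApp)."""

    def __init__(self, config: ServerConfig):
        self.config = config
        self.model = initialize_model()  # project-specific model factory
        self.strategy = FedAvg()  # FedAvg weights each client update by its example count

    def fit(self, parameters, config):
        """Aggregate client updates and return the updated global model."""
        raise NotImplementedError

    def evaluate(self, parameters, config):
        """Evaluate the global model and return metrics."""
        raise NotImplementedError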
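The aggregation step in FedAvg replaces the global parameters with the example-weighted mean of the client parameters: w = Σ_k (n_k / n) · w_k, where n_k is client k's example count and n is the total. Flower's FedAvg strategy performs this weighting itself. The pure-Python sketch below only illustrates the arithmetic, using nested lists instead of NumPy arrays.

```python
from typing import List, Sequence, Tuple

Weights = List[List[float]]  # one flat list of floats per model layer


def fedavg(client_results: Sequence[Tuple[Weights, int]]) -> Weights:
    """Average each layer across clients, weighted by each client's number of examples."""
    total_examples = sum(num_examples for _, num_examples in client_results)
    if total_examples == 0:
        raise ValueError("no training examples reported")
    first_weights, _ = client_results[0]
    aggregated = [[0.0] * len(layer) for layer in first_weights]
    for weights, num_examples in client_results:
        share = num_examples / total_examples  # this client's weight n_k / n
        for layer_index, layer in enumerate(weights):
            for i, value in enumerate(layer):
                aggregated[layer_index][i] += share * value
    return aggregated
```

For example, `fedavg([([[1.0, 2.0]], 1), ([[3.0, 6.0]], 3)])` gives `[[2.5, 5.0]]`. The client holding three times as much data pulls the result three times as hard.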

Client App

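class ClientApp:
    """Conceptual outline of one FL client; load_partition and initialize_model are project helpers."""

    def __init__(self, client_id: str, partition_id: int):
        self.client_id = client_id
        self.data = load_partition(partition_id)  # this client's shard of the federated dataset
        self.model = initialize_model()

    def fit(self, parameters, config):
        """Train the local model starting from the global parameters; return the model updates."""
        raise NotImplementedError

    def evaluate(self, parameters, config):
        """Evaluate the global parameters on local data; return local metrics."""
        raise NotImplementedError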
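`load_partition(partition_id)` is not defined on this page. In practice, Flower Datasets partitioners such as IidPartitioner handle the splitting. The hypothetical helper below shows the underlying idea. It assumes the dataset has already been shuffled with a fixed seed, so that contiguous index ranges form an IID split. It returns the indices owned by one partition and spreads any remainder over the first partitions.

```python
def partition_indices(num_samples: int, num_partitions: int, partition_id: int) -> range:
    """Return the contiguous block of sample indices owned by one client."""
    if not 0 <= partition_id < num_partitions:
        raise ValueError("partition_id out of range")
    base, remainder = divmod(num_samples, num_partitions)
    # The first `remainder` partitions each receive one extra sample.
    start = partition_id * base + min(partition_id, remainder)
    size = base + (1 if partition_id < remainder else 0)
    return range(start, start + size)
```

With FashionMNIST's 60,000 training images and the two partitions shown in the diagram, each client would receive 30,000 images. Partition 1 would own indices 30,000 to 59,999.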

Database Components

MongoDB Schema Design

erDiagram
    USERS {
        ObjectId _id
        string username
        string email
        string password_hash
        datetime created_at
        datetime updated_at
        boolean is_active
        array roles
    }

    PROJECTS {
        ObjectId _id
        string name
        string description
        ObjectId owner_id
        datetime created_at
        datetime updated_at
        object configuration
        string status
    }

    CONFIGS {
        ObjectId _id
        string name
        string type
        ObjectId project_id
        ObjectId user_id
        object config_data
        string file_path
        datetime created_at
        datetime updated_at
    }

    TRAINING_JOBS {
        ObjectId _id
        string job_id
        ObjectId project_id
        ObjectId user_id
        string status
        datetime started_at
        datetime completed_at
        object metrics
        object configuration
    }

    ANSIBLE_JOBS {
        ObjectId _id
        string job_id
        string playbook
        string inventory
        string status
        datetime started_at
        datetime completed_at
        object result
        array logs
    }

    USERS ||--o{ PROJECTS : owns
    USERS ||--o{ CONFIGS : creates
    USERS ||--o{ TRAINING_JOBS : initiates
    USERS ||--o{ ANSIBLE_JOBS : executes
    PROJECTS ||--o{ CONFIGS : contains
    PROJECTS ||--o{ TRAINING_JOBS : runs

Collection Specifications

Users Collection

{
  "_id": ObjectId,
  "username": "string",
  "email": "string",
  "password_hash": "string",
  "created_at": ISODate,
  "updated_at": ISODate,
  "is_active": boolean,
  "roles": ["user", "admin"],
  "profile": {
    "first_name": "string",
    "last_name": "string",
    "organization": "string"
  }
}
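The Users collection stores `password_hash` rather than the password itself. The page does not say which hashing scheme the backend uses. The sketch below assumes salted PBKDF2-HMAC-SHA256 from Python's standard library, an invented `algorithm$iterations$salt$hash` storage format, and a default role of "user". It builds a document shaped like the schema above.

```python
import hashlib
import hmac
import os
from datetime import datetime, timezone

PBKDF2_ITERATIONS = 600_000  # assumed work factor; tune for the deployment hardware


def hash_password(password: str) -> str:
    """Salted PBKDF2-HMAC-SHA256, stored as 'pbkdf2_sha256$iterations$salt$hash'."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, PBKDF2_ITERATIONS)
    return f"pbkdf2_sha256${PBKDF2_ITERATIONS}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    _, iterations, salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), bytes.fromhex(salt_hex), int(iterations))
    return hmac.compare_digest(digest.hex(), digest_hex)  # constant-time comparison


def new_user_document(username: str, email: str, password: str) -> dict:
    """Build a Users document; the MongoDB driver adds _id on insert."""
    now = datetime.now(timezone.utc)  # stored as a BSON date (ISODate)
    return {
        "username": username,
        "email": email,
        "password_hash": hash_password(password),  # the plain password is never stored
        "created_at": now,
        "updated_at": now,
        "is_active": True,
        "roles": ["user"],  # assumed default role for self-registration
        # the optional "profile" sub-document is omitted here
    }
```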

Training Jobs Collection

{
  "_id": ObjectId,
  "job_id": "string",
  "project_id": ObjectId,
  "user_id": ObjectId,
  "status": "pending|running|completed|failed",
  "started_at": ISODate,
  "completed_at": ISODate,
  "configuration": {
    "rounds": number,
    "clients": number,
    "model_type": "string"
  },
  "metrics": {
    "accuracy": number,
    "loss": number,
    "round_metrics": []
  },
  "logs": []
}
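A natural time to update `metrics` is at the end of each FL round. The following sketch builds the MongoDB update document for that. It uses the standard `$push` and `$set` operators on dotted paths. The shape of each `round_metrics` entry is an assumption, as is the choice to mirror the latest round into the top-level `accuracy`/`loss` fields.

```python
from typing import Any, Dict


def round_metrics_update(round_number: int, accuracy: float, loss: float) -> Dict[str, Any]:
    """Update document recording one finished FL round in a Training Jobs document."""
    return {
        # Append the round's entry to the history array.
        "$push": {"metrics.round_metrics": {"round": round_number, "accuracy": accuracy, "loss": loss}},
        # Keep the summary fields equal to the most recent round.
        "$set": {"metrics.accuracy": accuracy, "metrics.loss": loss},
    }
```

With PyMongo, this would be applied as `collection.update_one({"job_id": job_id}, round_metrics_update(3, 0.87, 0.41))`. Because the update is a single-document operation, the history entry and the summary fields are updated atomically.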

Observability Components

OpenTelemetry Integration

graph TB
    subgraph "Application Services"
        FRONTEND[Frontend<br/>Browser Telemetry]
        BACKEND[Backend<br/>FastAPI Telemetry]
        FL_SERVER[FL Server<br/>Aggregator Telemetry]
        FL_CLIENT[FL Clients<br/>Client Telemetry]
    end

    subgraph "OpenTelemetry Collector"
        RECEIVER[OTLP Receiver<br/>gRPC + HTTP]
        PROCESSOR[Processors<br/>Batch + Memory Limiter]
        EXPORTER[Exporters<br/>Tempo + Prometheus]
    end

    subgraph "Observability Backend"
        TEMPO[Tempo<br/>Distributed Tracing]
        GRAFANA[Grafana<br/>Visualization]
        PROMETHEUS[Prometheus<br/>Metrics Storage]
    end

    FRONTEND --> RECEIVER
    BACKEND --> RECEIVER
    FL_SERVER --> RECEIVER
    FL_CLIENT --> RECEIVER

    RECEIVER --> PROCESSOR
    PROCESSOR --> EXPORTER

    EXPORTER --> TEMPO
    EXPORTER --> PROMETHEUS
    TEMPO --> GRAFANA
    PROMETHEUS --> GRAFANA
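Spans from the browser, the FastAPI backend, and the FL processes only appear as one trace in Tempo if each caller passes its trace context to the callee. OpenTelemetry's default propagator uses the W3C Trace Context `traceparent` header, and the SDK formats and parses it automatically. The standard-library sketch below only makes the header format concrete. It accepts version 00 only.

```python
import re
import secrets
from typing import Tuple

# version "00" - 32-hex trace id - 16-hex parent span id - 2-hex trace flags
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_ids() -> Tuple[str, str]:
    """Random trace id (16 bytes) and span id (8 bytes), hex-encoded."""
    return secrets.token_hex(16), secrets.token_hex(8)


def make_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Header value a caller attaches so the callee's spans join the caller's trace."""
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def parse_traceparent(header: str) -> Tuple[str, str, bool]:
    """Return (trace_id, parent_span_id, sampled); raise ValueError on a malformed header."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        raise ValueError("invalid traceparent header")
    trace_id, parent_span_id, flags = match.groups()
    if trace_id == "0" * 32 or parent_span_id == "0" * 16:
        raise ValueError("all-zero trace or span id")
    return trace_id, parent_span_id, bool(int(flags, 16) & 0x01)  # bit 0 = sampled
```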

Telemetry Specifications

Trace Instrumentation

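from fastapi import FastAPI
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Initialize tracing: batch finished spans and ship them to the collector over OTLP/gRPC.
# The exporter reads its endpoint from OTEL_EXPORTER_OTLP_ENDPOINT (default localhost:4317).
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)

app = FastAPI()

# Instrument FastAPI endpoints (TrainingConfig is the backend's Pydantic request model)
@app.post("/training/start")
async def start_training(config: TrainingConfig):
    with tracer.start_as_current_span("start_training") as span:
        span.set_attribute("training.config", str(config))
        # Training logic, e.g. delegating to TrainingService
        result = {"status": "started"}  # placeholder response
        return result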

Metrics Collection

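from opentelemetry import metrics
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader

# Initialize metrics: without an SDK MeterProvider, the API's instruments are no-ops.
reader = PeriodicExportingMetricReader(OTLPMetricExporter())
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))
meter = metrics.get_meter(__name__)

# Create metrics
training_duration = meter.create_histogram(
    "training_duration_seconds",
    unit="s",
    description="Duration of training rounds"
)

active_clients = meter.create_up_down_counter(
    "active_clients_total",
    description="Number of active FL clients"
)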
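Instruments only produce data when code records into them. The following sketch shows how client training code might drive the two instruments above. It relies only on the `add(amount, attributes)` and `record(amount, attributes)` methods of OpenTelemetry's UpDownCounter and Histogram. The `client_round` helper and the `fl.round` attribute name are hypothetical.

```python
import time
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def client_round(duration_histogram, active_clients, round_number: int) -> Iterator[None]:
    """Count one client as active during a round and record how long the round took.

    duration_histogram needs a record(amount, attributes) method and active_clients an
    add(amount) method, as OpenTelemetry's Histogram and UpDownCounter provide.
    """
    active_clients.add(1)
    start = time.monotonic()
    try:
        yield
    finally:
        # Runs even if local training raises, so the active-client count never drifts.
        duration_histogram.record(time.monotonic() - start, {"fl.round": round_number})
        active_clients.add(-1)
```

In a client, the local training step would be wrapped as `with client_round(training_duration, active_clients, round_number=r):`. The counter is incremented and decremented without attributes so that its sum is simply the number of clients currently training.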

Infrastructure Components

Docker Architecture

graph TB
    subgraph "Docker Compose Services"
        subgraph "Orchestrator Profile"
            FRONTEND_C[frontend<br/>Next.js Container]
            BACKEND_C[backend-fastapi<br/>FastAPI Container]
            MONGO_C[mongodb<br/>Database Container]
            OTEL_C[otel-collector<br/>Telemetry Container]
            TEMPO_C[tempo<br/>Tracing Container]
            GRAFANA_C[grafana<br/>Monitoring Container]
        end

        subgraph "Aggregator Profile"
            SUPERLINK_C[superlink<br/>Flower Hub Container]
            AGGREGATOR_C[aggregator<br/>FL Server Container]
        end

        subgraph "Client Profile"
            SUPERNODE_C[supernode<br/>FL Node Container]
            CLIENT_C[clientapp<br/>FL Client Container]
            INFERENCE_C[inference<br/>Model Inference Container]
        end
    end

    subgraph "Docker Networks"
        FL_NETWORK[federated-learning-network<br/>Bridge Network]
    end

    subgraph "Docker Volumes"
        MONGO_VOL[mongodb_data<br/>Persistent Volume]
        GRAFANA_VOL[grafana_data<br/>Dashboard Storage]
        TEMPO_VOL[tempo_data<br/>Trace Storage]
        FL_VOL[fl_data<br/>Model Storage]
    end

    FRONTEND_C --> FL_NETWORK
    BACKEND_C --> FL_NETWORK
    MONGO_C --> FL_NETWORK
    SUPERLINK_C --> FL_NETWORK
    AGGREGATOR_C --> FL_NETWORK
    CLIENT_C --> FL_NETWORK

    MONGO_C --> MONGO_VOL
    GRAFANA_C --> GRAFANA_VOL
    TEMPO_C --> TEMPO_VOL
    AGGREGATOR_C --> FL_VOL
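The three profiles let each machine start only the containers for its role. The sketch below builds the corresponding `docker compose` invocation. The profile-to-service mapping is copied from the diagram. Whether the compose file names its profiles exactly "orchestrator", "aggregator" and "client" is an assumption.

```python
from typing import Dict, List

# Service groupings taken from the diagram above; profile names are assumed.
PROFILES: Dict[str, List[str]] = {
    "orchestrator": ["frontend", "backend-fastapi", "mongodb", "otel-collector", "tempo", "grafana"],
    "aggregator": ["superlink", "aggregator"],
    "client": ["supernode", "clientapp", "inference"],
}


def compose_up_command(*profiles: str) -> List[str]:
    """Build a `docker compose` command that starts only the services in the given profiles."""
    unknown = set(profiles) - set(PROFILES)
    if unknown:
        raise ValueError(f"unknown profile(s): {sorted(unknown)}")
    command = ["docker", "compose"]
    for profile in profiles:
        command += ["--profile", profile]  # --profile may be repeated
    return command + ["up", "-d"]
```

A client machine would then run `docker compose --profile client up -d`. A single-host demo could enable all three profiles at once.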

Ansible Architecture

graph TB
    subgraph "Ansible Control Node"
        CONTROL[Ansible Controller<br/>Orchestrator Machine]
        INVENTORY[Inventory<br/>devices.ini]
        PLAYBOOKS[Playbooks<br/>deploy.yml, start.yml, stop.yml]
    end

    subgraph "Ansible Roles"
        COMMON[Common Role<br/>Base Setup]
        AGGREGATOR_ROLE[Aggregator Role<br/>FL Server Setup]
        CLIENT_ROLE[Client Role<br/>FL Client Setup]
    end

    subgraph "Target Nodes"
        AGG_NODE[Aggregator Node<br/>Remote Machine]
        CLIENT_NODE1[Client Node 1<br/>Remote Machine]
        CLIENT_NODE2[Client Node 2<br/>Remote Machine]
        CLIENT_NODEN[Client Node N<br/>Remote Machine]
    end

    CONTROL --> INVENTORY
    CONTROL --> PLAYBOOKS
    PLAYBOOKS --> COMMON
    PLAYBOOKS --> AGGREGATOR_ROLE
    PLAYBOOKS --> CLIENT_ROLE

    COMMON --> AGG_NODE
    COMMON --> CLIENT_NODE1
    COMMON --> CLIENT_NODE2
    COMMON --> CLIENT_NODEN

    AGGREGATOR_ROLE --> AGG_NODE
    CLIENT_ROLE --> CLIENT_NODE1
    CLIENT_ROLE --> CLIENT_NODE2
    CLIENT_ROLE --> CLIENT_NODEN
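The inventory (devices.ini) is what maps the roles in this diagram onto machines. The sketch below renders an INI inventory with one group per role. It also adds a parent group built with the `:children` syntax, so a play targeting that group reaches every node, as the common role does. The group names `aggregator`, `clients` and `fl_nodes` are hypothetical.

```python
from typing import Dict, Tuple


def render_inventory(aggregator: Tuple[str, str], clients: Dict[str, str]) -> str:
    """Render an INI inventory: one group per role plus a parent group for the common role."""
    agg_name, agg_host = aggregator
    lines = ["[aggregator]", f"{agg_name} ansible_host={agg_host}", "", "[clients]"]
    lines += [f"{name} ansible_host={host}" for name, host in clients.items()]
    # fl_nodes contains both groups, so `hosts: fl_nodes` targets every machine.
    lines += ["", "[fl_nodes:children]", "aggregator", "clients"]
    return "\n".join(lines) + "\n"
```

For example, `render_inventory(("agg", "192.0.2.10"), {"client1": "192.0.2.11"})` yields an `[aggregator]` group with one host, a `[clients]` group with one host, and the `[fl_nodes:children]` section listing both groups.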

Component Communication Patterns

Synchronous Communication

  • REST APIs: HTTP/HTTPS for request-response patterns
  • gRPC: High-performance communication for FL components
  • Database Queries: Direct MongoDB connections

Asynchronous Communication

  • WebSockets: Real-time bidirectional communication
  • Event Streaming: Future enhancement with message queues
  • Background Tasks: Async task processing with FastAPI

Security Patterns

  • JWT Authentication: Stateless token-based auth
  • TLS Encryption: All external communication encrypted
  • Network Isolation: Docker network segmentation
  • Secret Management: Environment-based configuration
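Environment-based secret management typically means the backend reads its secrets once at startup and refuses to start if one is missing. The sketch below follows that pattern. The variable names JWT_SECRET and MONGODB_URI are hypothetical; this page does not list them. OTEL_EXPORTER_OTLP_ENDPOINT is the standard OpenTelemetry variable. The default value points at the otel-collector container on the OTLP/gRPC port 4317.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    jwt_secret: str
    mongodb_uri: str
    otel_endpoint: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from the environment and fail fast if a secret is missing."""
    missing = [name for name in ("JWT_SECRET", "MONGODB_URI") if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required environment variables: {', '.join(missing)}")
    return Settings(
        jwt_secret=env["JWT_SECRET"],
        mongodb_uri=env["MONGODB_URI"],
        # Optional setting with an assumed in-network default.
        otel_endpoint=env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "http://otel-collector:4317"),
    )
```

Because `env` defaults to `os.environ`, the service can call `load_settings()` at startup, while tests can pass a plain dictionary.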

Next: Continue to Data Flow to understand how data moves through the system.