ECS: How to reference the high-level ECS Fabric layer and components
Summary: Quick-reference details on the Fabric layer, taken from the ECS architectural guide.
Instructions
For more details, refer to the ECS Architectural Guide whitepaper for your code version.
Fabric
The Fabric layer provides clustering, system health, software management, configuration management,
upgrade capabilities, and alerting. It is responsible for keeping services running and managing resources such
as disks, containers, and the network. It tracks and reacts to environment changes, such as detected failures,
and provides alerts related to system health. The Fabric layer has the following components:
- Node agent runs on each node
- Manages host resources
- Install Services
- Containers
- Disks
- ECS Firewall
- ECS Network - Nile Area Network (NAN)
- Used to control maintenance (node maintenance mode, disks, and so forth) and upgrades for Fabric
- Fabric Agent relies on the Hardware Manager / HAL (Hardware Abstraction Layer) component to get disk health status
- Allows the operator to manage the cluster and nodes using the Fabric CLI (command-line interface): /opt/emc/caspian/fabric/cli/bin/fcli
- System, application health, failure detection and alerting
- Tracks and reacts to environment changes
Lifecycle Manager - Application life-cycle management, which involves starting services, recovery,
notification, and failure detection.
- Multiple life-cycle manager instances run on a subset of nodes
- Each life-cycle instance manages a subset of nodes
- If a life-cycle instance fails, another takes over
- The cluster primary orders cluster-level events
Persistence Manager - Coordinates and synchronizes the ECS distributed environment.
Registry - Docker image store for ECS software
Event Library - Holds the set of events occurring on the system.
Hardware Manager - Provides status, event information and provisioning of the hardware layer to
higher-level services. These services have been integrated to support commodity hardware.
Docker
ECS runs on top of the operating system as a Java application and is encapsulated within several Docker
containers. The containers are isolated but share the underlying operating system resources and hardware.
Some parts of ECS software run on all nodes and some run on one or some nodes. The components running
within a Docker container include:
and the portal and provisioning services. Runs on every node in ECS.
Fabric-lifecycle - Contains the processes, information, and resources required for system-level
monitoring, configuration management, and health management. An odd number of fabric-lifecycle
instances is always running. For example, three instances run on a four-node
system and five instances run on an eight-node system.
Fabric-zookeeper - Centralized service for coordinating and synchronizing distributed processes,
configuration information, groups, and naming services. It acts as the persistence manager
and runs on an odd number of nodes, for instance, five in an eight-node system.
Fabric-registry - Registry of the ECS Docker images. Only one instance runs per ECS rack.
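The odd instance counts noted above for fabric-lifecycle and fabric-zookeeper can be illustrated with a short sketch. It assumes majority-quorum behavior, which is how ZooKeeper ensembles generally work; this article does not itself state the reason for the odd counts, and the function names below are hypothetical, not part of ECS.

```python
def majority(ensemble_size: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Members that can fail while a strict majority still remains."""
    return ensemble_size - majority(ensemble_size)


if __name__ == "__main__":
    # Compare odd and even ensemble sizes side by side.
    for size in (3, 4, 5, 6):
        print(f"size={size} majority={majority(size)} "
              f"tolerated_failures={tolerated_failures(size)}")
```

Under this assumption, a fourth or sixth member raises the majority threshold without allowing any additional failures, which is one common reason for choosing odd sizes such as three or five.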
Additional Information
Node Agent
The node agent is a lightweight agent written in Java that runs natively on all ECS nodes. Its main duties
include managing and controlling host resources (Docker containers, disks, the firewall, the network) and
monitoring system processes. Examples of management include formatting and mounting disks, opening
required ports, ensuring all processes are running, and determining public and private network interfaces. It
has an event stream that provides ordered events to a life-cycle manager to indicate events occurring on the
system. The Fabric CLI is useful for diagnosing issues and viewing the overall system state.
Lifecycle Manager
The life-cycle manager runs on a subset of three or five nodes and manages the life cycle of applications
running on nodes. Each life-cycle manager is responsible for tracking several nodes. Its main goal is to
manage the entire life cycle of the ECS application from boot to deployment, including failure detection,
recovery, notification, and migration. It monitors the node agent event streams and directs the agent to handle each
situation. When a node is down, it responds to failures or inconsistencies in the node's state by restoring
the system to a known good state. If a life-cycle manager instance is down, another takes its place.
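The takeover behavior described above can be sketched as a small model. This is only an illustration: the class, the function names, and the round-robin and least-loaded assignment policies are assumptions made for the example, not the actual ECS implementation.

```python
from dataclasses import dataclass, field


@dataclass
class LifecycleInstance:
    """Hypothetical stand-in for one life-cycle manager instance."""
    name: str
    alive: bool = True
    nodes: set[str] = field(default_factory=set)


def assign_nodes(instances: list[LifecycleInstance], nodes: list[str]) -> None:
    """Spread nodes round-robin across the live instances (assumed policy)."""
    live = [inst for inst in instances if inst.alive]
    if not live:
        raise RuntimeError("no live life-cycle instance available")
    for inst in instances:
        inst.nodes.clear()
    for index, node in enumerate(sorted(nodes)):
        live[index % len(live)].nodes.add(node)


def fail_over(instances: list[LifecycleInstance], failed_name: str) -> None:
    """Mark one instance as down and hand its nodes to the survivors."""
    failed = next(inst for inst in instances if inst.name == failed_name)
    failed.alive = False
    orphaned = sorted(failed.nodes)
    failed.nodes.clear()
    survivors = [inst for inst in instances if inst.alive]
    if not survivors:
        raise RuntimeError("no live life-cycle instance left to take over")
    for node in orphaned:
        # Give each orphaned node to the least-loaded survivor (ties by name).
        target = min(survivors, key=lambda inst: (len(inst.nodes), inst.name))
        target.nodes.add(node)


if __name__ == "__main__":
    managers = [LifecycleInstance(f"lm{i}") for i in range(1, 4)]
    cluster = [f"node{i}" for i in range(1, 9)]
    assign_nodes(managers, cluster)
    fail_over(managers, "lm2")
    for manager in managers:
        print(manager.name, manager.alive, sorted(manager.nodes))
```

After the simulated failure, every node is still tracked by a live instance, which is the property the text describes.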
Registry
The registry contains the ECS Docker images used during installation, upgrade, and node replacement. A
Docker container called fabric-registry runs on one node within the ECS rack and holds the repository of ECS
Docker images and information required for installations and upgrades. Although the registry is available on
one node at a time, all Docker images are locally cached on every node, so any node can serve the registry.
Event Library
The event library is used within the Fabric layer to expose the life cycle and node agent event streams. Events
generated by the system are persisted onto shared memory and disk to provide historical information about the
state and health of the ECS system. These ordered event streams can be used to restore the system to a
specific state by replaying the ordered events stored. Some examples of events include node events such as
started, stopped, or degraded.
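Restoring state by replaying an ordered event stream can be sketched as follows. This is a minimal illustration and not the ECS event library: the Event fields, the replay function, and the up_to_seq parameter are hypothetical, and only the node states named above (started, stopped, degraded) are used.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical node event; field names are assumptions."""
    seq: int    # position in the ordered stream
    node: str   # node the event refers to
    state: str  # "started", "stopped", or "degraded"


def replay(events: Iterable[Event], up_to_seq: Optional[int] = None) -> dict[str, str]:
    """Rebuild a node -> state map by applying events in sequence order.

    If up_to_seq is given, events after that sequence number are ignored,
    which reconstructs the state as of that point in the stream.
    """
    state: dict[str, str] = {}
    for event in sorted(events, key=lambda e: e.seq):
        if up_to_seq is not None and event.seq > up_to_seq:
            break
        state[event.node] = event.state
    return state


if __name__ == "__main__":
    stream = [
        Event(1, "node1", "started"),
        Event(2, "node2", "started"),
        Event(3, "node1", "degraded"),
        Event(4, "node1", "stopped"),
    ]
    print(replay(stream))               # latest known state
    print(replay(stream, up_to_seq=3))  # state as of event 3
```

Because the events are ordered, replaying a prefix of the stream yields the state at that point, which is what makes a specific earlier state recoverable.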
Hardware Manager
The hardware manager is integrated into the Fabric Agent to support industry-standard hardware. Its main
purpose is to provide hardware-specific status and event information, and provisioning of the hardware layer
to higher-level services within ECS.
Infrastructure
ECS appliance nodes run SUSE Linux Enterprise Server 12 for the infrastructure. For ECS software
deployed on custom industry-standard hardware, the operating system can also be Red Hat Enterprise Linux
or CoreOS. Custom deployments are done using a formal request and validation process. Docker is installed on
the infrastructure to deploy the encapsulated ECS layers. ECS software is written in Java, so the Java Virtual
Machine is installed as part of the infrastructure.