Fault Tolerant Staking

    Fault Tolerant Staking

    Right now, it is too hard to run a backup system for a validator safely. Odds are you'll blow up and get your validators slashed. Don't try it. What typically happens is that, sooner or later, both your primary validator and your backup end up online at the same time and double-sign something. This means your one validator key was running in both places and signed two messages saying different things to different parts of the network for the same task. The network interprets this as an attack on its ability to come to consensus, and punishes you severely for it.

    This model of having two independent validator systems that share the same private key and are meant to operate at different times is called Active/Passive redundancy.
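
    To make the failure mode concrete, here is a minimal Python sketch of why two independent systems sharing one key are dangerous. The `SigningGuard` class and `count_conflicts` helper are hypothetical names invented for this illustration, and the "one message per duty" rule is a simplification of the network's real slashing conditions. Each guard only remembers what its own machine has signed, so a primary and a backup with separate records can each approve a different message for the same duty.

```python
class SigningGuard:
    """Hypothetical local record of what one machine has already signed."""

    def __init__(self):
        self._signed = {}  # duty id -> message already signed for that duty

    def may_sign(self, duty, message):
        """Allow signing unless a different message was signed for this duty."""
        previous = self._signed.get(duty)
        if previous is None:
            self._signed[duty] = message
            return True
        return previous == message


def count_conflicts(requests):
    """Count duties that ended up with two different signed messages.

    Each request is a (guard, duty, message) tuple: the guard consulted by
    the machine that wants to sign, the duty id, and the message to sign.
    """
    signed = {}
    for guard, duty, message in requests:
        if guard.may_sign(duty, message):
            signed.setdefault(duty, set()).add(message)
    return sum(1 for messages in signed.values() if len(messages) > 1)


# One shared record: the second, conflicting request is refused.
shared = SigningGuard()
safe = count_conflicts([(shared, "duty-1", "A"), (shared, "duty-1", "B")])

# Primary and backup with separate records: both sign, producing a double-sign.
primary, backup = SigningGuard(), SigningGuard()
unsafe = count_conflicts([(primary, "duty-1", "A"), (backup, "duty-1", "B")])
```

    In this toy model `safe` is 0 and `unsafe` is 1. Nothing about either machine is broken in the second case. The conflict arises only because neither knows what the other has signed.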

    The best advice so far in the staking space has been "just don't run a backup, outages are no big deal". This is not a sufficient solution for a market that will some day be measured in the trillions of dollars. Not having a fault-tolerant staking system creates risk for staking operators of all sizes. If you're a solo node operator, you can't be expected to be on call 24/7/365. Large operators, on the other hand, have to decide how many on-call engineers they need if their validator failover process is manual rather than automated. If one machine dies during a shift, no big deal. But if 100 nodes die, can one engineer triage and fail over all of them safely and quickly, without any coming back to life and causing a slashing?

    Collaborative Staking

    The Obol team is researching and building an infrastructure primitive called Distributed Validator Technology (DVT). DVT enables a new kind of validator, one that runs across multiple machines and clients simultaneously but behaves like a single validator to the network. This enables your validator to stay online even if a subset of the machines fail. This is called Active/Active fault tolerance. Think of it like the engines on a plane: they all work together to fly the plane, but if one fails, the plane isn't doomed.
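
    As a rough illustration of Active/Active fault tolerance, the sketch below models a cluster that can keep acting as one validator as long as enough of its nodes are online. The function names and the 3-of-4 threshold are assumptions chosen for this example, not Obol's actual parameters or API.

```python
def cluster_can_sign(nodes_online, threshold):
    """Return True if enough nodes are online to act as one validator."""
    return nodes_online >= threshold


def tolerated_failures(total_nodes, threshold):
    """Number of nodes that can fail before the cluster goes offline."""
    if not 0 < threshold <= total_nodes:
        raise ValueError("threshold must be between 1 and total_nodes")
    return total_nodes - threshold


# Assumed example cluster: 4 nodes, any 3 of which can act together.
TOTAL_NODES, THRESHOLD = 4, 3

for failed in range(TOTAL_NODES + 1):
    online = TOTAL_NODES - failed
    status = "online" if cluster_can_sign(online, THRESHOLD) else "offline"
    print(f"{failed} node(s) failed -> validator {status}")
```

    Under these assumed numbers, the validator survives the loss of any one machine, much like the plane that keeps flying on its remaining engines.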

    Obol's mission is to enable and empower people to share the responsibility of running the network. If you are part of a distributed validator cluster, and your machine dies overnight, the other operators in your cluster will have your back. You'll cover for them some other time when they go on holidays for a week and their node falls out of sync. If we can share the responsibility of running nodes, we can open a new frontier of decentralisation.

    Solo validators can have backup. Staking firms can share risk and reward. DeFi protocols can diversify their staked ether exposure. Major institutions can hedge cloud provider risk. Everyone benefits from building fault-tolerant, distributed validator tech.

    The Staking Problem

    So how do high-availability validators help with stake centralisation, Oisín?

    Here's my take:

    Right now you take a massive bet on the person or team running your validator for you. If they do everything right, they make you a couple of percent of interest a year. If they do everything wrong, they lose it all.

    The decentralised staking industry is extremely nascent, and we haven't figured out how best to build trust-minimised staking for the community. Projects like Lido pool risk across everyone, while projects like RocketPool isolate risk into individual pools. One gates entry with humans and votes, the other gates entry with tokens and bonding.

    My belief is that if we can remove the single point of failure in validator operation, we can place more trust in smaller node operators. I believe a DAO wouldn't trust a single member to stake its treasury's ether, but a DAO might entrust a group of members to run validators together with shared accountability.

    You and your buddies might not have 32 ether alone, but together you could go splits on a validator as a group, and all share the reward.
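
    As a back-of-the-envelope illustration of going splits, the sketch below divides a reward in proportion to what each person put towards the 32 ether. The `split_reward` function, the names, and the pro-rata rule are assumptions made for this example. A real group could agree on any split it likes. Amounts are in gwei (1 ether = 10^9 gwei) so that the arithmetic stays in whole numbers.

```python
GWEI_PER_ETHER = 10**9
VALIDATOR_DEPOSIT_GWEI = 32 * GWEI_PER_ETHER


def split_reward(contributions_gwei, reward_gwei):
    """Split a reward pro rata across contributors to one validator.

    Returns (shares, remainder), where remainder is the few gwei left over
    after rounding each share down.
    """
    total = sum(contributions_gwei.values())
    if total != VALIDATOR_DEPOSIT_GWEI:
        raise ValueError("contributions must add up to exactly 32 ether")
    shares = {
        name: reward_gwei * amount // total
        for name, amount in contributions_gwei.items()
    }
    remainder = reward_gwei - sum(shares.values())
    return shares, remainder


# Hypothetical group: one person puts in 16 ether, two others 8 ether each.
group = {
    "alice": 16 * GWEI_PER_ETHER,
    "bob": 8 * GWEI_PER_ETHER,
    "carol": 8 * GWEI_PER_ETHER,
}
shares, remainder = split_reward(group, 1 * GWEI_PER_ETHER)
```

    With these made-up numbers, a reward of 1 ether splits into 0.5 ether for alice and 0.25 ether each for bob and carol, with nothing left over.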

    A custodian might not trust a single operator to stake their clients' ether, but they would trust a group of operators collaborating together.

    If we can share risk, we can share stake. If we want to solve the staking problem, we need to make Ethereum staking safe and profitable for groups of humans together.
