Explanation of Messages (IMsg)

This one is overdue: detailing what exactly system messages ("IMsg") do. These are the missing links that tie the functionality of the channel diagrams I've posted together. Combining both of them should give you a pretty good overview of how the factomd node works internally.

All IMsgs, from now on referred to as "messages", are defined by the IMsg interface in common/interfaces/msg.go. A complete list of messages is defined in common/constants/constants.go. Those are the ones I'll cover. (This includes both regular messages and election messages, but election messages will come later.)

The important functions that distinguish messages are:
  • Network Functions: Messages can be local, meant for a specific peer, or full broadcast. Local messages are never sent over the network, even if they are handed to the network controller.
  • Various Hashes: These identify a message, for example to match it with its ACK and to keep track of it. There is also the ChainID, which is the identity chain.
  • Validate(State) int: All messages are validated before anything else happens. Every validation function returns -1 for messages that are not usable, 0 for messages that may become usable in the future, and 1 for valid messages.
  • LeaderExecute(State): This is the functionality that only leaders perform. For some messages it is the same as FollowerExecute, in which case one simply calls the other.
  • FollowerExecute(State): This is the functionality that non-leaders perform.
  • Process(height, State) bool: Messages that land in the ProcessList will have this called when it's their turn. Each processlist is processed in order ("Height" from 0 to n), and the return value matters: if it returns "false", processing of that VM halts and resumes at the same Height the next time around. This means that messages are called repeatedly until they finish, and this behavior is used to build small state machines. I'm going to use the term "waiting" for this, which indicates a "return false", not a sleep().
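
To make the "waiting" idea concrete, here is a minimal sketch of how a VM could step through its list. This is not factomd's code: the msg interface, the vm struct, processVM, and the countdown toy message are all hypothetical names made up for illustration.

```go
package main

import "fmt"

// msg is a hypothetical stand-in for an IMsg that sits in a ProcessList.
type msg interface {
	// Process returns false to "wait": the VM stops and retries this
	// same message on the next pass.
	Process(height int) bool
}

// vm is a hypothetical, simplified VM: an ordered list of messages plus
// the height of the next message to process.
type vm struct {
	list   []msg
	height int
}

// processVM processes messages in order until one returns false or the
// list is exhausted. It returns how many messages finished on this pass.
func processVM(v *vm) int {
	done := 0
	for v.height < len(v.list) {
		if !v.list[v.height].Process(v.height) {
			break // "waiting": resume at the same height next time
		}
		v.height++
		done++
	}
	return done
}

// countdown is a toy message that waits for a number of passes before
// it finishes, i.e. a tiny state machine driven by repeated calls.
type countdown struct{ remaining int }

func (c *countdown) Process(height int) bool {
	if c.remaining > 0 {
		c.remaining--
		return false
	}
	return true
}

func main() {
	v := &vm{list: []msg{&countdown{0}, &countdown{2}, &countdown{0}}}
	for pass := 1; v.height < len(v.list); pass++ {
		n := processVM(v)
		fmt.Printf("pass %d: finished %d, next height %d\n", pass, n, v.height)
	}
}
```

The second message holds up everything behind it for two passes, even though the third message would finish immediately. That is the ordering guarantee the ProcessList relies on.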

Notes:
  • When a message is "added to the processlist", this will broadcast the message and an ack to the network, as well as remove the message from Holding.
  • When a message is sent out it means it's added to the NetworkOutMsgQueue queue, which will eventually send the message to the specified nodes in the network. It can be directed (single peer), broadcast (up to 8 random peers), or full broadcast (everyone the node is connected to).
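
As a rough sketch of the three delivery modes in the note above (assumptions: the delivery type, the recipients function, and the peer names are hypothetical, and the real network controller certainly does more than this):

```go
package main

import (
	"fmt"
	"math/rand"
)

// delivery is a hypothetical enum for the three ways a message can be sent.
type delivery int

const (
	directed      delivery = iota // a single, named peer
	broadcast                     // up to broadcastFanout random peers
	fullBroadcast                 // every connected peer
)

// broadcastFanout is the "up to 8 random peers" figure from the text.
const broadcastFanout = 8

// recipients returns the peers an outgoing message would be sent to.
// target is only used for directed messages.
func recipients(mode delivery, peers []string, target string) []string {
	switch mode {
	case directed:
		for _, p := range peers {
			if p == target {
				return []string{p}
			}
		}
		return nil // unknown peer: nothing to send
	case broadcast:
		picked := append([]string(nil), peers...) // copy before shuffling
		rand.Shuffle(len(picked), func(i, j int) {
			picked[i], picked[j] = picked[j], picked[i]
		})
		if len(picked) > broadcastFanout {
			picked = picked[:broadcastFanout]
		}
		return picked
	default: // fullBroadcast
		return append([]string(nil), peers...)
	}
}

func main() {
	var peers []string
	for i := 0; i < 12; i++ {
		peers = append(peers, fmt.Sprintf("peer%d", i))
	}
	fmt.Println(len(recipients(directed, peers, "peer3")))
	fmt.Println(len(recipients(broadcast, peers, "")))
	fmt.Println(len(recipients(fullBroadcast, peers, "")))
}
```

Local messages are left out of the sketch on purpose, since they never reach the network at all.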

There is also a message type IElectionMsg that extends IMsg with the following functions:
  • ElectionValidate(Election): This works similarly to the above Validate(), returning -1, 0, and 1, but in the context of an election.
  • ElectionProcess(State, Election): Messages inside the election process will have this function called during the election process, running their functionality

There are three broad categories of messages: Normal messages, ProcessList messages, and Election messages. A single message can belong to more than one category at once and will be passed around between different systems using channels. Hence I'll be referencing those channels a lot, particularly Consensus and Elections.

Format:
Message Name
  • Origin: Which area generated it (vaguely)
  • Function:
    Verbal explanation of what the function does


There are a few generic functions called for messages, which I don't want to repeat over and over: state.LeaderExecute(IMsg) and state.FollowerExecuteMsg(IMsg).
  • Generic state.LeaderExecute: If the message is not a Replay, it adds it to the ProcessList.
  • Generic state.FollowerExecuteMsg: This adds the message to Holding. If there already is an Ack for this message, it adds it to the ProcessList.
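
Here is a minimal sketch of those two generic functions, under heavy assumptions: messages are reduced to a hash string, the state struct and its fields are hypothetical, and replay detection is faked with a simple "seen" set (the text only says "if the message is not a Replay").

```go
package main

import "fmt"

// state is a hypothetical, stripped-down stand-in for factomd's State.
type state struct {
	holding     map[string]bool // message hash -> waiting in Holding
	acks        map[string]bool // message hash -> an Ack has been seen
	seen        map[string]bool // assumption: stands in for replay detection
	processList []string
}

func newState() *state {
	return &state{
		holding: map[string]bool{},
		acks:    map[string]bool{},
		seen:    map[string]bool{},
	}
}

// addToProcessList appends the message and removes it from Holding,
// as described in the notes (broadcasting is left out of this sketch).
func (s *state) addToProcessList(hash string) {
	s.processList = append(s.processList, hash)
	delete(s.holding, hash)
}

// leaderExecute: if the message is not a replay, add it to the ProcessList.
func (s *state) leaderExecute(hash string) {
	if s.seen[hash] {
		return
	}
	s.seen[hash] = true
	s.addToProcessList(hash)
}

// followerExecuteMsg: put the message into Holding, and move it to the
// ProcessList if an Ack for it already exists.
func (s *state) followerExecuteMsg(hash string) {
	s.holding[hash] = true
	if s.acks[hash] {
		s.addToProcessList(hash)
	}
}

func main() {
	s := newState()
	s.followerExecuteMsg("m1") // no Ack yet: stays in Holding
	fmt.Println(s.holding["m1"], len(s.processList))
	s.acks["m1"] = true
	s.followerExecuteMsg("m1") // Ack exists: moves to the ProcessList
	fmt.Println(s.holding["m1"], len(s.processList))
}
```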
 
messages.EOM
  • Origin: Execution of an electionMsg.SyncMsg
  • Validation:
    1 if it's Local or if the message is from a federated server and in the future
    -1 otherwise
  • LeaderExecute: Calls state.LeaderExecuteEOM.
    If the message is NOT Local, it uses Follower Execute instead of this.
    Otherwise, if the state is not already syncing, initiate several state-wide variables, count the number of servers, and set every VM (including itself) as not synced.
    If the current VM for this state is already on this step or ahead, don't do anything else.
    Otherwise, create an Ack for EOM, make the EOM NOT Local, and send both EOM and Ack out.
    Run state.FollowerExecuteEOM.
  • FollowerExecute: Calls state.FollowerExecuteEOM.
    If it's a Local message, add it to Holding and stop.
    If the message is in the past, drop it.
    Add it to Holding.
    If an Ack exists already, add the EOM to the ProcessList. This will also send EOM and Ack out as broadcast.
  • Process(): Calls state.ProcessEOM().
    This waits until syncing and database signing have finished and until it is the right minute. Then it runs a complex state machine:
    1. This only happens once for all EOMs inside a minute. Initiate several state-wide variables, count the number of servers, and set every VM (including itself) as not synced
    2. Create election message EomSigInternal[type=eom] for this height, minute, vm, and vmheight and send it to the Elections Queue.
      Consider this minute finished and mark the VM as synced and fault free
    3. This step waits until step 2 has finished for all VMs. Once that has happened, add the end-of-minute markers to entry blocks, factoid blocks, and ec blocks (Note: this is where the Timer() would come in if it takes too long)
    4. Waits to make sure any previous block is saved and then indicates this VM is done (and returns "true")
    5. This only happens for the very last VM after every VM has indicated they are done:
      • If this node is an audit server, it creates a message Heartbeat and sends it out
      • Tell the State to advance to the next height/minute. If this moves to the next height, it also adds the new height to channel dbheights and creates message AuthorityListInternal containing the current feds and audit servers, adding it to the Elections Queue
      • If it's the same block, make sure all the VMs are still the same and if this was the first minute, make sure the previous dbstate gets saved
      • If it's now a new block, create a new dbstate from the data and add it to State
      • Remove expired Commits
      • Remove old Acks



messages.Ack
  • Origin: Many
  • Validation:
    Note: in addition to validating, this will notify state of the highest ack seen.
    -1 means the message is too far in the future, in the past, or a duplicate
    0 means the message isn't signed by an authority but it's also in the future so that might change
    1 means the message is signed by a current authority node
  • Leader Execute: Same as Follower
  • Follower Execute:
    Retrieve the appropriate VM in the ProcessList and check if the message has already been added.
    If the slot is open, save the ack to state.Acks and if the message is already in state.Holding, execute the message as a Follower
  • Process: Panic - Acks don't go into the processlist

messages.CommitChainMsg
  • Origin: API (+developer SimControl, Testing)
  • Validation:
    -1 if the EC amount isn't right or the signature is invalid
    0 if the EC Address doesn't have enough balance
    1 otherwise
  • Leader Execute: Call state.LeaderExecuteCommitChain
    Stop if there is a commit chain with an equal or higher EC amount.
    Call generic state.LeaderExecute.
    If there was a chain reveal in Holding waiting on the chain commit, send it out.
  • Follower Execute: Call state.FollowerExecuteCommitChain
    Call generic state.FollowerExecuteMsg.
    If there is a corresponding Entry in holding, call the Entry's FollowerExecute and send it out.
    Send out the message.
  • Process: Calls state.ProcessCommitChain
    Wait until the address has enough EC (includes transactions happening inside the processlist).
    Once that happens, it will add the commit to state.Commits.
    If there is an entry in the holding queue, run Follower Execute on the entry, send it out, and add it to state.XReview
    Finally, add the chain commit to the entrycredit block
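
A rough sketch of that Process step, showing how "waiting" on the EC balance looks in code. Everything here is hypothetical (the commit and state types, the field names, the cost value), and it leaves out how balances are actually debited, which the description above doesn't cover.

```go
package main

import "fmt"

// commit is a hypothetical, reduced chain/entry commit.
type commit struct {
	entryHash string
	ecAddress string
	cost      uint64 // entry credits the commit pays
}

// state is a hypothetical stand-in holding only what this sketch needs.
type state struct {
	balances map[string]uint64 // EC balance per address
	commits  map[string]commit // entry hash -> accepted commit
	holding  map[string]bool   // entry hash -> a reveal is waiting in Holding
	released []string          // reveals taken out of Holding and executed
	ecBlock  []commit          // commits recorded in the entry credit block
}

func newState() *state {
	return &state{
		balances: map[string]uint64{},
		commits:  map[string]commit{},
		holding:  map[string]bool{},
	}
}

// processCommit returns false ("waiting") until the address can pay.
func (s *state) processCommit(c commit) bool {
	if s.balances[c.ecAddress] < c.cost {
		return false // not enough EC yet: try again on the next pass
	}
	s.commits[c.entryHash] = c
	if s.holding[c.entryHash] {
		// Stands in for: run Follower Execute on the reveal, send it
		// out, and add it to XReview.
		delete(s.holding, c.entryHash)
		s.released = append(s.released, c.entryHash)
	}
	s.ecBlock = append(s.ecBlock, c)
	return true
}

func main() {
	s := newState()
	c := commit{entryHash: "e1", ecAddress: "EC1", cost: 11}
	s.holding["e1"] = true
	fmt.Println(s.processCommit(c)) // balance is 0, so this waits
	s.balances["EC1"] = 20
	fmt.Println(s.processCommit(c)) // balance covers the cost now
}
```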


messages.CommitEntryMsg
  • Origin: API (+developer SimControl, Testing)
  • Validation:
    -1 if the EC amount isn't right or the signature is invalid
    0 if the EC Address doesn't have enough balance
    1 otherwise
  • Leader Execute: Calls state.LeaderExecuteCommitEntry
    Checks if it's the highest commit.
    Call generic state.LeaderExecute.
    If there was an entry reveal in Holding waiting on the entry commit, send it out
  • Follower Execute: Calls state.FollowerExecuteCommitEntry
    Call generic state.FollowerExecuteMsg.
    If there is a corresponding reveal in holding, call the reveal's FollowerExecute and send it out
    Send out the message
  • Process: This works exactly the same as CommitChainMsg.Process, using ProcessCommitEntry instead.

messages.DirectoryBlockSignature (DBSig)
  • Origin: At the end of a block, Leaders send a DBSig instead of EOM. This will run as the first entry in the next processlist.
  • Validation:
    -1 if the height is outdated or any of the signatures are invalid
    0 if the VM doesn't exist
    1 if the DBSig is local or signed by a federated server
  • Leader Execute: Calls state.LeaderExecuteDBSig
    If the signature is not for the current height, it executes it as a follower.

    Ensures that Slot 0 in the VM is free, stop if it's not.
    Add DBSig to the processlist.
  • Followers: Call generic state.FollowerExecuteMsg
  • Process: Calls state.ProcessDBSig
    Wait until all EOM Syncs are over first, then start a small state machine:

    1. Only once for all DBSigs this Minute: Calculate the number of federated servers and initialize all VMs as not Synced
      • Wait until all previous processlists have completed
      • If it's the VM with index 0, update state.LeaderTimestamp with the timestamp from the message.
      • Wait until we have the previous DBState
      • If the message's merkle root does not match this node's previous block's merkle root, increase a "diff" counter and throw out the message from the processlist to retrieve one from other nodes
      • If the signature of the DBSig doesn't match this node's previous block, throw out the message from the processlist to retrieve it again
      • Add the Signature to the ProcessList and increase signature counter. Consider this VM synced.
      • Create election message EomSigInternal[type=sig] for this VM and send it to the elections queue
    2. Wait until all the signatures are in (Sidenote: this is where the Timer() would jump in if it goes too long)
      • If any processlists have a bad message, wait
      • If the majority of peers have a different merkle root, decrease the number of valid signatures, which goes back to step #3
      • Review the holding queue
      • Consider signature process done
      • Reset state machine variables
      • Consider this VM Signed
      • Return true


messages.FactoidTransaction
  • Origin: API
  • Validation:
    -1 if the message isn't well formed or signed
    0 if there isn't enough balance on the input to cover the transaction
    1 otherwise
  • Leader: Generic state.LeaderExecute
  • Follower: Generic state.FollowerExecuteMsg
  • Process: Attempt to add the transaction. If it doesn't work (i.e. insufficient funds), skip it.
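
Since FactoidTransaction has the simplest validation rules, it makes a good example of the -1/0/1 convention. This is only a sketch: the transaction struct is hypothetical and reduced to a single input, and the constant names are mine, not factomd's.

```go
package main

import "fmt"

// Hypothetical names for the three validation results used by all messages.
const (
	invalid = -1 // not usable, drop it
	notYet  = 0  // may become usable in the future
	valid   = 1  // usable now
)

// transaction is a hypothetical, single-input simplification.
type transaction struct {
	wellFormed bool
	signed     bool
	input      string // address paying for the transaction
	amount     uint64
}

// validate follows the three rules listed for FactoidTransaction.
func validate(t transaction, balances map[string]uint64) int {
	if !t.wellFormed || !t.signed {
		return invalid
	}
	if balances[t.input] < t.amount {
		return notYet // the balance might be topped up later
	}
	return valid
}

func main() {
	balances := map[string]uint64{"FA1": 100}
	fmt.Println(validate(transaction{true, false, "FA1", 10}, balances))
	fmt.Println(validate(transaction{true, true, "FA1", 500}, balances))
	fmt.Println(validate(transaction{true, true, "FA1", 10}, balances))
}
```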

messages.Heartbeat
  • Origin: Sent by audit servers while Processing EOMs
  • Validation:
    -1 if it's old or the signature is invalid
    1 otherwise
  • Leader: Same as follower
  • Follower: Set the sender's audit server as "online" (note: timestamps of last contact are kept track of elsewhere)
  • Process: Do nothing.

messages.RevealEntryMsg
  • Origin: API, Testing
  • Validation:
    -1 if the reveal is >10Kb
    0 if there's no commit yet, the commit is underpaid, or the chain doesn't exist/match
    1 otherwise
  • Leader: calls state.LeaderExecuteRevealEntry
    Add an Ack for this message. Add the entry to the ProcessList. If that didn't work (for example because the processlist doesn't exist yet), execute the reveal as a follower.
  • Follower: calls state.FollowerExecuteRevealEntry
    Add message to Holding. If an Ack exists, send out Reveal and add it to the ProcessList.
    If that worked, add the Reveal to the processlist's pending chain heads
  • Process: calls state.ProcessRevealEntry
    If it's a chain reveal for a chain that doesn't exist, create and add a new EBlock for it.
    If a current EBlock for this chain already exists, just add it to that one.
    If a current EBlock doesn't exist, create and add a new EBlock and link it to the previous one.
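
The three cases above collapse nicely into one lookup. A sketch, with hypothetical types (eblock, chains) and hashes reduced to strings; the real EBlock handling involves much more bookkeeping:

```go
package main

import "fmt"

// eblock is a hypothetical, reduced entry block.
type eblock struct {
	chainID string
	entries []string
	prev    *eblock // nil for the first EBlock of a new chain
}

// chains tracks, per chain, the EBlock being built for the current
// block and the last completed EBlock.
type chains struct {
	current map[string]*eblock
	head    map[string]*eblock
}

// processReveal adds the entry to the chain's current EBlock, creating
// one first if needed. For a brand new chain there is no previous EBlock
// to link to; otherwise the new EBlock links to the chain's head.
func (c *chains) processReveal(chainID, entryHash string) *eblock {
	eb := c.current[chainID]
	if eb == nil {
		eb = &eblock{chainID: chainID, prev: c.head[chainID]}
		c.current[chainID] = eb
	}
	eb.entries = append(eb.entries, entryHash)
	return eb
}

func main() {
	old := &eblock{chainID: "chainA"}
	c := &chains{
		current: map[string]*eblock{},
		head:    map[string]*eblock{"chainA": old},
	}
	a := c.processReveal("chainA", "entry1") // existing chain, new EBlock
	b := c.processReveal("chainB", "entry2") // new chain
	c.processReveal("chainA", "entry3")      // reuses the current EBlock
	fmt.Println(a.prev == old, b.prev == nil, len(a.entries))
}
```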

messages.MissingMsg
  • Origin: MMR
  • Validation:
    -1 if there's no origin of who asked for it
    1 otherwise
  • Leader: same as Follower
  • Follower: calls state.FollowerExecuteMissingMsg
    Ignore request if node is too busy or we don't have the processlist.
    Otherwise, create a MissingMsgResponse for every msg+ack in the processlist we do have and send them out
  • Process: panic -- doesn't go into processlist

messages.MissingMsgResponse
  • Origin: reply to MissingMsg
  • Validation:
    -1 if there is no Ack or Message
    1 otherwise
  • Leader: Same as Follower
  • Follower: calls state.FollowerExecuteMMR
    Ignore the message if the node is too busy or recently booted up, if the message is a replay or too old, if it doesn't carry an ack/message, or if there is no processlist for it
    Otherwise, execute both Message and Ack as follower
  • Process: Does nothing


messages.MissingData
  • Origin: Entry Syncing
  • Validation: always valid
  • Leader: same as Follower
  • Follower:
    Attempt to load the specified hash. If it exists, craft a DataResponse (type 0 = entry, type 1 = entryblock) message and send it out
  • Process: Does nothing


messages.DataResponse
  • Origin: Response to MissingData
  • Validation:
    -1 if the type (0 = entry, 1 = entryblock) mismatches the data type, or the data doesn't match the hash
    1 otherwise
  • Leader: same as follower
  • Follower: calls state.FollowerExecuteDataResponse
    For type = 0: (IEBEntry)
    Add the data to channel state.WriteEntry

    For type = 1: (IEntryBlock)
    Go through the list of all entry blocks the node is missing
    If there is a record for the data, remove it from the missing list and add it to the database
  • Process: panic -- does not go into processlist

messages.DBStateMsg
  • Origin: loading the database, creating a new database, response to DBStateMissing, torrent plugin (deprecated)
  • Validation:
    -1 if any data is missing, the dbheight is too low, it's for the wrong network, or it fails a checkpoint (mainnet only)
    1 if it's from loading the database, it's a genesis block, or it passes the above
  • Leader: same as follower
  • Follower: calls state.FollowerExecuteDBState
    Verifies the signatures and other things. If everything checks out, process the dbstate. It's a lengthy process that essentially boils down to taking all the values from the DBState and applying them to state.
    When you're loading from a database, it applies all the DBStates in order.
  • Process: panic -- doesn't go into processlist


messages.DBStateMissing
  • Origin: Catchup() which is called periodically as part of the ValidatorLoop
  • Validation:
    -1 if the requested start is higher than the end
    1 otherwise
  • Leader: Same as follower
  • Follower:
    Sends out DBStateMsg to the asker for the requested blocks. Caps at 200 blocks (50 if load is medium, 0 if load is high) or 1 MiB, whichever comes first.
  • Process: panic -- doesn't go into processlist
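
The cap in the Follower step can be sketched like this. The numbers (200, 50, 0, 1 MiB) come from the description above; the load type, the function names, and the choice to stop before a block that would push the total over 1 MiB are my assumptions.

```go
package main

import "fmt"

// load is a hypothetical three-level measure of how busy the node is.
type load int

const (
	lowLoad load = iota
	mediumLoad
	highLoad
)

const maxResponseBytes = 1 << 20 // 1 MiB

// blockCap returns the maximum number of blocks to send for a load level.
func blockCap(l load) int {
	switch l {
	case highLoad:
		return 0
	case mediumLoad:
		return 50
	default:
		return 200
	}
}

// blocksToSend returns how many of the requested blocks (given by their
// sizes in bytes, in order) fit under both the block cap and the byte cap.
// Assumption: a block that would exceed the byte cap is not sent.
func blocksToSend(sizes []int, l load) int {
	limit := blockCap(l)
	sent, total := 0, 0
	for _, size := range sizes {
		if sent >= limit || total+size > maxResponseBytes {
			break
		}
		total += size
		sent++
	}
	return sent
}

func main() {
	small := make([]int, 300)
	for i := range small {
		small[i] = 1000
	}
	fmt.Println(blocksToSend(small, lowLoad), blocksToSend(small, mediumLoad), blocksToSend(small, highLoad))
}
```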


messages.AddServerMsg
  • Origin: SimControl, possibly outside tool
  • Validation:
    -1 if the message is not signed properly by the network skeleton key
    1 otherwise
  • Leader: generic state.LeaderExecute
  • Follower: generic state.FollowerExecuteMsg
  • Process: calls state.ProcessAddServer
    This adds (if necessary) the chain+entries, creates an entry in the admin block based on type (audit or fed), and registers the identity in the system

messages.ChangeServerKeyMsg
  • Origin: unused or outside of factomd
  • Validation:
    -1 if not from an authority node or not signed properly, 1 otherwise
  • Leader: generic state.LeaderExecute
  • Follower: generic state.FollowerExecuteMsg
  • Process: calls state.ProcessChangeServerKey
    Verify signature again. Based on the type, update the adminblock with either a new BTC key, a signing key, or a matryoshka hash for this identity


messages.RemoveServerMsg
  • Origin: SimControl, possibly outside tool
  • Validation:
    -1 if the message is not signed properly by the network skeleton key or the specified server is not part of the authority set
    1 otherwise
  • Leader: generic state.LeaderExecute
  • Follower: generic state.FollowerExecuteMsg
  • Process: calls state.ProcessRemoveServer
    Doublechecks the message and, unless it's the last federated server in the network, adds an entry to the adminblock to remove the server

messages.Bounce
messages.BounceReply
  • Origin: These are developer messages for debugging the network, not used for the protocol


The following messages are deprecated:
FULL_SERVER_FAULT_MSG
EOM_TIMEOUT_MSG
INVALID_ACK_MSG
INVALID_DIRECTORY_BLOCK_MSG
REQUEST_BLOCK_MSG (implemented but with no functionality at all)
SIGNATURE_TIMEOUT_MSG
ENTRY_BLOCK_RESPONSE
 
Election Messages
Election messages are added to the election queue while processing VMs. In addition to the functions the other messages have, election messages also have ElectionValidate and ElectionProcess. The regular functions (Validate, LeaderExecute, FollowerExecute, Process) are usually left unimplemented, as these messages don't leave the election system. For that reason, I'm going to leave them out unless they do something.

Additionally, elections have an "adapter" which controls the inner election state. This election adapter is somewhat independent of the rest of the factomd node and has its own messages and message types that are not IMsg. The factomd "FedVote" election messages act as carriers. They push internal election messages into the system and (if they are leaders) receive different messages in return, which are then broadcast to the network. (Special thanks to @Steven Masley for providing some insight into this.)

For this post, I'm going to treat the election adapter as a black box as I don't fully understand the way it works yet myself.

Note: Internal messages are called "Internal" because they can't be sent over the network, as they typically lack unmarshalling support.

electionMsgs.AddLeaderInternal
  • Origin: A new federated server entry in the adminblock, genesis
  • Election Validation:
    Always valid
  • Election Process:
    If the specified server is not already a federated server, add it to the election's list of federated servers

electionMsgs.RemoveLeaderInternal
  • Origin: Adminblock entry to remove leader, or to change it to audit
  • Election Validation:
    Always valid
  • Election Process:
    If the specified server is a federated server, remove it from the election's list of federated servers

electionMsgs.AddAuditInternal
  • Origin: A new audit server entry in the adminblock, genesis
  • Election Validation:
    Always valid
  • Election Process:
    If the specified server is not already an audit server, add it to the election's list of audit servers

electionMsgs.RemoveAuditInternal
  • Origin: Adminblock entry to remove audit, or to change it to fed
  • Election Validation:
    Always valid
  • Election Process:
    If the specified server is an audit server, remove it from the election's list of audit servers


electionMsgs.TimeoutInternal
  • Origin: VM Processing timeout ("Fault Timer")
  • Election Validation:
    Always valid
  • Election Process:
    If the system moved on, don't do anything
    • If there is no election or the message is up to date, count the stored messages (EomSigInternal) for the election. If all federated servers sent either an EOM or a DBsig, stop the timeout.
      Otherwise, start an election for the first server with a missing EomSigInternal. This creates a StartElectionInternal message which is sent to state.InMsgQueue and also re-adds all the messages from the election wait queue to the elections queue.
    • If there is an ongoing election, increase the current round and start a new fault timer but with a timer of 30 seconds (if that triggers, it will send another TimeoutInternal)
      If this node is an audit server and it's in line to volunteer for this round, create a SyncMsg (local) and add it to state.InMsgQueue.
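
The "who do we elect a replacement for" part of the first bullet boils down to a scan over the federated servers. A sketch with hypothetical names (firstMissing, the server IDs, and the received map standing in for the stored EomSigInternal messages):

```go
package main

import "fmt"

// firstMissing returns the index of the first federated server that has
// no stored EomSigInternal, or -1 if every server has sent one.
func firstMissing(feds []string, received map[string]bool) int {
	for i, id := range feds {
		if !received[id] {
			return i
		}
	}
	return -1
}

func main() {
	feds := []string{"fedA", "fedB", "fedC"}
	received := map[string]bool{"fedA": true, "fedC": true}
	if i := firstMissing(feds, received); i >= 0 {
		// In the description above this is where a StartElectionInternal
		// message would be created.
		fmt.Println("start election for", feds[i])
	} else {
		fmt.Println("all synced, stop the timeout")
	}
}
```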


electionMsgs.EomSigInternal
Can be of type EOM or type DBSig, hence EomSig
  • Origin: message FedVoteLevelMsg ElectionProcess, or message EOM's Process, or message DBSig's Process
  • Election Validation:
    always valid
  • Election Process:
    Ignore messages not from a federated server.
    If the message is from a future minute, stop any elections going on, reset the message store, and set all servers as unsynced. Store this message in the elections (see TimeoutInternal), start another "Fault Timer" (2 minutes), and re-add all messages from the election wait queue to the elections queue
    If the message is NOT from a future minute, just store this message in the elections
    In either case, mark the leader that spawned this message as synced
    If all leaders are synced, reset the rounds and feedback variables



electionMsgs.AuthorityListInternal
  • Origin: When the State moves to a new Height
  • Election Validation:
    -1 if the height is lower than the elections' height
    1 otherwise
  • Election Process:
    Update the elections with a fresh list of fed and auth servers
  • Validation:
    Always valid
  • Leader: Same as Follower
  • Follower:
    Enqueue the message to Elections Queue


electionMsgs.FedVoteMsg
  • Origin: This is not a real message per se but rather a "generic" message that implements functionality shared among some of the following messages. Those messages treat it as their "super".
  • Election Validation:
    -1 if it's from the past, or the message tries to vote for servers that don't exist
    0 if it's from the future or a different election
    1 otherwise
  • Election Process: does nothing
  • Validation:
    -1 if it's too old or the signature is invalid
    0 if the message is for a future height
    1 otherwise
  • Leader: same as follower
  • Follower:
    add to Election Queue
  • Process: panic -- doesn't go into processlist



electionMsgs.FedVoteVolunteerMsg
(full broadcast)
  • Origin: Created by SyncMsg
  • Election Validation: Delegate validation to the attached FedVoteMsg
  • Election Process:
    Execute the message in the election adapter. If that produces a result, send the result out
  • Validation:
    -1 if the FedVoteMsg being carried is not valid (old, not signed by a fed server)
    0 if there are no elections going or the FedVoteMsg is for a higher processlist
    1 otherwise
  • Leader: Same as follower
  • Follower:
    If an adapter doesn't exist, hold it. Otherwise, add it to the Election Queue
  • Process: panic -- doesn't go into the processlist

electionMsgs.FedVoteLevelMsg
(full broadcast)
  • Origin: response from Election Adapter message execution
  • Election Validation: Delegate validation to the attached FedVoteMsg
  • Election Process:
    If the message is committed and the election adapter is not finished:
      • Swap the Federated and Auth server in Elections
      • Consider the adapter finished
      • Set the message's valid cache as valid
      • Add a special reminder for "Follower Execute" (local only)
      • Create a new message electionMsgs.EomSigInternal and add it to state.InMsgQueue
      • Stop the election

    Let the election adapter execute this message
    Send out the result
    If the result is a FedVoteLevelMsg, it's committed, and the election adapter is not finished, perform the same steps as in the indented list above
  • Validation:
    Use cached valid value if possible
    Delegate validation to the attached FedVoteMsg
  • Leader: same as follower
  • Follower:
    If there's no election adapter or processlist yet, add it to holding
    If this is a special message (see Election Process):
      • Swap the fed and auth servers inside the processlist
      • Add an EOM and Ack to the processlist
      • Run state.UpdateState once

    Otherwise, add it to the Elections Queue
    If the state's status as Leader changed, update it
  • Process: panic -- does not go into processlist

electionMsgs.FedVoteProposalMsg
(full broadcast)
  • Origin: response from Election Adapter message execution
  • Election Validation: Delegate validation to the attached FedVoteMsg
  • Election Process:
    Execute the message inside the Election Adapter
    Send out the resulting message
  • Validation: delegate validation to the attached FedVoteMsg
  • Leader: same as follower
  • Follower:
    If there is no Election Adapter, put it into Holding
    Otherwise add it to the Elections Queue
  • Process: panic -- does not go into processlist


electionMsgs.SyncMsg
(local only)
  • Origin: Audit servers send these to state.inMsgQueue when an election round starts (see electionMsgs.TimeoutInternal)
  • Election Validation: always valid
  • Election Process: nothing
  • Validation: always valid
  • Leader: same as follower
  • Follower:
    Create an electionMsgs.FedVoteVolunteerMsg for the election round. Send it out (full broadcast) and immediately Follower Execute it.
  • Process: panic -- doesn't go into processlist

electionMsgs.StartElectionInternal
  • Origin: Started by TimeoutInternal
  • Election Validation:
    -1 if the DBHeight is in the past
    1 otherwise
  • Election Process:
    If there is no election going on, start a "Fault Timer" (60s) and stop
    Otherwise, create an election adapter
    If this node is neither a fed nor an auth server, make the adapter observe only
    Start a "Fault Timer" (60s)
  • Validation: always valid
  • Leader: same as follower
  • Follower:
    Run state.Process() until the processlists can't advance anymore
    Add this message to the Elections Queue
  • Process: panic -- does not go into processlist