Some thoughts on refactoring on the road to sharding

A lot of simplifications will come out of refactoring with new control flows. Paul, Clay and I just got out of a meeting to figure out what we want to do with the refactoring, hopefully over the next few months, if there is some community support for it. It would involve building a router that takes in p2p messages and parcels them out to different threads, each handling a different aspect of the code. As part of this threading, we would add channels between more parts of the code, so that actions are initiated by messages rather than by looking for things in queues.
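To make the router idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration, not the actual design: the message kinds ("validation", "consensus"), the dict-shaped messages, and the names `run_router` and `worker` are all assumptions made up for the example, and Python queues stand in for whatever channel mechanism the real code would use.

```python
import queue
import threading

# Hypothetical message kinds; the real p2p message types are not specified here.
HANDLED_KINDS = ("validation", "consensus")
STOP = object()  # sentinel that tells a worker thread to shut down


def worker(inbox, results):
    """Block on the inbox and handle each message as it arrives."""
    while True:
        msg = inbox.get()  # waits for a message; no polling loop
        if msg is STOP:
            break
        # Stand-in for real handling: just record what was processed.
        results.put((msg["kind"], msg["payload"]))


def run_router(messages):
    """Route each message to the inbox of the thread that owns its kind."""
    results = queue.Queue()
    inboxes = {kind: queue.Queue() for kind in HANDLED_KINDS}
    threads = [
        threading.Thread(target=worker, args=(inboxes[kind], results))
        for kind in HANDLED_KINDS
    ]
    for t in threads:
        t.start()

    unrouted = []
    for msg in messages:
        inbox = inboxes.get(msg["kind"])
        if inbox is None:
            unrouted.append(msg)  # no handler registered for this kind
        else:
            inbox.put(msg)

    for inbox in inboxes.values():
        inbox.put(STOP)
    for t in threads:
        t.join()

    handled = []
    while not results.empty():
        handled.append(results.get())
    return sorted(handled), unrouted
```

Each handler thread only ever wakes up because a message landed in its inbox, which is the "message initiated" behaviour described above.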
One of the big problems, according to Paul S, is the mixing of follower and leader activities in the control flows, which has made for some awkward-to-debug code. The first step would be to delete the leader code, which is a small part of what the software does: all nodes need to do validation, which is the harder part, while the leader only sets the order of validated items. Once the updated control flows are message activated, the leader code would be added back in a more controlled manner than it is now. Making things more message based than timer based is one of the first steps on the way to sharding the system.
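As a rough picture of that separation, the sketch below keeps validation (the follower part) and ordering (the leader part) in two stages that only talk through a queue. Everything in it is hypothetical: `is_valid` is a placeholder rule, and `follower`, `leader` and `run_pipeline` are names invented for the example, not anything from the existing code.

```python
import queue
import threading

STOP = object()  # sentinel used to shut the stages down in order


def is_valid(item):
    # Placeholder rule, assumed for illustration only.
    return isinstance(item, str) and item != ""


def follower(inbox, validated):
    """Validate items as messages arrive; blocks instead of waking on a timer."""
    while True:
        item = inbox.get()
        if item is STOP:
            validated.put(STOP)  # pass the shutdown signal downstream
            break
        if is_valid(item):
            validated.put(item)


def leader(validated, ordered):
    """Separate stage that only assigns an order to already validated items."""
    seq = 0
    while True:
        item = validated.get()
        if item is STOP:
            break
        ordered.append((seq, item))
        seq += 1


def run_pipeline(items):
    inbox, validated, ordered = queue.Queue(), queue.Queue(), []
    threads = [
        threading.Thread(target=follower, args=(inbox, validated)),
        threading.Thread(target=leader, args=(validated, ordered)),
    ]
    for t in threads:
        t.start()
    for item in items:
        inbox.put(item)
    inbox.put(STOP)
    for t in threads:
        t.join()
    return ordered
```

In this shape the `leader` stage can be removed or swapped without touching validation, which is the kind of controlled re-introduction described above.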
We are currently working to get up to speed with the code and the protocol so that we can give a hand with the refactoring plans.
From what I understand, one of the main goals is for the system to become more scalable, and for that the code needs to be more modular. I see you mention "parcels them out to different threads that handle different aspects of the code", but are your plans limited to just extra threads? If you need a high-volume "super node", the system has to be pulled apart so that it can run as a swarm of separate containers in a cluster, on something like Kubernetes. Something like a message router or queue system would sit at the front door, do only "the talking", and divide tasks such as validation and reaching consensus over a number of parallel workers on one or more physical machines (together acting like one supernode).
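To illustrate the front-door idea, here is a small Python sketch in which one entry point hands validation work to a pool of parallel workers. It is a sketch under assumptions only: `validate` is a made-up placeholder check, `front_door` is a hypothetical name, and a thread pool stands in for what would be separate containers behind a real queue in a cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def validate(payload: bytes) -> bool:
    # Placeholder check, assumed for illustration: non-empty payloads pass.
    return len(payload) > 0


def front_door(payloads, workers=4):
    """Accept incoming payloads and spread validation over parallel workers.

    The front door does no validation itself; it only distributes work and
    collects the results in the same order the payloads came in.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(validate, payloads))
```

For example, `front_door([b"tx1", b"", b"tx2"], workers=2)` returns `[True, False, True]`. Scaling out would then mean adding workers rather than changing the front door.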
Do you have a ticket system, a section of the forum, or some other place where the known issues with the current code base are described and where solutions can be brainstormed and worked on together?
We're eager to get our hands dirty, either by starting on some refactoring work already or by experimenting with and evaluating frameworks that look interesting for a code rewrite.
I personally have experience writing software for containerized systems and deploying it to clusters on both Azure and AWS. I will be able to run load tests on those.