What’s the best way to learn a new codebase? That’s a question I asked myself many times leading up to this project. I’m used to working in already established areas, like Symfony, Node.js, vBulletin and XenForo systems, that have vast and readily available documentation, whether it’s from the original developers or user-generated. This is the first time I went delving mostly blind.
This is roughly the process I laid out for myself:
- Get accustomed to working with the product
- Gather all available documentation
- Read code
- Attempt to explain my understanding and have that reviewed by the experts
This happened over the span of a couple of months as I worked on solutions for the Factomize forum, including Canonical Ledger’s Factomize, FuseNet, and Governance Factomizing. I may write a “Developer’s Introduction to Factom” guide in the future, but this area is already well documented, with a plethora of APIs and clients to help.
- Top mention goes to the white paper, which is a good introduction to the concepts implemented in factomd at a high level. Unfortunately, it is a theoretical document only and says nothing about the implementation of factomd in Golang.
- Factom Inc factomd Fast High Level Code Walkthrough is a video where Clay covers some of the basic aspects of the factomd codebase. The two parts cover the startup aspects of the node, command line parsing, the simulated network, some networking, and a few other topics. It’s not an in-depth explanation of anything; it’s more of a broad orientation of where different parts are located in the folder structure.
- Factomd Control Panel feat. StevenM is a fantastic video that explains one of the hardest aspects of the Factom Blockchain: minutes, process lists, VMs, and acks. It’s targeted at Authority Node Operators and doesn’t go over any code, but understanding the process of how the virtual servers inside the code are formed is crucial.
- Consensus Code Tour is another video by Clay that goes over how the different nodes build consensus. This is somewhat of a counterpart to the video above, going over the actual implementation of the concepts illustrated in the other one (and I would recommend watching the Control Panel video first).
There are also some documents available on GitHub:
- https://github.com/FactomProject/FactomDocs Contains the whitepaper, some info on the upcoming identity system, and the highly interesting Data Structure Details for any endpoint developers
- https://github.com/FactomProject/factomd The main repo contains documentation on running simulations (README.md) and the /scripts/ folder has many things you can use to test.
I like to start at the beginning with the main() function and see where that leads me. It’s a very explorative process with no real goal in mind. Most of the code doesn’t make sense and there are mountains hiding behind every function call but that’s where personal experience comes in. I’ve lost count of how many different systems I’ve worked on but every time it gets a little easier identifying the underlying patterns.
As I mentioned in my previous blog post, this process is driven by my curiosity. There were several areas I was particularly interested in (consensus and the P2P network), which unfortunately turned out to also be the hardest to grasp on their own.
This step is definitely the hardest to put into words, as most of the heavy lifting happens in the subconscious, and it never ends. I could probably spend months just reading code, which is why I moved on fairly quickly to the next step, doing them both in tandem.
Everything I create is going to be published either in this blog or the core developer forum. I don’t want to repeat myself, so let’s start right away with the goods:
Factomd Overview: This is a very general overview of the main structures inside a Factomd node, meant for someone just starting to get familiar with the code as a lifeline.
Factomd Major Process Flow: This is the boot-up process of factomd as a pseudo-code diagram.
This diagram was created by searching the code for every instance of “go ” (^\s*go\s) to find all the goroutines inside factomd. There are over 90 of them in total. From there, I looked at all of them to figure out what they do, where they’re situated, and how important they are, taking plenty of notes along the way.
With that information, I made the diagram, going back over my notes and comparing them to the code for a third pass. This step taught me the most, and afterward I felt like I finally understood the framework of the node.
If you’re trying to learn from this, I would recommend taking these steps in reverse: take the diagram and look at the source files, comparing the code with the image and using it as a guiding thread so you don’t lose your way.
After being satisfied with mapping out the goroutines, I turned to the channels. The processes don’t exist in a vacuum but constantly exchange messages. So I once again went back to the source code and searched for all the channels (make(()?chan), of which there are about 60.
This time, I was less interested in the surrounding code than in the producers and the consumers. Every channel has (with some exceptions) one consumer and one or more producers. It was just a matter of noting down the names of all the channels, then searching the source for all the instances.
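I did this by hand with a text search, but the same producer and consumer tally could be sketched with Go’s standard go/parser and go/ast packages. The following is a different technique from what I used, shown only as an illustration: the helper channelUsage, the usage type, and the sample code are all made up for this example.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
	"sort"
)

// usage counts how often a channel expression is sent to and received from.
type usage struct {
	Sends, Receives int
}

// channelUsage (hypothetical helper) parses src as one Go file and tallies
// send statements (ch <- v) and receive expressions (<-ch), keyed by the
// channel expression as written. Loops of the form "for v := range ch"
// are not counted, because spotting them requires type information.
func channelUsage(src string) (map[string]*usage, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "sample.go", src, 0)
	if err != nil {
		return nil, err
	}
	out := make(map[string]*usage)
	get := func(e ast.Expr) *usage {
		name := types.ExprString(e)
		if out[name] == nil {
			out[name] = &usage{}
		}
		return out[name]
	}
	ast.Inspect(file, func(n ast.Node) bool {
		switch x := n.(type) {
		case *ast.SendStmt: // producer side
			get(x.Chan).Sends++
		case *ast.UnaryExpr: // consumer side
			if x.Op == token.ARROW {
				get(x.X).Receives++
			}
		}
		return true
	})
	return out, nil
}

func main() {
	// Made-up sample; the names are not taken from factomd.
	sample := `package demo

func run(inbox chan string, quit chan bool) {
	go func() { inbox <- "a" }()
	go func() { inbox <- "b" }()
	for {
		select {
		case m := <-inbox:
			_ = m
		case <-quit:
			return
		}
	}
}
`
	stats, err := channelUsage(sample)
	if err != nil {
		panic(err)
	}
	names := make([]string, 0, len(stats))
	for name := range stats {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s: %d send(s), %d receive(s)\n", name, stats[name].Sends, stats[name].Receives)
	}
}
```

For the sample, this reports two sends and one receive for inbox, and no sends and one receive for quit, which is the "several producers, one consumer" shape described above. Because the tally is keyed by the expression text, the same channel reached under two different names (for example through a struct field and a local variable) shows up as two entries, so the results still need to be cross-checked against the code.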
The first area I covered, mostly because that was the one that confused me the most when reading the code, was P2P and Networking. Creating the diagram helped me immensely in understanding which channels did what and now I have a fairly compact way of looking up channel names whenever I come across them.
I’m going to continue in the same spirit with other channel systems (Consensus and Elections are the other big ones), mostly to make sense of the code for myself and hopefully help others in the process.