CryptoVikings

Matt Osborne

Go Immutable
Exchange Working Group
Legal Working Group
Hi everyone. I appreciate the honesty about what you'll actually bring to the table (basically just doing a great job of running servers) and not padding the application with other fluff.
 

Niels Klomp

BI Foundation
Core Committee
Governance Working Group
Thank you for your application.

Agreed with MattO. Refreshing to see :)

NK01)
Regarding guard nodes for the initial authority set; we do not deem the individual nodes as mission critical at this stage, and in our suggested network design these will be held to a somewhat lower standard than the Authority Set servers in the interim.
Could you give rough specs compared to auth nodes? (In principle I agree with your assessment)
 

Niels Klomp

BI Foundation
Core Committee
Governance Working Group
NK02)
II) We will set up an automated system for spinning up additional (pre-configured, updated to current block height once every 24 hours) AWS guard nodes in the event of a guard node stalling, becoming unresponsive or saturated, and;
Could you elaborate on how you would accomplish this? With emphasis on the pre-configured, current block part
 
NK01)

Could you give rough specs compared to auth nodes? (In principle I agree with your assessment)
Thank you for the question.

The self-hosted guard nodes we will use from the outset are 2x HP ProLiant G8 and 2x G9 servers. They are 8-core Xeon enterprise servers with 16-48 GB RAM and 4-8 SAS disks in a RAID-10 configuration.

The VPS servers will be specced in line with our Authority servers, as these will also be utilized for brain transplants (we know the terminology is technically «swaps», but transplants sounds more fun) as well as acting as backup servers for our Authority nodes.

Our opinion, solely from a technical standpoint, is that the guard nodes as of today could be specced much lower and still perform their tasks well given the low mainnet usage. However, we also acknowledge that even a single large network user could render this observation obsolete in an instant, and we will closely monitor network usage and our guard and authority servers to ensure that we stay ahead of the curve.

In our work we are proponents of building systems to fit the task at hand (with a sizable buffer) instead of over-speccing (and overspending) «just to be sure» without actually weighing the situation and requirements and making an informed decision.

In the future we believe the guard nodes will be the servers handling the highest number of requests, and these will then have to match (or surpass) the specifications of the Authority servers in the Factom network.

There is also the very real possibility of denial-of-service attacks as the ecosystem matures, which should be taken into consideration when speccing future guard nodes, as these will be the first line of defense filtering incoming packets and requests.

We believe testing different attack vectors to see how the network reacts will be very important going forward, and we are most concerned about how an attack using «valid packets» would be handled by the network and the different servers. With the very low EC price (1 FCT = 30k-ish EC according to «free factomizer»), an attack with valid entries could be performed and could possibly «bypass» the guard nodes, as these would not filter out the bad requests but pass all of them along to the authority server. This is something we would like to look into during the next few months by leveraging the community testnet. Sorry about veering a bit off topic, by the way :)
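To make the cost side of that concern concrete, here is a rough back-of-the-envelope estimate. It is only a sketch: it assumes the 30k-ish EC per FCT rate quoted above, the protocol's 1 EC per KiB entry pricing, and a purely illustrative FCT price.

    # Rough cost estimate for a flood of valid 1 KiB entries (illustrative numbers only).
    EC_PER_FCT = 30_000       # approximate rate quoted by «free factomizer»
    EC_PER_KIB_ENTRY = 1      # assumed protocol pricing: 1 EC per KiB of entry data
    FCT_PRICE_USD = 10.0      # hypothetical market price, for illustration only

    entries = 1_000_000       # one million 1 KiB entries
    cost_fct = entries * EC_PER_KIB_ENTRY / EC_PER_FCT
    print(f"{entries:,} entries ≈ {cost_fct:.1f} FCT ≈ ${cost_fct * FCT_PRICE_USD:.0f}")
    # -> 1,000,000 entries ≈ 33.3 FCT ≈ $333

At those prices a sustained flood of perfectly valid entries is cheap enough that it cannot be dismissed, which is why we want to test it on the community testnet.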
 
NK02)

Could you elaborate on how you would accomplish this? With emphasis on the pre-configured, current block part
First of all, we want to be frank about not having prioritized much time looking into the specifics of how to accomplish this yet. The reason is that the Factom network will go live without guard nodes; when guard nodes do become relevant, we are confident in the 4 guard nodes we are starting out with, and we would prefer to implement and test those in our setup before introducing more complexity.

Our current high-level overview of the process looks like this:
(please keep in mind that this is a work in progress)

Initial state of the system:
- 4 self-hosted Guard nodes (running).
- 2 AWS (or similar) instances (mentioned in previous reply) in down state.
- 1-2 Authority servers (running).
- The ACL of the Authority server(s) is limited to the 4 running guards, the 2 AWS/VPS nodes in down state, and the computers we use to administer the servers (a sketch of this follows below the list).
- A server monitoring the health of the 4 running guard nodes (and the Authority server(s)).
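For any AWS/VPS-hosted node in that list, the ACL point above could be expressed as a security group that only admits the guard and admin addresses. The following is a minimal sketch only, assuming boto3; the group ID, IP addresses and port are placeholders/assumptions, and the self-hosted servers would use an equivalent local firewall rule instead.

    # Restrict an AWS-hosted node's security group to known guard/admin IPs (sketch only).
    import boto3

    GUARD_IPS = ["203.0.113.10/32", "203.0.113.11/32"]  # placeholder guard addresses
    ADMIN_IPS = ["198.51.100.5/32"]                      # placeholder admin addresses
    P2P_PORT = 8108                                      # assumed factomd mainnet P2P port

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupId="sg-0123456789abcdef0",                  # placeholder security group ID
        IpPermissions=[{
            "IpProtocol": "tcp",
            "FromPort": P2P_PORT,
            "ToPort": P2P_PORT,
            "IpRanges": [{"CidrIp": ip} for ip in GUARD_IPS + ADMIN_IPS],
        }],
    )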

Process:
- Every 24 hours (initial frequency placeholder) the AWS nodes would be spun up automatically, by use of AWS Instance Scheduler, to catch up to the current block height. This ensures that the AWS nodes are never more than 144 blocks behind the current height, so they are able to quickly sync up and join the guard network (a rough sketch of this step follows below).
- If the monitoring system detects a problem with one of the guard nodes, it would trigger a script that spins up both AWS/VPS guard nodes and raises an alarm with the admins.
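As a rough illustration of the daily catch-up step described above (not a final implementation), the same effect could also be scripted directly with boto3. The instance IDs are placeholders, and the factomd JSON-RPC «heights» call on port 8088 and its response fields are assumptions about the eventual node configuration.

    # Daily catch-up for the standby AWS guard nodes (rough sketch, not a final implementation).
    import time
    import boto3
    import requests

    STANDBY_IDS = ["i-0aaaaaaaaaaaaaaa1", "i-0aaaaaaaaaaaaaaa2"]  # placeholder instance IDs
    API_PORT = 8088                                               # assumed factomd API port

    def heights(host):
        # Query the node's JSON-RPC «heights» method (assumed to be exposed to the monitor).
        r = requests.post(f"http://{host}:{API_PORT}/v2",
                          json={"jsonrpc": "2.0", "id": 0, "method": "heights"},
                          timeout=10)
        return r.json()["result"]

    ec2 = boto3.client("ec2")
    ec2.start_instances(InstanceIds=STANDBY_IDS)
    ec2.get_waiter("instance_running").wait(InstanceIds=STANDBY_IDS)

    # Let each standby node follow the chain until it reaches the leader height, then park it again.
    for reservation in ec2.describe_instances(InstanceIds=STANDBY_IDS)["Reservations"]:
        host = reservation["Instances"][0]["PublicIpAddress"]   # assumes a public IP is attached
        while True:
            h = heights(host)
            if h["directoryblockheight"] >= h["leaderheight"]:  # assumed response field names
                break
            time.sleep(30)

    ec2.stop_instances(InstanceIds=STANDBY_IDS)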

Notes:
- We would like to discuss the best way of monitoring health metrics with the broader community. Currently we use a simple UptimeRobot check to ping our testnet servers, as well as the monitoring system designed by Anton Ilzheev. The latter parses information via port 8090, but we believe neither option is a good candidate for monitoring going forward (8090 will be closed in production, and a simple ping does not provide enough information). A rough polling sketch is included at the end of this post.
- We apologize if our initial text came across as if we already had the specifics figured out. Setting this up will be a process that takes time and effort, and hopefully best practices can be established through interaction and discussion with the Factom community.
Also please note that we will not utilize automation for Authority server brain-swaps due to the inherent risks associated with broadcasting the same ID (at the current stage).
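As one possible direction for the monitoring discussion above, a simple poller could query each guard node's API and flag any node whose block height stops advancing or whose API becomes unreachable. This is a minimal sketch only: the hosts and the alerting hook are placeholders, and the API port, method and response field are assumptions.

    # Minimal guard-node liveness poller (sketch only; hosts and alerting are placeholders).
    import time
    import requests

    GUARDS = {"guard1": "203.0.113.10", "guard2": "203.0.113.11"}  # placeholder hosts
    API_PORT = 8088                                                # assumed factomd API port
    STALL_SECONDS = 15 * 60                                        # no new block in ~15 minutes
    last_progress = {}                                             # name -> (height, timestamp)

    def height(host):
        r = requests.post(f"http://{host}:{API_PORT}/v2",
                          json={"jsonrpc": "2.0", "id": 0, "method": "heights"},
                          timeout=10)
        return r.json()["result"]["directoryblockheight"]          # assumed response field name

    def alert(name, reason):
        # Placeholder: page the admins and trigger the standby spin-up script.
        print(f"ALERT {name}: {reason}")

    while True:
        now = time.time()
        for name, host in GUARDS.items():
            try:
                h = height(host)
            except requests.RequestException:
                alert(name, "API unreachable")
                continue
            prev_h, prev_t = last_progress.get(name, (h, now))
            if h > prev_h or name not in last_progress:
                last_progress[name] = (h, now)                     # height advanced (or first sample)
            elif now - prev_t > STALL_SECONDS:
                alert(name, f"block height stalled at {h}")
        time.sleep(60)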
 