Patroneos: Code Analysis, Benchmark results, Operation and Implementation Recommendations

![27017302889_9edc086467_z.jpg](https://cdn.steemitimages.com/DQmRgQzpr6oHkkfSq1Tvn8Vzoubd9QjuzgTAgCCMLL2x4y8/27017302889_9edc086467_z.jpg)

Good morning. This is EOSeoul.

This article shares our understanding of Patroneos and our recommendations, covering 1) how Patroneos works, 2) benchmark results, and 3) operation and implementation recommendations. We assume the reader already has a working knowledge of basic and advanced Patroneos settings.

All descriptions are based on the commits below:

* Update on June 6, 2018, 14:30 KST (UTC+9)
  * Code revision analyzed: `6501e4429f43f4de78444a227149773b914e221b`
  * Commit date of analyzed code: Thu May 31 17:44:16 2018 -0400
  * Changes
    * [Patroneos Issue #26](https://github.com/EOSIO/patroneos/issues/26) - **FIXED**
    * `validateMaxTransactions` added
    * Recommendations updated
    * If any important Patroneos issues for block producers arise before June 13, we will add comments to this post.
* Original post
  * Code revision analyzed: `48422fa05b47373ad68013f4d77d290e7fc31aae`
  * Commit date of analyzed code: Thu May 31 17:44:16 2018 -0400

# Introduction

Block.one released Patroneos, one of its own software components, along with EOSIO Dawn 4.2. The name comes from the Harry Potter novels: Patronus is the spell used to repel creatures called Dementors, and Patroneos borrows its name from that spell.

Many people in the EOS community have expressed concerns over various attack attempts against the EOS mainnet. The community and Block.one have improved the EOSIO software by modeling different attack scenarios, and Patroneos is one of the results of these efforts. It filters out basic Denial-of-Service style attacks and passes only normal transactions to the EOS RPC API endpoint.

# How it works

## Code Configuration

The code consists of three main files: `main.go`, `filter.go`, and `fail2ban-relay.go`.

## `main.go`
`main.go` implements the following:
* Configuration file
  * Definition of the `Config` structure
  * Handlers that can update the configuration dynamically
  * Parsing of the configuration file
* `main` function
  * Parsing of command-line arguments
  * Calling the handler for the selected execution mode

The program operates in either Filter mode (`filter`) or Relay mode (`fail2ban-relay`). Let's take a look at each.
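As a rough illustration of this structure (a sketch only, not the actual Patroneos source; the flag names are our assumption), a `Config` type matching the JSON fields used in the sample configurations later in this post, plus simple mode selection, could look like this:

```
package main

import (
	"encoding/json"
	"flag"
	"io/ioutil"
	"log"
)

// Config mirrors the JSON fields used in the sample configurations
// shown in the Benchmark section of this post.
type Config struct {
	ListenPort         string          `json:"listenPort"`
	NodeosProtocol     string          `json:"nodeosProtocol"`
	NodeosURL          string          `json:"nodeosUrl"`
	NodeosPort         string          `json:"nodeosPort"`
	ContractBlackList  map[string]bool `json:"contractBlackList"`
	MaxSignatures      int             `json:"maxSignatures"`
	MaxTransactionSize int             `json:"maxTransactionSize"`
	LogEndpoints       []string        `json:"logEndpoints"`
	FilterEndpoints    []string        `json:"filterEndpoints"`
	LogFileLocation    string          `json:"logFileLocation"`
}

func main() {
	// Command-line arguments (flag names are illustrative, not Patroneos').
	configFile := flag.String("configFile", "./config.json", "path to the configuration file")
	mode := flag.String("mode", "filter", "execution mode: filter or fail2ban-relay")
	flag.Parse()

	raw, err := ioutil.ReadFile(*configFile)
	if err != nil {
		log.Fatal(err)
	}
	var cfg Config
	if err := json.Unmarshal(raw, &cfg); err != nil {
		log.Fatal(err)
	}

	// Dispatch to the handler for the selected mode.
	switch *mode {
	case "filter":
		log.Printf("would start filter mode on port %s", cfg.ListenPort)
	case "fail2ban-relay":
		log.Printf("would start relay mode on port %s", cfg.ListenPort)
	default:
		log.Fatalf("unknown mode: %s", *mode)
	}
}
```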

## `filter.go`
`filter.go` implements Filter mode.
* Operation
  * An HTTP request received by Patroneos is run through 5 successive checks.
  * If the request is valid, Patroneos forwards it to the API endpoint that provides the EOS RPC API and returns the result to the client.
  * If the request is invalid, Patroneos immediately responds with HTTP 400 Bad Request and terminates the connection.
  * If a valid request is forwarded to the API endpoint but the endpoint does not return HTTP status 200, a `TRANSACTION_FAILED` error is generated and the connection is terminated.
  * The filter result is sent as an HTTP request to the `/patroneos/fail2ban-relay` endpoint of a Patroneos instance running in Relay mode, which records the log; the connection is then terminated.
* 5 validation checks (a simplified sketch of this sequential flow is shown after this list)
  * `validateJSON()`: verifies that the JSON received as the body of the HTTP POST is valid. On failure, an `INVALID_JSON` error occurs and the request is filtered.
  * `validateMaxTransactions()`: verifies that the number of transactions in a JSON array does not exceed the maximum. If it does, a `TOO_MANY_TRANSACTIONS` error occurs and the request is filtered.
  * `validateTransactionSize()`: verifies that the size of the transaction does not exceed the maximum. If it does, an `INVALID_TRANSACTION_SIZE` error occurs and the request is filtered.
  * `validateMaxSignatures()`: verifies that the number of signatures in the transaction does not exceed the maximum. If it does, an `INVALID_NUMBER_SIGNATURES` error occurs and the request is filtered.
  * `validateContract()`: verifies that the transaction does not call a blacklisted contract action. If it does, a `BLACKLISTED_CONTRACT` error occurs and the request is filtered.
  * ~~Of the checks above, `validateTransactionSize()`, `validateMaxSignatures()` and `validateContract()` assume that the JSON of the HTTP request body is an object. However, `push_transactions` of the HTTP Chain API uses a JSON array, which is treated as a `PARSING_ERROR`. **We reported this in [Patroneos Issue #26](https://github.com/EOSIO/patroneos/issues/26). If the issue is resolved before this post becomes unchangeable, this post will be updated.**~~
* Logging and interaction with Relay mode
  * For every valid HTTP request
    * If a Patroneos running in Relay mode is configured, the processing log is sent to it over HTTP.
    * If no Patroneos is operating in Relay mode, the log is written to the log file.
  * For every invalid HTTP request
    * If a Patroneos running in Relay mode is configured, the processing log is sent to it over HTTP.
    * If no Patroneos is operating in Relay mode, the log is written to the log file.
    * Regardless of the Relay mode setting, HTTP 400 Bad Request is returned to the HTTP client.
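As a simplified sketch of the flow described above (illustrative only, not the actual `filter.go` code; the function names and signatures are our assumptions), the checks can be expressed as a chain that runs in sequence and rejects the request on the first failure:

```
package main

import (
	"io/ioutil"
	"log"
	"net/http"
)

// validator inspects the request body and returns an error code such as
// "INVALID_JSON" when the check fails, or an empty string when it passes.
type validator func(body []byte) (errorCode string)

// filterHandler runs the checks in series; the first failure answers
// HTTP 400 Bad Request, otherwise the request would be forwarded on.
func filterHandler(checks []validator) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		body, err := ioutil.ReadAll(r.Body)
		if err != nil {
			http.Error(w, "INVALID_JSON", http.StatusBadRequest)
			return
		}
		for _, check := range checks {
			if code := check(body); code != "" {
				http.Error(w, code, http.StatusBadRequest)
				return
			}
		}
		// All checks passed: forward the request to the API endpoint here
		// and relay the result back to the client.
	}
}

func main() {
	// Placeholder check: require a non-empty body.
	nonEmpty := func(body []byte) string {
		if len(body) == 0 {
			return "INVALID_JSON"
		}
		return ""
	}
	http.Handle("/", filterHandler([]validator{nonEmpty}))
	log.Fatal(http.ListenAndServe(":8081", nil))
}
```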

## `fail2ban-relay.go`
`fail2ban-relay.go` implements Relay mode.
* It is very simple: it writes the messages received on `/patroneos/fail2ban-relay` to a log file for `fail2ban` to scan. (A minimal sketch follows.)
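A minimal sketch of this behaviour (illustrative only, not the actual `fail2ban-relay.go`; the port and log format are assumptions based on the sample configuration later in this post):

```
package main

import (
	"io/ioutil"
	"log"
	"net/http"
	"os"
)

func main() {
	// Append received messages to the file that fail2ban is configured to scan.
	f, err := os.OpenFile("./fail2ban.log", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
	if err != nil {
		log.Fatal(err)
	}
	fileLogger := log.New(f, "", log.LstdFlags)

	http.HandleFunc("/patroneos/fail2ban-relay", func(w http.ResponseWriter, r *http.Request) {
		body, err := ioutil.ReadAll(r.Body)
		if err != nil {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		fileLogger.Printf("%s", body)
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```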

# Review

## Goroutine
Patroneos is written in Go, the programming language Google created in 2009. Go provides goroutines as its concurrency mechanism. A goroutine is a lightweight thread managed by the Go runtime: when you call a function with the `go` keyword, the runtime executes it concurrently, time-sliced within the same memory address space.

A Go program can also run in parallel across multiple CPUs or cores. The `runtime.GOMAXPROCS()` function sets the number of logical cores the program may use, and since Go 1.5 the default is all logical cores of the machine. Functions launched as goroutines are therefore processed in parallel on a multicore machine.

As of June 1, 2018, a typical installation provides Go 1.10.2: Ubuntu 18.04 LTS (via apt) and macOS High Sierra 10.13.4 (via brew) both install golang 1.10.2, while CentOS 7.5 installs 1.9.4.

**Consequently, when goroutines are used with a recent Go release, the process automatically runs in parallel on a multicore machine.**
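A small example of this behaviour (the values printed depend on the machine):

```
package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	// Since Go 1.5, GOMAXPROCS defaults to the number of logical cores,
	// so goroutines are scheduled across all of them automatically.
	fmt.Println("logical cores:", runtime.NumCPU())
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0)) // 0 only queries the current value

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(id int) { // each call runs concurrently in the same address space
			defer wg.Done()
			fmt.Println("goroutine", id)
		}(i)
	}
	wg.Wait()
}
```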

## HTTP request processing step: HTTP `ServeMux` & `ListenAndServe`

In the `Serve()` function of `net/http`, each accepted connection is handed off to its own goroutine (`go c.serve(...)`), so the part of Patroneos that receives HTTP requests is already processed in parallel on multicore machines.

`http.ListenAndServe` uses a default `http.Server` with no timeouts, and at the time of this analysis Patroneos uses that `Server` without any timeout settings. Without an appropriate timeout in front of Patroneos, a client that opens an HTTP connection and then sends no data will cause Patroneos to wait indefinitely.

**Therefore, we recommend an architecture in which 1) Patroneos does not receive client requests directly, and 2) the HTTP requests that Patroneos does receive are subject to a timeout.**
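As a minimal sketch of the second point (illustrative values only; this is not a patch to Patroneos), an explicit `http.Server` allows the timeouts that `http.ListenAndServe` leaves unset:

```
package main

import (
	"log"
	"net/http"
	"time"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})

	// Unlike http.ListenAndServe, an explicit http.Server bounds how long a
	// slow or silent client may hold a connection. The values are examples.
	srv := &http.Server{
		Addr:              ":8081",
		Handler:           mux,
		ReadHeaderTimeout: 5 * time.Second,
		ReadTimeout:       10 * time.Second,
		WriteTimeout:      10 * time.Second,
		IdleTimeout:       60 * time.Second,
	}
	log.Fatal(srv.ListenAndServe())
}
```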

## Request Validation step

The 5 validations are not launched with the `go` keyword (as goroutines); within a single request they run in series, not in parallel. The validation logic is simple enough that it does not need to be parallelized.

Since the handler registered through `ServeMux.HandleFunc` processes the request after the HTTP body has been received from the client, the validation logic itself does not have a timeout issue.

When a relay is used, a timeout could in principle occur, but as described below this does not appear to be a significant problem.

## Validation Result Relay and API Endpoint forwarding step: HTTP `Client`

Many references confirm that Go's `http.Client` is safe for concurrent use from multiple goroutines.

The HTTP `Client` has no timeout unless one is set explicitly. At the time of this analysis, Patroneos uses the default HTTP `Client` both to send filter results to the Relay Patroneos and to relay requests to the API endpoint; in neither case is a timeout set.

If no appropriate timeout applies when Patroneos forwards a validated request to the API endpoint, Patroneos will wait indefinitely for a response.

**Therefore, it is recommended that the API endpoint set an appropriate timeout.**

Let RP denote Patroneos running in Relay mode and FP denote Patroneos running in Filter mode. **If FP does not use an RP**, there is no issue. **Now assume FP is configured to use an RP.** If the RP is down or responds normally, there is still no issue. Because the RP's logic is so simple, it is very rare for its response to be delayed except in abnormal situations such as exhausted file descriptors. It would be a bigger problem if the RP's port were occupied by a TCP/HTTP server that accepts connections but never responds, but this case is also very rare. Therefore, when FP uses an RP, timeout-related issues are expected to be very rare.

# Benchmark

The focus was on measuring the processing capacity of Patroneos itself, so we used simple HTTP requests and a simple API endpoint. This should be understood as a laboratory benchmark.

The tests use two JSON payloads of different sizes and two concurrency levels (100 or 1,000 simultaneous HTTP requests). To see what happens when the API endpoint has processing latency, the tests also use two latency settings (0 ms or 100 ms).

## Test Configuration

Below is the test configuration for the benchmark. In a production environment, the settings must of course be adjusted to the situation.

* System
  * OS: Ubuntu 18.04 LTS
  * CPU: Intel i7-6700 CPU @ 3.40GHz / 8 logical cores
  * Memory: 32GB
  * parameter
    * max open files: 500,000
    * net.ipv4.tcp_tw_reuse = 1
    * net.ipv4.ip_local_port_range = "10000 65000"
* HTTP request generation
  * hey (https://github.com/rakyll/hey)
  * Use only 1 core with `-cpus 1` option.
* Patroneos configuration and settings
  * filter patroneos
```
{
   "listenPort": "8081",

   "nodeosProtocol": "http",
   "nodeosUrl": "127.0.0.1",
   "nodeosPort": "8000",

   "contractBlackList": {
       "currency": true
   },
   "maxSignatures": 10,
   "maxTransactionSize": 1000000,

   "logEndpoints": ["http://127.0.0.1:8080"],
   "filterEndpoints": [],

   "logFileLocation": "./fail2ban.log"
}
```
  * relay patroneos
```
{
   "listenPort": "8080",

   "nodeosProtocol": "http",
   "nodeosUrl": "127.0.0.1",
   "nodeosPort": "8000",

   "contractBlackList": {
       "currency": true
   },
   "maxSignatures": 10,
   "maxTransactionSize": 1000000,

   "logEndpoints": [],
   "filterEndpoints": ["http://127.0.0.1:8081"],

   "logFileLocation": "./fail2ban.log"
}
```
* Dummy API endpoint written in Go: restricted to 1 core with `runtime.GOMAXPROCS(1)`
```
package main

import (
   "fmt"
   "log"
   "net/http"
   "time"
   "runtime"
   "io/ioutil"
)

func main() {
    // Restrict the dummy endpoint to a single core, as noted above.
    runtime.GOMAXPROCS(1)
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        if len(r.FormValue("case-two")) > 0 {
            // "case-two" requests are answered immediately, without reading the body.
            fmt.Println("case two")
        } else {
            // Otherwise simulate 100 ms of processing latency, then drain the body.
            time.Sleep(time.Millisecond * 100)
            b, err := ioutil.ReadAll(r.Body)
            if err != nil {
                log.Fatal(err)
            }
            fmt.Println(b)
            //fmt.Println("case one end")
        }
    })

    if err := http.ListenAndServe(":8000", nil); err != nil {
        log.Fatal(err)
    }
}
```

## Test methods and results

* JSON-A type for test request
```
{"account": "initb", "permission": "init", "authorization" active "}]," data ":" 000000000041934b000000008041934be803000000000000 "}
```
* JSON-B type for test request
```
{ "Id": "37df4598d37bb8fdbc440e31caae07906ac90fd3fd2cd060f2ca13e59e78781e", "signatures": [ "SIG_K1_K8ojKDxMnWy5Q3zAVQPwJANbEE2h9kStmPX4BorEGGQKCJXUYK62UiEYxGyQbaynraMX5WvzEFYaQqAf5Mdwu2yBf36HG7"], "compression": "none", "packed_context_free_data": "", "context_free_data": [], "packed_trx": "5f3b125b17c726f418ba000000000100a6823403ea3055000000572d3ccdcd010000000000ea305500000000a8ed32322e0000000000ea305590d5cc5865570da420a107000000000004454f53000000000d4a756e676c652046617563657400", " 0, "max_cpu_usage_ms": 0, "delay_sec": 0, "ref_block_num": 50967, "ref_block_prefix": 3122197542, "max_net_usage_words" "eosio.token", "name": "transfer", "authorization": [{"actor": "eosio", "permission": " "memo": "Jungle Faucet"}, "hex_data": "active"}, "data": {"from": "eosio" 0000000000ea305590d5cc5865570da420a1070000000004454f53000000000d4a756e676c6520466175636574 "}]," transaction_extensions ": [] }}
```
* Test #1
  * HTTP request
    * JSON-A type request
    * 100 concurrent requests, 100,000 total requests
  * Dummy endpoint
    * Response latency 0 ms
  * Result
    * Patroneos processing result: total elapsed time 15.07 seconds, average time per request 0.0149 seconds, 6635.07 TPS
    * Memory usage: filter mode 17.1MB, relay mode 11.8MB
    * CPU usage: max 15% per thread
* Test #2
  * HTTP request
    * JSON-A type request
    * 100 concurrent requests, 100,000 total requests
  * Dummy endpoint
    * Response latency 100 ms
  * Result
    * Patroneos processing result: total elapsed time 103.5748 seconds, average time per request 0.1033 seconds, 965.48 TPS
    * Memory usage: filter mode 17.1MB, relay mode 11.8MB
    * CPU usage: max 9.7% per thread
* Test #3
  * HTTP request
    * JSON-A type request
    * 1,000 concurrent requests, 500,000 total requests
  * Dummy endpoint
    * Response latency 0 ms
  * Result
    * Patroneos processing result: total elapsed time 69.79 seconds, average time per request 0.1060 seconds, 7163.58 TPS
    * Memory usage: filter mode 391.7MB, relay mode 23.7MB
    * CPU usage: max 26.9% per thread
* Test #4
  * HTTP request
    * JSON-A type request
    * 1,000 concurrent requests, 500,000 total requests
  * Dummy endpoint
    * Response latency 100 ms
  * Result
    * Patroneos processing result: total elapsed time 76.2931 seconds, average time per request 0.1486 seconds, 6553.67 TPS
    * Memory usage: filter mode 143.4MB, relay mode 20.0MB
    * CPU usage: max 25.0% per thread

* Test #5
  * HTTP request
    * JSON-B type request
    * 100 concurrent requests, 100,000 total requests
  * Dummy endpoint
    * Response latency 0 ms
  * Result
    * Patroneos processing result: total elapsed time 29.53 seconds, average time per request 0.0293 seconds, 3385.76 TPS
    * Memory usage: filter mode 17.MB, relay mode 11.7MB
    * CPU usage: max 7.6% per thread
* Test #6
  * HTTP request
    * JSON-B type request
    * 100 concurrent requests, 100,000 total requests
  * Dummy endpoint
    * Response latency 100 ms
  * Result
    * Patroneos processing result: total elapsed time 109.91 seconds, average time per request 0.1093 seconds, 909.80 TPS
    * Memory usage: filter mode 15.8MB, relay mode 11.0MB
    * CPU usage: max 5.3% per thread
* Test #7
  * HTTP request
    * JSON-B type request
    * 1,000 concurrent requests, 500,000 total requests
  * Dummy endpoint
    * Response latency 0 ms
  * Result
    * Patroneos processing result: total elapsed time 148.5215 seconds, average time per request 0.2930 seconds, 3366.51 TPS
    * Memory usage: filter mode 110.5MB, relay mode 14.3MB
    * CPU usage: max 9.2% per thread
* Test #8
  * HTTP request
    * JSON-B type request
    * 1,000 concurrent requests, 500,000 total requests
  * Dummy endpoint
    * Response latency 100 ms
  * Result
    * Patroneos processing result: total elapsed time 153.37 seconds, average time per request 0.3046 seconds, 3259.90 TPS
    * Memory usage: filter mode 103.0MB, relay mode 13.5MB
    * CPU usage: max 8.6% per thread


## Result summary

* Memory usage
  * Filter mode: < 400MB
  * Relay mode: < 24MB
* CPU usage: < 27% per thread (maximum observed: 26.9%)
* File descriptors: no issues in the environment described above
* DNS resolution: only IP addresses are used in the configurations, so DNS lookups are minimized

# Recommendations and Conclusions

* Patroneos operating recommendations (for block producers)
  * In case Patroneos malfunctions, implement an architecture that allows the Patroneos layer to be bypassed immediately when necessary, and rehearse turning the bypass on and off.
  * Monitor CPU utilization, file descriptor errors, etc. to establish scale-out criteria for the Patroneos layer.
  * Implement an architecture in which Patroneos running in Filter mode does not receive client requests directly and receives HTTP requests subject to a timeout.
  * The API endpoint must set an appropriate timeout.
  * Be careful not to run an unrelated service on the TCP port used by Patroneos in Relay mode.
  * Use an IP address instead of an FQDN in the `nodeosUrl` setting, if possible, to minimize DNS resolution overhead.
  * Implement an architecture that allows enough file descriptors per Patroneos process and tune the necessary system parameters.
  * When fail2ban-relay is used, distinct binary names help operation and monitoring, e.g. patroneos-filter and patroneos-relay.
  * Enable TCP port reuse and a sufficiently large local port range.
  * An SSD is recommended for writing and rotating the fail2ban.log file.
  * ~~At the time of this analysis, a problem in handling the JSON arrays of `push_transactions` was found and reported in [Patroneos Issue #26](https://github.com/EOSIO/patroneos/issues/26). **You need to check the response to this issue.** Until it is resolved, you should bypass or reroute HTTP requests that use `push_transactions` at the URI level.~~
  * [Patroneos Issue #26](https://github.com/EOSIO/patroneos/issues/26) is resolved; `push_transactions` is now handled safely.
  * If a public API endpoint needs to be opened, set `access-control-allow-origin` to `*` in the `nodeos` config (see the example after this list).
    * If in doubt, see the same-origin policy & CORS section in the Reference list below.
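For reference, the corresponding line in the `nodeos` configuration (the option name is the one mentioned above; the surrounding file layout depends on your deployment):

```
# nodeos config.ini (http_plugin)
access-control-allow-origin = *
```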


* Patroneos implementation recommendations (to the Patroneos committers & Block.one)
  * We recommend an architecture that sets timeouts properly in the HTTP `Server` and `Client` configuration (a client-side sketch follows after this list).
    * HTTP `Server`: `ReadTimeout`, `ReadHeaderTimeout`, `WriteTimeout`
    * Timeout of the HTTP `Client`
    * Use an HTTP `Transport` for the HTTP `Client`
    * Tune the `Transport` parameters: `MaxIdleConnsPerHost`, `MaxIdleConns`, `IdleConnTimeout`, `ResponseHeaderTimeout`, `net.Dialer.Timeout`
    * For more information, see the following section of the Reference list: Go `net/http` implementation recommendations
  * ~~Resolve [Patroneos Issue #26](https://github.com/EOSIO/patroneos/issues/26)~~ -- **FIXED**
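A minimal sketch of the client-side settings listed above (values are illustrative and should be tuned per deployment; this is not a patch to Patroneos):

```
package main

import (
	"net"
	"net/http"
	"time"
)

// newHTTPClient returns an http.Client with the timeout and Transport
// parameters mentioned above set to example values.
func newHTTPClient() *http.Client {
	transport := &http.Transport{
		DialContext: (&net.Dialer{
			Timeout: 5 * time.Second, // net.Dialer.Timeout: TCP connect timeout
		}).DialContext,
		MaxIdleConns:          100,
		MaxIdleConnsPerHost:   100,
		IdleConnTimeout:       90 * time.Second,
		ResponseHeaderTimeout: 10 * time.Second,
	}
	return &http.Client{
		Transport: transport,
		Timeout:   15 * time.Second, // overall per-request timeout, including the body
	}
}

func main() {
	client := newHTTPClient()
	_ = client // use client.Do / client.Post where the filter talks to the relay and API endpoint
}
```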

* **Conclusion**
  * **If you follow the recommendations above, we do not expect significant functional issues when running Patroneos in a live production environment at the time of writing.**
    * ~~Note: until [Patroneos Issue #26](https://github.com/EOSIO/patroneos/issues/26) is resolved, it is necessary either to change the URL route or to bypass HTTP requests that use `push_transactions` in the architecture.~~
    * **Add timeout settings and make the necessary changes to the Patroneos code so that the server and client timeouts can be tuned appropriately.**


# Reference
* [patroneos github](https://github.com/EOSIO/patroneos)
* [Concurrency in Go](https://www.golang-book.com/books/intro/10)
* [fail2ban](https://www.fail2ban.org/wiki/index.php/Main_Page)
* Go `net/http` docs and source codes
  * https://golang.org/pkg/net/http/
  * https://github.com/golang/go/blob/master/src/net/http/server.go
  * https://github.com/golang/go/blob/master/src/net/http/client.go
* Go `net/http` implementation recommendations
  * [So you want to expose Go on the Internet](https://blog.cloudflare.com/exposing-go-on-the-internet/)
  * [The complete guide to Go net/http timeouts](https://blog.cloudflare.com/the-complete-guide-to-golang-net-http-timeouts/)
  * [Do not use Go's default HTTP client (in production)](https://medium.com/@nate510/don-t-use-go-s-default-http-client-4804cb19f779)
* [Patroneos Issue #26](https://github.com/EOSIO/patroneos/issues/26) -- **FIXED**
* same-origin policy & CORS
  * https://www.w3.org/Security/wiki/Same_Origin_Policy
  * https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS


# Any Feedback
Suggestions and questions are always welcome. Please do not hesitate to give feedback to EOSeoul. Join the Telegram groups below to share the latest news from EOSeoul and technical discussions about EOS.

Thank you!

EOSeoul

Telegram (English) : http://t.me/eoseoul_en
Telegram (简体中文) : http://t.me/eoseoul_cn
Telegram (日本語) : http://t.me/eoseoul_jp
Telegram (General Talk, 한국어) : https://t.me/eoseoul
Telegram (Developer Talk, 한국어) : https://t.me/eoseoul_testnet
Steemit : https://steemit.com/@eoseoul
Github : https://github.com/eoseoul
Twitter : https://twitter.com/eoseoul_kor
Facebook : https://www.facebook.com/EOSeoul.kr
Wechat account: neoply
EOSeoul Documentations : https://github.com/eoseoul/docs