Bug on Hivemind’s following data

·@emrebeyler·
#### Project Information

* Repository: https://github.com/steemit/hivemind
* Project Name: Hivemind
* Publisher: Steemit inc.
* Related issue at Github: https://github.com/steemit/hivemind/issues/191

#### Problem

The Hivemind-backed `api.steemit.com` reports invalid or missing following data for some accounts when compared to a full node.

#### How to reproduce


1. Query the user `curbot`'s following list. (`condenser_api.get_following`)

```
curl -s --data '{"jsonrpc":"2.0", "method":"condenser_api.get_following", "params":["curbot",null,"blog",100], "id":1}' https://api.steemit.com
```

2. Do the same query on a full node: (https://rpc.usesteem.com)

```
curl -s --data '{"jsonrpc":"2.0", "method":"condenser_api.get_following", "params":["curbot",null,"blog",100], "id":1}' https://rpc.usesteem.com
```

The response from `api.steemit.com` differs from the full node's response and is incomplete.
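The same comparison can be scripted. Below is a minimal sketch using only the Python standard library. The helper names (`build_request`, `following_names`, `fetch`, `compare_following`, `main`) are illustrative and not part of any Steem library, and the sketch assumes that every entry in the `result` array carries a `following` field.

```python
import json
import urllib.request

HIVEMIND_NODE = "https://api.steemit.com"
FULL_NODE = "https://rpc.usesteem.com"


def build_request(account, follow_type="blog", limit=100):
    """Build the same JSON-RPC body as the curl commands above."""
    return {
        "jsonrpc": "2.0",
        "method": "condenser_api.get_following",
        "params": [account, None, follow_type, limit],
        "id": 1,
    }


def following_names(response):
    """Extract followed account names from a decoded JSON-RPC response.

    Assumption: every entry in "result" has a "following" key.
    """
    return {entry["following"] for entry in response.get("result", [])}


def fetch(node, payload):
    """POST the payload to a node and return the decoded response."""
    request = urllib.request.Request(
        node,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as reply:
        return json.loads(reply.read().decode("utf-8"))


def compare_following(hivemind_response, full_node_response):
    """Return (missing_on_hivemind, missing_on_full_node) as sets."""
    on_hivemind = following_names(hivemind_response)
    on_full_node = following_names(full_node_response)
    return on_full_node - on_hivemind, on_hivemind - on_full_node


def main(account="curbot"):
    """Query both nodes and print the names that only one of them reports."""
    payload = build_request(account)
    missing, extra = compare_following(
        fetch(HIVEMIND_NODE, payload), fetch(FULL_NODE, payload))
    print("Missing on api.steemit.com:", sorted(missing))
    print("Missing on rpc.usesteem.com:", sorted(extra))
```

Calling `main()` performs the two network requests. The comparison itself is kept in `compare_following` so it can be checked against saved responses without touching either node.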


#### A Python script to detect discrepancies

I believe this is not an isolated case. I have seen more discrepancies like this while testing and benchmarking [tower's new endpoints](https://steemit.com/utopian-io/@emrebeyler/new-version-on-tower-hivemind-rest).

This Python script detects discrepancies in follower lists.

```
from steem import Steem
from steem.account import Account

HIVEMIND_NODE = "https://api.steemit.com"
FULL_NODE = "https://rpc.usesteem.com"


def get_followers(account, node):
    """Return the set of followers of `account` as reported by `node`."""
    return set(
        Account(account, steemd_instance=Steem(nodes=[node])).get_followers())


def get_diff(account):
    """Print the follower names that only one of the two nodes reports."""
    followers_on_hivemind = get_followers(account, HIVEMIND_NODE)
    followers_on_full_node = get_followers(account, FULL_NODE)

    print(
        "Accounts listed on api.steemit.com but not in the rpc.usesteem.com")
    print(followers_on_hivemind.difference(followers_on_full_node))
    print("*" * 42)
    print(
        "Accounts listed on rpc.usesteem.com but not in the api.steemit.com")
    print(followers_on_full_node.difference(followers_on_hivemind))


if __name__ == "__main__":
    get_diff("emrebeyler")

```
***

The result for `@emrebeyler`'s followers:

```
Accounts listed on api.steemit.com but not in the rpc.usesteem.com
set()
******************************************
Accounts listed on rpc.usesteem.com but not in the api.steemit.com
{'hariyati.amin', 'curbot', 'kenzyobiadi', 'erhanbute'}
```
***

After some digging, I found a rare case involving a differently formatted custom JSON.

For example, I checked `curbot`'s account history for the moment it followed my account and found this transaction:

[Transaction ID: aaccccb73b6dfcb4bbf95f6d2dcb76e1c87137e9](https://steemd.com/b/25992870#aaccccb73b6dfcb4bbf95f6d2dcb76e1c87137e9)

It looks like `curbot` was bundling several follow operations into one transaction, and steemd picked them up and registered them as valid follow actions.

However, Hive's indexer ignores the `custom_json` op if the length of the loaded JSON is greater than 2.

https://github.com/steemit/hivemind/blob/f7a467921678d928a0d94928c811442b8ab80bce/hive/indexer/custom_op.py#L55

In this case the length is greater than 2 because the payload looks like this:

```
[
    ['follow', {
        'follower': 'curbot',
        'following': 'kevinwong',
        'what': ['blog']
    }],
    ['follow', {
        'follower': 'curbot',
        'following': 'nothingismagick',
        'what': ['blog']
    }],
    ['follow', {
        'follower': 'curbot',
        'following': 'simnrodrguez',
        'what': ['blog']
    }],
    ['follow', {
        'follower': 'curbot',
        'following': 'steem-ua',
        'what': ['blog']
    }],
    ['follow', {
        'follower': 'curbot',
        'following': 'decentraland',
        'what': ['blog']
    }],
    ['follow', {
        'follower': 'curbot',
        'following': 'mikepm74',
        'what': ['blog']
    }],
    ['follow', {
        'follower': 'curbot',
        'following': 'empath',
        'what': ['blog']
    }],
    ['follow', {
        'follower': 'curbot',
        'following': 'emrebeyler',
        'what': ['blog']
    }],
    ['follow', {
        'follower': 'curbot',
        'following': 'eroche',
        'what': ['blog']
    }],
    ['follow', {
        'follower': 'curbot',
        'following': 'ervinneb',
        'what': ['blog']
    }]
]
```
***
This explains `curbot`. 
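To make the difference between the payload shapes concrete, here is a minimal sketch of a tolerant parser. It is an illustration under stated assumptions, not hivemind's code: the function name `extract_follow_ops` is hypothetical, and accepting the bundled shape is a design choice the sketch makes in order to mirror what steemd reportedly accepts.

```python
import json


def extract_follow_ops(raw_json):
    """Return the follow payload dicts found in a custom_json body.

    Handles three shapes seen in this post (the handling itself is an
    assumption, not hivemind's actual behaviour):
      * legacy:  {"follower": ..., "following": ..., "what": [...]}
      * single:  ["follow", {...}]
      * bundled: [["follow", {...}], ["follow", {...}], ...]
    """
    try:
        payload = json.loads(raw_json)
    except ValueError:
        return []

    if isinstance(payload, dict):
        return [payload]
    if not isinstance(payload, list) or not payload:
        return []

    # A single op starts with the command name; a bundle starts with a list.
    candidates = [payload] if isinstance(payload[0], str) else payload

    ops = []
    for item in candidates:
        if (isinstance(item, list) and len(item) == 2
                and item[0] == "follow" and isinstance(item[1], dict)):
            ops.append(item[1])
    return ops


bundled = json.dumps([
    ["follow", {"follower": "curbot", "following": "kevinwong", "what": ["blog"]}],
    ["follow", {"follower": "curbot", "following": "emrebeyler", "what": ["blog"]}],
])
print([op["following"] for op in extract_follow_ops(bundled)])
# prints ['kevinwong', 'emrebeyler']
```

A length check on the outer list alone cannot distinguish a malformed single op from a bundle, which is why the sketch inspects the type of the first element instead.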

Regarding my other 3 missing followers:

| Follower      | Following  | Tx ID                                    | Block num | Timestamp           |
|---------------|------------|------------------------------------------|-----------|---------------------|
| erhanbute     | emrebeyler | d10dcd1bdb661fc4e63f2464fa2262624db5d003 | 26710986  | 2018-10-11T09:55:21 |
| kenzyobiadi   | emrebeyler | 9ef235eb36aac5e466b97ad3e459b7eb9495f898 | 26492393  | 2018-10-03T19:38:45 |
| hariyati.amin | emrebeyler | 383a36f7aa65724eb634ebdae141366674dc1df8 | 26450469  | 2018-10-02T08:41:33 |
***
The timestamps show that these follows happened between `2018-10-02` and `2018-10-11`. The transactions don't involve anything unusual.

Additionally, I checked `roadscape`'s followers on Steem and got these discrepancies:

```
{'curbot', 'kamvreto', 'msutyler'}
```
***

We already know the problem with `curbot`, so I checked the other accounts.

`kamvreto` followed `roadscape` at `2016-07-25T22:35:12`.

Here is the account history output:

```
{
    'trx_id': '2b7595b1f3e0e0105156d518b83d7eeaa19b6070',
    'block': 3514062,
    'trx_in_block': 3,
    'op_in_trx': 0,
    'virtual_op': 0,
    'timestamp': '2016-07-25T22:35:12',
    'op': ['custom_json', {
        'required_auths': [],
        'required_posting_auths': ['kamvreto'],
        'id': 'follow',
        'json': '{"follower":"kamvreto","following":"roadscape","what":["posts","blog"]}'
    }]
}
```
***

It was a **legacy** `custom_json` transaction. The tricky part is that the transaction's `what` property includes two elements.

The Follow constructor expects a single element:

https://github.com/steemit/hivemind/blob/60dc61ee4bbde2080421a3fdf10c5b83be840e8b/hive/indexer/follow.py#L71

For this reason, Hive ignores this operation as well.
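One possible way to tolerate the legacy `what` value is to collapse the whole list into a single follow state instead of requiring exactly one element. The sketch below is an assumption about how this could be done, not hivemind's code. The function name `follow_state`, the state labels, and the precedence of `ignore` over `blog` are all hypothetical.

```python
def follow_state(what):
    """Collapse a `what` list into one state label.

    Assumed rules (hypothetical, for illustration only):
      * []                 -> "none"   (unfollow / reset)
      * contains "ignore"  -> "ignore" (mute)
      * contains "blog"    -> "blog"   (follow), even next to "posts"
      * anything else      -> None     (unrecognized, skip the op)
    """
    if not isinstance(what, list):
        return None
    if not what:
        return "none"
    if "ignore" in what:
        return "ignore"
    if "blog" in what:
        return "blog"
    return None


print(follow_state(["posts", "blog"]))  # prints blog
print(follow_state(["blog"]))           # prints blog
print(follow_state([]))                 # prints none
```

Under these rules the two-element legacy value and the current single-element value resolve to the same state, so neither follow would be dropped.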

The same problem applies to `roadscape`'s other missing follower:

```
{
    'trx_id': 'c7694ff17ba7ba3fbe1740f05c2727ecbd98cd62',
    'block': 3409232,
    'trx_in_block': 1,
    'op_in_trx': 0,
    'virtual_op': 0,
    'timestamp': '2016-07-22T06:18:27',
    'op': ['custom_json', {
        'required_auths': [],
        'required_posting_auths': ['msutyler'],
        'id': 'follow',
        'json': '{"follower":"msutyler","following":"roadscape","what":["posts","blog"]}'
    }]
}
```
***

Expanding the sample size:

Discrepancies on `@utopian-io`'s followers:

```
Accounts listed on rpc.usesteem.com but not in the api.steemit.com
{'qawazd', 'steemgems', 'curbot'}
```
***

| Follower  | Following  | Tx ID                                    | Block num | Timestamp           |
|-----------|------------|------------------------------------------|-----------|---------------------|
| steemgems | utopian-io | 25e9c3d8e625e634b68bd5e16e99327fd37174ae | 26722368  | 2018-10-11T19:25:27 |
| qawazd    | utopian-io | 8de43899a8ad84b8bd65a896e71e3e0eafda0757 | 26838941  | 2018-10-15T20:37:51 |
***

These follow operations are valid. Their dates, `2018-10-11` and `2018-10-15`, are close to those of the follows missing from @emrebeyler's account.
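To see how tightly these unexplained cases cluster, the rows from the two tables in this post can be reduced to a block range and a time range. This is a minimal sketch: the data is copied from the tables, and the helper name `affected_window` is hypothetical.

```python
from datetime import datetime

# (follower, following, block_num, timestamp) copied from the tables in this post.
MISSING_FOLLOWS = [
    ("erhanbute", "emrebeyler", 26710986, "2018-10-11T09:55:21"),
    ("kenzyobiadi", "emrebeyler", 26492393, "2018-10-03T19:38:45"),
    ("hariyati.amin", "emrebeyler", 26450469, "2018-10-02T08:41:33"),
    ("steemgems", "utopian-io", 26722368, "2018-10-11T19:25:27"),
    ("qawazd", "utopian-io", 26838941, "2018-10-15T20:37:51"),
]


def affected_window(rows):
    """Return ((first_block, last_block), (first_time, last_time))."""
    blocks = [row[2] for row in rows]
    times = [datetime.strptime(row[3], "%Y-%m-%dT%H:%M:%S") for row in rows]
    return (min(blocks), max(blocks)), (min(times), max(times))


block_range, time_range = affected_window(MISSING_FOLLOWS)
print(block_range)                                  # prints (26450469, 26838941)
print(time_range[0].date(), time_range[1].date())  # prints 2018-10-02 2018-10-15
```

All five operations fall within roughly two weeks of October 2018, which might help narrow down where to look in the indexer's history.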

#### TL;DR

- Some follow ops are missing on api.steemit.com's Hive instance. (They are generally clustered around `2018-10`.)

- Hive ignores a follow operation if it bundles multiple follows, although steemd accepts it. (This is the case with @curbot.)

- Hive ignores some legacy follow operations because they may include two elements in the `what` property. (Example: `["posts", "blog"]`)

#### My GitHub Account
https://github.com/emre