Part 21: Use Multi Threading To Analyse The Steem Blockchain In Parallel
utopian-io·@steempytutorials·
This tutorial is part of a series where different aspects of programming with `steem-python` are explained. Links to the other tutorials can be found in the curriculum section below. This part is a direct continuation of [Part 19: Analysing The Steem Blockchain From A Custom Block Number For A Custom Block Count](https://steemit.com/utopian-io/@steempytutorials/part-19-analysing-the-steem-blockchain-for-a-custom-block-number-for-a-custom-block-count). Where the previous part focused on how to access blocks on the `Steem Blockchain` one by one, this part looks at how this process can be parallelised.

---

#### What will I learn

- Which data is suitable for parallelisation
- Divide work between threads
- How does a thread class work
- Create threads
- Prevent data corruption with a queue and lock
- Merge data received from threads

#### Requirements

- Python3.6
- `steem-python`

#### Difficulty

- Intermediate

---

### Tutorial

#### Setup
Download the file from [Github](https://github.com/amosbastian/steempy-tutorials/tree/master/part_21). There is 1 file, `multi_threaded.py`, which contains the code. The file takes 2 arguments from the command line, which set the amount of `blocks` to analyse and how many `threads` to use.

Run the script as follows:
`> python multi_threaded.py 1000 8`

#### Which data is suitable for parallelisation?
`Parallelisation` is a great way to improve efficiency. Keep in mind that in CPython the global interpreter lock prevents threads from running Python code on several CPU cores at once, so the gain here mainly comes from threads overlapping the time they spend waiting on the network while blocks are fetched. However, not all data is equal. The best data for `parallelisation` consists of parts that are independent of each other and can be combined in any order without altering the outcome.

In this case we will be analysing blocks from the `Steem Blockchain` and counting how many times each `operation` is used. For this example it does not matter at which `block` the counting is started, or in which sequence the blocks are counted. As long as all `blocks` are counted, the end result will be the same. Therefore, this is perfect for `parallelisation`.
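The order-independence described above can be checked with a small sketch. It does not touch the blockchain: the `blocks` list and the `count_operations` helper are made up for illustration and are not part of `multi_threaded.py`. It shows that counting chunks separately, in any order, and merging the results gives the same totals as counting everything in one pass.

```python
from collections import Counter
import random

# Hypothetical stand-in for blocks: each "block" is a list of operation names.
blocks = [
    ["vote", "comment"],
    ["vote", "transfer", "vote"],
    ["custom_json"],
    ["comment", "vote"],
]


def count_operations(chunk):
    """Count how often each operation occurs in a list of blocks."""
    counts = Counter()
    for block in chunk:
        counts.update(block)
    return counts


# Count everything in one sequential pass.
sequential = count_operations(blocks)

# Shuffle the block order, count two chunks separately, then merge.
shuffled = blocks[:]
random.shuffle(shuffled)
merged = count_operations(shuffled[:2]) + count_operations(shuffled[2:])

# The totals are identical regardless of order or chunking.
print(sequential == merged)
```

Data where the result depends on the order of processing, such as a running account balance, would not pass this kind of check and is a poor fit for this approach.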
#### Divide work between threads
For optimal performance, work has to be divided equally. The amount of work per thread `n` can be calculated by taking the total `block_count` and dividing it by the `amount_of_threads`. For this to work `n` has to be a whole number, so choose the `block_count` and `amount_of_threads` accordingly.

```python
import sys

block_count = int(sys.argv[1])
amount_of_threads = int(sys.argv[2])

# Integer division: block_count should be a multiple of amount_of_threads
n = block_count // amount_of_threads
```

Each thread has its own `start_block` and `end_block`. Because the start block itself is also counted, 1 has to be subtracted from `n` when calculating the `end_block`; otherwise the ranges of consecutive threads would overlap.

```python
start = initial_value  # first block to analyse
for thread_id in range(amount_of_threads):
    start_block = start
    end_block = start + n - 1  # inclusive, hence the -1
    start = start + n          # the next thread starts right after
```

#### How does a thread class work
To make threads, a subclass of `threading.Thread` has to be created, consisting of an `__init__` part and a `run()` function. The `__init__` section contains all unique and shared variables that the thread requires.

```python
import threading


class myThread(threading.Thread):
    def __init__(self, thread_id, start_block, end_block, n,
                 blockchain, workQueue, queueLock):
        threading.Thread.__init__(self)
        # Unique to this thread
        self.thread_id = thread_id
        self.start_block = start_block
        self.end_block = end_block
        self.n = n
        self.blockchain = blockchain
        self.stream = self.blockchain.stream_from(start_block=start_block,
                                                  end_block=end_block)
        self.current_block = self.start_block
        # Shared between all threads
        self.workQueue = workQueue
        self.queueLock = queueLock
        print(self.thread_id, self.start_block, self.end_block, '\n')
```

The `run()` function contains the work the thread has to do. It is called automatically when the thread is started.

```python
def run(self):
    data = {}
    for post in self.stream:
        if post['block'] != self.current_block:
            # Do stuff
            pass
```

#### Create threads
Create a list for the `threads`. Create each `thread` with its unique and shared variables. Start the thread and append it to the list.
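Before wiring this up to the blockchain, the pattern can be tried in isolation. The following is a minimal sketch with a dummy workload: `RangeThread` is a hypothetical stand-in for `myThread` that only counts the block numbers in its range instead of streaming them, and the starting block is simply borrowed from the sample run as an example value.

```python
import threading


class RangeThread(threading.Thread):
    """Hypothetical stand-in for myThread: counts blocks instead of streaming them."""

    def __init__(self, thread_id, start_block, end_block):
        super().__init__()
        self.thread_id = thread_id
        self.start_block = start_block
        self.end_block = end_block
        self.blocks_seen = 0

    def run(self):
        # Called automatically by start(); the range is inclusive on both ends.
        for _ in range(self.start_block, self.end_block + 1):
            self.blocks_seen += 1


block_count, amount_of_threads = 1000, 8
n = block_count // amount_of_threads
start = 19512130  # example value only

threads = []
for x in range(amount_of_threads):
    thread = RangeThread(x, start, start + n - 1)
    thread.start()
    threads.append(thread)
    start += n

# Wait for every thread before reading its result.
for t in threads:
    t.join()

total = sum(t.blocks_seen for t in threads)
print(total)
```

Each block number is visited exactly once, which is a quick way to confirm that the `start + n - 1` arithmetic leaves no gaps and no overlap. In the script itself, the same steps look like this: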
```python
threads = []
for x in range(0, amount_of_threads):
    thread = myThread(x, start, start + n - 1, n, blockchain, workQueue, queueLock)
    thread.start()
    threads.append(thread)
    start = start + n
```

#### Prevent data corruption with a queue and lock
The code is set up in such a way that each thread does all of its own computations. When it is done, it adds its data to a `queue` for the main thread to retrieve. Since it is possible that multiple threads finish at the same time, a locking mechanism is used to prevent data corruption. Python's `queue.Queue` already does its own locking internally, so the explicit lock here acts as an extra safeguard.

```python
import queue
import threading

# variables
queueLock = threading.Lock()
workQueue = queue.Queue(amount_of_threads)

# locking/unlocking sequence, inside the thread's run()
self.queueLock.acquire()
self.workQueue.put(data)
self.queueLock.release()
```

#### Merge data received from threads
The main thread waits for all the threads to finish and return their work.

```python
# wait for threads
for t in threads:
    t.join()
```

Now it can retrieve all the finished work from the `queue` and merge it together.

```python
merged_data = {}
while not workQueue.empty():
    data = workQueue.get()
    for key in data:
        if key not in merged_data:
            merged_data[key] = data[key]
        else:
            merged_data[key] += data[key]
```

#### Running the script
Running the script will analyse the set amount of `blocks` back in time from the current `head block`. It will divide the `blocks` over the `amount_of_threads` set and print out each `thread` and the work this `thread` has to do. During the process each thread updates its current progress. At the end the merged data is printed. Test for yourself, with different values for `block_count` and `amount_of_threads`, how much of a difference multi threading yields for this type of work.

```
python multi_threaded.py 1000 8
0 19512130 19512254
1 19512255 19512379
2 19512380 19512504
3 19512505 19512629
4 19512630 19512754
5 19512755 19512879
6 19512880 19513004
7 19513005 19513129
...
Thread 2 is at block 19512445/19512504 51.20%
Thread 0 is at block 19512199/19512254 54.40%
Thread 3 is at block 19512573/19512629 53.60%
Thread 4 is at block 19512694/19512754 50.40%
Thread 7 is at block 19513071/19513129 52.00%
Thread 6 is at block 19512936/19513004 44.00%
Thread 5 is at block 19512822/19512879 52.80%
Thread 1 is at block 19512321/19512379 52.00%
...
'custom_json': 17688, 'claim_reward_balance': 1569, 'vote': 23629, 'comment': 9053, 'transfer_to_vesting': 82, 'comment_options': 1213, 'limit_order_create': 126, 'fill_order': 74, 'return_vesting_delegation': 675, 'producer_reward': 1000, 'curation_reward': 4428, 'author_reward': 1687, 'transfer': 1615, 'comment_benefactor_reward': 335, 'fill_vesting_withdraw': 82, 'account_update': 327, 'account_create_with_delegation': 43, 'delete_comment': 66, 'fill_transfer_from_savings': 6, 'feed_publish': 98, 'account_witness_vote': 38, 'account_witness_proxy': 5, 'transfer_to_savings': 6, 'account_create': 4, 'limit_order_cancel': 29, 'delegate_vesting_shares': 10, 'withdraw_vesting': 17, 'transfer_from_savings': 3, 'cancel_transfer_from_savings': 2, 'witness_update': 1}
```

#### Curriculum
##### Set up:
- [Part 0: How To Install Steem-python, The Official Steem Library For Python](https://utopian.io/utopian-io/@amosbastian/how-to-install-steem-python-the-official-steem-library-for-python)
- [Part 1: How To Configure The Steempy CLI Wallet And Upvote An Article With Steem-Python](https://utopian.io/utopian-io/@steempytutorials/part-1-how-to-configure-the-steempy-cli-wallet-and-upvote-an-article-with-steem-python)

##### Filtering
- [Part 2: How To Stream And Filter The Blockchain Using Steem-Python](https://utopian.io/utopian-io/@steempytutorials/part-2-how-to-stream-and-filter-the-blockchain-using-steem-python)
- [Part 6: How To Automatically Reply To Mentions Using Steem-Python](https://utopian.io/utopian-io/@steempytutorials/part-6-how-to-automatically-reply-to-mentions-using-steem-python)

##### Voting
- [Part 3: Creating A Dynamic Autovoter That Runs 24/7](https://utopian.io/utopian-io/@steempytutorials/part-3-creating-a-dynamic-upvote-bot-that-runs-24-7-first-weekly-challenge-3-steem-prize-pool)
- [Part 4: How To Follow A Voting Trail Using Steem-Python](https://utopian.io/utopian-io/@steempytutorials/part-4-how-to-follow-a-voting-trail-using-steem-python)
- [Part 8: How To Create Your Own Upvote Bot Using Steem-Python](https://utopian.io/utopian-io/@steempytutorials/part-8-how-to-create-your-own-upvote-bot-using-steem-python)

##### Posting
- [Part 5: Post An Article Directly To The Steem Blockchain And Automatically Buy Upvotes From Upvote Bots](https://utopian.io/utopian-io/@steempytutorials/part-5-post-an-article-directly-to-the-steem-blockchain-and-automatically-buy-upvotes-from-upvote-bots)
- [Part 7: How To Schedule Posts And Manually Upvote Posts For A Variable Voting Weight With Steem-Python](https://utopian.io/utopian-io/@steempytutorials/part-7-how-to-schedule-posts-and-manually-upvote-posts-for-a-variable-voting-weight-with-steem-python)

##### Constructing
- [Part 10: Use Urls To Retrieve Post Data And Construct A Dynamic Post With Steem-Python](https://utopian.io/utopian-io/@steempytutorials/part-10-use-urls-to-retrieve-post-data-and-construct-a-dynamic-post-with-steem-python)

##### Rewards
- [Part 9: How To Calculate A Post's Total Rewards Using Steem-Python](https://utopian.io/utopian-io/@steempytutorials/how-to-calculate-a-post-s-total-rewards-using-steem-python)
- [Part 12: How To Estimate Curation Rewards Using Steem-Python](https://utopian.io/utopian-io/@steempytutorials/part-12-how-to-estimate-curation-rewards)
- [Part 14: How To Estimate All Rewards In Last N Days Using Steem-Python](https://utopian.io/utopian-io/@steempytutorials/how-to-estimate-all-rewards-in-last-n-days-using-steem-python)
- [Part 20: Plotting Account's Total Generated Post Rewards Since Creation](https://steemit.com/utopian-io/@steempytutorials/part-20-plotting-account-s-total-generated-post-rewards-since-creation)

##### Transfers
- [Part 11: How To Build A List Of Transfers And Broadcast These In One Transaction With Steem-Python](https://utopian.io/utopian-io/@steempytutorials/part-11-how-to-build-a-list-of-transfers-and-broadcast-these-in-one-transaction-with-steem-python)
- [Part 13: Upvote Posts In Batches Based On Current Voting Power With Steem-Python](https://utopian.io/utopian-io/@steempytutorials/part-13-upvote-posts-in-batches-based-on-current-voting-power-with-steem-python)

##### Analysis
- [Part 15: How To Check If An Account Is Following Back And Retrieve Mutual Followers/Following Between Two Accounts](https://utopian.io/utopian-io/@steempytutorials/part-15-how-to-check-if-an-account-is-following-back-and-retrieve-mutual-followers-following-between-two-accounts)
- [Part 16: How To Analyse A User's Vote History In A Specific Time Period Using Steem-Python](https://steemit.com/utopian-io/@steempytutorials/part-16-how-to-analyse-a-user-s-vote-history-in-a-specific-time-period-using-steem-python)
- [Part 18: How To Analyse An Account's Resteemers Using Steem-Python](https://steemit.com/utopian-io/@steempytutorials/part-18-how-to-analyse-an-account-s-resteemers)
- [Part 19: Analysing The Steem Blockchain From A Custom Block Number For A Custom Block Count](http://utopian.io/utopian-io/@steempytutorials/part-19-analysing-the-steem-blockchain-for-a-custom-block-number-for-a-custom-block-count)

---

The code for this tutorial can be found on [GitHub](https://github.com/amosbastian/steempy-tutorials/tree/master/part_21)!

This tutorial was written by @juliank in conjunction with @amosbastian.

<br /><hr/><em>Posted on <a href="https://utopian.io/utopian-io/@steempytutorials/part-21-use-multi-threading-to-analyse-the-steem-blockchain-in-parallel">Utopian.io - Rewarding Open Source Contributors</a></em><hr/>