Distributed Data Processing - Freeing the Power of Information
Hello team! I'd like to share with you a short snippet from a project I am working on for BOINC/GRC. This is an early draft and should be treated as such. I welcome all input, criticism, corrections to grammar, content, and spelling, and above all, open and ego-free discussion on the subjects of:

- data collection and processing
- distributed processing
- BOINC

If you would like to chat privately, please feel free to PM me on [slack](https://teamgridcoin.signup.team/).

---------------

<center><h2>Why Data?</h2></center>

An increased quality of living in a society often coincides with an increase in that society’s ability to gather and process data. One of the earliest examples of data collection and processing is likely related to farming, the technology which allowed humans to settle. Seeds were strewn about, crops grew, people collected data regarding the quantity and quality of the harvest under varying conditions, and people processed that data to inform their future actions. To interact with data, humans use the technologies which sparked humanity’s data revolution: Language and Calculation.

As a result of this propensity for gathering and processing data, communities were able to form. Then came towns, cities, empires, and nations, and society continued to collect magnitudes upon magnitudes of data. Which way do the winds blow? How do the seasons cycle? How do groups of people act and react? How do the gods act and react? Even the movement of the stars was analyzed and processed over thousands of years.

Tools, from intricate machinery to the most detailed epics, were developed and improved humanity's ability to communicate and calculate. These tools were refined over time, and the human capacity for data collection and processing has only increased. The song, the story, the epic poem. Paper, the printing press, the internet. Addition, division, the abacus, the number 0. This human proficiency is what has given us our current understanding of our universe, our world, and ourselves. The quest to build upon this proficiency has driven countless major technological and sociological advancements.

<center><h2>[The Z1](https://en.wikipedia.org/wiki/Z1_%28computer%29)</h2></center>

Humans relied on Language and Calculation to collect and process data for millennia. Automated binary processing changed everything.

Binary is a language that can be understood by humans. However, the human condition (mind, body, soul if you believe in that sort of thing) makes binary far less efficient for inter-human interaction than our current language structures; it is impractical to use binary for communication between humans. The opposite is true when binary is input to a machine instead of a human. It makes sense: why would instruments of logic use human language, a tool rife with context, metaphor, and idiosyncrasies? Languages and calculations were developed in ways that help human minds process data which human bodies collect. Computers do the work of the human body and mind in [ways we can’t even fathom](https://www.fastcodesign.com/90132632/ai-is-inventing-its-own-perfect-languages-should-we-let-it) and at rates we’ve only just begun to explore. In the 80 years since the Z1, computers have been collecting exponentially more data.
Soil pH, particle density, climate trends, political will, social response, medical success rates, molecular results, particle composition, astronomical calculations, genomic structures, everything -- it could be said that the post-binary world is an infinite cacophony of data points touching every conceivable subject. It could also be said that, for the most part, humans and our extraordinary minds have tried and failed to process this data.

When binary processing was developed, it provided the communication tool but lacked the resources for calculation. Humans had the calculations, but lacked a language suited to such large datasets.

Centralized supercomputers, built, operated, and owned by individuals or organizations, have been the primary processing tools for binary calculation. There have been some [insane](https://en.wikipedia.org/wiki/Atlas_%28computer%29) [super](https://en.wikipedia.org/wiki/Cray-1) [computers](https://www.nextplatform.com/2016/06/20/look-inside-chinas-chart-topping-new-supercomputer/) in our time, but access to their utility is hard to come by even for respected institutions, let alone the average scientist, never mind a kid working on a project. In other words, access to a tool critical to improving civilization is controlled by a limited number of well-connected and wealthy individuals or organizations. This is not a bad thing, but simply how technology develops. Only the Egyptian elite kept records of society’s food supply. Only monks knew how to read and write. Only the government and universities had access to the early internet. Consider: Language and Calculation were not, and often still are not, taught to all people, but where they are, the standard of living of a society is, in general, greatly improved.

<center><h2>[Distributed Processing](https://en.wikipedia.org/wiki/Distributed_computing#History)</h2></center>

Distributed processing takes a large data set and splits it into manageable units. These units are distributed to multiple processors which are all connected by a unified network. The units are processed by the nodes of this network, and their results are returned to the network and delivered to the original data host (or handled however the protocol of the network dictates). The host (or network) uses these results to formulate an answer to the best of its ability; a minimal sketch of this flow follows below.

Essentially, individuals or organizations offer their available processing power to individuals or organizations which have parsable data. This network of individuals and organizations creates an entity analogous to an enormous supercomputer, and it could be centralized or otherwise. If structured to prioritize volunteer-oriented infrastructure which inhibits the utilization of task-specific processors, the network would disintegrate the resource barrier inherent to rent-based supercomputing. Under a decentralized system, everyone from first-year computer science students to top-level researchers would have access to the world’s most powerful supercomputer. Consider: by some estimates, the Bitcoin network is 50,000 times more powerful than the top 500 supercomputers in the world combined, and it is only growing.
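As a rough illustration of the split/distribute/aggregate flow described above, here is a minimal, single-machine sketch in Python using a local process pool. On a volunteer network such as BOINC, each unit would instead travel to a different machine and the protocol would handle validation and credit, but the shape of the computation is the same. The dataset and the per-unit computation here are placeholders, not any real project's workload.

```python
from multiprocessing import Pool

def split_into_units(dataset, unit_size):
    """Split the host's large data set into manageable work units."""
    return [dataset[i:i + unit_size] for i in range(0, len(dataset), unit_size)]

def process_unit(unit):
    """Placeholder for whatever computation a node performs on its unit
    (folding one protein, checking one number range, etc.)."""
    return sum(x * x for x in unit)

if __name__ == "__main__":
    dataset = list(range(1_000_000))            # the host's large data set
    units = split_into_units(dataset, 10_000)   # manageable work units

    # "Distribute" the units to a pool of local processes; on a volunteer
    # network each unit would be sent to a different machine instead.
    with Pool() as pool:
        results = pool.map(process_unit, units)

    # The host aggregates the returned results into a final answer.
    print(sum(results))
```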
<center><h2>[BOINC (The Berkeley Open Infrastructure for Network Computing)](https://boinc.berkeley.edu/)</h2></center>

BOINC is an open-source, volunteer-based distributed computing platform which provides scientists and enthusiasts with a means to host data that needs to be processed. BOINC has been operating since 2002 and has processed, and continues to process, data that helps map the Milky Way, detect asteroids, find prime numbers, fold proteins, test chemical and molecular combinations, search for extraterrestrial intelligence (SETI), and more.

BOINC projects can be created at no cost by anyone who has generated or gathered parsable data and has formatted that data for the BOINC network. This data can be anything: scientific, mathematical, social, political, anything. It can be processed, at only the cost of electricity, by anyone with a computer connected to the BOINC network, from a cell phone to a server.

This structure of volunteer hosting and processing distributes the power of information among the proletariat instead of solely among those who possess the resources to process large volumes of data. It is similar to creating a public school system which prioritizes teaching reading, writing, and mathematics. Further, BOINC is designed in a way which limits the advantage of task-specific processors (ASIC machines and GPU farms). Instead, it encourages the utilization of idle processors, such as a personal computer while its user is asleep or at work. This model has the potential to scale limitlessly as the internet of things seeps further into digital cultures.
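To make the idle-processor idea concrete, here is a deliberately simplified, hypothetical sketch in Python. It is not BOINC's actual client logic, which weighs user input, battery state, and per-project preferences; this version just skips work whenever the machine's load average suggests the owner needs it. The fetch/process/report functions are placeholders, and `os.getloadavg()` is Unix-only.

```python
import os
import time

def machine_is_idle(threshold=0.25):
    """Crude idleness check: 1-minute load average per core below a threshold.
    Unix-only; only an illustration of the idea, not the real client's policy."""
    one_minute_load, _, _ = os.getloadavg()
    return one_minute_load / os.cpu_count() < threshold

def fetch_unit():
    """Placeholder: a real client would download a work unit from a project."""
    return range(10_000)

def process_unit(unit):
    """Placeholder computation standing in for the project's science code."""
    return sum(x * x for x in unit)

def report_result(result):
    """Placeholder: a real client would upload the result to the project."""
    print("result:", result)

if __name__ == "__main__":
    for _ in range(3):                  # a real client loops indefinitely
        if machine_is_idle():
            report_result(process_unit(fetch_unit()))
        else:
            time.sleep(60)              # back off while the owner needs the machine
```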