Learn Python Series (#4) - Round-Up #1
utopian-io·@scipio·
# Learn Python Series (#4) - Round-Up #1

#### What Will I Learn?
- You will learn how to combine essential Python language mechanisms and the built-in string methods to program your own self-defined, real-life and useful functions.
- In the code examples I'll only be using what I've covered in the previous `Learn Python Series` episodes.

#### Requirements
- A working modern computer running macOS, Windows or Ubuntu
- An installed Python 3(.6) distribution, such as (for example) the Anaconda Distribution
- The ambition to learn Python programming

#### Difficulty
Intermediate

#### Tutorial Contents
A full description of the topics of this video tutorial, plus the contents of the tutorial itself.

#### Curriculum (of the `Learn Python Series`):
- [Learn Python Series - Intro](https://utopian.io/utopian-io/@scipio/learn-python-series-intro)
- [Learn Python Series (#2) - Handling Strings Part 1](https://utopian.io/utopian-io/@scipio/learn-python-series-2-handling-strings-part-1)
- [Learn Python Series (#3) - Handling Strings Part 2](https://utopian.io/utopian-io/@scipio/learn-python-series-3-handling-strings-part-2)

# Learn Python Series (#4) - Round-Up #1
This is the first **Round-up** episode within the `Learn Python Series`, in which I will show you how to build interesting things using just the mechanisms that were already covered in the previous `Learn Python Series` episodes. Of course, as the series progresses, each tutorial episode adds more tools to our tool belt, so to keep things organized I'll try to use mostly what was covered in the last few episodes.

### Getting creative with strings
Programming is a creative task. Depending on the complexity of what you want to build, you first need to have a fairly clear idea of how to achieve your goal - a working program - and while you're coding you oftentimes run into problems (or "puzzles") that need to be solved.
In order to become a proficient programmer, in Python and in any programming language, it's very important that you **enjoy** trying to solve those "puzzles". The more "tools" you have on your "tool belt", the more complex the puzzles you're able to solve. To get better at programming, I think it's also important to keep pushing your limits: get out of your comfort zone and expand your horizons!

Up until now, in the previously released `Handling Strings` tutorials, we've been discussing the usage of individual string methods. But of course we can combine their individual strengths to create self-defined functions that do exactly what we want them to do! That's actually the beauty of the Python programming language: we can pick (and import!) individual "tools", just the tools we need per project / script, use them as "building blocks", and then create even better tools or more advanced "building blocks" for our own purposes.

**Disclaimer:** _The following two "mini-projects" cover how to program self-defined, somewhat useful, string handling functions. I'm not stating these are the best, let alone only, ways to program them. The goal is to show, to the reader / aspiring Python programmer, that only understanding what was covered already is enough to program interesting and useful code with!_

# Mini project `parse_url()`
In case you want to program a web crawler, to fetch unstructured data from web pages if an API providing structured JSON data is missing, or in case you want to build and run a full-fledged search engine, you need to handle URLs. URLs come in many forms, but still have components that are characteristic of any URL. In order to properly use URLs, you need to "parse" them and "split" them into their components.

Let's see how to develop a `parse_url()` function that splits several URL components and returns them as a tuple. We're looking to return these components:
* protocol or scheme (e.g. `https://`),
* host (which could be an IP address or something like `www.google.com`),
* the domain name (e.g. `steemit.com`),
* the Top Level Domain, or TLD (e.g. `.com`),
* the subdomain (e.g. `staging` in `https://staging.utopian.io`),
* and the file path (e.g. `index.php?page=321`)

**PS: The explanations are put inside the code as `# comments`!**

```python
def parse_url(url):

    # First we initiate the variables we want
    # the function to return, set all to an empty string.
    scheme = host = subdomain = tld = domain = path = ''

    # ---------------------------------------------------
    # -1- Identify and, if applicable, isolate the scheme
    # ---------------------------------------------------
    needle = '://'
    if needle in url:
        scheme_index = url.find(needle)
        scheme = url[:scheme_index + len(needle)]

        # Slice the scheme from the url
        url = url[len(scheme):]

    # ---------------------------------------------------
    # -2- Identify and, if applicable, isolate
    #     the file path from the host
    # ---------------------------------------------------
    needle = '/'
    if needle in url:
        # Split the host from the file path.
        host, path = url.split(sep=needle, maxsplit=1)
    else:
        # The remaining url is the host
        host = url

    # ---------------------------------------------------
    # -3- Check if the host is an IP address or if it
    #     contains a domain
    # ---------------------------------------------------
    # Remove the dots from the host
    needle = '.'
    no_dots = host.replace(needle, '')

    if not no_dots.isdigit():
        # The host contains a domain, so continue

        # ---------------------------------------------------
        # -4- Identify and isolate the tld
        # ---------------------------------------------------
        num_dots = host.count(needle)

        # --- NB1: ---
        # When num_dots == 0 , the string wasn't a url! ;-)
        # But let's just assume for now the string is a valid url.
        if num_dots == 1:
            # The host does not contain a subdomain
            domain = host
            tld = host[host.find(needle)+1:]

        elif num_dots > 1:
            # The host might contain a subdomain

            # --- NB2: ---
            # In order to distinguish between a host containing
            # one or more subdomains, and a host containing a 3rd
            # or higher level tld, or both, we need a list
            # that contains all tlds.
            #
            # That list seems to be here ...
            #
            # https://publicsuffix.org/list/public_suffix_list.dat
            #
            # ... but we haven't covered yet how to fetch
            # data from the web.
            #
            # So for now, let's just create a list containing
            # some 3rd level tlds, and just assume it is complete.
            all_3rdlevel_tlds = ['co.uk', 'gov.au', 'com.ar']

            for each_tld in all_3rdlevel_tlds:
                # Only match the tld at the very end of the host;
                # a plain `in` test would also match 'co.uk'
                # somewhere in the middle of the host.
                if host.endswith(needle + each_tld):
                    # Apparently the tld in the url is a 3rd level tld
                    tld = each_tld
                    break
            # ---------------------------------------------------
            # PS: Notice that this `else` belongs to the `for`
            # and not the `if` ! It only runs when the `for`
            # exhausted but did not break.
            # ---------------------------------------------------
            else:
                tld = host[host.rfind(needle)+1:]

            # ---------------------------------------------------
            # -5- Identify and, if applicable, isolate
            #     the subdomain from the domain
            # ---------------------------------------------------
            # Cut the tld (and the dot in front of it) off the end
            # of the host. Slicing from the end avoids cutting at
            # an earlier occurrence of the same characters.
            host_without_tld = host[:-(len(tld) + 1)]
            num_dots = host_without_tld.count(needle)

            if num_dots == 0:
                # The host doesn't contain a subdomain
                domain = host_without_tld + needle + tld
            else:
                # The host contains a subdomain
                subdomain_index = host_without_tld.rfind('.')
                subdomain = host_without_tld[:subdomain_index]
                domain = host[subdomain_index+1:]

    return scheme, host, subdomain, domain, tld, path

# Let's test the function on several test urls!
test_urls = [
    'https://www.steemit.com/@scipio/recent-replies',
    'https://steemit.com/@scipio/recent-replies',
    'http://www.londonlibrary.co.uk/index.html',
    'http://londonlibrary.co.uk/index.html',
    'https://subdomains.on.google.com/',
    'https://81.123.45.2/index.php'
]

# And finally call the parse_url() function,
# and print its returned output!
for url in test_urls:
    print(parse_url(url))

# YES! It works like a charm! ;-)
# ---------
# Output:
# ---------
# ('https://', 'www.steemit.com', 'www', 'steemit.com', 'com', '@scipio/recent-replies')
# ('https://', 'steemit.com', '', 'steemit.com', 'com', '@scipio/recent-replies')
# ('http://', 'www.londonlibrary.co.uk', 'www', 'londonlibrary.co.uk', 'co.uk', 'index.html')
# ('http://', 'londonlibrary.co.uk', '', 'londonlibrary.co.uk', 'co.uk', 'index.html')
# ('https://', 'subdomains.on.google.com', 'subdomains.on', 'google.com', 'com', '')
# ('https://', '81.123.45.2', '', '', '', 'index.php')
```

    ('https://', 'www.steemit.com', 'www', 'steemit.com', 'com', '@scipio/recent-replies')
    ('https://', 'steemit.com', '', 'steemit.com', 'com', '@scipio/recent-replies')
    ('http://', 'www.londonlibrary.co.uk', 'www', 'londonlibrary.co.uk', 'co.uk', 'index.html')
    ('http://', 'londonlibrary.co.uk', '', 'londonlibrary.co.uk', 'co.uk', 'index.html')
    ('https://', 'subdomains.on.google.com', 'subdomains.on', 'google.com', 'com', '')
    ('https://', '81.123.45.2', '', '', '', 'index.php')

# Mini project `encode_gibberish()` and `decode_gibberish()`
Remember my hidden message that was contained in the "Gibberish string", covered in [Handling Strings Part 1](https://utopian.io/utopian-io/@scipio/learn-python-series-2-handling-strings-part-1)? As a brief reminder, we used a negative stride of `-3` on a string in which the message was stored in reverse, hidden within a bunch of nonsense.
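To see what a negative stride does on its own, here is a minimal sketch on a much shorter string. The sample string and the variable names are made up for this illustration and do not come from the earlier episode.

```python
# A short sample string (illustrative only): the letters a, b, c, d
# are separated by two filler characters each.
text = "a--b--c--d"

# A step of -1 walks the string from the last character to the first.
reversed_text = text[::-1]

# A step of -3 also starts at the last character, but keeps only
# every third character on the way back.
every_third_backwards = text[::-3]

print(reversed_text)
print(every_third_backwards)
```

The first `print()` shows the whole string reversed, the second one only the letters `d`, `c`, `b` and `a`, because the two filler characters between them are skipped. The gibberish example from Part 1 applied the same idea to a longer string.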
This was the code:

```python
gibberish_msg = """!3*oJ6iFupOGiF6cNFSHU 6dmVhoKUrTvfHi KteBrgHvaIgsX$snTeIgmV0 HvnYGembdJRd*&i$6h&6 &5a*h BGsF@iGv NhsIgiYdh67T"""
print(gibberish_msg[::-3])
# This is a hidden message from Scipio!
```

    This is a hidden message from Scipio!

Now as the second "mini-project" for this Round-Up, let's learn how to program a function `encode_gibberish()` to **encode** a gibberish string from a message, and another one, `decode_gibberish()`, to reveal / **decode** the hidden message contained inside the gibberish!

**PS: The explanations are put inside the code as `# comments`!**

```python
def encode_gibberish(message, stride=1):

    # Let's use a mixed-up `chars` list containing lower-case letters,
    # upper-case letters, digits, and some other characters,
    # all found on a regular keyboard.
    chars = ['x', '-', 'G', 'H', 'l', 'a', '{', 'r', 2, ']', ';',
             'F', 'E', 'A', 'V', ')', '$', '?', '/', 'i', 'M', 'p',
             9, 'C', 'w', 'k', '}', ':', '_', '%', 'D', 'I', 'b',
             'z', 'd', 6, 'N', 'L', 'c', '.', 1, 'X', 'h', 4, '!',
             'S', '~', 'u', '+', 'f', 'R', 8, 3, '&', '<', 'y', 'Z',
             'P', 'n', '^', 'J', 'q', 5, 'o', 'W', '*', 'Q', 7, 'B',
             'g', 'O', 'K', 'm', ',', 's', '>', 'T', '(', '#', 't',
             'j', 'e', 'Y', '@', '[', 'v', '=', 'U'
    ]

    # Initialize an index into the `chars` list
    chars_index = 0

    # Convert the message string to a list
    message = list(message)

    # Quick fix for negative strides:
    # if stride is negative, use the
    # absolute (positive) value
    abs_stride = stride * -1 if stride < 0 else stride

    # For all characters from the `message` list,
    # add characters from the `chars` list
    for index in range(len(message)):

        # Iterate over the `chars` list, and per
        # `message` character concatenate as many
        # characters as the `stride` argument
        salt = ''
        for i in range(abs_stride):
            salt += str(chars[chars_index])
            # Wrap around to the start of `chars`
            # once its last element has been used
            if chars_index == len(chars)-1:
                chars_index = 0
            else:
                chars_index += 1

        message[index] = message[index] + salt

    # Convert back to string
    message = ''.join(message)

    # In case of a negative stride,
    # reverse the message
    if stride < 0:
        message = message[::-1]

    return message

def decode_gibberish(encoded_msg, stride=1):

    # Simply decode the encoded message using
    # the `stride` argument. A stride of 0 means
    # no characters were added, so the step must be 1.
    stride = stride + 1 if stride >= 0 else stride - 1
    return encoded_msg[::stride]

# Let's see if this works!
stride = -5
msg1 = "This is a very secret message that must be encoded at all cost. Because it's secret!"

# Encode, and decode
encoded_msg = encode_gibberish(msg1, stride)
decoded_msg = decode_gibberish(encoded_msg, stride)

# Print the encoded and decoded message strings
print(encoded_msg)
print(decoded_msg)
```

    7Q*Wo!5qJ^ntPZy<&e38Rf+ru~S!4chX1.ceLN6dzsbID%_ :}kwCs9pMi/'?$)VAtEF;]2ir{alH G-xU=ev[@Yesjt#(Tu>s,mKaOgB7Qc*Wo5qeJ^nPZBy<&38 Rf+u~.S!4hXt1.cLNs6dzbIoD%_:}ckwC9p Mi/?$l)VAEFl;]2r{aalHG- xU=v[t@Yejta#(T>s ,mKOgdB7Q*Weo5qJ^dnPZy<o&38Rfc+u~S!n4hX1.ecLN6d zbID%e_:}kwbC9pMi /?$)VtAEF;]s2r{aluHG-xUm=v[@Y ejt#(tT>s,maKOgB7hQ*Wo5tqJ^nP Zy<&3e8Rf+ug~S!4haX1.cLsN6dzbsID%_:e}kwC9mpMi/? $)VAEtF;]2re{alHGr-xU=vc[@Yejet#(T>ss,mKO gB7Q*yWo5qJr^nPZye<&38Rvf+u~S !4hX1a.cLN6 dzbIDs%_:}kiwC9pM i/?$)sVAEF;i]2r{ahlHG-xT
    This is a very secret message that must be encoded at all cost. Because it's secret!

# What did we learn, hopefully?
That, although we have so far covered only a few Python language mechanisms and haven't even used an `import` statement (which we will cover in the next `Learn Python Series` episode), we already have "the power" to program useful functions! We only needed 4 tutorial episodes for this, so let's find out just how much more we can learn in the next episodes! See you there!

### Thank you for your time!

<br /><hr/><em>Posted on <a href="https://utopian.io/utopian-io/@scipio/learn-python-series-4-round-up-1">Utopian.io - Rewarding Open Source Contributors</a></em><hr/>