Learn Python Series (#4) - Round-Up #1

utopian-io·@scipio·7 years ago

# Learn Python Series (#4) - Round-Up #1

![python_logo.png](https://res.cloudinary.com/hpiynhbhq/image/upload/v1520029119/rw3u19p3ap1c6hjcfa9w.png)

#### What Will I Learn?
- You will learn how to combine essential Python language mechanisms and the built-in string methods to program your own useful, real-life functions,
- In the code examples I'll only be using what I've covered in the previous `Learn Python Series` episodes.

#### Requirements
- A working modern computer running macOS, Windows or Ubuntu
- An installed Python 3(.6) distribution, such as the Anaconda Distribution
- The ambition to learn Python programming

#### Difficulty
Intermediate

#### Tutorial Contents
A full description of the topics of this tutorial, plus the contents of the tutorial itself.

#### Curriculum (of the `Learn Python Series`):
- [Learn Python Series - Intro](https://utopian.io/utopian-io/@scipio/learn-python-series-intro)
- [Learn Python Series (#2) - Handling Strings Part 1](https://utopian.io/utopian-io/@scipio/learn-python-series-2-handling-strings-part-1)
- [Learn Python Series (#3) - Handling Strings Part 2](https://utopian.io/utopian-io/@scipio/learn-python-series-3-handling-strings-part-2)

# Learn Python Series (#4) - Round-Up #1
This is the first **Round-up** episode within the `Learn Python Series`, in which I will show you how to build interesting things using just the mechanisms that were covered already in the previous `Learn Python Series` episodes.

Of course, as the series progresses, each tutorial episode adds more tools to our tool belt, so to keep things organized I'll try to mostly use what was covered in the last few episodes.

### Getting creative with strings
Programming is a creative task. Depending on the complexity of what you want to build, you first need a fairly clear idea of how to achieve your goal (a working program), and while you're coding you will often run into problems (or "puzzles") that need to be solved. To become a proficient programmer, in Python or in any other programming language, it's very important that you **enjoy** trying to solve those "puzzles". The more "tools" you have on your "tool belt", the more complex the puzzles you're able to solve. To get better at programming, I think it's also important to keep pushing your limits: get out of your comfort zone and expand your horizons!

Up until now, in the previously released `Handling Strings` tutorials, we've been discussing the usage of individual string methods. But of course we can combine their individual strengths to create self-defined functions that do exactly what we want them to do! That's actually the beauty of the Python programming language: we can pick (and import!) individual "tools", just the tools we need to use per project / script, use them as "building blocks", and then create even better tools or more advanced "building blocks" for our own purposes.

**Disclaimer:** _The following two "mini-projects" cover how to program self-defined, somewhat useful, string handling functions. I'm not stating these are the best, let alone only, ways to program them. The goal is to show, to the reader / aspiring Python programmer, that only understanding what was covered already is enough to program interesting and useful code with!_

# Mini project `parse_url()`
If you want to program a web crawler to fetch unstructured data from web pages when no API provides structured JSON data, or if you want to build and run a full-fledged search engine, you need to handle URLs. URLs come in many forms, but they still share components that are characteristic of any URL. To use URLs properly, you need to "parse" them, i.e. "split" them into those components.

Let's see how to develop a `parse_url()` function that splits a URL into several components and returns them as a tuple, in this order:

* the protocol or scheme (e.g. `https://`),
* the host (which could be an IP address or something like `www.google.com`),
* the subdomain (e.g. `staging` in `https://staging.utopian.io`),
* the domain name (e.g. `steemit.com`),
* the Top Level Domain, or TLD (e.g. `com` in `steemit.com`),
* and the file path (e.g. `index.php?page=321`)
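Before looking at the full function, here is a minimal sketch of the three string methods it leans on, applied by hand to a single example URL (the URL is chosen here for illustration only): `find()` locates the `://` separator, `split()` with `maxsplit=1` separates the host from the path, and `rfind()` locates the last dot in the host, which precedes the TLD.

```python
url = 'https://staging.utopian.io/index.php?page=321'

# find() returns the index of the FIRST occurrence of '://'
needle = '://'
scheme_end = url.find(needle) + len(needle)
scheme = url[:scheme_end]      # 'https://'
rest = url[scheme_end:]        # 'staging.utopian.io/index.php?page=321'

# maxsplit=1 splits only on the first '/', so any
# slashes further along in the path are preserved
host, path = rest.split('/', maxsplit=1)

# rfind() returns the index of the LAST '.', which precedes the TLD
tld = host[host.rfind('.') + 1:]

print(scheme)   # https://
print(host)     # staging.utopian.io
print(path)     # index.php?page=321
print(tld)      # io
```

The full `parse_url()` function below follows the same steps, but also handles URLs without a scheme or path, IP addresses, and subdomains.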

**PS: The explanations are put inside the code as `# comments`!**


```python
def parse_url(url):
    
    # First we initialize the variables we want
    # the function to return, and set them all to an empty string.
    
    scheme = host = subdomain = tld = domain = path = ''    
    
    # ---------------------------------------------------
    # -1- Identify and, if applicable, isolate the scheme
    # ---------------------------------------------------
    
    needle = '://'
    if needle in url:
        
        scheme_index = url.find(needle)
        scheme = url[:scheme_index + len(needle)]        
        
        # Slice the scheme from the url
        
        url = url[len(scheme):]
    
    # ---------------------------------------------------
    # -2- Identify and, if applicable, isolate 
    #     the file path from the host
    # ---------------------------------------------------
    
    needle = '/'
    if needle in url:
        
        # Split the host from the file path.
        
        host, path = url.split(sep=needle, maxsplit=1)
        
    else:
        
        # The remaining url is the host
        
        host = url
        
    # ---------------------------------------------------
    # -3- Check if the host is an IP address or if it
    #     contains a domain
    # ---------------------------------------------------

    # Remove the dots from the host
    
    needle = '.'
    no_dots = host.replace(needle, '')
    if not no_dots.isdigit():
        
        # The host contains a domain, so continue
    
        # ---------------------------------------------------
        # -4- Identify and isolate the tld
        # ---------------------------------------------------    

        num_dots = host.count(needle)

        # --- NB1: ---
        # When num_dots == 0, the string wasn't a url! ;-)
        # But let's just assume for now the string is a valid url.    

        if num_dots == 1:

            # The host does not contain a subdomain

            domain = host
            tld = host[host.find(needle)+1:]

        elif num_dots > 1:

            # The host might contain a subdomain

            # --- NB2: ---
            # In order to distinguish between a host containing
            # one or more subdomains, and a host containing a 3rd
            # or higher level tld, or both, we need a list 
            # that contains all tlds.
            #
            # That list seems to be here ...
            #
            # https://publicsuffix.org/list/public_suffix_list.dat
            #
            # ... but we haven't covered yet how to fetch 
            # data from the web.
            #
            # So for now, let's just create a list containing
            # some 3rd level tlds, and just assume it is complete.

            all_3rdlevel_tlds = ['co.uk', 'gov.au', 'com.ar']

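            for each_tld in all_3rdlevel_tlds:
                
                # Compare against the END of the host, so that e.g.
                # 'co.uk' inside 'co.uk.example.com' is not a match
                
                if host.endswith(needle + each_tld):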

                    # Apparently the tld in the url is a 3rd level tld

                    tld = each_tld                
                    break

            # ---------------------------------------------------
            # PS: Notice that this `else` belongs to the `for`
            #     and not the `if`! It only runs when the `for`
            #     loop completed without hitting `break`.
            # ---------------------------------------------------

            else:            
                tld = host[host.rfind(needle)+1:]            

            # ---------------------------------------------------
            # -5- Identify and, if applicable, isolate 
            #     the subdomain from the domain
            # ---------------------------------------------------  

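            # Slice '.' + tld off the END of the host. Using
            # host.find(tld) would locate its FIRST occurrence,
            # which goes wrong for hosts like 'www.commerce.com'
            
            host_without_tld = host[:-(len(tld) + 1)]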
            num_dots = host_without_tld.count(needle)

            if num_dots == 0:

                # The host doesn't contain a subdomain

                domain = host_without_tld + needle + tld

            else:

                # The host contains a subdomain

                subdomain_index = host_without_tld.rfind('.')
                subdomain = host_without_tld[:subdomain_index]
                domain = host[subdomain_index+1:]        

    return scheme, host, subdomain, domain, tld, path

# Let's test the function on several test urls!

test_urls = [
    'https://www.steemit.com/@scipio/recent-replies',
    'https://steemit.com/@scipio/recent-replies',
    'http://www.londonlibrary.co.uk/index.html',
    'http://londonlibrary.co.uk/index.html',
    'https://subdomains.on.google.com/',
    'https://81.123.45.2/index.php'
]

# And finally call the parse_url() function,
# and print its returned output!

for url in test_urls:
    print(parse_url(url))

# YES! It works like a charm! ;-)
# ---------
# Output:
# ---------
# ('https://', 'www.steemit.com', 'www', 'steemit.com', 'com', '@scipio/recent-replies')
# ('https://', 'steemit.com', '', 'steemit.com', 'com', '@scipio/recent-replies')
# ('http://', 'www.londonlibrary.co.uk', 'www', 'londonlibrary.co.uk', 'co.uk', 'index.html')
# ('http://', 'londonlibrary.co.uk', '', 'londonlibrary.co.uk', 'co.uk', 'index.html')
# ('https://', 'subdomains.on.google.com', 'subdomains.on', 'google.com', 'com', '')
# ('https://', '81.123.45.2', '', '', '', 'index.php')
```

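Step 3 of `parse_url()` treats a host as an IP address when it consists only of digits once the dots are removed. That also accepts strings like `1.2.3` or `999.1.1.1`, which are not valid IPv4 addresses. Below is a minimal sketch of a stricter check, still using only string methods; the helper name `looks_like_ipv4()` is hypothetical and not part of the original function.

```python
def looks_like_ipv4(host):
    """Return True if `host` is four dot-separated numbers from 0 to 255."""
    parts = host.split('.')
    
    # An IPv4 address has exactly four parts
    if len(parts) != 4:
        return False
    
    for part in parts:
        # isdecimal() rejects empty parts (as in '1..2.3') and
        # signs like '-1', and only accepts characters int() can parse
        if not part.isdecimal() or int(part) > 255:
            return False
    
    return True

for host in ['81.123.45.2', '1.2.3', '999.1.1.1', 'www.steemit.com']:
    print(host, looks_like_ipv4(host))

# 81.123.45.2 True
# 1.2.3 False
# 999.1.1.1 False
# www.steemit.com False
```

Once imports are on the table, the standard library's `ipaddress` module does this more thoroughly: `ipaddress.ip_address()` raises a `ValueError` for anything that is not a valid IPv4 or IPv6 address.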


# Mini project `encode_gibberish()` and `decode_gibberish()`
Remember my hidden message that was contained in the "Gibberish string", covered in [Handling Strings Part 1](https://utopian.io/utopian-io/@scipio/learn-python-series-2-handling-strings-part-1)? As a brief reminder: we sliced that string with a negative stride of `-3`, which reads it backwards and keeps only every third character, revealing the message hidden within a bunch of nonsense.
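To refresh how a negative stride behaves, here is a small example on a short string of letters (chosen purely for illustration): a step of `-1` reverses the string, and a step of `-3` also starts at the last character but only keeps every third one.

```python
text = 'abcdefghi'

# A step of -1 walks the whole string backwards
print(text[::-1])   # ihgfedcba

# A step of -3 also walks backwards, starting at the last
# character, but keeps only every third character
print(text[::-3])   # ifc
```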

This was the code:


```python
gibberish_msg = """!3*oJ6iFupOGiF6cNFSHU 6dmVhoKUrTvfHi 
                    KteBrgHvaIgsX$snTeIgmV0 HvnYGembdJRd*&i$6h&6 &5a*h BGsF@iGv NhsIgiYdh67T"""
print(gibberish_msg[::-3])
# This is a hidden message        from Scipio!
```



Now as the second "mini-project" for this Round-Up, let's learn how to program a function `encode_gibberish()` to **encode** a gibberish string from a message, and another one `decode_gibberish()` to reveal / **decode** the hidden message contained inside the gibberish!
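The idea behind both functions can be worked out by hand first. In this sketch the message `Hi` and the filler characters `ab` and `cd` are arbitrary examples: after each message character we add 2 filler characters, so the message characters end up 2 + 1 = 3 positions apart, and a slice with step `3` picks them out again. For a negative stride, the encoded string is additionally reversed, and a step of `-3` undoes both.

```python
# Encode by hand: after each message character, add 2 filler characters
encoded = 'H' + 'ab' + 'i' + 'cd'
print(encoded)                 # Habicd

# The message characters are now 2 + 1 = 3 positions apart
print(encoded[::3])            # Hi

# With a negative stride, the encoded string is also reversed;
# slicing with a step of -3 starts at the end and undoes both
reversed_encoded = encoded[::-1]
print(reversed_encoded)        # dcibaH
print(reversed_encoded[::-3])  # Hi
```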

**PS: The explanations are put inside the code as `# comments`!**


```python
def encode_gibberish(message, stride=1):
    
    # Let's use a mixed-up `chars` list containing lower-case letters,
    # upper-case letters, integers 0-9, and some other characters,
    # all found on a regular keyboard.
        
    chars = ['x', '-', 'G', 'H', 'l', 'a', '{', 'r', 2, ']', 
         ';', 'F', 'E', 'A', 'V', ')', '$', '?', '/', 
         'i', 'M', 'p', 9, 'C', 'w', 'k', '}', ':', 
         '_', '%', 'D', 'I', 'b', 'z', 'd', 6, 'N', 
         'L', 'c', '.', 1, 'X', 'h', 4, '!', 'S', '~', 
         'u', '+', 'f', 'R', 8, 3, '&', '<', 'y', 'Z', 
         'P', 'n', '^', 'J', 'q', 5, 'o', 'W', '*', 'Q', 
         7, 'B', 'g', 'O', 'K', 'm', ',', 's', '>', 
         'T', '(', '#', 't', 'j', 'e', 
         'Y', '@', '[', 'v', '=', 'U'
    ]
    
    # Initialize an index into the `chars` list
    
    chars_index = 0

    # Convert the message string to a list
    
    message = list(message)
    
    # A negative stride only means "reverse the result"
    # (handled at the end), so the number of filler
    # characters per message character is its absolute value
    
    abs_stride = abs(stride)
    
    # For all characters from the `message` list,
    # add characters from the `chars` list    
    
    for index in range(len(message)):
        
        # Iterate over the `chars` list, and per
        # `message` character concatenate as many
        # characters as the `stride` argument
        
        salt = ''        
        for i in range(abs_stride):
            salt += str(chars[chars_index])
            # Move to the next character, wrapping around
            # to the start of `chars` after the last one
            chars_index = (chars_index + 1) % len(chars)
        message[index] = message[index] + salt
    
    # Convert back to string
    message = ''.join(message)
 
    # In case of a negative stride, 
    # reverse the message    
    if stride < 0:
        message = message[::-1] 
    
    return message

def decode_gibberish(encoded_msg, stride=1):
    
    # Simply decode the encoded message using
    # the `stride` argument
    
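    # Message characters sit abs(stride) + 1 positions apart,
    # and a negative step also undoes the reversal. A stride
    # of 0 added no filler and no reversal, so it maps to step 1.
    
    stride = stride + 1 if stride >= 0 else stride - 1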
    return encoded_msg[::stride]

# Let's see if this works!
stride = -5
msg1 = "This is a very secret message that must be encoded at all cost. Because it's secret!"

# Encode, and decode
encoded_msg = encode_gibberish(msg1, stride)
decoded_msg = decode_gibberish(encoded_msg, stride)

# Print the encoded and decoded message strings
print(encoded_msg)
print(decoded_msg)
```

    7Q*Wo!5qJ^ntPZy<&e38Rf+ru~S!4chX1.ceLN6dzsbID%_ :}kwCs9pMi/'?$)VAtEF;]2ir{alH G-xU=ev[@Yesjt#(Tu>s,mKaOgB7Qc*Wo5qeJ^nPZBy<&38 Rf+u~.S!4hXt1.cLNs6dzbIoD%_:}ckwC9p Mi/?$l)VAEFl;]2r{aalHG- xU=v[t@Yejta#(T>s ,mKOgdB7Q*Weo5qJ^dnPZy<o&38Rfc+u~S!n4hX1.ecLN6d zbID%e_:}kwbC9pMi /?$)VtAEF;]s2r{aluHG-xUm=v[@Y ejt#(tT>s,maKOgB7hQ*Wo5tqJ^nP Zy<&3e8Rf+ug~S!4haX1.cLsN6dzbsID%_:e}kwC9mpMi/? $)VAEtF;]2re{alHGr-xU=vc[@Yejet#(T>ss,mKO gB7Q*yWo5qJr^nPZye<&38Rvf+u~S !4hX1a.cLN6 dzbIDs%_:}kiwC9pM i/?$)sVAEF;i]2r{ahlHG-xT
    This is a very secret message that must be encoded at all cost. Because it's secret!
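A single test message is a good start, but a quick loop over several strides checks the round trip more thoroughly. The sketch below is a compact, self-contained restatement of the same idea, not the functions above: the names `interleave()` and `recover()` are hypothetical, and a short fixed filler string `'xyz123'` stands in for the `chars` list. In this sketch a stride of `0` adds no filler and does not reverse the string, so it decodes with a step of `1`.

```python
def interleave(message, stride, filler='xyz123'):
    """Add abs(stride) filler characters after each message character,
    and reverse the result if stride is negative."""
    count = abs(stride)
    pieces = []
    filler_index = 0
    for char in message:
        salt = ''
        for _ in range(count):
            salt += filler[filler_index]
            # Wrap around to the start of the filler string
            filler_index = (filler_index + 1) % len(filler)
        pieces.append(char + salt)
    encoded = ''.join(pieces)
    return encoded[::-1] if stride < 0 else encoded


def recover(encoded, stride):
    """Undo interleave(): message characters are abs(stride) + 1 apart."""
    step = stride + 1 if stride >= 0 else stride - 1
    return encoded[::step]


message = "It's secret!"
for stride in [-5, -1, 0, 1, 3]:
    encoded = interleave(message, stride)
    print(stride, recover(encoded, stride) == message)
```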


# What did we learn, hopefully?

That, although we have covered only a few Python language mechanisms so far and haven't even used an `import` statement yet (which we will cover in the next `Learn Python Series` episode), we already have "the power" to program useful functions! We only needed 4 tutorial episodes for this, so let's find out just how much more we can learn in the next episodes! See you there!

### Thank you for your time!

<br /><hr/><em>Posted on <a href="https://utopian.io/utopian-io/@scipio/learn-python-series-4-round-up-1">Utopian.io -  Rewarding Open Source Contributors</a></em><hr/>
