NLP: A Simple Python program to analyze reviews/comments and classify the review as Negative or Positive :)
nlp·@b-reddi·
I was killing time with some NLP stuff recently and came across a few articles saying that machine learning has grown very powerful and will keep growing a thousandfold (because the corpus/training data is increasing day by day).

https://steemitimages.com/DQmSUoRFoCcvKYQcUSCzeotAQccS5ZsHucGoj4R2hRBxJut/NLP.jpg

Then I thought of trying a basic program that trains on a set of positive reviews and a set of negative reviews independently, and then testing it against a totally unknown set of reviews to check whether the algorithm can predict the type of review (+ive or -ive). So I came up with the code below and tested it .... To my surprise it works well, so well, sooo well. I mean, given the size of my training data and the outputs I tested randomly, the accuracy for Positive reviews was 92-100% and for Negative it was 89-99% ... yayyyy !!! :D

It was a good learning experience, so I am sharing my code below, just in case there are NLP-curious people around ^_^ :D

***********************************************************************
Feel free to reuse the code or suggest edits or tips (y)
***********************************************************************

    import re
    import math
    from collections import Counter

    inputFile = open("../sample.txt").readlines()  # Give unknown data here as input
    outputFile = open("../testdata.txt", 'w')  # Expect this file to be your output :)
    posReview = open("../hotelPosTive_train.txt", 'r').readlines()  # +ive training corpus. Download from internet :P
    negReview = open("../hotelNegTive_train.txt", 'r').readlines()  # -ive training corpus. Download from internet :P

    positiveWords = []  # all positive words from corpus
    negativeWords = []  # all negative words from corpus
    probPos = {}  # smoothed word likelihoods in +ive reviews
    probNeg = {}  # smoothed word likelihoods in -ive reviews

    for a1 in posReview:
        a1 = a1.strip().split()
        for b1 in range(1, len(a1)):  # skip token 0 (presumably the review ID, as in the input file)
            S = re.sub('[^A-Za-z0-9]+', '', a1[b1])  # remove all special characters from the word
            positiveWords.append(S)

    for a2 in negReview:
        a2 = a2.strip().split()
        for b2 in range(1, len(a2)):  # skip token 0 here as well
            S = re.sub('[^A-Za-z0-9]+', '', a2[b2])  # remove all special characters from the word
            negativeWords.append(S)

    # Note: "/" must be true division (Python 3); on Python 2 these priors would truncate to 0
    positivePriorprob = len(posReview) / (len(posReview) + len(negReview))  # prior probability for +ive reviews
    negativePriorprob = len(negReview) / (len(posReview) + len(negReview))  # prior probability for -ive reviews

    freqPositiveWords = Counter(positiveWords)  # +ive review words and their frequencies
    freqNegativeWords = Counter(negativeWords)  # -ive review words and their frequencies
    uniquePositiveWords = set(positiveWords)  # unique words in the +ive review corpus
    uniqueNegativeWords = set(negativeWords)  # unique words in the -ive review corpus
    uniqueAllWords = set(positiveWords + negativeWords)  # total unique words in the corpus


    def nBayesAlgorithm(review):
        # Clean the test words the same way as the training words, so "good," matches "good"
        review = [re.sub('[^A-Za-z0-9]+', '', w) for w in review]
        probReviewPos = 0
        probReviewNeg = 0
        for a3 in review:
            # add-one (Laplace) smoothed likelihoods, so unseen counts never give log(0)
            probPos[a3] = (freqPositiveWords[a3] + 1) / (len(positiveWords) + len(uniqueAllWords))
            probNeg[a3] = (freqNegativeWords[a3] + 1) / (len(negativeWords) + len(uniqueAllWords))
        for a4 in review:
            if a4 in uniqueAllWords:  # words never seen in training are ignored
                probReviewPos += math.log(probPos[a4])
                probReviewNeg += math.log(probNeg[a4])
        # sum of logs instead of a product of tiny probabilities, plus the log prior
        probFinalPos = probReviewPos + math.log(positivePriorprob)
        probFinalNeg = probReviewNeg + math.log(negativePriorprob)
        if probFinalPos > probFinalNeg:
            return "POS"
        else:
            return "NEG"


    # ***************** Printing To Output File ***************
    for a5 in inputFile:
        reviewList = a5.split()
        if not reviewList:  # skip blank lines
            continue
        # first token is written back as the review ID, the rest is classified
        outputFile.write(reviewList[0] + " " + nBayesAlgorithm(reviewList[1:]) + "\n")
    outputFile.close()  # make sure the results are flushed to disk

Image Credits: Internet (Google image search)
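
For anyone who wants to try the idea without downloading a corpus first, here is a minimal sketch of the same approach (word counts, add-one smoothing, log probabilities plus a log prior) on a tiny in-memory data set. The toy reviews and the names `train`, `log_score` and `classify` are made up for illustration and are not part of the program above; with this little data the result only shows the mechanics, not real accuracy.

```python
import math
from collections import Counter

# Hypothetical toy corpora (NOT the hotel review files used in the post).
pos_docs = ["great clean room", "friendly staff great location"]
neg_docs = ["dirty room rude staff", "terrible noisy dirty"]


def train(docs):
    """Return word frequencies and the total word count for one class."""
    words = [w for d in docs for w in d.split()]
    return Counter(words), len(words)


pos_counts, pos_total = train(pos_docs)
neg_counts, neg_total = train(neg_docs)
vocab = set(pos_counts) | set(neg_counts)  # all unique training words


def log_score(words, counts, total, prior):
    """log P(class) + sum of log P(word | class), with add-one smoothing."""
    score = math.log(prior)
    for w in words:
        if w in vocab:  # words never seen in training are ignored
            score += math.log((counts[w] + 1) / (total + len(vocab)))
    return score


def classify(text):
    """Label a review "POS" or "NEG" by comparing the two log scores."""
    words = text.split()
    n_docs = len(pos_docs) + len(neg_docs)
    pos = log_score(words, pos_counts, pos_total, len(pos_docs) / n_docs)
    neg = log_score(words, neg_counts, neg_total, len(neg_docs) / n_docs)
    return "POS" if pos > neg else "NEG"


print(classify("great friendly staff"))  # prints POS
print(classify("dirty and noisy room"))  # prints NEG
```

Adding the logs instead of multiplying the raw probabilities keeps long reviews from underflowing to zero, and the +1 in the numerator means a word seen in only one class still gets a small non-zero probability in the other.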