Creating a Machine Learning web app


#1

Given that I speak a few languages, I thought it would be cool to programmatically find business names, since most of the awesome .com’s are taken.

The only way I could think of to do this was to make an app that determines how pronounceable a word is. So pronounceability was born. Given that machine learning is the big craze right now and everyone thinks it’s where things are going, I figured what better idea to start with.

I code mostly in Python and Go. This example is going to be in Python.

The entire git repo for the application can be found here.

If you don’t know much about programming, I would say you are missing out. The number of things I’ve been able to accomplish by writing scripts is phenomenal. If you want to learn, this is a somewhat more sophisticated app, but it never hurts to start out with this tutorial to understand the minimalistic framework I’m using to get it live.

The core of the app is this:

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
import models
import random

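# keep only all-lowercase entries (skips proper nouns and acronyms)
with open('words.txt') as f:
    words = [w.strip() for w in f if w.strip() and w == w.lower()]

def scramble(s):
    # shuffle the letters of s into a random order
    return "".join(random.sample(s, len(s)))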

scrambled = [scramble(w) for w in words]
X = words+scrambled
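# explicitly create binary labels; LabelBinarizer sorts the classes,
# so 'unpronounceable' becomes 0 and 'word' becomes 1
label_binarizer = LabelBinarizer()
y = label_binarizer.fit_transform(['word']*len(words) + ['unpronounceable']*len(scrambled))
# flatten the (n, 1) column that fit_transform returns into the 1-D array fit() expects
y = y.ravel()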

text_clf = Pipeline([
    ('vect', CountVectorizer(analyzer='char', ngram_range=(1, 3))),
    ('clf', MultinomialNB())
])
text_clf = text_clf.fit(X, y)
# you might want to persist the Pipeline to disk at this point to ensure it's not lost in case there is a crash

@models.db_session
def check_pronounceability(word):
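    # predict_proba columns follow the sorted labels: [unpronounceable, word]
    stuff = text_clf.predict_proba([word])
    # probability of the 'word' class, as a percentage
    pronounceability = round(100*stuff[0][1], 2)
    models.Word(word=word, pronounceability=pronounceability)
    models.commit()
    return pronounceability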
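
If you want to poke at the idea without the word list file or the database layer, here is a minimal sketch of the same pipeline. The short inline word list is a made-up stand-in for words.txt, the labels are plain 0/1 integers instead of going through LabelBinarizer, and the pronounceability function name is just for illustration. With so little training data the scores will be rough, so treat it as a toy rather than the real app.

```python
import random

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

random.seed(0)  # make the scrambling repeatable

# Hypothetical stand-in for words.txt
words = ["banana", "letter", "window", "garden", "monkey", "planet",
         "silver", "market", "number", "simple", "random", "police"]


def scramble(s):
    """Return the letters of s in a random order."""
    return "".join(random.sample(s, len(s)))


scrambled = [scramble(w) for w in words]
X = words + scrambled
y = [1] * len(words) + [0] * len(scrambled)  # 1 = real word, 0 = scrambled

# Count character 1-, 2- and 3-grams, then fit naive Bayes on those counts
text_clf = Pipeline([
    ("vect", CountVectorizer(analyzer="char", ngram_range=(1, 3))),
    ("clf", MultinomialNB()),
])
text_clf.fit(X, y)


def pronounceability(word):
    """Score from 0 to 100: estimated probability that word looks like a real word."""
    proba = text_clf.predict_proba([word])
    # Columns follow the classifier's classes_ ([0, 1]), so column 1 is "real word"
    return round(100 * proba[0][1], 2)


print(pronounceability("garden"), pronounceability("xqzvkt"))
```

The scrambled copies contain exactly the same letters as the real words, so single letters carry no signal about the class. Whatever the classifier learns has to come from the two- and three-letter sequences, which is roughly what makes a word feel pronounceable.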

Currently one of the better-known machine learning libraries is scikit-learn; you can read more about it here. It has a bunch of different algorithms and examples, from natural language processing to image recognition.
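
Because scikit-learn estimators share the same fit and predict interface, trying a different algorithm is mostly a one-line change. The sketch below is not part of the app: the toy data, the build_pipeline helper, and the choice of LogisticRegression as the alternative are all assumptions for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data: 1 = real word, 0 = hand-scrambled version of it
X = ["banana", "letter", "window", "garden",
     "nbaaan", "tlreet", "wdnoiw", "rgdnea"]
y = [1, 1, 1, 1, 0, 0, 0, 0]


def build_pipeline(classifier):
    """Same character n-gram features, with a pluggable classifier at the end."""
    return Pipeline([
        ("vect", CountVectorizer(analyzer="char", ngram_range=(1, 3))),
        ("clf", classifier),
    ])


candidates = {
    "naive_bayes": MultinomialNB(),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

scores = {}
for name, classifier in candidates.items():
    model = build_pipeline(classifier).fit(X, y)
    # Column 1 of predict_proba is the probability of label 1 (real word)
    scores[name] = round(100 * model.predict_proba(["market"])[0][1], 2)

print(scores)
```

On real data you would compare the candidates on held-out words rather than eyeballing a single score.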

If you have any questions, let me know. If not, check out the app and see how pronounceable some words are. I will be adding an API to it at some point. Pull requests are always welcome.