Given that I speak a few languages, I thought it would be cool to be able to programmatically find business names, since most of the awesome .com’s are taken.
The only way I could think of to do this was to make an app that determines how pronounceable a word is. So pronounceability was born. Given that machine learning is the big craze right now and everyone thinks it’s where things are going, I figured there was no better idea to start with.
I code mostly in Python and Go. This example is going to be in Python.
The entire git repo for the application can be found here.
If you don’t know much about programming, I would say you are missing out. The amount I’ve been able to accomplish by writing scripts is phenomenal. If you want to learn, this is a somewhat more sophisticated app, but it never hurts to start out with this tutorial to understand the minimalistic framework I’m using to get it live.
But the core nucleus of the app is this:
import random

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split

import models

# keep only the all-lowercase entries (skips proper nouns and acronyms)
words = [w.strip() for w in open('words.txt') if w == w.lower()]

def scramble(s):
    # shuffle the letters to fabricate a (most likely) unpronounceable string
    return "".join(random.sample(s, len(s)))

scrambled = [scramble(w) for w in words]

X = words + scrambled

# explicitly create binary labels
# classes are sorted alphabetically, so 'unpronounceable' -> 0 and 'word' -> 1
label_binarizer = LabelBinarizer()
y = label_binarizer.fit_transform(['word'] * len(words) + ['unpronounceable'] * len(scrambled)).ravel()

text_clf = Pipeline([
    ('vect', CountVectorizer(analyzer='char', ngram_range=(1, 3))),
    ('clf', MultinomialNB())
])

text_clf = text_clf.fit(X, y)

# you might want to persist the Pipeline to disk at this point to ensure it's not lost in case there is a crash

@models.db_session
def check_pronounceability(word):
    # predict_proba returns one row per input; column 1 is the probability of 'word'
    proba = text_clf.predict_proba([word])[0][1]
    pronounceability = round(100 * float(proba), 2)
    models.Word(word=word, pronounceability=pronounceability)
    models.commit()
    return pronounceability
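If you want to poke at the idea without my words.txt or the database layer, here is a rough standalone sketch of the same approach. The tiny SAMPLE_WORDS list, the seed parameter, and the helper names (char_ngrams, build_dataset, train, pronounceability) are all made up for illustration and are not part of the app. It also holds out a quarter of the data with train_test_split to get a rough accuracy number, which the app code above does not do. With this little data the scores will not mean much.

```python
import random

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical stand-in for words.txt; a real run would use a full word list.
SAMPLE_WORDS = [
    "banana", "window", "garden", "simple", "orange", "planet",
    "silver", "market", "butter", "candle", "forest", "hammer",
    "island", "jungle", "kitten", "lemon", "monkey", "number",
    "pencil", "rabbit", "summer", "tunnel", "velvet", "winter",
]


def char_ngrams(word):
    """Return the character 1- to 3-grams the vectorizer counts for a word."""
    analyzer = CountVectorizer(analyzer="char", ngram_range=(1, 3)).build_analyzer()
    return analyzer(word)


def build_dataset(words, seed=0):
    """Pair each real word (label 1) with a letter-shuffled copy (label 0)."""
    rng = random.Random(seed)
    scrambled = ["".join(rng.sample(list(w), len(w))) for w in words]
    X = list(words) + scrambled
    y = [1] * len(words) + [0] * len(scrambled)
    return X, y


def train(words, seed=0):
    """Fit the pipeline on 75% of the data and return it with held-out accuracy."""
    X, y = build_dataset(words, seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    clf = Pipeline([
        ("vect", CountVectorizer(analyzer="char", ngram_range=(1, 3))),
        ("clf", MultinomialNB()),
    ])
    clf.fit(X_train, y_train)
    return clf, clf.score(X_test, y_test)


def pronounceability(clf, word):
    """Probability (as a percentage) that the model assigns to the 'real word' class."""
    # Look up the column for label 1 instead of assuming its position.
    word_column = list(clf.named_steps["clf"].classes_).index(1)
    proba = clf.predict_proba([word])[0][word_column]
    return round(100 * float(proba), 2)


if __name__ == "__main__":
    model, accuracy = train(SAMPLE_WORDS)
    print("held-out accuracy:", accuracy)
    for candidate in ["banana", "xzqkrt"]:
        print(candidate, pronounceability(model, candidate))
```

Looking up the label's column through classes_ is a small safety net: it keeps the score pointing at the right class even if the labels change later.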
One of the better-known machine learning libraries right now is scikit-learn. You can read more about it here. It has a bunch of different algorithms and examples, from natural language to image recognition.
If you have any questions, let me know. If not, check out the app and see how pronounceable some words are. I will be adding an API to it at some point, and pull requests are always welcome.