Final Thoughts About the RelevantXKCD Project

Part of my RelevantXKCD project writeup
After I wrapped the model in a simple Django server and posted it to /r/xkcd on Reddit, I got some great feedback!
For the Model
As /u/drcopus helpfully pointed out, I should have done data augmentation on the original training set. I could have taken the comments, substituted synonyms, and even split each comment into its individual sentences to use as separate training examples.
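
To make the augmentation idea concrete, here is a minimal sketch of what it could look like. It is not what the project actually did: the `SYNONYMS` table, the function names, and the regex-based sentence splitter are all hypothetical stand-ins chosen for illustration (a real version would probably pull synonyms from a proper thesaurus).

```python
import random
import re

# Hypothetical toy synonym table, for illustration only.
SYNONYMS = {
    "funny": ["hilarious", "amusing"],
    "comic": ["strip", "cartoon"],
}


def split_sentences(comment):
    """Split a comment into sentences on '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", comment.strip())
    return [p for p in parts if p]


def substitute_synonyms(sentence, rng):
    """Replace every word found in SYNONYMS with a randomly chosen synonym."""
    def swap(match):
        word = match.group(0)
        options = SYNONYMS.get(word.lower())
        return rng.choice(options) if options else word

    return re.sub(r"[A-Za-z']+", swap, sentence)


def augment(comment, comic_id, rng=None):
    """Return (text, comic_id) training pairs derived from a single comment."""
    rng = rng or random.Random(0)  # fixed seed so runs are repeatable
    examples = [(comment, comic_id)]
    for sentence in split_sentences(comment):
        # Each sentence becomes its own example (skip it if it is the whole comment).
        if sentence != comment:
            examples.append((sentence, comic_id))
        # Add a synonym-substituted variant when it actually differs.
        variant = substitute_synonyms(sentence, rng)
        if variant != sentence:
            examples.append((variant, comic_id))
    return examples
```

Calling `augment("This comic is funny. Relevant as always!", 327)` would turn one comment into several training pairs that all point at the same comic, which is the whole point: more examples per comic without collecting more comments.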
He also mentioned that I should have fine-tuned an existing neural network rather than creating one from scratch. I thought the <1% figure was fishy, so now I know that it just needs to run longer (which means that training would take a very long time).
Essentially, building a model from scratch takes a lot of time, and a lot of processing has to happen before progress can be accurately gauged.
Flaws
I think the model works pretty well, although I've noticed that the top result usually isn't the "best" one, simply because people tended to search with one-word queries...
Also, not all comics were equally represented. Luckily, most of the 1773 comics in the training data had at least 2 training examples, but fewer than 50% had 20 or more. This biased the system, since more training examples = more vocabulary = higher probability of that comic showing up.
2 or more times: 1716 (96.78%)
5 or more times: 1467 (82.74%)
20 or more times: 793 (44.72%)
50 or more times: 392 (22.10%)
200 or more times: 116 (6.54%)
1000 or more times: 12 (0.67%)
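
For anyone who wants to reproduce a breakdown like this, here is a rough sketch. It assumes the training labels are available as a flat list with one comic ID per training example; the function name `coverage_table` and that input format are my own invention for illustration. Percentages are taken relative to comics that have at least one example, which matches the numbers above (1716 / 1773 is about 96.78%).

```python
from collections import Counter


def coverage_table(comic_ids, thresholds=(2, 5, 20, 50, 200, 1000)):
    """Count how many comics have at least N training examples, for each N.

    comic_ids: iterable with one comic ID per training example (assumed format).
    Returns a list of (threshold, comic_count, percent_of_comics_with_data).
    """
    per_comic = Counter(comic_ids)  # comic ID -> number of training examples
    total = len(per_comic)          # comics with at least one example
    rows = []
    for threshold in thresholds:
        count = sum(1 for n in per_comic.values() if n >= threshold)
        percent = 100.0 * count / total if total else 0.0
        rows.append((threshold, count, percent))
    return rows


if __name__ == "__main__":
    # Tiny made-up label list, just to show the output shape.
    labels = [1, 1, 1, 2, 2, 3]
    for threshold, count, percent in coverage_table(labels, thresholds=(2, 3)):
        print(f"{threshold} or more times: {count} ({percent:.2f}%)")
```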
Here is a list of all comic IDs that had NO data (and therefore would never be seen). They aren't part of the comic percentages above.
Some of these are really good; obviously, though, they aren't the popular ones.
187 213 223 347 372 437 474 510 536 618 711 744 812 823 825 930 991 999 1006 1359 1466 1522 1556 1574 1596 1631 1648 1651 1699 1713 1733 1746 1754 1762 1778 1780 1783 1784 1798 1800 1802 1805 1811 anything higher than 1811
Slightly Unrelated Stuff...
So over that week, I compiled the top queries:

Everything else was quite narrow, so I decided against showing those queries. The top queries were expected, because the top 3 Reddit comments said:
I had a good laugh at the 'dragon' result
Try "desolate"
It also shows that Bobby Tables is a classic (sql).
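
As an aside, tallying top queries like this takes only a few lines. The sketch below assumes the search queries were logged as plain strings; the `top_queries` helper and the normalization (trim whitespace, lowercase) are assumptions for illustration, not a description of how the server actually logged anything.

```python
from collections import Counter


def top_queries(queries, n=10):
    """Return the n most common queries as (query, count) pairs.

    Queries are trimmed and lowercased first so that "Dragon" and "dragon "
    count as the same search; empty queries are ignored.
    """
    normalized = (q.strip().lower() for q in queries)
    counts = Counter(q for q in normalized if q)
    return counts.most_common(n)
```

For example, `top_queries(["Dragon", "dragon ", "sql", "desolate", "dragon", "sql", ""], n=2)` gives `[("dragon", 3), ("sql", 2)]`.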
<side> It's funny that, a week later, XKCD's newest comic had to do with machine learning. Pretty much what I did lol. </side>