Instead, I jumped right into improve search functionality. I skimmed the papers quickly, and it seems that I might need more time than I have to implement them, instead I attempted to use fairly basic heuristics for sorting results. It was actually quite successful.
I initially had gathering results simply based on whether or not the contained any of the words in the search query. Obviously this wasn't the best way to do it but it was a good functionality test. As you can see bellow, the results aren't very good.
I next tried ranking results based on how much they matched the query. I simply counted the letters in all the query words that existed in each food name. I then normalized the ranks by name length. The results were quite a bit better, but they were still not quite what I wanted. I was getting results that matched both words at the top of the query, but I was missing some results that started with the first query word. For instance bellow, you will notice that there are no top results beginning with butter (butter is actually at the top of the unsorted list too so it was definitely in the db).
My last and current method employed some weighting based on the order of the words in the query and in the word. I calculate three weights:
- the index in the word that the query word is found as a substring
- The index is normalized by the length of the word, and inverted to rank earlier indices higher.
- Eg. The query "butter" for the word "butter" gets a index rating 1.0, while "utter" gets an index rating of 0.8333 (1 - 1/6)
- the matched words position in the foods name (the first word in the food name is rated higher than the second).
- The word position ranking is then normalized by the total number of words in the food name.
- the percentage of the word that the query string matches
I then multiply the three weights together to form the rank, and sort to get the highest ranked words. The results were excellent. I was testing briefly with Elliot and was able to find whatever food in just a few letters entered.
The search works great, but there still remains the problem that it is a fair bit slow. Right now it will search every time a letter of the query string is modified, so the ranking is run very often. I will look into how easily I can launch a timer, and only search every ~0.5 seconds or so if the query string is updated.
Next week will probably be my last week of real development before the end of the semester. I forgot to factor in time for making my video and write-up when I was setting goals last week, so I will need to reserve the last few days for that. On the plus side though, next week is reading days, so I will have more time to code (in theory at least). I will aim to clean up the nutrition history visualization page. Hopefully I can get meals to work, but I need to vow not to waste too much time on it or nothing else will get done. The next thing after that will probably be hard-coding in a recommended daily values table to get the visualization information normalized properly. After that I will clean up the UI and clean up the code a little bit so it isn't too bad for Norm and Joe to review.
No comments:
Post a Comment