Friday, November 4, 2011

Database Structure

After much pain in trying to find one database good enough to fit my needs, I have given in and decided to work with three databases together.

The first database is the USDA database that I have already been using. The structure of this database will dictate how the rest of the databases that get added on will be formed. There is one problem with the USDA database: for some reason it does not have data on many brand name products. I was surprised that there is no government database that catalogs brand name product nutritional information, but I will have to make do in other ways.

Instead I will turn to the information from www.nutritional-information.info. The database on the site has a much better selection of brand name and restaurant food items. Unfortunately, I noticed that some of the data is not completely accurate (just by comparing values against food that I have lying around). However, the values aren't terribly off, and for now it provides me with a large set of data to develop against. When I reach the point that I want to actually publish this app, I can look into licensing a non-free database with hopefully more accurate information. The site doesn't offer direct database access, so I can't use it as is. Instead I am going to write a script to scrape all of the information off the site and merge it into the USDA database.
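The merge step might look roughly like the sketch below. It assumes the scraped pages have already been parsed into dictionaries. The table layout (`foods` with `name`, `source`, `calories`) and the function names `create_schema` and `merge_scraped` are placeholders I made up for illustration, not the real USDA schema or anything provided by the site.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical, simplified table; the real USDA layout is more involved.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS foods (
               id INTEGER PRIMARY KEY,
               name TEXT NOT NULL,
               source TEXT NOT NULL,
               calories REAL
           )"""
    )


def merge_scraped(conn, records, source="scraped"):
    """Insert scraped records whose name is not already in the table.

    Returns the number of rows added.
    """
    added = 0
    for rec in records:
        name = rec["name"].strip()
        # Case-insensitive duplicate check (ASCII only in SQLite's lower()).
        exists = conn.execute(
            "SELECT 1 FROM foods WHERE lower(name) = lower(?)", (name,)
        ).fetchone()
        if exists:
            continue  # keep the row that is already there (e.g. from USDA)
        conn.execute(
            "INSERT INTO foods (name, source, calories) VALUES (?, ?, ?)",
            (name, source, rec.get("calories")),
        )
        added += 1
    conn.commit()
    return added
```

Tagging each row with its `source` should make it easier to swap the scraped rows out later if I license a more accurate database.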

Next I will need a way to look up products by barcode. Unfortunately neither of these two databases has any reference to UPC codes, so I will need yet another source for this. There are a number of UPC lookup websites (I haven't chosen a particular one yet; I will need to see which one has the most consistent lookup success). The websites really only provide a product name, so the best I can do is then query the database using the product name as a search string. Hopefully this will generally provide good results.
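To make the name-based lookup concrete, here is a minimal sketch of one way to do it: rank the food names in the database by how many words they share with the product name that the UPC site returns. The function names `tokenize` and `search_by_name` and the sample data are my own placeholders, and plain word overlap is just one simple matching strategy, not necessarily what the app will end up using.

```python
import re


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_by_name(product_name, food_names, limit=5):
    """Rank food names by how many words they share with product_name."""
    query = tokenize(product_name)
    scored = []
    for name in food_names:
        overlap = len(query & tokenize(name))
        if overlap:
            scored.append((overlap, name))
    # Most shared words first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]


# Example with made-up data: the first result is the closest match.
foods = ["Cheerios, original", "Honey Nut Cheerios cereal", "Honey, raw", "Apple, raw"]
matches = search_by_name("General Mills Honey Nut Cheerios 12oz", foods)
```

Returning a short ranked list rather than a single hit lets the user pick the right item when the UPC site's product name doesn't line up exactly with a database entry.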

After looking at the competing nutrition apps, it seems like they have gotten their full coverage by crowd-sourcing their data. Essentially, in order to fill in the gaps of the database, I can add the option for users to manually enter any food that isn't already in the database. Sure, a lot of users won't do this, but even a small number of contributors should greatly improve the breadth of the database.

For what it is worth, Norm also forwarded me information about a Harvard project which crowd-sources nutrition estimates (http://www.seas.harvard.edu/news-events/press-releases/crowdsourcing-nutrition-in-a-snap). Essentially, you can just take a picture of your meal, send it out to the crowd, and get back an estimate of what the nutritional values are.
