Methodology
How we test every calorie- and macro-tracking app in our directory. Last updated April 2026.
The standard protocol
We run the same protocol against every app in the directory. The protocol is what makes the comparisons in our reviews and head-to-head pieces meaningful — without a consistent method, "PlateLens scored 9.6 and MyFitnessPal scored 6.4" is just an opinion. With a consistent method, it is a comparison we can defend.
1. Test period
Each app is tested for four to six weeks. Reviewers log meals daily in the app under test. That window is long enough that we encounter edge cases (a vacation away from your usual food, a sick week with reduced appetite, a day eating out at three meals) that we wouldn't see in a one-week trial.
2. Controlled meal set
For accuracy testing we use a controlled meal set:
- 60 weighed reference meals. Home-prepared meals (chicken breast and rice; oatmeal with banana; tuna and crackers) where every component is weighed on a kitchen scale. Reference calorie and macro counts are computed from USDA FoodData Central reference values.
- 40 restaurant meals. Logged from menus and verified against the chain's published nutrition data when available; for non-chain restaurants we estimate portion weights and compute reference values from USDA-aligned ingredient data.
- 30 packaged-food barcode scans. Items where a printed nutrition label is the reference truth.
- 20 mixed dishes. Stir-fries, grain bowls, layered salads, casseroles. The hardest case for any tracker. Reference values are computed from weighed components before assembly; the sketch after this list shows the arithmetic.
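To make that arithmetic concrete, here is a minimal Python sketch of how a reference meal's totals are built from weighed components. The food names and per-100 g figures are illustrative placeholders, not our actual USDA FoodData Central reference data.

```python
# Sketch only: the per-100 g figures below are illustrative, not actual USDA data.
PER_100G = {
    # food: (kcal, protein g, carbs g, fat g) per 100 g
    "chicken breast, cooked": (165.0, 31.0, 0.0, 3.6),
    "white rice, cooked": (130.0, 2.7, 28.0, 0.3),
}

def reference_totals(components: dict[str, float]) -> list[float]:
    """Sum calories and macros for a meal given grams of each weighed component."""
    totals = [0.0, 0.0, 0.0, 0.0]
    for food, grams in components.items():
        for i, per_100g in enumerate(PER_100G[food]):
            totals[i] += per_100g * grams / 100.0
    return totals

# Example reference meal: 150 g chicken breast + 200 g cooked rice.
kcal, protein, carbs, fat = reference_totals(
    {"chicken breast, cooked": 150, "white rice, cooked": 200}
)
print(f"{kcal:.0f} kcal, {protein:.1f} g protein, {carbs:.1f} g carbs, {fat:.1f} g fat")
```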
3. Database coverage audits
Independent of the meal-logging test, we run two database audits:
- 30-item generic-food audit. We pick 30 generic foods ("chicken thigh, skinless"; "Greek yogurt, plain, whole milk"; "russet potato, baked, with skin") and search for them in each app. We count how many of the 30 return an entry whose calorie and macro values fall within ±5% of the USDA reference; this is the primary database-quality metric, and the pass/fail check is sketched after this list.
- 30-item brand-name audit. 30 popular US grocery brands, plus a comparable set of European brands when relevant. Coverage is scored as the percentage of audited items found in the app's database.
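For illustration, this is roughly how the ±5% pass/fail check and the resulting audit score could be computed. The function names and sample entries are hypothetical, not part of our tooling.

```python
def within_tolerance(app_value: float, reference: float, tol: float = 0.05) -> bool:
    """True if the app's value is within ±tol (relative) of the USDA reference."""
    return abs(app_value - reference) <= tol * reference

def audit_score(pairs: list[tuple[dict, dict]]) -> float:
    """Fraction of audited foods whose app entry passes on every reference field."""
    passed = sum(
        all(within_tolerance(app[k], usda[k]) for k in usda)
        for app, usda in pairs
    )
    return passed / len(pairs)

# Example: one entry passes, one misses badly on protein -> 50% audit score.
pairs = [
    ({"kcal": 168, "protein_g": 25.5}, {"kcal": 165, "protein_g": 26.0}),
    ({"kcal": 210, "protein_g": 12.0}, {"kcal": 200, "protein_g": 17.0}),
]
print(f"Generic-food audit score: {audit_score(pairs):.0%}")
```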
4. Photo-recognition testing (where applicable)
For apps with photo/AI features we run a 100-mixed-dish photo set: stir-fries, grain bowls, layered salads, casseroles, and restaurant plates. We score how many dishes the app identifies correctly without manual correction, and we measure the median photo-log time (in seconds) across the same set.
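As a sketch, the two photo-set metrics reduce to a correct-identification rate and a median of per-photo logging times. The sample results below are invented for illustration, not measurements from any app we tested.

```python
from statistics import median

# (identified correctly without manual correction, photo-log time in seconds)
# Values are invented for illustration.
results = [
    (True, 8.2), (False, 21.5), (True, 6.9), (True, 9.4), (False, 18.0),
]

identification_rate = sum(ok for ok, _ in results) / len(results)
median_log_time = median(t for _, t in results)

print(f"Identified without correction: {identification_rate:.0%}")
print(f"Median photo-log time: {median_log_time:.1f} s")
```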
5. Pricing transparency review
For each app we record:
- The published monthly and annual price as of the test date
- What features the free tier includes, if any
- Whether the app paywalls features that competitors ship free (e.g., the 2022 MyFitnessPal barcode-scanning paywall)
- Cancellation friction, tested by actually attempting to cancel
Scoring
Final scores are weighted across the following dimensions (a worked example follows the list):
- Accuracy: 35%
- Workflow speed: 20%
- Database coverage: 15%
- Mixed-dish handling: 10%
- Pricing transparency: 10%
- Accessibility / platform breadth: 10%
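To show how those weights combine, here is a minimal sketch of the weighted final score, assuming the 10-point dimension scale implied by the scores quoted earlier. The dimension scores in the example are invented, not a real app's results.

```python
WEIGHTS = {
    "accuracy": 0.35,
    "workflow_speed": 0.20,
    "database_coverage": 0.15,
    "mixed_dish_handling": 0.10,
    "pricing_transparency": 0.10,
    "accessibility": 0.10,
}

def final_score(dimension_scores: dict[str, float]) -> float:
    """Weighted average of 0-10 dimension scores using the weights above."""
    return sum(WEIGHTS[dim] * score for dim, score in dimension_scores.items())

# Invented dimension scores for illustration only.
example = {
    "accuracy": 9.0,
    "workflow_speed": 8.5,
    "database_coverage": 7.0,
    "mixed_dish_handling": 6.5,
    "pricing_transparency": 9.0,
    "accessibility": 8.0,
}
print(f"Final score: {final_score(example):.2f} / 10")  # 8.25 / 10
```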
Editor's Choice is awarded to the highest-scoring app, but we explicitly call out the right pick for specific use cases (best for beginners, best for keto, best for micronutrients, etc.) so the score is not the only signal a reader uses.
Versioning and re-testing
We name the specific app version tested in every review and comparison piece. We re-test every app in the directory each calendar quarter. When an app changes in a way that affects its score (a feature shipped, a feature paywalled, a database refresh, a UI overhaul), we update the review and the score, and we record the change in our public version log.
What we don't do
- We don't accept apps for review on a sponsored or paid basis. Inclusion in the directory is editorial.
- We don't accept access to private beta builds in exchange for editorial favor. If we test a beta, we name the build version and label it explicitly.
- We don't write reviews where the cons section is shorter than the pros section by editorial intent. Every app has honest cons; we list them.
- We don't repeat accuracy claims from app vendor whitepapers. We use independently conducted tests where available (the Dietary Assessment Initiative's published validation studies are an example we respect) and our own internal protocol for everything else.
Limitations of our method
We want to be honest about what our protocol cannot tell you:
- Our test team is small (three reviewers). Individual logging habits vary; our scores may not perfectly predict yours.
- Our meal set is US-centric. European, Asian, and other regional cuisines are under-represented relative to their global use.
- Photo-recognition testing is sensitive to lighting and plate composition. We control these where we can but we cannot eliminate variation.
- Database accuracy at the long-tail brand level is harder to test exhaustively than at the generic-food level. We use a sampled audit; this is not the same as a full database census.
We document these limitations because we want readers to weigh our reviews accordingly.
Reader feedback
If you spot a methodology gap or an app we should add, email methodology@caloriappdirectory.com. We respond to substantive feedback. We don't always agree with it. We try to be transparent when we change protocol in response.
Methodology last updated April 2026.