Structure claims and function claims require fundamentally different evidence
The bench is the right place to measure what a compound does at a molecular level. It is the wrong place to measure whether a person feels less stressed, sleeps better, or has more energy. These are different scientific questions. They require different tools. NRI integrates both.
Two categories of claim. Two categories of evidence.
Analytical chemistry and biochemistry produce rigorous, reproducible measurements of what a compound is and how it behaves in controlled conditions. That training is exactly right for structure claims. The real world introduces variables the bench was designed to eliminate. A human body is not a controlled condition. Biological variability, psychological context, behavioral compliance, and lived experience all affect how a person responds to an intervention. Function claims operate in that reality.
What the compound does at a molecular or physiological level
Measured through assays, biomarkers, and laboratory analysis. This is where analytical chemistry is most useful. The bench is the right tool.
- Antioxidant capacity via ORAC assay
- Anti-inflammatory via cytokine panel
- Bioavailability via plasma concentration
- Immune activation via NK cell activity
What the person experiences as a result
Measured through validated instruments that capture human experience directly. The bench cannot access this data.
- Reduces stress via validated stress instrument
- Supports restful sleep via sleep quality scale
- Promotes mental clarity via cognition scale
- Maintains healthy energy via energy scale
The FDA and FTC evaluate structure/function claims on evidence that matches the claim. If your claim is about human function, your evidence needs to come from humans reporting their function. This is not a methodology preference. It is a definitional requirement.
Objective does not mean accurate. Proxy is not the same as direct.
Laboratory endpoints are a core part of NRI's measurement approach. But when used alone for function claims, they are proxies for what you are actually trying to demonstrate. A proxy adds an inference layer that a direct measure does not require, and it frequently introduces noise that buries a real effect.
Objective, reproducible, and mostly irrelevant
Blood pressure fluctuates based on temperature, posture, time of day, hydration, recent conversation, and dozens of variables unrelated to psychological stress. A supplement that genuinely reduces perceived stress may produce no change in blood pressure. The proxy says no effect. The participant's experience says otherwise.
Estradiol does not tell you about the flash
Estradiol can be within a clinically normal range while a woman experiences ten hot flashes a day. The flash is driven by a neuroendocrine cascade that varies between individuals with identical hormone profiles. The hormone level does not capture the symptom burden. The symptom burden is what the claim targets.
Seven hours of bad sleep is not seven hours of sleep
Polysomnography tells you a participant slept 7.5 hours. It cannot tell you they felt unrested at 6 a.m. and could not concentrate all afternoon. A product that improves sleep quality without changing duration fails a duration-only endpoint despite delivering exactly what the consumer wants.
Self-report is not the same as subjective. A validated PROM is not a survey.
Patient-reported outcome measures (PROMs) are frequently mischaracterized as soft or impressionistic. There is a meaningful difference between asking someone "do you feel stressed, scale of 1 to 10" and administering a validated multidimensional instrument with a confirmed factor structure, calibrated item parameters, and known-groups validity. One is a survey. The other is a measurement tool.
A well-constructed PROM is not asking for an opinion. It is asking for a structured, calibrated report of an observable phenomenon that the participant is uniquely positioned to observe. Pain, stress, sleep quality, and cognitive function are not accessible by external measurement alone. For these outcomes, the participant is not just the most convenient data source. They are the most accurate one.
"In the past week, how many times did you forget a word while speaking?" This is a behavioral frequency question. The participant is reporting a count, not a feeling. The item feeds into a working memory subscale with high test-retest reliability. The fact that it is delivered as self-report does not make it less rigorous than a blood draw. For this construct, it is more rigorous, because no blood draw accesses working memory.
Validated does not mean fit for purpose
A psychometric instrument is validated for a specific population, at a specific time, measuring a specific construct. Using it outside those conditions produces unreliable data regardless of its citation count. Most available instruments were built for clinical populations, disease screening, or academic research, not for detecting wellness improvements in healthy consumers.
The Perceived Stress Scale (PSS) was developed for populations under significant stress load. A healthy nutraceutical consumer seeking improvement from a normal baseline is a different population. The instrument was not built to detect change at that level, and it does not.
The PSS asks about the past month. A two-week intervention using the PSS cannot detect a two-week change. The temporal window invalidates the design regardless of everything else done correctly. This produces false negatives in otherwise well-designed trials.
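The recall-window mismatch can be put in numbers. The sketch below assumes the instrument is administered on the last day of the intervention and treats "the past month" as 30 days; the function name is hypothetical and only illustrates the arithmetic.

```python
def pre_dosing_share(recall_days: int, intervention_days: int) -> float:
    """Fraction of the recall window that falls before the first dose,
    assuming the instrument is administered on the final intervention day."""
    if recall_days <= 0 or intervention_days < 0:
        raise ValueError("recall_days must be positive, intervention_days non-negative")
    # Days the participant is asked to recall that precede the intervention.
    days_before_dosing = max(recall_days - intervention_days, 0)
    return days_before_dosing / recall_days


# Assumed values: 30-day recall window, 14-day intervention.
share = pre_dosing_share(recall_days=30, intervention_days=14)
print(f"{share:.0%} of the recalled period precedes dosing")  # prints "53% ..."
```

Under these assumptions, more than half of what the participant is asked to recall happened before the product was taken, so any real change is diluted by pre-intervention experience.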
Instruments designed to screen for clinical disorder detect pathological levels. Healthy participants score near the floor at baseline. There is no room to improve. The instrument cannot see the range where the effect lives.
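A floor effect is straightforward to check in baseline data. The following is a minimal sketch with made-up scores; the function name, the `margin` parameter, and the data are illustrative assumptions, not part of any NRI procedure.

```python
def floor_share(baseline_scores, scale_min, margin=0):
    """Share of participants scoring at (or within `margin` points of)
    the lowest possible score at baseline."""
    if not baseline_scores:
        raise ValueError("baseline_scores must not be empty")
    at_floor = sum(1 for score in baseline_scores if score <= scale_min + margin)
    return at_floor / len(baseline_scores)


# Hypothetical baseline scores from healthy participants on a
# clinical screening scale whose minimum possible score is 0.
baseline = [0, 0, 1, 0, 2, 0, 1, 0, 3, 0]
print(floor_share(baseline, scale_min=0))  # 0.6
```

When most of a sample already sits at the minimum, those participants cannot register improvement no matter how well the product works.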
How people describe and experience stress in 2026 is not how they described it when many gold-standard measures were developed. A validated instrument from 40 years ago may not be valid in your current consumer population.
Fit for purpose means: built for this population, sensitive to change at this effect size, covering the right time horizon, using language this group actually uses today, and measuring the construct your claim targets.
Scale development is not writing questions. It is a multi-stage validation process.
Each NRI instrument began with a dual-track review: existing stress-related and condition-specific questionnaires were evaluated to map the domains covered in the literature, and published qualitative research examining how otherwise healthy individuals actually describe their symptoms was reviewed to identify natural language and real-world themes. This ensures items reflect how your consumer population experiences the construct, not how clinicians have historically categorized it.
A long-form pilot instrument was administered to an initial sample. Open-ended responses were analyzed for emergent themes and used to refine item language. Within-domain correlation matrices were computed and redundant items removed before the final instrument was submitted to primary validation.
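As an illustration of the redundancy screen, the sketch below computes pairwise Pearson correlations within one domain and flags highly correlated item pairs. The item names, the responses, and the 0.85 cutoff are all assumptions chosen for the example; the source does not state which threshold was used.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def redundant_pairs(items, threshold=0.85):
    """Return item pairs within a domain whose |r| meets the threshold.
    `items` maps an item name to its list of responses."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(items.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 2)))
    return flagged


# Hypothetical pilot responses (1-5 scale) for three items in one domain.
domain = {
    "tense": [1, 2, 3, 4, 5, 2, 3],
    "on_edge": [1, 2, 3, 4, 5, 2, 4],
    "tired": [3, 1, 4, 2, 2, 5, 1],
}
print(redundant_pairs(domain))
```

In this toy data only the "tense" and "on_edge" items are flagged, so one of the two would be a candidate for removal.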
Confirmatory factor analysis (CFA) using maximum likelihood estimation confirmed the multidimensional domain structure. Each domain's items load cleanly onto their factor with acceptable fit indices, confirming the instrument measures what it claims to measure across independent subscales.
Cronbach's alpha was computed for every subdomain. For the NRI Stress Response Inventory (NRI-SRI), alpha ranges from 0.79 to 0.91 across domains, meeting the threshold for both research and clinical use. Items with low corrected item-total correlations were removed during development.
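For readers unfamiliar with the statistic, Cronbach's alpha for a subdomain with k items is k/(k-1) × (1 − sum of item variances / variance of total scores). A minimal sketch using the standard formula and invented responses (not NRI data):

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for one subdomain.
    `responses` is a list of respondent rows, each a list of item scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_columns = list(zip(*responses))  # one tuple of scores per item
    sum_item_variances = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum_item_variances / total_variance)


# Hypothetical responses: 5 respondents x 3 items in one subdomain.
rows = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 1],
    [3, 3, 4],
    [5, 4, 5],
]
print(round(cronbach_alpha(rows), 2))
```

Alpha rises as items within the subdomain move together across respondents, which is why it is reported per domain rather than once for the whole instrument.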
Item discrimination and difficulty parameters were estimated using the Graded Response Model, an item response theory (IRT) model for ordered response categories. This identifies items that perform well across the full range of the construct, not just at clinical severity levels, which is the specific challenge for wellness population measurement.
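To show what these parameters do, the sketch below evaluates the Graded Response Model for a single item. Here `theta` is the respondent's latent trait level, `a` is the item's discrimination, and `thresholds` are the ordered difficulty (threshold) parameters; the numeric values are hypothetical, not calibrated NRI parameters.

```python
from math import exp


def grm_category_probs(theta, a, thresholds):
    """Graded Response Model: probability of each ordered response category.
    `thresholds` must be increasing; K-1 thresholds give K categories."""
    def p_at_or_above(b):
        # Probability of responding in the category above threshold b, or higher.
        return 1.0 / (1.0 + exp(-a * (theta - b)))

    cumulative = [1.0] + [p_at_or_above(b) for b in thresholds] + [0.0]
    # Each category probability is the difference of adjacent cumulative curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical 4-category item with thresholds spread across the normal range.
for theta in (-1.0, 0.0, 2.0):
    probs = grm_category_probs(theta, a=1.5, thresholds=[-0.5, 0.5, 1.5])
    print(theta, [round(p, 2) for p in probs])
```

An item whose thresholds all sit at high trait levels would put nearly every healthy respondent in the lowest category, which is the pattern this calibration step is meant to catch.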
Domain scores were compared across self-reported stress levels to confirm the instrument distinguishes meaningfully between groups known to differ. The practical test: does the scale actually separate people who are more stressed from people who are less stressed?
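One common way to summarize a known-groups comparison is a standardized mean difference. The sketch below uses Cohen's d with a pooled standard deviation on invented scores; the source does not say which statistic was used, so treat this as one possible illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(lower_group, higher_group):
    """Standardized mean difference (higher minus lower) using the pooled SD."""
    n_low, n_high = len(lower_group), len(higher_group)
    pooled_sd = sqrt(
        ((n_low - 1) * variance(lower_group) + (n_high - 1) * variance(higher_group))
        / (n_low + n_high - 2)
    )
    return (mean(higher_group) - mean(lower_group)) / pooled_sd


# Hypothetical domain scores for participants who self-report low vs. high stress.
low_stress = [10, 12, 11, 13, 14]
high_stress = [20, 22, 21, 23, 24]
print(round(cohens_d(low_stress, high_stress), 2))
```

A scale with known-groups validity should show a clear, positive difference between groups that are expected to differ on the construct.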
The NRI-SRI was validated against established measures, including the PSS-10 (r = 0.78) and the DASS-21 stress subscale (r = 0.81), confirming convergent validity. Discriminant validity was demonstrated against measures of unrelated constructs. The instrument correlates with what it should and does not correlate with what it should not.
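The r values above are Pearson correlations between total scores on two instruments completed by the same participants. A minimal sketch of that comparison, with invented scores and hypothetical variable names:

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between paired scores from the same participants."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired totals for eight participants.
new_scale = [12, 18, 25, 31, 40, 22, 35, 15]      # instrument under validation
established = [10, 16, 20, 27, 33, 19, 30, 14]    # established stress measure
unrelated = [4, 2, 5, 3, 4, 2, 3, 5]              # measure of an unrelated construct

print(f"convergent r = {pearson_r(new_scale, established):.2f}")
print(f"discriminant r = {pearson_r(new_scale, unrelated):.2f}")
```

Convergent validity expects the first correlation to be high; discriminant validity expects the second to be near zero.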
Purpose-built for nutraceutical research. Validated for your population.
Each NRI instrument was developed under FDA Patient Reported Outcome guidance for non-diseased populations experiencing wellness changes. Validated against established measures for construct validity, then refined for the sensitivity range relevant to botanical and nutraceutical interventions. Multidimensional by design: every scale captures a primary construct across independent domains so you can match data to claims with precision.
Stress Response Inventory
Validated against PSS-10 (r = 0.78) and DASS-21 stress subscale (r = 0.81). Cronbach's alpha 0.79 to 0.91. CFA confirmed. IRT-calibrated items. Used in published adaptogen and stress trials.
Energy Scale
Six-domain fatigue and energy instrument. Captures independent changes across physical, cognitive, and emotional energy at the effect sizes typical of energy and adaptogen interventions.
Menopause Symptom Scale
Developed under FDA PRO guidance. Five domains including cognitive function, absent from most existing menopause scales. Active validation in NRI-MENO-001.
Sleep Scale
Captures restorative sleep and next-day function, not duration alone. Designed for sensitivity to sleep quality changes in healthy populations.
Cognition Scale
Captures cognitive wellness changes in non-diseased populations at the effect sizes typical of botanical cognitive interventions.
Women's Health Scales
Endpoint suite for premenopausal women's health trials. Validated for use with hormonal and botanical interventions.
NRI integrates lab and psychometric endpoints. Neither alone is sufficient.
Two failure modes exist in nutraceutical research. The first is running biomarker panels and calling them a clinical trial, while ignoring what participants actually experience. The second is running unvalidated app surveys and calling them PROMs. NRI does neither. We integrate validated psychometrics with laboratory endpoints because complete data requires both.
For most structure/function claims, validated psychometrics are the primary evidence. Laboratory endpoints corroborate, provide mechanistic context, and fulfill safety monitoring requirements. A psychometric result supported by a relevant biomarker is more compelling than either alone.
On-site laboratory endpoints
Routine blood draws for biomarker panels, screening procedures, and nutrient absorption studies. Available for both localized and decentralized protocols.
- Hormone panels — menopause, women's health, stress
- Immune markers — elderberry, respiratory, immune health
- Cortisol awakening response — stress and adaptogen studies
- Inflammatory markers — pain, gut health, immune
- Metabolic panels — liver, kidney, blood sugar, lipids
- Microbiome sequencing — 16S rRNA for prebiotic and gut trials
In-home laboratory endpoints
Some biomarkers must be collected at home to avoid site-visit interference with target values. NRI participants are trained in home collection protocols to support compliance and validity.
- Salivary cortisol — CAR collection avoids site-visit elevation
- Comprehensive stool analysis — microbiome and GI function
- Dried blood spots — hormones and unique biomarkers
- Nasal swabs — influenza, COVID-19, respiratory incidence
- Wearable HRV data — sleep and autonomic function
Licensed gold-standard psychometrics are used alongside NRI instruments where regulatory credibility or comparison to existing literature requires it, including the PSS-10, PSQI, DASS-21, SF-36, and condition-specific measures selected for population and time horizon fit.
Questions about endpoints for your study?
Endpoint selection starts with your claims. Tell us what you need to prove.