TY  - JOUR
KW  - NAM evaluation
KW  - generalizability
KW  - performance metrics
KW  - replicability
KW  - Risk Assessment
KW  - variability
AU  - Agnes L. Karmaus
AU  - Anna L. Kreutz
AU  - Oluwakemi Oyetade
AU  - Katie Paul Friedman
AU  - Martin Paparella
AU  - Emily N. Reinke
AU  - David Allen
AU  - Helena T. Hogberg
AU  - Nicole C. Kleinstreuer
AB  - Introduction: Animal studies have historically informed toxicological testing and safety assessments. However, assessment of the variability in both quantitative and qualitative results has been limited. Biological variability, experimental differences, interpretation of categorical endpoints, and data availability and curation approaches all contribute to the quantified variability. Methods: A literature review was conducted to identify publications describing variability analyses for in vivo toxicology studies. Variability analyses were evaluated and summarized for a variety of toxicological endpoints: ocular irritation, dermal sensitization and irritation, acute oral and inhalation lethality, subchronic and chronic toxicity, carcinogenicity, neurotoxicity including DNT, endocrine, and genotoxicity. Results: This review summarizes published investigations of variability within mammalian toxicological studies that have been largely conducted in accordance with health effects test guidelines. The results of this review suggest that replicability of in vivo toxicological guideline studies varies widely by study type, endpoint complexity, and classification approach. Discussion: While any test system will have inherent variability, understanding its sources and impact on study interpretation will help ensure that appropriate confidence is applied when using the test method. Furthermore, such information aids in establishing relevant metrics to serve as baselines for informing performance characterization of new approach methodologies (NAMs). Future evaluation of NAMs should be contextualized using estimates of uncertainty and variance of the traditional study data to demonstrate “better” performance compared to traditional testing approaches. Robust understanding of guideline study performance is important for risk assessments, where it is important to find species-relevant NAMs that can perform at least as well as existing bioassays.
BT  - Frontiers in Toxicology
DA  - 2026-03-02
DO  - 10.3389/ftox.2026.1778353
LA  - English
PY  - 2026
ST  - Perspectives on variability of in vivo toxicology studies
T2  - Frontiers in Toxicology
TI  - Perspectives on variability of in vivo toxicology studies: considerations for next-generation toxicology
UR  - https://www.frontiersin.org/journals/toxicology/articles/10.3389/ftox.2026.1778353/full
VL  - 8
Y2  - 2026-03-10
SN  - 2673-3080
ER  - 