Evaluation

This project aims to evaluate how much structured reasoning improves the reliability of LLM outputs.

Current Status

  • Qualitative examples demonstrate the system's behavior
  • The system identifies unsupported or exaggerated claims

Planned Evaluation

  • Dataset of claims (news / public statements)
  • Comparison with baseline LLM outputs
  • Metrics:
      • Credibility accuracy
      • Detection of unsupported claims
      • Quality of reasoning
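The first two metrics can be computed automatically once each claim has a gold label. The sketch below shows one possible approach. It treats credibility accuracy as exact-match accuracy over credibility labels, and it scores unsupported-claim detection with precision, recall and F1. The record types (`Claim`, `Judgement`), the label names ("credible", "exaggerated", "unsupported") and the choice of precision/recall/F1 are illustrative assumptions, not the project's defined method. The four toy claims are placeholders, not results. Quality of reasoning is left out because it probably needs human or rubric-based scoring.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Claim:
    """A dataset item with gold annotations (hypothetical schema)."""
    text: str
    gold_credibility: str      # assumed labels: "credible", "exaggerated", "unsupported"
    gold_unsupported: bool     # True if the claim lacks supporting evidence


@dataclass
class Judgement:
    """One system's output for a single claim (hypothetical schema)."""
    credibility: str
    flagged_unsupported: bool


def _check(claims: list[Claim], preds: list[Judgement]) -> None:
    if len(claims) != len(preds):
        raise ValueError("claims and predictions must be aligned one-to-one")


def credibility_accuracy(claims: list[Claim], preds: list[Judgement]) -> float:
    """Fraction of claims whose predicted credibility label matches the gold label."""
    _check(claims, preds)
    if not claims:
        return 0.0
    correct = sum(p.credibility == c.gold_credibility for c, p in zip(claims, preds))
    return correct / len(claims)


def unsupported_detection(claims: list[Claim], preds: list[Judgement]) -> dict[str, float]:
    """Precision, recall and F1 for flagging unsupported claims (positive class = unsupported)."""
    _check(claims, preds)
    pairs = list(zip(claims, preds))
    tp = sum(p.flagged_unsupported and c.gold_unsupported for c, p in pairs)
    fp = sum(p.flagged_unsupported and not c.gold_unsupported for c, p in pairs)
    fn = sum(not p.flagged_unsupported and c.gold_unsupported for c, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def compare(claims: list[Claim], systems: dict[str, list[Judgement]]) -> dict[str, dict[str, float]]:
    """Score every system (e.g. structured reasoning vs. baseline LLM) on the same claims."""
    return {
        name: {"credibility_accuracy": credibility_accuracy(claims, preds),
               **unsupported_detection(claims, preds)}
        for name, preds in systems.items()
    }


# Toy placeholder data, for illustration only.
claims = [
    Claim("claim 1", "credible", False),
    Claim("claim 2", "unsupported", True),
    Claim("claim 3", "exaggerated", False),
    Claim("claim 4", "unsupported", True),
]
structured = [
    Judgement("credible", False),
    Judgement("unsupported", True),
    Judgement("exaggerated", False),
    Judgement("credible", False),
]
baseline = [
    Judgement("credible", False),
    Judgement("credible", False),
    Judgement("credible", True),
    Judgement("unsupported", True),
]

results = compare(claims, {"structured": structured, "baseline": baseline})
for name, scores in results.items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

Scoring both systems on the same aligned list of claims keeps the baseline comparison direct. Precision and recall are reported separately so that a system cannot score well by flagging every claim as unsupported.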

Goal

To measure whether structured reasoning reduces misleading or unsupported outputs.