The Challenge of Low-Resource Data

Dyslexic learners are increasingly adopting AI tools for academic support, yet their lived experiences with these tools remain under-researched. Analyzing those experiences is difficult because the online forum discussions in which they appear are noisy and lack the structure that traditional NLP analysis requires. DysLexLens addresses this with an end-to-end architecture that transforms raw social media data into a structured, verifiable corpus.

The DysLexLens Architecture

The framework operates through a four-stage pipeline designed to ensure data relevance and response accuracy:

  • Dictionary-Driven Filtering: To overcome the noise of general social media, the framework uses a specialized dictionary-driven method to isolate relevant posts regarding dyslexia and AI, creating a focused corpus.
  • KG-Based Reasoning: It integrates LLM-assisted semantic analysis with Knowledge Graphs (KG) to perform complex query reasoning, allowing for deeper pattern discovery than standard retrieval methods.
  • Verifiable Response Generation: The system is built to be evidence-traceable, ensuring that the responses generated by the LLM can be mapped back to the source data.
  • Dual-Layer Evaluation: The framework pairs quantitative metrics (specifically RAGAS and Query Robustness) with structured qualitative validation guidelines to assess response quality, focusing primarily on minimizing hallucinations and ensuring evidence alignment.
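
The filtering and traceability stages listed above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the term lists, the Post and Triple classes, and the filter_posts and evidence_for helpers are all hypothetical names chosen for illustration, and the triple is written by hand where the framework would rely on LLM-assisted extraction. The sketch assumes a post is relevant only if it contains at least one dyslexia term and at least one AI term, and that each knowledge-graph edge carries the ID of the post it came from.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical term lists; the actual DysLexLens dictionary is not given here.
DYSLEXIA_TERMS = {"dyslexia", "dyslexic"}
AI_TERMS = {"ai", "chatgpt", "llm"}


@dataclass(frozen=True)
class Post:
    post_id: str
    text: str


@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str
    obj: str
    source_post_id: str  # provenance: the post that supports this edge


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def filter_posts(posts: list[Post]) -> list[Post]:
    """Keep posts that mention both a dyslexia term and an AI term."""
    kept = []
    for post in posts:
        tokens = tokenize(post.text)
        if tokens & DYSLEXIA_TERMS and tokens & AI_TERMS:
            kept.append(post)
    return kept


def evidence_for(
    triples: list[Triple], relation: str, posts: list[Post]
) -> list[Post]:
    """Return the posts cited by every triple that has the given relation."""
    cited = {t.source_post_id for t in triples if t.relation == relation}
    return [p for p in posts if p.post_id in cited]


# Example usage with made-up posts.
posts = [
    Post("p1", "I am dyslexic and use ChatGPT to proofread my essays."),
    Post("p2", "Best pizza in town?"),
    Post("p3", "AI is changing everything."),
    Post("p4", "My dyslexia makes reading slow."),
]
corpus = filter_posts(posts)

# Hand-written stand-in for an LLM-extracted knowledge-graph edge.
triples = [Triple("dyslexic learner", "uses_tool_for", "proofreading", "p1")]
evidence = evidence_for(triples, "uses_tool_for", corpus)
print([p.post_id for p in corpus], [p.post_id for p in evidence])
```

Storing the source post ID on every edge is one simple way to keep answers evidence-traceable: any claim assembled from the graph can be followed back to the posts that support it.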

Practical Application and Generalizability

The authors demonstrated the framework's effectiveness using 30 specific questions derived from Reddit forum data. Beyond the immediate use case of dyslexia research, the architecture is designed for generalizability, offering a template for researchers working with other low-resource or niche online forum contexts where data quality is a significant barrier to insight.