The paper Scaling Assessment of Student Models with LLMs: Integrating Feedback into Practice has been accepted to the ICSE-SEET track (Software Engineering Education and Training) and will be presented at ICSE’26 in Rio de Janeiro, Brazil.
Automated assessment for UML modeling exercises is hard to scale: diagrams are open-ended, graphical, and highly contextual. This paper presents a production-ready extension of Athena, integrated into Artemis, that supports human-in-the-loop assessment by generating feedback suggestions that graders can review, adapt, or discard within their existing workflow.
To make LLM-based assessment feasible for graphical models, the approach introduces ApollonUML, a domain-specific textual representation that improves LLM interpretability while preserving precise links back to individual diagram elements. This enables contextualized, element-level feedback that is shown inline in the modeling environment.
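To illustrate the general idea, here is a minimal Python sketch of how a class diagram could be serialized into an LLM-friendly textual form in which every line retains a stable element ID, so feedback generated on the text can be mapped back to the corresponding diagram element. This is not the actual ApollonUML grammar, which the paper defines; all names, data structures, and the output format below are illustrative assumptions.

```python
# Illustrative sketch only: the real ApollonUML representation is defined
# in the paper. UMLClass, UMLAssociation, and the output format here are
# hypothetical stand-ins for the concept of an element-linked text form.
from dataclasses import dataclass, field


@dataclass
class UMLClass:
    id: str                      # stable diagram element ID
    name: str
    attributes: list[str] = field(default_factory=list)


@dataclass
class UMLAssociation:
    id: str
    source: str                  # name of the source class
    target: str                  # name of the target class
    kind: str = "association"    # e.g. association, inheritance


def to_text(classes: list[UMLClass], assocs: list[UMLAssociation]) -> str:
    """Render the diagram as text; each line keeps its element ID so that
    feedback referencing a line can be shown inline on the diagram element."""
    lines = []
    for c in classes:
        attrs = ", ".join(c.attributes) or "-"
        lines.append(f"[{c.id}] class {c.name} (attributes: {attrs})")
    for a in assocs:
        lines.append(f"[{a.id}] {a.kind}: {a.source} -> {a.target}")
    return "\n".join(lines)


print(to_text(
    [UMLClass("c1", "Course", ["title"]), UMLClass("c2", "Exercise", ["dueDate"])],
    [UMLAssociation("r1", "Course", "Exercise")],
))
# [c1] class Course (attributes: title)
# [c2] class Exercise (attributes: dueDate)
# [r1] association: Course -> Exercise
```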
In a large-scale retrospective evaluation on authentic student submissions, LLM-generated assessments showed moderate alignment with human grading and exhibited systematic deviations depending on submission quality and model complexity. While these results are not sufficient for fully automated grading, they demonstrate that LLM-based feedback can serve as effective draft assessments, reducing grader effort and supporting more consistent evaluation practices under human oversight.

Citation
Scaling Assessment of Student Models with LLMs: Integrating Feedback into Practice.
Maximilian Sölch and Stephan Krusche.
48th International Conference on Software Engineering (ICSE-SEET '26).
Rio de Janeiro, Brazil, April 2026.