Large language models (LLMs) have achieved remarkable performance on a range of natural language processing tasks. Scientific text summarization is particularly challenging because of the technical nature of scientific documents. Evaluating LLMs on this task requires carefully constructed benchmarks and assessment tools. Several investig