
Benchmarking is an established way to make the performance of systems measurable and, above all, comparable, and it has recently been applied more and more to the code generation capabilities of LLMs. Prominent examples are HumanEval, CodeXGLUE and CoNaLa. The development of benchmark results over time demonstrates the growing maturity and usability of LLMs, especially for widely used programming languages such as Python or Java. From the perspective of computer science education, this raises the question of how mature AI tools need to be, for which languages and for which aspects of teaching, before they can reasonably be used in the classroom. This is not just about code generation, however. Use cases for LLMs concern both learners (e.g. generating sample solutions, varying solutions, checking code) and teachers (e.g. generating exercises, illustrative programming examples, explaining and documenting program code).

Figure: Logic flow of the GenAI ABAP benchmark

In a project at TH Köln, Stephan Wallraven and Tim Köhne examined how mature ChatGPT (as of fall 2023) is as a support for the ABAP learning process. ABAP is a proprietary language used to implement business functions in SAP systems, which makes it relevant, for example, for business informatics courses. In the project, the HumanEval benchmark was adapted to ABAP so that the ABAP function modules generated by ChatGPT could be evaluated automatically across the extensive set of benchmark tasks. In addition, the benchmark was extended to cover error detection, error correction and code explanation.
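The automatic evaluation described here can be pictured as a small harness that runs each generated solution against the test cases of its task and counts the tasks that pass completely. The following Python sketch is only an illustration of that idea and not the project's actual implementation: `Task`, `passes_all_tests`, `success_rate` and `python_stub_executor` are hypothetical names. In an ABAP setting, the executor would have to install the generated function module in an SAP system and call it there; here it is replaced by a stub that runs Python code, so the example stays self-contained.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Task:
    """One benchmark task: a prompt plus (arguments, expected result) pairs."""
    task_id: str
    prompt: str
    test_cases: List[Tuple[tuple, Any]]


# Hypothetical executor signature: (generated code, call arguments) -> result.
# For ABAP this would wrap deployment and a call into an SAP system (assumption).
Executor = Callable[[str, tuple], Any]


def passes_all_tests(code: str, task: Task, execute: Executor) -> bool:
    """A candidate counts as solved only if every test case passes."""
    for args, expected in task.test_cases:
        try:
            if execute(code, args) != expected:
                return False
        except Exception:
            # Syntax errors and runtime errors are counted as failures.
            return False
    return True


def success_rate(tasks: List[Task], candidates: Dict[str, str],
                 execute: Executor) -> float:
    """Fraction of tasks whose generated candidate passes all tests."""
    if not tasks:
        return 0.0
    solved = sum(
        passes_all_tests(candidates.get(task.task_id, ""), task, execute)
        for task in tasks
    )
    return solved / len(tasks)


def python_stub_executor(code: str, args: tuple) -> Any:
    """Stand-in executor: runs Python code that defines `solution`."""
    namespace: dict = {}
    exec(code, namespace)  # acceptable only for trusted demo input
    return namespace["solution"](*args)


demo_tasks = [
    Task("add", "Return the sum of two integers.", [((1, 2), 3), ((0, 0), 0)]),
    Task("double", "Return twice the input.", [((2,), 4)]),
]
demo_candidates = {
    "add": "def solution(a, b):\n    return a + b\n",
    "double": "def solution(x):\n    return x + 1\n",  # deliberately wrong
}
print(success_rate(demo_tasks, demo_candidates, python_stub_executor))  # 0.5
```

Keeping the executor behind a plain function signature means the same counting logic could be reused for Python and ABAP candidates, which is what makes the success rates comparable.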

Initial results show that the success rate for generated ABAP code is many times lower than for ChatGPT-generated Python code. Code explanation, however, yields promising results.
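To make a success rate of this kind concrete: the original HumanEval benchmark reports pass@k, the probability that at least one of k sampled solutions for a task passes all tests. With n samples per task, of which c are correct, it is estimated as 1 - C(n-c, k) / C(n, k). The text above does not say which metric the project used, so the sketch below should be read as an illustration of the HumanEval convention rather than of the project's method; the function name `pass_at_k` is chosen here for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one task from n samples, c of which are correct."""
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # comb(n - c, k) is 0 when fewer than k samples are incorrect,
    # so the estimate is exactly 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 10 samples for a task, 3 of them correct.
print(pass_at_k(n=10, c=3, k=1))
print(pass_at_k(n=10, c=3, k=5))
```

The benchmark-level figure is the mean of this value over all tasks. For k = 1 it reduces to the plain fraction of correct samples.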

Numerous announcements and previews from major players in the ERP market, such as Microsoft and SAP, indicate that the next generation of development platforms will include built-in AI-based code generators in the near future, and that certainly applies to ABAP too. The ABAP benchmark environment can then be used in further investigations to monitor and compare the progress of upcoming releases.