Examining ChatGPT5’s performance in EFL writing assessment and teacher acceptance in Chinese college English classrooms
Abstract
Using univariate and multivariate generalizability (G-) theory, qualitative feedback analysis, and follow-up teacher interviews, this study assessed ChatGPT5's holistic and analytic scoring reliability, feedback effectiveness, and teacher acceptance in Chinese English-as-a-foreign-language (EFL) classroom writing assessment. Thirty CET-4 (College English Test-Band 4) essays by non-English-major students were scored holistically and analytically, with time intervals between scoring sessions, by ChatGPT5 Auto, ChatGPT5 Thinking, and four college English teachers. All raters also provided feedback on the essays at the levels of language, content, and organization, and the teachers were then interviewed about their acceptance of ChatGPT5 in their college English classrooms. The findings showed that ChatGPT5 Thinking consistently achieved higher scoring reliability than both ChatGPT5 Auto and the human raters. Both ChatGPT5 versions provided more comprehensive feedback than the teachers, particularly on content and organization. Teachers viewed ChatGPT5 positively as a complementary assessment tool but highlighted the need for combined human-AI feedback, better accessibility, and improved usability. Implications for Chinese college English teachers and their students adopting ChatGPT5 in EFL writing classrooms are discussed.
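For readers unfamiliar with G-theory, the sketch below illustrates the univariate case for a fully crossed persons-by-raters (p × r) design like the one described above. It is a minimal illustration on synthetic scores, not the study's actual analysis pipeline: variance components are estimated from ANOVA expected mean squares, and the relative (G) and absolute (Phi) coefficients are computed for a decision study with the given number of raters.

```python
import numpy as np

# Synthetic stand-in for the study's data: 30 essays (persons) scored by
# 6 raters in a fully crossed p x r design. Values are illustrative only.
rng = np.random.default_rng(0)
scores = rng.normal(10.0, 2.0, size=(30, 6))

n_p, n_r = scores.shape
grand = scores.mean()
person_means = scores.mean(axis=1)
rater_means = scores.mean(axis=0)

# Two-way ANOVA sums of squares (no replication, so the p x r
# interaction is confounded with residual error).
ss_p = n_r * ((person_means - grand) ** 2).sum()
ss_r = n_p * ((rater_means - grand) ** 2).sum()
ss_pr = ((scores - grand) ** 2).sum() - ss_p - ss_r

ms_p = ss_p / (n_p - 1)
ms_r = ss_r / (n_r - 1)
ms_pr = ss_pr / ((n_p - 1) * (n_r - 1))

# Variance components from expected mean squares, truncated at zero.
var_pr = ms_pr                           # interaction + error
var_p = max((ms_p - ms_pr) / n_r, 0.0)   # true essay-quality variance
var_r = max((ms_r - ms_pr) / n_p, 0.0)   # rater severity variance

# Decision-study coefficients for n_r raters: G treats only variance
# that reorders persons as error; Phi also counts rater severity.
g_coef = var_p / (var_p + var_pr / n_r)
phi_coef = var_p / (var_p + (var_r + var_pr) / n_r)
print(f"G = {g_coef:.3f}, Phi = {phi_coef:.3f}")
```

A higher G coefficient means more of the observed score variance reflects true differences among essays rather than rater effects, which is the sense in which ChatGPT5 Thinking is reported here as more reliable than ChatGPT5 Auto and the human raters.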
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.