Examining ChatGPT5’s performance in EFL writing assessment and teacher acceptance in Chinese college English classrooms

Wa Hong
Jinyan Huang
Yaxin Dong
Chen Cheng
Jun Yan

Abstract

Using univariate and multivariate generalizability (G-) theory, qualitative feedback analysis, and follow-up teacher interviews, this study assessed ChatGPT5's holistic and analytic scoring reliability, feedback effectiveness, and teacher acceptance in Chinese English-as-a-foreign-language (EFL) classroom writing assessment. Thirty CET-4 (College English Test-Band 4) essays written by non-English majors were scored holistically and analytically (with time intervals between scoring sessions) by ChatGPT5 Auto and ChatGPT5 Thinking, and by four college English teachers. All raters also provided feedback on the essays at the levels of language, content, and organization. Finally, the teachers were interviewed about their acceptance of ChatGPT5 in their college English classrooms. The findings showed that ChatGPT5 Thinking consistently achieved higher scoring reliability than ChatGPT5 Auto and the human raters. Both ChatGPT5 versions provided more comprehensive feedback than the teachers, particularly on content and organization. Teachers viewed ChatGPT5 positively as a complementary assessment tool but highlighted the need for combined human-AI feedback, better accessibility, and improved usability. Implications for the adoption of ChatGPT5 by Chinese college English teachers and their students in EFL writing classrooms are discussed.
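
For readers less familiar with G-theory, a minimal illustrative sketch (assuming, for illustration only, a fully crossed persons-by-raters, p × r, design; the abstract does not report the study's actual G-study designs): the relative generalizability coefficient and the index of dependability are

E\hat{\rho}^{2} = \frac{\hat{\sigma}^{2}_{p}}{\hat{\sigma}^{2}_{p} + \hat{\sigma}^{2}_{pr,e}/n'_{r}}, \qquad \hat{\Phi} = \frac{\hat{\sigma}^{2}_{p}}{\hat{\sigma}^{2}_{p} + \left(\hat{\sigma}^{2}_{r} + \hat{\sigma}^{2}_{pr,e}\right)/n'_{r}},

where \hat{\sigma}^{2}_{p}, \hat{\sigma}^{2}_{r}, and \hat{\sigma}^{2}_{pr,e} are the estimated variance components for essays, raters, and their interaction confounded with residual error, and n'_{r} is the number of raters assumed in the decision (D-) study. Higher values indicate more dependable scores for relative and absolute decisions, respectively.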

Article Details

Section: Assessment and Evaluation

Author Biography

Jinyan Huang, Jiangsu University, China

Jinyan Huang, Yaxin Dong, and Wa Hong share first authorship.
