Sessions / Testing and Evaluation

Cancelled: Goodwill or wind chill? Why is gaining insight into school ethos important? #2981


Institutional evaluation, a requirement for improving the quality and effectiveness of teaching, is carried out through end-of-semester student evaluations of teaching. Is this sufficient? The “ethos” of an educational institution is the bedrock of all that takes place within a school, yet it is often bypassed in formal evaluations of quality and standards. While difficult to define, a school’s ethos can be described in terms of first impressions and the “feel” of the environment, and is composed of values, beliefs, attitudes, and relationships. Many studies of end-of-semester evaluations show that students respond according to how they feel on the day of the evaluation, which is subject to extraneous variables. What are student evaluations influenced by? What is the everyday reality inside an institution? What values do students suggest within an institution? This presentation examines in more depth the importance of measuring a school’s ethos, shows the results from 250 questionnaire respondents, and offers some suggestions for improving the tertiary learning environment.

Comparing the online and paper-based versions of the TOEIC L&R #2893

Fri, Jul 8, 17:50-18:15 Asia/Tokyo | LOCATION: F33 HYBRID

Shōzan University (a pseudonym) uses the Computerized Assessment System for English Communication (CASEC) for placement. Students complete the CASEC at home in late March, then the TOEIC Listening & Reading (L&R) a few weeks later once classes have begun. Two cohorts, 2018 and 2019, completed a paper-based version of the TOEIC L&R; two later cohorts, 2020 and 2021, completed a novel online version of the TOEIC L&R, which reportedly yields scores similar to those of the conventional paper-based TOEIC L&R (IIBC, 2020). Author (2020) revealed that all cohorts had similar CASEC scores; however, TOEIC L&R scores from the online version were significantly higher than those from the paper-based test. Author (2020) also reported on a sub-group (n = 57) who completed both versions of the TOEIC L&R, with online test scores being significantly higher than the conventional paper-based scores. However, the online test was completed without proctors. The current presentation will report on a study from February 2022. Participants (n = 80) will complete the two different versions of the TOEIC L&R, with participants randomly assigned to two groups (i.e., paper-online or online-paper). Both groups will complete both tests under supervision.
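The counterbalanced design described above (each participant takes both versions, with the order decided at random) can be illustrated with a minimal Python sketch. The function name `assign_crossover`, the participant IDs, the seed, and the equal 40/40 split are all assumptions made for illustration; the abstract does not state how the random assignment was actually carried out.

```python
import random

def assign_crossover(participant_ids, seed=None):
    """Randomly split participants into two test-order groups.

    Hypothetical helper: an equal split is assumed here, which the
    abstract does not specify.
    """
    rng = random.Random(seed)      # seeded so the allocation can be reproduced
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)          # random order removes any selection pattern
    half = len(shuffled) // 2
    return {
        "paper-online": shuffled[:half],   # paper-based test first, online second
        "online-paper": shuffled[half:],   # online test first, paper-based second
    }

# Illustrative IDs 1..80 (n = 80 as in the abstract); the seed is arbitrary.
groups = assign_crossover(range(1, 81), seed=2022)
print({order: len(ids) for order, ids in groups.items()})
```

Because every participant takes both versions, any practice effect from the first sitting is spread across both test formats rather than favouring one of them.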

Assessing the validity of essay marking rubrics #2887

Sat, Jul 9, 11:45-12:10 Asia/Tokyo | LOCATION: F31

As English high school curricula become increasingly communication-oriented, it is becoming more necessary to develop university entrance tests which assess students’ ability to produce target language based on communicative goals rather than to translate between languages or select correct answers. A potential problem with these more communication-oriented test questions is that they may sacrifice reliability for validity; however, the use of rubrics can ensure that both reliability and validity remain high (Jonsson and Svingby, 2007). This presentation looks at the results of a preliminary study to determine if a university entrance exam rubric results in high inter-rater reliability. The study looks at the test scores of three types of markers: 1) those trained to apply the rubric; 2) those who have seen the rubric but have not been trained to apply it; and 3) those who have not seen the rubric. It aims to answer the following question: Does the rubric achieve a Cohen's kappa value greater than 0.7 for inter-rater reliability 1) between trained markers; 2) between trained and untrained markers; and 3) between trained markers and markers who have not seen the rubric?

The findings of this study will interest educators involved in test and assessment design.
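For readers unfamiliar with the statistic named in the research questions, Cohen's kappa compares the observed agreement between two raters with the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The Python sketch below is an illustration only. The scores are invented, the function name `cohen_kappa` is hypothetical, and the unweighted form of kappa is an assumption; the abstract does not say whether a weighted variant was used for the rubric bands.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: proportion of items given identical scores.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions per category.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (p_observed - p_expected) / (1 - p_expected)

# Invented rubric scores (bands 1-3) from two hypothetical trained markers.
trained_1 = [3, 3, 2, 1, 2, 3, 1, 2]
trained_2 = [3, 2, 2, 1, 2, 3, 1, 1]
kappa = cohen_kappa(trained_1, trained_2)
print(f"kappa = {kappa:.3f}; exceeds 0.7: {kappa > 0.7}")
```

With these invented scores the markers agree on six of eight essays, yet kappa is only about 0.63, below the 0.7 threshold, because a fair share of that agreement would be expected by chance.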

Cancelled: An online application for advancing quantitative data analysis #2694

Sat, Jul 9, 16:00-16:40 Asia/Tokyo CANCELLED

One aspect of reimagining language learning research involves new approaches to data analysis. For quantitative research into foreign language learning or teaching, the dominant approach is arguably inferential statistics for statistical significance testing. Here, a p-value is calculated from sampled data, and decisions on the variables being tested (whether to accept or reject them as in some way contributing to the processes under study) are made based on that p-value. However, this approach has long been recognised by numerous methodologists and theorists as potentially flawed, possibly holding back much research from contributing to substantive theory creation. Alternative measures are recommended; if the approach is not rejected outright, it is suggested that the results of statistical significance tests be augmented with measures of effect size, confidence intervals, robust variations of inferential statistics, and data-rich graphical plots. This presentation introduces an online application designed to help researchers carry out quantitative analysis focused on these alternatives to significance testing. Aimed particularly at less-experienced researchers, it requires little more than the input of data for the output of a range of useful statistics and plots. The rationale behind and usage of the application will be covered.
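Two of the alternatives mentioned above, an effect size and a confidence interval, can be sketched in a few lines of Python. This is not the application described in the abstract, whose implementation is not given. The function names, the choice of Cohen's d with a pooled standard deviation, the normal-approximation interval, and the data are all assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def cohens_d(x, y):
    """Standardised mean difference using the pooled sample standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)

def mean_diff_ci(x, y, level=0.95):
    """Approximate confidence interval for the difference in means.

    Uses a normal approximation; for small samples a t-based interval
    would be somewhat wider.
    """
    diff = mean(x) - mean(y)
    se = sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff - z * se, diff + z * se

# Invented scores for two hypothetical groups of learners.
group_a = [2, 4, 4, 4, 5, 5, 7, 9]
group_b = [1, 3, 3, 3, 4, 4, 6, 8]
d = cohens_d(group_a, group_b)
low, high = mean_diff_ci(group_a, group_b)
print(f"d = {d:.2f}, 95% CI for mean difference = ({low:.2f}, {high:.2f})")
```

Reporting the size of the difference together with an interval around it shows both how large an effect might be and how uncertain the estimate is, which a p-value alone does not convey.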

English private tutoring and washback from university entrance exams #2826

Sat, Jul 9, 16:25-16:50 Asia/Tokyo | LOCATION: F31: DO NOT RECORD

It is commonplace in Japan for students in formal education to receive additional English tuition that is provided outside of regular school hours, typically for a fee. Such English Private Tutoring (EPT) (Yung & Bray, 2017) has largely escaped the attention of researchers, though recent studies have sought to investigate its role in language education (see Yung & Hajar, forthcoming). In this presentation, the focus is on the role of juku (cram schools) and yobikō (preparatory schools) in preparing students for high-stakes university entrance exams, which is one of the well-established functions of EPT in Japan. The discussion is framed in terms of test washback, that is, the effect(s) that tests have on learning and teaching. Within this theoretical framework, the key findings will be synthesised from the available studies that have investigated the learning, teaching and assessment of English in juku and yobikō. Based on these empirical studies, the presenter will illustrate how exam content drives teaching and learning within EPT and how narrowing of the curriculum in EPT can affect teaching and learning in mainstream education. The presentation will conclude with a call for further research into this much-overlooked area of education.

TOEIC listening/speaking prep: collaboration and real-world application #2967

Sun, Jul 10, 11:45-12:10 Asia/Tokyo | LOCATION: E25

Instructional practices for standardized tests like the TOEIC often rely on a teacher-centered “teach-model-practice-explain” pattern that is at odds with the Communicative Approach. Test prep materials similarly emphasize lower-order basic comprehension and recall. In this practice-oriented presentation, attendees will acquire four prep activities for the TOEIC listening and speaking sections that build into the “practice” stage opportunities for students to collaborate, solve problems, and apply their learning to real-world contexts. First, the presenters will briefly cover the presentation’s motivation, context, and teaching gap (2 minutes). The presenters will then share step-by-step how-tos for the activities, which include a business consulting project for TOEIC speaking and a reasoning-gap activity for TOEIC listening (10 minutes). Participants will demo one activity (8 minutes) and finally collaborate in small groups, applying the activities to their own teaching contexts (5 minutes). Attendees will walk away with effective, meaningful, motivating TOEIC prep activities that they can immediately begin using in their classes.