Computer says no? (Little Britain, 2004)


As third year students, we are beginning to write our dissertations. This piece of work, which will require hours of researching, writing and evaluating, will hopefully be something we are extremely proud of. How would you feel, then, if you were told your dissertation would never be read? Your work would never inspire a reader; it would simply be uploaded to a machine that churns out your grade. This controversial method is automated grading, which I introduced last week as a potential solution to the subjectivity within grading.

It is important to note the arguments in favour of automated grading. Computerised marking is consistent and fairer than human marking, because it removes human error and subjectivity (Valenti, Neri & Cucchiarelli, 2003; Jordan, 2012). Additionally, as we have seen from several students’ blogs, technology has become an integral part of our education system. We already use automated marking for MCQs, so it could be argued that automated marking of essays is the next logical step for technology. Supporting this, initial research into automated grading of short-answer questions has suggested that automated marking is just as effective as human marking (Butcher & Jordan, 2010).

However, short-answer questions often have a right or wrong answer. Conversely, there is often no single right answer in university essays, which are also of much greater length. This is where the significant flaw in this marking style becomes evident. Consider language translators such as ‘Google Translate’. This automated programme can translate text between many languages quickly, but it takes the text very literally, which can lead to fundamental issues with overall meaning. Similarly, automated grading cannot recognise the wider picture like a human can and ask, ‘What is the purpose of this essay?’ This relates to my discussion last week regarding creativity, where a student produces a novel idea. Logical, original thinking is among the most important qualities of effective writing, yet automated marking overlooks it (Byrne, Tang, Tranduc & Tang, 2010). In a module such as this, where creativity is a necessity within our blogs, automated grading would be futile.

It is undeniable that these programmes are extremely clever, using artificial intelligence and complex statistics to determine a student’s grade (Valenti, Neri & Cucchiarelli, 2003). However, some aspects of writing, such as fluency of knowledge, cannot be directly measured. As a result, programmes measure correlates of fluency, such as essay length (Valenti, Neri & Cucchiarelli, 2003). One must question whether a concept as complex as fluency can really be reduced to something as simple as essay length. Students may try to ‘beat the system’, producing long essays (regardless of content) in order to achieve a higher grade. Research has shown that students’ work is strongly influenced by what they believe the system expects from them (Jordan, 2012), encouraging destructive compliance rather than valuable creativity.
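To make the gaming problem concrete, here is a deliberately crude toy scorer written for this post. It grades purely on surface correlates (word count and keyword hits), as a caricature of the statistical proxies described above. Nothing here reflects any real grading engine: the weights, the 500-word target and the keyword list are all invented for illustration.

```python
# Toy illustration only: scoring essays by surface correlates of quality.
# No real automated grader is this simple; all numbers are invented.

def toy_grade(essay: str, keywords: set[str]) -> float:
    words = essay.lower().split()
    length_score = min(len(words) / 500, 1.0)            # essay length as a proxy for "fluency"
    hits = sum(1 for w in words if w.strip(".,") in keywords)
    keyword_score = min(hits / max(len(keywords), 1), 1.0)
    return round(100 * (0.5 * length_score + 0.5 * keyword_score), 1)

keywords = {"motivation", "feedback", "assessment"}
concise = "Assessment drives motivation when feedback is timely."
padded = "assessment motivation feedback " * 200       # content-free padding

print(toy_grade(concise, keywords))   # 50.7 — on-topic but penalised for brevity
print(toy_grade(padded, keywords))    # 100.0 — meaningless repetition scores full marks
```

The padded "essay" carries no argument at all, yet it beats the concise, coherent sentence, which is exactly the destructive compliance the research above warns about.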

Despite research suggesting automated marking is fairer than human marking, this does not recognise the issues faced by students with dyslexia or those for whom English is a second language (Jordan, 2012). Automated programmes often fail to identify misspelt words, or to analyse a poorly constructed sentence (Mitchell, Russell, Broomhead & Aldridge, 2002). Computerised errors can also benefit the student unfairly, with programmes failing to identify incorrect statements when key words appear within the sentence (Mitchell, Russell, Broomhead & Aldridge, 2002). This evidence suggests automated marking is not as accurate as it initially appears.

Unsurprisingly, a great deal of debate has emerged about this controversial method, with a petition created against the use of automated systems. Regardless of this method’s effectiveness, I believe it is important to question the effect it would have on students if it were universally implemented. What would motivate students to engage in their learning if their work were never read? Students would become robots, striving for the right answer with no motivation to discover novel ideas beyond the guidelines. If automated grading is to be integrated into education in any form, it should be used alongside human marking as a confirmation of accuracy, and definitely not as a stand-alone pedagogical tool (Markoff, 2013).

Although my first two weeks of blogging have focused on the flaws of the grading system, I believe there are solutions. I will focus on viable solutions in a later blog; however, automated marking isn’t one of them. Inevitably, the only way to determine an effective solution is to eliminate flawed alternative suggestions. Grading has become an integral aspect of our education system, yet its effectiveness is questionable. As psychologists we believe nothing and question everything, and this approach should be (but often is not) applied to pedagogy.

I end this blog with a question: if we are not machines, why should we be evaluated by one?


11 responses

  1. News articles similar to this have been ever increasing in recent years. One in particular caused national uproar, where students felt they were unfairly graded on their English GCSE exam. The quality of marking (its accuracy and reliability) has been questioned on numerous occasions. Although I do not believe e-marking is a solution to the problem, a system needs to be created that is robust (Ofqual, 2013). The current system allows a variety of techniques to be used, and no single method is the accepted way; currently, assessment and grading are a human judgement. The system needs to implement controls, or punishments, to prevent poor marking. Due to the unreliability of this system, multiple-choice responses have become an accepted way of testing an individual’s knowledge, because there is only one right answer and no room for subjectivity. However, this prevents students from using critical thinking skills, and we become individuals who can merely repeat the one and only right answer.

    Ofqual. (2013) Review of quality of marking in exams in A levels, GCSEs and other academic qualifications. Retrieved on 5th November 2013 from:

  2. Interesting take on automated marking; you have raised some very important points. The speed of feedback matters: studies have shown that immediate feedback is more valued by students and is more helpful in allowing them to achieve learning goals (Weaver, 2006). In this way, automated marking of an MCQ test can be helpful, because there is only one right answer to each question. Another area in which automated marking has been shown to be useful and robust is programming, in languages such as Java and C; programmes such as GAME have been in use for several years and, when tested, produced grades almost identical to those of human markers (Blumenstein et al., 2004).

    For essays the issues are more complicated. Measuring factors such as expression in writing, integrating disparate ideas to reach a conclusion and spotting inconsistencies in prose are all things a computer programme would struggle with, since such programmes tend to concentrate on key words, sentence complexity and grading rules applied using statistical algorithms (Valenti, Neri & Cucchiarelli, 2003). Considering higher education specifically, Gipps (2005) points out that the costs of developing such systems often outweigh the benefits they offer, particularly for smaller courses where questions change frequently. Totally automated marking and feedback are still a long way off because, as she states, “complex performances still have to be marked by complex (human) brains”.


    Blumenstein, M., Green, S., Nguyen, A., & Muthukkumarasamy, V. (2004, June). An experimental analysis of GAME: a generic automated marking environment. In ACM SIGCSE Bulletin (Vol. 36, No. 3, pp. 67-71). ACM.

    Gipps, C. V. (2005). What is the role for ICT‐based assessment in universities? Studies in Higher Education, 30(2), 171-180.

    Valenti, S., Neri, F., & Cucchiarelli, A. (2003). An overview of current research on automated essay grading. Journal of Information Technology Education, 2, 319-330.

    Weaver, M. R. (2006). Do students value feedback? Student perceptions of tutors’ written responses. Assessment and Evaluation in Higher Education, 379-394.

  3. What about automated feedback? Barker (1) found that an automated feedback tool reduced the time taken to mark an assignment by 30%; quick and efficient feedback was therefore given to students within three weeks of the assignment due date. Barker also found that students benefit from this automated feedback tool, which tests and marks exams and essays objectively. The feedback comes from a text folder within a template created by the teacher, where a group of feedback comments is provided for the automated marking system to adhere to; the teacher therefore also has the opportunity to modify or add any comments they wish. Barker showed that this was a valuable addition to the marking system (within the computer science department at the University of Hertfordshire) and that everyone involved found it useful. However, this may be restricted to computer science courses, where there may already be little room for movement. Take many modules within psychology, for example, where we are constantly taught that there are many different theories and not one particular right answer. On the other hand, it may be that computer science courses are not themselves restrictive; it may well be the lecturers/teachers and students who are the restrictive ones.

    As much as I agree that automated marking systems would make the education system even more restrictive than it currently is, the research shows that automatic feedback is very beneficial to students. Having said this, perhaps a mix of automated and human marking should be considered, where the automated system gives the student immediate feedback, but the human marker takes into consideration the creativity within an essay.


  4. I really enjoyed reading this blog, it definitely gets you thinking.
    Essentially, automated marking itself isn’t all that different from human marking. Both are based upon the same curriculum, which states how to determine whether an answer is right or wrong. It is the curriculum, produced by people, that hinders creativity, not the machines themselves. Granted, humans possess cognitive abilities that no automated machine, no matter how advanced, possesses. However, if the curriculum does not account for these qualities, how can it be said that there is a difference between human and automated marking?
    Blumenstein, Green, Nguyen and Muthukkumarasamy (2004) investigated the automated marking system for university-level courses known as GAME, comparing it to human marking. The results turned out to be rather encouraging.
    Firstly, of the first group of 99 assessments, only 8 did not agree with the human marker. In the second group of the same size, the discrepancy was slightly higher, with the automated system disagreeing with the human marker on 19 assessments. Overall there was a 21% discrepancy between GAME and the human markers. Of the total number of differences between the scores given by the system and those given by the human markers, 7 were due to human error, a factor not present when using automated marking; the remainder were due to human markers giving students extra marks for effort. From these results it is clear that there really isn’t that much of a difference between the two: overall, agreement between marker and machine is high. This demonstrates my original point about the curriculum upon which academic work is marked. Surely, if the curriculum upon which both humans and machines base their marks accounted for creative thinking and novel ideas in students’ essays, then the marks given by teachers would differ far more from those given by the automated system, because of course computers can’t think the way people can. Yet the fact that there is not much difference suggests the problem lies not with the method of marking but with the grounds upon which the work is marked.
    Blumenstein, M., Green, S., Nguyen, A., & Muthukkumarasamy, V. (2004). GAME: A generic automated marking environment for programming assessment. International Conference on Information Technology: Coding and Computing (ITCC 2004), Las Vegas, USA, 212-216.

  5. Hannah, you have a brilliant writing style that compels the audience, and your blog is filled with evidence. I think that a computer automated grading system would not eradicate human error or human subjectivity. A computer is programmed by a human, who inputs their own bias into the system; there would therefore be a single subjectivity, based around the computer programmer’s ideas. Everybody within the grading system would be subject to such a powerful single subjectivity that they would desperately be trying to conform to the framework of one piece of human-written software. This creates a bias, and a mark scheme so predictable that it would allow intelligent children to learn the method of achieving perfect exam marks without knowledge of the course or subject. Therefore, if nothing is human-checked at some point on the spectrum, I think it would lead to a collection of machine errors affecting the lives of academics. Subjectivity is a key element of every course: I spend half of my time on a course trying to discover what the teacher expects, and the other half trying to produce that on paper. Another method that could eradicate the pitfalls of single-marker and single-programme subjectivity is inter-subjectivity. Chinese educators have opted for inter-subjectivity because it allows for a more solid review of the correct grading of a paper. Inter-subjectivity is more time-consuming and costly; however, it is one way to ensure that no one is cheated out of a good grade.


  6. This was a very interesting blog. I agree with you: automated grading should not be used to grade essays. Automated grading works wonderfully with MCQs because there are definite right and wrong answers. Automated grading of MCQs is also fair (Skupiene & Zilinskas, 2009), as there is no way for it to be unfair; having humans mark MCQs could also introduce a lot of error, as the papers are very repetitive. Short-answer questions and essays are much more complex to mark. When marking an essay the marker needs to see its structure and flow and then mark the content as well as the grammar. I personally don’t believe there could be a system able to effectively and accurately grade an essay; it is just too complex. The module organiser would also have to provide the system with all the possible answers that could be included in each essay. As you said in your blog, if students know that grading is automated and the system will look for key words, they will not put any creativity into their essays; they will just make sure they have used all the key words and ramble on to make the essay long for maximum grades. As mentioned above, automated grading of MCQs makes the system fair; however, if automated grading were implemented for essays, I strongly believe it would become an unfair system. The computer would not pick up on the students who have revised for hours and put a great deal of creativity into their answer; it would just make sure that key words and a certain structure are included.

    Skupiene, J., & Zilinskas, A. (2009). Automated grading of programming tasks fulfilled by students: Evolution and perspectives. Communication and Cognition, 42, 3-18. Retrieved from

  7. To answer your first question, if I were told that a piece of my work was never to be read, I would question why I was doing it. Regardless of how fair (or not) the method would be, I immediately found the concept quite disheartening.

    Exceptional teachers are ones that guide students to greatness by inspiring them (Bowman, 2007). It has been suggested there are three overarching needs that drive intrinsic motivation: autonomy, mastery and purpose (Bowman, 2011). The purpose of completing a piece of work is for it to be read. By removing the reader, you remove the purpose and motivation for that work will decrease.

    Katzenbach (2006) argues that pride is an extremely powerful agent of performance, and that teachers function as ‘instinctive pride builders’. Effective teachers are adept at getting students to anticipate how proud they will be (Bowman, 2007). However, this will not occur if the education system is seen not to care about the quality of the work itself.

    I suppose it is only natural for an education system that treats its students like machines, to try and test them via machine as well.

    Bowman, R. F. (2007). How can students be motivated: A misplaced question? The Clearing House: A Journal of Educational Strategies, Issues and Ideas, 81(2), 81-86.
    Katzenbach, J. R. (2006). Motivation beyond money: Learning from peak performers. Leader to Leader, 2006(41), 59-62.

