Rank the Revival: A Numbery-Wumbery Breakdown (Part 2)
What’s the new dream & nightmare run? Best slot? Does recent episode syndrome exist? Guest contributor Joshua Yetman investigates.
Part 1 of this article presented a considerable number of cold, hard statistics from the recent Best of the Revival polls, where every episode of the revival was scored by the DWTV community, summarising 10 years of fantastic television with a comprehensive ranking of episode averages. In this second part, however, we will analyse and draw conclusions about particular aspects of the revival, and gain a greater insight into the past 10 years of Doctor Who.
(1) The new ‘Dream and Nightmare’ run of episodes
This time last year, DWTV carried out a “Series 1-7 face-off”, pitting all the Episode 1s against each other, all the Episode 2s against each other, etc., in order to find a dream run and nightmare run of episodes, formed out of the best and worst episodes respectively in each episode slot (click here and here to see the results of these polls). It’s like fantasy football, just with Doctor Who episodes, and more interesting in every conceivable way.
By applying the results of the Rank the Revival poll, we can effectively create a Series 1-8 face-off by using the episode averages. The fact that Series 8 only had 12 episodes disadvantages it somewhat, but we’ll carry on regardless. The new dream run of episodes is:
- Best Episode 1: The Eleventh Hour
- Best Episode 2: Day of the Moon
- Best Episode 3: School Reunion
- Best Episode 4: The Girl in the Fireplace (very narrowly edging out Listen!)
- Best Episode 5: Flesh and Stone
- Best Episode 6: Dalek
- Best Episode 7: A Good Man Goes to War
- Best Episode 8: Silence in the Library
- Best Episode 9: The Empty Child
- Best Episode 10: Blink
- Best Episode 11: Dark Water
- Best Episode 12: The Pandorica Opens
- Best Episode 13: The Parting of the Ways
- Best RTD special: The Waters of Mars
- Best Moffat special: The Day of the Doctor
So, understandably, Moffat dominates the dream run, with an incredible 11 out of the 15 slots being occupied by a Moffat-written episode. Furthermore, the dream run is fairly evenly split between the two eras, with 7 Moffat era stories and 8 RTD era stories. Every series has at least one representative except Series 7 (unless you count The Day of the Doctor as part of Series 7), and the series with the greatest representation are Series 5 and Series 1, both with 3 episodes. There have been 4 changes since the Series 1-7 face-off (in slots 4, 5, 11 and 13), though only one of these changes was down to a Series 8 episode.
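For anyone who wants to repeat the exercise, the selection behind these runs is mechanical: group the episodes by slot, then take the highest average in each group for the dream run and the lowest for the nightmare run. Here is a minimal Python sketch of that idea. The titles and scores are invented placeholders, not the real poll results, and the function name is my own.

```python
from collections import defaultdict

# Placeholder data: (slot, title, average score). These titles and scores
# are invented for illustration and are NOT the real poll results.
episodes = [
    (1, "Opener A", 7.9),
    (1, "Opener B", 8.6),
    (2, "Second A", 6.4),
    (2, "Second B", 8.1),
    (2, "Second C", 7.2),
]

def best_and_worst_by_slot(rows):
    """Return {slot: (best_title, worst_title)} based on average score."""
    by_slot = defaultdict(list)
    for slot, title, score in rows:
        by_slot[slot].append((score, title))
    result = {}
    for slot, scored in by_slot.items():
        best = max(scored)[1]   # highest average in this slot
        worst = min(scored)[1]  # lowest average in this slot
        result[slot] = (best, worst)
    return result

runs = best_and_worst_by_slot(episodes)
for slot in sorted(runs):
    best, worst = runs[slot]
    print(f"Episode {slot}: dream = {best}, nightmare = {worst}")
```

Feeding in the real list of 117 episode averages (plus a slot number for each) would reproduce both runs in one pass.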
The new nightmare run of episodes is:
- Worst Episode 1: New Earth
- Worst Episode 2: The Beast Below
- Worst Episode 3: The Curse of the Black Spot
- Worst Episode 4: Daleks in Manhattan
- Worst Episode 5: Evolution of the Daleks
- Worst Episode 6: The Lazarus Experiment
- Worst Episode 7: The Idiot’s Lantern
- Worst Episode 8: Cold War
- Worst Episode 9: Night Terrors
- Worst Episode 10: Love & Monsters
- Worst Episode 11: Fear Her
- Worst Episode 12: Nightmare in Silver
- Worst Episode 13: The Wedding of River Song
- Worst RTD special: The Next Doctor
- Worst Moffat special: The Doctor, The Widow, and the Wardrobe
Three Mark Gatiss-written episodes – plus two episodes in which he acted, poor guy! – appear in this list, so it’s fair to say he dominates the nightmare run! Each series has at least one representative here except Series 1, 4 and 8. The series with the greatest representation is Series 2, with 4 episodes. There have been 3 changes from the Series 1-7 face-off (in slots 8, 12, and the RTD special). Furthermore, the nightmare run is again fairly evenly split between the two eras, with 7 Moffat era stories and 8 RTD era stories.
(2) The best episode slot
The Episode 13 slot has the highest average score (8.147), bolstered by the considerable number of high quality finales we’ve had in the revival (as we will see in the next section). The Episode 9, 8 and 12 slots also do very well on average. On the other hand, the Episode 7 slot has the lowest average score (6.957), with the Episode 3 and 6 slots not far behind.
The diagram given above shows the average episode score for each episode slot, differentiated between the Moffat and RTD eras. From it, we could say that, on average and between both eras, a typical series “sags” slightly in the middle. We could even say that the typical series consists of 4 peak regions separated by interim “sags”. It is also worth noting that the RTD Era Episode 9 slot is the strongest slot of them all (which makes a lot of sense, given that it contains The Empty Child, Forest of the Dead, The Family of Blood and The Satan Pit!), followed closely by the Moffat Era Episode 1 slot.
(3) Best to worst finales (averaged as a story)
- 1: The Pandorica Opens / The Big Bang (8.86)
- 2: Bad Wolf / The Parting of the Ways (8.64)
- 3: The Name of the Doctor (8.45)
- 4: Dark Water / Death in Heaven (8.27)
- 5: The Stolen Earth / Journey’s End (8.16)
- 6: Utopia / The Sound of Drums / Last of the Time Lords (8.11)
- 7: Army of Ghosts / Doomsday (7.99)
- 8: The Wedding of River Song (7.21)
With the average score of the entire revival being about 7.50, these scores show that almost all of the finales lie in the upper half of the revival in terms of quality, and are, generally, very positively received stories. The Pandorica Opens / The Big Bang continues to dominate the other finales, an expected result given its popularity, and The Wedding of River Song once again brings up the rear. The Moffat era finale average is 8.20, whilst the RTD era finale average is 8.23, another close result which again shows how the two eras – in reality – don’t differ that much in quality at all!
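The era averages can be checked directly from the eight story scores listed above. The short Python sketch below does just that, and also counts how many finales beat the revival-wide average. The variable names are my own, and 7.50 is the approximate overall figure quoted in the text.

```python
from statistics import mean

# Finale story averages as listed above, grouped by showrunner era.
moffat_finales = {
    "The Pandorica Opens / The Big Bang": 8.86,
    "The Name of the Doctor": 8.45,
    "Dark Water / Death in Heaven": 8.27,
    "The Wedding of River Song": 7.21,
}
rtd_finales = {
    "Bad Wolf / The Parting of the Ways": 8.64,
    "The Stolen Earth / Journey's End": 8.16,
    "Utopia / The Sound of Drums / Last of the Time Lords": 8.11,
    "Army of Ghosts / Doomsday": 7.99,
}

REVIVAL_AVERAGE = 7.50  # approximate overall average quoted in the text

moffat_avg = mean(moffat_finales.values())
rtd_avg = mean(rtd_finales.values())

all_scores = list(moffat_finales.values()) + list(rtd_finales.values())
above_average = sum(score > REVIVAL_AVERAGE for score in all_scores)

print(f"Moffat era finale average: {moffat_avg:.4f}")
print(f"RTD era finale average:    {rtd_avg:.4f}")
print(f"Finales above {REVIVAL_AVERAGE}: {above_average} of {len(all_scores)}")
```

The two means come out at 8.1975 and 8.225 before rounding, and 7 of the 8 finales sit above 7.50, with The Wedding of River Song the only one below.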
(4) Double-part story analysis
The best double-part story of the revival is Series 1’s The Empty Child / The Doctor Dances, with an average score of 9.05. The worst double-part story of the revival is Series 3’s Daleks in Manhattan / Evolution of the Daleks, with an average score of 5.85.
Now, which part of a double-parter is the better half? Part 1, or Part 2? Treating the Series 3 finale just as a double-parter consisting of The Sound of Drums and Last of the Time Lords, the average score for Part 1 of the 19 double-part stories in the revival is 7.897, and the average score for Part 2 is 7.840. So, on average, it seems DWTV readers prefer the first part of double-parters, albeit only slightly.
Which double-parter has the biggest upwards, downwards and overall jump in quality? In other words, which double-parter is the least consistent in quality between its parts? Well, the biggest upwards jump in quality between Part 1 and Part 2 is seen in Bad Wolf / The Parting of the Ways, with a jump of 0.47, followed by Army of Ghosts / Doomsday, with a jump of 0.36. Both of these double-parters have a relatively understated Part 1 followed by a critically acclaimed or fan-loved Part 2, explaining the size of these jumps.
The biggest downwards (and overall) jump in quality between Part 1 and Part 2 is seen in Dark Water / Death in Heaven with a drop of 0.76, followed by The Sound of Drums / Last of the Time Lords with a drop of 0.60. Both of these finales have a highly praised and commended Part 1 followed by a controversial and/or polarising Part 2, again explaining the size of the jumps.
The most consistent double-parter is Silence in the Library / Forest of the Dead, where both parts have the exact same score!
But in any case, double-parters are better received, on average, than single-parters. The average score of all the 19 double-parters in the revival is 7.869, whilst the average score of all the single-parters is 7.330, a statistically significant difference of 0.539. Plus, double-part stories are more consistent than single-part stories on average (with a standard deviation of 0.945 for the former and 1.059 for the latter). Thus, with Series 9 set to feature 2 (maybe 3) double-parters, let’s hope the trend of excellent double-parters continues!
(5) Best and worst runs of 5 consecutive episodes
What run of 5 consecutive episodes (i.e. episodes broadcast one after the other) in the revival is the strongest? This is a simple application of a 5-point moving average, and the top 3 non-overlapping runs of 5 consecutive episodes are:
- 1: Human Nature – The Sound of Drums (average of 8.751)
- 2: Silence in the Library – The Stolen Earth (average of 8.659)
- 3: The Pandorica Opens – Day of the Moon (average of 8.608)
The bottom 3 non-overlapping runs of 5 consecutive episodes are:
- 1: The Idiot’s Lantern – Fear Her (average of 6.187)
- 2: Gridlock – 42 (average of 6.291)
- 3: Rose – World War Three (average of 6.849)
Remember, these runs are calculated as mere moving averages. Good episodes (like Gridlock, in my opinion) may be absorbed into these runs if their neighbouring episodes are negatively received.
Moreover, the least consistent run of 5 episodes is the run The Satan Pit – Army of Ghosts, which makes sense given that it contains two strongly received episodes, one fairly well received episode, and two heavily derided episodes, meaning this particular run is all over the place. The most consistent run of 5 episodes is the run The Pandorica Opens – Day of the Moon, which sounds reasonable as all these episodes are similarly acclaimed.
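For the curious, here is roughly how a 5-point moving average and the matching consistency check can be computed in Python. Treat it as a sketch under assumptions: the scores are invented placeholders in broadcast order, the greedy way of picking non-overlapping runs is just one reasonable reading of "non-overlapping", and consistency is measured here with the population standard deviation of each window.

```python
from statistics import mean, pstdev

def rolling_windows(scores, size=5):
    """Yield (start_index, window) for every run of `size` consecutive scores."""
    for start in range(len(scores) - size + 1):
        yield start, scores[start:start + size]

def top_non_overlapping(scores, size=5, count=3, best=True):
    """Greedily pick up to `count` best (or worst) runs that share no episode."""
    ranked = sorted(
        ((mean(w), start) for start, w in rolling_windows(scores, size)),
        reverse=best,
    )
    chosen = []
    for avg, start in ranked:
        # Two runs of length `size` overlap if their starts are closer than `size`.
        if all(abs(start - other) >= size for _, other in chosen):
            chosen.append((avg, start))
        if len(chosen) == count:
            break
    return chosen

def most_and_least_consistent(scores, size=5):
    """Return start indices of the runs with the smallest and largest spread."""
    spreads = [(pstdev(w), start) for start, w in rolling_windows(scores, size)]
    return min(spreads)[1], max(spreads)[1]

# Placeholder scores in broadcast order (invented, NOT the real poll data).
scores = [7.0, 6.5, 8.0, 9.0, 8.5, 8.8, 9.2, 6.0, 5.5, 6.2, 6.8, 5.9, 7.5, 8.0]

best_runs = top_non_overlapping(scores, count=2, best=True)
worst_runs = top_non_overlapping(scores, count=2, best=False)
steadiest, wildest = most_and_least_consistent(scores)

for avg, start in best_runs:
    print(f"Strong run: episodes {start + 1}-{start + 5}, average {avg:.3f}")
for avg, start in worst_runs:
    print(f"Weak run: episodes {start + 1}-{start + 5}, average {avg:.3f}")
print(f"Most consistent run starts at episode {steadiest + 1}")
print(f"Least consistent run starts at episode {wildest + 1}")
```

Because each run is just a window average, a well-liked episode surrounded by weak neighbours will still be swept into a low-scoring run, which is exactly the Gridlock effect mentioned above.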
(6) Modelling the average episode scores
Statistical modelling is all about making a particular set of assumptions regarding the generation of a set of data. In this case, our data is the 117 episode averages, and we want to know if we can somehow model it with a probability distribution. If we can do this, then we can make inferences about how likely it is that an episode will receive a particular average score in the future!
Now, it may not seem very intuitive to model something like episode averages, which are very subjective quantities with no indication of having a model-able population distribution, but, hey, we can at least try and see what happens!
We can model episode averages as being dependent or independent of time. If we model them as being independent of time, then we can use a chi-squared test, which indicates that the Normal distribution N(7.505, 1.058) is a surprisingly good fit for the data. For non-statisticians, don’t worry, as I’ll interpret what all this means via probabilities. Under this distribution, then, the probability that a future episode will receive a community average score of 8 or above is 0.32, a score of 9 or above is 0.08, between 7 and 8 is 0.36, and below 7 is 0.32.
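These probabilities can be reproduced with nothing more than the Normal cumulative distribution function. The Python sketch below assumes the second parameter, 1.058, is the standard deviation rather than the variance, since that is the reading that matches the figures quoted above. The function and variable names are my own.

```python
from math import erf, sqrt

MEAN = 7.505
SD = 1.058  # assumption: the second parameter is read as a standard deviation

def normal_cdf(x, mean=MEAN, sd=SD):
    """P(X <= x) for a Normal(mean, sd) variable, via the error function."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))

p_8_or_above = 1.0 - normal_cdf(8.0)
p_9_or_above = 1.0 - normal_cdf(9.0)
p_between_7_and_8 = normal_cdf(8.0) - normal_cdf(7.0)
p_below_7 = normal_cdf(7.0)

print(f"P(score >= 8)     = {p_8_or_above:.2f}")
print(f"P(score >= 9)     = {p_9_or_above:.2f}")
print(f"P(7 <= score < 8) = {p_between_7_and_8:.2f}")
print(f"P(score < 7)      = {p_below_7:.2f}")
```

To two decimal places these come out as 0.32, 0.08, 0.36 and 0.32, the same as the figures in the text. Note that the model ignores the fact that real scores are capped at 10, so the upper tail is only a rough guide.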
If we model them as being dependent on time, then we need to use autoregressive time series modelling. This is very complicated indeed, and so I let computer software do most of the work on this one (so shoot me)! Using this software, the computer predicts that The Magician’s Apprentice, the first episode of Series 9, will receive a long-term average score of 7.603, based on this data. I rather hope it is higher than that, to be honest!
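To give a flavour of what the software is doing, here is the simplest possible autoregressive model, an AR(1), fitted by ordinary least squares in Python. This is purely illustrative: I am not claiming this is the model order or the fitting method behind the 7.603 prediction, the function names are my own, and the demo series is an invented placeholder rather than the episode averages.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    prev = series[:-1]  # x[t-1]
    curr = series[1:]   # x[t]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    cov = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    var = sum((p - mean_prev) ** 2 for p in prev)
    phi = cov / var
    c = mean_curr - phi * mean_prev
    return c, phi

def forecast_next(series):
    """One-step-ahead forecast from the fitted AR(1) model."""
    c, phi = fit_ar1(series)
    return c + phi * series[-1]

# Placeholder series that follows x[t] = 2 + 0.5 * x[t-1] exactly
# (invented for illustration, NOT the episode averages).
demo = [0.0]
for _ in range(8):
    demo.append(2.0 + 0.5 * demo[-1])

c, phi = fit_ar1(demo)
prediction = forecast_next(demo)
print(f"c = {c:.3f}, phi = {phi:.3f}, next value = {prediction:.3f}")
```

With the real data you would pass in the 117 averages in broadcast order. A proper analysis would also choose the model order and check the residuals, which is the complicated part the software takes care of.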
Again, these are not necessarily realistic models, but it’s interesting to take a look at what the models say!
(7) Historic trends in episode scores, and “recent episode syndrome” analysis
“Recent episode syndrome” is the colloquial term given to the phenomenon where recently broadcast episodes of the show tend to fare better than older ones, due to being “fresher” in people’s minds, or something to that effect. Does it really exist? Well, we can compare the results of this poll to polls undertaken in the past, and see if the long-term trendline in episode scores is indicative of the existence of recent episode syndrome.
Firstly, let’s compare the Matt Smith era episode averages in this poll with the episode averages recorded in the Best of Matt Smith polls, carried out in early 2014. It turns out that the average score of every single one of Matt Smith’s 44 episodes decreased between these two polls, with the biggest decrease being Nightmare in Silver (down 0.98) and the smallest decrease being Amy’s Choice (down 0.05). The average of the Matt Smith era in the 2014 poll was 7.818, whilst the average of the Matt Smith era in this 2015 poll was 7.486. This points towards the possible existence of recent episode syndrome, but, to be more sure, we will now examine three sets of data for Series 7B, the series with the most data available and also the most recent series at the time of the 2014 poll. Its behaviour is summarised in the diagram above.
As you can see, the average score of each and every episode of Series 7B (except for The Day of the Doctor) decreased between each poll. The Day of the Doctor defies all the other episodes of 7B by having a higher average score today than it did immediately after broadcast, so The Day of the Doctor has, on average, been growing in popularity over time! The sheer collapse in the average score of Nightmare in Silver should also be noted, falling by over 2 whole points in as many years.
Also, comparing the initial average scores of Series 8 episodes to their average scores in this poll reveals a decrease in the average score from 8.097 to 7.699. Every episode decreased between these two polls except Mummy on the Orient Express. Still, it should be noted that the two polls are still highly correlated, i.e. the order of the average scores in both polls is pretty much the same.
At the very least, all this evidence indicates a definite downward sloping trend to the average scores of episodes over time. More data (from older episodes) is probably required in order to resolutely conclude anything about the existence of recent episode syndrome, but there does seem to be a distinct tendency for it. That said, if a new episode is rated highly by the fandom, we should never jump to the conclusion that the high score is solely down to recent episode syndrome. The intrinsic quality of the episode itself should always be considered as well.
(8) Changes in the divisiveness of Series 8 episodes
Throughout Series 8, I calculated the standard deviations of each episode from the initial polls, measuring how divisive each episode was. The results, along with the rest of the Series 8 analysis, can be seen here. We can compare these standard deviations to their equivalents in this poll, and see whether there have been any significant changes in divisiveness.
It turns out that 10 out of the 13 episodes of Series 8 (including Last Christmas) have actually decreased in divisiveness. Out of these, the standard deviation of Death in Heaven fell the most, falling from a standard deviation of 2.346 to 2.068. Highly divisive episodes like Kill the Moon, Robot of Sherwood and In the Forest of the Night also all decreased in standard deviation slightly, though still remain considerably divisive in the grand scheme of things. Out of the 3 episodes which increased in divisiveness, Listen, rather interestingly, increased the most, jumping from a standard deviation of 1.636 to 1.822. Curiously, Flatline was another episode which has apparently become more divisive according to this poll, increasing by 0.073.
What this suggests is that, although episode averages apparently fall over time (as we discussed earlier), the standard deviations seem to fall as well (in general), suggesting that most episodes become less divisive over time. This makes sense, as, in the short run, our initial (often pragmatic) emotions can dictate the score we give, leading to a significant number of very high or very low scores, which generate a high standard deviation. In the long run, we have more time to rewatch and critically evaluate episodes, which would effectively “soften” the scores we give and reduce the number of extreme scores, reducing the standard deviation.
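The "softening" effect is easy to see with a toy example. In the Python sketch below, the two vote lists are invented (they are not real poll data) and show the same ten imaginary voters before and after their scores drift away from the extremes. I use the population standard deviation here, which is an assumption, as the choice between population and sample formulas barely matters at poll sizes. The two quoted changes at the end are the real figures from the text.

```python
from statistics import mean, pstdev

# Invented votes out of 10 for one imaginary episode (NOT real poll data).
initial_votes = [10, 10, 9, 1, 2, 10, 8, 1, 9, 10]  # lots of extreme scores
later_votes = [9, 8, 8, 4, 5, 8, 7, 4, 8, 8]        # same voters, "softened"

initial_sd = pstdev(initial_votes)
later_sd = pstdev(later_votes)

print(f"Initial: mean {mean(initial_votes):.2f}, SD {initial_sd:.3f}")
print(f"Later:   mean {mean(later_votes):.2f}, SD {later_sd:.3f}")

# Real standard deviations quoted above: (initial poll, this poll).
quoted = {
    "Death in Heaven": (2.346, 2.068),
    "Listen": (1.636, 1.822),
}
changes = {title: new - old for title, (old, new) in quoted.items()}
for title, change in changes.items():
    print(f"{title}: change in SD of {change:+.3f}")
```

In the toy data both the mean and the spread fall, mirroring the general pattern in the Series 8 results, while the quoted figures give a drop of 0.278 for Death in Heaven and a rise of 0.186 for Listen.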
(9) Other statistics
The average score of episodes set on or around Earth is significantly lower than that of episodes set on other worlds, with an average score of 7.422 for the former and 7.707 for the latter!
There is a fairly negative correlation between the average score of episodes and the standard deviation of episodes, suggesting better received episodes also happen to be much less divisive on average.
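A correlation like this is usually measured with the Pearson correlation coefficient, which runs from -1 (perfect negative relationship) to +1 (perfect positive relationship). Here is a minimal Python sketch. I am assuming Pearson is the measure meant here, and the paired averages and standard deviations are invented placeholders, not the real poll figures.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Placeholder pairs (invented, NOT the real poll data): each position is one
# imaginary episode's average score and its standard deviation.
averages = [9.0, 8.5, 8.0, 7.0, 6.0, 5.5]
std_devs = [1.1, 1.3, 1.5, 1.9, 2.2, 2.4]

r = pearson(averages, std_devs)
print(f"Correlation between average and standard deviation: {r:.3f}")
```

A value of r close to -1 would mean that higher-rated episodes almost always have tighter, less divisive voting. Part of this is built into a 1 to 10 scale: an episode can only average above 9 if nearly everyone scores it highly.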
The average of all the Cybermen episodes is 6.933, with their best story being Dark Water / Death in Heaven. In general, the Cybermen have not fared well at all in the revival. The average of all the Dalek episodes is 7.751, with their best story being Dalek.
(10) Conclusion
So, that brings an end to this numbery-wumbery breakdown! It’s been a lot of fun discussing the last 10 years of the show throughout the polls, and it’s been even more fun calculating and analysing all these numbers and statistics. It’s great to see how positive and enthusiastic the DWTV community has been about these 117 episodes (in general!) and also how well both the RTD era and Moffat era performed.
I hope these two articles have been as interesting to you as they have been to me, and here’s to the next 10 years of the show: may they be as brilliant – and statistically fascinating and rewarding – as the previous 10!