This project is maintained by Japhiolite
This quick’n’dirty webpage presents Jan’s analysis of the Zylonensender, a podcast he co-hosts. The podcast is about Battlestar Galactica, and after each episode the podcasters give a rating.
So now that the whole series has been watched, we’ll take a short look at the overall ratings and the questions they raise.
The analysis is done using python and presented as a Jupyter Notebook.
from IPython.display import HTML
HTML('''<script>
code_show=true;
function code_toggle() {
if (code_show){
$('div.input').hide();
} else {
$('div.input').show();
}
code_show = !code_show
}
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Click here to toggle on/off the raw code."></form>''')
In this notebook, we'll go through the grades each member of the Zylonensender crew gave over the course of the series Battlestar Galactica. We'll look at the average grades per episode, at each podcaster's individual verdicts (so, who was the most "critical" and who the most "benevolent"), at which episodes were the worst and which the best... and at how our verdicts compare to the IMDb ones.
# some libraries
import numpy as np
import pandas as pd
import seaborn as sns
sns.set_style('whitegrid')
sns.set_context('talk')
import matplotlib.pyplot as plt
from matplotlib.offsetbox import OffsetImage,AnchoredOffsetbox
%matplotlib inline
# define colors and identifiers for the podcasters
podcasters = ['Tim', 'Benjamin', 'Marijo', 'Phil', 'Jan']
colors = {'Tim': "#2e8b57",
          'Benjamin': "#e07b39",
          'Phil': "#195e83",
          'Marijo': "#b63132",
          'Jan': "#7e839c"}
imgs = {'Tim': "../imgs/tim_50x50.jpg",
        'Benjamin': "../imgs/ben_51x50.png",
        'Phil': "../imgs/phil_50x50.png",
        'Marijo': "../imgs/marijo_50x50.png",
        'Jan': "../imgs/jan_50x50.png"}
# import the data
data = pd.read_csv('../grades/zys_bewertungen.csv')
data = data.dropna()
data.head()
Before we go into the votes themselves, let's have a short look at attendance. Presumably, Tim will come out on top, as the producer :) But who is the most disciplined podcaster after him?
ranking = data['Sprecher'].value_counts()
ranking_percent = ranking / ranking['Tim'] * 100
fig, axs = plt.subplots(1,2, figsize=[10,4])
sns.barplot(x=ranking.index, y=ranking.values, ax=axs[0], palette=colors);
axs[0].set_ylabel('Episodes attended')
sns.barplot(x=ranking.index, y=ranking_percent.values, ax=axs[1], palette=colors);
axs[1].set_ylabel('%')
fig.tight_layout()
It's Phil, whose calm-voiced opinions are such an important part of this podcast.
Let's have a look at the vote distributions of each podcaster. Do they resemble a bell curve, or are they skewed? We'll see...
def place_image(im, loc=3, ax=None, zoom=1, **kw):
if ax is None: ax = plt.gca()
imagebox = OffsetImage(im, zoom=zoom*0.72)
ab = AnchoredOffsetbox(loc=loc, child=imagebox, frameon=False, **kw)
ax.add_artist(ab)
fig, axs = plt.subplots(nrows=2,ncols=3, sharex=True, sharey=True, figsize=[15,7])
fig.subplots_adjust(hspace=0.7)
fig.suptitle('Distributions of each podcaster')
for ax, name in zip(axs.flatten(), podcasters):
# prepare bins to reflect the 1 to 10 points
subs = data.query(f"Sprecher=='{name}'")['Wertung']
d = np.diff(np.unique(subs)).min()
left_of_first_bin = subs.min() - float(d)/2
right_of_last_bin = subs.max() + float(d)/2
# plot the histogram
ax.hist(subs, bins=np.arange(left_of_first_bin, right_of_last_bin + d, d), color=colors[name])
ax.set_title(name)
ax.set_xlim([1, 11])
# add some images
im = plt.imread(imgs[name])
place_image(im, loc=2, ax=ax, pad=0, zoom=2)
axs[1,1].set_xlabel('Points')
axs[0,0].set_ylabel('Amount')
axs[1,0].set_ylabel('Amount')
fig.tight_layout()
Here, Phil and Benjamin show rather similar, benevolent distributions, with a strong jump from 6 to 7 points, and with Ben handing out a few more 10-point ratings. Tim gave 10 points the most often, about five times. Marijo seems to be the only one who gave an episode 2 points.
Overall, Ben's and Phil's most frequent verdict was 7 points, whereas the three other podcasters gave 8 points most often.
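The "most frequent verdict" per podcaster can also be read off directly with a groupby, without plotting. A minimal sketch on toy data, assuming the real CSV's column names `Sprecher` (speaker) and `Wertung` (rating):

```python
import pandas as pd

# Toy long-format ratings table mirroring the notebook's layout
# (names and values here are made up for illustration)
toy = pd.DataFrame({
    'Sprecher': ['Phil', 'Phil', 'Phil', 'Jan', 'Jan', 'Jan'],
    'Wertung':  [7, 7, 9, 8, 8, 6],
})

# Most frequent rating per podcaster; mode() can return ties,
# so we take the first (lowest) value
modal = toy.groupby('Sprecher')['Wertung'].agg(lambda s: s.mode().iloc[0])
print(modal)
```

Run against the real `data` frame, this would give one modal grade per podcaster in a single line.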
In a similar plot, this time without shared axes, let's additionally plot the summed-up distribution:
# including total count and no shared axes
fig, axs = plt.subplots(nrows=2,ncols=3, figsize=[15,7])
fig.subplots_adjust(hspace=0.7)
fig.suptitle('Distributions of each podcaster')
for ax, name in zip(axs.flatten(), podcasters):
# prepare bins to reflect the 1 to 10 points
subs = data.query(f"Sprecher=='{name}'")['Wertung']
d = np.diff(np.unique(subs)).min()
left_of_first_bin = subs.min() - float(d)/2
right_of_last_bin = subs.max() + float(d)/2
# plot the histogram
ax.hist(subs, bins=np.arange(left_of_first_bin, right_of_last_bin + d, d), color=colors[name])
ax.set_title(name)
ax.set_xlim([1, 11])
# add some images
im = plt.imread(imgs[name])
place_image(im, loc=2, ax=ax, pad=0, zoom=2)
# bins for the combined histogram, recomputed over all ratings
# (the loop variables above only hold the bins of the last podcaster)
all_votes = data['Wertung']
d_all = np.diff(np.unique(all_votes)).min()
axs[1,2].hist(all_votes, bins=np.arange(all_votes.min() - d_all/2, all_votes.max() + d_all/2 + d_all, d_all), color='#1e212f')
axs[1,2].set_title('All together')
axs[1,1].set_xlabel('Points')
axs[0,0].set_ylabel('Amount')
axs[1,0].set_ylabel('Amount')
fig.tight_layout()
Did one season perform better or worse than the others? And what are the best / worst rated episodes by the podcasters?
seasons = ['S1', 'S2', 'S3', 'S4']
data_seasons = {}
for i in seasons:
data_seasons[i] = data.query(f'Episode.str.contains("{i}")', engine='python')
pod_seasons={}
for j in podcasters:
pod_seasons[j] = np.round([data_seasons[i].query(f'Sprecher=="{j}"')['Wertung'].mean() for i in seasons],2)
average_ratings = pd.DataFrame.from_dict(pod_seasons, orient='index', columns=seasons)
f, ax = plt.subplots(figsize=(9, 6))
sns.heatmap(average_ratings, annot=True, linewidths=.5, ax=ax, cmap='viridis');
ax.set_title('Average ratings per season');
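The loop-based construction of `average_ratings` above can also be sketched more compactly with `pivot_table`, assuming the long-format columns `Sprecher`, `Episode` (e.g. 'S1E01'), and `Wertung` from the CSV:

```python
import pandas as pd

# Toy long-format ratings (values made up for illustration)
toy = pd.DataFrame({
    'Sprecher': ['Tim', 'Tim', 'Jan', 'Jan'],
    'Episode':  ['S1E01', 'S2E01', 'S1E01', 'S2E01'],
    'Wertung':  [8, 10, 6, 8],
})
toy['Season'] = toy['Episode'].str[:2]   # 'S1E01' -> 'S1'

# speaker x season matrix of mean ratings, same shape as average_ratings
avg = toy.pivot_table(index='Sprecher', columns='Season',
                      values='Wertung', aggfunc='mean')
print(avg)
```

This avoids the nested loops and handles absent podcasters automatically, since missing speaker/season combinations simply become NaN.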
So it looks like the series gained a lot of traction between season 2 and season 3 and, on average, had its best season in season 4. While this gives an overall picture per season, and the best-rated episodes are probably found in season 4 (and the worst in season 1 or season 2), that is exactly what we look at next:
Which episodes had the overall best rating, and which the worst? With our 5 podcasters, the best achievable rating is 50 points, and the worst (logically) 0 points.
If you're interested in how the data is re-structured, here's the code.
# re-structure the dataframe to have each podcaster as a column, episodes as index and rating as value.
episodes = np.sort(data.Episode.unique())
data_structured = pd.DataFrame(columns=podcasters, index=episodes)
data_structured['Summe'] = np.nan
data_structured['Mittelwert'] = np.nan
data_structured['Folgentitel'] = ""
# as producer, Tim was always present, so we use his rows to fill in the Folgentitel column
tim = data.query('Sprecher=="Tim"').sort_values(by=['Episode'])
tim = tim.set_index(tim['Episode'])
data_structured['Folgentitel'] = tim['Folgentitel']
# now we fill in the numbers/votes for each podcaster
for i in podcasters:
podc = data.query(f"Sprecher=='{i}'").sort_values(by=['Episode'])
podc = podc.set_index(podc['Episode'])
data_structured[i] = podc['Wertung']
data_structured
data_structured['Summe'] = data_structured[podcasters].sum(axis=1)
data_structured['Mittelwert'] = data_structured[podcasters].mean(axis=1)
data_structured.to_csv('../grades/grades_restructured.csv')
But as we don't need to do that every time, we'll load the re-structured dataframe now and continue to work with the loaded version:
data_structured = pd.read_csv('../grades/grades_restructured.csv', index_col=0)
Now that we've structured the ratings in a more convenient manner, let's have a look at the highest- and lowest-rated episodes. For this, we can simply sort the structured dataframe by its sum column Summe. Keep in mind that this only checks the sum: if only X out of 5 podcasters (with X < 5) were present during a recording, and all X gave 10 points, the sum will still fall short of the maximum.
data_structured.sort_values(by=['Summe'])
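The caveat about absentees can be made concrete: sorting by the sum punishes episodes where someone was missing, while sorting by the mean (which ignores NaN) does not. A small sketch on toy data, assuming the re-structured layout with one column per podcaster and the Summe/Mittelwert columns from above:

```python
import numpy as np
import pandas as pd

# Toy re-structured frame: one row per episode, one column per podcaster,
# NaN where a podcaster was absent (values made up for illustration)
pods = ['Tim', 'Phil', 'Jan']
toy = pd.DataFrame(
    {'Tim': [10, 8], 'Phil': [10, 7], 'Jan': [np.nan, 9]},
    index=['S3E04', 'S1E01'],
)

toy['Summe'] = toy[pods].sum(axis=1)        # absentees effectively count as 0
toy['Mittelwert'] = toy[pods].mean(axis=1)  # absentees are ignored

# By sum, S1E01 (24) beats S3E04 (20); by mean, S3E04 (10.0) beats S1E01 (8.0)
print(toy.sort_values('Mittelwert', ascending=False))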
Overall, with all podcasters present, the best episodes were the final ones of season 4, S4E20 & S4E21, and also S4E10. Among the worst episodes is S2E14, Sherlock Lee and Snacks für Fisk... which is... surprise! the episode Black Market, coming in with a devastating 6.5 stars on IMDb.
But the worst episode of BSG, according to 4 out of 5 podcasters (Jan was not present for that recording), is S3E09, Technischer KO bei Klitschkos Regiedebut, the episode Unfinished Business, which has a rating of 7.7 on IMDb. To learn why the four stooges hated that episode, listen to the corresponding Zylonensender episode.
Still, per the caveat above, an episode could be top-rated without topping the sum, so we check whether there is an episode where all present podcasters gave 10 points. Checking the re-structured dataframe yields: yes, this is the case for episode S3E04, Kein guter Hitler, where Tim, Benjamin, and Jan all gave 10 points.
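That "all present podcasters gave 10" check can be expressed as a boolean mask instead of eyeballing the dataframe. A sketch on toy data, assuming the re-structured layout where NaN marks an absent podcaster:

```python
import numpy as np
import pandas as pd

# Toy re-structured frame (values made up for illustration)
pods = ['Tim', 'Benjamin', 'Jan']
toy = pd.DataFrame(
    {'Tim': [10, 9], 'Benjamin': [10, 10], 'Jan': [np.nan, 10]},
    index=['S3E04', 'S4E20'],
)

grades = toy[pods]
# True where every *present* podcaster gave 10 (NaN = absent, ignored),
# requiring at least one podcaster to have been present at all
all_tens = (grades.isna() | grades.eq(10)).all(axis=1) & grades.notna().any(axis=1)
print(all_tens)
```

Applied to `data_structured`, `data_structured[all_tens]` would list every such episode directly.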
Let's have a quick look at what the best and worst episodes were when all podcasters were present:
data_all_podcasters = data_structured.dropna()
data_all_podcasters.sort_values(by=['Summe'])
Disregarding the webisodes, S1E04, Chamalla Extrakt mit den grünen Streifen, is the worst-rated episode for which all 5 podcasters were present at the recording. (Not so) fun fact: including special episodes like the webisodes or the movie The Plan, all podcasters were present in 24 of the 76 podcast episodes recorded here. That's 31.6 %.
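The full-attendance share quoted above follows from a one-line mask on the re-structured frame. A sketch on toy data, assuming NaN marks an absent podcaster:

```python
import numpy as np
import pandas as pd

# Toy re-structured frame (values made up for illustration)
pods = ['Tim', 'Phil', 'Jan']
toy = pd.DataFrame(
    {'Tim': [10, 8, 7], 'Phil': [10, np.nan, 7], 'Jan': [np.nan, 9, 8]},
    index=['E1', 'E2', 'E3'],
)

# episodes where every podcaster has a (non-NaN) rating
full_house = toy[pods].notna().all(axis=1)
share = full_house.sum() / len(toy) * 100
print(f"{full_house.sum()} of {len(toy)} episodes ({share:.1f} %)")
```

On the real data, `data_structured[podcasters].notna().all(axis=1)` would reproduce the 24-of-76 count.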
However, the database here is incomplete, as there are in total 83 episodes of Zylonensender available... in the near future, I shall go on a quest to find the lost episodes! Cue Indy music and a shameless plug of the Impulssender episode Raiders of the Lost Ark.
# one tick per episode; derive the length from the data instead of hard-coding it
x = np.arange(len(data_structured))
x_rep = data_structured.index
fig, axs = plt.subplots(5, 1, figsize=[25,25], sharex=True)
for ax, name in zip(axs.flatten(), podcasters):
m = 'o-'
ax.plot(x, data_structured[name], m, c=colors[name])
# add some images
im = plt.imread(imgs[name])
place_image(im, loc=2, ax=ax, pad=0, zoom=2)
plt.sca(axs[4])
plt.xticks(x, x_rep, rotation='vertical');
plt.tight_layout()