Cylon Numbers

This project is maintained by Japhiolite

Welcome to the Zylonensender

This quick’n’dirty webpage is a presentation of Jan’s analysis of the Zylonensender, where he is one of the podcasters. The podcast is about Battlestar Galactica, and after each episode, the podcasters give a rating.

Now that the whole series has been watched, we’ll take a short look at the overall ratings, with questions like: who was the most critical podcaster, and which episodes were rated best and worst?

Analysis

The analysis is done using Python and presented as a Jupyter Notebook.

Auswertung_Zylonensender
In [43]:
from IPython.display import HTML

HTML('''<script>
code_show=true; 
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Click here to toggle on/off the raw code."></form>''')
Out[43]:

Episode grading

In this notebook, we'll go through the grades that each member of the Zylonensender crew gave over the course of the series Battlestar Galactica. We'll look at the average grades per episode, at how the individual podcasters voted (so, who was the most "critical" and who was the most "benevolent"), at which episodes were the worst and which were the best... and at how our verdicts compare to the IMDb ones.

In [1]:
# some libraries
import numpy as np
import pandas as pd
import seaborn as sns
sns.set_style('whitegrid')
sns.set_context('talk')

import matplotlib.pyplot as plt
from matplotlib.offsetbox import OffsetImage,AnchoredOffsetbox
%matplotlib inline
In [2]:
# define colors and identifiers for the podcasters
podcasters = ['Tim', 'Benjamin', 'Marijo', 'Phil', 'Jan']
colors = {'Tim': "#2e8b57",
         'Benjamin': "#e07b39",
         'Phil': "#195e83",
         'Marijo': "#b63132",
         'Jan': "#7e839c"}
imgs = {'Tim': "../imgs/tim_50x50.jpg",
       'Benjamin': "../imgs/ben_51x50.png",
       'Phil': "../imgs/phil_50x50.png",
       'Marijo': "../imgs/marijo_50x50.png",
       'Jan': "../imgs/jan_50x50.png"}
In [3]:
# import the data
data = pd.read_csv('../grades/zys_bewertungen.csv')
data = data.dropna()
data.head()
Out[3]:
   Folgentitel                            Episode  Sprecher  Wertung
0  The Sound of Cylons                    S4E21    Benjamin  10.0
1  Präsident Cool trifft Admiral Sowieso  S4E20    Benjamin  10.0
2  Kein guter Hitler                      S3E04    Benjamin  10.0
3  Die Drei von der Luftschleuse          S4E10    Benjamin  10.0
4  Schmu der Luftschleuse                 S4E14    Benjamin  10.0

Attendance

Before we go into the votes and stuff, let's have a short look at the attendance. Presumably Tim, as the producer, will come out on top :) But who is the most disciplined podcaster after him?

In [4]:
ranking = data['Sprecher'].value_counts()
ranking_percent = ranking / ranking['Tim'] * 100
fig, axs = plt.subplots(1,2, figsize=[10,4])
sns.barplot(x=ranking.index, y=ranking.values, ax=axs[0], palette=colors);
axs[0].set_ylabel('Episodes attended')
sns.barplot(x=ranking.index, y=ranking_percent.values, ax=axs[1], palette=colors);
axs[1].set_ylabel('%')
fig.tight_layout()

It's Phil, whose calm-voiced opinions are such an important part of this podcast.
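
As a minimal sketch of the counting above: since the table holds one row per episode and podcaster, counting rows per `Sprecher` counts attendance. The rows below are invented for illustration and are not the podcast data.

```python
import pandas as pd

# toy data, made up for illustration only (not the real ratings)
toy = pd.DataFrame({
    'Episode': ['S1E01', 'S1E01', 'S1E02', 'S1E02', 'S1E02', 'S1E03'],
    'Sprecher': ['Tim', 'Phil', 'Tim', 'Phil', 'Jan', 'Tim'],
    'Wertung': [7.0, 8.0, 6.0, 7.0, 9.0, 8.0],
})

# one row per (episode, podcaster), so the row count is the attendance
attendance = toy['Sprecher'].value_counts()

# attendance relative to Tim, who attended every toy episode
attendance_percent = attendance / attendance['Tim'] * 100

print(attendance.to_dict())
print(attendance_percent.round(1).to_dict())
```

Dividing by Tim's count only gives a true percentage if Tim really attended every recording, which is the assumption made in the notebook.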

Vote distributions

Let's have a look at the vote distributions of each podcaster. Do they resemble a bell curve, are they skewed? We'll see...

In [5]:
def place_image(im, loc=3, ax=None, zoom=1, **kw):
    # anchor an image inside the given axes (defaults to the current axes)
    if ax is None:
        ax = plt.gca()
    imagebox = OffsetImage(im, zoom=zoom*0.72)
    ab = AnchoredOffsetbox(loc=loc, child=imagebox, frameon=False, **kw)
    ax.add_artist(ab)
In [6]:
fig, axs = plt.subplots(nrows=2,ncols=3, sharex=True, sharey=True, figsize=[15,7])
fig.subplots_adjust(hspace=0.7)
fig.suptitle('Distributions of each podcaster')

for ax, name in zip(axs.flatten(), podcasters):
    # prepare bins to reflect the 1 to 10 points
    subs = data.query(f"Sprecher=='{name}'")['Wertung']
    d = np.diff(np.unique(subs)).min()
    left_of_first_bin = subs.min() - float(d)/2
    right_of_last_bin = subs.max() + float(d)/2
    
    # plot the histogram 
    ax.hist(subs, bins=np.arange(left_of_first_bin, right_of_last_bin + d, d), color=colors[name])
    ax.set_title(name)
    ax.set_xlim([1, 11])
    
    # add some images
    im = plt.imread(imgs[name])
    place_image(im, loc=2, ax=ax, pad=0, zoom=2)
axs[1,1].set_xlabel('Points')
axs[0,0].set_ylabel('Amount')
axs[1,0].set_ylabel('Amount')
fig.tight_layout()

Here, Phil and Benjamin show rather similar, benevolent distributions, with a strong jump from 6 to 7 points, and with Ben giving 10 points a few more times. Tim gave 10 points the most, about 5 times. Marijo seems to be the only one who gave an episode 2 points.

Overall, Ben's and Phil's most frequent verdict was 7 points, whereas the three other podcasters gave 8 points most often.
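
The most frequent verdict can also be read off numerically instead of from the histograms. The following is a sketch on invented toy ratings, assuming the same `Sprecher` and `Wertung` columns; ties are resolved by taking the smallest of the most frequent values.

```python
import pandas as pd

# toy data, made up for illustration only
toy = pd.DataFrame({
    'Sprecher': ['Phil', 'Phil', 'Phil', 'Jan', 'Jan', 'Jan', 'Jan'],
    'Wertung': [7.0, 7.0, 8.0, 8.0, 8.0, 6.0, 9.0],
})

# mode() returns all most frequent values in ascending order,
# so iloc[0] picks the smallest one in case of a tie
most_frequent = toy.groupby('Sprecher')['Wertung'].agg(lambda s: s.mode().iloc[0])

print(most_frequent.to_dict())
```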

In a similar plot without homogenized axes, let's additionally plot the summed up distribution:

In [7]:
# including total count and no shared axes
fig, axs = plt.subplots(nrows=2,ncols=3, figsize=[15,7])
fig.subplots_adjust(hspace=0.7)
fig.suptitle('Distributions of each podcaster')

for ax, name in zip(axs.flatten(), podcasters):
    # prepare bins to reflect the 1 to 10 points
    subs = data.query(f"Sprecher=='{name}'")['Wertung']
    d = np.diff(np.unique(subs)).min()
    left_of_first_bin = subs.min() - float(d)/2
    right_of_last_bin = subs.max() + float(d)/2
    
    # plot the histogram 
    ax.hist(subs, bins=np.arange(left_of_first_bin, right_of_last_bin + d, d), color=colors[name])
    ax.set_title(name)
    ax.set_xlim([1, 11])
    
    # add some images
    im = plt.imread(imgs[name])
    place_image(im, loc=2, ax=ax, pad=0, zoom=2)

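# bins for the combined histogram are computed from all ratings,
# not from the last podcaster's range left over from the loop
all_ratings = data['Wertung']
d_all = np.diff(np.unique(all_ratings)).min()
all_bins = np.arange(all_ratings.min() - d_all/2, all_ratings.max() + d_all/2 + d_all, d_all)
axs[1,2].hist(all_ratings, bins=all_bins, color='#1e212f')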
axs[1,2].set_title('All together')
axs[1,1].set_xlabel('Points')
axs[0,0].set_ylabel('Amount')
axs[1,0].set_ylabel('Amount')

fig.tight_layout()
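
The half-step bin edges used in both histogram cells can be checked in isolation. This sketch uses invented ratings and `np.histogram` instead of a plot; it shows that each bin ends up centered on one point value.

```python
import numpy as np

# toy ratings, made up for illustration only
ratings = np.array([3, 5, 5, 7, 8, 8, 8, 10])

# smallest gap between distinct ratings (1 here) is used as the bin width
d = np.diff(np.unique(ratings)).min()

# shift the edges by half a bin so that each bin is centered on a rating
left_of_first_bin = ratings.min() - d / 2
right_of_last_bin = ratings.max() + d / 2
edges = np.arange(left_of_first_bin, right_of_last_bin + d, d)

counts, _ = np.histogram(ratings, bins=edges)
print(edges)
print(counts)
```

Note that the bin width depends on the data: if someone had only given, say, even point values, `d` would be 2 and the bins would be twice as wide.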

Ratings over the seasons

Did one season perform better or worse than the others? And what are the best / worst rated episodes by the podcasters?

In [8]:
seasons = ['S1', 'S2', 'S3', 'S4']
data_seasons = {}
for i in seasons:
    data_seasons[i] = data.query(f'Episode.str.contains("{i}")', engine='python')
pod_seasons={}
for j in podcasters:
    pod_seasons[j] = np.round([data_seasons[i].query(f'Sprecher=="{j}"')['Wertung'].mean() for i in seasons],2)
average_ratings = pd.DataFrame.from_dict(pod_seasons, orient='index', columns=seasons)

f, ax = plt.subplots(figsize=(9, 6))
sns.heatmap(average_ratings, annot=True, linewidths=.5, ax=ax, cmap='viridis');
ax.set_title('Average ratings per season');

So it looks like the series gained a lot of traction between season 2 and season 3 and, on average, had its best season in season 4. This gives an overall picture of the seasons, and chances are high that the best-rated episodes are in season 4 (and the worst-rated in season 1 or season 2). We'll have a look at exactly that next.
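
The per-season averages behind the heatmap could probably also be computed without explicit loops. The sketch below uses invented toy rows and assumes episode codes of the form `SxEyy`; the `Season` column is a helper introduced here and is not part of the original data. Specials such as Razor get no season label and drop out of the table.

```python
import pandas as pd

# toy data, made up for illustration only
toy = pd.DataFrame({
    'Episode': ['S1E01', 'S1E01', 'S1E02', 'S2E01', 'S2E01', 'Razor'],
    'Sprecher': ['Tim', 'Jan', 'Tim', 'Tim', 'Jan', 'Tim'],
    'Wertung': [6.0, 8.0, 8.0, 9.0, 7.0, 8.0],
})

# hypothetical helper column: season label taken from the start of the
# episode code; rows without such a code (specials) become NaN
toy['Season'] = toy['Episode'].str.extract(r'^(S\d)E\d+', expand=False)

# podcasters as rows, seasons as columns, mean rating as value
season_means = toy.pivot_table(index='Sprecher', columns='Season',
                               values='Wertung', aggfunc='mean')
print(season_means)
```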

Which episodes had the overall best rating, and which the worst? With our 5 podcasters, the best achievable rating is 50 points, and the worst (logically) 0 points.

If one is interested in how the data is re-structured, here's the code.

# re-structure the dataframe to have each podcaster as a column, episodes as index and rating as value.
episodes = np.sort(data.Episode.unique())
data_structured = pd.DataFrame(columns=podcasters, index=episodes)
data_structured['Summe'] = np.nan
data_structured['Mittelwert'] = np.nan
data_structured['Folgentitel'] = ""

# as producer, Tim was always present, so we use his rows to get the episode titles (Folgentitel)
tim = data.query('Sprecher=="Tim"').sort_values(by=['Episode'])
tim = tim.set_index(tim['Episode'])

data_structured['Folgentitel'] = tim['Folgentitel']

# now we fill in the numbers/votes for each podcaster
for i in podcasters:
    podc = data.query(f"Sprecher=='{i}'").sort_values(by=['Episode'])
    podc = podc.set_index(podc['Episode'])
    data_structured[i] = podc['Wertung']
# sum and mean skip missing votes (NaN)
data_structured['Summe'] = data_structured[podcasters].sum(axis=1)
data_structured['Mittelwert'] = data_structured[podcasters].mean(axis=1)

data_structured.to_csv('../grades/grades_restructured.csv')
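
For comparison, the same reshaping can likely be written more compactly with `DataFrame.pivot`. This is only a sketch on invented toy rows, and it assumes that each podcaster rated each episode at most once (otherwise `pivot` raises an error).

```python
import pandas as pd

# toy data, made up for illustration only
toy = pd.DataFrame({
    'Folgentitel': ['Title A', 'Title A', 'Title B'],
    'Episode': ['S1E01', 'S1E01', 'S1E02'],
    'Sprecher': ['Tim', 'Jan', 'Tim'],
    'Wertung': [7.0, 9.0, 6.0],
})

# episodes as index, podcasters as columns, rating as value;
# a missing vote shows up as NaN
wide = toy.pivot(index='Episode', columns='Sprecher', values='Wertung')
podcaster_cols = list(wide.columns)

# sum and mean both skip NaN
wide['Summe'] = wide[podcaster_cols].sum(axis=1)
wide['Mittelwert'] = wide[podcaster_cols].mean(axis=1)

# one title per episode, aligned on the episode index
wide['Folgentitel'] = toy.drop_duplicates('Episode').set_index('Episode')['Folgentitel']
print(wide)
```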

But as we don't need to do that every time, we load the re-structured dataframe now and continue working with the loaded version:

In [20]:
data_structured = pd.read_csv('../grades/grades_restructured.csv', index_col=0)

Now that we have structured the ratings in a more convenient manner, let's have a look at the highest- and lowest-rated episodes. For this, we can simply sort the structured dataframe by its sum column, Summe. Keep in mind that this only checks the sum: if only X out of 5 podcasters (with X < 5) were present during recording and all X gave 10 points, the sum will not reach the maximum point value.
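
The caveat can be made concrete with a small sketch on invented numbers: an episode rated 10 by only three podcasters ranks below an episode rated 8 by all five when sorting by sum, but above it when sorting by mean.

```python
import numpy as np
import pandas as pd

podcasters = ['Tim', 'Benjamin', 'Marijo', 'Phil', 'Jan']

# toy data, made up for illustration only
toy = pd.DataFrame(
    [[10, 10, np.nan, np.nan, 10],   # three podcasters present, all gave 10
     [8, 8, 8, 8, 8]],               # all five present
    index=['episode_a', 'episode_b'], columns=podcasters, dtype=float)

toy['Summe'] = toy[podcasters].sum(axis=1)        # a missing vote adds nothing
toy['Mittelwert'] = toy[podcasters].mean(axis=1)  # a missing vote is skipped

print(toy[['Summe', 'Mittelwert']])
print(toy['Summe'].idxmax(), toy['Mittelwert'].idxmax())
```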

In [21]:
data_structured.sort_values(by=['Summe'])
Out[21]:
                              Tim   Benjamin  Marijo  Phil  Jan   Summe  Mittelwert  Folgentitel
S3E09                         3.0   3.0       2.0     3.0   NaN   11     2.75        Technischer KO bei Klitschkos Regiedebut
Pilotfilm – Miniserie Teil 2  3.0   3.0       3.0     3.0   NaN   12     3.00        Das hoffen wir alle
S2E02                         4.0   4.0       NaN     4.0   NaN   12     4.00        Kopfschuss, nachladen, ... leer
S2E14                         3.0   NaN       2.0     4.0   3.0   12     3.00        Sherlock Lee und Snacks für Fisk
S2E03                         NaN   6.0       NaN     7.0   NaN   13     6.50        NaN
...                           ...   ...       ...     ...   ...   ...    ...         ...
Razor                         8.0   8.0       9.0     7.0   8.0   40     8.00        Atomic Shawn of the Dead
S3E20                         8.0   9.0       9.0     8.0   8.0   42     8.40        Vier Toaster im ironischen Nebel
S4E21                         10.0  10.0      10.0    10.0  10.0  50     10.00       The Sound of Cylons
S4E20                         10.0  10.0      10.0    10.0  10.0  50     10.00       Präsident Cool trifft Admiral Sowieso
S4E10                         10.0  10.0      10.0    10.0  10.0  50     10.00       Die Drei von der Luftschleuse

77 rows × 8 columns

Overall, with all podcasters present, the best episodes were the final ones of season 4, S4E20 and S4E21, as well as S4E10. Among the worst episodes is S2E14, Sherlock Lee und Snacks für Fisk ... which is... surprise! The episode Black Market, coming in with a devastating 6.5 stars on IMDb.

But the worst episode of BSG, according to 4 out of 5 podcasters (Jan was not present for that recording), is S3E09, Technischer KO bei Klitschkos Regiedebut, the episode Unfinished Business, which has a rating of 7.7 on IMDb. To find out why the 4 stooges hated that episode, listen to the corresponding Zylonensender episode.

As noted above, an episode for which only some podcasters were present cannot reach the maximum sum, even if everyone present gave 10 points. It would still be a great episode, so we'll check whether that ever happened. Checking the re-structured dataframe yields: yes, this is the case for episode S3E04, Kein guter Hitler, where Tim, Benjamin, and Jan all gave 10 points.
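
One way such a check might look is sketched below on invented toy rows (the episode labels are placeholders, not real episodes): keep the episodes where the lowest vote among those present is 10 and at least one podcaster is missing.

```python
import numpy as np
import pandas as pd

podcasters = ['Tim', 'Benjamin', 'Marijo', 'Phil', 'Jan']

# toy data, made up for illustration only
toy = pd.DataFrame(
    [[10, 10, np.nan, np.nan, 10],    # partial attendance, all tens
     [10, 10, 10, 10, 10],            # full attendance, all tens
     [10, 9, np.nan, 8, np.nan]],     # partial attendance, mixed votes
    index=['episode_a', 'episode_b', 'episode_c'],
    columns=podcasters, dtype=float)

present = toy[podcasters].notna().sum(axis=1)    # number of votes per episode
all_tens = toy[podcasters].min(axis=1) == 10     # min() skips missing votes

# perfect score from everyone present, but not everyone was present
perfect_partial = toy[all_tens & (present < len(podcasters))]
print(perfect_partial.index.tolist())
```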

Let's have a quick look at what the best and worst episodes were when all podcasters were present:

In [22]:
data_all_podcasters = data_structured.dropna()
data_all_podcasters.sort_values(by=['Summe'])
Out[22]:
                           Tim   Benjamin  Marijo  Phil  Jan   Summe  Mittelwert  Folgentitel
Webisodes: The Resistance  5.0   5.0       5.0     5.0   5.0   25     5.0         Für eine Handvoll Sellerie
S1E04                      7.0   5.0       3.0     5.0   5.0   25     5.0         Chamalla-Extrakt mit den grünen Streifen
S3E14                      7.0   6.0       7.0     5.0   4.0   29     5.8         Iehovah ist überall
S4E02                      6.0   6.0       6.0     6.0   7.0   31     6.2         Wenns Pipi brennt
S1E06                      5.0   7.0       9.0     5.0   6.0   32     6.4         Aggro JAG-Tante mit Hintergrund-Trommelmusik
S1E05                      7.0   7.0       6.0     7.0   7.0   34     6.8         Die Zylonenfleischtheke und das vibrierende Fl...
S4E19                      8.0   6.0       7.0     7.0   6.0   34     6.8         Luschis und Tim nach links
S3E19                      6.0   7.0       7.0     7.0   7.0   34     6.8         Es ist im Schiff!
S4E01                      7.0   7.0       7.0     7.0   8.0   36     7.2         Blutiger Rasierunfall
S2E09                      6.0   7.0       8.0     6.0   10.0  37     7.4         Chief Bilbo und die Logikbombe
S2E19 + S2E20              7.0   8.0       7.0     8.0   7.0   37     7.4         Thanks Adama
S4E17                      8.0   8.0       7.0     8.0   6.0   37     7.4         Mit dem Kopf durch das Waschbecken
S3E06                      8.0   7.0       8.0     7.0   8.0   38     7.6         Cavil - Allein zu Haus
S4E09                      7.0   8.0       8.0     6.0   9.0   38     7.6         Es wurde auch Zeit
S1E03                      8.0   8.0       8.0     8.0   7.0   39     7.8         Starbucks Bitch und die nukleare Ohrenkrise
Razor                      8.0   8.0       9.0     7.0   8.0   40     8.0         Atomic Shawn of the Dead
S4E11                      9.0   8.0       8.0     8.0   7.0   40     8.0         So uncivilized
Film: The Plan             8.0   8.0       8.0     8.0   8.0   40     8.0         Ich liebe es, wenn ein Plan funktioniert
S3E05                      8.0   8.0       8.0     8.0   8.0   40     8.0         Baltars Billy geluftschleust
S2E08                      9.0   8.0       6.0     8.0   9.0   40     8.0         Lucy Lawless und die nackten Weltraumärsche
S3E20                      8.0   9.0       9.0     8.0   8.0   42     8.4         Vier Toaster im ironischen Nebel
S4E21                      10.0  10.0      10.0    10.0  10.0  50     10.0        The Sound of Cylons
S4E20                      10.0  10.0      10.0    10.0  10.0  50     10.0        Präsident Cool trifft Admiral Sowieso
S4E10                      10.0  10.0      10.0    10.0  10.0  50     10.0        Die Drei von der Luftschleuse

Disregarding the webisodes, S1E04, Chamalla-Extrakt mit den grünen Streifen, is the worst-rated episode for which all 5 podcasters were present at the recording. (Not so) fun fact: including special episodes like the webisodes or the movie The Plan, all podcasters were present in 24 of the 76 podcast episodes recorded here. That's 31.6 %.
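
The share of fully attended recordings is a simple ratio. The sketch below first shows the counting on invented toy rows and then reproduces the arithmetic for the 24 of 76 figure quoted above.

```python
import numpy as np
import pandas as pd

podcasters = ['Tim', 'Benjamin', 'Marijo', 'Phil', 'Jan']

# toy data, made up for illustration only
toy = pd.DataFrame(
    [[7, 5, 3, 5, 5],
     [3, 3, 2, 3, np.nan],
     [4, 4, np.nan, 4, np.nan],
     [8, 8, 9, 7, 8]],
    index=['ep_1', 'ep_2', 'ep_3', 'ep_4'], columns=podcasters, dtype=float)

# keep only episodes where every podcaster voted
full = toy.dropna(subset=podcasters)
toy_share = len(full) / len(toy) * 100

# the figure quoted in the text: 24 of 76 recordings
quoted_share = round(24 / 76 * 100, 1)

print(toy_share, quoted_share)
```

Restricting `dropna` to the podcaster columns avoids dropping an episode just because its title is missing.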

However, the database here is incomplete, as there are 83 episodes of Zylonensender available in total... In the near future, I shall go on a quest to find the lost episodes! Cue Indy music and a shameless plug of Impulssender Raiders of the Lost Ark.

In [24]:
x = np.arange(len(data_structured))  # one x position per episode
x_rep = data_structured.index

fig, axs = plt.subplots(5,1, figsize=[25,25], sharex=True)
for ax, name in zip(axs.flatten(), podcasters):
    m = 'o-'
    ax.plot(x, data_structured[name], m, c=colors[name])
    # add some images
    im = plt.imread(imgs[name])
    place_image(im, loc=2, ax=ax, pad=0, zoom=2)
# label the shared x axis with the episode codes
plt.sca(axs[4])
plt.xticks(x, x_rep, rotation='vertical');
plt.tight_layout()