
Loading the json with the grade data:

Extracting the relevant information out of the json for one course:

Creating a list of dicts, each dict containing the info for one course.

For each course, parse the grades out of its html file, and add to its dict:

Create a pandas dataframe with the data:

We want to calculate the Z-Score of each grade. For a single course in which I got the grade $x$, with mean grade $\mu$ and standard deviation $\sigma$, the Z-Score is $Z = \frac{x - \mu}{\sigma}$. We now calculate that simultaneously for all courses.

We already have my grade and the mean of each course, so now we calculate the standard deviation. For a single course we use the (biased) standard deviation $$\sigma = \sqrt{\frac{\sum_{x \in X}(x-\mu)^2}{N}}$$ where $X$ is the collection of all grades in this one exam (one entry per student) and $N$ is the total number of students. Our data looks a bit different, however. What we have is, for every possible grade, the number of students with that grade, so our calculation will look like this: $$\sigma = \sqrt{\frac{\sum_{g \in G}n_g(g-\mu)^2}{N}}$$ where $G$ is the set of possible grades ($1.0$, $1.3$, ...) and, for a grade $g \in G$, $n_g$ is the number of students with that grade in this exam, so that $N = \sum_{g \in G} n_g$.
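
To make the second formula concrete, here is a minimal sketch for a single exam. The grade scale and the counts are made-up illustration values, not taken from the real data, and the variable names are hypothetical.

```python
import numpy as np

# Hypothetical data for one exam: the possible grades G and,
# for each grade g, the number of students n_g who got it.
grades = np.array([1.0, 2.0, 3.0])
counts = np.array([1, 2, 1])

n_students = counts.sum()                       # N
mean = (counts * grades).sum() / n_students     # mu
# sigma = sqrt( sum_g n_g * (g - mu)^2 / N )
sigma = np.sqrt((counts * (grades - mean) ** 2).sum() / n_students)

# Sanity check: expanding the counts into one grade per student
# and using the ordinary (biased) standard deviation gives the same value.
sigma_expanded = np.std(np.repeat(grades, counts))
```

Both `sigma` and `sigma_expanded` use $N$ in the denominator, which is also the default behaviour of `np.std` (`ddof=0`).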

First, we create a matrix which has, for each exam, the row vector $(\ \ldots \ (g - \mu)^2 \ \ldots\ )_{g \in G}$, which contains the squared difference between every possible grade and the mean of that exam:
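
One way such a matrix could be built is with numpy broadcasting. This is only a sketch: the grade scale and the per-exam means below are invented for illustration, and the original notebook may construct the matrix differently.

```python
import numpy as np

# Hypothetical inputs: the possible grades and the mean grade of each exam.
grades = np.array([1.0, 2.0, 3.0])   # G, shape (3,)
means = np.array([2.0, 1.5])         # one mean per exam, shape (2,)

# Broadcasting a (1, 3) row of grades against a (2, 1) column of means
# yields a (2, 3) matrix: one row per exam, one column per possible grade.
sq_diff = (grades[np.newaxis, :] - means[:, np.newaxis]) ** 2
```

Row `i` of `sq_diff` holds $(g - \mu)^2$ for every possible grade $g$, with $\mu$ being the mean of exam `i`.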

We now multiply this matrix element-wise with our data, and get the matrix $(n_{g,i}(g-\mu)^2)_{g \in G, i \in E}$ where $E$ is the set of exams and $n_{g,i}$ is the number of students who got the grade $g$ in exam $i$. We then sum each row and divide it by the total number of students who took that exam, which gives us a vector where every element is the complete term inside the square root above, for one exam. We then just take the square root element-wise, and get a vector of the standard deviations of all exams, which we add to our DataFrame.
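
The whole computation described in this paragraph can be sketched as follows. The count matrix, the exam names and the column name `std` are assumptions made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical data: rows are exams, columns are possible grades,
# each entry is the number of students with that grade in that exam.
grades = np.array([1.0, 2.0, 3.0])
counts = np.array([[1, 2, 1],
                   [2, 2, 0]])
df = pd.DataFrame({"exam": ["A", "B"]})

totals = counts.sum(axis=1)                      # students per exam
means = (counts * grades).sum(axis=1) / totals   # mean grade per exam

# Squared difference between every possible grade and each exam's mean.
sq_diff = (grades - means[:, np.newaxis]) ** 2

# Element-wise product, row sums, division by N, element-wise square root.
variance = (counts * sq_diff).sum(axis=1) / totals
df["std"] = np.sqrt(variance)
```

No Python loop over exams is needed: every step operates on whole arrays at once.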

Once we have all that, we just do the $\frac{x-\mu}{\sigma}$ calculation for each exam.

Of course we do it with numpy's vectorized operations.
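
As a rough illustration, the vectorized Z-Score step might look like this. The column names `my_grade`, `mean`, `std` and `z_score` and all the numbers are placeholders, not the notebook's actual schema or data.

```python
import pandas as pd

# Hypothetical per-course data: my grade, the mean and the standard deviation.
df = pd.DataFrame({
    "my_grade": [1.3, 2.0],
    "mean": [2.0, 2.5],
    "std": [0.7, 0.5],
})

# Column arithmetic is applied to all rows at once (pandas delegates to numpy),
# so this single line computes (x - mu) / sigma for every course.
df["z_score"] = (df["my_grade"] - df["mean"]) / df["std"]
```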

Save the DataFrame data to a json:
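
A minimal sketch of this step using `DataFrame.to_json`. The output path, the `orient="records"` layout and the example columns are assumptions; the original notebook may use a different file name or layout.

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical result table.
df = pd.DataFrame({"course": ["A", "B"], "z_score": [-1.0, 0.5]})

# Write one JSON object per course; the path here is only a placeholder.
out_path = os.path.join(tempfile.gettempdir(), "grades_example.json")
df.to_json(out_path, orient="records")

# Read the file back to confirm what was written.
with open(out_path) as f:
    records = json.load(f)
```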

This "total Z-Score" takes the number of students in each course into account, unlike the "Mean Z-Score", which is an average of the Z-Scores of all courses weighted by ECTS credits but not by the number of students. As a result, it is less affected by courses with a small number of participants. This might make sense if you are trying to compare me to the total population of computer science students: a course with few participants is like a study with few participants, in that it is less likely to provide a sample representative of the entire population.