Divergence and Disproportionality

In my last post, I discussed my attempt to remove some of the layers between the actual number of parties and derivatives like the effective number of parties and electoral disproportionality. That method focussed on building a logical model of the variance of party shares, breaking down the Gallagher index, and then using the former to model the latter. However, I was left stumped by a correlation coefficient and, after discussing the issue with Chris Hanretty, realised that the correlation is where all the action is. As such, I’ve pivoted to another approach.

At the end of the post, I suggested some alternate ways forward. One was to develop a kind of “effective” measure of disproportionality that, much like the effective number of parties, is rooted in information theory. The general idea would be to measure disproportionality using the Rényi divergence. And, since we can decompose this measure into Rényi entropy and, therefore, into the effective number of parties, we might be able to get a better handle on the link between disproportionality and party-system fragmentation.

What follows is a preliminary attempt to elaborate such a measure along with some simple comparisons to the Gallagher index and a quick attempt at a constraint-based model.

Measuring Disproportionality

Political scientists most often measure disproportionality—the extent to which seat shares differ from vote shares—using the Gallagher index (Gallagher 1991).¹ We compute the Gallagher index as follows:

$G = \sqrt{\frac{1}{2} \sum_{i = 1}^{N_{V 0}} {(v_{i} - s_{i})}^{2}}$

Here, $v$ is some distribution of vote shares, $s$ is some distribution of seat shares, and $N_{V 0}$ is the number of vote winning parties.² Where the distributions of votes and seats are identical (and, thus, the results are perfectly proportional), $G = 0$. Where, instead, the distributions of votes and seats are as distinct as possible (and, thus, the results are perfectly disproportional), $G = 1$. In reality, both extremes are rare, though, as Figure 1 shows, data from several hundred national elections with “simple electoral systems” (Shugart and Taagepera 2017; Taagepera 2007) taken from the Constituency-Level Elections Archive (Kollman et al. 2024) tend to fall closer to the proportional than the disproportional end of the scale.

Figure 1: Histogram of Gallagher index scores for national elections with “simple electoral systems”.
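To make the Gallagher formula above concrete, here is a minimal Python sketch. The function name `gallagher` and the three-party shares are my own illustrative assumptions, not values from the data, and shares are assumed to be fractions that sum to one.

```python
import math

def gallagher(votes, seats):
    """Gallagher index for vote and seat shares given as fractions summing to 1."""
    if len(votes) != len(seats):
        raise ValueError("votes and seats must list the same parties")
    return math.sqrt(0.5 * sum((v - s) ** 2 for v, s in zip(votes, seats)))

# Hypothetical three-party example (illustrative numbers only).
votes = [0.5, 0.3, 0.2]
seats = [0.6, 0.3, 0.1]

print(gallagher(votes, seats))            # roughly 0.1
print(gallagher(votes, votes))            # 0.0: perfectly proportional
print(gallagher([1.0, 0.0], [0.0, 1.0]))  # 1.0: perfectly disproportional
```

The last two calls reproduce the two extremes of the scale described above.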

“Effective” Disproportionality

If we were to begin the study of electoral systems again, we might seek to measure all electoral phenomena using some consistent framework. To this end, information theory would be a promising candidate. The reason it would be appealing is that the effective number of parties—perhaps the most important index in the study of elections—is information theoretic in nature. In particular, it shares an exact equivalence with the Rényi entropy, which is given as:

$H_{α} (X) = \frac{1}{1 - α} \ln (\sum_{i = 1}^{n} p_{i}^{α})$

The parameter $α$ determines the weight we place on larger or smaller proportions of the vector $p$ . Where $α = 2$ , the equation for the Rényi entropy simplifies such that:

$H_{2} (X) = \ln (\frac{1}{\sum_{i = 1}^{n} p_{i}^{2}})$

Which is just the natural logarithm of the effective number of parties. So, by implication, those who have used the effective number of parties have worked with an information theoretic quantity for almost 50 years, whether they knew it or not.
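The equivalence is easy to check numerically. The sketch below is illustrative only: the function names and the example shares are assumptions of mine, and the entropy function simply excludes the $α = 1$ case, which needs separate handling as a limit.

```python
import math

def renyi_entropy(shares, alpha=2.0):
    """Rényi entropy of order alpha (alpha != 1), in nats."""
    if alpha == 1:
        raise ValueError("alpha = 1 is a limiting case and is not handled here")
    return math.log(sum(p ** alpha for p in shares if p > 0)) / (1 - alpha)

def effective_number(shares):
    """Effective number of parties: one over the sum of squared shares."""
    return 1.0 / sum(p ** 2 for p in shares)

shares = [0.5, 0.3, 0.2]  # hypothetical party shares

print(effective_number(shares))               # 1 / 0.38, about 2.63
print(math.exp(renyi_entropy(shares, 2.0)))   # same value up to rounding
```

Exponentiating the order-2 entropy recovers the effective number of parties, which is the identity the text relies on.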

Entropy measures, like the effective number of parties, measure how different some distribution is compared to uniformity. But this baseline is less useful if we want to measure disproportionality where we would instead prefer to compare one distribution to another. To do this, we can use the Rényi divergence, which compares two distributions and is given as follows:

$D_{α} (P ‖ Q) = \frac{1}{α - 1} \ln (\sum_{i = 1}^{n} \frac{p_{i}^{α}}{q_{i}^{α - 1}})$

Where $P$ and $Q$ are arbitrary probability distributions and $α$ is the same weighting parameter as I discuss above. We can replace these arbitrary distributions with shares of seats and votes, respectively.³ Likewise, we can also set $α = 2$ so that we compute our new “effective” disproportionality measure in the same way that we compute the effective number of parties. After substitution and simplification, this gives:

$D_{2} (S ‖ V) = \ln (\sum_{i = 1}^{N_{V 0}} \frac{s_{i}^{2}}{v_{i}})$

The logic here may not seem apparent at first, but consider that squaring any value and then dividing it by itself returns the original value. For instance, ${0.25}^{2} / 0.25 = 0.25$. So, where every party’s seat share equals its vote share, each term reduces to the share itself, the sum equals 1, and its logarithm equals 0, implying perfect proportionality. In principle, the logarithm of this sum can take any value between 0 and $\infty$. As such, we must rescale it to use a zero-to-one scale. Thankfully, this is relatively straightforward and requires only that we negate it, exponentiate it, and then subtract the resulting value from one.⁴ For the sake of consistency, I call this measure $R$ in honour of Rényi until I can think of a better title. This gives:

$\begin{aligned} R & = 1 - e^{- D_{2} (S ‖ V)} \\ & = 1 - (\sum_{i = 1}^{N_{V 0}} \frac{s_{i}^{2}}{v_{i}})^{- 1} \end{aligned}$
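A minimal Python sketch of $R$ follows. The function names and the example shares are hypothetical, and the code assumes, as in simple electoral systems, that any party with a zero vote share also has a zero seat share, so such parties can be skipped.

```python
import math

def renyi_divergence_2(seats, votes):
    """D_2(S || V): log of the sum of s_i^2 / v_i over vote winning parties."""
    total = 0.0
    for s, v in zip(seats, votes):
        if v <= 0:
            if s > 0:
                raise ValueError("a party with no votes cannot hold seats here")
            continue  # parties without votes contribute nothing
        total += s ** 2 / v
    return math.log(total)

def effective_disproportionality(seats, votes):
    """R = 1 - exp(-D_2(S || V)), rescaled to run from 0 to 1."""
    return 1.0 - math.exp(-renyi_divergence_2(seats, votes))

votes = [0.5, 0.3, 0.2]  # hypothetical vote shares
seats = [0.6, 0.3, 0.1]  # hypothetical seat shares

print(effective_disproportionality(seats, votes))            # 1 - 1/1.07, about 0.065
print(effective_disproportionality(votes, votes))            # about 0: proportional
print(effective_disproportionality([1.0, 0.0], [0.5, 0.5]))  # about 0.5
```

Seat shares go in the numerator, so a party that wins votes but no seats adds zero to the sum rather than causing a division by zero.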

Some Comparisons

Figure 2: Measuring disproportionality using the Gallagher index and “Effective Disproportionality” based on the Rényi divergence. Here, I simulate a simple two party case where each party’s vote and seat shares mirror each other.

Figure 2 compares effective disproportionality to the Gallagher index using a simple simulated two party case. At the far left extreme of the horizontal axis, the first party has all of the votes and the second party all of the seats. At the halfway point, both have an equal share of votes and seats. And, at the far right extreme, the picture flips. As we can see, both indices follow the same trajectory, though the effective measure curves whereas the Gallagher index does not. This, however, is specific to the two party case where the Gallagher index reverts to a simple measure of absolute difference (and outputs exactly the same values as the Pedersen index). In more complex scenarios where the number of parties is greater than two, the Gallagher index also curves.
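A rough sketch of this kind of two party simulation is below. I assume the horizontal axis is the first party’s seat share, `s1`, with its vote share set to `1 - s1`; that parameterisation, the function name, and the grid of values are assumptions rather than the exact setup behind Figure 2. The endpoints are excluded because a zero vote share would mean dividing by zero.

```python
import math

def two_party_case(s1):
    """Return (G, R) when party 1 holds seat share s1 and vote share 1 - s1.

    Assumes 0 < s1 < 1 so that neither vote share is zero.
    """
    if not 0 < s1 < 1:
        raise ValueError("s1 must lie strictly between 0 and 1")
    votes = [1 - s1, s1]
    seats = [s1, 1 - s1]
    g = math.sqrt(0.5 * sum((v - s) ** 2 for v, s in zip(votes, seats)))
    r = 1 - 1 / sum(s ** 2 / v for v, s in zip(votes, seats))
    return g, r

for s1 in (0.1, 0.25, 0.5, 0.75, 0.9):
    g, r = two_party_case(s1)
    print(f"s1={s1:.2f}  G={g:.3f}  R={r:.3f}")
```

With two mirrored parties, $G$ reduces to $|1 - 2 s_{1}|$, a straight line on either side of the midpoint, while $R$ bends away from it.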

Figure 3: Effective disproportionality and the Gallagher index plotted against the number of vote winning parties at district level elections.

Figure 3 shows effective disproportionality (left panel) and the Gallagher index (right panel) plotted against the number of vote winning parties, $N_{V 0}$, at district level elections.⁵ As we can see by inspecting the heat maps, both show the same pattern. Where there is only one vote winning party, the disproportionality score is 0 in all cases. After all, if only one party wins any votes, only one can win any seats and so the result must be perfectly proportional.⁶ The range of scores then widens, covering a larger part of the scale as the number of vote winning parties increases. The major difference between the two measures appears to be only that effective disproportionality covers a broader range of the scale.

Figure 4: Effective disproportionality and the Gallagher index plotted against district magnitude at district level elections.

Figure 4 now shows effective disproportionality (left panel) and the Gallagher index (right panel) plotted against the district magnitude, $M$. Again, the patterns here are what we would expect: disproportionality is highest and most varied where $M = 1$, then, as the district magnitude grows, it tends to decline. The major difference is, once more, that effective disproportionality uses more of the scale.

Figure 5: Effective disproportionality plotted against the Gallagher index for district level elections with magnitudes between 1 and 6.

Finally, Figure 5 shows the two indices plotted against each other for districts with magnitudes of 1 to 6. In all cases, the two indices are highly correlated. Where $M = 1$ , the two share a strong positive correlation, with little scatter. For cases where $M > 1$ , the scatter appears to increase. And, in all cases, it also appears to be larger at the disproportional end of the scale.

Steps Towards a Functional Form

So how might we build a logical model of effective disproportionality (or even disproportionality in general)? Absent any probabilistic logic that we might borrow, the most obvious way forward is to focus on constraints. As I mention above, one constraint is that where $N_{V 0} = 1$, we should also expect $R = 0$. However, I think that the information theoretic properties of the Rényi divergence might imply a more subtle constraint that we can put to better use.

What makes information theoretic quantities so useful is that they tend to have well known decompositions. One holds that $D_{2} (P_{X} ‖ P_{U}) = \ln (N) - H_{2} (X)$. Or, in plain English, that when the reference distribution, $P_{U}$, is uniform, the Rényi divergence equals the natural logarithm of the number of elements minus the Rényi entropy of the primary distribution. That would then imply that if the distribution of votes is perfectly uniform, the divergence will equal the natural logarithm of the number of vote winning parties⁷ minus the Rényi entropy of the distribution of seat shares, giving:

$\begin{aligned} R | V_{U} & = 1 - e^{- (\ln (N_{V 0}) - H_{2} (S))} \\ & = 1 - e^{- \ln (N_{V 0})} \cdot e^{H_{2} (S)} \\ & = 1 - \frac{1}{e^{\ln (N_{V 0})}} \cdot e^{H_{2} (S)} \\ & = 1 - \frac{e^{H_{2} (S)}}{e^{\ln (N_{V 0})}} \\ & = 1 - \frac{e^{H_{2} (S)}}{N_{V 0}} \end{aligned}$

And since, as I show in this paper, the effective number of parties shares an exact equivalence with the Rényi entropy, such that $N_{2} = e^{H_{2} (X)}$, the equation further simplifies, giving:

$R | V_{U} = 1 - \frac{N_{S 2}}{N_{V 0}}$

So, when the distribution of votes is perfectly uniform, disproportionality equals one minus the ratio of effective seat winning parties to actual vote winning parties. This feels like a nice property. I suspect we could also establish a similar constraint were we to focus on a uniform distribution of seats too. But, for now, I will stick to votes alone.
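The constraint can be checked numerically with a short sketch. The function names and the four-party example are hypothetical; the only claim being tested is that, when vote shares are uniform, $R$ computed directly matches one minus the ratio of effective seat winning parties to vote winning parties.

```python
def effective_number(shares):
    """Effective number of parties: one over the sum of squared shares."""
    return 1.0 / sum(p ** 2 for p in shares)

def effective_disproportionality(seats, votes):
    """R = 1 - 1 / sum(s_i^2 / v_i), summing over vote winning parties."""
    total = sum(s ** 2 / v for s, v in zip(seats, votes) if v > 0)
    return 1.0 - 1.0 / total

def uniform_vote_prediction(seats, n_vote_winners):
    """R | V_U = 1 - N_S2 / N_V0, valid when vote shares are uniform."""
    return 1.0 - effective_number(seats) / n_vote_winners

votes = [0.25, 0.25, 0.25, 0.25]  # hypothetical uniform vote shares
seats = [0.5, 0.3, 0.2, 0.0]      # hypothetical seat shares

direct = effective_disproportionality(seats, votes)
predicted = uniform_vote_prediction(seats, len(votes))
print(direct, predicted)  # both about 0.342
```

The same functions also recover the single-party constraint: with one vote winning party holding every seat, both the direct and predicted values are 0.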

Though this model considers only a particular edge case, let’s see how well it does as a logical model of disproportionality. To test it, we’ll use it to predict disproportionality and then compare how much of the variation it explains in both effective disproportionality and the Gallagher index. We’ll also compare it to the current state of the art model that Shugart and Taagepera (2017) propose based on first estimating vote and seat shares for the first-placed party.

At the national level, the results are:

Existing method predicting the Gallagher index: $R^{2}$ = 52%
Existing method predicting effective disproportionality: $R^{2}$ = 31%
New model predicting the Gallagher index: $R^{2}$ = 55%
New model predicting effective disproportionality: $R^{2}$ = 54%

And, at the district level (where $M \geq 2$ ), the results are:

Existing method predicting the Gallagher index: $R^{2}$ = 73%
Existing method predicting effective disproportionality: $R^{2}$ = 56%
New model predicting the Gallagher index: $R^{2}$ = 68%
New model predicting effective disproportionality: $R^{2}$ = 70%

In general, the model does well and more or less matches the current approach. Interestingly, however, the new model does well on both the effective disproportionality and the Gallagher index, whereas the current approach tends to do well for the Gallagher index but not the effective disproportionality. I’m going to wrap things up here, but I think this is a promising first step. Hopefully if we consider some further constraints, we might be able to improve it further still.

References

Gallagher, Michael. 1991. “Proportionality, Disproportionality and Electoral Systems.” Electoral Studies 10 (1): 33–51. https://doi.org/10.1016/0261-3794(91)90004-C.

Kollman, Ken, Allen Hicken, Daniele Caramani, David Backer, and David Lublin. 2024. “Constituency-Level Elections Archive.” Ann Arbor, MI.

Shugart, Matthew S., and Rein Taagepera. 2017. Votes from Seats: Logical Models of Electoral Systems. Cambridge, UK: Cambridge University Press.

Taagepera, Rein. 2007. Predicting Party Sizes: The Logic of Simple Electoral Systems. Oxford: Oxford University Press.

Footnotes

¹ In Votes from Seats, Shugart and Taagepera (2017) refer to the index as $D_{2}$ and the Pedersen index of electoral volatility as $D_{1}$. But, for reasons that will soon become apparent, I am going to refer to it as $G$ instead.
² We sum over vote winning parties since not all vote winning parties are seat winning parties, but all seat winning parties are vote winning parties (at least in simple electoral systems).
³ Since seat shares may contain zeroes, we must use them as $P$. Otherwise, division by zero can occur.
⁴ This final step is only necessary when measuring disproportionality. To measure proportionality, subtracting from one is not necessary.
⁵ I use district level data here since they include more cases and include important edge cases where $N_{V 0} = 1$ and, hence, disproportionality must be 0.
⁶ At least in simple electoral systems.
⁷ Again, it must be the number of vote winning parties since we must sum over vote winning parties when computing the Rényi divergence.