=SUMPRODUCT

I knew what I wanted: a way to define a “weight” column that would tell me how many times a particular pair of values appeared in my spreadsheet. I spent some time (a lot of it) searching the web using various sentences about frequency and counting. I read a lot of documentation for =FREQUENCY (which isn’t quite what I need) and =COUNTIF (which works for one column, but not for two). At some point I realized I was looking for =SUMPRODUCT. So here’s a secret: I don’t understand how or why SUMPRODUCT works. I kind of do know that you can use it for a test like this, but I was really struggling to wrap my head around why it is the right tool.

I get it now, though. What clicked for me was xldynamic’s explanation of the arrays that a SUMPRODUCT formula creates when you include comparisons.

And I want to know how many rows have the exact same values in columns D and E:

=SUMPRODUCT(($D$2:$D$224=D2)*($E$2:$E$224=E2))

Worth knowing: a boolean result can be represented as TRUE/FALSE, YES/NO or 1/0. I don’t know why that is, by which I mean, it makes sense to me intuitively but I can’t actually explain it to you. So perhaps we’ll explore that in the Ladygrammar Book Computing Club. For now, just trust me. We’re going to represent TRUE as 1 and FALSE as 0.

So. If I were dealing with a table like this:

Value A Value B
0 25
0 25
0 35
0 37
0 42
0 0
1 11
1.5 3
1.5 9
1.5 9
1.5 25
2 2
2 7

I’d use a =SUMPRODUCT formula to create two arrays about each row, and add up the product of each array. So the first instance of the formula would first compare my first Col A value (“0″) to every other value in Col A and create an array of the results (which are TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE). Then it would compare my first Col B value (“25″) to every other value in Column B and create an array of those results. Then it iterate through those two arrays and create a third array which contains the product of each pair of values. TRUE * TRUE is 1, FALSE * TRUE is 0, FALSE * FALSE is 0.

TRUE TRUE 1 * 1 = 1
TRUE TRUE 1 * 1 = 1
TRUE FALSE 1 * 0 = 0
TRUE FALSE 1 * 0 = 0
TRUE FALSE 1 * 0 = 0
TRUE FALSE 1 * 0 = 0
FALSE FALSE 0 * 0 = 0
FALSE FALSE 0 * 0 = 0
FALSE FALSE 0 * 0 = 0
FALSE FALSE 0 * 0 = 0
FALSE TRUE 0 * 1 = 0
FALSE FALSE 0 * 0 = 0
FALSE FALSE 0 * 0 = 0

Then it adds up the SUM of all those PRODUCTS. Get it? SUMPRODUCT.

1 + 1 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 = 2

There are exactly two rows where Col A is 0 and Col B is 25. I repeat that formula for each row, et voila:

Value A Value B =SUMPRODUCT
0 25 2
0 25 2
0 35 1
0 37 1
0 42 1
0 0 1
1 11 1
1.5 3 1
1.5 9 2
1.5 9 2
1.5 25 1
2 2 1
2 7 1

Then, hypothetically, I could use that third column to calculate the radius of points on a scatterplot. Also handy? Libre Office Calc’s save as HTML option.

PS. If you’re clever, you’ll notice that the scatterplot doesn’t really show any kind of correlation. That’s okay — my project was just to figure out how to make it in D3.

Leave a comment