Cache friendly code

Example

int main()
{
    int arr[1000][1000];

    for (int a = 0; a < sizeof(arr)/sizeof(arr[0]); ++a)
    {
        for (int b = 0; b < sizeof(arr[0])/sizeof(arr[0][0]); ++b)
        {
            arr[a][b] = 0; // vs. arr[b][a] = 0;
        }
    }

    return 0;
}

Analysis

The array will be stored in memory like this:

[row_0, col_0] ... [row_0, col_999] ... [row_999, col_0] ... [row_999, col_999]

For better performance, the CPU will typically cache a specific amount of bytes after the accessed one. For better understanding, let’s assume it will always cache exactly one row of that array.

Cache friendly approach
With ‘arr[a][b]’ it will iterate over the columns, row by row. Because the current row is in the cache, it only has to reload after all cells of that row have been accessed.

Cache unfriendly approach
With ‘arr[b][a]’ it will iterate over the rows, column by column. Because only the current row is in the cache, it has to reload a whole new row after every cell access.