When you know the size of the data you need to copy, use the std::copy function: in some cases it is even faster than memcpy.
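A minimal sketch of the options for a block of known size. The helper names `copy_with_std_copy`, `copy_with_memcpy` and `copy_fixed` are hypothetical. For trivially copyable types such as `double`, standard library implementations commonly forward `std::copy` to a `memmove`-style builtin, so the two often perform alike. Any speed difference should be measured in an optimized build rather than assumed.

```cpp
#include <algorithm>
#include <cassert>  // assert, used when checking the results
#include <cstddef>
#include <cstring>

// Hypothetical helper: copy n doubles with the standard algorithm.
inline void copy_with_std_copy(const double* src, double* dst, std::size_t n) {
    std::copy(src, src + n, dst);
}

// Hypothetical helper: the same copy done with memcpy (byte count, not element count).
inline void copy_with_memcpy(const double* src, double* dst, std::size_t n) {
    std::memcpy(dst, src, n * sizeof(double));
}

// When the size is a compile-time constant, the compiler sees the exact
// length and may unroll or vectorize the copy.
template <std::size_t N>
inline void copy_fixed(const double (&src)[N], double (&dst)[N]) {
    std::copy_n(src, N, dst);
}
```

Usage: `double a[4] = {1, 2, 3, 4}, b[4]; copy_fixed(a, b);` copies all four elements without passing the length explicitly.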
I’ve reviewed some code for matrix manipulation and saw that it was written with raw C arrays. I changed it to use std::vector and got a big slowdown in matrix copying. After that I wrote a test program and saw that the std::vector::assign method is very inefficient for copying a whole container, and that std::vector::operator= is much slower than a simple memcpy(). Maybe I am doing something wrong? I always thought that I could use std::vector instead of a C array in C++, but now I see that I have to pay for that.
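A sketch of three ways to copy a matrix held in a std::vector. It assumes a row-major matrix stored in one contiguous `std::vector<double>`. The `Matrix` type and the `copy_*` helpers are hypothetical, not taken from the code under review. Because `std::vector` storage is contiguous, it can be copied exactly like a C array. Two common causes of a large gap are a fresh allocation on every copy and a build without optimizations, where some toolchains also enable checked iterators. These are plausible explanations, not a confirmed diagnosis of this case.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical row-major matrix backed by std::vector instead of a raw C array.
struct Matrix {
    std::size_t rows = 0, cols = 0;
    std::vector<double> data;

    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c) {}
};

// 1) Copy assignment: typically reuses dst's buffer when its capacity is
//    large enough, otherwise allocates a new one.
inline void copy_assign(const Matrix& src, Matrix& dst) {
    dst = src;
}

// 2) std::copy into an already-sized destination: never allocates.
inline void copy_elements(const Matrix& src, Matrix& dst) {
    assert(dst.data.size() == src.data.size());
    std::copy(src.data.begin(), src.data.end(), dst.data.begin());
}

// 3) memcpy over the contiguous storage, exactly as with a raw C array.
//    The empty check avoids passing a possibly null data() pointer to memcpy.
inline void copy_memcpy(const Matrix& src, Matrix& dst) {
    assert(dst.data.size() == src.data.size());
    if (!src.data.empty())
        std::memcpy(dst.data.data(), src.data.data(),
                    src.data.size() * sizeof(double));
}
```

When timing these, allocate the destination once outside the loop and compile with optimizations enabled. Otherwise the comparison may measure allocation or debug checks rather than the copy itself.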
I wrote a sample program to play with SSE optimizations and received some strange results: