Fastest implementation of sine, cosine and square root in C++ (doesn't need to be much accurate)
Fastest implementation of sine, cosine and square root in C++ (doesn't need to be much accurate)
I am googling the question for past hour, but there are only points to Taylor Series or some sample code that is either too slow or does not compile at all. Well, most answer I've found over Google is "Google it, it's already asked", but sadly it's not...
I am profiling my game on low-end Pentium 4 and found out that ~85% of execution time is wasted on calculating sinus, cosinus and square root (from standard C++ library in Visual Studio), and this seems to be heavily CPU dependent (on my I7 the same functions got only 5% of execution time, and the game is waaaaaaaaaay faster). I cannot optimize this three functions out, nor calculate both sine and cosine in one pass (there interdependent), but I don't need too accurate results for my simulation, so I can live with faster approximation.
So, the question: What are the fastest way to calculate sine, cosine and square root for float in C++?
EDIT Lookup table are more painful as resulting Cache Miss is way more costly on modern CPU than Taylor Series. The CPUs are just so fast these days, and cache is not.
I made a mistake, I though that I need to calculate several factorials for Taylor Series, and I see now they can be implemented as constants.
So the updated question: is there any speedy optimization for square root as well?
EDIT2
I am using square root to calculate distance, not normalization - can't use fast inverse square root algorithm (as pointed in comment: http://en.wikipedia.org/wiki/Fast_inverse_square_root
EDIT3
I also cannot operate on squared distances, I need exact distance for calculations
Answer by KeatsKelleher for Fastest implementation of sine, cosine and square root in C++ (doesn't need to be much accurate)
The fastest way is to pre-compute values an use a table like in this example:
create sine lookup table in c++
BUT if you insist upon computing at runtime you can use the Taylor series expansion of sine or cosine...
For more on the Taylor series... http://en.wikipedia.org/wiki/Taylor_series
One of the keys to getting this to work well is pre-computing the factorials and truncating at a sensible number of terms. The factorials grow in the denominator very quickly, so you don't need to carry more than a few terms.
Also...Don't multiply your x^n from the start each time...e.g. multiply x^3 by x another two times, then that by another two to compute the exponents.
Answer by BigTailWolf for Fastest implementation of sine, cosine and square root in C++ (doesn't need to be much accurate)
For square root, there is an approach called bit shift.
A float number defined by IEEE-754 is using some certain bit represent describe times of multiple based 2. Some bits are for represent the base value.
float squareRoot(float x) { unsigned int i = *(unsigned int*) &x; // adjust bias i += 127 << 23; // approximation of square root i >>= 1; return *(float*) &i; }
That's a constant time calculating the squar root
Answer by Vinicius Miranda for Fastest implementation of sine, cosine and square root in C++ (doesn't need to be much accurate)
First, the Taylor series is NOT the best/fastest way to implement sine/cos. It is also not the way professional libraries implement these trigonometric functions, and knowing the best numerical implementation allows you to tweak the accuracy to get speed more efficiently. In addition, this problem has already been extensively discussed in StackOverflow. Here is just one example.
Second, the big difference you see between old/new PCS is due to the fact that modern Intel architecture has explicit assembly code to calculate elementar trigonometric functions. It is quite hard to beat them on execution speed.
Finally, let's talk about the code on your old PC. Check gsl gnu scientific library (or numerical recipes) implementation, and you will see that they basically use Chebyshev Approximation Formula.
Chebyshev approximation converges faster, so you need to evaluate fewer terms. I won't write implementation details here because there are already very nice answers posted on StackOverflow. Check this one for example . Just tweak the number of terms on this series to change the balance between accuracy/speed.
By the way: rule 0 for this kind of problem: if you want implementation details of some special function or numerical method, you should take a look on GSL code before any further action - GSL is THE STANDARD numerical library.
EDIT: you may improve the execution time by including aggressive floating point optimization flags in gcc/icc. This will decrease the precision, but it seems that is exactly what you want.
EDIT2: You can try to make a coarse sin grid and use gsl routine (gsl_interp_cspline_periodic for spline with periodic conditions) to spline that table (the spline will reduce the errors in comparison to a linear interpolation => you need less points on your table => less cache miss)!
Answer by Dhairya for Fastest implementation of sine, cosine and square root in C++ (doesn't need to be much accurate)
I use the following code to compute trigonometric functions in quadruple precision. The constant N determines the number of bits of precision required (for example N=26 will give single precision accuracy). Depending on desired accuracy, the precomputed storage can be small and will fit in the cache. It only requires addition and multiplication operations and is also very easy to vectorize.
The algorithm pre-computes sin and cos values for 0.5^i, i=1,...,N. Then, we can combine these precomputed values, to compute sin and cos for any angle up to a resolution of 0.5^N
template QuadReal_t sin(const QuadReal_t a){ const int N=128; static std::vector theta; static std::vector sinval; static std::vector cosval; if(theta.size()==0){ #pragma omp critical (QUAD_SIN) if(theta.size()==0){ theta.resize(N); sinval.resize(N); cosval.resize(N); QuadReal_t t=1.0; for(int i=0;i=0;i--){ sinval[i]=2.0*sinval[i+1]*cosval[i+1]; cosval[i]=sqrt(1.0-sinval[i]*sinval[i]); } } } QuadReal_t t=(a<0.0?-a:a); QuadReal_t sval=0.0; QuadReal_t cval=1.0; for(int i=0;i
Answer by milianw for Fastest implementation of sine, cosine and square root in C++ (doesn't need to be much accurate)
Based on the idea of http://forum.devmaster.net/t/fast-and-accurate-sine-cosine/9648 and some manual rewriting to improve the performance in a micro benchmark I ended up with the following cosine implementation which is used in a HPC physics simulation that is bottlenecked by repeated cos calls on a large number space. It's accurate enough and much faster than a lookup table, most notably no division is required.
template inline T cos(T x) noexcept { constexpr T tp = 1./(2.*M_PI); x *= tp; x -= T(.25) + std::floor(x + T(.25)); x *= T(16.) * (std::abs(x) - T(.5)); #if EXTRA_PRECISION x += T(.225) * x * (std::abs(x) - T(1.));
0 comments:
Post a Comment