A blog for coding and discussion of coding: JavaScript, PHP, CGI, general web building, etc.

Saturday, December 19, 2015

Is there any case where you would prefer a higher big-O time complexity algorithm over the lower one?

Are there any cases where you would prefer O(log n) time complexity to O(1) time complexity? Or O(n) to O(log n)?

Do you have any examples?

Answer by Alistra for Is there any case where you would prefer a higher big-O time complexity algorithm over the lower one?


There is always the hidden constant, which can be lower for the O(log n) algorithm. So it can be faster in practice for real-life data.

There are also space concerns (e.g. running on a toaster).

There's also the developer-time concern: the O(log n) algorithm may be 1000 times easier to implement and verify.
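To make the hidden-constant point concrete, here is a minimal Python sketch (illustrative only; the sizes and repetition counts are arbitrary) comparing an O(n) linear scan, an O(log n) binary search, and an O(1) expected hash lookup on a tiny data set. For small enough n, the "worse" options are often competitive because their per-step cost is so low.

    # Illustrative sketch: for tiny inputs, the hidden constants (hashing,
    # function-call overhead) can matter more than the asymptotic class.
    import bisect
    import timeit

    small = list(range(8))        # a tiny data set
    small_set = set(small)

    def linear(x):                # O(n): scan the list
        return x in small

    def hashed(x):                # O(1) expected: hash lookup
        return x in small_set

    def binary(x):                # O(log n): binary search on the sorted list
        i = bisect.bisect_left(small, x)
        return i < len(small) and small[i] == x

    for fn in (linear, hashed, binary):
        t = timeit.timeit(lambda: fn(5), number=1_000_000)
        print(f"{fn.__name__:>6}: {t:.3f}s for 1e6 lookups")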

Answer by Kevin K for Is there any case where you would prefer a higher big-O time complexity algorithm over the lower one?


In contexts where data security is a concern, a more complex algorithm may be preferable to a less complex algorithm if the more complex algorithm has better resistance to timing attacks.
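For example (a hedged Python sketch, not taken from any particular library's internals): a naive equality check returns as soon as it finds a mismatching byte, so its running time leaks how much of a secret an attacker has guessed correctly, while hmac.compare_digest examines every byte regardless.

    # Sketch: the naive comparison's timing depends on where the first mismatch
    # occurs; hmac.compare_digest is designed to avoid that timing leak.
    import hmac

    SECRET = b"s3cr3t-api-token-0123456789abcdef"   # made-up secret

    def naive_equal(a: bytes, b: bytes) -> bool:
        if len(a) != len(b):
            return False
        for x, y in zip(a, b):    # returns as soon as a byte differs
            if x != y:
                return False
        return True

    def constant_time_equal(a: bytes, b: bytes) -> bool:
        return hmac.compare_digest(a, b)

    guess = b"s3cr3t-" + b"X" * (len(SECRET) - 7)
    print(naive_equal(SECRET, guess), constant_time_equal(SECRET, guess))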

Answer by NoseKnowsAll for Is there any case where you would prefer a higher big-O time complexity algorithm over the lower one?


I'm surprised nobody has mentioned memory-bound applications yet.

There may be an algorithm that performs fewer floating point operations, either due to its complexity (i.e. O(1) < O(log n)) or because the constant in front of the complexity is smaller (i.e. 2n² < 6n²). Regardless, you might still prefer the algorithm with more FLOPs if the lower-FLOP algorithm is more memory-bound.

What I mean by "memory-bound" is that you are often accessing data that is constantly out of cache. To fetch this data, you have to pull it from your actual memory space into your cache before you can perform your operation on it. This fetching step is often quite slow, much slower than the operation itself.

Therefore, if your algorithm requires more operations (yet those operations are performed on data that is already in cache, so no fetching is required), it will still outperform an algorithm with fewer operations (which must be performed on out-of-cache data, and therefore requires fetches) in terms of actual wall time.
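A rough way to see this from Python (assuming NumPy is available; the sizes are arbitrary): the two sums below perform the same number of floating point additions, but the second one gathers the data through a random permutation of indices, which defeats the cache and typically takes several times longer.

    # Sketch: identical FLOPs, very different wall time, because the second
    # pass is dominated by cache-unfriendly memory access.
    import time
    import numpy as np

    n = 10_000_000
    data = np.random.rand(n)
    perm = np.random.permutation(n)      # random access pattern

    t0 = time.perf_counter()
    sequential = data.sum()              # streams through contiguous memory
    t1 = time.perf_counter()
    gathered = data[perm].sum()          # same additions, scattered reads
    t2 = time.perf_counter()

    print(f"sequential: {t1 - t0:.3f}s   gathered: {t2 - t1:.3f}s")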

Answer by Warren Weckesser for Is there any case where you would prefer a higher big-O time complexity algorithm over the lower one?


My answer to "Fast random weighted selection across all rows of a stochastic matrix" is an example where an algorithm with complexity O(m) is faster than one with complexity O(log(m)) when m is not too big.
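The idea, sketched in Python rather than the linked answer's actual code (the weights and sizes here are made up): drawing an index with probability proportional to its weight can be done with an O(m) scan or an O(log m) binary search over cumulative sums, and for small m the simple scan often wins.

    # Sketch: two ways to sample an index i with probability w[i] / sum(w).
    import bisect
    import itertools
    import random

    def pick_linear(weights, total):
        r = random.random() * total
        for i, w in enumerate(weights):          # O(m) walk
            r -= w
            if r < 0:
                return i
        return len(weights) - 1

    def pick_bisect(cumulative):
        r = random.random() * cumulative[-1]
        return bisect.bisect_right(cumulative, r)    # O(log m)

    weights = [random.random() for _ in range(16)]
    cumulative = list(itertools.accumulate(weights))
    print(pick_linear(weights, sum(weights)), pick_bisect(cumulative))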

Answer by jpmc26 for Is there any case where you would prefer a higher big-O time complexity algorithm over the lower one?


Consider a red-black tree. It has access, search, insert, and delete of O(log n). Compare to an array, which has access of O(1) and the rest of the operations are O(n).

So given an application where we insert, delete, or search more often than we access and a choice between only these two structures, we would prefer the red-black tree. In this case, you might say we prefer the red-black tree's more cumbersome O(log n) access time.

Why? Because the access is not our overriding concern. We are making a trade off: the performance of our application is more heavily influenced by factors other than this one. We allow this particular algorithm to suffer performance because we make large gains by optimizing other algorithms.

So the answer to your question is simply this: we accept the higher growth rate when it isn't what we want to optimize, because we want to optimize something else. All of the other answers are special cases of this. Sometimes we optimize the run time of other operations. Sometimes we optimize for memory. Sometimes we optimize for security. Sometimes we optimize for maintainability. Sometimes we optimize for development time. Even the hidden constant being low enough to matter is optimizing for run time when you know the growth rate of the algorithm isn't the greatest impact on run time. (If your data set were outside that range, you would optimize for the growth rate of the algorithm, because it would eventually dominate the constant.) Everything has a cost, and in many cases we trade the cost of a higher growth rate in one algorithm to optimize something else.

Answer by Loren Pechtel for Is there any case where you would prefer a higher big-O time complexity algorithm over the lower one?


Alistra nailed it but failed to provide any examples, so I will.

You have a list of 10,000 UPC codes for what your store sells: a 10-digit UPC, an integer for the price (in pennies), and 30 characters of description for the receipt.

O(log N) approach: You have a sorted list. 44 bytes per record if ASCII, 84 if Unicode. Alternatively, treat the UPC as an int64 and you get 42 and 72 bytes. 10,000 records: even in the larger case you're looking at a bit under a megabyte of storage.

O(1) approach: Don't store the UPC; instead, use it as an index into the array. Even in the smallest layout you're looking at almost a third of a terabyte of storage.

Which approach you use depends on your hardware. On just about any reasonable modern configuration you're going to use the log N approach. I can picture the second approach being the right answer if for some reason you're running in an environment where RAM is critically short but you have plenty of mass storage. A third of a terabyte on disk is no big deal, and getting your data in one probe of the disk is worth something. The simple binary approach takes 13 probes on average. (Note, however, that by clustering your keys you can get this down to a guaranteed 3 reads, and in practice you would cache the first one.)
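A minimal Python sketch of the O(log N) layout described above (the records and values are illustrative, not real data): keep the records sorted by UPC and binary-search the key array. The O(1) alternative would index directly by UPC, which is what forces the ~10^10-slot table and the third-of-a-terabyte footprint.

    # Sketch of the sorted-list / binary-search approach.
    import bisect
    from typing import NamedTuple, Optional

    class Record(NamedTuple):
        upc: int          # 10-digit UPC stored as an integer
        price: int        # price in pennies
        description: str  # up to 30 characters for the receipt

    records = sorted(
        [Record(4012345678, 349, "SPROCKET"), Record(1234567890, 199, "WIDGET")],
        key=lambda r: r.upc,
    )
    upcs = [r.upc for r in records]           # parallel sorted key array

    def lookup(upc: int) -> Optional[Record]:
        i = bisect.bisect_left(upcs, upc)     # about log2(10,000) ≈ 13 probes at full size
        return records[i] if i < len(records) and upcs[i] == upc else None

    print(lookup(1234567890))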

Answer by Simulant for Is there any case where you would prefer a higher big-O time complexity algorithm over the lower one?


The possibility of executing an algorithm in parallel.

I don't know if there is an example for the classes O(log n) and O(1), but for some problems, you choose an algorithm with a higher complexity class when the algorithm is easier to execute in parallel.

Some algorithms cannot be parallelized but have a very low complexity class. Consider another algorithm that achieves the same result and can be parallelized easily, but has a higher complexity class. When executed on one machine, the second algorithm is slower, but when executed on multiple machines, the real execution time gets lower and lower, while the first algorithm cannot be sped up.

Answer by Yakk for Is there any case where you would prefer a higher big-O time complexity algorithm over the lower one?


Yes.

In a real case, we ran some tests on doing table lookups with both short and long string keys.

We used a std::map; a std::unordered_map with a hash that samples at most 10 times over the length of the string (our keys tend to be guid-like, so this is decent); a std::unordered_map with a hash that samples every character (in theory, fewer collisions); an unsorted vector where we do an == compare; and (if I remember correctly) an unsorted vector where we also store a hash, first compare the hash, then compare the characters.

These algorithms range from O(1) (unordered_map) to O(n) (linear search).

For modest sized N, quite often the O(n) beat the O(1). We suspect this is because the node-based containers required our computer to jump around in memory more, while the linear-based containers did not.

O(lg n) exists between the two. I don't remember how it did.

The performance difference wasn't that large, and on larger data sets the hash-based one performed much better. So we stuck with the hash-based unordered map.
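A rough Python analogue of that experiment (not the original C++ benchmark; the key count and repetition count are arbitrary): look up guid-like string keys in a hash map, a sorted list with binary search, and an unsorted list with linear search, and see which wins for a modest n on your machine.

    # Sketch: dict (O(1) expected) vs bisect (O(lg n)) vs linear scan (O(n)).
    import bisect
    import timeit
    import uuid

    n = 64
    keys = [str(uuid.uuid4()) for _ in range(n)]
    needle = keys[n // 2]

    hash_map = {k: None for k in keys}
    sorted_keys = sorted(keys)

    lookups = {
        "dict   (O(1) expected)": lambda: needle in hash_map,
        "bisect (O(lg n))      ": lambda: sorted_keys[bisect.bisect_left(sorted_keys, needle)] == needle,
        "linear (O(n))         ": lambda: needle in keys,
    }
    for name, fn in lookups.items():
        print(name, f"{timeit.timeit(fn, number=200_000):.3f}s")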

In practice, for reasonably sized n, O(lg n) is O(1). If your computer only has room for 4 billion entries in your table, then lg n is bounded above by 32 (lg(2^32) = 32; in computer science, lg is shorthand for log base 2).

In practice, lg(n) algorithms are slower than O(1) algorithms not because of the logarithmic growth factor, but because the lg(n) portion usually means there is a certain level of complexity to the algorithm, and that complexity adds a larger constant factor than any of the "growth" from the lg(n) term.

However, complex O(1) algorithms (like hash mapping) can easily have a similar or larger constant factor.

Answer by John Coleman for Is there any case where you would prefer a higher big-O time complexity algorithm over the lower one?


A more general question is whether there are situations where one would prefer an O(f(n)) algorithm to an O(g(n)) algorithm even though g(n) << f(n) as n tends to infinity. As others have already mentioned, the answer is clearly "yes" in the case where f(n) = log(n) and g(n) = 1. It is sometimes yes even in the case where f(n) is exponential but g(n) is polynomial. A famous and important example is the Simplex Algorithm for solving linear programming problems. In the 1970s it was shown to be O(2^n) in the worst case; thus, its worst-case behavior is infeasible. But its average-case behavior is extremely good, even for practical problems with tens of thousands of variables and constraints. In the 1980s, polynomial-time algorithms for linear programming (such as Karmarkar's interior-point algorithm) were discovered, but 30 years later the simplex algorithm still seems to be the algorithm of choice (except for certain very large problems). This is for the obvious reason that average-case behavior is often more important than worst-case behavior, but also for a more subtle reason: the simplex algorithm is in some sense more informative (e.g. sensitivity information is easier to extract).

Answer by Philipp Claßen for Is there any case where you would prefer a higher big-O time complexity algorithm over the lower one?


Let's say you're implementing a blacklist on an embedded system, where numbers between 0 and 1,000,000 might be blacklisted. That leaves you two possible options:

  1. Use a bitset of 1,000,000 bits
  2. Use a sorted array of the blacklisted integers and use a binary search to access them

Access to the bitset is guaranteed constant time. In terms of time complexity, it is optimal, both from a theoretical and from a practical point of view (it is O(1) with an extremely low constant overhead).

Still, you might want to prefer the second solution, especially if you expect the number of blacklisted integers to be very small, as it is more memory-efficient.

And even if you do not develop for an embedded system where memory is scarce, I can just increase the arbitrary limit of 1,000,000 to 1,000,000,000,000 and make the same argument. Then the bitset would require about 125 GB of memory. Having a guaranteed worst-case complexity of O(1) might not convince your boss to provide you with such a powerful server.

Here, I would strongly prefer a binary search (O(log n)) or binary tree (O(log n)) over the O(1) bitset. And probably, a hash table with its worst-case complexity of O(n) will beat all of them in practice.
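A small Python sketch of the two options (the range and the sample blacklist are made up): the bitset costs memory proportional to the whole key range, while the sorted array costs memory proportional only to the number of blacklisted values.

    # Option 1: bitset -- O(1) lookups, ~125 KB for a 1,000,000 range
    # (and ~125 GB if the range grew to 10^12).
    # Option 2: sorted array + binary search -- O(log n) lookups, memory
    # proportional to the number of blacklisted values only.
    import bisect

    RANGE = 1_000_000
    blacklisted = [3, 17, 999_999]            # a very sparse blacklist

    bits = bytearray(RANGE // 8 + 1)
    for v in blacklisted:
        bits[v >> 3] |= 1 << (v & 7)

    def in_bitset(v: int) -> bool:
        return bool(bits[v >> 3] & (1 << (v & 7)))

    sorted_black = sorted(blacklisted)

    def in_sorted(v: int) -> bool:
        i = bisect.bisect_left(sorted_black, v)
        return i < len(sorted_black) and sorted_black[i] == v

    print(in_bitset(17), in_sorted(17), in_bitset(18), in_sorted(18))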

Answer by Solomonoff's Secret for Is there any case where you would prefer a higher big-O time complexity algorithm over the lower one?


There is a good use case for using an O(log(n)) algorithm instead of an O(1) algorithm that the numerous other answers have ignored: immutability. Hash maps have O(1) puts and gets, assuming good distribution of hash values, but they require mutable state. Immutable tree maps have O(log(n)) puts and gets, which is asymptotically slower. However, immutability can be valuable enough to make up for the worse performance, and in the case where multiple versions of the map need to be retained, immutability lets you avoid copying the map, which is O(n), and can therefore improve performance.
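A minimal sketch of the idea in Python (not any particular library; the tree is left unbalanced for brevity, whereas a real immutable tree map would be balanced to guarantee O(log n)): inserting returns a new root that shares most of its nodes with the old version, so keeping old versions costs far less than an O(n) copy.

    # Persistent (immutable) binary search tree: old versions remain valid.
    from typing import NamedTuple, Optional

    class Node(NamedTuple):
        key: int
        value: str
        left: Optional["Node"]
        right: Optional["Node"]

    def insert(node: Optional[Node], key: int, value: str) -> Node:
        if node is None:
            return Node(key, value, None, None)
        if key < node.key:
            return node._replace(left=insert(node.left, key, value))
        if key > node.key:
            return node._replace(right=insert(node.right, key, value))
        return node._replace(value=value)

    def get(node: Optional[Node], key: int) -> Optional[str]:
        while node is not None:
            if key < node.key:
                node = node.left
            elif key > node.key:
                node = node.right
            else:
                return node.value
        return None

    v1 = insert(insert(None, 2, "two"), 1, "one")
    v2 = insert(v1, 3, "three")      # v1 is untouched and still usable
    print(get(v1, 3), get(v2, 3))    # None three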

Answer by Mehrdad for Is there any case where you would prefer a higher big-O time complexity algorithm over the lower one?


People have already answered your exact question, so I'll tackle a slightly different question that people may actually be thinking of when coming here.

A lot of the "O(1) time" algorithms and data structures actually only take expected O(1) time, meaning that their average running time is O(1), possibly only under certain assumptions.

Common examples: hashtables, expansion of "array lists" (a.k.a. dynamically sized arrays/vectors).

In such scenarios, you may prefer to use data structures or algorithms whose time is guaranteed to be absolutely bounded logarithmically, even though they may perform worse on average.
An example might therefore be a balanced binary search tree, whose running time is worse on average but better in the worst case.
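The "array list" case is easy to observe from Python (a small sketch; the exact growth pattern is a CPython implementation detail): appends are amortized O(1), but every so often a single append pays for a reallocation, which is exactly the kind of occasional worst case a guaranteed-bound structure avoids.

    # Sketch: watch the list's underlying storage grow in jumps.
    import sys

    xs = []
    last_size = sys.getsizeof(xs)
    for i in range(100):
        xs.append(i)
        size = sys.getsizeof(xs)
        if size != last_size:
            print(f"append #{i + 1}: storage grew to {size} bytes")
            last_size = size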

Answer by Salvador Dali for Is there any case where you would prefer a higher big-O time complexity algorithm over the lower one?


There can be many reasons to prefer an algorithm with higher big O time complexity over the lower one:

  • most of the time, lower big-O complexity is harder to achieve: it requires a skilled implementation, a lot of knowledge, and a lot of testing.
  • big-O hides the details of the constant: an algorithm that runs in 10^5 steps is better from a big-O point of view than one that runs in 1/10^5 * log(n) steps (O(1) vs O(log(n))), but for any reasonable n the O(log(n)) one will perform better. For example, the best known complexity for matrix multiplication is O(n^2.373), but the constant is so high that (to my knowledge) no computational libraries use it.
  • big-O makes sense when you compute over something big. If you need to sort an array of three numbers, it matters very little whether you use an O(n*log(n)) or an O(n^2) algorithm.
  • time complexity is not everything. Imagine an algorithm that runs in O(n^2) time and requires O(n^2) memory. It might be preferable to one with O(n^3) time and O(1) space when n is not really big, but once n grows you can always wait longer, whereas you are unlikely to find RAM big enough for the memory-hungry algorithm.
  • parallelization is a good feature in our distributed world. There are algorithms that are easily parallelizable, and there are some that do not parallelize at all. Sometimes it makes sense to run a higher-complexity algorithm on 1000 commodity machines rather than a slightly better one on a single machine.
  • in some places (security), higher complexity can be a requirement. No one wants a hash algorithm that hashes blazingly fast, because then other people can brute-force your hashes much faster.
  • somebody patented the lower-complexity algorithm, and it is more economical for the company to use the higher complexity than to pay for a license.

Answer by Reek for Is there any case where you would prefer a higher big-O time complexity algorithm over the lower one?


To put my 2 cents in:

Sometimes a worse-complexity algorithm is selected in place of a better one when the algorithm runs in a certain hardware environment. Suppose our O(1) algorithm non-sequentially accesses every element of a very big, fixed-size array to solve our problem. Then put that array on a mechanical hard drive, or a magnetic tape.

In that case, the O(log n) algorithm (supposing it accesses the disk sequentially) becomes more favourable.

Answer by EJP for Is there any case where you would prefer a higher big-O time complexity algorithm over the lower one?


In a realtime situation where you need a firm upper bound, you would select, for example, heapsort as opposed to quicksort, because heapsort's average behaviour is also its worst-case behaviour.
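A small Python sketch of why (the comparison counting is approximate and the input sizes are arbitrary): a naive quicksort that always picks the first element as the pivot degrades to O(n^2) on already-sorted input, while a heapsort built on heapq does the same O(n log n) work regardless of input order.

    import heapq
    import random

    def heapsort(items):
        heap = list(items)
        heapq.heapify(heap)                                       # O(n)
        return [heapq.heappop(heap) for _ in range(len(heap))]    # n pops, O(log n) each

    def quicksort_comparisons(items):
        # Naive first-pivot quicksort; returns (sorted list, partition-comparison count).
        count = 0
        def qs(xs):
            nonlocal count
            if len(xs) <= 1:
                return xs
            pivot, rest = xs[0], xs[1:]
            count += len(rest)                 # one partitioning decision per element
            left = [x for x in rest if x < pivot]
            right = [x for x in rest if x >= pivot]
            return qs(left) + [pivot] + qs(right)
        return qs(list(items)), count

    n = 500
    random_input = random.sample(range(n), n)
    sorted_input = list(range(n))
    print("quicksort, random input:", quicksort_comparisons(random_input)[1], "comparisons")
    print("quicksort, sorted input:", quicksort_comparisons(sorted_input)[1], "comparisons")
    print("heapsort correct:", heapsort(sorted_input) == sorted_input)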

Answer by Dmitry Rubanovich for Is there any case where you would prefer a higher big-O time complexity algorithm over the lower one?


At any point when n is bounded and the constant multiplier of the O(1) algorithm is higher than the bound on log(n). For example, storing values in a hashset is O(1), but may require an expensive computation of a hash function. If the data items can be trivially compared (with respect to some order) and the bound on n is such that log n is significantly less than the cost of the hash computation on any one item, then storing in a balanced binary tree may be faster than storing in a hashset.
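A hedged Python sketch of that situation (the key shapes and counts are invented, and SHA-256 merely stands in for "an expensive hash"): when each hash computation is costly and n is small, the roughly log2(n) cheap comparisons of a binary search can cost less than hashing the key even once.

    import bisect
    import hashlib
    import timeit

    keys = sorted(("customer-%06d" % i) * 40 for i in range(1000))   # long-ish keys
    by_hash = {hashlib.sha256(k.encode()).hexdigest(): k for k in keys}
    needle = keys[123]

    def via_expensive_hash():
        return by_hash[hashlib.sha256(needle.encode()).hexdigest()]

    def via_binary_search():
        i = bisect.bisect_left(keys, needle)     # ~log2(1000) ≈ 10 comparisons
        return keys[i]

    print("hash  :", f"{timeit.timeit(via_expensive_hash, number=100_000):.3f}s")
    print("bisect:", f"{timeit.timeit(via_binary_search, number=100_000):.3f}s")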

Answer by Compynerd255 for Is there any case where you would prefer a higher big-O time complexity algorithm over the lower one?


Simply: because the coefficient (the costs associated with setup, storage, and the execution time of each step) can be much, much larger for an algorithm with a smaller big-O than for one with a larger big-O. Big-O is only a measure of the algorithm's scalability.

Consider the following example from the Hacker's Dictionary, proposing a sorting algorithm relying on the Many-Worlds Interpretation of Quantum Mechanics:

  1. Permute the array randomly using a quantum process.
  2. If the array is not sorted, destroy the universe.
  3. All remaining universes are now sorted [including the one you are in].

(Source: http://catb.org/~esr/jargon/html/B/bogo-sort.html)

Notice that the big-O of this algorithm is O(n), which beats any known sorting algorithm to date on generic items.

Answer by Joel Coehoorn for Is there any case where you would prefer a higher big-O time complexity algorithm over the lower one?


  1. When the "1" work unit in O(1) is very high relative to the work unit in O(log n) and the expected set size is small-ish. For example, it's probably slower to compute Dictionary hash codes than iterate an array if there are only two or three items.

or

  2. When the memory or other non-time resource requirements of the O(1) algorithm are exceptionally large relative to those of the O(log n) algorithm.

Answer by Madusudanan for Is there any case where you would prefer a higher big-O time complexity algorithm over the lower one?


Adding to the already good answers: a practical example would be hash indexes vs. B-tree indexes in the Postgres database.

Hash indexes build a hash table to access the data on disk, while a B-tree index, as the name suggests, uses a B-tree data structure.

In big-O terms these are O(1) vs O(log N).

Hash indexes are presently discouraged in Postgres since, in real-life situations (particularly in database systems), achieving hashing without collisions is very hard (it can lead to O(N) worst-case complexity), and because of this it is even harder to make them crash-safe (via write-ahead logging, or WAL, in Postgres).

This tradeoff is made because O(log N) is good enough for indexes, implementing O(1) well is pretty hard, and the time difference would not really matter.

Answer by HoboBen for Is there any case where you would prefer a higher big-O time complexity algorithm over the lower one?


When n is small, and O(1) is constantly slow.
