
I have a basic understanding of Big-O notation from its definition.

In my problem, a 2-D surface is divided into M uniform grids. Each grid m is assigned a posterior probability based on A features.

The posterior probability of grid m is calculated as follows:

[equation image: posterior probability of grid m]
and the marginal likelihood is given as:

[equation image: marginal likelihood]

Here, the A features are independent of each other, and the sigma and mean symbols denote the standard deviation and mean of each feature a at each grid. I need to calculate the posterior probability of all M grids.

What will be the time complexity of the above operation in terms of Big-O notation?

My guess is O(M) or O(M+A). Am I correct? I need an authoritative answer to present in a formal forum.

Also, what will be the time complexity if the M grids are divided into T clusters, where every cluster has Q grids (Q << M), and the posterior probability is calculated only on the Q grids of one cluster?

Thank you very much.

santobedi

1 Answer


The discrete sum ($\sum$) and product ($\prod$) in your formula can be understood as loops. If you are happy with a floating-point approximation, most other operators are typically O(1), and the conditional probability looks like a function call. Just map the constants and variables of your equation onto loop bounds and you'll get the expected Big-O; the details of the formula are irrelevant. Also be aware that these "loops" can often be simplified using mathematical properties.

If the result is not obvious, convert your mathematical formula into actual code in a programming language. In computer science, Big-O is never about a formula itself but about a concrete translation of it into programming steps; depending on the implementation, the same formula can lead to very different complexities. As different as adding integers by actually performing the sum, O(n), versus applying Gauss's formula, O(1), for instance.
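As a sketch of that translation (assuming the per-feature likelihood is a Gaussian density, which is an O(1) evaluation; the names `x`, `mu`, `sigma`, and `prior` are hypothetical inputs, not from the question), the posterior computation becomes two nested loops, one over the M grids and one over the A features, plus one O(M) normalization pass, for O(M·A) overall:

```python
import math

def posteriors(x, mu, sigma, prior):
    # x[a]: observed value of feature a
    # mu[m][a], sigma[m][a]: per-grid Gaussian parameters of feature a
    # prior[m]: prior probability of grid m
    M, A = len(mu), len(x)
    unnorm = []
    for m in range(M):                      # outer loop: M grids
        likelihood = 1.0
        for a in range(A):                  # inner loop: A independent features
            s = sigma[m][a]
            likelihood *= (math.exp(-(x[a] - mu[m][a]) ** 2 / (2 * s * s))
                           / (s * math.sqrt(2 * math.pi)))
        unnorm.append(prior[m] * likelihood)
    Z = sum(unnorm)                         # marginal likelihood: one extra O(M) pass
    return [p / Z for p in unnorm]          # total work: O(M*A)
```

Counting the loop iterations directly gives M·A likelihood evaluations and M normalization divisions, hence O(M·A).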

By the way, why are you doing a discrete sum over a domain of size N? Shouldn't it be M?

kriss
  • A question related to my problem is https://math.stackexchange.com/questions/2378932/naive-bayes-classifier-big-o-complexity. Is the time complexity O(MA) in my case? N is M; N was only used for presentation. – santobedi Nov 27 '18 at 02:37
  • If the time complexity is _O(MA)_, and _M_ is divided into _T_ clusters where every cluster has _Q_ grids, then will the time complexity be _O((T+Q)A)_? – santobedi Nov 27 '18 at 02:46
  • Superficially, yes, it looks like the time complexity is O(MA); and no, if you divide M into T clusters where every cluster has Q grids, the time complexity won't change: it will still be O(MA), just written as O(TQA). Dividing into subproblems doesn't help by itself; it helps if you avoid repeating part of the computation (for instance, if several grids are identical). You should really try to translate your equation into code; it's not hard. In its current state it's unclear what is input and what is output. The choice of data structure also matters for complexity (it could be worse than expected). – kriss Nov 27 '18 at 10:28
  • After clustering, one of the cluster heads is selected from _T_ and the algorithm is run only on _Q_ grids. The time complexity of the _T_ determining algorithm is _O(T)_. In this situation, what will be the final computational complexity? _O(T)_ + _O(QA)_ or _O((T+Q)A)_? – santobedi Nov 27 '18 at 10:48
  • I don't understand what you describe. If you apply your algorithm only on one cluster head from T it will be O(QA). The way you choose the cluster from T is irrelevant (except if it is very slow, then it will become the main part of complexity). Now if you are doing that on every cluster from T one by one, in the end you get O(TQA). (apply formula for each cluster, for each grid from the cluster, for each feature). You can only reduce complexity if you find a way to reduce computing (In most CS examples data access is the limiting factor, seems not to be the case here). – kriss Nov 27 '18 at 12:34
  • Another point: if your algorithm runs in parallel (a parallel implementation), then the time complexity can be divided by the number of parallel processes, possibly with a correction if the parallel results have to be reconciled afterward. Parallelism is also an implementation detail of the algorithm. – kriss Nov 28 '18 at 10:42
  • My algorithm needs to select a grid among _M_ grids based on the posterior probability of the grids. I'm trying to reduce the computational cost of the algorithm by reducing the search space (the number of grids). For this, I divided all _M_ grids into _T_ clusters (each cluster has _Q_ grids). The algorithm first selects one of the _T_ clusters, and the final grid is then selected from the _Q_ grids inside that cluster: the search space is reduced from _M_ to _Q_. Does _O(MA)_ reduce to _O(QA)_ in this case? – santobedi Nov 29 '18 at 01:27
  • Maybe somewhat clearer, but you still hold the answer. How many posterior probabilities do you have to compute? It will reduce to O(QA) only if you just have to compute Q posterior probabilities. What you are describing raises questions: how do you create the clusters (what is the cost of creating them)? What is the cost of choosing a grid inside a cluster? Complexity in big-O terms is something trivial; it merely accounts for the amount of computing you need. If you can avoid computing some items you'll reduce complexity, otherwise not. – kriss Nov 29 '18 at 02:37
  • _Q_ posterior probabilities need to be computed. Creating the clusters is a separate part of the algorithm which I don't have to consider for the computational cost. The cost of selecting a cluster among _T_ clusters is _O(T)_. The grid with the largest posterior probability is chosen among the _Q_ grids. Clustering avoids computing the posterior probability of _M-Q_ grids. In this scenario, I want to claim that clustering helps reduce the computational cost. Is my approach correct? – santobedi Nov 29 '18 at 02:53
  • 1
    If you are only computing posterior probability of Q grids instead of M grids, the complexity will be O(QA) no mystery. You will certainly have to justify why your clustering allows to do that and that the result is still correct, and that the cost of clustering by itself does not raise time complexity, but in terms of time complexity it looks correct. – kriss Nov 29 '18 at 13:52
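The cluster-restricted search discussed in these comments can be sketched as follows (a minimal illustration, not the asker's actual algorithm: `score_cluster` stands in for the hypothetical O(1)-per-cluster selection criterion, and `posterior` for the O(A) per-grid posterior evaluation, so the total cost is O(T) for the scan over clusters plus O(Q·A) for the chosen cluster):

```python
def search_with_clusters(clusters, score_cluster, posterior):
    # clusters: list of T clusters, each a list of Q grid identifiers.
    # score_cluster: O(1) score used to pick a cluster -> O(T) selection pass.
    # posterior: O(A) posterior evaluation of a single grid.
    best_cluster = max(clusters, key=score_cluster)   # O(T): pick one cluster
    return max(best_cluster, key=posterior)           # O(Q*A): only Q grids evaluated
```

Since only Q of the M grids ever reach the posterior computation, the posterior step alone is O(QA), matching the comment above; whether the cluster-selection O(T) term dominates depends on the relative sizes of T and QA.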