A group of software engineers want to develop an algorithm for finding strongly connected groups among members of a social networking site. A group of people are considered to be strongly connected if each person knows each other person within the group.
Given M nodes and N edges develop an algorithm that determines the minimum size of the largest strongly connected group.
Constraints:
1 <= N <= 100000
0 <= M <= 10000
0 <= M <= N*(N-1)/2
Notes:
This question seems one of the popular OA questions . In this link below,
http://canyon289.github.io/MinStrongNetwork.html#click-to-start-search
you can find a complete description of this question, plus some examples and ideas (but it seems that the idea on this page is not a good one to solve this problem.)
Let us begin with some small cases:
when N = 1, it always is 1;
when N = 2, it is 1 when M <1, and it is 2 when M>=1;
when N = 3, it is 1 when M<1, and it is 2 when M>=1 and M<3, and it is 3 when M>=3;
when N = 4, it is 1 when M<1, and it is 2 when (which is the key part), ... and it is 4 when M>=6;
Now let us focus on the key part:
when N = 4, if we link the 4 nodes into a circle with 4 edges (M=4), then it is still 2. If we add another edge (M=5), then it becomes 3; then add another one (M=6), they are fully connected.
Thus,
when N = 4, it is 1 when M<1, and it is 2 when M>=1 and M<=4, and it is 3 when M=5, and it is 4 when M>= 6.
So it seems that one way to solve this problem is that we just need to form a circle first, and then add more edges with the constrain that do not increase the size of any strongly connected components (scc) until have to. And this is essentially the idea in that link cited above. But it seems have some problems. Firstly, is this idea correct?
Let us look at the case when N = 5.
If we follow the idea in the link, we form a circle with 5 nodes which gives us 5 edges. Now we cannot add one more edge since it will always make the minimum size of the largest scc (lscc) to become 3.
However, if we form a circle with 4 nodes first which gives 4 edges, and we have last node unconnected. In order to better illustrate the idea, let us label the node from 1 to 5, and 1 to 4 are connected forming a circle consecutively. So we then can connect the 5th node with 1 and 3 forming another 2 edges without increase the minimum size of the lscc. (or connect 2 and 4 nodes, which gives the same results.)
But now we can have 6 edges with 5 nodes with the minimum size of lscc of 2!!!
So the above greedy way does not work.
Now let's try something different:
The answer to this question apparently is in a range: the min = 1, and the max = N. And it is asking for the minimum size of the lscc, so it can naturally lead the very fundamental binary search idea.
Now let's set Left = 1, Right = N, then Mid = Left + (Right - Left)/2. If we cannot connect the N nodes in such a way that the minimum size of the lscc is not larger than Mid with M edges, or there are more edges than needed and we have to increase Mid in order to run up all the M edges, then current Mid is too small (or invalid), and we can update Left = Mid + 1; otherwise, we can update Right = Mid.
In this way, we can eventually find the correct value which is Left == Right.
So now the key question becomes: given N nodes and the lscc is Mid, what is the maximum number of edges that can be added onto? And we need the answer to this question in the above binary searching.
Again let's start with small cases:
when N = 4, and Mid = 2. We firstly fully connect 1 to 2, and 3 to 4, to form two scc with a size of Mid = 2. Now we have 2 edges. Next, we can connect 1 to 3, and 2 to 4. (or 1 to 4, and 2 to 3). So the maximum is 4 edges;
when N = 4, and Mid = 3. We can firstly fully connect 1, 2, 3 to form a scc with a size of 3, then connect 4 to 1, and 4 to 2. (or 4 to 1, and 4 to 3; or 4 to 2, and 4 to 3). So one of the key observation is that 4 cannot connect all the nodes in a scc, otherwise it will increase the size of scc by 1. So the maximum is 5 edges.
It seem still not enough to derive the result that can be applied to all the cases. Let us continue,
when N = 5, and Mid = 2. Follow the similar idea of when N = 4, we can firstly form 2 scc's by fully connect 1 to 2, and 3 to 4, respectively. Next, we can connect 1 to 3 and 2 to 4. The last step is new: we have a unconnected node 5. After some trails and errors, it is found that we can connect 5 to 1, and 5 to 3. (or 5 to 1, and 5 to 4, or 5 to 2, and 5 to 3, or 5 to 2, and 5 to 4. The key is the same, 5 cannot connect all the nodes in a scc). So the maximum is 6 edges.
It is time to make some summaries:
1): we form as many scc's as possible with a size of Mid; the total number of the scc edges is easy to calculate
2): we need to calculate all the possible connections between every two scc's with a size of Mid;
3): we need to take care of the remainders whose size is smaller than Mid (it turns out later that this is essentially the same as 2)).
We need to figure out how many edges we can place between nodes.
For 1): the answer is straightforward: (N/Mid) * (Mid*(Mid-1)/2) = num_scc * edge_num_scc;
Step 2) seems not that easy. Let start again with some example, when N = 6, and Mid = 3. So we have 2 scc's with a size of 3. Let say 1, 2, 3 are in a scc, and 4, 5, 6 in the other scc. Now we can connect 1 to 4 and 5 (2 more edges) but not 6 (or it will increase the size of scc); then 2 to 5 and 6, then 3 to 6 and 4. The total maximum connections between 2 scc's with the same size of Mid is Mid * (Mid - 1).
But we may have more than 2 scc's. For example, when N = 9 and Mid = 3. In this case, every 2 scc's can be connected in the similar way, which gives a factor of F = (N/Mid)*(N/Mid - 1)/2. Thus, the total maximum number of edges for step 2 is: F * Mid * (Mid - 1) = (N/Mid)*(N/Mid - 1)/2 * Mid * (Mid - 1).
Before we move on to step 3), let look an interesting question first. What are the maximum number of connections between 2 scc's with different sizes, without increasing the minimum size of lscc?
Let say one scc has a size of Mid, and the other is R which is smaller than Mid. Similar to the above case of 2 scc's with the same size, every node in the second scc can connect (Mid - 1) nodes in the first one. Thus, the total number of edges is R * (Mid - 1).
If there are (N/Mid) scc's, then the total number of edges is (N/Mid) * R * (Mid - 1), which is one part of the answer to the step 3).
What is still missing?
If we have more than 1 node in the remaining. For example, Mid = 4, R = 3. The 3 nodes can form a scc first. So this part will give R* (R - 1)/2 edges.
Thus the final answer to step 3) is R* (R - 1)/2 + (N/Mid) * R * (Mid - 1).
The total number of edges are the sum the all three steps.
We are done. And the next step is coding.
Please see the code below:
#include <iostream> #include <vector> using namespace std; int minMax_scg(int &m, int &n); int maxEdge(int &m, int &max_scg); int main() { int num_node = 100, num_edge = 4900; return minMax_scg(num_node, num_edge); } int minMax_scg(int &m, int &n) { int left = 1, right = m; while(left < right) { int mid = left + (right - left) / 2; if(maxEdge(m, mid) < n) left = mid + 1; else right = mid; } cout<<"The minimum size of the largest scc = " <<left<<endl; return left; } int maxEdge(int &m, int &max_scg) { int num_scg_compo = m / max_scg; int remainer = m % max_scg; int a = max_scg, b = num_scg_compo, c = remainer; int maxE = b*a*(a-1)/2 + b*(b-1)/2*a*(a-1) + c*(c-1)/2 + b*c*(a-1); cout<<"num_node = "<<m<<" max_scg = "<<max_scg<<" maxNumEdge = "<<maxE<<endl; return maxE; }
Please leave any comment if you find some incorrect places or improvements. Any discussion is also welcome!
PS: after finishing the writing, I feel that the there may be existed formula to calculate the maximum number of edges for a un-directed graph with N nodes and the lscc of K (K <= N). Please let me know if you find this somewhere. Thanks.
Is there a leetcode problem similar to this?
ReplyDeleteSeems not yet. Please let me know if you find similar ones added.
ReplyDelete