Session 8 - Graph Algorithms and Mining_DSE_Batch 10.pptx
GururajaHebburSatyan
17 views
31 slides
Aug 24, 2024
Slide 1 of 31
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
About This Presentation
Session 8 - GAM_DSE_Batch 10.pptx
Size: 4.51 MB
Language: en
Added: Aug 24, 2024
Slides: 31 pages
Slide Content
Patterns in Static Graphs
Real social networks tend to have many more triangles than random , because friends of friends often become friends themselves. Let ∆ be the number of triangles in a graph, and ∆ i the number of triangles that node i participates in. The first, triangle participation law or TPL, says that the distribution of ∆ i follows a power law with exponent σ . The TPL intuitively states that while many nodes have only a few triangles in their neighbourhoods, a few nodes participate in a huge number of triangles. The number of triangles ∆ i for node i is related to the clustering coefficient of the graph. TRIANGLE POWER LAWS (TPL)
Kang et al. in 2011 [2], studied the Twitter graph. DTPL scatter-plot is (degree d i vs ∆ i ), for Twitter accounts. We see that celebrities have high degree and reasonably connected followers, and some users, having superlinear relationship of degree/triangles. Those accounts have a relatively small degree ≈ 10 4 , but participate in about 10 8 triangles, inspection shows that these are accounts of adult advertisers, and possibly the same person with multiple accounts; following each other, to artificially boost their degrees and credibility. degree-triangle plots can be used to spot potentially dangerous accounts such as those of adult advertisers and spammers . Degree TRIANGLE POWER LAWS (DTPL) [2] U. Kang, et al. “ Spectral analysis for billion-scale graphs: Discoveries and implementation” . In PAKDD (2), pages 13–25, 2011
EigenSpoke
It is the study that have used mobile call graph data to examine and characterize the social interactions of cell phone users, with a focus on understanding the structural properties of the graph. Later it was extended to Patents citation, Bi-partite cores in sparse graphs. It was found that the some specific pattern was repeated in all. They name it as EigenSpokes . They investigate using the adjacency/incidence matrix . But before that lets talk about Singular Value Decomposition (SVD). EigenSpoke [8]Aditya Prakash et al. “ EigenSpokes : Surprising Patterns and Scalable Community Chipping in Large Graphs” , PAKDD 2010]
SVD is one of the many matrix decomposition methods. Singular Value Decomposition (SVD) of an m × n matrix W is a factorization defined as: W = UΣV T , where U and V are m × m and n × n size matrices respectively, and Σ is an m × n diagonal matrix comprised of the singular values . singular values are absolute values of the non-zero eigenvalues : σ i = | λ i | and the singular vectors coincide with the non-null eigenvectors . Taking the top K values of Σ yields the best rank-K approximation (w.r.t. the Frobenius norm [9]) to the original matrix. SVD [8]Aditya Prakash et al. “ EigenSpokes : Surprising Patterns and Scalable Community Chipping in Large Graphs” , PAKDD 2010] [9] G. Strang . “ Introduction to Linear Algebra” . Wellesley-Cambridge Press, 2003.
They define the EE-plot (Eigen-Eigen plot) as the scatter plot of vector U i and U j , for any i and j , i.e., they plot one point ( U in ,U jn ) for each node n in the graph. Surprisingly, They find that the EE-plots for the Mobile Call graph show clear separate straight lines that are often aligned with axes. They call this the “ EigenSpokes pattern” Eigenspoke [8]Aditya Prakash et al. “ EigenSpokes : Surprising Patterns and Scalable Community Chipping in Large Graphs” , PAKDD 2010]
They identify the nodes lying on the extremities of the “spokes”; First 9 singular vectors, in which they identify the 20 nodes having highest magnitude projection along that vector. They then plot the induced sub-graph of these nodes. In that they identify that almost all of the induced sub-graphs contain near-cliques , hinting toward a strong connection between Eigen- Spokes and communities . Eigenspoke [8]Aditya Prakash et al. “ EigenSpokes : Surprising Patterns and Scalable Community Chipping in Large Graphs” , PAKDD 2010]
Near-cliques, or near-bipartite-cores, loosely connected Eigenspoke [8]Aditya Prakash et al. “ EigenSpokes : Surprising Patterns and Scalable Community Chipping in Large Graphs” , PAKDD 2010]
Near-cliques, or near-bipartite-cores, loosely connected So what? Extract nodes with high s cores high connectivity Good “communities” 12 spy plot of top 20 nodes Eigenspoke
patents from same inventor(s) ‘cut-and-paste ’’’’ bibliography! magnified bipartite community Bipartite Communities 13
The presence of spokes in EE-plots (axis-aligned or not) implies that nodes close to each other on a line have similar scores along two eigenvectors (‘score’ of node n along vector U i is U in ). we expect that nodes with similar connectivity will have similar scores along the vectors of U . Thus the EigenSpokes pattern is persistent across a wide array of diverse datasets. We show that the key factors responsible for these patterns are a large number of well-knit communities embedded in very sparse graphs . Eigenspoke [8]Aditya Prakash et al. “ EigenSpokes : Surprising Patterns and Scalable Community Chipping in Large Graphs” , PAKDD 2010]
Dynamics of Dynamic Graphs
The graphs which changes with respect to time. They may either grow, shrink or split as the time passes. Densification: Most real networks such as the web and social networks continue to become more dense over time. This law states that: The number of nodes in the network increases superlinearly with the number of nodes over time, whereas the number of edges increases superlinearly over time. In other words, if n(t) and e(t) represent the number of edges and nodes in the network at time t , then we have: (Densification Power Law: DPL) Dynamics of Dynamic Graphs (Time Evolving Graphs)
The graphs which changes with respect to time. They may either grow, shrink or split as the time passes. Shrinking Diameters: The small world phenomenon of graphs is well known. It was shown by researchers [3] that the average path length between two MSN messenger users is 6.6 . This can be considered a verification of the (internet version of the) widely known rule of “ six degrees of separation ” in (generic) social networks. Dynamics of Dynamic Graphs (Time Evolving Graphs) [3] J. Leskovec , E. Horvitz. Planetary-Scale Views on a Large Instant Messaging Network , WWW Conference, 2008.
The graphs which changes with respect to time. They may either grow, shrink or split as the time passes. Shrinking Diameters: It was further shown in [4], that the diameters of massive networks such as the web continue to shrink over time . The edges are added more rapidly to the network than nodes . As more edges are added to the graph it becomes possible to traverse from one node to another with the use of a fewer number of edges. Dynamics of Dynamic Graphs (Time Evolving Graphs) [4] J. Leskovec , J. Kleinberg, C. Faloutsos . “ Graphs over time: Densification laws, shrinking diameters and possible explanations .” ACM KDD Conference , 2005.
McGlohon et al. in 2010 [5] noticed while studying the effective diameter of the graphs, that There is often a point in time ( Gelling Point ) when the diameter spikes. Before that point, the graph typically consists of a collection of small, disconnected components. This “ gelling point ” seems to also be the time where the Giant Connected Component (GCC) forms and “takes off,” in the sense that the vast majority of nodes belong to it, and, as new nodes appear, they mainly tend to join the GCC, making it even larger. We shall refer to the rest of the connected components as “NLCC” (non-largest connected components). . Dynamics of Dynamic Graphs (Time Evolving Graphs) [5] Mary McGlohon . “ Structural analysis of networks: Observations and applications .” Ph.D. thesis, 2010.
McGlohon et al. in 2010 [5] also noticed that the weighted graphs also follow the (weighted) Power law. Weight Power Law (WPL): Let E(t) , W(t) be the number of edges and total weight of a graph, at time t . Then, they follow a power law W(t) = E(t) w , where w is the weight exponent. Dynamics of Dynamic Graphs (Time Evolving Graphs) [5] Mary McGlohon . “ Structural analysis of networks: Observations and applications .” Ph.D. thesis, 2010.
The networking community has studied the structure of the Internet for a long time, Tauro et al. in 2001 [6] had found out Jellyfish structure at the AS (Autonomous system ) level. This consists of: A core , consisting of the highest-degree node and the clique it belongs to; this usually has 8–13 nodes. Layers around the core, organized as concentric circles around the core; layers further from the core have lower importance. Hanging nodes , representing one-degree nodes linked to nodes in the core or the outer layers. The authors find such nodes to be a large percentage (about 40–45%) of the graph. The Structure of internet [6] Sudhir L. Tauro et al., “ A simple conceptual model for the Internet topology” , 2001. IEEE Computer Society Press.
Broder et al. in 2000 [7] find that the Web graph is described well by a “ bowtie ” structure. They find that the Web can be broken into four approximately equal-sized pieces. The core as Strongly Connected Component (SCC) IN component . OUT component . TENDRILS . Disconnected Components. The Structure of WWW [7] Andrei Z. Broder, et al. “ Graph structure in the web: experiments and models” . 2000. ACM Press.
Strongly Connected Component (SCC) of the graph: Each node in the SCC has a directed path to any other node in the SCC. IN component : Each node in the IN component has a directed path to all the nodes in the SCC. OUT component , Each node can be reached by directed paths from the SCC. TENDRILS There are webpages which can reach some pages in OUT and can be reached from pages in IN without going through the SCC; Occasionally, a tendril can connect nodes in IN and OUT; the tendril is called a TUBE in this case. The remainder of the webpages fall in disconnected components. The Structure of WWW [7] Andrei Z. Broder, et al. “ Graph structure in the web: experiments and models” . 2000. ACM Press.