240708_JW_labseminar[struc2vec: Learning Node Representations from Structural Identity].pptx

thanhdowork 70 views 18 slides Jul 08, 2024
Slide 1
Slide 1 of 18
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18

About This Presentation

struc2vec: Learning Node Representations from Structural Identity


Slide Content

struc2vec: Learning Node Representations from Structural Identity Jin-Woo Jeong Network Science Lab Dept. of Mathematics The Catholic University of Korea E-mail: [email protected] Leonardo F. R. Ribeiro, Pedro H. P. Saverese , Daniel R. Figueiredo KDD ’17

INTRODUCTION Motivation Introduction STRUC2VEC Properties Measuring structural similarity Constructing the context graph Generating context for nodes Learning a language model Complexity and optimizations EXPERIMENT AL EVALUATION Barbell graph Karate network Classification CONCLUSION Q/A

INTRODUCTION Motivation Nodes within a network typically have one or more functions that significantly determine their roles in the system. For example, individuals in a social network have social roles or positions, and proteins in a protein interaction network perform specific functions. Identifying these functions often leverages node and edge attributes, but a more challenging and interesting scenario arises when node function is defined solely by the network structure. In this context, node labels are not important; only their relationships with other nodes matter.

INTRODUCTION Introduction The key ideas of Assess structural similarity between nodes independently of node and edge attributes as well as their position in the network. Thus, two nodes that have a similar local structure will be considered so, independent of network position and node labels in their neighborhoods. Our approach also does not require the network to be connected, and identifies structurally similar nodes in different connected components. Establish a hierarchy to measure structural similarity, allowing progressively more stringent notions of what it means to be structurally similar. In particular, at the bottom of the hierarchy, structural similarity between nodes depend only on their degrees, while at the top of the hierarchy similarity depends on the entire network (from the viewpoint of the node). Generates random for nodes, which are sequences of structurally similar nodes as observed by a weighted random walk traversing a multilayer graph (and not the original network). Thus, two nodes that frequently appear with similar contexts will likely have similar structure. Such context can be leveraged by language models to learn latent representation for the nodes.  

STRUC2VEC Properties The distance between the latent representation of nodes should be strongly correlated to their structural similarity. Thus, two nodes that have identical local network structures should have the same latent representation, while nodes with different structural identities should be far apart. The latent representation should not depend on any node or edge attribute, including the node labels. Thus, structurally similar nodes should have close latent representation, independent of node and edge attributes in their neighborhood. The structural identity of nodes must be independent of its “position” in the network. (1) (2)

STRUC2VEC Measuring structural similarity Let denote the undirected, unweighted network under consideration with vertex set and edge set , where n = and its diameter Let denotes the ring of nodes at distance (hop count) exactly from in denotes the set of neighbors of and in general, denotes the ring of distance k Let denote the ordered degree sequence of a set of nodes By comparing the ordered degree sequences of the rings at distance from both and we can impose a hierarchy to measure structural similarity. In particular, let denote the structural distance between and when considering their k-hop neighborhoods (all nodes at distance less than or equal to and all edges among them). In particular, we define: where measures the distance between the ordered degree sequences and and In this paper, they measure the distance by using DTW   (1) (2)

STRUC2VEC Constructing the context graph denotes the original network and its diameter Let denote the multilayer graph where layer is defined using the -hop neighborhoods of the nodes Each layer is formed by a weighted undirected complete graph with node set , and thus edges. The edge weight between two nodes in a layer is given by: Since ,   (3)

STRUC2VEC Constructing the context graph We connect the layers using directed edges as follows Every vertex in layer is connected to the corresponding vertex in layer and where is number of edges incident to that have weight larger than the average weight of the complete graph in layer k. In particular:   (4) (5)

STRUC2VEC Generating context for nodes As in previous works, uses random walks to generate sequence of nodes Before each step, the random walk first decides if it will change layers or walk on the current layer (with probability q > 0 the random walk stays in the current layer). Given that it will stay in the current layer, the probability of stepping from node to node in layer is given by: where With probability , the random walk decides to change layers   (6) (7) (8)

STRUC2VEC Generating context for nodes  

STRUC2VEC Learning a language model In this paper, sequences are generated by biased random walks on the Multilayer graph . Given a node, Skip-Gram aims to maximize the likelihood of its context in a sequence, where a node’s context is given by a window of size centered on it.  

STRUC2VEC Complexity and optimizations where ; Instead of calculating the similarity between all pairs of nodes in each layer, focus only on the node pairs that are more likely to be similar. This reduces unnecessary computations and overall computational cost. Limit the number of layers used in the multilayer graph to reduce memory requirements. Use a fixed number of layers( ) instead of the entire graph's diameter( ) to construct the multilayer graph.   (10)

Barbell graph EXPERIMENTAL EVALUATION

Karate graph EXPERIMENTAL EVALUATION

Classification EXPERIMENTAL EVALUATION Airport dataset For each airport, we assign one of four possible labels corresponding to their activity. In particular, for each dataset, we use the quartiles obtained from the empirical activity distribution to split the dataset in four groups, assigning a different label for each group .  

Classification EXPERIMENTAL EVALUATION

Conclusion Conclusion We propose struc2vec, a novel and flexible framework designed to learn representations that capture the structural identity of nodes in a network. By evaluating the structural similarity of node pairs using a hierarchical metric based on the ordered degree sequence of nodes, and by utilizing a weighted multilayer graph to generate context, struc2vec effectively captures the structural identity of nodes. Our experiments demonstrate that struc2vec excels in capturing structural identity compared to state-of-the-art techniques like DeepWalk , node2vec, and RolX , by explicitly focusing on structural identity. Furthermore, struc2vec shows superior performance in classification tasks where node labels are more dependent on their role or structural identity. Different models tend to capture different properties, and our results highlight the importance of structural identity when considering possible node representations.

Q & A Q / A