CS224W Lecture 3 Node Embeddings

Introduction

traditional ML for graphs
- input-graph => feature-engineering => structured feature => learning algorithm => prediction
graph representation learning
- input-graph => ~~feature-engineering~~ representation learning => structured feature => learning algorithm => prediction
- goal: efficient task-independent feature learning
- task: map nodes into an embedding space
  - possible downstream tasks: node classification, link prediction, graph classification, anomalous node detection, clustering, …

random walk

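similarity score (dot product of two node embeddings) approximates the probability that the two nodes co-occur on a random walk over the graph
why random walk?
- expressivity: flexible, stochastic definition of node similarity that incorporates both local and higher-order neighborhood information
- efficiency: training only needs pairs that co-occur on random walks, not all node pairs
optimization
- optimizing the random walk objective naively is computationally expensive: the softmax normalization sums over all nodes for every node, O(|V|^2) => approximate it with negative sampling
- how to solve the optimization problem? => SGD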
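
A minimal sketch of the negative-sampling loss for one co-occurring pair (u, v) and a single SGD update, in plain Python. The helper names (`neg_sampling_loss`, `sgd_step`), the learning rate, and the dict-of-lists embedding table `z` are illustrative assumptions, not from the lecture. The loss is the standard word2vec-style form, and the negatives are supplied by the caller.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def neg_sampling_loss(z, u, v, negatives):
    """-log sigma(z_u . z_v) - sum over negatives n of log sigma(-z_u . z_n)."""
    loss = -math.log(sigmoid(dot(z[u], z[v])))
    for n in negatives:
        loss -= math.log(sigmoid(-dot(z[u], z[n])))
    return loss


def sgd_step(z, u, v, negatives, lr=0.1):
    """One in-place SGD update of z[u], z[v] and the negatives' embeddings.

    Assumes u is not among the negatives (hypothetical simplification).
    """
    grad_u = [0.0] * len(z[u])
    # label 1 for the observed pair, 0 for each sampled negative
    for node, label in [(v, 1.0)] + [(n, 0.0) for n in negatives]:
        g = sigmoid(dot(z[u], z[node])) - label  # d loss / d score
        for i in range(len(grad_u)):
            grad_u[i] += g * z[node][i]      # accumulate gradient w.r.t. z[u]
            z[node][i] -= lr * g * z[u][i]   # update the other node
    for i in range(len(grad_u)):
        z[u][i] -= lr * grad_u[i]
```

Each update touches only u, v, and the k negatives, which is where the saving over the full softmax comes from.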
strategy to walk randomly
- simplest: run fixed-length, unbiased random walks starting from each node -> DeepWalk
- issue: this notion of similarity is too constrained
- how can we generalize this? -> node2vec
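
A minimal sketch of the fixed-length, unbiased walk described above. The function name `uniform_random_walk` and the adjacency format (a dict mapping each node to a list of neighbors) are assumptions for illustration. This covers only the walk itself, not the full DeepWalk training pipeline.

```python
import random


def uniform_random_walk(adj, start, length, rng=random):
    """Return a walk of up to `length` nodes, picking each next node uniformly."""
    walk = [start]
    while len(walk) < length:
        neighbors = adj[walk[-1]]
        if not neighbors:  # dead end: stop early
            break
        walk.append(rng.choice(neighbors))
    return walk
```

Nodes that appear together on such a walk are treated as a co-occurring pair when training the embeddings.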

node2vec

goal: embed nodes with similar network neighborhoods close in the feature space
key observation: flexible notion of network neighborhood leads to rich node embeddings
- BFS-like exploration gives a local (micro) view of the neighborhood
- DFS-like exploration gives a global (macro) view of the neighborhood
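hyperparameters for node2vec
- p: return parameter, controls how likely the walk is to return to the previous node (low p => returns more often)
- q: in-out parameter, the "ratio" of BFS vs DFS (low q => DFS-like outward moves, high q => BFS-like moves that stay close)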
biased 2nd-order random walks
- idea: remember where the walk came from
node2vec algorithm
1. compute random walk transition probabilities
2. simulate random walks of a fixed length starting from each node
3. optimize the node2vec objective using SGD
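
Steps 1 and 2 can be sketched as follows, assuming an unweighted graph stored as a dict of neighbor lists. The names `node2vec_walk` and `simulate_walks` are hypothetical. Transition weights are computed on the fly instead of being precomputed, which is a simplification. The unnormalized weights used here are 1/p for returning to the previous node, 1 for a neighbor of the previous node, and 1/q for a node farther away.

```python
import random


def node2vec_walk(adj, start, length, p, q, rng=random):
    """Simulate one biased 2nd-order walk of up to `length` nodes."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = adj[cur]
        if not neighbors:  # dead end: stop early
            break
        if len(walk) == 1:
            # no previous node yet, so the first step is uniform
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        prev_neighbors = set(adj[prev])
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)  # return to where the walk came from
            elif nxt in prev_neighbors:
                weights.append(1.0)      # stays at distance 1 from prev (BFS-like)
            else:
                weights.append(1.0 / q)  # moves farther from prev (DFS-like)
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def simulate_walks(adj, num_walks, length, p, q, seed=0):
    """Run `num_walks` walks from every node, visiting nodes in shuffled order."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        nodes = list(adj)
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(node2vec_walk(adj, node, length, p, q, rng))
    return walks
```

The resulting walks supply the co-occurring node pairs for step 3. With p = q = 1 every neighbor gets weight 1, so the walk reduces to the unbiased case.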

November 11, 2021

Tags: cs224w