iptables and Kubernetes

hongweiqiu 382 views 32 slides May 02, 2020

About This Presentation

In these slides, we discussed the architecture of iptables and showed how to implement your own iptables module.
Building on that understanding of iptables, we implemented DNS Layer 7 parsing in an iptables module.
After that, we studied how Kubernetes service works and also explained why Kubernetes can&#...


Slide Content

IPTABLES & Kubernetes
HungWei Chiu

HungWei Chiu
•MTS @ ONF
•SDNDS-TW/CNTUG
•Linux/Network/Container/Kubernetes
•Kubernetes Courses @Hiskio

IPTABLES Series
Introduction to IPTABLES
Learn IPTABLES by Docker environment.
Implementation of IPTABLES
User Space/Kernel Space
Implement our own iptables modules
Kubernetes Service discussion
Layer 4 load-balancing, why?
Modify the kernel module to make it support Layer 7, really?

IPTABLES/EBTABLES
Example
iptables -t nat -A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
ebtables -t filter -I INPUT --log --log-prefix 'ctc/ebtable/filter-input' --log-level debug
Components
Chain -> Insert/Append (-I/-A)
Table
Match (Module) -> built-in/module
Target (Module) -> built-in/module
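
To make the four components concrete, here is a minimal Python sketch that splits the iptables example above into table, chain, matches, and target. The function name split_rule and the returned dictionary layout are illustrative assumptions, not part of iptables; the sketch only recognises the -t, -A/-I, -m, and -j options and ignores everything else.

```python
import shlex


def split_rule(command):
    """Split an iptables command line into table, chain, matches, and target.

    Hypothetical helper for illustration: it only looks at -t, -A/-I, -m, -j.
    """
    tokens = shlex.split(command)
    rule = {
        "table": "filter",  # iptables uses the filter table when -t is absent
        "chain": None,
        "operation": None,
        "matches": [],
        "target": None,
    }
    i = 1  # skip the program name
    while i < len(tokens):
        tok = tokens[i]
        if tok == "-t":
            rule["table"] = tokens[i + 1]
            i += 2
        elif tok in ("-A", "-I"):
            rule["operation"] = "append" if tok == "-A" else "insert"
            rule["chain"] = tokens[i + 1]
            i += 2
        elif tok == "-m":
            rule["matches"].append(tokens[i + 1])
            i += 2
        elif tok == "-j":
            rule["target"] = tokens[i + 1]
            i += 2
        else:
            i += 1  # options of a match/target, negation, addresses, ...
    return rule


docker_rule = split_rule(
    "iptables -t nat -A OUTPUT ! -d 127.0.0.0/8 "
    "-m addrtype --dst-type LOCAL -j DOCKER"
)
print(docker_rule)
```

Here addrtype is a match module and DOCKER is the target, which in this case is a user-defined chain rather than a target module.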

Last Time

Demo Environment
ContainerA
Linux Bridge
Eth0
Eth0
Veth0
https://github.com/hwchiu/network-study/tree/master/iptables/k8s
Customize IPTABLES module
Drop a DNS packet if its requested domain is the one we expect
Layer 7 processing.

Architecture
UserSpace
Parameter, usage
Kernel space
Implementation
Communicate by
getsockopt
setsockopt

User Space
Repo: git.netfilter.org/iptables
Version: 1.6.1 (In my demo)
Includes
iptables commands, modules
Parses parameters and then sends them to the kernel
Written in C

User Space
Directory
extension
old/new style
iptables
Extension
Parse the arguments and store them in memory.
iptables sends the data to kernel space later.

Implement
Implement a basic module
Support one argument
domain (string)
Name-> hwchiu
Output
iptables -L
iptables-save

Kernel Space
Repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Version: 4.15 (In my demo)
Whole Linux kernel
Implement the function (match/target)
Built as a built-in function or a kernel module.
Written in C

Kernel Space
net/netfilter/
Implement the function, then register it with the kernel.
It is called when a caller (iptables) invokes it via
setsockopt
getsockopt

DNS Format (RFC 1035)
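
As a user-space illustration of the Layer 7 check described earlier (drop a DNS packet when the requested domain matches), the following Python sketch walks the question section of a DNS query as laid out in RFC 1035. It is not the kernel module from the demo: the function names build_query, parse_qname, and should_drop are hypothetical, the input is assumed to be the UDP payload only, and compression pointers are deliberately not handled.

```python
import struct


def build_query(domain, query_id=0x1234):
    """Build a minimal DNS query (one question, type A, class IN) for testing."""
    # Header: ID, flags (RD set, QR=0 means query), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in domain.split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)


def parse_qname(payload):
    """Return the first question name from a DNS message payload."""
    offset = 12  # the DNS header is always 12 bytes
    labels = []
    while True:
        length = payload[offset]
        if length == 0:  # a zero-length label ends the name
            break
        if length & 0xC0:
            raise ValueError("compression pointers are not handled in this sketch")
        offset += 1
        labels.append(payload[offset:offset + length].decode("ascii"))
        offset += length
    return ".".join(labels)


def should_drop(payload, blocked_domain):
    """True if the payload is a query (QR bit clear) for blocked_domain."""
    flags = struct.unpack("!H", payload[2:4])[0]
    is_query = (flags & 0x8000) == 0
    return is_query and parse_qname(payload).lower() == blocked_domain.lower()
```

A kernel match function would perform the same walk over the packet data after locating the UDP payload, with bounds checks on every read.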

Summary
You can inspect the packet in iptables modules.
Do anything you want
Coding in kernel space

Kubernetes Service
Why do people call it a Layer 4 load-balancer?
We can do Layer 7 parsing in iptables.

Kubernetes
Service
ClusterIP
NodePort
LoadBalancer
Headless

Demo Environment
Kubernetes
Pod
Service
ClusterIP
Deployment
Three Pods
Pod Pod
ClusterIP

Workflow
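PRE_ROUTING / OUTPUT
Packets -> KUBE-SERVICES -> KUBE-SVC-XXX -> KUBE-SEP-XXX -> DNAT
PRE_ROUTING/OUTPUT -> KUBE-SERVICES: Jump
KUBE-SERVICES -> KUBE-SVC-XXX: Jump if match protocol/clusterIP (Module: TCP/UDP)
KUBE-SVC-XXX -> KUBE-SEP-XXX: Jump if random module returns true (Module: Random), choose ENDPOINTS
KUBE-SEP-XXX -> DNAT: Match protocol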

Again

Random
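Request -> Rule 1 -> Rule 2 -> Rule 3 -> Endpoint4
Rule 1: If P < 0.25 -> Endpoint1
Rule 2: If P < 0.33 -> Endpoint2
Rule 3: If P < 0.5 -> Endpoint3
Otherwise -> Endpoint4
Endpoint IPs: 10.244.1.3, 10.244.2.32, 10.244.3.63, 10.244.3.23
Endpoint1: P = 1/4 = 0.25
Endpoint2: P = 3/4 * 1/3 = 0.25
Endpoint3: P = 3/4 * 2/3 * 1/2 = 0.25
Endpoint4: P = 1 - 0.75 = 0.25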
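
The arithmetic on this slide generalises to any number of endpoints: rule i is given probability 1/(n - i), and the last endpoint catches whatever is left. The short Python sketch below recomputes the per-rule thresholds and the resulting share of each endpoint with exact fractions. The function names are illustrative assumptions.

```python
from fractions import Fraction


def rule_probabilities(n):
    """Per-rule probabilities for n endpoints; the last endpoint needs no rule."""
    return [Fraction(1, n - i) for i in range(n - 1)]


def endpoint_shares(n):
    """Overall probability that a request ends up at each of the n endpoints."""
    shares = []
    remaining = Fraction(1)  # probability of reaching the current rule
    for p in rule_probabilities(n):
        shares.append(remaining * p)  # reached the rule and it matched
        remaining *= 1 - p            # reached the rule and it did not match
    shares.append(remaining)          # final endpoint takes the rest
    return shares


print([str(p) for p in rule_probabilities(4)])
print([str(s) for s in endpoint_shares(4)])
```

For four endpoints the thresholds are 1/4, 1/3, and 1/2, and every endpoint ends up with a 1/4 share, matching the numbers on the slide.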

Workflow
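PRE_ROUTING
Packets -> KUBE-SERVICES -> KUBE-SVC-XXX -> KUBE-SEP-XXX -> DNAT
PRE_ROUTING -> KUBE-SERVICES: Jump
KUBE-SERVICES -> KUBE-SVC-XXX: Jump if match protocol/clusterIP (Module: UDP)
KUBE-SVC-XXX -> KUBE-SEP-XXX: Jump if random module returns true (Module: Random), choose ENDPOINTS
KUBE-SEP-XXX -> DNAT: Match protocol
At KUBE-SERVICES: dest=10.96.121.30
At KUBE-SVC-XXX: dest=10.96.121.30
At KUBE-SEP-XXX: dest=10.96.121.30
After DNAT: dest=10.244.0.3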
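
The chain traversal can be modelled in a few lines of Python. This is only a simulation of the rule order shown on the slide, not kube-proxy code. The ClusterIP 10.96.121.30 and the endpoint 10.244.0.3 come from the slide, while the other two endpoint addresses, the packet dictionary layout, and the function name traverse are assumptions made for illustration. Only the UDP rule from this slide is modelled.

```python
import random

CLUSTER_IP = "10.96.121.30"
# First address is from the slide; the other two are hypothetical.
ENDPOINTS = ["10.244.0.3", "10.244.0.4", "10.244.0.5"]


def traverse(packet, rng=random):
    """Walk KUBE-SERVICES -> KUBE-SVC-XXX -> KUBE-SEP-XXX -> DNAT."""
    # KUBE-SERVICES: jump only if protocol and ClusterIP match.
    if packet["proto"] != "udp" or packet["dst"] != CLUSTER_IP:
        return packet  # no rule matched, destination is unchanged

    # KUBE-SVC-XXX: one random rule per endpoint, the last one unconditional.
    n = len(ENDPOINTS)
    for i, endpoint in enumerate(ENDPOINTS):
        if i == n - 1 or rng.random() < 1 / (n - i):
            # KUBE-SEP-XXX: DNAT rewrites the destination to the endpoint.
            return dict(packet, dst=endpoint)


packet = {"proto": "udp", "dst": CLUSTER_IP}
print(traverse(packet)["dst"])
```

The destination stays at the ClusterIP through all three chains and only changes once the DNAT target in KUBE-SEP-XXX fires.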

Kubernetes Service
Two types of load-balancer
Client <----> LB <----> Server
Two connections
Client <----LB-----> Server
One connection.

Kubernetes
Conntrack helps cache the NAT result.
A packet skips the NAT table if a conntrack entry already exists for it.
For TCP packets
Kubernetes performs DNAT during the three-way handshake.
Without application data
For UDP packets
No three-way handshake
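
The caching behaviour can be sketched as a dictionary keyed by the flow's 5-tuple: the first packet of a flow goes through the NAT rules, and every later packet reuses the stored result. The class below is a simplified Python model under that assumption, not the kernel's conntrack implementation. The class name, its fields, and the client address 10.244.9.9 are hypothetical.

```python
import random


class NatWithConntrack:
    """Toy model: NAT rules run once per flow, conntrack remembers the result."""

    def __init__(self, cluster_ip, endpoints, choose=random.choice):
        self.cluster_ip = cluster_ip
        self.endpoints = endpoints
        self.choose = choose      # how an endpoint is picked on the first packet
        self.conntrack = {}       # 5-tuple -> translated destination
        self.nat_lookups = 0      # how many times the NAT table was consulted

    def forward(self, proto, src, sport, dst, dport):
        flow = (proto, src, sport, dst, dport)
        if flow in self.conntrack:
            return self.conntrack[flow]  # existing entry: NAT table is skipped
        self.nat_lookups += 1
        new_dst = self.choose(self.endpoints) if dst == self.cluster_ip else dst
        self.conntrack[flow] = new_dst
        return new_dst


nat = NatWithConntrack("10.96.121.30", ["10.244.1.3", "10.244.2.32"])
first = nat.forward("tcp", "10.244.9.9", 40000, "10.96.121.30", 80)   # e.g. the SYN
later = nat.forward("tcp", "10.244.9.9", 40000, "10.96.121.30", 80)   # data packet
print(first == later, nat.nat_lookups)
```

In this model the endpoint is fixed by the first packet of the flow. For TCP that packet is the SYN of the handshake, which carries no application data, so a content-based decision is not possible at that point.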

Iptables
Layer 7 Firewall (IPS/IDS)
NFQUEUE, nDPI
Layer 7 load-balancer
Connectionless protocol (limited)

(Demo)Kubernetes
Modify the statistic module to support UDP load-balancing.
Just for fun, to demonstrate what we can do from iptables.
Parameters (/proc)
ClusterIP (focus on a specific service)
Content, PodIP (forward to PodIP if the packet's data includes content)
PodIP list (to know where we are)
Match workflow (limited by Random and the iptables structure)
Roll back if it is a TCP packet.
Roll back if its destination IP != ClusterIP
Return true if (1) the data contains content and (2) PodIP equals the iptables rule's target.
Else, return false
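
The match workflow above can be written down as a small decision function. The Python sketch below is a user-space model only, not the modified kernel module. It assumes that "roll back" means handing the decision back to the stock random behaviour, represented here by a fallback callable; the function name, the packet dictionary layout, and the sample content string are all hypothetical.

```python
def content_match(packet, cluster_ip, content, pod_ip, rule_target, fallback):
    """Decide whether the current rule should match this packet.

    fallback is called when the packet is outside the demo's scope
    (assumption: this is what the slide means by rolling back).
    """
    if packet["proto"] == "tcp":
        return fallback()  # TCP: keep the original random decision
    if packet["dst"] != cluster_ip:
        return fallback()  # not the service we focus on
    # Match only on the rule whose target is the Pod configured for this content.
    return content in packet["data"] and pod_ip == rule_target


packet = {"proto": "udp", "dst": "10.96.121.30", "data": b"route-me-to-pod-a"}
decision = content_match(
    packet,
    cluster_ip="10.96.121.30",
    content=b"pod-a",
    pod_ip="10.244.1.3",
    rule_target="10.244.1.3",
    fallback=lambda: False,
)
print(decision)
```

Because iptables evaluates the per-endpoint rules in order, the function has to return false on every rule whose target is not the configured PodIP, which is why the rule's own target is part of the comparison.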

One Question
If we send packets to a ClusterIP from a Kubernetes node, what happens?
UDP (Statistic + UDP)
How about ARP?

Again

Socket Programming (TCP)
https://wrytin.com/ramathakur/tcpip-reference-model-jvb79y39

(fs/read_write.c) SYSCALL_DEFINE3(....)
(fs/read_write.c) vfs_write(....)
(fs/read_write.c) do_sync_write(....)
(net/socket.c) sock_aio_write(....)
(net/socket.c) __sock_sendmsg(....)
(security/security.c) security_socket_sendmsg(...)
(net/socket.c) __sock_sendmsg_nosec(...)
(net/ipv4/af_inet.c) inet_sendmsg(....)
(net/ipv4/tcp.c) tcp_sendmsg(...)
(net/ipv4/tcp.c) __tcp_push_pending_frames(...)
(net/ipv4/tcp.c) tcp_push_one(...)
(net/ipv4/tcp.c) tcp_write_xmit(...)
(net/ipv4/tcp.c) tcp_transmit_skb(...)
sk_buff

(net/ipv4/ip_output.c) ip_queue_xmit(...)
(include/net/route.h) ip_route_output_ports(...)
(net/ipv4/route.c) ip_route_output_flow(...)
(net/xfrm/xfrm_policy.c) xfrm_lookup(...)
(net/ipv4/ip_output.c) ip_local_out(...)
(include/net/dst.h) dst_output(...)
(net/ipv4/ip_output.c) ip_output(....)
(net/ipv4/ip_output.c) ip_finish_output(...)
(net/ipv4/ip_output.c) ip_fragment(...)
(net/ipv4/ip_output.c) ip_finish_output2(...)
(include/net/neighbour.h) neigh_output(...)
(net/core/dev.c) dev_queue_xmit(...)
(net/core/dev.c) dev_hard_start_xmit(...)
(net/core/neighbour.c) neigh_resolve_output(...)
(net/ipv4/ip_output.c) __ip_local_out(....)
nf_hook(NFPROTO_IPV4, NF_INET_LOCAL_OUT...)
Call IPTABLES
(net/ipv4/ip_output.c) __ipv4_neigh_lookup_noref(....)
(include/net/neighbour.h) __neigh_create(...)
Lookup ARP Table
Create ARP Record
sk_buff

One more thing
COSCUP CFP
https://j.mp/2WmOemq
Cloud Native Hub
Telegram: https://t.me/cntug
Github: https://github.com/cloud-native-taiwan/meetups
MyBlog: https://www.hwchiu.com

Q&A