Neutron结合OpenvSwitch网络部分浅析

Screenshots

  • 内涵最多的图

    CtNvz2

VM到虚拟交换机

TAP device

查看instance硬件信息

1
2
3
4
5
6
[root@node2 ~]# virsh list
Id Name State
----------------------------------------------------
14944 instance-00000070 running
18862 instance-0000002d running
18893 instance-00000076 running

以实例14944为例子,查看实例14944具体硬件信息

1
2
3
4
5
6
7
8
[root@node2 ~]# virsh edit 14944
<interface type='bridge'>
<mac address='fa:16:3e:52:43:c5'/>
<source bridge='qbr85bcbfb0-bc'/>
<target dev='tap85bcbfb0-bc'/>
<model type='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>

可以看到instance的虚拟网卡MAC地址为fa:16:3e:52:43:c5,连接到的bridgeqbr85bcbfb0-bctarget device也就是对应物理机上的虚拟显卡为tap85bcbfb0-bc

查看物理机上的网卡

1
2
3
4
5
[root@node2 ~]# ip a
42: tap85bcbfb0-bc: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast master qbr85bcbfb0-bc state UNKNOWN qlen 1000
link/ether fe:16:3e:52:43:c5 brd ff:ff:ff:ff:ff:ff
inet6 fe80::fc16:3eff:fe52:43c5/64 scope link
valid_lft forever preferred_lft forever

可以看到物理机上tap85bcbfb0-bc网卡的MAC地址为fe:16:3e:52:43:c5,对应的也就是instance 14944的虚拟网卡。

  • TAP设备主要是用来让宿主机可以接收到数据帧。

Linux Bridge

上面看到instance 14944的网卡连接的bridge是qbr85bcbfb0-bc
查看网桥

1
2
3
4
5
6
7
8
9
10
11
[root@node2 ~]# brctl show
bridge name bridge id STP enabled interfaces
qbr09d052bf-16 8000.b216e6c26e13 no qvb09d052bf-16
qbr263b3957-09 8000.d6a957a5b378 no qvb263b3957-09
tap263b3957-09
qbr2d4a7a05-ea 8000.2afc1c44c8a4 no qvb2d4a7a05-ea
tap2d4a7a05-ea
qbr85bcbfb0-bc 8000.c67808b20c2e no qvb85bcbfb0-bc
tap85bcbfb0-bc
qbrf010b8f6-df 8000.de5f220adb0c no qvbf010b8f6-df
tapf010b8f6-df

qbr85bcbfb0-bc上,有两个接口,一个是qvb85bcbfb0-bc,一个是tap85bcbfb0-bc。也就是说,instance 14944连接在虚拟网桥qbr85bcbfb0-bc上。

  • Linux Bridge类似一个集线器,用来过渡物理机到宿主机的数据帧。

VETH

查看物理机网络qvb85bcbfb0-bc网卡的详细信息

1
2
3
[root@node2 ~]# ip a | grep qvb85bcbfb0-bc
40: qvo85bcbfb0-bc@qvb85bcbfb0-bc: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1450 qdisc noqueue master ovs-system state UP qlen 1000
41: qvb85bcbfb0-bc@qvo85bcbfb0-bc: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1450 qdisc noqueue master qbr85bcbfb0-bc state UP qlen 1000

很明显,这两个网卡有关联。

1
2
3
4
5
6
[root@node2 ~]# ethtool -S qvo85bcbfb0-bc
NIC statistics:
peer_ifindex: 41
[root@node2 ~]# ethtool -S qvb85bcbfb0-bc
NIC statistics:
peer_ifindex: 40

qvo85bcbfb0-bcqvb85bcbfb0-bc属于一对peer,可以理解为一个网线的两端,也就是说一根网线的一端qvb85bcbfb0-bc连接在虚拟网桥qbr85bcbfb0-bc上,而另外的一端qvo85bcbfb0-bc则是连接在OpenvSwitchbr-int上。

  • VETH可以直接理解为一个网线,一端连接在虚拟网桥上,另一端连接在虚拟交换机上,实现数据流的传递

OpenvSwitch虚拟交换机

查看OpenvSwitch的路由器的详细信息

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
[root@node2 ~]# ovs-vsctl show
4ece5774-21aa-4ff6-8e7e-343983396eee
Manager "ptcp:6640:127.0.0.1"
is_connected: true
Bridge br-tun
Controller "tcp:127.0.0.1:6633"
is_connected: true
fail_mode: secure
Port "vxlan-0a0a0a01"
Interface "vxlan-0a0a0a01"
type: vxlan
options: {df_default="true", in_key=flow, local_ip="10.10.10.2", out_key=flow, remote_ip="10.10.10.1"}
Port patch-int
Interface patch-int
type: patch
options: {peer=patch-tun}
Port br-tun
Interface br-tun
type: internal
Bridge br-int
Controller "tcp:127.0.0.1:6633"
is_connected: true
fail_mode: secure
Port int-br-ex
Interface int-br-ex
type: patch
options: {peer=phy-br-ex}
Port "qvoe0f204ea-12"
tag: 4095
Interface "qvoe0f204ea-12"
error: "could not open network device qvoe0f204ea-12 (No such device)"
Port br-int
Interface br-int
type: internal
Port "qr-764aecf0-56"
tag: 1
Interface "qr-764aecf0-56"
type: internal
Port "tapd87312fd-34"
tag: 1
Interface "tapd87312fd-34"
type: internal
Port "qvo09d052bf-16"
tag: 2
Interface "qvo09d052bf-16"
Port "qvof010b8f6-df"
tag: 2
Interface "qvof010b8f6-df"
Port "qvo85bcbfb0-bc"
tag: 1
Interface "qvo85bcbfb0-bc"
Port patch-tun
Interface patch-tun
type: patch
options: {peer=patch-int}
Port "tap3287c19b-e3"
tag: 2
Interface "tap3287c19b-e3"
type: internal
Port "qvo2d4a7a05-ea"
tag: 2
Interface "qvo2d4a7a05-ea"
Port "qvo263b3957-09"
tag: 1
Interface "qvo263b3957-09"
Bridge br-ex
Controller "tcp:127.0.0.1:6633"
is_connected: true
fail_mode: secure
Port phy-br-ex
Interface phy-br-ex
type: patch
options: {peer=int-br-ex}
Port br-ex
Interface br-ex
type: internal
Port "enp4s0f1"
Interface "enp4s0f1"
Bridge br-vlan
Port br-vlan
Interface br-vlan
type: internal
ovs_version: "2.5.0"

上面的信息主要包含这么几点:

qvo85bcbfb0-bc连接在桥br-int上;
br-int通过int-br-ex和phy-br-ex这对配对接口相连;
br-ex中有物理网卡enp4s0f1接口,如果enp4s0f1网口连接外部网络设备交换机,则可以实现虚拟机br-ex和外网直通;
br-int和br-tun通过patch-int和patch-tun这对配对接口相连;
br-tun中的接口vxlan-0a0a0a01通过ip tunnel以及内部IP,连接到集群中另外一台物理设备上,实现不同宿主机上虚拟机网络的互通。
  • OpenvSwitch主要是通过不同功能的交换机之间的相互连接,从而实现虚拟机和外部网络的通信,进一步实现SDN,即软件定义网络。

上文中一些网络接口可以这么理解:

    软件+ID

比如说:

  qbr85bcbfb0-bc:qbr为qemu bridge + ID
  tap85bcbfb0-bc:tap为tap设备  + ID
  qvb85bcbfb0-bc:qvb为qemu virtual bridge  +ID
  qvo85bcbfb0-bc:qvo为qemu virtual OpenvSwitch +ID

DHCP

先查看到DHCP的空间位置

1
2
3
[root@hyhive01 ~]# ip netns ls
qdhcp-624031b6-981c-480c-94ee-f9459d1c07ed
qdhcp-d72f0c1e-cc46-4d8f-a918-60c0088cc60a

选取上面一个DHCP查看详细信息

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
[root@hyhive01 ~]# ip netns exec qdhcp-624031b6-981c-480c-94ee-f9459d1c07ed ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
28: tap42681808-04: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN qlen 1000
link/ether fa:16:3e:1c:f9:38 brd ff:ff:ff:ff:ff:ff
inet 192.168.100.100/24 brd 192.168.100.255 scope global tap42681808-04
valid_lft forever preferred_lft forever
inet 169.254.169.254/16 brd 169.254.255.255 scope global tap42681808-04
valid_lft forever preferred_lft forever
inet6 fe80::f816:3eff:fe1c:f938/64 scope link
valid_lft forever preferred_lft forever

可以看到的是IP地址会从tap42681808-04发送出去,并且网段为192.168.100.0/24网段

查看OpenvSwitchtap42681808-04的信息

1
2
3
4
5
Bridge br-int
Port "tap42681808-04"
tag: 3
Interface "tap42681808-04"
type: internal

tap42681808-04打的标签为3号,新建虚拟机的时候选择网卡网段,对应的就是tag

目前笔者比较好奇的是:多节点情况下,不同节点的tag ID只是本地有效,节点之间通过vxlan连接,而不是像交换机直接用trunk连接,每台交换机的vlan ID是互通的。

VROUTER

先查看VROUTER的网络位置

1
2
3
[root@hyhive02 ~]# ip netns ls
qrouter-8800b853-2938-4ecc-89ce-a7189f78ef0c
qdhcp-0be33604-c845-4e71-9e88-ba1facec20e9

查看VROUTER详细信息

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
[root@hyhive02 ~]# ip netns exec qrouter-8800b853-2938-4ecc-89ce-a7189f78ef0c ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
15: qr-0fd24acc-81: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN qlen 1000
link/ether fa:16:3e:c0:e6:7e brd ff:ff:ff:ff:ff:ff
inet 192.168.0.1/24 brd 192.168.0.255 scope global qr-0fd24acc-81
valid_lft forever preferred_lft forever
inet6 fe80::f816:3eff:fec0:e67e/64 scope link
valid_lft forever preferred_lft forever
27: qg-ae1e7daf-58: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN qlen 1000
link/ether fa:16:3e:4c:e7:29 brd ff:ff:ff:ff:ff:ff
inet 192.168.10.9/24 brd 192.168.10.255 scope global qg-ae1e7daf-58
valid_lft forever preferred_lft forever
inet6 fe80::f816:3eff:fe4c:e729/64 scope link
valid_lft forever preferred_lft forever

查看VROUTER路由详细信息

1
2
3
4
5
6
[root@hyhive02 ~]# ip netns exec qrouter-8800b853-2938-4ecc-89ce-a7189f78ef0c route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
default gateway 0.0.0.0 UG 0 0 0 qg-ae1e7daf-58
192.168.0.0 0.0.0.0 255.255.255.0 U 0 0 0 qr-0fd24acc-81
192.168.10.0 0.0.0.0 255.255.255.0 U 0 0 0 qg-ae1e7daf-58

可以看到的是,路由器有两个接口,一个连接内部网络,图上的为192.168.0.0/24网段,一个为外部网络,图上为192.168.10.0/24,其中,内部网络可以连接多个,但是外部网络只能连接一个。外部网络可以是直接连接外网的br-ex网桥上的物理网卡接口;也可以是带有vlan ID的子网网段,相对应的,连接的外部网络设备上需要有带上该ID的vlan接口。

DHCP服务是如何跨国tag对虚拟机提供IP地址的?虚拟机的网络流量是如果可以从br-int流向br-ex再流向外网的?

先直接给答案:主要是通过iptables做转发,本篇主要讲解虚拟机通外网流量的流向,具体iptables是如何转发,通过iptables又可以实现哪些功能,后面还有一篇会详细讲解。