From 9642cd7ae24f0ba79ce5647c709b35ae8f06a285 Mon Sep 17 00:00:00 2001
From: Jasper Ras
Date: Sun, 19 Jan 2025 21:14:51 +0100
Subject: vault backup: 2025-01-19 21:14:51

---
 .../GroupVPS Platform/Add new provider networks.md | 112 +++++++++++++++++++++
 .../Backup service/Backup verwijderen faalt.md     |  52 ++++++++++
 .../Compute VPS2-LEJ1 is mixed.md                  |   8 ++
 .../Issues/High storage load 05-12-2024.md         |  15 +++
 .../GroupVPS Platform/Maintenance/10-12-2024.md    |   5 +
 2 Areas/GroupVPS Platform/OVN.md                   |   4 +
 2 Areas/GroupVPS Platform/Our image updater.md     |   9 ++
 7 files changed, 205 insertions(+)
 create mode 100644 2 Areas/GroupVPS Platform/Add new provider networks.md
 create mode 100644 2 Areas/GroupVPS Platform/Backup service/Backup verwijderen faalt.md
 create mode 100644 2 Areas/GroupVPS Platform/Compute VPS2-LEJ1 is mixed.md
 create mode 100644 2 Areas/GroupVPS Platform/Issues/High storage load 05-12-2024.md
 create mode 100644 2 Areas/GroupVPS Platform/Maintenance/10-12-2024.md
 create mode 100644 2 Areas/GroupVPS Platform/OVN.md
 create mode 100644 2 Areas/GroupVPS Platform/Our image updater.md

(limited to '2 Areas/GroupVPS Platform')

diff --git a/2 Areas/GroupVPS Platform/Add new provider networks.md b/2 Areas/GroupVPS Platform/Add new provider networks.md
new file mode 100644
index 0000000..cd437fd
--- /dev/null
+++ b/2 Areas/GroupVPS Platform/Add new provider networks.md
@@ -0,0 +1,112 @@
+#openstack #network
+# Schematic on switch network
+ ![[Switch-network]]
+# Procedure
+Kevin configures the switches so that the public network is routed to the correct private network and sets up a VLAN.
+
+By now we should have a VLAN tag and a private subnet to use; in this example, VLAN tag 150 and subnet 10.8.4.0/24.
+# Make sure VLAN interface exists on network node
+> For new network nodes this is done with Ansible; however, for fear of disrupting live traffic we prefer to add additional ones on existing nodes by hand.
+
+Check whether an interface exists on the bond for the given VLAN (e.g. `bond0.150` for VLAN tag 150).
+
+If not, add an entry in `/etc/network/interfaces` so it survives reboots:
+```
+auto bond0.150
+iface bond0.150 inet manual
+    vlan-raw-device bond0
+```
+Then bring the interface up with `sudo ifup bond0.150`, 150 being the VLAN tag we've been given.
+# Create switch network on OpenStack
+Define the switch network and OVN mapping in hieradata. Make sure to run Puppet on the relevant controllers and network nodes.
+```YAML
+# group/os-onecom-os1.yaml
+profile::openstack::neutron::controller::networks:
+  switch-network-vps4-cph8:
+    provider_network_type: flat
+    provider_physical_network: switch-network-vps4-cph8
+    router_external: true
+    shared: false
+    project_id: bb8fd38613c6464e8c00cbc332e2c67d
+
+# domain/network.env.vps4-cph8.one.com.yaml
+profile::openstack::neutron::ovn::controller::bridge_interface_mappings:
+  - 'ext-br150:bond0.150'
+profile::openstack::neutron::ovn::controller::ovn_bridge_mappings:
+  - 'switch-network-vps4-cph8:ext-br150'
+```
+
+> When adding an external or public network, OpenStack automatically creates an RBAC policy that allows any project to access it. Make sure it is removed: `openstack network rbac list --target-project '*'` will contain an entry with **object type network**. Show it, make sure it's the switch network, and delete it.
+
+> The Puppet module used for creating networks assigns the largest possible MTU to a network. We require it to be set to 1500. After changing the MTU to 1500, disable and re-enable DHCP so that the DHCP server also picks up the new configuration.
+# Create switch subnets on OpenStack
+Once that's taken care of we can add the switch subnets to Neutron via hieradata, usually a group YAML (e.g. `group/os-onecom-os1`)
+```yaml
+profile::openstack::neutron::controller::subnets:
+  switch-subnet-vps4-cph8-ipv4:
+    cidr: 10.8.4.0/24
+    ip_version: 4
+    allocation_pools: [ 'start=10.8.4.4,end=10.8.4.254' ]
+    gateway_ip: 10.8.4.1
+    network_name: switch-network-vps4-cph8
+    project_id: bb8fd38613c6464e8c00cbc332e2c67d
+  switch-subnet-vps4-cph8-ipv6:
+    cidr: 2a02:2350:a:105::/64
+    ip_version: 6
+    allocation_pools: [ 'start=2a02:2350:a:105::4,end=2a02:2350:a:105::ffff' ]
+    gateway_ip: 2a02:2350:a:105::1
+    network_name: switch-network-vps4-cph8
+    project_id: bb8fd38613c6464e8c00cbc332e2c67d
+    ipv6_address_mode: dhcpv6-stateful
+    ipv6_ra_mode: dhcpv6-stateful
+```
+
+> We want the AZ reflected in the switch network name, as in the example above: "switch-network-vps4-cph8". Older switch networks do not yet follow this convention.
+
+> We reserve the first three addresses and the last one in a given /24, hence the allocation pool starts at .4 and ends at .254. These IPs are reserved for routers & switches; for example, .1 is assigned to the gateway.
+
+> When running Puppet on the controller node to create the subnet, it may complain that the subnet overlaps with another. This can happen when another controller runs Puppet at the same time and creates the subnet before your run.
+
+# Create network and subnet
+In the correct group YAML we define the actual network and subnet that are to be used by tenants.
+Example (`group/os-onecom-os1.yml`)
+```
+profile::openstack::neutron::controller::networks:
+  hostnet_185_95_25:
+    router_external: false
+    shared: false
+    project_id: 5e9dbdce473543e093fb90c3db5cd8f3
+
+profile::openstack::neutron::controller::subnets:
+  hostnet_185_95_25_ipv4:
+    allocation_pools:
+      - start=185.95.25.2,end=185.95.25.254
+    cidr: 185.95.25.0/24
+    dns_nameservers:
+      - 91.184.1.11
+      - 91.184.8.21
+    gateway_ip: 185.95.25.1
+    ip_version: '4'
+    network_name: hostnet_185_95_25
+    project_id: 5e9dbdce473543e093fb90c3db5cd8f3
+```
+
+> Make sure these are added to the correct tenant project.
+
+# Create router on OpenStack
+`openstack router create --external-gateway switch-network-vps4-cph8 --fixed-ip subnet=switch-subnet-vps4-cph8-ipv4,ip-address=10.8.4.4 --fixed-ip subnet=switch-subnet-vps4-cph8-ipv6,ip-address='2a02:2350:a:105::4' --disable-snat switch-network-vps4-cph8`
+# Ensure reverse DNS zone
+We should make sure the reverse DNS zone is added to the `service.g1-dns.one` zone so that PTR records can be added via SysAPI.
+```shell
+~
+❯ dig +short 25.95.185.in-addr.arpa DS
+4550 13 4 6BFEE8B7692B15EC8EE01C17CF3F7FDD68F2F4A7581B7606A0CDB44A BDFE7BB171763C66938DFB285D4BF8680EA81B74
+4550 13 2 ADC65456F034323B3F1F3F010E637A04AB78B59D0176BE2B17702626 22B3AA39
+
+~
+❯ dig +short 25.95.185.in-addr.arpa SOA
+auth.g1-dns.one. hostmaster.one.com. 2024011601 1800 900 1209600 300
+```
+This should be `service.g1-dns.one` rather than `auth`. We can make a ticket in SYSDNS to have it corrected; [example](https://group-one.atlassian.net/browse/SYSDNS-510).
+
+> Do mention in that ticket that we handle RIPE, to prevent them from asking :)
diff --git a/2 Areas/GroupVPS Platform/Backup service/Backup verwijderen faalt.md b/2 Areas/GroupVPS Platform/Backup service/Backup verwijderen faalt.md
new file mode 100644
index 0000000..c7c42a1
--- /dev/null
+++ b/2 Areas/GroupVPS Platform/Backup service/Backup verwijderen faalt.md
@@ -0,0 +1,52 @@
+#groupone #openstack #backup-service #bug
+
+---
+# Summary
+We only open the libvirt connection when the process starts, never afterwards. So once something closed the connection, every subsequent request fails.
+
+**Solution**
+Open connections and the like per request.
+
+# Investigation
+```shell
+2025-01-06 11:00:42.760 3562 INFO goba.cmd.agent [None req-3ac13f90-ebe4-482b-82ef-fded0df9be87 - - - - -] action='delete' type='backup' task_uuid='1957362f-44ec-475b-a5a1-96b53aa8be60'
+2025-01-06 11:00:42.763 3562 ERROR goba.cmd.agent [-] internal error: client socket is closed: libvirt.libvirtError: internal error: client socket is closed
+2025-01-06 11:00:42.763 3562 ERROR goba.cmd.agent Traceback (most recent call last):
+2025-01-06 11:00:42.763 3562 ERROR goba.cmd.agent   File "/usr/lib/python3/dist-packages/goba/cmd/agent.py", line 137, in execute
+2025-01-06 11:00:42.763 3562 ERROR goba.cmd.agent     execute_fn(ctx)
+2025-01-06 11:00:42.763 3562 ERROR goba.cmd.agent   File "/usr/lib/python3/dist-packages/goba/cmd/agent.py", line 192, in execute_fn
+2025-01-06 11:00:42.763 3562 ERROR goba.cmd.agent     backup.delete(req, self.storage, self.libvirt)
+2025-01-06 11:00:42.763 3562 ERROR goba.cmd.agent   File "/usr/lib/python3/dist-packages/goba/backup.py", line 332, in delete
+2025-01-06 11:00:42.763 3562 ERROR goba.cmd.agent     if not libvirt_client.is_running(instance_id):
+2025-01-06 11:00:42.763 3562 ERROR goba.cmd.agent   File "/usr/lib/python3/dist-packages/goba/adapters/libvirt.py", line 134, in is_running
+2025-01-06 11:00:42.763 3562 ERROR goba.cmd.agent     dom_state = self._get_domain(instance_id).state()[0]
+2025-01-06 11:00:42.763 3562 ERROR goba.cmd.agent   File "/usr/lib/python3/dist-packages/libvirt.py", line 3146, in state
+2025-01-06 11:00:42.763 3562 ERROR goba.cmd.agent     raise libvirtError('virDomainGetState() failed')
+2025-01-06 11:00:42.763 3562 ERROR goba.cmd.agent libvirt.libvirtError: internal error: client socket is closed
+2025-01-06 11:00:42.763 3562 ERROR goba.cmd.agent
+```
+
+```
+[jasras@n14.compute.vps1-lej1 ~]$ systemctl status goba
+● goba.service - Group.one OpenStack Backup Agent
+     Loaded: loaded (/lib/systemd/system/goba.service; enabled; vendor preset: enabled)
+     Active: active (running) since Thu 2024-12-12 06:26:50 UTC; 3 weeks 4 days ago
+       Docs: https://gitlab.group.one/groupvps/group-one-backup-agent
+   Main PID: 3562 (goba)
+      Tasks: 86 (limit: 4915)
+     Memory: 166.9M
+        CPU: 2h 23min 28.981s
+     CGroup: /system.slice/goba.service
+             ├─ 3562 /usr/bin/python3 /usr/bin/goba --config-file /etc/goba/goba.conf
+             └─14368 /usr/bin/python3 /usr/bin/privsep-helper --config-file /etc/goba/goba.conf --privsep_context goba.privsep.file_admin_pctxt --privsep_sock_path /tmp/tmpjfi8jt6b/privsep.sock
+```
+A privsep-helper is still running here.
+
+After restart:
+```
+     CGroup: /system.slice/goba.service
+             └─30932 /usr/bin/python3 /usr/bin/goba --config-file /etc/goba/goba.conf
+```
+
+Tasks now succeed.
+
diff --git a/2 Areas/GroupVPS Platform/Compute VPS2-LEJ1 is mixed.md b/2 Areas/GroupVPS Platform/Compute VPS2-LEJ1 is mixed.md
new file mode 100644
index 0000000..65977e9
--- /dev/null
+++ b/2 Areas/GroupVPS Platform/Compute VPS2-LEJ1 is mixed.md
@@ -0,0 +1,8 @@
+#compute #openstack
+
+---
+VPS2-LEJ1 is a mixed bag of shared and local storage;
+nodes 1-8 are BOTH shared and local storage
+nodes 9-10 are exclusively shared storage
+
+.. wap
diff --git a/2 Areas/GroupVPS Platform/Issues/High storage load 05-12-2024.md b/2 Areas/GroupVPS Platform/Issues/High storage load 05-12-2024.md
new file mode 100644
index 0000000..7d26c6a
--- /dev/null
+++ b/2 Areas/GroupVPS Platform/Issues/High storage load 05-12-2024.md
@@ -0,0 +1,15 @@
+#issue #groupone
+
+---
+Asked Allan for more information.
+Allan: it has been happening for 1.5 months already; https://group-onecom.slack.com/archives/C02FT9KEFNH/p1729863978525299
+
+That message shows that Jerry was going to check whether it is Acronis.
+
+Contacted Jeroen to quickly ask whether they have already done that; Jeroen thinks it is not caused by Acronis, because that is mostly read ops and spread out over the night.
+
+Allan sends a graph with a remark: "seems to involve alot of LUNs": `https://prometheus2.env.vps1-cph8.one.com/graph?g0.expr=count(lun_write_ops%20%3E%20(lun_write_ops%20offset%2020m%20%2B%20100))&g0.tab=0&g0.display_mode=lines&g0.show_exemplars=0&g0.range_input=30m&g0.end_input=2024-12-05%2005%3A47%3A35&g0.moment_input=2024-12-05%2005%3A47%3A35`
+
+Jeroen thought it might be MySQL dumps from shared, but those should also be spread out; he was going to check when they had run, haven't heard anything since.
+
+Based on the LUN graph I picked a few LUNs and they all turned out to be managed VPS; Jeroen is investigating what happens inside them around that time. He suspects PSA crons.
\ No newline at end of file
diff --git a/2 Areas/GroupVPS Platform/Maintenance/10-12-2024.md b/2 Areas/GroupVPS Platform/Maintenance/10-12-2024.md
new file mode 100644
index 0000000..c7f9824
--- /dev/null
+++ b/2 Areas/GroupVPS Platform/Maintenance/10-12-2024.md
@@ -0,0 +1,5 @@
+#maintenance
+
+---
+Manually updated the host/node in the instance record for instance `10924c62-7f0f-4df1-9dd8-9108e3cb0764`.
+Suspended guest: run `virsh dompmwakeup` and try again.
diff --git a/2 Areas/GroupVPS Platform/OVN.md b/2 Areas/GroupVPS Platform/OVN.md
new file mode 100644
index 0000000..f53b84c
--- /dev/null
+++ b/2 Areas/GroupVPS Platform/OVN.md
@@ -0,0 +1,4 @@
+#openstack #ovn
+
+---
+https://dani.foroselectronica.es/ovn-where-is-my-packet-665/g
\ No newline at end of file
diff --git a/2 Areas/GroupVPS Platform/Our image updater.md b/2 Areas/GroupVPS Platform/Our image updater.md
new file mode 100644
index 0000000..73eff81
--- /dev/null
+++ b/2 Areas/GroupVPS Platform/Our image updater.md
@@ -0,0 +1,9 @@
+#groupone #openstack
+
+---
+https://gitlab.group.one/groupvps/openstack-image-updater
+
+Updater runs in CI.
+
+`images.yaml`: contains the list of images.
+`cloud_images.yaml`: can override params for all images in a specific cloud; sadly it cannot override them per image.
-- 
cgit v1.2.3