Anthos 1.2 Time Sync issues on GKE On-Prem nodes

When you log in to your GKE-OP nodes, you might find that the time is synced with your ESXi host rather than with the time server configured in the DHCP options or static IP files used for provisioning the GKE-OP clusters.

This issue is actually related to Ubuntu 18.04 and to the settings of the systemd-timesyncd service.

To see whether you are experiencing the issue, run:

sudo SYSTEMD_LOG_LEVEL=debug /lib/systemd/systemd-timesyncd

ubuntu@gke-03-user0103:~$ sudo SYSTEMD_LOG_LEVEL=debug /lib/systemd/systemd-timesyncd
Failed to create state directory: Permission denied
ubuntu@gke-03-user0103:~$ sudo SYSTEMD_LOG_LEVEL=debug /lib/systemd/systemd-timesyncd
Added new server 172.20.101.25.
Added new server ntp.ubuntu.com.
Selected server 172.20.101.25.
Resolving 172.20.101.25…
Resolved address 172.20.101.25:123 for 172.20.101.25.
Selected address 172.20.101.25:123 of server 172.20.101.25.
Connecting to time server 172.20.101.25:123 (172.20.101.25).
Sent NTP request to 172.20.101.25:123 (172.20.101.25).
Server has too large root distance. Disconnecting.
Waiting after exhausting servers.

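Root cause: the root distance that timesyncd calculates for the NTP server is larger than the maximum it accepts (RootDistanceMaxSec, 5 seconds by default), so it disconnects from the server instead of syncing. Network delay on the path to the reference clock is one of the things that inflates this value.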
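To make the log message more concrete: as far as I understand the timesyncd source, the root distance is half of the root delay plus the root dispersion reported in the NTP reply, and the server is rejected when that value exceeds RootDistanceMaxSec. The sketch below only reproduces that arithmetic. The function name and the sample values are made up for illustration; they are not read from a real server.

```shell
# root_distance_ok ROOT_DELAY ROOT_DISPERSION MAX   (all in seconds)
# Prints the computed root distance and returns 0 if it is within MAX.
# Hypothetical helper: it mirrors the check, it does not query an NTP server.
root_distance_ok() {
  LC_ALL=C awk -v delay="$1" -v disp="$2" -v max="$3" 'BEGIN {
    dist = delay / 2 + disp          # root distance as timesyncd estimates it
    printf "%.3f\n", dist
    exit !(dist <= max)              # exit status 0 means "acceptable"
  }'
}

# Made-up sample values: 0.5 s root delay, 6 s root dispersion.
root_distance_ok 0.5 6 5  || echo "rejected with the default limit of 5 s"
root_distance_ok 0.5 6 20 && echo "accepted with RootDistanceMaxSec=20"
```

With these sample numbers the distance is 6.25 seconds, which fails the default limit but passes a limit of 20.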

Solution: There is no permanent fix, because the NTP settings are generated from the DHCP options or static IP files when the nodes are deployed. You can only apply a fix after the nodes are deployed, and the change is lost when you redeploy.

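To work around this issue, edit /etc/systemd/timesyncd.conf and set RootDistanceMaxSec=20 (you might need to experiment to find the sweet spot for your environment), then restart the systemd-timesyncd service so the change takes effect.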

sudo cat /etc/systemd/timesyncd.conf

[Time]
#NTP=
#FallbackNTP=ntp.ubuntu.com
RootDistanceMaxSec=20
#PollIntervalMinSec=32
#PollIntervalMaxSec=2048
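If you have to repeat this on several nodes, the edit can be scripted. This is a minimal sketch, not something shipped with GKE-OP: set_root_distance is a hypothetical helper name, and it assumes the file contains only the [Time] section, so appending the key at the end is safe when no RootDistanceMaxSec line exists yet.

```shell
# set_root_distance FILE SECONDS
# Uncomments/replaces RootDistanceMaxSec in FILE, or appends it if missing.
# Hypothetical helper; assumes FILE has a single [Time] section.
set_root_distance() {
  conf="$1"
  secs="$2"
  tmp="$(mktemp)" || return 1
  if grep -Eq '^#?RootDistanceMaxSec=' "$conf"; then
    # Replace the existing (possibly commented-out) setting.
    sed -E "s/^#?RootDistanceMaxSec=.*/RootDistanceMaxSec=${secs}/" "$conf" > "$tmp"
  else
    # No such line yet: keep the file and add the setting at the end.
    cat "$conf" > "$tmp"
    printf 'RootDistanceMaxSec=%s\n' "$secs" >> "$tmp"
  fi
  # Write back in place so ownership and permissions of FILE are kept.
  cat "$tmp" > "$conf"
  rm -f "$tmp"
}

# On a node, in a root shell (for example after `sudo -s`):
#   set_root_distance /etc/systemd/timesyncd.conf 20
#   systemctl restart systemd-timesyncd
```

The value 20 is the one used above; treat it as a starting point rather than a recommendation.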

Now check that the connection works:

ubuntu@gke-03-user0103:~$ sudo SYSTEMD_LOG_LEVEL=debug /lib/systemd/systemd-timesyncd
Added new server 172.20.101.25.

freq offset : +0 (0 ppm)
interval/delta/delay/jitter/drift 64s/+0.033s/0.001s/0.000s/+0ppm
Synchronized to time server 172.20.101.25:123 (172.20.101.25).
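When checking more than a couple of nodes, it helps to reduce the debug output to a single word. The sketch below keys off the two log messages shown in this post; classify_timesyncd_log and its three result strings are hypothetical names of my own, and other timesyncd versions may word the messages differently.

```shell
# classify_timesyncd_log
# Reads timesyncd debug output on stdin and prints one of:
#   synchronized | root-distance-rejected | unknown
# Hypothetical helper based only on the messages quoted in this post.
classify_timesyncd_log() {
  log="$(cat)"
  case "$log" in
    *"Synchronized to time server"*)        echo "synchronized" ;;
    *"Server has too large root distance"*) echo "root-distance-rejected" ;;
    *)                                      echo "unknown" ;;
  esac
}

# Possible use on a node. timesyncd keeps running in the foreground, so it is
# stopped after an arbitrary 30 seconds; this assumes the debug output ends up
# on the pipe:
#   sudo SYSTEMD_LOG_LEVEL=debug timeout 30 /lib/systemd/systemd-timesyncd 2>&1 \
#     | classify_timesyncd_log
```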

Summary of 2019

It has been a great year for me, with some major goals achieved. I am very thankful to everyone who made this come true!

Book – special thanks to my co-author @Brian Gerrad. Only you know the true cost of writing this book 🙂

Certifications – transcript

  • Professional Cloud Architect ’18
  • Professional Data Engineer ’18
  • Associate Cloud Engineer 
  • Professional Cloud Developer 
  • Professional Cloud Network Engineer
  • Professional Cloud Security Engineer 
  • Professional DevOps Engineer (Beta results pending)
  • There is one more that will be announced in January… cannot wait!

Conferences

  • BitConf Speaker – link
  • vBrownBags Speaker – link
  • Google Next San Francisco ’19
  • Google Next London ’19
  • Google Developer Group Leads Lisbon
  • GSI Champions Conference in Sunnyvale

Google Developer Group Cloud Bydgoszcz

5 Meetups this year with around 50 participants each!

  • 4 Onsite Meetups
  • 1 Online Meetup

AtoS

  • GSI Champion
  • Google Cloud Platform Learning Ambassador
  • Started developing Anthos on DPC/DHC
  • Decided to stay with the company despite an offer from one of my top 5 companies to work for.

Missed goals

  • GCP Certified Trainer – lack of time
  • Google Developer Expert – still building my portfolio
  • Cloud Guru Instructor – lack of time

Goals 2020

Problems creating pre-check VM in Anthos 1.2 GKE-OP

Anthos 1.2 introduces a new feature that creates a test VM to check connectivity before you deploy your GKE-OP clusters. It helps to avoid issues during the installation.

When installing GKE On-Prem by following the documentation at https://cloud.google.com/gke-on-prem/docs/how-to/install-dhcp, you validate your configuration with the following command:

gkectl check-config --config [PATH_TO_CONFIG]

You will get an error like the one below:

  • Validation Category: F5 BIG-IP
    • [FAILURE] Admin Cluster VIP and NodeIP: Failed to create VM: failed to create VM (not retriable): failed to find VM template "gke-on-prem-osimage-1.14.7-gke.24mage-1.14.7-gke.24-20191120-f71f9a709b' not found
    • [FAILURE] User Cluster VIP and NodeIP: Failed to create VM: failed to create VM (not retriable): failed to find VM template "gke-on-prem-osimage-1.14.7-gke.24-age-1.14.7-gke.24-20191120-f71f9a709b' not found

Root cause: This is caused by the OS image not being present on the datastore. The installation steps in the GCP docs are in the wrong order: they run the check before the step that puts the image there.

Solution: before running the check, run

gkectl prepare --config [CONFIG_FILE] --validate-attestations

After that, the test VMs get created and the connectivity checks can be performed.
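The working order of the two commands can be wrapped in a small script so the check never runs without the image in place. This is only a sketch: prepare_then_check and the GKECTL override variable are hypothetical names of my own, while the gkectl subcommands and flags are exactly the ones shown in this post.

```shell
# prepare_then_check CONFIG
# Runs `gkectl prepare` first and `gkectl check-config` only if that succeeded.
# Hypothetical wrapper; set GKECTL to point at a different gkectl binary.
prepare_then_check() {
  config="$1"
  gkectl_bin="${GKECTL:-gkectl}"

  # Step 1: prepare, so the OS image template exists on the datastore.
  "$gkectl_bin" prepare --config "$config" --validate-attestations || return 1

  # Step 2: the pre-check can now create its test VMs from that template.
  "$gkectl_bin" check-config --config "$config"
}

# Usage:
#   prepare_then_check [PATH_TO_CONFIG]
```

Stopping at the first failure keeps a broken prepare step from being hidden behind the template error from check-config.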