[ntp:questions] Regular spike

Discussion:

[ntp:questions] Regular spike_detect on syslog

Sean Austin Critica

2018-09-14 08:50:47 UTC

Hello,

I'm new to configuring NTP, and am having a bit of trouble figuring out why
there are regular spike_detect messages on my syslog.

I already checked the connection to the NTP server and it is stable with no
packet drops.
Additionally, that same NTP server is used by another machine on the same
network and it is synchronizing fine. So it seems isolated to this
particular machine.

I find one particular sequence curious. From 17:30 to 18:53, there was a
positive spike, followed by a negative spike, followed by a positive spike
interleaved with clock_sync.
Can you give me some hints regarding what may be causing this?

Jun 5 14:48:32 info localhost ntpd[13659]: Listen normally on 48 bond0
10.119.218.249 UDP 123
Jun 5 14:48:32 info localhost ntpd[13659]: Listen normally on 49 lo ::1
UDP 123
Jun 5 14:48:32 info localhost ntpd[13659]: Listen normally on 50 bond0
fe80::217:a4ff:fe77:c38 UDP 123
Jun 5 14:48:32 info localhost ntpd[13659]: Listening on routing socket on
fd #68 for interface updates
Jun 5 14:48:32 info localhost ntpd[13659]: 0.0.0.0 c016 06 restart
Jun 5 14:48:32 info localhost ntpd[13659]: 0.0.0.0 c012 02 freq_set
kernel 0.059 PPM
Jun 5 14:50:21 info localhost ntpd[13659]: 0.0.0.0 c615 05 clock_sync
Jun 5 15:03:36 info localhost ntpd[13659]: 0.0.0.0 0618 08 no_sys_peer
Jun 5 15:07:55 info localhost ntpd[13659]: 0.0.0.0 0613 03 spike_detect
+0.269228 s
Jun 5 15:16:37 info localhost ntpd[13659]: 0.0.0.0 061c 0c clock_step
+0.303049 s
Jun 5 15:16:37 info localhost ntpd[13659]: 0.0.0.0 0614 04 freq_mode
Jun 5 15:16:38 info localhost ntpd[13659]: 0.0.0.0 c618 08 no_sys_peer
Jun 5 15:31:55 info localhost ntpd[13659]: 0.0.0.0 c612 02 freq_set
kernel -186.040 PPM
Jun 5 15:31:55 info localhost ntpd[13659]: 0.0.0.0 c615 05 clock_sync
Jun 5 15:42:11 info localhost ntpd[13659]: 0.0.0.0 0613 03 spike_detect
+0.147942 s
Jun 5 15:56:51 info localhost ntpd[13659]: 0.0.0.0 0618 08 no_sys_peer
Jun 5 16:02:50 info localhost ntpd[13659]: 0.0.0.0 061c 0c clock_step
+0.338653 s
Jun 5 16:02:50 info localhost ntpd[13659]: 0.0.0.0 0615 05 clock_sync
Jun 5 16:02:51 info localhost ntpd[13659]: 0.0.0.0 c618 08 no_sys_peer
Jun 5 16:06:14 info localhost ntpd[13659]: 0.0.0.0 0628 08 no_sys_peer
Jun 5 17:30:59 info localhost ntpd[13659]: 0.0.0.0 0613 03 spike_detect
+0.130410 s
Jun 5 17:33:09 info localhost ntpd[13659]: 0.0.0.0 0615 05 clock_sync
Jun 5 17:47:21 info localhost ntpd[13659]: 0.0.0.0 0613 03 spike_detect
-0.136758 s
Jun 5 18:01:30 info localhost ntpd[13659]: 0.0.0.0 0615 05 clock_sync
Jun 5 18:53:59 info localhost ntpd[13659]: 0.0.0.0 0613 03 spike_detect
+0.165088 s
Jun 5 19:01:40 info localhost ntpd[13659]: 0.0.0.0 0618 08 no_sys_peer
Jun 5 19:03:51 info localhost ntpd[13659]: 0.0.0.0 061c 0c clock_step
+0.306834 s
Jun 5 19:03:51 info localhost ntpd[13659]: 0.0.0.0 0615 05 clock_sync
Jun 5 19:03:52 info localhost ntpd[13659]: 0.0.0.0 c618 08 no_sys_peer
Jun 5 19:31:08 info localhost ntpd[13659]: 0.0.0.0 0628 08 no_sys_peer
Jun 5 19:58:28 info localhost ntpd[13659]: 0.0.0.0 0613 03 spike_detect
+0.130703 s
Jun 5 20:05:00 info localhost ntpd[13659]: 0.0.0.0 061c 0c clock_step
+0.281475 s
Jun 5 20:05:00 info localhost ntpd[13659]: 0.0.0.0 0615 05 clock_sync
Jun 5 20:05:01 info localhost ntpd[13659]: 0.0.0.0 c618 08 no_sys_peer
Jun 5 20:34:30 info localhost ntpd[13659]: 0.0.0.0 0613 03 spike_detect
-0.167769 s
Jun 5 20:36:40 info localhost ntpd[13659]: 0.0.0.0 0618 08 no_sys_peer
Jun 5 20:40:00 info localhost ntpd[13659]: 0.0.0.0 0615 05 clock_sync
Jun 5 20:53:04 info localhost ntpd[13659]: 0.0.0.0 0613 03 spike_detect
+0.184006 s
Jun 5 21:05:06 info localhost ntpd[13659]: 0.0.0.0 061c 0c clock_step
+0.415902 s
Jun 5 21:05:06 info localhost ntpd[13659]: 0.0.0.0 0615 05 clock_sync
Jun 5 21:05:07 info localhost ntpd[13659]: 0.0.0.0 c618 08 no_sys_peer
Jun 5 21:08:28 info localhost ntpd[13659]: 0.0.0.0 0628 08 no_sys_peer
Jun 5 21:32:26 info localhost ntpd[13659]: 0.0.0.0 0638 08 no_sys_peer
Jun 5 22:38:01 info localhost ntpd[13659]: 0.0.0.0 0613 03 spike_detect
+0.132145 s

Regards,
Sean Critica

Mike Cook

2018-09-15 20:55:04 UTC

Permalink

Post by Sean Austin Critica
Hello,
I'm new to configuring NTP, and am having a bit of trouble figuring out why
there are regular spike_detect messages on my syslog.
I already checked the connection to the NTP server and it is stable with no
packet drops.
Additionally, that same NTP server is used by another machine on the same
network and it is synchronizing fine. So it seems isolated to this
particular machine.
I find one particular sequence curious. From 17:30 to 18:53, there was a
positive spike, followed by a negative spike, followed by a positive spike
interleaved with clock_sync.
Can you give me some hints regarding what may be causing this?
<snip
kernel 0.059 PPM
Jun 5 14:50:21 info localhost ntpd[13659]: 0.0.0.0 c615 05 clock_sync
Jun 5 15:03:36 info localhost ntpd[13659]: 0.0.0.0 0618 08 no_sys_peer

I think that it is probably your servers clock. NTP is unable to decide if a potential peer is stable enough.

Post by Sean Austin Critica
Jun 5 15:07:55 info localhost ntpd[13659]: 0.0.0.0 0613 03 spike_detect
+0.269228 s
Jun 5 15:16:37 info localhost ntpd[13659]: 0.0.0.0 061c 0c clock_step
+0.303049 s

+64 PPM if my sums are OK

Post by Sean Austin Critica
Jun 5 15:16:37 info localhost ntpd[13659]: 0.0.0.0 0614 04 freq_mode
Jun 5 15:16:38 info localhost ntpd[13659]: 0.0.0.0 c618 08 no_sys_peer
Jun 5 15:31:55 info localhost ntpd[13659]: 0.0.0.0 c612 02 freq_set
kernel -186.040 PPM

Now -ve

This is anomalous . Not yet outside NTPs limit of 500ppm but modern hardware is usually much better.
Maybe your server is frequency shifting . Check your BIOS settings .
Are you using VMWARE or something like it. If it is not configured correctly virtual systems can get duff clock states.

< snip rest >

Post by Sean Austin Critica
Regards,
Sean Critica
_______________________________________________
questions mailing list
http://lists.ntp.org/listinfo/questions

People have only as much liberty as they have the intelligence to want and the courage to take.

Emma Goldman

Jan Ceuleers

2018-09-16 07:29:42 UTC

Permalink

Post by Mike Cook
Maybe your server is frequency shifting . Check your BIOS settings .

Sean: plug "bios spread spectrum" into your favourite search engine for
more info.
You want to disable that for accurate timekeeping.

Jan Ceuleers

2018-09-16 11:48:59 UTC

Permalink

I’m running on bare metal HP Gen9 blade. I have started looking at the
BIOS settings and as far as I can tell there are a lot of options that
affect CPU frequency and they are turned on.

I will check tomorrow if ‘Spread Spectrum’ is available and is turned
on as I don’t have access to the machine right now.

I assume that yours is a Class A device, where the need to resort to
spread spectrum is less (because the Class A EMC requirements aren't as
strict as the Class B requirements which apply to consumer devices), so
you may not find this option in the BIOS, in which case your problem is
caused by something else.

I do know that settings like Intel Turbo Boost, and dynamic power
control are on. Although HP support says that they are not aware of
any settings that impact NTP on the host.

Whereas these options do modulate the clock frequency, they do so with
the knowledge of the operating system. The OS is therefore able to take
that into account in its timekeeping (for example by selecting a
timekeeping clock source that remains constant notwithstanding Turbo
Boost and all the rest of it).

Jan Ceuleers

2018-09-16 14:34:24 UTC

Permalink

Can I directly observe these sources and see which ones are stable
(maybe by dumping them periodically, remotely from a machine with a
known stable clock)? The OS in this case is RedHat EL 7.

"Directly observe": no. But you can look at the kernel log to see which
clock source the OS has selected:

# dmesg | grep clocksource
[    0.000000] clocksource: refined-jiffies: mask: 0xffffffff
max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[    0.000000] clocksource: hpet: mask: 0xffffffff max_cycles:
0xffffffff, max_idle_ns: 79635855245 ns
[    0.053021] clocksource: jiffies: mask: 0xffffffff max_cycles:
0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.181027] clocksource: Switched to clocksource hpet
[    0.198695] clocksource: acpi_pm: mask: 0xffffff max_cycles:
0xffffff, max_idle_ns: 2085701024 ns
[    0.906180] clocksource: tsc: mask: 0xffffffffffffffff max_cycles:
0x2879c5f06f2, max_idle_ns: 440795220049 ns
[    1.931401] clocksource: Switched to clocksource tsc

You can then look in your CPU's manual to determine whether the selected
clocksource is influenced by SpeedStep (or whatever dynamic CPU speed
changes are called these days).

HTH, Jan

Mike Cook

2018-09-16 15:14:10 UTC

Permalink

People have only as much liberty as they have the intelligence to want and the courage to take.

Emma Goldman

Mike Cook

2018-09-16 16:42:59 UTC

Permalink

Post by Jan Ceuleers

Can I directly observe these sources and see which ones are stable
(maybe by dumping them periodically, remotely from a machine with a
known stable clock)? The OS in this case is RedHat EL 7.

you might check out https://www.pantz.org/software/cpufreq/usingcpufreqonlinux.html

I don’t have Redhat but I expect similar commands are available for showing scaling info.
The webpage mentions a Redhead package cpuspeed.
You should be able to get speed change stats from that. Works for Debian with their cpufrequtils commands.

It would be useful to check that you do not have any processes other than ntpd running or scheduled by cron, that set the clock.
Also check the clock drift rate without ntpd running. Best to do this after cold boot in case previous frequency mods are active.

Post by Jan Ceuleers
"Directly observe": no. But you can look at the kernel log to see which
# dmesg | grep clocksource
[ 0.000000] clocksource: refined-jiffies: mask: 0xffffffff
max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
0xffffffff, max_idle_ns: 79635855245 ns
0xffffffff, max_idle_ns: 7645041785100000 ns
[ 0.181027] clocksource: Switched to clocksource hpet
0xffffff, max_idle_ns: 2085701024 ns
0x2879c5f06f2, max_idle_ns: 440795220049 ns
[ 1.931401] clocksource: Switched to clocksource tsc
You can then look in your CPU's manual to determine whether the selected
clocksource is influenced by SpeedStep (or whatever dynamic CPU speed
changes are called these days).
HTH, Jan
_______________________________________________
questions mailing list
http://lists.ntp.org/listinfo/questions

People have only as much liberty as they have the intelligence to want and the courage to take.

Emma Goldman