[ntp:questions] PPS not working on newer kernel/distros

Discussion:

Brandon Applegate

2018-10-08 23:26:45 UTC

Hello,

For years I’ve had ntpd+NMEA+PPS working great. My OS was Ubuntu server 12.04 - and then (currently) 14.04. GPS is a Garmin LVC 18x.

I use setserial to set low_latency, I’ve reduced the NMEA sentence to bare minimum, etc. All the usual tweaks. All of this going into a real UART serial PCI slot board (no USB) on a full size PC.

My relevant ntp.conf for this:

server 127.127.20.0 mode 1 minpoll 4 maxpoll 4
fudge 127.127.20.0 flag1 1 flag2 0 flag3 1 time2 0.600
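
For readers unfamiliar with the NMEA refclock (driver type 20), here is a minimal sketch that decodes the "mode" value and the poll exponent from the two lines above. The bit meanings are taken from the ntpd driver20 documentation as I understand it, so treat the tables as assumptions and check them against the docs for your ntpd version. The helper names are made up for illustration.

```python
# Sketch only: decode the NMEA refclock "mode" value and a poll exponent.
# The bit layout below is an assumption based on the driver20 documentation.

SENTENCES = {1: "$GPRMC", 2: "$GPGGA", 4: "$GPGLL", 8: "$GPZDA"}
BAUD = {0: 4800, 16: 9600, 32: 19200, 48: 38400, 64: 57600, 80: 115200}


def decode_mode(mode):
    """Return (selected sentences, line speed) for a driver20 mode value."""
    # Bits 0-3 select sentences; an empty list means no sentence filter is set.
    sentences = [name for bit, name in SENTENCES.items() if mode & bit]
    # Bits 4-6 select the serial line speed.
    baud = BAUD.get(mode & 0x70)
    return sentences, baud


def poll_seconds(exponent):
    """minpoll/maxpoll are exponents of two, in seconds."""
    return 2 ** exponent


print(decode_mode(1))   # mode 1 from the config above
print(poll_seconds(4))  # minpoll 4 / maxpoll 4
```

With the values from the config above, this reports $GPRMC at 4800 baud and a 16 second poll interval. In the fudge line, flag1 1 turns on PPS processing in the driver and flag3 1 asks for the kernel PPS discipline, which is the setting discussed in the rest of this thread.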

I decided to try to upgrade, and hit nothing but roadblocks. So far I’ve tried both Debian 9 and Ubuntu 18.04. What I observe is that my GPS offset steadily climbs and climbs. It never seems to decrease. I’ve also tried the above config with flag3 set to 0 (“soft” PPS). In all of these cases I’ve verified with ppswatch that I have pps coming in on the port. It really seems like something has changed in the kernel in the past few years that’s causing this. It’s about the only thing I can think of as a variable. I’ve tried distro ntpd packages as well as compiling a few versions of the latest source. All with the same behavior.

Anyone have any ideas what could cause this behavior ?

--
Brandon Applegate - CCIE 10273
PGP Key fingerprint:
0641 D285 A36F 533A 73E5 2541 4920 533C C616 703A
"For thousands of years men dreamed of pacts with demons.
Only now are such things possible."

Joachim Fabini

2018-10-09 09:43:10 UTC

Hello Brandon,

Which version of ntp is deployed on your system?
Any relevant/suspect output in syslog when restarting ntpd?
What does ntpq report?

First guess: please remember http://bugs.ntp.org/show_bug.cgi?id=3367 -
this was staged (only) for ntp-4.2.8p11. Make sure that either your ntp
has the patch already or patch your sources manually.

br Joachim

Brandon Applegate

2018-10-09 17:04:05 UTC

Post by Joachim Fabini
Hello Brandon,
Which version of ntp is deployed on your system?
Any relevant/suspect output in syslog when restarting ntpd?
What does ntpq report?
First guess: please remember http://bugs.ntp.org/show_bug.cgi?id=3367 -
this was staged (only) for ntp-4.2.8p11. Make sure that either your ntp
has the patch already or patch your sources manually.
br Joachim

Hey Joachim,

Actually I think you helped me out a few years ago when I first found this issue. You gave me a patch which I put against 4.2.8p6 I believe at the time. I was getting the kcbind error on Ubuntu 14.04.

So yes, now that I’ve installed 4.2.8p12 (which includes the patch) I no longer get this error. The bad news though is that it seems like I never lock on to PPS. I never get an ‘o’ in my billboard. If I put fudge flag3 back to 0, I seem to be in business (I get ‘o’ quickly and my offset settles down to near 0). However (and I don’t have actual before/after graphs or hard data to confirm this), the offset seems to waver a bit more now than it did previously on Ubuntu 14.04 with fudge flag3 set to 1.

Nevertheless - I’d still like to understand a bit more about fudge flag 3 on this. It was working in Ubuntu 14.04, and doesn’t seem to work on Debian 9 (what I settled on now) or Ubuntu 18.04.

I’m not quite sure I understand what the real difference in this flag is ? Is one mode more accurate than the other ? Why did it work just fine in an older distro and no longer does ?

Also - thanks for your reply and help on this so far - it’s much appreciated.

--
Brandon Applegate - CCIE 10273
PGP Key fingerprint:
0641 D285 A36F 533A 73E5 2541 4920 533C C616 703A
"For thousands of years men dreamed of pacts with demons.
Only now are such things possible."

Brandon Applegate

2018-10-14 20:22:56 UTC

Post by Vitezslav Samel
In case of 'flag3 0' you are using user-level PPS handling, in case of
'flag3 1' and proper kernel support (and proper timepps.h header file
and properly compiled ntpd sources) you are using kernel PPS.
To have proper kernel PPS support you need to have timepps.h header
file from git://github.com/ago/pps-tools.git installed in the right
place and then ntpd sources recompiled. timepps.h file from this
repository has properly implemented the time_pps_kcbind() function.

Hey Vita,

I think I’m following what you are saying. I definitely have timepps.h in the right place and ntpd compilation / configuration confirms it’s found and happy.

What I’m experiencing is that it will run with flag3 set to 1 - but it never seems to ‘lock on’ to PPS. I’m wondering / asking if this could be a kernel issue. I know I’m not giving kernel versions etc. and I’m speaking in terms of linux distro version. I suppose I could nail down versions and try to dig to see if there’s PPS changes that have been made (for the worse) over the years. I was just hoping perhaps someone on this list would know offhand of a ‘smoking gun’ WRT linux kernel version / changes that has (adversely) affected kernel PPS.

Brandon Applegate

2018-10-14 21:59:49 UTC

Post by Brandon Applegate
What I’m experiencing is that it will run with flag3 set to 1 - but it never seems to ‘lock on’ to PPS. I’m wondering / asking if this could be a kernel issue. I know I’m not giving kernel versions etc. and I’m speaking in terms of linux distro version. I suppose I could nail down versions and try to dig to see if there’s PPS changes that have been made (for the worse) over the years. I was just hoping perhaps someone on this list would know offhand of a ‘smoking gun’ WRT linux kernel version / changes that has (adversely) affected kernel PPS.

Well it might seem that I was a bit hasty - and didn’t give ntpd time enough to sync on PPS. In the past this was a very rapid process. Since I’ve reinstalled and rebuilt it certainly takes longer. I’ve set flag 3 to 1 and it did sync on PPS (‘o’ in billboard). It seems to be settling down now.

Vitezslav Samel

2018-10-15 05:51:53 UTC

Hi!

What I’m experiencing is that it will run with flag3 set to 1 - but
it never seems to ‘lock on’ to PPS. I’m wondering / asking if this
could be a kernel issue. I know I’m not giving kernel versions etc.
and I’m speaking in terms of linux distro version. I suppose I
could nail down versions and try to dig to see if there’s PPS
changes that have been made (for the worse) over the years. I was
just hoping perhaps someone on this list would know offhand of a
‘smoking gun’ WRT linux kernel version / changes that has
(adversely) affected kernel PPS.

Well it might seem that I was a bit hasty - and didn’t give ntpd time
enough to sync on PPS. In the past this was a very rapid process.
Since I’ve reinstalled and rebuilt it certainly takes longer. I’ve
set flag 3 to 1 and it did sync on PPS (‘o’ in billboard). It seems
to be settling down now.

To see if you are using kernel PPS use 'ntpq -c kerninfo' and search
for line starting with 'kernel status:'

in case of userspace PPS:

kernel status: pll nano

in case of kernel PPS:

kernel status: pll ppsfreq ppstime ppssignal nano

Vita
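
The check described above can be scripted. This is a minimal sketch, assuming the 'kernel status:' line looks like the two examples in this message. The function names are made up for illustration, and the set of PPS flags is simply the difference between those two example lines.

```python
# Sketch only: classify 'ntpq -c kerninfo' output as kernel PPS or not,
# based on the flags shown on the 'kernel status:' line.

PPS_FLAGS = {"ppsfreq", "ppstime", "ppssignal"}


def kernel_status_flags(kerninfo_text):
    """Return the list of flags on the 'kernel status:' line, or []."""
    for line in kerninfo_text.splitlines():
        if line.strip().startswith("kernel status:"):
            return line.split(":", 1)[1].split()
    return []


def kernel_pps_active(kerninfo_text):
    """True only if all PPS-related flags are present."""
    return PPS_FLAGS.issubset(kernel_status_flags(kerninfo_text))


# The two example lines from this message.
print(kernel_pps_active("kernel status: pll nano"))
print(kernel_pps_active("kernel status: pll ppsfreq ppstime ppssignal nano"))
```

To run it against a live daemon, the text could come from something like subprocess.run(["ntpq", "-c", "kerninfo"], capture_output=True, text=True).stdout, assuming ntpq is on the PATH and ntpd is answering queries.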

Brandon Applegate

2018-10-15 11:31:02 UTC

Post by Vitezslav Samel
ntpq -c kerninfo

Hmm, thanks for the tip. I guess even though I have flag3 set to 1 I’m 'falling back' to userland PPS ?

***@ice:~# ntpq -c kerninfo | grep ^kernel
kernel status: pll nano

So would this be a kernel issue causing this ?

--
Brandon Applegate - CCIE 10273
PGP Key fingerprint:
0641 D285 A36F 533A 73E5 2541 4920 533C C616 703A
"For thousands of years men dreamed of pacts with demons.
Only now are such things possible."

Miroslav Lichvar

2018-10-15 12:01:42 UTC

Post by Vitezslav Samel
ntpq -c kerninfo

Hmm, thanks for the tip. I guess even though I have flag3 set to 1 I’m 'falling back' to userland PPS ?

It's not a userland PPS. The PPS timestamps are still from the kernel.
The difference is in the clock discipline. Instead of kernel using the
PPS samples directly to control the clock, they are processed by ntpd
and fed back to the kernel PLL/FLL loop at the normal update interval.

kernel status: pll nano
So would this be a kernel issue causing this ?

The kernel PPS discipline is not supported in the NOHZ "tickless"
configuration, which I think is enabled in all major Linux
distributions.
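
One way to check this on a given machine is to look at the kernel build configuration. The sketch below parses a kernel .config style text and reports two options. The option names CONFIG_NO_HZ_COMMON and CONFIG_NTP_PPS are my understanding of the mainline Kconfig names, and the /boot/config-<release> path is an assumption that holds on Debian and Ubuntu but not everywhere. The function names are made up for illustration.

```python
# Sketch only: inspect a kernel config for tickless mode and NTP PPS support.
# Option names and the config file path are assumptions, see the note above.
import platform


def parse_kconfig(text):
    """Map option name to value; '# ... is not set' lines are skipped."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            options[name] = value
    return options


def pps_summary(options):
    """Report whether the kernel is tickless and has the PPS discipline."""
    return {
        "tickless": options.get("CONFIG_NO_HZ_COMMON") == "y",
        "ntp_pps": options.get("CONFIG_NTP_PPS") == "y",
    }


def load_running_kernel_config():
    """Read the config of the running kernel (Debian/Ubuntu style path)."""
    with open(f"/boot/config-{platform.release()}") as handle:
        return handle.read()


# Hypothetical config excerpt, not taken from any real system.
sample = "CONFIG_NO_HZ_COMMON=y\n# CONFIG_NTP_PPS is not set\nCONFIG_PPS=y\n"
print(pps_summary(parse_kconfig(sample)))
```

If the summary shows tickless as true and ntp_pps as false, that would be consistent with the 'kernel status: pll nano' output seen earlier in the thread even with flag3 set to 1.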

AFAIK the only thing that has changed in the last few years is how
ntpd handles the error when the PPS discipline cannot be enabled. It
used to be ignored.

--
Miroslav Lichvar