Skip to content

Commit

Permalink
Finalized timer error handling (I think)
Browse files Browse the repository at this point in the history
  • Loading branch information
bsiever committed Jun 14, 2020
1 parent 53c421a commit ed19c7c
Show file tree
Hide file tree
Showing 2 changed files with 79 additions and 96 deletions.
171 changes: 77 additions & 94 deletions timeanddate.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -11,9 +11,35 @@

using namespace pxt;


// Enable debugging or not:
// #define DEBUG 1
//#define DEBUG 1

//From: https://github.com/ARMmbed/nrf51-sdk/blob/master/source/nordic_sdk/components/drivers_nrf/delay/nrf_delay.h
static void __INLINE nrf_delay_us(uint32_t volatile number_of_us) __attribute__((always_inline));
static void __INLINE nrf_delay_us(uint32_t volatile number_of_us)
{
register uint32_t delay __ASM ("r0") = number_of_us;
__ASM volatile (
".syntax unified\n"
"1:\n"
" SUBS %0, %0, #1\n"
" NOP\n"
" NOP\n"
" NOP\n"
" NOP\n"
" NOP\n"
" NOP\n"
" NOP\n"
" NOP\n"
" NOP\n"
" NOP\n"
" NOP\n"
" NOP\n"
" BNE 1b\n"
".syntax divided\n"
: "+r" (delay));
}

namespace timeanddate
{
Expand All @@ -28,125 +54,82 @@ namespace timeanddate
static uint64_t totalUs = 0;
#ifdef DEBUG
uint32_t retries = 0;
static uint32_t lastLastUs = 0;
#endif
uint32_t currentUs;
// Continue to get ticker values until they are valid
while(true) {
uint32_t currentUs = us_ticker_read();

// Try a read
currentUs = us_ticker_read();
// If it was near the end of the phase, read again
// (avoid non-atomic access / race condition error)
while((us_ticker_read() & 0xFFFF) > 0xFFC0) {
// Ensure timer before last 64uS of cycle
nrf_delay_us(100);
currentUs = us_ticker_read();
}
uint32_t newUs = currentUs - lastUs;
// Note: There appears to be a race-condition. I think it's related to atomic access to the ticker value
// and occasionally results in ticker values decreasing (Perhaps related to https://github.com/ARMmbed/mbed-os/issues/4026)
// Following "normal" rollover of the ticker counter values also decrease, so we can't simply check if currentUs>lastUs (it won't be on rollover).
// Instead we check the magnitude of the result. If it represents a large number then we discard it and don't update the total count.
// The update period needs to ensure that there will be multiple attempts again before a high-magnitude value would be valid.
// See notes / comment at the bottom of this file
// Prior attempt:
// if(newUs<4294000000u) {

// Sanity check: only proceed if it's NOT negative
if(!(newUs & 0x80000000)) {
// Only add if it's positive / valid...
// Only add if it's positive (time increased) / valid NOT if the high bit is set)...
totalUs += newUs;
#ifdef DEBUG
lastLastUs = lastUs;
#endif
lastUs = currentUs;
break; // Leave the loop
break; // Leave the loop on valid update
}
#ifdef DEBUG
else {
retries++;
// TODO: Should there be a fiber_sleep(1) here?
// I'm assuming that a nearly immediate re-try will not
// be a problem...but not sure. Yup...That was wrong. Trying the fiber thingy.
// The read isn't valid. Try again soon.
fiber_sleep(0);
// uBit.serial.send("Error\ncurrentUs=");
// uBit.serial.send((int)currentUs);
// uBit.serial.send("\nlastUs=");
// uBit.serial.send((int)lastUs);
// uBit.serial.send("\n");
}
#ifdef DEBUG
retries++;
// Additional debugging info:
uBit.serial.send("\nError");
uBit.serial.send("\nlastLastUs=");
uBit.serial.send((int)lastLastUs);
uBit.serial.send("\nlastUs=");
uBit.serial.send((int)lastUs);
uBit.serial.send("\ncurrentUs=");
uBit.serial.send((int)currentUs);
uBit.serial.send("\n");
#endif

}
}
#ifdef DEBUG
if(retries!=0) {
uBit.serial.send("Retries=");
uBit.serial.send((int)retries);
uBit.serial.send("\nFinal\nlastUs=");
uBit.serial.send((int)lastUs);
uBit.serial.send("\ncurrentUs=");
uBit.serial.send((int)currentUs);
uBit.serial.send("\nlastLastUs=");
uBit.serial.send((int)lastLastUs);
uBit.serial.send("\n");
}
#endif

return totalUs / 1000000;
}
} // namespace timeanddate



/*
The hypothesis is that failures happens by non-atomic access to a counter that rolls
over at a 16-bit boundary (because the examples below were all off by ~65k off)
Example Failure Cases for further review are below. These were caught by specifically looking for "large" failures via an if-statement.
Other small failures may have been missed. These show the values of the last timestamp before the error, currentUs, lastUs, the timestamp
after the error, and the difference between currentUs and lastUs (which whould be positive and ~65k uS / 2s at the sample rate being used)
0000-01-01 00:01.07
Error
currentUs=67175591
lastUs=67239935
0000-01-01 00:01.08
currentUs - lastUs => -64344
0000-01-01 00:01.22
Error
currentUs=82576484
lastUs=82640895
0000-01-01 00:01.23
currentUs - lastUs => -64411
The negative value could be caused by a rollover from one byte to the next being accessed non-atomically. For example, consider two successive
reads of a mult-byte piece of data that is divided into two halves, a HIGH and a LOW:
HIGH LOW
Imagine they are read and then approximately 65k uS later (2s or 65k counts) they are read again.
If the "LOW" has rolled over from a high value (0xFF...) to a low value (0x00...) but the carry has not
yet been added to HIGH when the read occurs, then the value of the second read would be lower than the
value of the first read, leading to an inconsistent result. The difference in values would be negative
rather than a modest positive value.
This may be related to: https://github.com/ARMmbed/mbed-os/issues/4026
Consider:
Sequence of 2-part reads
95 50
96 00 <- Invalid read. Read as 95 00
96 50
97 00
last = 95 50
current = 95 00 <- Error detected; No update to overall timer
(should have added 50)
last = 95 00
current = 96 50 <- Valid --- Adds 150 (should be up a net of 100, not 150)
If instead
last = 95 50
current = 96 50 <- Adds 100 (correct)
Here's the us_ticker_read():
https://github.com/lancaster-university/mbed-classic/blob/427f56863c9f9a9132a213ec583469dfc61c6244/targets/hal/TARGET_NORDIC/TARGET_MCU_NRF51822/us_ticker.c
Returns:
TMR1_UNITS_TO_MICROSECONDS(tmr1_getCounter64());
Which expands to:
tmr1_getCounter64() (Yeah, that's dumb / unnesc)
O.k....The 24-bit counter will overflow every 512s, so checks need to be done at least that often.
Reading the counter is done in: https://github.com/ARMmbed/mbed-hal-nrf51822-mcu/blob/master/source/us_ticker.c 's rtc1_getCounter64()
(it was called by us_ticker_read()). That clearly has a race or atomicity error. It checks overflow then gets the count (while the counter is
still running!!!!)
And tmr1_getCounter64() is an inline that returns:
return (((uint64_t)(overflowCount)) << 16) | (NRF_TIMER1->CC[2] & MAX_TMR1_COUNTER_VAL);
So, errors can happen every 512s (8.5min). They are due to a small atomicity problem. The error will occur in the lower 24-bits returned from rtc1_getCounter64()
being inconsistent (the total value won't appear to increase monotonically due to the non-atomic access at rollover. The Carry won't be added
to the upper bits yet while the counter will have rolled over)
NOTE THE 16-bit BOUNDARY!!!!
The problem would always exhibit as non-monotonic counts (a sample being lower than its predecessor), which is being detected / filtered out.
It's taking ~7ms to resolve (I think based on counts when fiber_sleep(1) is used). Which is a weird number.
*/
This is a non-atomic read problem. Just reject reads if NRF_TIMER1->CC[2] & MAX_TMR1_COUNTER_VAL is close to the
boundary (and of the phase / wrap around).
*/
4 changes: 2 additions & 2 deletions timeanddate.ts
Original file line number Diff line number Diff line change
Expand Up @@ -211,7 +211,7 @@ namespace timeanddate {
leap = isLeapYear(y)
}

// sSinceStartOfYear and leap are now for "y", not "year". Don't yes "year"! Use "y"
// sSinceStartOfYear and leap are now for "y", not "year". Don't use "year"! Use "y"
// Find elapsed days
const daysFromStartOfYear = Math.idiv(sSinceStartOfYear, (24 * 60 * 60)) + 1 // +1 offset for 1/1 being day 1
const secondsSinceStartOfDay = sSinceStartOfYear % (24 * 60 * 60)
Expand Down Expand Up @@ -292,7 +292,7 @@ namespace timeanddate {
* @param day the day of the month 1-31
* @param the year 2020-2050
*/
//% block="set date to | Month $month | / Day $day | / Year $year"
//% block="set date to | month $month | / day $day | / year $year"
//% month.min=1 month.max=12 month.defl=1
//% day.min=1 day.max=31 day.defl=20
//% year.min=2020 year.max=2050 year.defl=2020
Expand Down

0 comments on commit ed19c7c

Please sign in to comment.