emonTxV3 3-phase monitor stops reporting after 20hours???

Hi,

I have loaded the emonTxV3_4_3Phase sketch on my emonTxV3, and 19-22 hours after a reset it simply stops reporting values to the Raspberry Pi. Transmission is via RFM69CW.

While monitoring is running, before it "hangs", everything is fine: the effective power values are in line with the kWh readings, and regular reports are transmitted and logged to emoncms.org.

 

CT1, CT2 and CT3 are deployed; CT4 is not used, so CT4 is commented out in the code and the corresponding adaptations proposed in the code for use without CT4 (Samplesdelay, Phasecal) have been made.

 

I have now tried pretty much all of the recommendations for dealing with the power instability of the grid here in Indonesia (reducing the transmit power of the radio module to -10 dBm, supplying additional 5 V DC power and removing jumper J2), but the result is always the same. Could there still be memory issues?

Setup is as follows:

emonTx V3.4

RFM69CW at 433MHz

9V AC power supply european type

sketch from:

https://github.com/openenergymonitor/emonTxFirmware/tree/master/emonTxV3...

raspberry Pi 2

Thanks for any suggestions or similar experience?

Gerry

 

JD's picture

Re: emonTxV3 3-phase monitor stops reporting after 20hours???

I have been running the previous sketch (emonTxV3_3Phase_Voltage_4CTs_Temp, no longer on GitHub) on an emonTXV3.3 for about 6 months without problems.

JD

emjay's picture

Re: emonTxV3 3-phase monitor stops reporting after 20hours???

@Gerry,

Do you have a "spare" device you can monitor the RF world with?  It will help to distinguish if the TX side has stopped sending or (more likely) the Rx side has stopped listening...

Gerry's picture

Re: emonTxV3 3-phase monitor stops reporting after 20hours???

Hi JD,

I can still find the sketch on GitHub.

I will give it a try in the next few days and let you know if it works better.

thanks

Gerry

Gerry's picture

Re: emonTxV3 3-phase monitor stops reporting after 20hours???

Hi emjay,

thanks for your comment.

The trouble is on the sending side, because the Raspberry Pi receiving the data does not need to be reset.

It resumes receiving after the emonTx is reset.

cheers gerry

Robert Wall's picture

Re: emonTxV3 3-phase monitor stops reporting after 20hours???

How accurate is your "19-22 hours"? Has it never failed below 19 hours and has it never run for more than 22? Do you always restart it at about the same time of day? Does it always fail at about the same time of day?

I've been running a copy now for nearly 23 hours and so far it's fine. Mine has an 868 MHz RFM69CW but it's working at 433 MHz.

Gerry's picture

Re: emonTxV3 3-phase monitor stops reporting after 20hours???

Hi Robert,

it could also be less than 19 hours. Often after a reset the monitoring keeps going for a few minutes only.

Attached file shows the reporting over the last week.

Yesterday, after another hang, I connected to the serial port of the emonTx and it was still writing the measured values regularly. So I conclude that the problem is located in the RFM69 transmission part.

Also, I enabled the anti-crash (restart) watchdog ( wdt_enable(WDTO_8S) ) in the code and uploaded it to the UNO.

Let's see if that helps to keep it running.

Gerry

 

 

 

Robert Wall's picture

Re: emonTxV3 3-phase monitor stops reporting after 20hours???

What I immediately notice is, mostly it seems to stop between 4:45 and 5 am on 8 successive days (as best as I can estimate). Can that really be a coincidence? It makes me think there could be an external influence that's provoking it.

The transmit code was lifted straight out of JeeLib (minus the receive part) and simplified only in that a few levels of subroutine nesting were no longer necessary. So it shouldn't be any different to transmit-only sketches using JeeLib, except that the transmit power setting is readily accessible.

I only noticed yesterday that I hadn't put the watchdog back in, but if it is the transmit code that's not behaving but not hanging, that won't help.

And mine is still running after 36 hours now.

pb66's picture

Re: emonTxV3 3-phase monitor stops reporting after 20hours???

Gerry, How have you determined it's "hanging". Is that confirmed or based just on the data seen in emonCMS?

You have found the emonTx is still running by connecting the programmer (although be wary of the order in which you do things, since most OSs seem to issue a reset on connection of the programmer, so you may be resetting it during connection).

You have proved it's not the receiver as restarting it has no effect.

Restarting the emonTx seems to reset things, so you conclude it's the rfm69. That may be the case, but unless you are sure no packets are being sent you cannot be 100% certain, and unless you have another receiver it is difficult to test.

But you can do a couple of simple checks eg is the rfm2pi LED continuing to flash while the emonTx is "hung" or are there any clues in the emonhub.log during or immediately before the time it "hung".

There have been one or two cases of devices (usually the rfm2pi, which has a different setup) losing their settings, with a reset restoring them, e.g. node ID, group ID, frequency etc. Could the emonTx still be sending packets that are not recognized as valid?

There is also a chance of different size intervals slowly rotating at different times so that every n mins/hours/days they block each other, do you have other devices transmitting? (maybe not on same network)

What type of feed are you using in emoncms? If it's a fixed-interval feed, there could be an effect similar to the interval rotation above: with the emonTx sending every 15 s and a fixed-interval feed saving every 10 s, some datapoints will have no data. This may not be obvious, as it only needs a couple of seconds' difference to occur. It can be tested for by adding a phptimeseries feed in front of the "suspect" feed to check the arrival times, etc.
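As a rough illustration of the interval mismatch, here is a small model that counts how many fixed-interval slots receive no packet at all. It is only a sketch under simple assumptions (first packet at t = 0, no jitter, no lost packets); the function name `countEmptySlots` is made up for this example and it does not reproduce how emoncms actually stores feeds.

```cpp
#include <vector>

// Count fixed-interval feed slots that receive no packet when a node sends
// every sendPeriodS seconds (first packet at t = 0) over durationS seconds.
int countEmptySlots(int sendPeriodS, int slotS, int durationS) {
    int numSlots = durationS / slotS;
    std::vector<bool> filled(numSlots, false);

    // Mark the slot that each transmission falls into.
    for (int t = 0; t < numSlots * slotS; t += sendPeriodS) {
        filled[t / slotS] = true;
    }

    int empty = 0;
    for (bool f : filled) {
        if (!f) ++empty;
    }
    return empty;
}
```

With a 15 s send period and 10 s slots, `countEmptySlots(15, 10, 60)` gives 2: the packets at 0, 15, 30 and 45 s land in four of the six slots, and the slots covering 20-30 s and 50-60 s stay empty.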

Paul

 

Robert Wall's picture

Re: emonTxV3 3-phase monitor stops reporting after 20hours???

Though both are specified down to 1.8 V, I did wonder whether there's a supply brown-out at around 5 am when something big switches on that's going down just enough to make the RFM69 lose its marbles but which is not enough to cause the Atmel 328 to reset?

dBC's picture

Re: emonTxV3 3-phase monitor stops reporting after 20hours???

Though both are specified down to 1.8 V...  supply brown-out... which is not enough to cause the Atmel 328 to reset?

The BOD on those AVR processors is pretty effective.  Hopefully the OEM production line doesn't disable it or set it low.  If they used the standard Arduino Uno board settings when flashing the bootloader, it should be set to 2.7V nominal (2.5V worst case).  That means the processor will be held in reset whenever the voltage drops below 2.7V and won't come out of reset until after it's above 2.7V.  You can check the contents of extended fuse bits to see what yours has actually been set to. 
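For anyone wanting to interpret the extended fuse byte, a minimal decoder might look like the sketch below. It assumes an ATmega328P, where BODLEVEL2..0 are the low three bits of the extended fuse; the function name `bodThresholdMillivolts` is invented for this example. On the device itself the byte could probably be read with avr-libc's `boot_lock_fuse_bits_get(GET_EXTENDED_FUSE_BITS)` from `<avr/boot.h>`, or with a programmer.

```cpp
#include <stdint.h>

// Nominal brown-out threshold in millivolts for an ATmega328P extended fuse
// byte. Returns 0 if the BOD is disabled and -1 for reserved settings.
int bodThresholdMillivolts(uint8_t extendedFuse) {
    switch (extendedFuse & 0x07) {   // BODLEVEL2..0 are the low three bits
        case 0x07: return 0;      // 111: BOD disabled
        case 0x06: return 1800;   // 110: 1.8 V nominal
        case 0x05: return 2700;   // 101: 2.7 V nominal (standard Uno setting)
        case 0x04: return 4300;   // 100: 4.3 V nominal
        default:   return -1;     // remaining combinations are reserved
    }
}
```

The unused upper bits read differently depending on the tool, so the standard Uno value may show up as either 0x05 or 0xFD; both decode to 2700 here.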

Robert Wall's picture

Re: emonTxV3 3-phase monitor stops reporting after 20hours???

If Gerry is right and it is the radio that's locking up, then as long as the processor stops and resets at a higher voltage than the radio, it should recover and do a full reset - shouldn't it? That I suppose could depend on what happens to the reset line, but the radio will get the required registers set by the sketch at start-up. What we don't know is whether any default values get corrupted.

Mine has now been running for close to 60 hours non-stop, as far as I know (I'm not logging the output). It's certainly not locked up so far.

dBC's picture

Re: emonTxV3 3-phase monitor stops reporting after 20hours???

I'm not at all familiar with the radio module, but there'll be no indication on the /RESET line to say the BOD has reset the CPU... that line is only an input to the CPU, never an output.  

I always use a RESET_EXTERNAL_DEVICES signal that can be asserted by software to make sure that when the CPU resets (WDOG, BOD or power-up) it can also ensure all peripherals are in a known state.   This is particularly important in the case of watchdog resets.  A common cause of watchdog resets is a wedged peripheral device, so when the CPU resets it really needs a way to ensure all external hardware is also reset.

Robert Wall's picture

Re: emonTxV3 3-phase monitor stops reporting after 20hours???

"but there'll be no indication on the /RESET line"

Because the reset line is pulled up by VCC, I'd have expected it to go down as VCC collapsed, and I was wondering how quickly it came back up. But I've just realised there's no reset to the RFM69, so I was barking up the wrong tree there. It does have a reset pin, but it's the wrong sense and needs to be pulled high for 100 μs to work. Always assuming that it is a brown-out and it can be shown that the RFM is entering an unknown state as a result, "when the CPU resets it really needs a way to ensure all external hardware is also reset." would presumably mean driving the RFM69's reset from an AT328 output pin; or maybe it would be enough to program every unchanged register to the default value in setup( ).

dBC's picture

Re: emonTxV3 3-phase monitor stops reporting after 20hours???

A very quick glance at the RFM69 datasheet reveals it has lots of BOD and glitch detection itself.  I've come unstuck with multiple BODs running in the one design... either the CPU gets reset and the peripheral IC doesn't, or the peripheral IC gets reset and the CPU still thinks it's set up just right.  Both are a bit messy to sort out in software, so I always use just the AVR BOD, and make sure the AVR is capable of yanking on RESET of all the external devices.

But I guess it's a bit late for that review now.  It looks like the RFM69 has a sensitive and insensitive setting for its BOD stuff, so playing around with that might reveal more clues.  And we still don't know what the AVR BOD is set to.  I'm  assuming 2.7V, but it may be set to 1.8V or even disabled.    Barring input from whoever flashed the bootloader, you could dump out the extended fuse bits to see which setting they chose.

If you're really keen you can monitor, count and store the AVR reset reasons which is what I ended up doing (see panel below), but I had to tweak the stk500 bootloader to support that.  I've not looked at optiboot.
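The counting part of that idea can be sketched as follows. This is only an illustration: the struct and function names are made up, and it covers just the decoding of an MCUSR snapshot on an ATmega328P. Reading MCUSR early enough (before the bootloader or startup code clears it) and storing the totals in EEPROM are left out.

```cpp
#include <stdint.h>

// Bit positions of the reset flags in the ATmega328P MCUSR register.
const uint8_t PORF_BIT  = 0;  // power-on reset
const uint8_t EXTRF_BIT = 1;  // external reset (/RESET pin)
const uint8_t BORF_BIT  = 2;  // brown-out reset
const uint8_t WDRF_BIT  = 3;  // watchdog reset

// Running totals per reset cause (hypothetical structure for this example).
struct ResetCounts {
    uint16_t powerOn;
    uint16_t external;
    uint16_t brownOut;
    uint16_t watchdog;
};

// Add the causes flagged in one MCUSR snapshot to the running totals.
// More than one flag can be set in the same snapshot.
void tallyResetCauses(uint8_t mcusr, ResetCounts &counts) {
    if (mcusr & (1 << PORF_BIT))  ++counts.powerOn;
    if (mcusr & (1 << EXTRF_BIT)) ++counts.external;
    if (mcusr & (1 << BORF_BIT))  ++counts.brownOut;
    if (mcusr & (1 << WDRF_BIT))  ++counts.watchdog;
}
```

On the device, the snapshot would be taken once at the very start of the sketch and MCUSR cleared afterwards, so that the next reset shows only its own cause.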

Robert Wall's picture

Re: emonTxV3 3-phase monitor stops reporting after 20hours???

We'll have to wait and see how Gerry responds to all this. At the moment, we're guessing. All I know is what is ostensibly the same sketch has been running here for 3 days now, so until there's some evidence to the contrary, it looks like the sketch is not the problem. Clearly I can't check anything on my emonTx until I stop it, and we've no proof that it's the same as his anyway.

emjay's picture

Re: emonTxV3 3-phase monitor stops reporting after 20hours???

@dBC,

Which datasheet are you referring to?  The RFM69CW-V1.1.pdf does not mention any BOD functionality.

It is present on a variant of the RF chip e.g. RegLowBat (0x0C), but AFAIK that does not match the RFM69CW.

 

dBC's picture

Re: emonTxV3 3-phase monitor stops reporting after 20hours???

oops, my bad.  I didn't stop to consider the possibility of variants and clicked on the first one I found.  Unfortunately I can't even find that now, but it did have quite a detailed description of the power-on reset process, plus a Vcc glitch detector.  There was also a bit in a register that would let you choose between one of two threshold voltages.  But it's all for nought if it's not the device in use.

Gerry's picture

Re: emonTxV3 3-phase monitor stops reporting after 20hours???

Hi

my apologies for not responding earlier to your posts.

A quick update:

After activating the watchdog it keeps reporting continuously. However, a look at the serial port reveals that the setup function runs again after every second regular loop report. So it's not an elegant solution, but it is an effective way to keep it running.

@Robert: Grid stability is an issue here and I have recorded minimum voltage drop to 183V. Attached a screenshot recording the voltage over the last week. However I use a DC power supply so I would assume it smoothens these drops out and furthermore I have seen the emontx going through voltage dips of 183V without any issue.

@Paul: I can rule out that interference due to another emontx is the cause or low RSSI values.

When connecting the emonTX to the serial port I don't think it resets the unit. I am using arduino 1.6.5 software and I have practiced this several times. If it was resetting the unit it would run through the setup function and print out the messages. So when connecting the serial port I haven't noticed the setup functions but directly the reporting in the loop().

Reporting is done every 5 seconds, and on emonCMS the feeds are averaged over a 10 s period. So far I haven't had the time to look at the log files in emoncms. The unit is at my friend's place and unfortunately I don't have remote access. I will try to get some clues in the next few days.

So far for today. If you have further suggestions I would be happy to check these out.

Best regards

Gerry

 

 

Robert Wall's picture

Re: emonTxV3 3-phase monitor stops reporting after 20hours???

"I use a DC power supply so I would assume it smoothens these drops out..."

A dangerous assumption! It depends on the current drawn and how long the reservoir capacitor in the dc power supply holds the voltage up.

Where in the sketch have you put the watchdog reset? Remember that the maximum setting for the watchdog is 8 s, and the main loop sleeps for 10, therefore you can't put the watchdog reset directly in the main loop - you need to do something with

    delay(TIME_BETWEEN_READINGS*1000);

e.g. make it into a 'for' loop and reset the watchdog every 1000 ms.
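A minimal sketch of that idea, assuming the single `delay()` call is simply split into 1 s slices. The helper name `delayWithWatchdog` and the host-side stand-ins under `#else` are illustrative only and are not part of the emonTx firmware; on the emonTx, `delay()` comes from the Arduino core and `wdt_reset()` from `<avr/wdt.h>`.

```cpp
#ifdef ARDUINO
#include <Arduino.h>
#include <avr/wdt.h>
#else
// Host-side stand-ins (assumptions, for trying the logic off-target only).
static unsigned long fakeElapsedMs = 0;  // total time "slept"
static unsigned int watchdogKicks = 0;   // number of wdt_reset() calls
static void delay(unsigned long ms) { fakeElapsedMs += ms; }
static void wdt_reset() { ++watchdogKicks; }
#endif

// Sleep for totalMs, resetting the watchdog after every slice of at most
// one second, so an 8 s watchdog (WDTO_8S) does not expire during the wait.
void delayWithWatchdog(unsigned long totalMs) {
    const unsigned long sliceMs = 1000;
    while (totalMs > 0) {
        unsigned long step = (totalMs < sliceMs) ? totalMs : sliceMs;
        delay(step);
        wdt_reset();
        totalMs -= step;
    }
}
```

In the sketch, the existing delay line would then become something like `delayWithWatchdog(TIME_BETWEEN_READINGS * 1000UL);`.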

Mine has run for 90 hours now, without a watchdog timer.

[Edit]
If you're using the shop 5 V USB adapter, the label says it will work down to 100 V. Powering a V3.2 emonTX, one I have here worked down to an impressive 29 V.

Gerry's picture

Re: emonTxV3 3-phase monitor stops reporting after 20hours???

Hi Robert,

thanks for your valuable comments.

Watchdog is at the end of the setup, not in the loop.

TIME_BETWEEN_READINGS I set to 5, so it sleeps for only 5 seconds.

Yes, I use the DC power supply from the shop (EU type), so I conclude from your statement that the power dips won't affect the power supply to the RFM69 and emonTx, right?

Cheers

gerry

 

Robert Wall's picture

Re: emonTxV3 3-phase monitor stops reporting after 20hours???

"the power dips won't effect the power supply to RFM69 and emontx, right"

Right - but only if they are dips, and not complete disappearances. I've no idea for how long that PSU will hold up in the complete absence of mains, but it's looking as if in all probability it would carry through all but the worst dip.

You initiate the watchdog ("wdt_enable(WDTO_8S);") in setup( ). You reset the watchdog ("wdt_reset();") in loop( ). If you're not resetting it, no wonder it's firing and restarting the sketch every 8 s. That's what it's supposed to do.

Gerry's picture

Re: emonTxV3 3-phase monitor stops reporting after 20hours???

Thanks Robert for the hint on resetting the watchdog. That explains the behavior. As I wasn't sure how the watchdog actually works, I wasn't aware that it needs resetting.

I came across the wdt_enable from another program and there isn't any reset made either.

best wishes

Gerry

 

Robert Wall's picture

Re: emonTxV3 3-phase monitor stops reporting after 20hours???

"I came across the wdt_enable from another program and there isn't any reset made either."

An old engineering axiom: "If you're going to steal an idea, make sure it's a good one."

Robert Wall's picture

Re: emonTxV3 3-phase monitor stops reporting after 20hours???

I have had the sketch on soak test for a week or so both without and with the watchdog timer, and there's no evidence of it stopping. So until there's some more data, and bearing in mind that the failures happened at about the same time of day each day, I think I'm blaming something in the local environment.

Gerry's picture

Re: emonTxV3 3-phase monitor stops reporting after 20hours???

Hi Robert,

As said, the environment is anything but consistent and stable here. Even the sockets where the power supply is plugged in are shaky. Just yesterday morning the system stopped reporting because the power supply went off.

The day before, I noticed that all values were negative. After checking on site I realized that someone must have disconnected and reconnected the 9V AC supply, but in reverse polarity, hence the negative reporting.

So as you can see, I cannot rule out that someone is fiddling around as well, and the system setup might suffer from this.

A good day to all here in the community.

Cheers

Gerry
