Emonhub error log

Submitted by Paul Reed on Thu, 15/10/2015 - 21:44

On three occasions over the past two days, my v9.0 Raspberry Pi installation has stopped updating all feeds supplied via emonhub (MQTT feeds continue OK), and I've had to reboot to get them working again.

To try and find out what was going on, I set the emonhub log level to debug and captured the attached log extract a few hours later.

The first section of the log shows a normal handshake with emoncms, but the second is clearly causing a problem, and emoncms does not recover from it thereafter. Could it be interference, as the RSSI is -100? If so, is there any way to screen or reject it?

Paul

Re: Emonhub error log

Submitted by pb66 on Thu, 15/10/2015 - 23:00.

Hi Paul

Try updating the rfm2pi firmware. There should never be a case where a node id higher than 31 is passed to emonHub. The line showByte(rf12_hdr & 0x1F); was amended to include the "& 0x1F" mask to prevent acks etc. causing a problem.

That "correction" will prevent the node id of 157 slipping through to emoncms and causing the error you see. I don't think emoncms needs a restart. What is happening is that, because emoncms does not reply with "ok" on receipt of the out-of-scope node, emonHub cannot delete the packet as "successfully posted" and gets stuck retrying, rejected each time. You can test this theory by restarting emonhub, or by forcing the reporter to rebuild: add a "#" in front of the "Type = " line in emonhub.conf for that reporter, save, then remove the "#" again.
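To illustrate the "stuck retrying" behaviour described above, here is a minimal Python sketch. It is not emonHub's actual reporter code: flush_buffer, fake_post, the packet layout and the error text are all hypothetical, and the only thing taken from this thread is that a packet is deleted from the buffer only when the reply is exactly "ok".

```python
from collections import deque


def flush_buffer(buffer, post):
    """Send buffered packets oldest first; stop at the first one not acknowledged."""
    sent = 0
    while buffer:
        reply = post(buffer[0])
        if reply != "ok":
            # Anything other than an exact "ok" leaves the packet at the
            # head of the buffer, so it is retried on the next pass.
            break
        buffer.popleft()
        sent += 1
    return sent


def fake_post(packet):
    # Hypothetical stand-in for emoncms: rejects node ids above 31.
    node_id = packet[0]
    return "ok" if node_id <= 31 else "error: node id out of range"


# First element of each packet is the node id (an assumption for this sketch).
buffer = deque([[10, 250, 251], [157, 1, 2], [10, 252, 253]])

first_pass = flush_buffer(buffer, fake_post)   # sends the first packet only
second_pass = flush_buffer(buffer, fake_post)  # blocked again by node 157
```

In this sketch the node 157 packet stays at the head of the buffer, so the valid packet queued behind it is never sent until the buffer is rebuilt, which matches the symptom of all emonhub feeds stopping.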

I believe this is new behavior in emoncms. I think it previously just accepted the payload and discarded anything out of scope. Fault feedback is great IF the receiving software is geared up for it and it doesn't cause further problems. This will become an issue if any single erroneous packet can cause an immediate halt to any and all data from emonHub.

However, this "correction" to the rfm2pi firmware would have allowed that same packet through to emoncms with a node id of 29 (157 & 0x1F), which you will have to delete if you don't want it (assuming 29 isn't an active node). I believe that is a better outcome than a complete halt in data, but removing the interference is a better solution still.
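The arithmetic behind the node id of 29 can be checked quickly. This is just a Python illustration of the "& 0x1F" mask quoted from the firmware line above; masked_node_id is a hypothetical helper name, not something from the firmware or emonHub.

```python
NODE_ID_MASK = 0x1F  # keeps the low five bits, as in showByte(rf12_hdr & 0x1F)


def masked_node_id(header_byte):
    """Return the node id that survives the five-bit mask."""
    return header_byte & NODE_ID_MASK


# 157 is 0b10011101; the low five bits are 0b11101.
print(masked_node_id(157))  # prints 29
```

Node ids of 31 and below pass through the mask unchanged, so only the out-of-range values are folded down into the 0 to 31 range.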

I need to think on this a bit more. Yes, it would be easy to add a filter to alleviate your issue if you keep the "faulty" firmware, but I feel that is papering over the cracks rather than tackling the issue. Plus, I have previously put a <32 check on the node id, and then we removed it again because the node id is user-set in emoncms, so a node id of over 32 is quite valid.

There is also, I recall, a setting in settings.php that enables/disables verbose error replies from emoncms. Is that active? I did ask Chaveiro to default that to off, as a large number of users already have emonHub, and I have seen several instances where a warning has been returned along with "ok". That still stops emonHub deleting the packet and moving on, and it will then retry posting the same data until restarted.

Paul

Re: Emonhub error log

Submitted by Paul Reed on Fri, 16/10/2015 - 10:39.

OK, I've updated the RFM69Pi firmware and changed

 $display_errors = true;

to false in settings.php, so we'll see if that helps.

Paul

Re: Emonhub error log

Submitted by pb66 on Fri, 16/10/2015 - 11:00.

My fingers are crossed :-)

Just to add to this: emonhub could, as I mentioned, have a <32 filter added, but both the JeeLib rfm69 and LowPowerLabs libs use up to ~255 node ids, so it would not be a long-term fix.

I believe there is an RSSI threshold in the mix now (or coming soon) in JeeLib, so we should be able to filter out everything below, say, -80 dB in the rfm2pi firmware. In fact, I believe a -80 dB threshold is already in use in the JeeLib "RFxConsole" examples.
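As a rough illustration of what such a threshold would do, here is a small Python sketch. The real filter would live in the firmware, not in Python; accept_frame, the frame dictionaries and the sample values are all hypothetical, and -80 is only the example figure mentioned above.

```python
RSSI_THRESHOLD = -80  # example figure only; frames weaker than this are dropped


def accept_frame(rssi, threshold=RSSI_THRESHOLD):
    """Keep a frame only if its signal strength is at or above the threshold."""
    return rssi >= threshold


# Hypothetical received frames: one normal node, one weak "node 157" frame.
frames = [
    {"node": 10, "rssi": -45},
    {"node": 157, "rssi": -100},
]

kept = [frame for frame in frames if accept_frame(frame["rssi"])]
```

With these sample values only the node 10 frame is kept; a frame at -100, like the one in Paul's log, would be rejected before it reached emonHub.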

I hope the "display_errors = false;" stops all messages, as unfortunately the verbose error messaging from emoncms effectively breaks the emonhub to emoncms compatibility. The previous warnings that I encountered alongside an "ok" could be handled by changing what emonHub expects to hear from emoncms, so that the reply includes "ok" rather than equals "ok". But that makes emonhub, and any data in transit, vulnerable to misinterpreting any error message containing "ok" as a successful post (e.g. "took too long timed-out") and deleting the buffered data. Since in many instances emoncms relays the error message it encounters, there is no telling what could be returned to emonHub, so this isn't really a viable solution either.
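The difference between the two checks is easy to demonstrate. This is a Python sketch with hypothetical helper names (is_ok_exact, is_ok_substring), not emonHub's actual code; the sample error text is the one quoted above.

```python
def is_ok_exact(reply):
    """Strict check: the reply must be exactly "ok"."""
    return reply == "ok"


def is_ok_substring(reply):
    """Loose check: the reply only has to contain "ok" somewhere."""
    return "ok" in reply


error_reply = "took too long timed-out"

strict_result = is_ok_exact(error_reply)     # False: buffered data is kept
loose_result = is_ok_substring(error_reply)  # True: "took" contains "ok"
```

With the loose check, the buffered data would be deleted after a timeout error simply because the word "took" contains "ok".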

Paul
