Previously on SucculentPi…
In the last blog, we built an IoT panopticon to perform seriously overkill monitoring of an unfortunate house plant. After building the solution and writing the blog, I kept an occasional eye on the data flow from the IoT device into AWS for a couple of weeks, and with no signs of trouble I eventually left it to quietly collect data.
Well, that was a mistake!
Recently I checked up on the system again, as I was planning to start looking at the next stages of the project soon, only to discover that it had died, silently, about a week after I stopped manually checking it regularly 🙄
Instead of a nice, orderly cadence of sending sensor data and an infrared photo every 5 minutes, data and photos had only been sent intermittently for about 6 weeks or so… and to rub salt in the wound, all of the GrovePi+ sensor data that was sent was garbage, with every sensor reading either null, 0, or 65536.
Clearly something was very wrong; time to investigate…
Hobbyist IoT Hardware – A Great Way To Lose Data
With everything on the AWS side working fine, I started investigating what was happening on the Raspberry Pi. It didn’t take long to find the problem.
Early on in the data capture Python code, I instantiate the seeed_si114x module, which reads data from the sunlight sensor. During instantiation this module probes the I2C bus on the GrovePi+ board, and that probe was intermittently failing with an error, crashing the code. Trapping this error wouldn’t have helped, however: on further investigation, everything connected to the GrovePi+ was returning junk data or errors!
When the GrovePi+ board stayed borked after a couple of reboots I got a bit worried; had it died a permanent death? Well…
A full, hard reboot fixed it. I had to physically remove power from the Raspberry Pi, give it a minute, then reconnect the power to revive the GrovePi+ board. A Cloudreach colleague mentioned that they’d seen similar issues with hobbyist IoT hardware in their projects, and that they need to hard power cycle their hardware every so often to keep it working.
So, note to self: never use cheap IoT hardware in production-grade systems!
Also, how the heck do I automate physically disconnecting and reconnecting the power periodically to keep the rig running?
An Unexpected Journey… to IKEA!
Initially this had me stumped; I mean, it’s not like I could build some sort of power plug disconnecting robot… 🤔 maybe it could use a pair of cams to sandwich the power plug and apply pressure to remove and reinsert it… the motors would have to be super quiet, my apartment’s not that big… or perhaps a linear actuator to push the plug out of the socket…
Then a light bulb blinked on. As in literally, a light bulb came on!
It was getting dark so I tapped one of my handy IKEA Tradfri shortcut buttons to turn on my living room lights, illuminating both the room and how massively I’d been over-thinking this one. I have an IKEA Smart Home system, which supports scheduled events, and they sell a smart plug to switch things off and on 💡
One quick trip to the local IKEA later, and I’d acquired a shiny new Tradfri smart plug. Once linked to my system, I set up a scheduled event to switch that plug off at 00:01 every Sunday, and on again 60 seconds later:
I chose the times to fit around the data collection schedule. Switching off at 00:01 should leave time for the scheduled data collection at 00:00 to complete, as the code only takes a few seconds to run. Powering up again at 00:02 should then give the Raspberry Pi enough time to fully boot before the next scheduled data collection at 00:05. A couple of test cycles showed that the timing worked, so hard power cycles are now fully automated, no robots necessary 😁
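To sanity-check that timing, here’s a small Python sketch (not part of the SucculentPi code) that works out the slack on either side of the power cut. It assumes the capture script takes about 10 seconds and the Pi needs about 90 seconds to boot. Both figures are illustrative guesses, so swap in your own measurements.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# The Pi's data capture runs on "*/5 * * * *": minutes 0, 5, 10, ..., 55.
CAPTURE_INTERVAL = timedelta(minutes=5)


def capture_times(start: datetime, end: datetime) -> List[datetime]:
    """Return every */5 capture time between start and end (inclusive)."""
    t = start.replace(second=0, microsecond=0)
    # Step forward to the first whole minute that is a multiple of 5.
    while t.minute % 5 != 0 or t < start:
        t += timedelta(minutes=1)
    times = []
    while t <= end:
        times.append(t)
        t += CAPTURE_INTERVAL
    return times


def power_cycle_margins(off_at: datetime, on_at: datetime,
                        capture_runtime: timedelta,
                        boot_time: timedelta) -> Tuple[timedelta, timedelta]:
    """Slack between the last capture finishing and power-off, and between the
    Pi finishing its boot and the next capture. Both should be positive."""
    runs = capture_times(off_at - CAPTURE_INTERVAL, on_at + CAPTURE_INTERVAL)
    last_before = max(t for t in runs if t <= off_at)
    next_after = min(t for t in runs if t >= on_at)
    return (off_at - (last_before + capture_runtime),
            next_after - (on_at + boot_time))


# A Sunday: off at 00:01, on again at 00:02.
# Assumed (not measured): ~10 s capture runtime, ~90 s boot time.
before, after = power_cycle_margins(datetime(2021, 3, 7, 0, 1), datetime(2021, 3, 7, 0, 2),
                                    timedelta(seconds=10), timedelta(seconds=90))
print(before, after)  # 0:00:50 0:01:30
```

If either margin goes negative, the power cut would clip a capture run or the Pi would still be booting at the next one.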
Monitoring Timestream with Lambda, Just In Case
Just in case the GrovePi+ board dies again, despite my IKEA-based rebooting solution, I decided to build some special monitoring to alert me if the sensor data is missing or garbage again.
There are a couple of approaches we could take here. One would be monitoring the data flow through AWS IoT Core, to check that the Raspberry Pi is connecting and sending data on schedule. However, this wouldn’t have caught all of the cases I’d observed in the data. After all, it was sometimes connecting and sending data just fine; it just happened to be totally useless junk data.
So instead I opted to monitor the data itself, after it’s been collected and routed through AWS IoT Core. This approach has the benefit of checking whether the actual outcome I’m after (valid sensor data being collected and stored in Timestream) is being achieved.
Helpfully, AWS have published a nice blog post looking at how to do exactly this. The solution basically entails:
- Setting up an SNS topic to send notifications to
- Creating subscriptions to the SNS topic, in my case for email delivery
- Creating a little Lambda function to check the values of the latest content in the Timestream table, and send notifications when needed
- Defining a schedule in EventBridge (the successor to CloudWatch Events) to trigger the Lambda function
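For anyone who prefers scripting to clicking around the console, everything except the Lambda function itself could look roughly like this with boto3. This is a hedged sketch rather than what I actually ran. The topic name, email address, rule name and Lambda ARN are placeholders, and the AWS clients are passed in so the functions can be exercised without touching a real account.

```python
from typing import Any


def create_alert_topic(sns: Any, topic_name: str, email: str) -> str:
    """Create the SNS topic (create_topic is idempotent, so re-running returns
    the existing topic's ARN) and subscribe an email address to it."""
    topic_arn = sns.create_topic(Name=topic_name)["TopicArn"]
    # SNS emails a confirmation link; the subscription stays pending until clicked.
    sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint=email)
    return topic_arn


def schedule_lambda(events: Any, lambda_client: Any, rule_name: str,
                    cron: str, function_arn: str) -> str:
    """Create an EventBridge schedule rule that invokes the monitoring Lambda."""
    rule_arn = events.put_rule(
        Name=rule_name,
        ScheduleExpression=f"cron({cron})",
        State="ENABLED",
    )["RuleArn"]
    events.put_targets(Rule=rule_name, Targets=[{"Id": "monitor-lambda", "Arn": function_arn}])
    # Resource-based permission so EventBridge is allowed to invoke the function.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
    return rule_arn
```

In practice you’d pass in `boto3.client("sns")`, `boto3.client("events")` and `boto3.client("lambda")`. `add_permission` raises an error if the statement ID already exists, so re-running the script needs a fresh ID or a `remove_permission` call first.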
As a GrovePi+ failure seems to produce garbage data in a handily consistent way, I decided to keep this simple and only look at one sensor’s reading. The soil moisture sensors have a clearly defined range for valid readings and consistently give readings well outside this range in the event of a failure, so I went with checking one of these to determine whether a failure had occurred.
First I defined a Timestream query to fetch any readings from the specified soil moisture sensor in the last 5 minutes:
SELECT * FROM "succulentpi_readings"."metrics" WHERE (time between ago(5m) and now()) and (measure_name = 'plant_pot_soil_moisture_top')
This will return nothing if the IoT device has failed to send data on its regular 5 minute schedule, which would then be cause to fire a notification.
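Inside the Lambda, the query can be built from configuration and run through the `timestream-query` client’s `query` call. Results come back as `ColumnInfo` plus `Rows` of `Data` cells, possibly spread across several pages linked by `NextToken`, so a small helper to flatten them into dicts is handy. This is a sketch that assumes the database, table and measure names come from trusted configuration, because they’re interpolated straight into the SQL.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Optional[str]]


def build_query(database: str, table: str, measure: str) -> str:
    """Rebuild the 5 minute soil moisture query from configuration.
    Values are interpolated directly, so they must come from trusted config."""
    return (f'SELECT * FROM "{database}"."{table}" '
            "WHERE (time between ago(5m) and now()) "
            f"and (measure_name = '{measure}')")


def run_query(query: Callable[..., Dict[str, Any]], sql: str) -> List[Row]:
    """Run a Timestream query, following NextToken, and return rows as dicts
    keyed by column name. Cells holding NULL (NullValue) map to None."""
    rows: List[Row] = []
    kwargs: Dict[str, str] = {"QueryString": sql}
    while True:
        page = query(**kwargs)
        names = [column["Name"] for column in page["ColumnInfo"]]
        for row in page["Rows"]:
            values = [cell.get("ScalarValue") for cell in row["Data"]]
            rows.append(dict(zip(names, values)))
        token = page.get("NextToken")
        if not token:
            return rows
        kwargs["NextToken"] = token
```

With boto3 this would be `run_query(boto3.client("timestream-query").query, sql)`. An empty list is the "no data in the last 5 minutes" case.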
If the query does return data, we check if it falls in the valid range for the sensor (0 to 950), and if not that’s also a notification!
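The decision itself only takes a few lines. Here’s a sketch that assumes the rows have already been turned into dicts keyed by column name. It also assumes each row is a single-measure record whose value lives in a `measure_value::<type>` column, which is Timestream’s naming for those. The subject line and message wording are made up for illustration.

```python
from typing import Any, Dict, List, Optional

VALID_MIN, VALID_MAX = 0, 950  # valid range for the soil moisture sensor

Row = Dict[str, Optional[str]]


def alert_reason(rows: List[Row]) -> Optional[str]:
    """Explain why an alert is needed, or return None if the data looks valid."""
    if not rows:
        return "No soil moisture reading in the last 5 minutes"
    for row in rows:
        # Single-measure records keep their value in e.g. 'measure_value::bigint'.
        value = next((v for k, v in row.items() if k.startswith("measure_value::")), None)
        if value is None:
            return "Soil moisture reading is null"
        try:
            reading = float(value)
        except ValueError:
            return f"Soil moisture reading {value!r} is not a number"
        if not VALID_MIN <= reading <= VALID_MAX:
            return f"Soil moisture reading {value} is outside {VALID_MIN} to {VALID_MAX}"
    return None


def notify_if_needed(sns: Any, topic_arn: str, rows: List[Row]) -> Optional[str]:
    """Publish to the SNS topic when the check fails; return the reason (or None)."""
    reason = alert_reason(rows)
    if reason is not None:
        sns.publish(TopicArn=topic_arn, Subject="SucculentPi monitoring alert", Message=reason)
    return reason
```

A reading of 65536 trips the range check, and an empty result trips the "no reading" check, so both failure modes end up as an email via the same `publish` call.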
Of course, this little Lambda could be fattened up to check the full set of sensor values each time, making it more robust; however, for my purposes keeping it small keeps it fast, which keeps it cheap 😉
You can find the full code for the little Lambda function on GitHub. It uses 4 environment variables to retrieve the details of what to look at in Timestream and which topic to send SNS notifications to, with the hopefully self-explanatory names of:
If your application doesn’t also happen to be named “SucculentPi”, these are simple enough to rename! You could also use AWS Systems Manager Parameter Store or AWS Secrets Manager to centrally manage and monitor configuration values and secrets in your Lambda functions, which would be advisable in a production environment. For my use case, this is another example of keeping things cheap.
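As a sketch of that trade-off, a config loader can read each setting from the environment and only fall back to Parameter Store when a variable isn’t set. The four setting names below are hypothetical stand-ins (the real function’s variable names are in the GitHub code), and so is the `/succulentpi/` parameter path.

```python
import os
from typing import Any, Dict, Mapping, Optional

# Hypothetical names for the four settings the Lambda needs.
SETTINGS = ("TIMESTREAM_DATABASE", "TIMESTREAM_TABLE", "MEASURE_NAME", "SNS_TOPIC_ARN")


def load_config(env: Mapping[str, str] = os.environ, ssm: Any = None,
                prefix: str = "/succulentpi/") -> Dict[str, str]:
    """Read each setting from environment variables, falling back to SSM
    Parameter Store (e.g. /succulentpi/sns_topic_arn) when ssm is given."""
    config: Dict[str, str] = {}
    for name in SETTINGS:
        value: Optional[str] = env.get(name)
        if value is None and ssm is not None:
            response = ssm.get_parameter(Name=prefix + name.lower(), WithDecryption=True)
            value = response["Parameter"]["Value"]
        if value is None:
            raise KeyError(f"Missing configuration value: {name}")
        config[name] = value
    return config
```

Called with no arguments inside the Lambda, it behaves like plain environment variables. Pass `ssm=boto3.client("ssm")` to enable the fallback. Note that a missing parameter makes `get_parameter` raise a `ParameterNotFound` error rather than return nothing.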
I selected a 5 minute window as that’s the intended frequency of data collection on the Raspberry Pi. However, this only works as intended for monitoring if the check is correctly scheduled. If we schedule the checks for the same minute as the IoT device delivers data, we may get inconsistent results as to which reading we’re checking: the check could see that run’s data or, if it hasn’t arrived yet, the previous run’s. We could work around this by also checking the timestamp of the sensor reading, but that would make our Lambda less little again. Adding complexity also tends to add fragility, which we don’t want.
Instead, I scheduled it to occur 2 minutes after each run time for the data collection. The data capture Python script runs on this cron schedule:
*/5 * * * *
So it runs on the hour and then every 5 minutes after that (5 minutes past, 10 minutes past, and so on). Therefore, by scheduling the Lambda via EventBridge to run on this schedule:
7 6,12,20 * * ? *
The checks will run at 06:07, 12:07 and 20:07 (UTC, as EventBridge evaluates cron expressions in UTC), 2 minutes after a run of the data capture script. That’s enough to account for any small variance in the Raspberry Pi’s clock, and it means the 5 minute window in the Timestream query can only pick up data from the latest intended run. If that run didn’t happen, the query returns no data, rather than potentially returning data from the previous run.
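To double-check the window arithmetic, here’s a quick sketch that lists which capture timestamps would fall inside each check’s `ago(5m)` window, optionally shifting the Pi’s clock by a small skew. It treats each check as firing exactly on the minute. In practice EventBridge only promises to run a scheduled rule within the scheduled minute, so real checks can start a few seconds late.

```python
from datetime import datetime, timedelta
from typing import List

CAPTURE_MINUTES = range(0, 60, 5)  # "*/5 * * * *" on the Raspberry Pi
CHECK_HOURS = (6, 12, 20)          # from "7 6,12,20 * * ? *" (EventBridge cron times are UTC)
CHECK_MINUTE = 7
WINDOW = timedelta(minutes=5)      # "time between ago(5m) and now()"


def check_times(day: datetime) -> List[datetime]:
    """The three daily check times on the given day."""
    return [day.replace(hour=h, minute=CHECK_MINUTE, second=0, microsecond=0)
            for h in CHECK_HOURS]


def captures_in_window(check: datetime, skew: timedelta = timedelta(0)) -> List[datetime]:
    """Capture timestamps (shifted by the Pi's clock skew) inside [check - 5m, check].
    SQL BETWEEN is inclusive at both ends, so the comparison is too."""
    hour = check.replace(minute=0, second=0, microsecond=0)
    hits = []
    for base in (hour - timedelta(hours=1), hour):
        for minute in CAPTURE_MINUTES:
            stamp = base + timedelta(minutes=minute) + skew
            if check - WINDOW <= stamp <= check:
                hits.append(stamp)
    return hits


for check in check_times(datetime(2021, 3, 7)):
    # prints 06:07 ['06:05'], then the same pattern for 12:07 and 20:07
    print(check.strftime("%H:%M"), [t.strftime("%H:%M") for t in captures_in_window(check)])
```

Each check sees exactly one capture. Move the check to :05 and the same function returns both the :00 and :05 captures, which is the same-minute ambiguity described earlier.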
To further control costs, I also limited the monitoring to 3 times a day. I felt this would be enough to promptly alert me to any issues (let’s be honest, I’m not going to be getting up in the middle of the night to fix this if it breaks, after all 😉) whilst reducing the amount of compute time consumed by the little Lambda, and compute time is, of course, money in the cloud!
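To get a feel for the difference, here’s the back-of-an-envelope arithmetic for a 30-day month, comparing three checks a day with a check after every capture. The 128 MB memory size and 1 second duration are assumptions for illustration, not measurements of the real function. Prices are left out because they vary by region and change over time. Lambda bills duration in GB-seconds, so the ratio is what matters here.

```python
from typing import Tuple


def monthly_usage(checks_per_day: int, memory_mb: int = 128,
                  duration_s: float = 1.0, days: int = 30) -> Tuple[int, float]:
    """Return (invocations, GB-seconds) for a month of scheduled checks."""
    invocations = checks_per_day * days
    gb_seconds = invocations * (memory_mb / 1024) * duration_s
    return invocations, gb_seconds


every_capture = monthly_usage(24 * 60 // 5)  # a check after every 5 minute capture
three_a_day = monthly_usage(3)               # "7 6,12,20 * * ? *"
print(every_capture, three_a_day)            # (8640, 1080.0) (90, 11.25)
```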
With the addition of the monitoring solution, our architecture for the whole solution now looks like this:
Basically, it’s back to data collection. With the loss of most of the data that should have been collected since the last blog, we’re back to waiting a while longer before proceeding with visualisation and experimentation. So once again… watch this space, I guess!
In the meantime, if you’d like to see some less absurd IoT solutions built by Cloudreach (for actual customers, rather than unfortunate house plants!), there are write-ups covering an energy company’s solution for managing their wind farms, a fitness studio chain’s automated customer check-in system using AWS DeepLens, and the traffic data management solution for a US state’s department of transportation.