Anyone wanna teach me to Yolo? (Offline)

So the mothbox deployment is going great! But now we’re running into the next set of problems.

Our field team is sending us about 20 GB of photos a night, showing targets that may or may not have moths on them.

They are in somewhat remote places and don’t have the best internet always.

The long-term goal is to make some software that can look through all the photos and try to ID all the moths as best as it can.

But the short-term goal is just to make some much simpler software that learns what the backgrounds look like and segments out anything that isn’t background (i.e. insect visitors).

So basically look at thousands of photos like these, and reliably cut out small cropped photos of all the insects.

This should be quite doable by training open computer vision models like YOLO.

And I’ve done things like this with online tools like Roboflow or Colab, but I need an offline solution.

This requires installing a local version of YOLO and having local, offline annotation software (which is increasingly hard to find; right now we are using X-AnyLabeling).

And then pointing YOLO towards a correctly organized set of folders that hold the training data and the labeled data. This is the part I tend to get lost in.

This is all so that our field techs can run this on their own without having to upload hundreds of gigs of data to something like Roboflow first.

If anyone out there has gone through this and can help give me just a very straightforward walkthrough like:

  • Type these commands to install YOLO.

  • Put all of your blank images without insects in a folder like this.

  • Label a bunch of images, and put them in a folder like this.

  • Now run this command to train YOLO.

  • Now run this command to give YOLO new photos and see if it works.

  • All of your cropped insect images will be in a folder like this…
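For reference, the steps in that wish list could look roughly like the sketch below. It assumes the Ultralytics implementation of YOLO (installed once, while online, with `pip install ultralytics`), which the thread does not confirm. The names `make_yolo_dataset`, `train_and_crop`, the `insect` class and the `yolov8n.pt` weights file are illustrative choices, not anything from the Mothbox project. For fully offline use, the weights file has to be copied to the machine beforehand, because Ultralytics otherwise tries to download it. Labels are expected in YOLO text format (one `.txt` per image, with the same file name), and blank background images simply go into the image folders without a label file.

```python
from pathlib import Path


def make_yolo_dataset(root, class_names=("insect",)):
    """Create the images/labels folder tree and a data.yaml describing it."""
    root = Path(root)
    for split in ("train", "val"):
        # Images and their label files live in parallel folders per split.
        (root / "images" / split).mkdir(parents=True, exist_ok=True)
        (root / "labels" / split).mkdir(parents=True, exist_ok=True)

    lines = [
        f"path: {root.resolve().as_posix()}",
        "train: images/train",
        "val: images/val",
        "names:",
    ]
    lines += [f"  {index}: {name}" for index, name in enumerate(class_names)]

    yaml_path = root / "data.yaml"
    yaml_path.write_text("\n".join(lines) + "\n")
    return yaml_path


def train_and_crop(yaml_path, new_photos_dir, weights="yolov8n.pt"):
    """Train on the dataset, then detect and crop insects in new photos.

    Assumption: the ultralytics package is installed and `weights` is
    already on disk (no internet needed in that case).
    """
    from ultralytics import YOLO  # imported lazily so the helper above works without it

    model = YOLO(weights)
    model.train(data=str(yaml_path), epochs=50, imgsz=640)
    # save_crop=True asks Ultralytics to write one cropped image per detection
    # into its run folder (by default something like runs/detect/predict/crops/).
    model.predict(source=str(new_photos_dir), save=True, save_crop=True)
```

A field tech would then call `make_yolo_dataset("mothbox_dataset")` once, copy labeled and blank images into the folders it creates, and run `train_and_crop("mothbox_dataset/data.yaml", "new_photos")`. The epoch count and image size are placeholders to tune.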


Hi hikinghack.

First of all, congratulations on running this great example of an open science project! I have been following the project for a while and am always happy to hear about the latest updates.

As your problem is the amount of data produced per night, maybe a combination of traditional computer vision and compression methodologies could be enough to reduce the storage size significantly.

A combination of blur, thresholding, edge detection, and morphological erosion and dilation may be of use to separate the background from the insects and filter out smaller objects that are not insects.

I’d be happy to give it a try!
I have relevant experience with the OpenCV toolbox, and the Mothbox project has certainly sparked my interest in getting involved.

Could you send me a representative sample of the variety of photos to be expected in different conditions?


This folder has plenty of different images you can play around with. Note that the Mothbox shoots HDR, so all the photos come in series of 3, with the “normal exposure” being the one whose name ends in HDR0.
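To pick out only the normal exposures before running any detection, something like the following sketch could work. It assumes the file name (without extension) ends in `HDR` followed by the exposure index, and that whatever comes before `HDR` identifies the series. The exact Mothbox naming scheme is not given here, so the pattern and the function names are guesses to adapt.

```python
import re
from collections import defaultdict
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}
# Assumed naming: "<series name>HDR<index>", e.g. "something_HDR0.jpg".
HDR_PATTERN = re.compile(r"^(?P<series>.*)HDR(?P<index>\d+)$")


def group_hdr_series(folder):
    """Map each series name to a dict of {exposure index: file path}."""
    series = defaultdict(dict)
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue
        match = HDR_PATTERN.match(path.stem)
        if match:
            series[match["series"]][int(match["index"])] = path
    return dict(series)


def normal_exposures(folder):
    """Return the HDR0 frame of every series that has one."""
    return [
        frames[0]
        for frames in group_hdr_series(folder).values()
        if 0 in frames
    ]
```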

The main problem isn’t so much file size and compression; we have to run segmentation on all the files anyway, so that we can send the cut-out insects to taxonomists to ID. So it’s more about finding a nice, super accurate way to cut out individual insect photos from the backgrounds, across many different insect shapes and potentially slightly different backgrounds :slight_smile:

Yolo is quite good at this, but I’m just a bit lost in figuring out how to train it locally (not uploading everything to like Roboflow)


Hi hikinghack,

I haven’t achieved a satisfactory result yet, but I thought I should share some insights.

First, the boring part (where I bug you about whether image quality could be further improved :sweat_smile:):

  • The exposure and focus conditions are quite good, with many images captured under ideal conditions. However, some images were recorded under less than ideal conditions for traditional computer vision. Given the 54-megapixel resolution, it would be beneficial to fully utilize the high pixel count by adjusting focus and exposure time.
  • What camera are you using? Is there a way to manually or automatically adjust focus through software? What software are you using to capture the images?

So far, I’ve managed to separate the imaging plate from the rest of the image, which can then be further analyzed. Adaptive thresholding seems promising, but it requires some parameter tweaking to work reliably across all image conditions.

Another approach could be to use the Segment Anything Model (SAM) to extract bounding boxes of insects. This might seem like an excessive solution for a theoretically simple problem, but it could be implemented quickly and straightforwardly.

I managed to run the ViT-B model, using about 2GB of RAM to segment small portions of the 54MP image. The results are quite good; however, the processing load and time may be prohibitive for larger models.
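Roughly, processing the image in small portions and turning masks into boxes can be sketched as below. This assumes the `segment_anything` package and a ViT-B checkpoint file already on disk. The tile size, overlap and area limits are placeholder values, and the helper names are made up for this example. Insects that fall in the overlap between two tiles can show up twice, and this sketch does not de-duplicate them.

```python
def tile_windows(width, height, tile=1024, overlap=128):
    """Yield (x0, y0, x1, y1) windows that cover the image with overlapping edges."""
    step = tile - overlap
    for y0 in range(0, max(height - overlap, 1), step):
        for x0 in range(0, max(width - overlap, 1), step):
            yield x0, y0, min(x0 + tile, width), min(y0 + tile, height)


def boxes_from_sam_masks(masks, x_off, y_off, tile_area, min_area=400, max_frac=0.5):
    """Keep plausibly insect-sized masks and shift their boxes to full-image coordinates.

    Each mask is a dict with "bbox" as [x, y, w, h] and "area" in pixels,
    which is the format SamAutomaticMaskGenerator.generate() returns.
    """
    boxes = []
    for mask in masks:
        # Tiny masks are noise; huge ones are most likely the background itself.
        if mask["area"] < min_area or mask["area"] > max_frac * tile_area:
            continue
        x, y, w, h = mask["bbox"]
        boxes.append((int(x) + x_off, int(y) + y_off, int(w), int(h)))
    return boxes


def segment_image(image_rgb, checkpoint="sam_vit_b_01ec64.pth"):
    """Run SAM tile by tile on an RGB uint8 array and collect bounding boxes."""
    import numpy as np
    from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

    sam = sam_model_registry["vit_b"](checkpoint=checkpoint)
    generator = SamAutomaticMaskGenerator(sam)

    height, width = image_rgb.shape[:2]
    boxes = []
    for x0, y0, x1, y1 in tile_windows(width, height):
        tile = np.ascontiguousarray(image_rgb[y0:y1, x0:x1])
        masks = generator.generate(tile)
        boxes += boxes_from_sam_masks(masks, x0, y0, (x1 - x0) * (y1 - y0))
    return boxes
```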

The FastSAM model performed worse, but as the name suggests, it processes much faster and uses less memory.

Below is a rudimentary overview. Please forgive the hardly visible segmentation result as a layer; for the sake of development speed, I simply used the recommended implementation for visualization.


Super cool stuff Lars!

I did some SAM with it last year and had decent results too, but yeah, I hadn’t thought that I could just use the segmentation to get the broader bounding box. Smart idea!

It is all done with an Arducam 64 MP camera connected to a Raspberry Pi.

The 64 MP cam is quite slow, and energy use is at a premium. So the way it takes photos is by having hard-coded exposure and focus values, then turning on the flash lights for the image, capturing pixels immediately, and turning the lights back off. This takes about 1 second per photo.

Before, we were doing automatic exposure and focus, but the camera takes 20-30 seconds to adjust its focus, which meant tons of time with the big flash lights on, and the whole thing used twice as much power.

So most of the time the hard-coded values work fine, but sometimes a box might get set up weird in the field, or the lights on one might be a bit brighter than on the others.

So I’m working on a hybrid approach where, at the beginning of the night, the camera takes a minute to adjust its own focus and exposure, logs these values, and then just uses them for the rest of the night without having to re-focus or re-adjust the exposure.
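A rough sketch of that calibrate-once idea, assuming the camera is driven through Picamera2 (the capture software is not stated in this thread, so this is a guess). The function names and the JSON log file are made up for illustration. The camera object is expected to be already configured and started, and the flash lights need to be on while `calibrate_once` runs.

```python
import json
from pathlib import Path

# Metadata fields worth keeping for the rest of the night.
CALIBRATION_KEYS = ("LensPosition", "ExposureTime", "AnalogueGain")


def save_calibration(metadata, path):
    """Log the focus and exposure values found during calibration."""
    values = {key: metadata[key] for key in CALIBRATION_KEYS if key in metadata}
    Path(path).write_text(json.dumps(values, indent=2))
    return values


def load_calibration(path):
    """Read back the values logged by save_calibration."""
    return json.loads(Path(path).read_text())


def calibrate_once(picam2, path):
    """Let the camera focus and expose by itself once, then log the result.

    Assumption: picam2 is a started Picamera2 object. A real script would
    probably wait for auto exposure to settle before reading the metadata.
    """
    from libcamera import controls

    picam2.set_controls({"AfMode": controls.AfModeEnum.Auto, "AeEnable": True})
    picam2.autofocus_cycle()  # blocks until the autofocus run finishes
    return save_calibration(picam2.capture_metadata(), path)


def apply_calibration(picam2, values):
    """Lock focus and exposure to the logged values for fast captures."""
    from libcamera import controls

    picam2.set_controls(
        {"AfMode": controls.AfModeEnum.Manual, "AeEnable": False, **values}
    )
```

Later captures that night would call `apply_calibration(picam2, load_calibration(path))` before the usual lights-on, capture, lights-off sequence, so the slow focus search only happens once.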
