Depth Sensing Technologies for Camera Traps

@hpy had asked me for a quick review of potential technologies that could be used to incorporate depth sensing capabilities into camera traps. The idea is that if camera trappers have decent depth information from their cameras, they can automatically do a lot more with high precision (like estimating the size of passing animals with greater accuracy).
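For a rough sense of how depth enables size estimation: under a pinhole camera model, real-world width is just pixel width scaled by depth over focal length. A minimal sketch (the focal length and example numbers here are made-up illustrative values, not from any particular camera):

```python
def object_width_m(pixel_width: float, depth_m: float, focal_px: float) -> float:
    """Pinhole-camera relation: real width = pixel extent * depth / focal length.

    pixel_width: width of the animal's bounding box in pixels
    depth_m:     distance to the animal in metres (from any depth sensor)
    focal_px:    camera focal length expressed in pixels (from calibration)
    """
    return pixel_width * depth_m / focal_px

# Example: a 300-px-wide animal at 4 m, camera with a 1000 px focal length
print(object_width_m(300, 4.0, 1000))  # -> 1.2 (metres)
```

The catch, of course, is the `depth_m` term: without depth sensing you can't tell a small animal up close from a large one far away, which is exactly what this thread is about.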

I figured I might as well cross post this quick little list I made in case it inspires anyone, or if anyone has other ideas to toss into this arena!

Reminder also that there are lots of fun ideas for new camera traps out there, but a huge difficulty always seems to be making good cases that can deal with lots of abuse from people, transportation, weather, and animals.

Here’s a quick and dirty list of technologies and possible ideas I talked about with my friends @juul and Matt Flagg:


ToF arrays (e.g. this 8x8 array from SparkFun: Qwiic ToF Imager - VL53L5CX - SEN-18642)

  • Autonomous Low-power mode with interrupt programmable threshold to wake up the host
  • Up to 400 cm ranging
  • 60 Hz frame rate capability
  • Emitter: 940 nm invisible light vertical cavity surface emitting laser (VCSEL) and integrated analog driver
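As a sketch of what you could do with such a coarse grid, here's a pure-Python example that segments a foreground object out of a hypothetical 8x8 distance frame and estimates its physical width. The ~45° square field of view and the margin threshold are assumptions for illustration, not datasheet values:

```python
import math

# Hypothetical 8x8 distance frame (mm) from a VL53L5CX-style ToF array:
# a background wall at ~3000 mm, an animal at ~1200 mm in the middle columns.
frame = [[3000] * 8 for _ in range(8)]
for row in range(2, 6):
    for col in range(3, 6):
        frame[row][col] = 1200

def foreground_width_m(frame, fov_deg=45.0, margin_mm=500):
    """Estimate the physical width of the nearest object in a ToF zone grid.

    Assumes a roughly square sensor FoV (fov_deg spread across 8 zone
    columns) and treats anything margin_mm closer than the farthest
    reading as foreground.
    """
    background = max(max(row) for row in frame)
    cols = sorted({c for row in frame for c, d in enumerate(row)
                   if d < background - margin_mm})
    if not cols:
        return 0.0
    depth_m = min(d for row in frame for d in row) / 1000.0
    # Angular width subtended by the occupied zone columns
    angle = math.radians(fov_deg / 8 * (cols[-1] - cols[0] + 1))
    return 2 * depth_m * math.tan(angle / 2)

print(round(foreground_width_m(frame), 2))  # ~0.36 m for this synthetic frame
```

With only 8x8 zones the size estimate is very coarse, but it could be enough to distinguish, say, a fox from a deer, or to gate the main camera's trigger.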

IR pattern projection (e.g. Kinect, Realsense)

  • Limits - some have difficulty in direct sunlight

Calibrated laser speckle projection

  • Could flash really bright laser speckle and photograph it
  • Could be visible in daylight, or have filters for specific channels
  • Could be very sensitive to vibration if the laser shifts and decalibrates

Structured light projection

  • Limits - very slow, can't really work for moving subjects

LIDAR scanners

  • Limits - VERY expensive (like $600+)

AI Prediction Based

Single-view depth prediction
Results are simply an inference from a machine learning model, not actual depth sensing. Would require lots of calibrated training.
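One practical wrinkle with single-view models: they typically predict depth only up to an unknown scale and shift, so you need reference points in the scene to recover metric distances. A minimal sketch of that calibration step, using two objects at known distances (all the numbers are made up for illustration):

```python
def fit_depth_scale(ref1, ref2):
    """Fit metric depth = a * relative + b from two calibration points.

    Single-view depth networks generally predict depth only up to an
    unknown scale and shift; two scene objects at known distances are
    enough to fit a linear correction (some models predict *inverse*
    depth, in which case the same fit is done in disparity space).

    ref1, ref2: (relative_depth_value, true_distance_m) pairs.
    """
    (r1, d1), (r2, d2) = ref1, ref2
    a = (d2 - d1) / (r2 - r1)
    b = d1 - a * r1
    return lambda rel: a * rel + b

# Hypothetical references: a stake at 1 m and a tree at 7 m
to_metres = fit_depth_scale((0.2, 1.0), (0.8, 7.0))
print(to_metres(0.5))  # -> 4.0, midway between the references
```

In the field this could mean photographing a couple of stakes at surveyed distances once per deployment site.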


Personally, the passive methods of depth estimation make me the most excited, since just using 2-D camera images doesn't add much new hardware into the mix, and it helps future-proof designs: photogrammetric techniques can keep improving and still be applied to old 2-D images.

Pre-calibrated Stereo Depth

  • Passive stereo depth (no active illumination); accuracy depends on adequate lighting and on the texture of objects/scenes. Typical accuracy is approximately 3% of distance, but varies with the object and the actual distance.
  • Accuracy drops as distance increases.
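The falloff with distance follows directly from stereo geometry: depth is Z = f·B/d, so a fixed sub-pixel matching error turns into a depth error that grows with the square of the distance. A quick sketch (the baseline, focal length, and matching error are assumed values for a hypothetical rig):

```python
def stereo_depth_m(disparity_px, baseline_m, focal_px):
    """Classic stereo relation: Z = f * B / d."""
    return focal_px * baseline_m / disparity_px

def depth_error_m(depth_m, baseline_m, focal_px, disparity_err_px=0.25):
    """Expected depth error for a given sub-pixel matching error:
    dZ ~ Z^2 / (f * B) * dd -- the error grows with the *square* of
    distance, which is why accuracy drops off quickly at range."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_err_px

# Hypothetical rig: 7.5 cm baseline, 800 px focal length
for z in (2, 5, 10):
    print(z, "m:", round(depth_error_m(z, 0.075, 800), 3), "m error")
# -> 0.017 m at 2 m, 0.104 m at 5 m, 0.417 m at 10 m
```

So a wider baseline between the two cameras directly buys you accuracy at range, at the cost of a bulkier enclosure and a larger minimum sensing distance.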

Off the shelf kits

OpenCV AI Kit Lite - stereo grayscale cameras

  • Min depth perception: ~18 cm (using extended disparity search stereo mode)
  • Max depth perception: ~18 m

Multi-camera arrays
(This is my favorite idea, so I even drew some pictures)

  • There are SUPER cheap ESP32 cameras available for like $6-$20 USD
    Like this one for $14, which has a display we wouldn't need, or this one for $20 that even has a case and nicer specs

  • You can put these ESP32 boards into "hibernation mode," which needs only about 3-5 microamps to stay on (meaning they could last months)

  • Get 5-10 of these cameras that you could set up as an array (this could cost about the same as a single off-the-shelf camera trap)

  • The array could be all connected to a single unit that is connected to a tree with telescoping arms

  • or several cameras could be independently connected around an area the animal might go through

  • then the cameras could be woken from hibernation by simple PIR motion detectors, grab images, and transfer them to a central node camera

  • finally the array of photos could be processed through something like COLMAP to get 3D reconstruction of each shot

  • a person may need to walk through the target area after setting up the cameras with something like a chessboard for calibration to make the 3D reconstruction easier
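To sanity-check the "could last months" claim, here's some back-of-the-envelope duty-cycle arithmetic. All the numbers (battery capacity, active current, wake-up pattern) are assumptions for illustration, not measurements of any specific board:

```python
def battery_life_days(capacity_mah, sleep_ua, active_ma, wakes_per_day, active_s):
    """Rough battery-life estimate for a duty-cycled ESP32 camera node.

    Average current = hibernation current + (active charge per day
    spread over 86400 seconds). Self-discharge, regulator overhead,
    and temperature effects are ignored for simplicity.
    """
    avg_ma = sleep_ua / 1000 + active_ma * wakes_per_day * active_s / 86400
    return capacity_mah / avg_ma / 24

# e.g. a 2500 mAh cell, 5 uA hibernation, 160 mA while capturing/sending,
# 50 PIR wake-ups per day at 10 s each
print(round(battery_life_days(2500, 5, 160, 50, 10)))  # ~112 days
```

The takeaway: hibernation current is nearly negligible; it's the wake-ups (camera capture plus any radio transfer to the central node) that dominate the budget, so a busy game trail drains batteries much faster than a quiet one.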

Other camera modules are also available if you want fancier optics than the 2 MP default.


This is not something that I work on, but I got curious whether something like this had been implemented for OpenMV (which I quite like as an open project). This came out of a quick Google search; maybe it's useful, I hope!


The multi-camera array reminds me a lot of a Bullet Time implementation. Other sillinesses are possible.

I think probably stereoscopy would be the simplest to employ. Not necessarily as complicated as Intel’s RealSense or Microsoft’s Kinect (and there are other RGB+D cameras out there), but just 2 regular cameras a distance apart, triggered at the same time. @akiba has been working on an API to trigger regular camera traps; this would be useful since conservationists likely already have camera traps on hand.


Sorry I’m a bit late to a topic that I initiated. :sweat_smile: I didn’t know single-view depth prediction existed, amazing stuff.

Thank you so much to @hikinghack for pulling this together! Not to mention cross-posting this to the Wildlabs forum. Awesome to see a response from @Freaklabs there.

I just read this again, and think it might be helpful to think about this from the following angles.

0. Which ecological/conservation questions might depth-sensing data answer?

Some off the top of my head:

  • Estimating wildlife populations - This is a big one. I know there are existing mathematical models that can make use of animal observation data, but only if there’s a good way to get depth information. Right now that’s a super labor-intensive process that isn’t practical (I can explain more if there’s interest).
  • Measuring the size of animals - This can be a proxy for age, which provides demographic information about the species in question.
  • Movement speed - If you can take images in burst mode (e.g. 3-4 photos per second) with depth data, you can estimate how fast an animal is moving.
  • What else?
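For the movement-speed idea above: two detections with pixel coordinates plus depth can be back-projected to 3-D and differenced. A minimal sketch assuming a calibrated pinhole camera (the focal length, principal point, and detection values are made-up illustrative numbers):

```python
import math

def to_xyz(u_px, v_px, depth_m, focal_px, cx, cy):
    """Back-project a pixel + depth reading to 3-D camera coordinates
    using the pinhole model (focal length and principal point assumed
    known from calibration)."""
    return ((u_px - cx) * depth_m / focal_px,
            (v_px - cy) * depth_m / focal_px,
            depth_m)

def speed_m_s(det_a, det_b, dt_s, focal_px=1000, cx=640, cy=360):
    """det_a, det_b: (u, v, depth) detections taken dt_s seconds apart."""
    a = to_xyz(*det_a, focal_px, cx, cy)
    b = to_xyz(*det_b, focal_px, cx, cy)
    return math.dist(a, b) / dt_s

# Animal seen at the image centre 5 m away, then 200 px to the right
# 0.33 s later (burst mode) at the same depth:
print(round(speed_m_s((640, 360, 5.0), (840, 360, 5.0), 0.33), 2))  # -> 3.03 m/s
```

Without the depth term, the same 200-pixel shift could mean anything from a strolling mouse up close to a sprinting deer far away.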

1. What ecological/conservation questions can each tech help answer?

For example, since structured light projection is slow and can’t work on moving subjects, that constrains the type of data you can get. Or: LIDAR scanners are powerful but expensive, so it might be hard to deploy a large number of them in an array, meaning you don’t get as much spatially distributed data. What are the implications of each?

2. Common evaluation criteria for each tech

Such as:

  • Resolution
  • Range
  • Power needs
  • Response time
  • Spatial scalability (i.e. how feasible to deploy an array of these devices in the field)
  • Cost $$$
  • Can it sync with or replace existing camera trap images
  • Technical complexity for building a camera trap out of it

How does the above sound? Is there a better approach? Or maybe I should post this to the Wildlabs forum instead. :woman_shrugging:

Eventually, I think it would be super cool to develop an open source hardware camera trap. But like @hikinghack said even the case would be a challenge, not to mention other things like a quick triggering system, power requirements, etc. But I think it’s a worthwhile endeavor especially if we can bring new technology to the table like depth-sensing, something multiple ecologists have dreamed about but don’t have the ability to create.

Maybe a first step is to see if @Freaklabs’ BoomBox system can be adapted??

I know @laola has also indicated an interest in this, so please chime in!

Just adding a possible ESP32 board here. The TinyPICO which is released under the CERN OHL 1.2 license!

I’ve just been playing with a RealSense D415; it’s about $250 USD. It’s fun, but too expensive for deploying all over in the wild. Kinects can be bought second-hand all over, sometimes super cheap, but they all need a computer attached.

Also, I bought 2 of those ESP32 TTGO camera modules, but haven’t played with them intensively yet. I also wanted to do camera traps with them to monitor my dormouse population in the basement.

I am in for trying to do this more in depth. Maybe a focus call soon about this?



Hi @dusjagr that’s super cool! I’m impressed by the high resolution. Who knows, it might be possible to adapt a Kinect into an addon module for a camera trap, or something like that. And even without a full Kinect, if we can figure out what sensor modules are in it, there might be a way to just buy the sensors?? IIRC @Freaklabs from the Wildlabs forum once posted something about how to tap into a camera trap’s triggering mechanism for this purpose, I can try to dig it up.

Yes, I think it would be great to have an initial, exploratory call about all this.

Is anyone else interested? If so, I can set up a poll to find a date soon-ish.

We briefly chatted about this thread’s topic in today’s GOSH Community Call. Thank you everyone, plus @hikinghack, @laola, and @dusjagr for coming!

@hikinghack shared a link to the ESP-32 based DIY camera trap (archived link).

@dusjagr’s group is thinking of a real-time wild bear detection system. The idea is that a camera trap would trigger a warning for a village when a bear comes into range. I suggested that since this requires a real-time response, the camera trap needs to send its images to a separate device that does onboard automatic recognition of the animal’s species. If it’s a bear, then it would send a warning to the village.
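That trigger-classify-alert flow could be sketched like this (the classifier and the alert channel are hypothetical stand-ins for an onboard recognition model and whatever warning channel the village uses, e.g. LoRa, GSM, or a siren; this is not an existing implementation):

```python
def handle_trigger(image, classify, alert, confidence_threshold=0.8):
    """Sketch of the real-time pipeline: a PIR/camera trigger hands an
    image to an on-device classifier; a confident "bear" detection
    raises the village alert. Returns True if an alert was sent."""
    label, confidence = classify(image)
    if label == "bear" and confidence >= confidence_threshold:
        alert(f"Bear detected (confidence {confidence:.0%})")
        return True
    return False

# Stub classifier and alert channel, just to exercise the flow:
sent = []
fired = handle_trigger(b"...jpeg bytes...",
                       classify=lambda img: ("bear", 0.93),
                       alert=sent.append)
print(fired, sent)  # -> True ['Bear detected (confidence 93%)']
```

The confidence threshold is the key tuning knob here: too low and the village gets woken up by every badger, too high and a real bear slips through.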

Interesting ideas!

That said, I’d love to hear more of your feedback on how to design a camera trap with depth-sensing abilities. With it, we can use the spatial data from these images to estimate animal populations.