bird_bar [2026/02/22 19:24] (current) – qlyoung
| - | ====== | + | ====== |
| - | At the start of 2021 I received | + | Bird bar is a bird feeder |
Live stream: https://

Stats: https://

{{gallery>:

===== History =====
| + | |||
| + | At the start of 2021 I received a window-mount bird feeder as a secret santa gift. As a bird lover I was excited to put it up and get a close up view of some of the birds that inhabited the woods around where I lived. Within around 3 days I had birds showing up regularly. | ||
| + | |||
| + | With the floor plan of my apartment at the time, the only sensible place to put the feeder was on the kitchen window; there was a screened porch on my bedroom window, or I would have put it there. Since my work desk was in my bedroom, this meant that I couldn' | ||
| + | |||
| + | Shortly after installing the feeder I had the idea to mount a camera pointing at it and stream it to Twitch, so that I could watch the birds while I was at my computer in another room. While watching I found myself wondering about a few of the species I saw and looking up pictures trying to identify them. Then it hit me - this is a textbook computer vision problem. I could build something that used realtime computer vision to identify birds as they appeared on camera. | ||
| + | |||
| + | Fast forward a few years and this has bloomed into a pretty large project, with multiple upgrades to both the hardware, software and feeder setup. It's definitely the most popular project I've made; my friends think it's cool. It's also served as a good test bed to keep up to date on advances in machine learning and accelerated computing. | ||
| + | |||
| + | ===== Feeder ===== | ||
This section covers the evolution of the feeder construction & installation details.
==== v1 ====

The first feeder was a generic acrylic bird feeder. The camera was an old webcam I had lying around. Since it was mounted outside it needed to be weatherproofed; I did that with plastic wrap. Subsequent attempts greatly improved the design.

{{birdbar:
Additional weatherproofing measures included a plastic tupperware lid taped over the camera as a sort of primitive precipitation shield.
| - | |||
| - | {{: | ||
Say what you will, but this setup survived a thunderstorm immediately followed by freezing temperatures and several hours of snow. All for $0.
===== Bird Identification =====
Birds arriving at the feeder are identified using [[https://github.com/ultralytics/yolov5|YOLOv5]] fine-tuned on [[https://dl.allaboutbirds.org/nabirds|NABirds]].

==== Background ====

I’d read about [[https://pjreddie.com/darknet/|YOLO]] before, so that was the natural place to start.
Out of the box YOLOv5 is trained on COCO, a dataset of _co_mmon objects in _co_ntext. A model trained on this dataset identifies a picture of a Carolina chickadee as “bird”. Tufted titmice are also identified as “bird”. All birds are “bird” to COCO (at least the ones I tried).
| - | {{: | + | {{: |
Pretty good, but not exactly what I was going for. YOLO needed to be trained to recognize specific bird species.
==== Dataset ====

A quick Google search for “north american birds dataset” yielded probably the most convenient dataset I could possibly have asked for. Behold, NABirds!
>> NABirds V1 is a collection of 48,000 annotated photographs of the 400 species of birds that are commonly observed in North America. More than 100 photographs are available for each species, including separate annotations for males, females and juveniles that comprise 700 visual categories. This dataset is to be used for fine-grained visual categorization experiments.
YOLOv5 offers multiple network sizes, from n to x (n for nano, x for x). The n, s and m sizes are recommended for mobile or workstation deployments, while the larger l and x sizes call for more serious GPU hardware.
Since the webcam demo with the s model ran at a good framerate on my GPU I chose that one to start.
For the data-oriented, here is the summary information for the training run of the model I ended up using:

These metrics are all good and show that the model trained very nicely on the dataset.
Trying it on an image with three species of chickadee that to my eye look almost identical:

I’m not sure if these were in the training set; I just searched for the first images of each species I found on Google Images.
===== Video flow =====

Having demonstrated that it could identify birds with relative accuracy, it was time to get it working on a live video feed.

Even though YOLO is amazingly fast relative to other methods, it still needs a GPU in order to perform inference fast enough to keep up with each frame of a video feed. I set a goal of 30fps; on my 3080, my final model averages roughly 0.020s per frame, sufficient to pull around 40-50fps. This is a good tradeoff between model size/accuracy and speed.
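As a quick back-of-envelope check of those numbers (a sketch; the 0.020s figure and the 30fps goal are taken from the paragraph above):

```python
# Back-of-envelope check of the framerate math above.
inference_s = 0.020                   # measured per-frame model time on the 3080
max_fps = 1 / inference_s             # throughput ceiling if inference dominates
frame_budget_s = 1 / 30               # per-frame time budget at the 30fps goal
print(max_fps)                        # 50.0
print(inference_s <= frame_budget_s)  # True: inference fits within the budget
```

Real throughput lands below the 50fps ceiling (the 40-50fps quoted above) because capture, preprocessing, and drawing the overlay also take time per frame.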
| + | |||
| + | So while I had a GPU that could perform inference fast enough, | ||
  * A very long USB extender to the webcam
I actually tried this, and it did work. The problem was that it required me to have my bedroom window open a little bit to route the cable through it, which isn’t ideal for several reasons.
  * eGPU enclosure attached to the webcam host
By this point in the project I’d replaced the laptop with a NUC I happened to have lying around. If it had been a recent NUC with Thunderbolt 3 support, an eGPU enclosure would have been the cleanest and easiest solution. I wanted to avoid buying new hardware at this point, though.
  * Some kind of network streaming setup
Advantages: no cables through windows, no new hardware. I have a pretty good home network, so this is what I went with. After several hours experimenting with RTMP servers, HTTP streaming tools, and the like, I ended up with this setup:
I tried a bunch of other things, including streaming RTMP to a local NGINX server, using VLC as an RTSP source on the webcam box, etc., but this was the setup that was the most stable, had the highest framerate, and showed the fewest artifacts. detect.py actually does support consuming RTSP feeds directly, but whatever implementation OpenCV uses under the hood introduces significant artifacts into the output. Using VLC to consume the RTSP feed and rebroadcast it locally as an HTTP stream turned out better. The downside is that VLC seems to crash from time to time, but a quick batch script fixed that right up:
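The script itself isn't shown in this revision of the page; as a sketch, a watchdog batch file along these lines would do it (the VLC install path, stream URL, and `--sout` options here are my assumptions, not the original):

```bat
:: Hypothetical reconstruction of the restart loop, not the original script.
:: VLC pulls the camera's RTSP feed and re-serves it as a local HTTP stream;
:: when VLC crashes or exits, the loop relaunches it.
:restart
"C:\Program Files\VideoLAN\VLC\vlc.exe" -I dummy rtsp://camera-host:8554/stream ^
  --sout "#standard{access=http,mux=ts,dst=:8080/}"
:: brief pause so a fast crash loop doesn't pin the CPU
timeout /t 5 /nobreak
goto restart
```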
Why yes, I do have Windows experience :’)

{{ :setup.jpg?400|}}

I ran it that way for a couple of months, but eventually the above setup proved too unreliable. It required running lots of software on both the camera host and my desktop, and since it used my desktop GPU for inference it limited what I could use my computer for (read: no gaming). Also, the stream went down every time I rebooted my computer.

After deciding that I wanted to maintain this as a long-term installation I ponied up for a NUC and an eGPU enclosure. I initially tried to use the enclosure with an RTX 3070, but I couldn’t get it working with that card, so I used a spare 1070 instead, which worked flawlessly. The 1070 runs at about 25fps when inferencing with my bird model, which is more than enough to look snappy overlaid on a video feed. The whole thing sits on my kitchen floor and is relatively unobtrusive.
==== 60fps ====

Up to this point I was streaming the window in which detection results were rendered, which capped the stream at the inference framerate. I wanted the stream itself to run at a full 60fps.

Doing this turned out to be rather difficult because you cannot multiplex camera devices on Windows; only one program can have a handle on the camera and its video feed, to the exclusion of all others. Fortunately there is webcam splitter software that works around this limitation.
| + | {{ : | ||
For encoding I use NVENC on the 1070. That keeps the stream at a solid 60fps, which the NUC CPU can’t accomplish. Between inferencing and video encoding the card is getting put to great use.

This was stable for over a year, until I decided to install Windows 11. What could go wrong?

==== Camera ====

The original setup used an off-brand 720p webcam wrapped in a righteous amount of plastic wrap for weatherproofing. Surprisingly, the weatherproofing worked well and there was never a major failure while using the first camera. However, the quality and color on that camera weren’t good and an upgrade was due. I already had a Logitech Brio 4K webcam intended for remote work, but it ended up largely unused, so it was repurposed for birdwatching.

While the plastic wrap method never had any major failures, it wasn’t ideal either. Heavy humidity created fogging inside the plastic that could take a few hours to clear. It needed replacing anytime the camera was adjusted. Due to these problems and the higher cost of the Brio I decided to build a weatherproof enclosure.
| + | |||
| + | The feeder is constructed of acrylic. My initial plan was to use acrylic sheeting build out an extension to the feeder big enough to house the camera. I picked up some acrylic sheeting from Amazon and began researching appropriate adhesives. It turns out most adhesives don’t work very well on acrylic, at least not for my use case – the load bearing joints between the sheets were thin and I needed the construction to be rigid enough to support its own weight and the weight of the camera without sagging. Since the enclosure would be suspended over air relying on its inherent rigidity for structure the adhesive needed to be strong. | ||
| + | |||
| + | The best way to adhere acrylic to itself is using acrylic cement. Acrylic cement dissolves the surfaces of the two pieces to be bonded, allowing them to mingle, and then evaporates away. This effectively fuses the two pieces together with a fairly strong bond (though not as strong as if the piece had been manufactured that way). | ||
| + | |||
| + | {{:acrylic-cement.jpg?400 |}} | ||
| + | |||
| + | {{: | ||
| + | |||
| + | Three sides were opaque to prevent sunlight reflections within the box. Joints were caulked and taped the joints to increase weather resistance. I played around with using magnets to secure the enclosure to the main feeder body but didn’t come up with anything I liked, so I glued it to the feeder with more acrylic cement, threw my camera in there and called it a day. | ||
| + | |||
| + | {{ : | ||
| + | |||
| + | This weatherproofing solution turned out great. It successfully protected the camera from all inclement weather until I retired that feeder, surviving rain, snow, and high winds over the course of the year. | ||
| + | |||
===== Switching to Linux =====

As it turns out, the webcam splitter software appears to rely on some undocumented/unofficial Windows 10 APIs and does not work on Windows 11. I decided to bite the bullet and just put Linux on the NUC.

Similar to Windows, on Linux only one process can read from a camera device file at a time. Unlike Windows, there is a very easy way to work around this called [[https://github.com/umlaeute/v4l2loopback|v4l2loopback]], a kernel module that creates virtual loopback video devices.

tl;dr:
| + | |||
| + | <code bash> | ||
| + | # load v4l2loopback module; this creates a few loopbacks | ||
| + | sudo modprobe v4l2loopback | ||
| + | # set desired parameters on loopback device / | ||
| + | sudo v4l2loopback-ctl set-fps 60 / | ||
| + | # pipe video from camera device /dev/video0 to loopback | ||
| + | gst-launch-1.0 v4l2src device=/ | ||
| + | </ | ||
| + | |||
| + | After this, any number of clients can read from / | ||
| + | |||
| + | On the off chance this helps someone, here's how you set video camera parameters on device 0 (/ | ||
| + | |||
| + | < | ||
| + | sudo v4l2-ctl -d 0 -c focus_automatic_continuous=0 | ||
| + | sudo v4l2-ctl -d 0 -c focus_absolute=75 | ||
| + | sudo v4l2-ctl -d 0 -c backlight_compensation=0 | ||
| + | sudo v4l2-ctl -d 0 -c auto_exposure=3 | ||
| + | </ | ||
Around this time I also replaced
You can watch the [[https://
In the case of sexually dimorphic species that also have appropriate training examples, such as house finches, it’s even capable of distinguishing the sex.
| - | {{ : | + | {{ birdbar: |
In a few cases, such as the nuthatch and the pine warbler, the model taught me something I did not know before. Reflecting on that, I think that makes this one of my favorite projects. Building a system that teaches you new things is cool.
Then I thought it would be cool to show these graphs on the livestream. It turns out Grafana supports embedding individual graphs, and since OBS supports rendering browser views it was easy to get those set up.
| - | {{: | + | {{birdbar: |
I left these up for a while, but ultimately I felt they were taking up too much space in the stream so I took them down.
  * Retrain with background images to reduce false positives
| - | {{: | + | {{birdbar: |
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 4.0 International