====== bird bar ======

{{ :feeder-with-chickadee.jpg?200|Bird feeder with Carolina chickadee perched on it}}
  
At the start of 2021 I received a window-mount bird feeder as a secret santa gift (thank you!). As someone who loves birds I was excited to put it up and get a close-up view of some of the birds that inhabit the woods around where I live. It’s a great little feeder and within around 3 days I had birds showing up regularly.
  
Shortly after installing the feeder I had the idea to mount a camera pointing at it and stream it to Twitch, so that I could watch the birds while I was at my computer. While watching I found myself wondering about a few of the species I saw, and looking up pictures trying to identify them. Then it hit me - this is a textbook computer vision problem. I could build something that used realtime computer vision to identify birds as they appeared on camera.
  
{{:feeder-with-camera.jpg?400|Bird feeder showing webcam pointed at it}} {{:feeder-with-camera-outside.jpg?200|Picture of webcam attached to the side of my apartment building}} {{:webcam-condom.jpg?200|Picture of the webcam completely wrapped in plastic wrap secured by orange duct tape sitting on my window sill}}

{{ :webcam-tupperware-lid.jpg?200|Picture of a tupperware lid taped over the camera as a sort of primitive rain shield}}
  
Additional weatherproofing measures included a plastic tupperware lid taped over the camera as a sort of primitive precipitation shield.
  
Say what you will, but this setup survived a thunderstorm immediately followed by freezing temperatures and several hours of snow. All for $0.
===== Bird Identification =====
  
{{ :me-with-phone-yolo-detection.png?200|Screen capture of webcam feed after applying YOLOv5's out-of-box 'small' model to a scene of me holding up my cell phone. Picture is heavily blurred}}

I’d read about [[https://pjreddie.com/darknet/yolo/|YOLO]] some years before and began to reacquaint myself. It’s come quite far and seems to be more or less the state of the art for realtime computer vision object detection and classification. I downloaded the latest version ([[https://github.com/ultralytics/yolov5|YOLOv5]] at time of writing) and ran the webcam demo. It ran well over 30fps with good accuracy on my RTX3080, correctly picking out myself as “person”, my phone as “cell phone”, and my light switch as “clock”.
  
Out of the box YOLOv5 is trained on COCO, which is a dataset of _co_mmon objects in _co_ntext. A model trained on this dataset identifies a picture of a Carolina chickadee as “bird”. Tufted titmice are also identified as “bird”. All birds are “bird” to COCO (at least the ones I tried).
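
To illustrate what "out of the box" looks like in practice, here is a minimal sketch using the torch.hub interface from the YOLOv5 repo (the image path is just a placeholder); everything avian comes back with COCO's single generic "bird" class:

<code python>
import torch

# Load the COCO-pretrained "small" checkpoint through the hub interface
# documented in the ultralytics/yolov5 repository.
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

# Placeholder path -- any chickadee or titmouse photo will do.
results = model("chickadee.jpg")

results.print()                   # e.g. "1 bird" -- COCO only has a generic bird class
print(results.pandas().xyxy[0])   # per-detection boxes, confidences, and class names
</code>
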
YOLOv5 offers multiple network sizes, from n to x (n for nano, x for x). The n, s, and m sizes are recommended for mobile or workstation deployments, while the l and x variants are geared towards cloud / datacenter deployments (i.e. expensive datacenter GPUs / tensor processors). The larger variants take longer to train and longer to evaluate. Since the model needed to be evaluated on each frame of a video feed, holding all else constant, the choice of model size would ultimately dictate the achievable framerate for this project.
  
{{ :yolo-model-comparison-graph.png?400|Graph showing evaluation time versus average precision against the COCO dataset}}
  
Since the webcam demo with the s model ran at a good framerate on my GPU I chose that one to start.
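
The size/speed tradeoff is easy to sanity-check on whatever hardware you have. A rough timing loop over a few of the pretrained checkpoints, sketched here with a random array standing in for a camera frame:

<code python>
import time

import numpy as np
import torch

# Stand-in for a 720p camera frame; a real capture would come from the webcam feed.
frame = np.random.randint(0, 255, (720, 1280, 3), dtype=np.uint8)

for size in ("yolov5n", "yolov5s", "yolov5m"):
    model = torch.hub.load("ultralytics/yolov5", size)
    model(frame)  # warm-up run so weight loading doesn't skew the timing
    start = time.time()
    for _ in range(20):
        model(frame)
    per_frame = (time.time() - start) / 20
    print(f"{size}: {per_frame * 1000:.1f} ms/frame (~{1 / per_frame:.0f} fps)")
</code>
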
For the data oriented, here is the summary information for the training run of the model I ended up using:
  
{{:final-training-results.png?400|Composite image with various graphs depicting various training metrics, including recall, precision, loss, and mean average precision on both training and validation sets}} {{:pr-curve.png?400|PR curve}}
  
These metrics are all good and show that the model trained very nicely on the dataset.
Trying it on an image with three species of chickadee, which to my eye look almost identical:
  
{{:three-chickadee.png?400|Composite picture of carolina, mountain and black-capped chickadees}}
  
I’m not sure if these were in the training set; I just searched for the first images of each species I found on Google Images.
Advantages: no cables through windows, no new hardware. I have a pretty good home network and am pretty handy with this stuff so I went with that. After several hours experimenting with RTMP servers, HTTP streaming tools, and the like, I ended up with this setup:
  
{{ :video-routing.png?400|Video routing setup; shows video feed streaming from webcam host in the kitchen, to my desktop where inference is performed, then to Twitch}}
  
I tried a bunch of other things, including streaming RTMP to a local NGINX server, using VLC as an RTSP source on the webcam box, etc., but this was the setup that was the most stable, had the highest framerate, and introduced the fewest artifacts. detect.py does actually support consuming RTSP feeds directly, but whatever implementation OpenCV uses under the hood introduces some significant artifacts into the output. Using VLC to consume the RTSP feed and rebroadcast it locally as an HTTP stream turned out better. The downside to this is that VLC seems to crash from time to time, but a quick batch script fixed that right up:
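
The actual fix is a batch file; as a rough sketch of the same restart-on-exit watchdog idea (shown here in Python, and with an illustrative RTSP URL, port, and sout chain rather than the real ones):

<code python>
import subprocess
import time

# Illustrative values only -- the RTSP source and HTTP output below are assumptions,
# not the addresses used on the real webcam host.
VLC = r"C:\Program Files\VideoLAN\VLC\vlc.exe"
CMD = [
    VLC,
    "rtsp://192.168.1.50:8554/cam",                        # RTSP feed from the camera box
    "--sout", "#standard{access=http,mux=ts,dst=:8080/}",  # rebroadcast locally over HTTP
]

# Keep VLC alive: whenever it exits (crash or otherwise), wait a bit and relaunch it.
while True:
    result = subprocess.run(CMD)
    print(f"vlc exited with code {result.returncode}; restarting in 5 seconds")
    time.sleep(5)
</code>
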
  
Why yes, I do have Windows experience :’)

{{ :setup.jpg?400|}}
  
I ran it that way for a couple of months or so, but eventually the above setup proved too unreliable. It required running lots of software on both the camera host and my desktop, and since it used my desktop GPU for inference it limited what I could use my computer for (read: no gaming). Also, the stream went down every time I rebooted my computer.
After deciding that I wanted to maintain this as a long-term installation I ponied up for a NUC and an eGPU enclosure. I initially tried to use the enclosure with an RTX 3070, but I couldn’t get it working with that card so I used a spare 1070 instead, which worked flawlessly. The 1070 runs at about 25fps when inferencing with my bird model, which is more than enough to look snappy overlaid on a video feed. The whole thing sits on my kitchen floor and is relatively unobtrusive.
  
==== 60fps ====
  
Up to this point I was streaming the window with annotated frames displayed by YOLO’s detect.py convenience script. However, this window updates only as often as an inferencing run completes, so around 25fps. It doesn't look good on a livestream. It would be better to stream video straight from the camera at native framerates (ideally 60fps) and overlay the labels on top of it.
  
Doing this turned out to be rather difficult because you cannot multiplex camera devices on Windows; only one program can have a handle on the camera and its video feed to the exclusion of all others. Fortunately there is some [[https://very-soft.com/product/webcamsplitter|software]] which works around this. I purchased that software and used it to create two virtual camera feeds. OBS directly consumes one feed and the other one goes into YOLO for inferencing. The resulting labeled frames are displayed in a live preview window. I patched YOLO so that the preview window, which normally displays the source frame annotated with the inferencing results, only displayed the annotations on a black background without the source frame. That window is used as a layer in OBS with a luma filter applied to make the black parts transparent. With some additional tweaks to get the canvas sizing and aspect ratio correct this allowed me to composite the 25fps inferencing results on top of the high quality 60fps video coming from the camera.
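
The heart of that patch is small: draw the boxes and labels onto an all-black canvas with the same dimensions as the frame, rather than onto the frame itself, so OBS can key out the black. A minimal sketch of the drawing step (OpenCV; the detection format and coordinates here are made up for illustration, not taken from the actual patch):

<code python>
import cv2
import numpy as np

def render_annotations(frame_shape, detections):
    """Draw YOLO-style boxes and labels on a black canvas (instead of the source
    frame) so the result can be luma-keyed over the raw camera feed in OBS."""
    canvas = np.zeros(frame_shape, dtype=np.uint8)  # all-black background
    for x1, y1, x2, y2, label, conf in detections:
        cv2.rectangle(canvas, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(canvas, f"{label} {conf:.2f}", (x1, y1 - 6),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    return canvas

# One made-up detection on a 1080p frame.
overlay = render_annotations((1080, 1920, 3),
                             [(600, 400, 760, 520, "carolina_chickadee", 0.91)])
cv2.imshow("annotations", overlay)  # this preview window is what OBS captures and keys
cv2.waitKey(0)
</code>
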

{{ :nuc-perf.jpg?400|}}
  
For encoding I use Nvenc on the 1070. That keeps the stream at a solid 60fps, which the NUC CPU can’t accomplish. Between inferencing and video encode the card is getting put to great use.
  
This was stable for over a year, until I decided to install Windows 11. What could go wrong?

==== Camera ====

The original setup used an off-brand 720p webcam wrapped in a righteous amount of plastic wrap for weatherproofing. Surprisingly the weatherproofing worked well and there was never a major failure while using the first camera. However, the quality and color on that camera wasn’t good and an upgrade was due. I already had a Logitech Brio 4k webcam intended for remote work, but it ended up largely unused so it was repurposed for birdwatching.

While the plastic wrap method never had any major failures, it wasn’t ideal either. Heavy humidity created fogging inside the plastic that could take a few hours to wear off, and it needed replacing anytime the camera was adjusted. Due to these problems and the higher cost of the Brio I decided to build a weatherproof enclosure.

The feeder is constructed of acrylic. My initial plan was to use acrylic sheeting to build out an extension to the feeder big enough to house the camera. I picked up some acrylic sheeting from Amazon and began researching appropriate adhesives. It turns out most adhesives don’t work very well on acrylic, at least not for my use case – the load-bearing joints between the sheets were thin and I needed the construction to be rigid enough to support its own weight and the weight of the camera without sagging. Since the enclosure would be suspended over air, relying on its inherent rigidity for structure, the adhesive needed to be strong.

The best way to adhere acrylic to itself is using acrylic cement. Acrylic cement dissolves the surfaces of the two pieces to be bonded, allowing them to mingle, and then evaporates away. This effectively fuses the two pieces together with a fairly strong bond (though not as strong as if the piece had been manufactured as a single unit).

{{:acrylic-cement.jpg?400 |}}

{{:assembled-box.jpg?400 |}}

Three sides were opaque to prevent sunlight reflections within the box. The joints were caulked and taped to increase weather resistance. I played around with using magnets to secure the enclosure to the main feeder body but didn’t come up with anything I liked, so I glued it to the feeder with more acrylic cement, threw my camera in there and called it a day.

{{ :full-setup-with-first-weather-shield.jpg?600 |}}

This weatherproofing solution turned out great. It successfully protected the camera from all inclement weather until I retired that feeder, surviving rain, snow, and high winds over the course of the year.
  
====== Switching to Linux ======
  
  
{{tag>from_blog technology machine_learning nature}}
  
  
  
  