InsectNet
A BirdNET-Pi sidecar that classifies insect sounds in real time. Research prototype in active development.
What It Is
InsectNet is a lightweight sklearn head trained on BirdNET's 6,522-dim logit space. It runs alongside BirdNET-Pi on a Raspberry Pi, watches the audio stream, and sorts captured WAVs into acoustic classes.
The architecture is simple: StandardScaler → OneVsRest(LogisticRegression). Nothing novel: the interesting part is that BirdNET's logit space encodes insect acoustic structure well enough that a linear probe works for several classes.
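A minimal sketch of a head with this shape, assuming scikit-learn and NumPy are installed. The random arrays stand in for BirdNET logits, and the sample count, the three-class subset, and the `max_iter` value are illustrative assumptions, not the project's training configuration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

N_FEATURES = 6522  # width of the BirdNET logit vector described above

# Synthetic stand-in data: real training would use logits extracted by BirdNET.
rng = np.random.default_rng(0)
class_names = ["background", "cicada_drone", "frog"]
y = np.repeat(np.arange(len(class_names)), 40)
X = rng.normal(size=(y.size, N_FEATURES))
X[np.arange(y.size), y] += 5.0  # make each toy class separable along one feature

# StandardScaler -> OneVsRest(LogisticRegression), as in the description above.
head = make_pipeline(
    StandardScaler(),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
head.fit(X, y)

probs = head.predict_proba(X[:1])[0]  # one probability per class
for name, p in zip(class_names, probs):
    print(f"{name}: {p * 100:.1f}%")
```

Because each class gets its own binary logistic regression, a class with little data (such as bee) can be retrained or dropped without touching the others, which is one plausible reason to prefer a one-vs-rest head here.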
What's Validated
Field validation at Pine Hollow, Tennessee (35.8565, -83.3744):
| Class | Status | Confidence (field) | Notes |
|---|---|---|---|
| background | Production | N/A | 0.984 F1, 1,669 public clips + field negatives |
| cicada_drone | Working | 83-100% | Natural capture at 83%, playback at 99-100%. AC unit false positive at 92%. |
| frog | Working | 51-99% | Natural chorus confirmed. 440+ captures in one evening, two species identified. |
| cricket_katydid | Likely working | 99+% | Playback at 100%. Natural summer data pending. |
| grasshopper | Data-limited | TBD | 183 training clips, 0.701 F1. Not production-ready. |
| bee | Untrained | TBD | 43 training clips, 0.608 F1. No real field captures. Known false positives from weed whacker and night noise. |
What It's Not
This is not a finished product. It's a working research prototype that has been field-tested enough to know it catches real insects, and also catches enough false positives to know it shouldn't be trusted blindly.
- The F1 numbers are from cross-validation on public training data, not from field deployment. Actual performance varies with environment, mic placement, and insect proximity.
- All threshold tuning was done over one month at a single location.
- Grasshopper and bee classes need substantially more training data before they can be used without human review.
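One way to act on these caveats is to gate automatic labels behind a per-class rule. The sketch below is an assumption, not part of the InsectNet pipeline: the `route` function, the `REVIEW_ONLY` set, and the 0.80 cut-off are hypothetical and would need tuning for each site.

```python
# Hypothetical policy: class names come from the validation table above,
# but the threshold and the review set are illustrative assumptions.
REVIEW_ONLY = {"grasshopper", "bee"}  # data-limited classes: always reviewed
MIN_CONFIDENCE = 0.80                 # assumed cut-off, not a tuned value


def route(scores: dict[str, float]) -> tuple[str, str]:
    """Return (top_class, decision) for one clip's class probabilities."""
    top_class = max(scores, key=scores.get)
    if top_class in REVIEW_ONLY:
        return top_class, "human_review"
    if scores[top_class] < MIN_CONFIDENCE:
        return top_class, "human_review"
    return top_class, "accept"


print(route({"background": 0.05, "cicada_drone": 0.92, "bee": 0.03}))
print(route({"background": 0.10, "frog": 0.55, "bee": 0.35}))
print(route({"background": 0.01, "bee": 0.98, "frog": 0.01}))
```

A confidence threshold alone would still accept a high-scoring false positive, such as the AC unit scored as cicada_drone at 92%, so a rule like this reduces review load without replacing review.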
Known Limitations
- BirdNET dependency. The classifier requires BirdNET's TFLite model to extract logits. Without BirdNET, the classifier can't run.
- Mic placement. The outdoor mic at Pine Hollow is upward-facing for birds. Ground-level insect sounds must be loud to reach it.
- No cicada species channels. BirdNET has zero cicada labels. Cicada detection relies on general acoustic features in the BirdNET embedding space.
- False positives. AC units → cicada_drone (92%). Weed whackers → bee (98%). Night noise → bee (50-70%).
- All BirdNET species IDs are approximate. BirdNET maps to the closest species in its 6,522-label set, which may not be the true species.
How to Use
The classifier isn't useful on its own: it needs BirdNET's TFLite model to produce logits. The full capture pipeline lives on GitHub:
https://github.com/vortexpjeff/insectnet
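# After extracting BirdNET logits (6,522-dim NumPy vector named `logits`):
import joblib

# The saved bundle is a dict holding the scaler, the classifier, and the class names.
clf = joblib.load("classifier.joblib")
X = clf["scaler"].transform(logits.reshape(1, -1))  # scale a single sample
scores = clf["classifier"].predict_proba(X)[0]      # per-class probabilities
for i, cls in enumerate(clf["classes"]):
    print(f"{cls}: {scores[i]*100:.1f}%")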
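As a follow-on sketch, the per-class scores can drive the sorting of a captured WAV into a per-class folder. The `sort_capture` helper, the folder layout, and the file-naming scheme below are assumptions for illustration; the actual capture pipeline is in the GitHub repository and may work differently.

```python
import shutil
import tempfile
from pathlib import Path


def sort_capture(wav_path: Path, classes: list[str], scores: list[float],
                 out_root: Path) -> Path:
    """Move a WAV into out_root/<top class>/, prefixing the confidence."""
    best = max(range(len(classes)), key=lambda i: scores[i])
    dest_dir = out_root / classes[best]
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / f"{round(scores[best] * 100):03d}_{wav_path.name}"
    shutil.move(str(wav_path), str(dest))
    return dest


# Demo with a placeholder file in a temporary directory (no real audio needed).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    wav = root / "capture_0001.wav"
    wav.write_bytes(b"")  # empty stand-in for a recorded clip
    moved = sort_capture(wav, ["background", "frog"], [0.12, 0.88], root / "sorted")
    existed = moved.exists()
    relative = moved.relative_to(root).as_posix()
    print(relative)
```

Putting the confidence in the file name keeps low-confidence captures easy to spot when a folder is listed in sorted order.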
Training Data
| Source | Clips | License | Content |
|---|---|---|---|
| InsectSet459 | ~1,800 | CC BY-NC-SA 4.0 | 459 insect species, primarily Orthoptera |
| iNatSounds | ~1,041 | CC BY-NC 4.0 | iNaturalist insect observations |
| ESC-50 | 1,519 | CC BY-NC 4.0 | Environmental noise (background class) |
| Pine Hollow field | 38 (unreviewed) | CC BY-NC-SA 4.0 | Natural captures from Pi sidecar |
All training data and the BirdNET backbone are licensed for non-commercial use only. Derivative classifiers must use a compatible license.
Project Status
Actively developed. Summer 2026 is the primary field data collection window for improving grasshopper, bee, and cricket classes. New captures are being accumulated continuously via the BirdNET-Pi sidecar.
License
CC BY-NC-SA 4.0. See the LICENSE file.