Open models for biodiversity.
Peer-reviewed papers at NeurIPS, IEEE Transactions on Field Robotics, and an AAAI workshop. Open ATProto lexicons. A self-hostable Hypersphere stack. Built with frontline partners; freely usable.
Selected papers, datasets, talks, and writing.
daviddao.org · 2026
Governing the Commons in the Intelligent Age
David Dao
Abstract: Design principles, traced from Hardin to Ostrom to AI agents, for sociotechnical systems that preserve human agency, build digital trust, and scale commons governance with ML in the loop.
Essay
IEEE Transactions on Field Robotics · 2025
Autonomous Aerial-Aquatic Rapid Biodiversity Assessment in the Amazon
ETH BiodivX with GainForest
Abstract: Autonomous aerial and aquatic drones, vision-language models, environmental DNA, and bioacoustic classifiers chained into a 24-hour biodiversity assessment pipeline, with the full XPRIZE Rainforest field methodology.
Paper
NeurIPS · 2024
OAM-TCD: A Globally Diverse Dataset of High-Resolution Tree Cover Maps
Veitch-Michaelis, Dao, et al.
Abstract: 280,000+ instance annotations of individual tree crowns from OpenAerialMap imagery; Mask2Former and SegFormer baselines released alongside the dataset for instance and semantic segmentation.
Dataset
NeurIPS · 2023
Collaborative Machine Learning for the Natural World
David Dao
Abstract: Invited NeurIPS workshop talk on community-in-the-loop ML pipelines for biodiversity, covering field data flows from Ecuador, Brazil, and the Philippines and how attribution rewards make those pipelines durable.
Invited talk
NeurIPS · 2023
GEO-Bench: Toward Foundation Models for Earth Monitoring
Lacoste, Dao, et al.
Abstract: Six classification and six segmentation tasks across six remote-sensing modalities; standard pretrain / fine-tune protocol and a leaderboard for evaluating Earth-observation foundation models.
Paper
MBZUAI · 2023
GainForest: AI and Web3 for the Climate Frontline
David Dao
Abstract: Research seminar at MBZUAI covering ReforesTree, deep-learning baselines for forest carbon stock, smart-contract payouts to steward addresses, and the move toward ATProto-anchored proof-of-impact records.
Invited talk
AAAI Workshop · 2022
ReforesTree: A Dataset for Estimating Tropical Forest Carbon Stock
Reiersen, Dao, et al.
Abstract: Drone photogrammetry across six agroforestry sites in Ecuador with per-tree carbon-stock annotations and openly released CNN regression baselines; the dataset was later reused in Earth-observation foundation-model evaluations.
Workshop
Medium · 2018
Decentralized Sustainability: Beyond the Tragedy of the Commons with Smart Contracts and AI
David Dao
Abstract: The founding essay: satellite-driven forest-loss prediction wired to a smart-contract escrow that pays steward addresses directly, demoed at the 2017 UN Climate Change Hackathon.
Essay
ATProto lexicons for nature data.
Co-authored with the Hypercerts community and shipped as five reusable layers. Every lexicon, package, and service below is open source and operable end-to-end on your own Personal Data Server (PDS).
org.hypercerts.* lexicons
the schema
Co-authored ATProto lexicons describing impact claims, evidence collections, and verification labels as portable signed records on any PDS, validated against shared JSON schemas.
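To make "validated against shared JSON schemas" concrete, here is a minimal TypeScript sketch of a Lexicon-style record definition and a deliberately shallow validator. The NSID `org.hypercerts.example.claim`, its fields, and the `validateRecord` helper are hypothetical illustrations, not the published org.hypercerts.* schema; real deployments would rely on the reference ATProto lexicon tooling rather than hand-rolled checks.

```typescript
// Hypothetical Lexicon-style record definition. The NSID and field names are
// illustrative only; they are not the published org.hypercerts.* schema.
type PropertyDef =
  | { type: "string"; maxLength?: number }
  | { type: "integer" }
  | { type: "boolean" };

interface RecordLexicon {
  lexicon: 1;
  id: string;
  defs: {
    main: {
      type: "record";
      key: "tid";
      record: {
        type: "object";
        required: string[];
        properties: Record<string, PropertyDef>;
      };
    };
  };
}

const exampleClaimLexicon: RecordLexicon = {
  lexicon: 1,
  id: "org.hypercerts.example.claim",
  defs: {
    main: {
      type: "record",
      key: "tid",
      record: {
        type: "object",
        required: ["title", "workScope", "createdAt"],
        properties: {
          title: { type: "string", maxLength: 256 },
          workScope: { type: "string" },
          createdAt: { type: "string" },
          hectares: { type: "integer" },
        },
      },
    },
  },
};

// Shallow check: $type, required fields, primitive types, and string length.
// Lexicon measures string maxLength in UTF-8 bytes, so this does the same.
function validateRecord(lex: RecordLexicon, record: Record<string, unknown>): string[] {
  const errors: string[] = [];
  if (record["$type"] !== lex.id) errors.push(`$type must be ${lex.id}`);

  const schema = lex.defs.main.record;
  for (const field of schema.required) {
    if (!(field in record)) errors.push(`missing required field: ${field}`);
  }

  for (const [field, def] of Object.entries(schema.properties)) {
    const value = record[field];
    if (value === undefined) continue; // absence is handled by the required check
    if (def.type === "integer") {
      if (typeof value !== "number" || !Number.isInteger(value)) {
        errors.push(`${field} must be an integer`);
      }
    } else if (def.type === "boolean") {
      if (typeof value !== "boolean") errors.push(`${field} must be a boolean`);
    } else if (typeof value !== "string") {
      errors.push(`${field} must be a string`);
    } else if (def.maxLength !== undefined && new TextEncoder().encode(value).length > def.maxLength) {
      errors.push(`${field} exceeds maxLength ${def.maxLength} bytes`);
    }
  }
  return errors;
}
```

An empty array means the record passes this check; a portable claim that any PDS can store still needs the full Lexicon rules (formats, refs, unions) applied on top.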
Hypersphere PDS
the data home
Self-hostable atproto-pds deployment tuned for community use: OAuth with DPoP, blob storage on S3-compatible buckets, and one-command provisioning, so a steward owns every record signed under their DID.
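OAuth with DPoP means each request to the PDS carries a fresh proof JWT signed by a key the client holds, as specified in RFC 9449. The sketch below builds such a proof with Node's built-in `crypto` module. `makeDpopProof` is a hypothetical helper, not code from the Hypersphere stack, and retrying after a server's `DPoP-Nonce` challenge and token refresh are left out.

```typescript
import { createHash, generateKeyPairSync, randomUUID, sign, KeyObject } from "node:crypto";

// base64url without padding, as JWS requires.
function b64url(input: string | Buffer): string {
  const buf = typeof input === "string" ? Buffer.from(input, "utf8") : input;
  return buf.toString("base64url");
}

// Hypothetical helper: builds one DPoP proof JWT (RFC 9449) for one HTTP request.
function makeDpopProof(opts: {
  privateKey: KeyObject;
  publicKey: KeyObject;
  method: string;
  url: string;
  accessToken?: string; // when set, bind the proof to it via the `ath` claim
  nonce?: string; // value from a server's DPoP-Nonce header, if one was issued
}): string {
  const header = { typ: "dpop+jwt", alg: "ES256", jwk: opts.publicKey.export({ format: "jwk" }) };

  const target = new URL(opts.url);
  const payload: Record<string, unknown> = {
    jti: randomUUID(), // unique per proof so the server can reject replays
    htm: opts.method.toUpperCase(),
    htu: `${target.origin}${target.pathname}`, // query and fragment are excluded
    iat: Math.floor(Date.now() / 1000),
  };
  if (opts.nonce) payload.nonce = opts.nonce;
  if (opts.accessToken) {
    payload.ath = createHash("sha256").update(opts.accessToken).digest("base64url");
  }

  const signingInput = `${b64url(JSON.stringify(header))}.${b64url(JSON.stringify(payload))}`;
  // ES256 in JWS expects the raw 64-byte r||s signature, not DER.
  const signature = sign("sha256", Buffer.from(signingInput), {
    key: opts.privateKey,
    dsaEncoding: "ieee-p1363",
  });
  return `${signingInput}.${b64url(signature)}`;
}

// Usage: a throwaway P-256 key pair; a real client would persist its key.
const { privateKey, publicKey } = generateKeyPairSync("ec", { namedCurve: "P-256" });
const proof = makeDpopProof({
  privateKey,
  publicKey,
  method: "POST",
  url: "https://pds.example.com/xrpc/com.atproto.repo.createRecord",
  accessToken: "example-access-token",
});
console.log(proof.split(".").length); // 3 segments: header.payload.signature
```

Because the proof embeds the public key and the server checks `htm`, `htu`, and `ath`, a stolen access token alone is not enough to call the PDS.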
Hyperindex
the indexer
ATProto firehose subscriber that crawls org.hypercerts.* records across the network, normalises them into Postgres, and exposes the result through a typed GraphQL schema every downstream tool can query.
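Each commit event on the `com.atproto.sync.subscribeRepos` firehose lists repo operations whose `path` has the form `<collection>/<record key>`. A hedged sketch of the filter-and-normalise step follows, assuming the CBOR frames and CAR blocks have already been decoded and CIDs rendered as strings; the `IndexRow` shape is a hypothetical stand-in for whatever Postgres schema Hyperindex actually uses.

```typescript
// Minimal shape of a decoded #commit event from com.atproto.sync.subscribeRepos.
// Decoding the CBOR frame and CAR blocks is assumed to have happened already.
interface RepoOp {
  action: "create" | "update" | "delete";
  path: string; // "<collection NSID>/<record key>"
  cid: string | null; // null for deletes
}

interface CommitEvent {
  seq: number; // firehose sequence number, usable as a resume cursor
  repo: string; // DID of the account whose repository changed
  ops: RepoOp[];
}

// Hypothetical row shape for a normalised Postgres table.
interface IndexRow {
  uri: string; // at://<did>/<collection>/<rkey>
  did: string;
  collection: string;
  rkey: string;
  cid: string | null;
  action: RepoOp["action"];
  seq: number;
}

const COLLECTION_PREFIX = "org.hypercerts.";

// Keep only org.hypercerts.* operations and flatten them into rows.
function toIndexRows(evt: CommitEvent): IndexRow[] {
  const rows: IndexRow[] = [];
  for (const op of evt.ops) {
    const slash = op.path.indexOf("/");
    if (slash < 0) continue; // malformed path: skip rather than guess
    const collection = op.path.slice(0, slash);
    if (!collection.startsWith(COLLECTION_PREFIX)) continue;
    const rkey = op.path.slice(slash + 1);
    rows.push({
      uri: `at://${evt.repo}/${collection}/${rkey}`,
      did: evt.repo,
      collection,
      rkey,
      cid: op.cid,
      action: op.action,
      seq: evt.seq,
    });
  }
  return rows;
}
```

Keeping `seq` on every row lets the subscriber reconnect with the `cursor` parameter after a restart instead of re-crawling the network from scratch.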
Hyperlabel
the trust layer
Labeller service emitting com.atproto.label.* records over Hypercert claims; tier signals (high-quality, verified, contested) feed Bumicerts and any compatible consumer, the same way Bluesky labels are consumed by downstream apps and feeds.
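An ATProto label (`com.atproto.label.defs#label`) names the labeller (`src`), the labelled subject (`uri`), and a value (`val`); a later label with `neg: true` retracts an earlier one with the same values. The sketch below replays one labeller's stream for a single claim and picks a display tier. The precedence in `TIER_ORDER` is a hypothetical policy rather than Hyperlabel's documented behaviour, and signature (`sig`) and expiry (`exp`) checks are omitted.

```typescript
// Subset of com.atproto.label.defs#label; `sig` and `exp` are omitted here.
interface Label {
  ver?: number;
  src: string; // DID of the labeller that issued the label
  uri: string; // AT-URI of the labelled record
  cid?: string; // optional: pin the label to one version of the record
  val: string; // label value, e.g. "verified"
  neg?: boolean; // true retracts an earlier label with the same src/uri/val
  cts: string; // creation timestamp (ISO 8601)
}

// Replay one labeller's labels for one subject in time order.
function activeValues(labels: Label[], subject: string, labeller: string): Set<string> {
  const active = new Set<string>();
  const relevant = labels
    .filter((l) => l.uri === subject && l.src === labeller)
    .sort((a, b) => Date.parse(a.cts) - Date.parse(b.cts));
  for (const l of relevant) {
    if (l.neg) active.delete(l.val);
    else active.add(l.val);
  }
  return active;
}

// Hypothetical precedence: a "contested" signal outranks the positive tiers.
const TIER_ORDER = ["contested", "verified", "high-quality"] as const;

function displayTier(active: Set<string>): string | undefined {
  return TIER_ORDER.find((tier) => active.has(tier));
}
```

Filtering by `src` matters: a consumer chooses which labellers it trusts, so a label from an unknown DID should not change the tier it shows.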
Hyperscan
the explorer
Web explorer for org.hypercerts.* records; resolves DID → PDS → blob CID and renders the full evidence trail behind any Bumicert, like a block explorer for community claims.
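The DID → PDS → blob CID chain can be sketched as three small steps: locate the DID document, read its `#atproto_pds` service endpoint, and ask that PDS for the blob through `com.atproto.sync.getBlob`. Network calls are left out so the logic stays visible and testable; the function names are hypothetical, not Hyperscan's API.

```typescript
interface DidService {
  id: string;
  type: string;
  serviceEndpoint: string;
}

interface DidDocument {
  id: string;
  service?: DidService[];
}

// Step 1: where to fetch the DID document for the two DID methods ATProto supports.
function didDocumentUrl(did: string): string {
  if (did.startsWith("did:plc:")) return `https://plc.directory/${did}`;
  if (did.startsWith("did:web:")) {
    // ATProto only allows hostname-level did:web, so there are no path segments.
    const host = decodeURIComponent(did.slice("did:web:".length));
    return `https://${host}/.well-known/did.json`;
  }
  throw new Error(`unsupported DID method: ${did}`);
}

// Step 2: the PDS is the service entry with id "#atproto_pds".
function pdsEndpoint(doc: DidDocument): string {
  const svc = doc.service?.find(
    (s) => s.id === "#atproto_pds" || s.id === `${doc.id}#atproto_pds`,
  );
  if (!svc) throw new Error(`no #atproto_pds service for ${doc.id}`);
  return svc.serviceEndpoint;
}

// Step 3: blobs are served by the PDS through com.atproto.sync.getBlob.
function blobUrl(pds: string, did: string, cid: string): string {
  const url = new URL("/xrpc/com.atproto.sync.getBlob", pds);
  url.searchParams.set("did", did);
  url.searchParams.set("cid", cid);
  return url.toString();
}
```

An explorer would fetch `didDocumentUrl(did)`, pass the parsed JSON to `pdsEndpoint`, then request `blobUrl(...)` for each evidence CID found in the record.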
Open models, open datasets.
Every artefact behind the papers above is downloadable today. Trained weights and datasets on HuggingFace, benchmark suites and field pipelines on GitHub, and the assistant on community-owned PDS infrastructure.
OAM-TCD
on HuggingFace
280,000+ instance annotations of individual tree crowns over OpenAerialMap imagery, plus Mask2Former and SegFormer baselines fine-tuned for instance and semantic segmentation; dataset and weights both released.
GEO-Bench
on GitHub
Community benchmark suite for Earth-observation foundation models; six classification and six segmentation tasks across six remote-sensing modalities, with a shared pretrain / fine-tune protocol and leaderboard.
ReforesTree
on arXiv
Drone photogrammetry across six agroforestry sites in Ecuador with per-tree carbon-stock annotations and CNN regression baselines; reused as a downstream task in later Earth-observation foundation-model evaluations.
BiodivX agents
in IEEE T-FR
Multi-modal field pipeline behind the XPRIZE Rainforest win: autonomous aerial and aquatic drones, vision-language agents, bioacoustic classifiers, and on-site environmental DNA sequencing chained into a 24-hour biodiversity assessment.
Taina
on PDS + Telegram
Community-owned multilingual LLM assistant co-designed with Indigenous and local communities around Manaus; memory and contributions are stored as signed records on the contributor's own PDS rather than a vendor's database.
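Storing a memory "as a signed record on the contributor's own PDS" comes down to a `com.atproto.repo.createRecord` call against that PDS, which commits the record to the contributor's repository and signs the commit. The sketch below only builds the request, so its shape is visible without network access. The collection NSID `com.example.taina.memory` and the record fields are hypothetical, and a plain Bearer token stands in for the DPoP-bound OAuth token a real session would send.

```typescript
// Hypothetical record shape for an assistant "memory".
interface MemoryRecord {
  $type: string;
  text: string;
  lang: string; // BCP 47 language tag, e.g. "pt-BR"
  createdAt: string; // ISO 8601 timestamp
}

const MEMORY_COLLECTION = "com.example.taina.memory"; // hypothetical NSID

interface XrpcRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds (but does not send) a com.atproto.repo.createRecord call.
function buildCreateMemoryRequest(
  pds: string,
  did: string,
  accessToken: string,
  text: string,
  lang: string,
  now: Date = new Date(),
): XrpcRequest {
  const record: MemoryRecord = {
    $type: MEMORY_COLLECTION,
    text,
    lang,
    createdAt: now.toISOString(),
  };
  return {
    url: new URL("/xrpc/com.atproto.repo.createRecord", pds).toString(),
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${accessToken}`, // an OAuth session would use a DPoP-bound token instead
    },
    // repo is the contributor's DID; the PDS assigns a TID record key when rkey is omitted.
    body: JSON.stringify({ repo: did, collection: MEMORY_COLLECTION, record }),
  };
}
```

A caller would send it with `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`; the JSON response carries the new record's `uri` and `cid`, which is all the assistant needs to cite the memory later.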
The theoretical frame: regenerative intelligence.
How the papers, datasets, and protocols above fit one frame: AI as a tool for repairing the commons, governed by the people closest to the land.