Multimodal Data Integration for Sustainable Indoor Gardening: Tracking Anyplant with Time Series Foundation Model

This paper introduces a multimodal framework that integrates computer vision, environmental sensors, and the Lag-Llama time-series foundation model for automated plant health monitoring in sustainable indoor gardening.
Key Findings
- Integrating multimodal inputs (RGB imagery, phenotypic ratios such as area-to-height, and environmental readings including temperature, humidity, and VOC) improved water-stress prediction accuracy for basil plants grown in a Vivosun Smart Grow Tent.
- Zero-shot Lag-Llama outperformed fine-tuned variants, achieving its lowest errors with the full multimodal input: CRPS 0.00109, MSE 0.009680, MAE 0.096282.
- The Tracking Anyplant model (built on SAM and XMem) effectively extracted features such as RGB values and plant dimensions from webcam images, enabling scalable controlled-environment agriculture (CEA) applications.
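Since Lag-Llama produces probabilistic forecasts, the errors above can be reproduced from predictive samples and point forecasts. A minimal sketch of the three metrics (function names and data are illustrative, not from the paper; CRPS is estimated via the standard sample-based formula E|X - y| - 0.5 E|X - X'|):

```python
import numpy as np

def crps_from_samples(samples, y):
    """Sample-based CRPS estimate for one target value y."""
    samples = np.asarray(samples, dtype=float)
    term1 = np.mean(np.abs(samples - y))                          # E|X - y|
    term2 = 0.5 * np.mean(np.abs(samples[:, None] - samples[None, :]))  # 0.5 * E|X - X'|
    return term1 - term2

def point_errors(y_true, y_pred):
    """MSE and MAE for a point forecast."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return {
        "MSE": float(np.mean((y_true - y_pred) ** 2)),
        "MAE": float(np.mean(np.abs(y_true - y_pred))),
    }
```

In practice the point forecast would be the mean or median of the sampled forecast paths, and the metrics would be averaged over the full prediction horizon.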
Future Directions
- Expand to diverse plant species and varied environments, and refine fine-tuning to handle minute-level timestamp mismatches between data streams.
- Integrate with Building Management Systems (BMS) for energy-efficient urban agriculture and automated irrigation.
- Pre-train Lag-Llama on higher-granularity datasets to surpass zero-shot performance, and add sensors for broader phenotyping.
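The minute-level mismatch mentioned above is a data-alignment problem: image-derived features and sensor readings arrive at irregular timestamps. One common fix is to resample each stream onto a shared minute grid before feeding it to the forecaster. A minimal sketch with pandas (the timestamps and values are hypothetical):

```python
import pandas as pd

# Hypothetical temperature readings at irregular timestamps
temp = pd.Series(
    [22.1, 22.4, 22.3],
    index=pd.to_datetime([
        "2024-05-01 10:00:10",
        "2024-05-01 10:00:50",
        "2024-05-01 10:02:05",
    ]),
)

# Resample onto a shared 1-minute grid: average readings that fall
# in the same minute, then linearly interpolate empty minutes.
aligned = temp.resample("1min").mean().interpolate()
```

Applying the same grid to every modality (camera features, temperature, humidity, VOC) yields one aligned multivariate series per plant.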
