WiFi-based human pose estimation is a promising alternative to camera- and wearable-sensor-based approaches: it preserves privacy, is robust to poor lighting and occlusion, and relies on commodity hardware already present in most homes. Despite this potential, research progress remains constrained by two factors: the scarcity of large-scale benchmarks pairing channel state information (CSI) with pose, and models that treat CSI as a generic input rather than exploiting its inherent frequency-temporal structure. We introduce WiSight, a dataset of over 380,000 paired CSI and 3D joint frames spanning 9 light exercise activities performed by 20 individuals, designed for smart elderly-care monitoring. Alongside the dataset, we release an end-to-end pipeline that converts raw CSI captures into machine-learning-ready tensors. We also benchmark established WiFi pose estimation models under both random and person-identity splits, using identical data and metrics, to provide strong baselines for the dataset. As a reference architecture, we further propose a dual-stream model whose frequency and temporal branches each employ a state-space mixer with learned decay, attractor, and control, followed by transformer attention and a fusion refinement head. Our dataset and code are publicly available.
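The two evaluation protocols differ in whether a person seen during training can reappear at test time. Below is a minimal Python sketch of the distinction. It assumes each frame carries a subject ID. The function names, the 20% test fraction, and the choice of held-out subjects are hypothetical illustrations, not the split definitions shipped with WiSight.

```python
import random


def random_split(num_frames, test_frac=0.2, seed=0):
    """Frame-level split (hypothetical helper): frames are shuffled
    independently, so the same person can appear in both train and test."""
    idx = list(range(num_frames))
    random.Random(seed).shuffle(idx)
    n_test = int(round(num_frames * test_frac))
    return sorted(idx[n_test:]), sorted(idx[:n_test])


def person_split(subject_ids, test_subjects):
    """Person-identity split (hypothetical helper): every frame of a
    held-out subject goes to test, so test identities are unseen in training."""
    held_out = set(test_subjects)
    train = [i for i, s in enumerate(subject_ids) if s not in held_out]
    test = [i for i, s in enumerate(subject_ids) if s in held_out]
    return train, test


if __name__ == "__main__":
    # Toy stand-in: 20 subjects with 5 frames each (not real WiSight data).
    subject_ids = [s for s in range(20) for _ in range(5)]
    train_r, test_r = random_split(len(subject_ids))
    train_p, test_p = person_split(subject_ids, test_subjects=[16, 17, 18, 19])
    print(len(train_r), len(test_r), len(train_p), len(test_p))
```

The person-identity split is the stricter protocol. A model cannot rely on subject-specific body shape or movement habits memorized during training, so it better reflects deployment to a new household member.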