The Toulouse Hyperspectral Data Set combines 1) an airborne hyperspectral image acquired by the AisaFENIX sensor over Toulouse, France, during the CAMCATT-AI4GEO campaign and 2) a land cover ground truth. It comes with standard train / test splits for validating machine learning models on tasks such as semantic segmentation.

The image is provided in ground-level reflectance with a very high spatial resolution (1 m ground sampling distance) and spectral resolution (< 8 nm) from 0.4 µm to 2.5 µm (310 channels*). More than 380,000 pixels are sparsely labeled with land cover classes (and secondarily with land use classes) over an area of 90 km². The land cover nomenclature contains 32 classes hierarchically organized into 16 impermeable surfaces and 16 permeable surfaces as illustrated below:

[Figure: land cover nomenclature]
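To give a sense of how sparse the labels are, the short calculation below uses only the figures quoted above. It assumes, for illustration, that the image covers the full 90 km² with square 1 m pixels, so the result is an order of magnitude rather than an exact count.

```python
# Back-of-the-envelope label density from the figures quoted in this README.
AREA_KM2 = 90               # covered area
GSD_M = 1.0                 # ground sampling distance, in metres
LABELED_PIXELS = 380_000    # lower bound ("more than 380,000")

# ASSUMPTION: square pixels tiling the whole area without gaps.
pixels_per_km2 = (1000 / GSD_M) ** 2        # 1 km = 1000 m
total_pixels = AREA_KM2 * pixels_per_km2
labeled_fraction = LABELED_PIXELS / total_pixels

print(f"total pixels  : {total_pixels:.2e}")      # total pixels  : 9.00e+07
print(f"labeled share : {labeled_fraction:.2%}")  # labeled share : 0.42%
```

Under these assumptions, fewer than one pixel in two hundred carries a label.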

*The raw data has 420 spectral channels. Atmospheric absorption bands and very noisy bands are removed, leaving 310 spectral channels.
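The band removal described in the footnote can be pictured as a boolean mask over the spectral axis. The sketch below is only an illustration: the evenly spaced wavelength grid, the two absorption windows, and the random cube are all assumptions, not the actual list of bands removed from the data set, so the number of bands it keeps will not match 310.

```python
import numpy as np

N_RAW_BANDS = 420  # raw channel count quoted in the footnote

# ASSUMPTION: evenly spaced band centres in micrometres; the real sensor grid differs.
wavelengths = np.linspace(0.4, 2.5, N_RAW_BANDS)

# ASSUMPTION: illustrative water-vapour absorption windows (in µm), not the
# authors' actual list of removed bands.
bad_windows = [(1.34, 1.45), (1.80, 1.96)]

# Start by keeping every band, then switch off those falling in a bad window.
keep = np.ones(N_RAW_BANDS, dtype=bool)
for low, high in bad_windows:
    keep &= ~((wavelengths >= low) & (wavelengths <= high))

# Hypothetical raw cube of shape (rows, cols, bands) filled with random values.
rng = np.random.default_rng(0)
raw_cube = rng.random((4, 5, N_RAW_BANDS), dtype=np.float32)

# Boolean indexing on the last axis drops the masked bands for every pixel.
clean_cube = raw_cube[..., keep]
print(raw_cube.shape, "->", clean_cube.shape)
```

Keeping the mask alongside the data makes it possible to map each remaining channel back to its wavelength via `wavelengths[keep]`.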

Python library

A Python library makes it easy to build PyTorch data loaders and run experiments:

    import torch
    from TlseHypDataSet.tlse_hyp_data_set import TlseHypDataSet
    from TlseHypDataSet.utils.dataset import DisjointDataSplit

    dataset = TlseHypDataSet('/path/to/dataset/', pred_mode='pixel', patch_size=1)
    # Load the first standard ground truth split
    ground_truth_split = DisjointDataSplit(dataset, split=dataset.standard_splits[0])
    train_loader = torch.utils.data.DataLoader(ground_truth_split.sets_['train'], shuffle=True, batch_size=1024)

    for epoch in range(100):
        for samples, labels in train_loader:
            ...
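
The body of the training loop is left open above. As a rough sketch of what it might contain, the example below trains a small classifier on a synthetic stand-in for the training set. The tensor shapes (one 310-channel spectrum per sample), the label range 0 to 31, the MLP architecture, and the hyperparameters are all assumptions made for illustration. The actual sample format returned by the library may differ.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

N_BANDS, N_CLASSES = 310, 32  # channel and class counts quoted in this README

# ASSUMPTION: synthetic stand-in for the real training set, with one spectrum
# per sample and integer class labels in [0, N_CLASSES).
torch.manual_seed(0)
spectra = torch.rand(2048, N_BANDS)
targets_all = torch.randint(0, N_CLASSES, (2048,))
train_loader = DataLoader(TensorDataset(spectra, targets_all), shuffle=True, batch_size=1024)

# Hypothetical baseline: a two-layer MLP over the spectral dimension.
model = nn.Sequential(nn.Linear(N_BANDS, 128), nn.ReLU(), nn.Linear(128, N_CLASSES))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for epoch in range(3):
    running_loss = 0.0
    for samples, targets in train_loader:
        optimizer.zero_grad()
        logits = model(samples)            # shape: (batch, N_CLASSES)
        loss = criterion(logits, targets)
        loss.backward()
        optimizer.step()
        # Weight each batch loss by its size to get a per-sample mean below.
        running_loss += loss.item() * samples.size(0)
    print(f"epoch {epoch}: mean loss {running_loss / len(train_loader.dataset):.4f}")
```

With the real data, `train_loader` would come from the library as shown above, and the model input size would need to match the shape of the samples it yields.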
            

Qualitative comparison

In our paper, we qualitatively compared the Toulouse data set to two publicly available hyperspectral data sets: Pavia University and Houston University. In particular, we computed hand-crafted spatial and spectral features from 64 × 64 pixel patches, represented in the figure below through a t-SNE transformation:

[Figure: dataset_comparison]
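
As a loose illustration of the feature extraction step only, the sketch below computes a few statistics per patch that do not depend on the number of bands, so that patches from sensors with different channel counts land in the same feature space. The four statistics, the `patch_features` helper, and the random patches are hypothetical placeholders, not the features used in the paper.

```python
import numpy as np

def patch_features(patch: np.ndarray) -> np.ndarray:
    """Return a small feature vector for a patch of shape (height, width, bands).

    ASSUMPTION: these four statistics are illustrative placeholders, not the
    hand-crafted features used in the paper.
    """
    mean_spectrum = patch.mean(axis=(0, 1))        # average over pixels: (bands,)
    intensity = patch.mean(axis=2)                 # average over bands: (height, width)
    grad_rows, grad_cols = np.gradient(intensity)  # spatial derivatives along each axis
    return np.array([
        mean_spectrum.mean(),                      # overall brightness
        mean_spectrum.std(),                       # spectral contrast
        intensity.std(),                           # spatial contrast
        np.hypot(grad_rows, grad_cols).mean(),     # mean gradient magnitude
    ])

rng = np.random.default_rng(0)
# Hypothetical 64 x 64 patches from two data sets with different band counts.
patches = [rng.random((64, 64, 310)) for _ in range(3)]
patches += [rng.random((64, 64, 103)) for _ in range(3)]

features = np.stack([patch_features(p) for p in patches])
print(features.shape)  # (6, 4)
```

A feature matrix of this kind could then be passed to a 2-D embedding such as scikit-learn's `sklearn.manifold.TSNE` to produce a plot comparable in spirit to the figure.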

Download data

The hyperspectral image is divided into tiles that can be freely downloaded at https://camcatt.sedoo.fr/catalogue/. The Toulouse Hyperspectral Data Set uses tiles 1b, 1c, 1d, 1e, 3a, 3d, 3e, 5c, and 9c.

The ground truth is included directly in the Python library. If you do not use the library, you can still download the ground truth here.

Citation

If you use this data set, please cite both the CAMCATT-AI4GEO data paper and the Toulouse Hyperspectral Data Set paper.

    @article{ROUPIOZ2023109109,
      title = {Multi-source datasets acquired over Toulouse (France) in 2021 for urban microclimate studies during the CAMCATT/AI4GEO field campaign},
      journal = {Data in Brief},
      volume = {48},
      pages = {109109},
      year = {2023},
      issn = {2352-3409},
      doi = {https://doi.org/10.1016/j.dib.2023.109109},
      url = {https://www.sciencedirect.com/science/article/pii/S2352340923002287},
      author = {L. Roupioz and X. Briottet and K. Adeline and A. {Al Bitar} and D. Barbon-Dubosc and R. Barda-Chatain and P. Barillot and S. Bridier and E. Carroll and C. Cassante and A. Cerbelaud and P. Déliot and P. Doublet and P.E. Dupouy and S. Gadal and S. Guernouti and A. {De Guilhem De Lataillade} and A. Lemonsu and R. Llorens and R. Luhahe and A. Michel and A. Moussous and M. Musy and F. Nerry and L. Poutier and A. Rodler and N. Riviere and T. Riviere and J.L. Roujean and A. Roy and A. Schilling and D. Skokovic and J. Sobrino},
      keywords = {Land surface temperature, Spectral emissivity, Spectral reflectance, Air temperature, Airborne LiDAR, Atmospheric data, Urban area},
    }

    @misc{thoreau2023toulouse,
      title = {Toulouse Hyperspectral Data Set: a benchmark data set to assess semi-supervised spectral representation learning and pixel-wise classification techniques},
      author = {Romain Thoreau and Laurent Risser and Véronique Achard and Béatrice Berthelot and Xavier Briottet},
      year = {2023},
      eprint = {2311.08863},
      archivePrefix = {arXiv},
      primaryClass = {cs.CV}
    }
               

Contact

For any questions, you can email us at romain.thoreau@cnes.fr.