XrayTIFF2h5
This repository provides a workflow to transform X-rays data originally stored in .TIFF format, into .h5. Overall, this makes the data more compressed and easier to process in machine learning pipelines, such as unsat.
Description
This repository provides a workflow to transform X-rays data originally stored in .TIFF format, into .h5.
Overall, this makes the data more compressed and easier to process in machine learning pipelines, such as unsat.
To get an overview of the expected structure of the data you can check: X-ray Computed Tomography Reconstructions of Partially Saturated Vegetated Sand.
This repo will aim to the following objectives:
- Show the user how to get the data.
- Automate a folder structure creation such that it will work well with extra modules.
- Keep track of data-related issues.
The published data
Using make
- Copy or clone this repository in the location you want to store your local copy of the data.
- Open a console.
- Run
make.
Using snakemake
- Copy or clone this repository in the location you want to store your local copy of the data.
- Open a console.
- Run
snakemake -j 1.
Manually
- Copy or clone this repository in the location you want to store your local copy of the data.
- Save the dataset X-ray Computed Tomography Reconstructions of Partially Saturated Vegetated Sand inside the
expfolder. Uncompress if necessary. - Save the dataset Phase field data generated from coupled Lattice Boltzmann-discrete element simulations inside the
simfolder. Uncompress if necessary.
The resulting working folder should look like:
.
├── exp
│ ├── CoarseSand_Day2Growth.tif
│ ├── CoarseSand_Day6Growth.tif
│ ├── FineSand_Day2Growth.tif
│ ├── FineSand_Day6Growth.tif
│ └── README.txt
└── sim
├── colour_output_t00250000-0285621391.h5
├── particle_configuration.dat
└── README.md
The full dataset
The full dataset is, for now, only available on surfdrive. The format for the x-ray data is:
data
├── coarse
│ └───── loose
│ └──── day-01.tif
│ └──── day-02.tif
│ └──── ...
└── fine
├──── loose
│ └──── ...
└──── dense
└──── ...
and the format for the labels is identical.
Running the following
python tif_to_h5.py --data_path data --label_path labels --h5_path data.h5
will combine all the tif files into a single .h5 file, following again the same structure, the full file being:
data.h5 (2 objects)
├── chickpea (2 objects)
│ ├── coarse (1 object)
│ │ └── loose (2 objects)
│ │ ├── data (9, 1600, 650, 650), float16
│ │ └── labels (9, 1600, 650, 650), uint8
│ └── fine (2 objects)
│ ├── dense (2 objects)
│ │ ├── data (8, 1600, 650, 650), float16
│ │ └── labels (8, 1600, 650, 650), uint8
│ └── loose (2 objects)
│ ├── data (8, 1600, 650, 650), float16
│ └── labels (8, 1600, 650, 650), uint8
└── maize (2 objects)
├── coarse (1 object)
│ └── loose (2 objects)
│ ├── data (8, 1600, 650, 650), float16
│ └── labels (8, 1600, 650, 650), uint8
└── fine (2 objects)
├── dense (2 objects)
│ ├── data (8, 1600, 650, 650), float16
│ └── labels (8, 1600, 650, 650), uint8
└── loose (2 objects)
├── data (9, 1600, 650, 650), float16
└── labels (8, 1600, 650, 650), uint8
It also changes the class labels to:
- 0: water
- 1: outside of bounds of sample
- 2: air
- 3: root
- 4: soil
And finally it normalizes the X-ray data to be between 0 and 1, by dividing by the global maximum.
Binary labels
It is possible to reduce the complexity of the classification process by reducing the number of labels. In particular, for our specific soil analysis, the most interesting part is to identify roots. Then, we can decide to process the data in order to have only two labels:
- 0: non-root
- 1: root
To obtain the dataset for this binary classification you have to run:
python binarize_h5.py full_data.h5 --chunk_size 100
Notice that binarize_h5.py modifies the full_data.h5 in place, so it is wise to store a copy with all the labels to avoid re-running the full procedure to get all the labels.
Participating organisations
Contributors
Contact person
Related projects
UNSAT
U-Net segmentation of 3D micro-CT images of rooted soils using label data from multi-physics simulators