Sample clustering in Isatis.neo has proven to be efficient with big datasets

Tagged Mining

Isatis.neo quickly and automatically groups borehole samples into homogeneous classes (e.g., facies, geological or mining domains). Those who have seen the tool run describe it as impressive.

Sample clustering with Isatis.neo


The classification of samples into geological domains is a tedious and fairly subjective step in geological modeling. Geovariances and MINES ParisTech have developed a combination of Geostatistical Hierarchical Clustering (GHC) and Support Vector Machine (SVM) to reduce the subjectivity and improve the productivity of this crucial step of mineral resource estimation (MRE). In particular, an existing sample classification can be updated in less time than with traditional methods, and with greater flexibility and responsiveness.

GHC is a clustering algorithm that respects the spatial connectivity of the data: it groups samples into subsets according to their degree of similarity and eventually assigns a domain to each sample. SVM is a supervised machine learning algorithm; here it makes it possible to handle big datasets. In a first step, a fraction of the samples is classified using GHC; in a second step, the remaining samples are classified using SVM, trained on the result of the first classification. This hybrid approach speeds up the whole classification procedure.
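The post does not describe how GHC is implemented in Isatis.neo. As a rough illustration of the underlying idea, connectivity-constrained hierarchical clustering, the sketch below clusters samples along a single drill hole, where the only spatial neighbours of a sample are the samples directly above and below it. It assumes Ward's merge criterion and a fixed number of classes; these choices, the function names and the example values are hypothetical, and the real dataset described here is 3D with many drill holes.

```python
def cluster_mean(samples):
    """Component-wise mean of a list of equal-length feature tuples."""
    n = len(samples)
    return [sum(s[j] for s in samples) / n for j in range(len(samples[0]))]


def ward_cost(values, a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge.

    Clusters are (start, end) index ranges into `values` (end exclusive).
    """
    va, vb = values[a[0]:a[1]], values[b[0]:b[1]]
    na, nb = len(va), len(vb)
    ma, mb = cluster_mean(va), cluster_mean(vb)
    return na * nb / (na + nb) * sum((x - y) ** 2 for x, y in zip(ma, mb))


def constrained_clustering(values, n_classes):
    """Agglomerative clustering of samples ordered down one drill hole.

    Only clusters that touch along the hole may merge, so every class is a
    spatially connected interval. Returns one integer label per sample.
    """
    clusters = [(i, i + 1) for i in range(len(values))]
    while len(clusters) > n_classes:
        # Cost of merging each pair of neighbouring clusters.
        costs = [ward_cost(values, clusters[k], clusters[k + 1])
                 for k in range(len(clusters) - 1)]
        k = costs.index(min(costs))
        # Replace the two cheapest neighbours by their union.
        clusters[k:k + 2] = [(clusters[k][0], clusters[k + 1][1])]
    labels = [0] * len(values)
    for label, (start, end) in enumerate(clusters):
        for i in range(start, end):
            labels[i] = label
    return labels


# Hypothetical single-variable log: low, then high, then low values again.
hole = [(1.0,), (1.1,), (0.9,), (5.0,), (5.2,), (4.9,), (1.0,), (1.2,)]
print(constrained_clustering(hole, 3))  # [0, 0, 0, 1, 1, 1, 2, 2]
```

The two low-value intervals end up in different classes because they are not adjacent along the hole; this is the effect of the connectivity constraint.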

Geovariances tested the combined GHC and SVM approach for fast, flexible sample classification with updating capabilities on a real 3D dataset kindly provided by BHP Billiton. The case-study dataset consists of about 2,120 vertical drill holes, with 114,842 samples and 45 variables.

To begin, the first drill hole campaign (65,468 samples) was considered. The dissimilarity between samples is based on five numerical variables (Fe, Al2O3, SiO2, and spectral measurements of hematite and goethite) and one categorical variable (weathering). Each variable was weighted according to its relevance to the overall domaining rationale. Post-processing tools were then used to smooth the output (a variable assigning a domain to each sample).
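The post does not give the formula used to combine weighted numerical and categorical variables into one dissimilarity. One common option, shown here purely as an assumption, is a Gower-style weighted average: a numerical difference is divided by the variable's range, and a categorical variable contributes 0 when the two samples agree and 1 otherwise. The weights, ranges and sample values below are hypothetical.

```python
NUMERIC = ("Fe", "Al2O3", "SiO2", "hematite", "goethite")
CATEGORICAL = ("weathering",)


def weighted_dissimilarity(a, b, weights, ranges):
    """Gower-style weighted dissimilarity between two samples (dicts).

    Numerical variables: |difference| / range, so each term lies in [0, 1]
    for values within the range. Categorical variables: 0 if equal, else 1.
    The result is the weighted mean of the per-variable terms.
    """
    total = 0.0
    for var in NUMERIC:
        total += weights[var] * abs(a[var] - b[var]) / ranges[var]
    for var in CATEGORICAL:
        total += weights[var] * (0.0 if a[var] == b[var] else 1.0)
    return total / sum(weights[var] for var in NUMERIC + CATEGORICAL)


# Hypothetical weights: Fe counts twice as much as each other variable.
weights = {"Fe": 2.0, "Al2O3": 1.0, "SiO2": 1.0,
           "hematite": 1.0, "goethite": 1.0, "weathering": 1.0}
# Hypothetical value ranges used for normalisation.
ranges = {"Fe": 50.0, "Al2O3": 10.0, "SiO2": 20.0,
          "hematite": 1.0, "goethite": 1.0}

s1 = {"Fe": 60.0, "Al2O3": 2.0, "SiO2": 4.0,
      "hematite": 0.8, "goethite": 0.1, "weathering": "fresh"}
s2 = {"Fe": 55.0, "Al2O3": 3.0, "SiO2": 6.0,
      "hematite": 0.5, "goethite": 0.3, "weathering": "weathered"}
print(round(weighted_dissimilarity(s1, s2, weights, ranges), 3))  # 0.271
```

Raising a variable's weight makes differences in that variable count more when deciding which samples belong together, which is how relevance to the domaining rationale would enter the clustering under this assumption.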

Next, a second campaign (49,374 samples) with additional data was added, and the classification was updated using SVM with the same weights.
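The update step can be illustrated with a linear SVM trained one-vs-rest on samples that already carry a domain (standing in for the first-campaign classification) and then applied to new samples. This is a minimal pure-Python sketch, assuming a linear kernel, stochastic subgradient descent on the hinge loss and pre-scaled features; the kernel, solver and parameters used in Isatis.neo are not given in the post, and all names and values here are hypothetical.

```python
def train_linear_svm(X, y, lam=0.0001, eta=0.1, epochs=2000):
    """Soft-margin linear SVM fitted by stochastic subgradient descent.

    Minimises lam/2 * ||w||^2 + mean hinge loss; the bias is not penalised.
    X is a list of feature vectors, y a list of +1 / -1 labels.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            margin = t * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            w = [wi * (1.0 - eta * lam) for wi in w]  # L2 penalty step
            if margin < 1.0:  # hinge loss active: move towards correct side
                w = [wi + eta * t * xi for wi, xi in zip(w, x)]
                b += eta * t
    return w, b


def train_one_vs_rest(X, labels):
    """One binary SVM per domain: that domain (+1) against all others (-1)."""
    return {d: train_linear_svm(X, [1 if lab == d else -1 for lab in labels])
            for d in sorted(set(labels))}


def predict(models, x):
    """Assign x to the domain whose SVM gives the highest decision value."""
    def score(d):
        w, b = models[d]
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(models, key=score)


# Hypothetical first-campaign samples (two scaled features) and the
# domains assigned to them in the first step.
X_first = [(0.9, 0.1), (0.8, 0.2), (0.85, 0.05),
           (0.1, 0.9), (0.2, 0.8), (0.05, 0.85),
           (0.1, 0.1), (0.2, 0.15), (0.15, 0.2)]
domains_first = ["D1"] * 3 + ["D2"] * 3 + ["D3"] * 3

models = train_one_vs_rest(X_first, domains_first)
# Hypothetical second-campaign samples to classify.
X_second = [(0.85, 0.12), (0.12, 0.85), (0.15, 0.15)]
print([predict(models, x) for x in X_second])  # ['D1', 'D2', 'D3']
```

One possible way to reuse the first-step variable weights, again an assumption rather than documented behaviour, is to multiply each feature by its weight before training. A linear kernel keeps the sketch short; nonlinear kernels such as RBF are also common for SVM classification, and a library implementation (e.g. scikit-learn's `SVC`) would normally be used in practice.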

Geostatistical Hierarchical Clustering and Support Vector Machine give results similar to manual classification for both campaigns, with the added benefits of easily integrating many variables (such as grade, structural data, lithology, etc.) and of updating the classification quickly and seamlessly when new data become available.