Sample clustering in Isatis.neo has proven to be efficient with big datasets

Tagged Mining

Isatis.neo quickly and automatically groups borehole samples into homogeneous classes (e.g., facies, geological or mining domains). Those who have seen the tool run describe it as impressive.


Classifying samples into geological domains is a tedious and somewhat subjective step in geological modeling. Geovariances and Mines Paris Tech have developed a combination of Geostatistical Hierarchical Clustering (‘GHC’) and Support Vector Machine (‘SVM’) to reduce the subjectivity and improve the productivity of that crucial step of Mineral Resource Estimation (MRE). In particular, updating an existing sample classification takes less time than with traditional methods and offers more flexibility.

GHC is a clustering algorithm that respects the spatial connectivity of the data: it forms subsets according to the degree of similarity between samples and eventually assigns a domain to each sample. SVM is a supervised machine learning algorithm, used here when working with big datasets. In the first step, a fraction of the samples is classified using GHC; in the second step, the remaining samples are classified using SVM, supervised by the result of the first classification. This hybrid approach speeds up the classification procedure.
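The two-step workflow can be sketched roughly as follows. This is not the Isatis.neo implementation: scikit-learn's `AgglomerativeClustering` with a nearest-neighbour connectivity graph stands in for GHC, the subset is taken by regular subsampling, the SVM is fed standardised coordinates plus grades, and variable weights are left out. All of these are assumptions, as are the function name `hybrid_classify` and the synthetic data.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.neighbors import kneighbors_graph
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def hybrid_classify(coords, features, n_domains, train_step=2, n_neighbors=8):
    """Cluster a regular subset of samples, then let an SVM label every sample."""
    # Assumption: the "fraction of the samples" is every train_step-th sample.
    train_idx = np.arange(0, len(coords), train_step)

    # Step 1: hierarchical clustering in which only spatial neighbours may merge
    # (a stand-in for GHC, chosen for illustration).
    connectivity = kneighbors_graph(
        coords[train_idx], n_neighbors=n_neighbors, include_self=False
    )
    clusterer = AgglomerativeClustering(
        n_clusters=n_domains, linkage="ward", connectivity=connectivity
    )
    train_labels = clusterer.fit_predict(features[train_idx])

    # Step 2: SVM supervised by the step-1 labels, then applied to all samples.
    svm_inputs = np.hstack([coords, features])
    classifier = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="scale"))
    classifier.fit(svm_inputs[train_idx], train_labels)
    return classifier.predict(svm_inputs), classifier


# Synthetic data invented for illustration: 25 vertical holes on a 10 m grid,
# 12 samples per hole at 5 m spacing, with a grade change at 30 m depth.
rng = np.random.default_rng(0)
xs, ys, zs = np.meshgrid(
    np.arange(5) * 10.0, np.arange(5) * 10.0, np.arange(12) * 5.0, indexing="ij"
)
coords = np.column_stack([xs.ravel(), ys.ravel(), zs.ravel()])
true_domain = (coords[:, 2] >= 30.0).astype(int)
fe = np.where(true_domain == 0, 62.0, 45.0) + rng.normal(0.0, 0.5, len(coords))
sio2 = np.where(true_domain == 0, 4.0, 20.0) + rng.normal(0.0, 0.5, len(coords))
features = np.column_stack([fe, sio2])

labels, classifier = hybrid_classify(coords, features, n_domains=2)
```

Only half of the samples go through the clustering step, which is the expensive one; the fitted `classifier` labels the rest. Under the same assumptions, it could also label samples from a later campaign by calling `classifier.predict` on their stacked coordinates and grades.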

Geovariances has tested the GHC and SVM algorithms for fast, flexible, and updatable sample classification on an actual 3D dataset kindly provided by BHP Billiton. The case study dataset consists of about 2,120 vertical drill holes, with 114,842 samples and 45 variables.

To begin with, a first drill hole campaign (65,468 samples) was taken into account. The dissimilarity between samples was based on five numerical variables (Fe, Al2O3, SiO2, and spectral measurements of hematite and goethite) and one categorical variable (weathering). A weight was attributed to each variable based on its relevance to the overall domaining rationale. Post-processing tools were then used to smooth the output (a variable holding the domain assigned to each sample).
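As an illustration of how a weighted dissimilarity can mix numerical and categorical variables, here is a minimal sketch. The formula actually used in Isatis.neo is not given here, so a Gower-style weighted average is assumed. The function name, weights, value ranges, and sample values are all hypothetical.

```python
from typing import Dict, Union

Sample = Dict[str, Union[float, str]]


def weighted_dissimilarity(a: Sample, b: Sample,
                           weights: Dict[str, float],
                           ranges: Dict[str, float]) -> float:
    """Gower-style weighted dissimilarity in [0, 1] between two samples."""
    total = 0.0
    for name, weight in weights.items():
        if name in ranges:
            # Numerical variable: absolute difference scaled by the variable's range.
            diff = min(abs(a[name] - b[name]) / ranges[name], 1.0)
        else:
            # Categorical variable: 0 if the categories match, 1 otherwise.
            diff = 0.0 if a[name] == b[name] else 1.0
        total += weight * diff
    return total / sum(weights.values())


# Hypothetical weights: Fe and weathering are given more influence here.
weights = {"Fe": 3.0, "Al2O3": 1.0, "SiO2": 1.0,
           "hematite": 1.0, "goethite": 1.0, "weathering": 2.0}
# Hypothetical value ranges used to scale the numerical variables.
ranges = {"Fe": 40.0, "Al2O3": 10.0, "SiO2": 20.0, "hematite": 1.0, "goethite": 1.0}

s1 = {"Fe": 62.0, "Al2O3": 2.0, "SiO2": 4.0,
      "hematite": 0.8, "goethite": 0.1, "weathering": "fresh"}
s2 = {"Fe": 60.0, "Al2O3": 2.5, "SiO2": 5.0,
      "hematite": 0.7, "goethite": 0.2, "weathering": "fresh"}
s3 = {"Fe": 42.0, "Al2O3": 7.0, "SiO2": 14.0,
      "hematite": 0.2, "goethite": 0.6, "weathering": "weathered"}

d_close = weighted_dissimilarity(s1, s2, weights, ranges)
d_far = weighted_dissimilarity(s1, s3, weights, ranges)
```

With these made-up values, `s1` comes out much closer to `s2` than to `s3`. Raising the weight of a variable increases its share of the total, which is how the relevance of each variable to the domaining rationale would enter the calculation.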

In a second step, a second campaign (49,374 samples) with additional data was added, and the classification was updated using SVM with the same weights.

Geostatistical Hierarchical Clustering and Support Vector Machine techniques give results similar to manual classification for both campaigns, with the added benefits of easily integrating many variables (such as grade, structural data, and lithology) and of updating quickly and smoothly when new data arrive.