
Exploratory Data Analysis

Spatial exploratory data analysis is the first crucial stage of any stochastic study. Isatis provides a unique and powerful tool for data clean-up and spatial analysis through interactive, linked base maps, histograms, variograms and other statistical representations.

Isatis incorporates a fully interactive Exploratory Data Analysis tool to investigate the statistics of the variables to be processed. Its applications offer a wide range of univariate and multivariate tools based on conventional statistics (Q-Q plots, χ² tests, multilinear regressions, bigaussian distribution tests, PCA, etc.) and on geostatistical analysis (H-scatter plots, variograms, variogram maps, etc.).


All of these tools are displayed graphically in linked windows: selecting or discarding points in one window is immediately reflected in the other views.

A weight variable can be introduced to weight the calculations in the relevant statistical graphs.
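
As an illustration of what weighting a calculation means, the sketch below computes a weighted mean, variance and standard deviation in plain Python. It is not Isatis code: the function name `weighted_stats`, the sample values and the weights are hypothetical, and normalizing by the sum of the weights is an assumption about the convention used.

```python
import math


def weighted_stats(values, weights):
    """Return the weighted mean, variance and standard deviation.

    Assumption: moments are normalized by the sum of the weights.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    mean = sum(w * v for v, w in zip(values, weights)) / total
    variance = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, variance, math.sqrt(variance)


# Hypothetical data: the last sample is down-weighted, for example
# because it belongs to a densely sampled cluster.
grades = [1.0, 2.0, 3.0, 10.0]
weights = [1.0, 1.0, 1.0, 0.25]
mean, variance, std = weighted_stats(grades, weights)
print(f"weighted mean={mean:.3f}, variance={variance:.3f}, std={std:.3f}")
```

With equal weights the function reduces to the ordinary (population) mean and variance.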

Classical Statistical Tools

  • Summary Statistics

    Three categories of statistical values are computed inside Isatis. They describe:

    • the location of the data distribution with the number of defined samples, the minimum and the maximum values, the mean and the quantiles;
    • the variability of the data with the variance and the standard deviation;
    • the shape of the distribution with the coefficient of skewness, the kurtosis and the coefficient of variation.

    In multivariate analysis, the correlation matrix between the variables is also provided.
  • Histogram

    Constant class width is used in this classical representation of the data frequency. Cumulative Histogram and Inverse Cumulative Histogram representations are also available.
  • Q-Q Plot and P-P Plot

    This visual tool compares two distributions. Isatis uses it to compare the experimental quantiles of a variable with the quantiles of a theoretical distribution (Q-Q plot), or the corresponding cumulative probabilities below a value (P-P plot). Several reference distributions are available (Uniform, Gaussian, Power, Lognormal, Gamma, Exponential). Two experimental distributions may also be compared.
  • Scatter Plot

    This bivariate representation is used to analyze the correlation between two variables, or to reveal anomalies in the data set. Active samples are plotted according to the value taken by the first variable (coordinate along the Y-axis) versus the value of the second variable (coordinate along the X-axis). The relationship that links these variables may be represented in the same graphic. The simplest way to represent this relationship is to calculate the linear regression between the two variables. The conditional expectation curve or the trimmed mean may also be computed. If more than two variables are analyzed, the graphic page shows all the diagrams that can be generated by combining the variables two by two. Several additional graphical representations may enhance the cross-plots (e.g. color coding by a third variable).
  • H-Scatter Plot

    This bivariate representation is meant to analyze the spatial continuity of one or more variables. In the univariate case, it plots the selected variable against itself, that is, it displays all the pairs of samples whose locations are separated by a given distance along a given direction.
  • Principal Component Analysis

    This widely used method for multivariate data analysis is implemented in Isatis. It enables a quick analysis of several variables at a time. The orthogonal factors are computed and can be displayed in different types of graphical windows: base maps, scatter plots, spin plots (representation of samples in the space of three variables, which may be three factors), circle of correlations (unit circle showing the correlation coefficients between the variables and the factors, and hence the affinities and antagonisms between variables) and scree graph (showing the eigenvalues associated with the successive factors and the share of the total variability each one explains).
  • Min/Max Autocorrelation Factors

    This application is similar to Principal Component Analysis and is helpful in synthesizing large multivariate datasets. The extracted orthogonal factors are ranked in order of increasing spatial correlation:

    • Factors consisting largely of noise and exhibiting pure nugget-effect correlation structures are isolated in the lower rankings, and these need not be simulated.
    • Factors to be simulated are those capturing most of the spatial correlation in the data, and they are isolated in the highest ranks.
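
To make the summary statistics listed at the top of this section concrete, here is a minimal plain-Python sketch of the location, variability and shape measures, plus the correlation coefficient used to fill a correlation matrix. The function names are hypothetical, and the use of population moments (division by n) and of non-excess kurtosis are assumptions; the conventions used inside Isatis are not stated here.

```python
import math


def summary(values):
    """Location, variability and shape statistics of one variable.

    Assumptions: population moments (division by n), non-excess kurtosis,
    a non-constant variable and a non-zero mean.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(var)
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return {
        "count": n,
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "variance": var,
        "std": std,
        "skewness": m3 / std ** 3,
        "kurtosis": m4 / var ** 2,
        "cv": std / mean,  # coefficient of variation
    }


def correlation(x, y):
    """Pearson correlation coefficient between two variables."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)


stats = summary([1.0, 2.0, 3.0, 4.0, 5.0])
rho = correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(stats, rho)
```

Applying `correlation` to every pair of variables gives the entries of the correlation matrix mentioned above.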

Geostatistical Analysis

  • Base Map

    The active samples are represented by a symbol whose size is proportional to the value of the variable. When the data is collected on a regular grid, the base map is displayed in raster mode. Other representations are also available, such as literal maps (each sample location is plotted with the value of the data), contour maps (to visualize the general trend of the data), symbolic maps (the data is grouped into classes, each represented by a symbol or a color; with only two classes this yields an indicator map) or gradient maps (represented as proportional and directional arrows).
  • H-Scatter Plot

    This X-Y representation of two variables is meant to analyze the spatial continuity of the data. It displays all the pairs of samples that are separated by a certain distance along a given direction. The coordinates correspond to the value of the first variable at the first sample location versus the value of the second variable (which can be identical to the first one) at the second sample location. The cloud of points spreads out as the spatial correlation between the two sample locations decreases or as the relationship between the two variables weakens.
  • Experimental Variogram

    This is the fundamental tool of geostatistics. It measures how the dissimilarity between samples, and between variables, evolves as a function of distance. In Isatis, the variogram may be calculated in various directions or specifically along lines. The cloud of pairs from which the curves are derived can also be displayed and used for exploration. The variogram can be replaced by a large variety of representations of the spatial variability (18 representations are available in total, among them the covariance, the correlogram, the madogram, the rodogram, pairwise variograms and Gaussian variogram transformations).
  • Variogram Map

    This representation of the variogram in all directions is a good visual tool to highlight possible anisotropy in the data. The principle is to define a grid whose center is the origin of the space. Each pair of samples corresponds to a distance and a direction, which can be converted into a grid cell, and to a variability, which contributes to the value of that cell.

    The variogram map may be used in 3D to identify multiple directions of anisotropy related to structural and depositional controls on sedimentation patterns.
  • Variogram Fitting

    Variogram modeling consists of finding a single mathematical function that captures the spatial behavior of all the variables in all directions of space. The model is obtained as a linear combination of basic structures, each characterized by its type and range.

    The software offers a choice of basic structures for covariances, variograms or generalized covariances. These structures have been selected because they are defined whatever the space dimension and allow an easy simulation algorithm: spherical variogram, exponential variogram, Gaussian variogram, cubic variogram, cardinal sine variogram, stable variogram, gamma variogram, J-Bessel variogram, K-Bessel variogram, exponential cosine variogram (hole effect), generalized Cauchy variogram, power variogram.

    The weights of this linear combination correspond to the sills. In the multivariate case, the set of all the sills constitutes the coregionalization matrix for each basic structure.

    The principle is to minimize the distance between the model and all the different experimental variograms for any calculation distance. An automatic procedure helps in calculating the optimal sills once the set of basic structures has been specified.

    When dealing with a single variable, Isatis can calculate:
    - an omnidirectional experimental variogram if the phenomenon is isotropic;
    - multi-directional experimental variograms if the data is anisotropic.
    In the multivariate case, Isatis may compute:
    - simple (omni or multi-directional) variograms for each one of the involved variables;
    - (omni or multi-directional) cross-variograms for any pair of variables.
  • Global Trend Modeling

    A global trend can be fitted automatically to a data set by the method of least squares.
    It is the initial step for Residual Kriging or Universal Kriging.
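
The experimental variogram described in this section can be sketched in a few lines of plain Python. This is an illustrative omnidirectional version, not the Isatis implementation: the function name `experimental_variogram`, its parameters and the choice of lag classes of the form [k·lag, (k+1)·lag) are assumptions made for the example. For each lag class it averages half the squared difference between the values of all sample pairs whose separation falls in that class.

```python
import math


def experimental_variogram(points, values, lag, n_lags):
    """Omnidirectional experimental variogram.

    points : list of coordinate tuples (any dimension)
    values : list of sample values, same length as points
    lag    : width of each distance class
    n_lags : number of distance classes

    Returns a list of (mean_distance, gamma, n_pairs) tuples, one per
    non-empty class, with gamma = sum((zi - zj)^2) / (2 * n_pairs).
    """
    sq_sums = [0.0] * n_lags
    dist_sums = [0.0] * n_lags
    counts = [0] * n_lags
    n = len(points)
    for i in range(n - 1):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            k = int(d // lag)  # index of the class [k*lag, (k+1)*lag)
            if k >= n_lags:
                continue  # pair is farther apart than the last class
            sq_sums[k] += (values[i] - values[j]) ** 2
            dist_sums[k] += d
            counts[k] += 1
    return [
        (dist_sums[k] / counts[k], sq_sums[k] / (2 * counts[k]), counts[k])
        for k in range(n_lags)
        if counts[k] > 0
    ]


# Hypothetical samples along a line, one unit apart.
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
vals = [1.0, 2.0, 4.0, 7.0]
for distance, gamma, pairs in experimental_variogram(pts, vals, lag=1.0, n_lags=4):
    print(f"h={distance:.2f}  gamma={gamma:.3f}  pairs={pairs}")
```

A directional variogram would additionally keep only the pairs whose separation vector lies within an angular tolerance of the chosen direction, and a variogram map would accumulate the same quantity in grid cells indexed by the separation vector.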