Exploratory Data Analysis

Spatial exploratory data analysis is the first crucial stage of any stochastic study. ISATIS provides an efficient tool for data clean-up and spatial analysis through interactive, linked base maps, histograms, variograms and other statistical representations.

A wide range of classical statistics and geostatistical analysis tools is available simultaneously in the same module. Statistical reports display a wide range of results that can be reused in third-party software.

ISATIS incorporates a fully interactive Exploratory Data Analysis tool to investigate the statistics of the variables to be processed. Its applications offer a large variety of univariate and multivariate tools based on conventional statistics (Q-Q plots, χ² tests, multilinear regressions, bigaussian distribution tests, PCA, etc.) and on geostatistical analysis (H-scatter plots, variograms, variogram maps, etc.).

All of these are displayed graphically in linked windows: selecting or discarding points in one window is immediately reflected in the other views. A Weight Variable can be introduced to weight the calculations in the relevant statistical graphs.

- Summary Statistics

Three categories of statistical values are computed inside ISATIS. They describe:
- the location of the data distribution with the number of defined samples, the minimum and the maximum values, the mean and the quantiles;
- the variability of the data with the variance and the standard deviation;
- the shape of the distribution with the coefficient of skewness, the kurtosis and the coefficient of variation.

In multivariate analysis, the correlation matrix between the variables is also provided.
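
As a rough illustration outside Isatis, these three categories of statistics can be sketched in Python with NumPy. The function name `summary_statistics`, the use of NaN for undefined samples and the divide-by-n formulas are assumptions made for this example, not a description of how Isatis computes its reports.

```python
import numpy as np

def summary_statistics(values, weights=None):
    """Location, variability and shape statistics, optionally weighted."""
    x = np.asarray(values, dtype=float)
    defined = ~np.isnan(x)                      # assumption: undefined samples are NaN
    if weights is None:
        w = np.ones(x.shape)
    else:
        w = np.asarray(weights, dtype=float)
    x, w = x[defined], w[defined]
    w = w / w.sum()                             # normalised weights
    mean = np.sum(w * x)
    variance = np.sum(w * (x - mean) ** 2)      # divide-by-n (population) form
    std = np.sqrt(variance)
    return {
        "count": int(x.size),
        "min": x.min(),
        "max": x.max(),
        "mean": mean,
        "quartiles": np.quantile(x, [0.25, 0.5, 0.75]),  # unweighted, for simplicity
        "variance": variance,
        "std": std,
        "skewness": np.sum(w * (x - mean) ** 3) / std ** 3,
        "kurtosis": np.sum(w * (x - mean) ** 4) / std ** 4,  # not excess kurtosis
        "cv": std / mean,
    }

stats = summary_statistics([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0, np.nan])
```

For several variables stored as columns of one array, `np.corrcoef(data, rowvar=False)` returns the corresponding correlation matrix.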

- Histogram

Constant class width is used in this classical representation of the data frequency. Cumulative Histogram and Inverse Cumulative Histogram representations are also available.
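
The three representations can be sketched as follows. This is a minimal example assuming NumPy; the function name `class_histograms` and its parameters are hypothetical, and the inverse cumulative histogram is taken here as the proportion of samples falling in the current class or above.

```python
import numpy as np

def class_histograms(values, n_classes, vmin, vmax):
    """Frequency, cumulative and inverse cumulative histograms
    over n_classes classes of constant width between vmin and vmax."""
    x = np.asarray(values, dtype=float)
    counts, edges = np.histogram(x, bins=n_classes, range=(vmin, vmax))
    freq = counts / counts.sum()                 # samples outside the range are ignored
    cumulative = np.cumsum(freq)                 # proportion up to and including each class
    inverse_cumulative = np.cumsum(freq[::-1])[::-1]   # proportion in each class or above
    return edges, freq, cumulative, inverse_cumulative

edges, freq, cum, inv_cum = class_histograms([0.5, 1.5, 1.5, 2.5], 3, 0.0, 3.0)
```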

- Quantile-Quantile Plot

The quantile-quantile plot is a visual tool for comparing two distributions. In Isatis it is used to compare the experimental quantiles of a variable with the quantiles of a theoretical distribution. Several reference distributions are available (Uniform, Gaussian, Power, Lognormal, Gamma, Exponential).
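
For the Gaussian case, the comparison can be sketched in a few lines. The function name `qq_gaussian`, the plotting positions (i - 0.5)/n and the choice of fitting the Gaussian by the sample mean and standard deviation are assumptions of this example; other conventions exist.

```python
import numpy as np
from statistics import NormalDist

def qq_gaussian(values):
    """Return (theoretical, experimental) quantile pairs against a fitted Gaussian."""
    experimental = np.sort(np.asarray(values, dtype=float))
    n = experimental.size
    p = (np.arange(1, n + 1) - 0.5) / n          # plotting positions in (0, 1)
    fitted = NormalDist(mu=experimental.mean(), sigma=experimental.std())
    theoretical = np.array([fitted.inv_cdf(pi) for pi in p])
    return theoretical, experimental

rng = np.random.default_rng(0)
theoretical, experimental = qq_gaussian(rng.normal(10.0, 2.0, 500))
```

Plotting `experimental` against `theoretical` gives the Q-Q plot; points lying close to the first bisector suggest that the Gaussian model is a reasonable description of the data.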

- Scatter Plot

This bivariate representation is used to analyze the correlation between two variables or to reveal anomalies in the data set. Active samples are plotted according to the value of the first variable (Y-axis) versus the value of the second variable (X-axis). The relationship between the variables may be displayed on the same graphic: the simplest option is the linear regression between the two variables, and the conditional expectation curve or the trimmed mean may also be computed. If more than two variables are analyzed, the graphic page displays all the scatter plots obtained by combining the variables in pairs. Additional graphical options may enhance the scatter plots (e.g. color coding by a third variable).
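
The linear regression and an experimental conditional expectation curve can be sketched as below. The function name `regression_and_conditional_mean` and the choice of equal-width classes along the X-axis are assumptions of this example.

```python
import numpy as np

def regression_and_conditional_mean(x, y, n_classes=5):
    """Linear regression of y on x, plus the mean of y per class of x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)       # least-squares line y = slope*x + intercept
    edges = np.linspace(x.min(), x.max(), n_classes + 1)
    # Class index of each sample; the maximum value is folded into the last class.
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_classes - 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    cond_mean = np.array([
        y[idx == k].mean() if np.any(idx == k) else np.nan
        for k in range(n_classes)
    ])
    return slope, intercept, centers, cond_mean

xs = np.arange(10.0)
slope, intercept, centers, cond_mean = regression_and_conditional_mean(xs, 2.0 * xs + 1.0)
```

When the conditional means follow the regression line, as in this perfectly linear case, the linear model is an adequate summary of the relationship.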

- Principal Component Analysis

This widely used statistical method for multivariate data analysis is implemented in Isatis. It enables a quick analysis of several variables at a time. Orthogonal factors are computed and can be displayed in several types of graphical windows: base maps, scatter plots, spin plots (representation of samples where three variables are defined; these can be three factors), circle of correlations (unit circle representing the coefficients of correlation between the variables and the factors, showing the affinities and antagonisms between variables) and scree graph (showing the eigenvalues associated with the successive factors and how much of the global variability each one accounts for).
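
A minimal sketch of the computation, assuming a PCA on the correlation matrix of standardised variables (the source does not say which variant Isatis uses). The function name `principal_components` and the returned quantities are choices made for this example.

```python
import numpy as np

def principal_components(data):
    """PCA on the correlation matrix of the columns of `data` (samples x variables)."""
    data = np.asarray(data, dtype=float)
    z = (data - data.mean(axis=0)) / data.std(axis=0)    # standardised variables
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)              # symmetric matrix: use eigh
    order = np.argsort(eigvals)[::-1]                    # largest eigenvalue first
    eigvals = np.clip(eigvals[order], 0.0, None)         # guard against tiny negatives
    eigvecs = eigvecs[:, order]
    factors = z @ eigvecs                                # uncorrelated factor values
    explained = eigvals / eigvals.sum()                  # values plotted in a scree graph
    loadings = eigvecs * np.sqrt(eigvals)                # correlation(variable, factor)
    return factors, explained, loadings

rng = np.random.default_rng(1)
base = rng.normal(size=(200, 1))
data = np.hstack([
    base + 0.1 * rng.normal(size=(200, 1)),
    2.0 * base + 0.1 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])
factors, explained, loadings = principal_components(data)
```

Plotting the first two columns of `loadings` for each variable gives the coordinates used in a circle of correlations.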

- Base Map

The active samples are represented by a symbol whose size is proportional to the value of the variable. When the data is collected on a regular grid, the base map is displayed in raster mode. Other representations are also available, such as literal maps (each sample location is plotted with the value of the data), contour maps (to visualize the general trend of the data), symbolic maps (the data is divided into classes, each represented by a symbol or a color; these become indicator maps when only two classes are defined) or gradient maps (represented as proportional and directional arrows).

- H-Scatter Plot

This X-Y representation of two variables is meant to analyze the spatial continuity of the data. It displays all the pairs of samples that are separated by a given distance along a given direction. The coordinates of each point are the value of the first variable at the first sample location versus the value of the second variable (which can be identical to the first one) at the second sample location. The cloud of points spreads out as the spatial correlation between the two samples decreases or as the relationship between the two variables weakens.
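
The pair selection behind such a plot can be sketched as follows. The function name `h_scatter_pairs` and the distance and angular tolerance parameters are assumptions of this example; the source does not describe how Isatis parameterises the search.

```python
import numpy as np

def h_scatter_pairs(coords, z_head, z_tail, lag, lag_tol, direction, angle_tol_deg):
    """Values (z_head at first sample, z_tail at second sample) for all ordered
    pairs separated by about `lag` along `direction`."""
    coords = np.asarray(coords, dtype=float)
    z_head = np.asarray(z_head, dtype=float)
    z_tail = np.asarray(z_tail, dtype=float)
    u = np.asarray(direction, dtype=float)
    u = u / np.linalg.norm(u)
    sep = coords[None, :, :] - coords[:, None, :]    # sep[i, j] = coords[j] - coords[i]
    dist = np.linalg.norm(sep, axis=2)
    along = sep @ u                                  # projection on the direction
    keep = (
        (dist > 0)
        & (np.abs(dist - lag) <= lag_tol)
        & (along >= dist * np.cos(np.radians(angle_tol_deg)))
    )
    i, j = np.nonzero(keep)
    return z_head[i], z_tail[j]

# Five samples on a regular transect along X; same variable at both ends of the pair.
coords = np.column_stack([np.arange(5.0), np.zeros(5)])
z = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
heads, tails = h_scatter_pairs(coords, z, z, lag=1.0, lag_tol=0.1,
                               direction=(1.0, 0.0), angle_tol_deg=10.0)
```

Plotting `tails` against `heads` gives the H-scatter plot for that lag; passing a different variable as `z_tail` gives the cross version.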

- Variogram

This is the fundamental tool of geostatistics, which describes the correlation between samples and between variables as a function of distance. In Isatis, the variogram may be calculated in various directions or specifically along lines. The cloud of pairs from which the curves are derived can also be displayed and used for exploration. The variogram can be replaced by a large variety of representations of the spatial variability (18 representations are available in total, among them the covariance, the correlogram, the madogram, the rodogram, pairwise variograms and Gaussian variogram transformations).
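
For reference, the classical experimental variogram at lag h is half the mean squared difference between the values of the N(h) pairs of samples separated by h. A minimal omnidirectional sketch is given below; the function name `experimental_variogram` and the lag classes centred on multiples of `lag` with a half-lag tolerance are assumptions of this example.

```python
import numpy as np

def experimental_variogram(coords, values, lag, n_lags):
    """Omnidirectional experimental variogram for lag classes 1..n_lags.
    Class k gathers pairs whose distance lies within half a lag of k*lag."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)         # each unordered pair once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    half_sq = 0.5 * (values[i] - values[j]) ** 2     # one point of the variogram cloud
    k = np.floor(dist / lag + 0.5).astype(int)       # nearest lag class
    keep = (k >= 1) & (k <= n_lags)
    n_pairs = np.bincount(k[keep], minlength=n_lags + 1)[1:]
    sums = np.bincount(k[keep], weights=half_sq[keep], minlength=n_lags + 1)[1:]
    gamma = np.where(n_pairs > 0, sums / np.maximum(n_pairs, 1), np.nan)
    return lag * np.arange(1, n_lags + 1), gamma, n_pairs

coords = np.column_stack([np.arange(5.0), np.zeros(5)])
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
lags, gamma, n_pairs = experimental_variogram(coords, values, lag=1.0, n_lags=3)
```

The array `half_sq` plotted against `dist` is the cloud of pairs from which the averaged curve is derived.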

- Variogram Map

The variogram map may be used in 3D to identify multiple directions of anisotropy related to structural and depositional controls on sedimentation patterns.

This representation of the variogram in all directions is a good visual tool for highlighting possible anisotropy in the data. The principle is to define a grid whose center coincides with the origin of the space of separation vectors. Each pair of samples corresponds to a distance and a direction, which can be converted into a grid cell, and to a variability, which contributes to the value assigned to that cell.
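
The principle can be sketched in 2D as follows. The function name `variogram_map`, the square cells and the averaging of half squared differences per cell are assumptions of this example rather than a description of the Isatis implementation.

```python
import numpy as np

def variogram_map(coords, values, cell, n_cells):
    """2D variogram map on a (2*n_cells+1) x (2*n_cells+1) grid of separation
    vectors, with the zero separation at the central cell."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(values)
    size = 2 * n_cells + 1
    sep = coords[None, :, :] - coords[:, None, :]        # sep[i, j] = coords[j] - coords[i]
    half_sq = 0.5 * (values[None, :] - values[:, None]) ** 2
    ix = np.rint(sep[..., 0] / cell).astype(int) + n_cells   # column index of each pair
    iy = np.rint(sep[..., 1] / cell).astype(int) + n_cells   # row index of each pair
    keep = (~np.eye(n, dtype=bool)                       # drop pairs of a sample with itself
            & (ix >= 0) & (ix < size) & (iy >= 0) & (iy < size))
    sums = np.zeros((size, size))
    counts = np.zeros((size, size))
    np.add.at(sums, (iy[keep], ix[keep]), half_sq[keep])     # accumulate variability
    np.add.at(counts, (iy[keep], ix[keep]), 1.0)             # and number of pairs
    vmap = np.where(counts > 0, sums / np.maximum(counts, 1.0), np.nan)
    return vmap, counts

coords = np.column_stack([np.arange(5.0), np.zeros(5)])
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
vmap, counts = variogram_map(coords, values, cell=1.0, n_cells=2)
```

Because each pair is counted in both orientations, the map is symmetric about its central cell; cells that receive no pair are left undefined (NaN).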

