Einzeltreffer — DigiBib

A comparative study of the predictive capabilities of recent advances in computational intelligence (CI) is presented. This study utilised the machine learning paradigm to evaluate the CI techniques while applying them to the prediction of porosity and permeability of heterogeneous petroleum reservoirs using six diverse well data sets. Porosity and permeability are the major petroleum reservoir properties that serve as indicators of reservoir quality and quantity. The results showed that the performance of support vector machines (SVM) and functional networks (FN) is competitively better than that of Type-2 fuzzy logic system (T2FLS) in terms of correlation coefficient. With execution time, FN and SVM were faster than T2FLS, which took the most time for both training and testing. The results also demonstrated the capability of SVM to handle small data sets. This work will assist artificial intelligence practitioners to determine the most appropriate technique to use especially in conditions of limited amo...

Recent advances in the application of computational intelligence techniques in oil and gas reservoir characterisation: a comparative study.

A comparative study of the predictive capabilities of recent advances in computational intelligence (CI) is presented. This study utilised the machine learning paradigm to evaluate the CI techniques while applying them to the prediction of porosity and permeability of heterogeneous petroleum reservoirs using six diverse well data sets. Porosity and permeability are the major petroleum reservoir properties that serve as indicators of reservoir quality and quantity. The results showed that the performance of support vector machines (SVM) and functional networks (FN) is competitively better than that of Type-2 fuzzy logic system (T2FLS) in terms of correlation coefficient. With execution time, FN and SVM were faster than T2FLS, which took the most time for both training and testing. The results also demonstrated the capability of SVM to handle small data sets. This work will assist artificial intelligence practitioners to determine the most appropriate technique to use especially in conditions of limited amount of data and low processing power.

Keywords: machine learning; computational intelligence; support vector machines; functional networks; Type-2 fuzzy logic system; petroleum reservoir characterisation

1. Background

Support vector machines (SVM), functional networks (FN) and Type-2 fuzzy logic system (T2FLS) are the three popular computational intelligence (CI) techniques that have individually featured in many successful applications published in recent times (Fang & Chen, [19]; Jong-Se, [25]; Karnik & Mendel, [26]; Maqsood & Adwait, [27]; Mendel, [29]; Toshinori, [42]). One of the relatively new areas where these techniques have been applied includes oil and gas reservoir characterisation and modelling. Oil and gas reservoir characterisation is a process for describing, in qualitative and quantitative terms, various reservoir properties in spatial variability by using available field data (Anifowose & Abdulraheem, [7]). It plays a very important role in modern reservoir management, making sound reservoir decisions, reducing uncertainties and hence, improving the reliability of reservoir predictions. The ultimate goal of any characterisation process is a robust reservoir model with realistic tolerance for imprecision and uncertainty.

A petroleum reservoir is an important component of a complete petroleum system that is composed of a body of rock, commonly sedimentary. The rock is buried beneath the earth surface with sufficient porosity and permeability to hold and transmit fluids such as oil, gas and water. Porosity and permeability are the two fundamental reservoir properties which complimentarily relate to the amount of fluid contained in a reservoir and its ability to flow. These properties make significant impacts on petroleum field operations and reservoir management (Jong-Se, [25]). The measurement of these properties is usually made on plugs that are extracted from the core of wells drilled for oil and gas exploration. The data obtained from such measurements are essential for linking permeability with porosity, and both, in turn, serve as standard indicators of reservoir quality in the oil and gas industry.

Porosity is the percentage of voids and open spaces (called pores) in a rock or sedimentary deposit, while permeability is the ease with which the rock fluids are transmitted through the pore spaces (Schlumberger, [38]). Although a rock may be very porous, it is not necessarily permeable especially in carbonate formations. Permeability is a measure of how interconnected the individual pore spaces are in a rock or sediment. It is a key parameter that is associated with the characterisation of any hydrocarbon reservoir. In fact, it is not possible to have accurate solutions to many petroleum engineering problems without having accurate values of permeability (Schlumberger, [39]). The most common reservoir rocks with sufficient number of pores, and the interconnectivity necessary for a viable production of oil and gas is the sedimentary rock. This is because they are more porous than other types of rocks and they are formed under the most appropriate temperature conditions at which hydrocarbons can be preserved.

There are several other properties of oil and gas reservoirs that are necessary to be studied and predicted such as porosity, permeability, pressure-volume-temperature (PVT), water saturation, drive mechanism, structure and seal, well spacing, well-bore stability and lithofacies. However, porosity and permeability mutually serve as a major indicator of the potential of a reservoir as well as its viability for exploration and exploitation. It is necessary to be able to predict these two properties as accurately as possible in addition to just understanding them.

A good number of studies have been carried out on the use of various CI techniques to predict various properties of oil and gas reservoirs. SVM, FN and T2FLS were chosen for this study due to their promising performance in their respective areas of applications (Al-Anazi & Gates, [3]; El-Sebakhy, [18]; Olatunji, Selamat, & Abdulraheem, [33], [34]). FN was preferred to the traditional artificial neural networks (ANN) as the former is an advancement of the latter with some of the limitations of the latter already taken care of in the former (Mohsen, Morteza, & Alli, [30]; Rusu & Rusu, [37]). In this work, we carried out a rigorous comparative study of the performance of these techniques in order to identify their strength and weakness when using them to solve various industrial problems as well as academic exercises. For the purpose of our rigorous validation and evaluation, the three techniques were used to solve one of the most challenging problems in the exploration and production of oil and gas reservoirs namely one that has to do with petroleum reservoir characterisation and modelling.

The major motivation for this study is the need for rigorous and robust comparative studies of various CI techniques in order for researchers to apply and extend their experience on the application of these techniques to different fields using real-life data sets. A deep understanding of these techniques and their behaviour under different data and operating conditions will assist researchers to determine which of them to use in their various and respective operating scenarios. This process is very much necessary especially in areas where there is scarcity of real-life data such as in petroleum engineering. As availability of relevant data, in quantity and quality, is a major limitation in petroleum engineering research, the results of this study will assist CI practitioners to determine the appropriate techniques to use in conditions of scarcity of good quality data in the industry and low processing power in the academics.

To the best of our knowledge, there is no previous study that has rigorously compared these recently used techniques in order to reveal their respective performance in different operational and data scenarios. The main objective of using CI in oil and gas reservoir characterisation is to use the data sets of known wells or formations to train models in order to predict the desired properties of new wells or formations whose values were previously unknown. The predicted properties will, in turn, be used to populate full-field simulation models for better reservoir exploration, production and management.

The main problem that this study attempts to solve is the focus of most petroleum engineering applications of CI on only the traditional ANN. Despite the several published reports such as Petrus, Thuijsman, and Weijters ([36]) on its deficiencies, it is surprising that most practitioners in the oil industry still continue to use the technique without any significant improvement. Perhaps, most researchers in the petroleum industry are not well informed about the alternative, better and state-of-the-art choices that are available to them. In view of the aforementioned, the main objectives of this study are:

To review the literature on the three selected state-of-the-art CI techniques.

To apply them on one of the real-life and most challenging problems in the petroleum industry.

To carry out a robust and rigorous comparison of the performance of these techniques.

Based on the results of the comparison, to recommend the most appropriate techniques for different data and processing conditions.

The rest of this paper is organised as follows: Section 2 presents an overview of SVM, FN and T2FLS. A detailed survey of literature on the application of CI in the oil and gas industry is presented in Section 3. Section 4 describes the data and tools, along with the detailed design methodology and implementation strategy. Results and their detailed discussions are presented in Section 5, while conclusion and a plan for future work are presented in Section 6.

2. Overview of SVMs, FNs and T2FLS

2.1 Support vector machines

SVMs are a set of related supervised learning methods used for classification and regression. They belong to a family of generalised linear classifiers (GLCs). GLCs are classifiers that are implemented by treating the input data as a nonlinear transformation prior to feeding it to a linear classifier. The transformation may result into an increase or decrease in the number of input features. The classifier is essentially thresholding a basis function regression. It can have arbitrarily shaped decision boundaries (Duda, Hart, & Stock, [16]). GLCs are characterised by setting a threshold on the prediction boundary as a linear combination of input predictor variable and their weights. Other examples of GLCs are linear models (with the exception of cases where the condition probability of the target variable is an arbitrary function of the predictor variable) and logistic regression. What makes GLCs attractive is that the parameter of the predictor variable can easily be learned in a very similar way as done in linear regression (McCullagh & Nelder, [28]).

SVM can also be considered as a special case of Tikhonov Regularisation as it maps input vectors to a higher dimensional space where a maximal separating hyperplane is constructed (Jian & Wenfen, [24]; Taboada, Matias, Ordonez, & Garcia, [41]). Tikhonov Regularisation has been used to find stable approximate solutions to linear ill-posed problems such as the one represented as follows:

(1)

Graph

where A is a linear compact injective operator between Hilbert spaces U and F with the solution μ and data f belonging to U and F.

Instead of the exact data f, suppose a noisy data fδ is available such that:

(2)

Graph

Equation (1) is said to be ill-posed in the sense that the inverse operator A− 1 of A exists but not continuous. In essence, there exists a unique solution to Equation (1) but solving it directly will not produce a right solution as any attempt to solve it using numerical methods may fail. This is where the Tikhonov Regularisation comes in to rescue. Tikhonov Regularisation attempts to solve (Cristianini & Shawe-Taylor, [14]; Nguyen, [32]):

(3)

Graph

where the regularisation parameter α is found such that:

(4)

Graph

The generalisation ability of SVMs is ensured by the special properties of the optimal hyperplane that maximises the distance to training examples in a high dimensional feature space. SVMs were initially introduced for the purpose of classification until 1995 when Vapnik et al., as reported in Fang and Chen ([19]), developed a new ϵ-sensitive loss function technique that is based on statistical learning theory, and which adheres to the principle of structural risk minimisation that seeks to minimise an upper bound of the generalisation error. This new technique is called support vector regression. It has been shown to exhibit excellent performance.

Structural risk minimisation, first introduced by Vapnik and Chevronenkis (VC) ([44]), is an inductive principle for model selection used in the machine learning paradigm for learning from finite sets of training data. It presents a general description of capacity control and provides a balance between the VC dimension of approximating functions and the empirical error. The procedure involves the execution of the following algorithm (Sewell, [40]; Vapnik, [43]):

Use a priori knowledge of the domain.

Choose a class of functions such as a degree of polynomial.

Divide the class of functions into a hierarchy of nested subsets in order of increasing complexity.

Perform empirical risk minimization on each subset.

Select the model in the series whose sum of empirical risk and VC confidence is minimal.

The notion of the VC-dimension applies to any parameterised class F of functions x → f (θ, x) from some domain D into {0, 1}, where θ ranges over some given parameter space, for example Rw.

For the implementation of this study, the polynomial kernel function used is of the form:

(5)

Graph

where param is the kernel parameter in the SVM feature space (Cristianini & Shawe-Taylor, [14]).

Further details on SVM can be found in Abe ([1]), Al-Anazi and Gates ([2]), Anifowose and Abdulraheem ([7]), Diao, Yang, and Wang ([15]), Fang and Chen ([19]), Jian and Wenfen ([24]) and Wu and Krishnan ([45]).

2.2 Functional network

Functional network is an extension of ANN. Like ANN, it consists of different layers of neurons connected by links but unlike ANN, each computing unit or neuron performs a simple calculation: a scalar typically monotone function f of a weighted sum of inputs. The function f, associated with the neurons, is fixed and the weights are learned from data using some well-known algorithms. A FN consists of: a layer of input units containing the input data, a layer of output units containing the output data, one or several layers of neurons or computing units which evaluate a set of input values coming from the previous layer, and which gives a set of output values to the next layer of neurons or output units. The computing units are connected to each other, in the sense that the output from one unit can serve as part of the input to another neuron or to the units in the output layer. Once the input values are given, the output is determined by the neuron type, which can be defined by a function (Castillo, [9]; Castillo, Gutierrez, Hadi, & Lacruz, [10]; Castillo, Hadi, & Lacruz, [11]).

A commonly used functional equation is the generalised associative functional equations, which can be expressed as (Castillo, Gutierrez, et al., [10]; Castillo, Hadi, et al., [11]):

(6)

Graph

If the model has a d-dimensional input vector x and a one-dimensional output variable y, a generalised associative FN can be written as:

(7)

Graph

where fd+1 is a strictly monotonic (i.e. invertible) function.

More detailed description of FN along with the functional equations derivations and simplification can be found in Castillo, Gutierrez, et al. ([10]), Castillo, Hadi, et al. ([11]), and Anifowose and Abdulraheem ([7]).

FN evolved due to the limitations of ANN (Petrus et al., [36]) which include:

The determination of the number of hidden layers and hidden neurons of the network architecture are usually determined by trial and error.

A large number of data samples are usually required to fit a good network structure.

ANN is usually trapped in the local optima. This results in its instability when executed several times over the same data and operating conditions.

Despite the wide range of applications of ANN, there is still no general framework or procedure through which the appropriate network for a specific task can be designed.

FN was introduced to address some of the limitations of ANN as stated above.

Some of the features that have made FN more desirable than ANN are (Mohsen et al., [30]):

FN does not require a large number of data to fit a good network structure.

Number of hidden layers and weights are automatically learned directly from data. Hence, there is no need for tuning by trial and error like in ANN.

As functional equations are used in FN rather than backpropagation of errors, it does not suffer from the problem of local optima.

2.3 Type-2 fuzzy logic system

Fuzzy Sets (FSs) have been around for nearly 40 years, and they include Type-1 FS (fuzzy) and Type-2 FS (fuzzy fuzzy). Type-2 FS was introduced by Zadeh, as reported in Maqsood and Adwait ([27]), as an extension of the concept of Type-1 Fuzzy. T2FLS has membership grades that are themselves fuzzy. For each value of primary variable (e.g. pressure and temperature), the membership is a function (not just a single-point value). This is the secondary membership function (MF), whose domain, the primary membership, is in the [0,1] interval, and whose range, secondary grades, may also be in the [0,1] interval. Hence, the MF of a Type-2 FLS is three-dimensional, and it is the newly introduced third dimension that provides new degrees of design freedom for handling uncertainties.

The major difference between T2FLS and Type-1 FLS is in the out-processing block which contains the defuzzifier. The defuzzifier steps down the Type-2 intermediate output to the Type-1 final output. Like in Type-1 FLS, the fuzzifier maps the crisp input into a FS. This FS can be a Type-2 set. The other distinction between Type-1 and Type-2 is associated with the nature of MFs, which is not essential when constructing rules. Hence, the structure of rules in Type-2 remains the same as in Type-1. The inference process in Type-2 FLS combines rules and gives a mapping from input Type-2 FS to output Type-2 FS. To do this, one needs to find the unions and intersections of Type-2 sets, as well as compositions of Type-2 relations. Extended versions (called Zadeh's extension) can be used to give a Type-1 FS. As this extension process takes one from Type-2 output sets of the FLS to Type-1 sets, this operation is called Type Reduction which is used to produce a type-reduced set. To obtain a crisp output from Type-2 FLS, the type-reduced set needs to be defuzzified. The most natural way of doing this is to find the centroid of the type-reduced set (Karnik & Mendel, [26]; Mendel, [29]).

More details on T2FLS can be found in Karnik and Mendel ([26]), Mendel ([29]), Maqsood and Adwait ([27]), Chen, Li, Harrison, and Zhang ([13]), Anifowose and Abdulraheem ([7]), Olatunji et al. ([33], [34]).

3. Related work

The application of CI techniques has been widely studied in the oil and gas industry as well as in other fields. Some of the areas of petroleum technology in which CI has been used with success include seismic pattern recognition, porosity and permeability predictions, identification of sandstone lithofacies, drill bit diagnosis, and analysis and improvement of oil and gas well production (Ali, [4]; Chen et al., [13]; Mohsen et al., [30]; Zerandi, Rezaee, Turksen, & Neshat, [47]).

The application of CI in petroleum reservoir characterisation was pioneered by Ali ([4]) and continued through the work of Maqsood and Adwait ([27]). The latter used ANN to predict permeability from petrographic data while using fuzzy logic to screen and rank the predictor variables with respect to the target variable. They demonstrated the generalising capability of the ANN model. A recent study by Mohsen et al. ([30]) proposed a method for the auto-design of neural networks based on genetic algorithm. Most recent applications include the implementation of the hybrids of existing CI techniques by Anifowose ([5]), Helmy, Anifowose, and Faisal ([22]) and Anifowose and Abdulraheem ([7]). However, these studies lack a rigorous comparative analysis of these techniques.

Jian and Wenfen ([24]), while comparing the performances of group method of data handling, ANN and SVM, concluded that SVM could pay more attention to both the universality and extendibility of a model when the samples are very limited, showing a good prospect of its application. However, the work only compared SVM with ANN, thus did not offer a detailed comparison with more recent techniques. In another study, Taboada et al. ([41]) used different kinds of SVMs: SVM classification (multi-class one-against-all), ordinal SVM and SVM regression, and they found that the SVMs are perfectly comparable to kriging (a statistical model) and have better control of outliers. This work, however, only compared the different versions of SVM and did not include a comparison of SVM with other techniques. The work of El-Sebakhy ([17]) on the application of SVM in the prediction of PVT properties of crude oil samples only compared SVM with ANN, nonlinear regression and some petroleum engineering empirical correlations, while other CI techniques that have proven better than ANN were not included in the comparison.

Al-Anazi and Gates ([2]) compared the performance of SVM and multi-layer perceptron (MLP) in their ability to effectively predict the porosity of a heterogeneous reservoir in the face of limited log and core data. Again, the comparison was limited to SVM and MLP and thus not comprehensive enough. The works of Helmy and Anifowose ([21]), Helmy et al. ([22]) and Anifowose and Abdulraheem ([7]) involving SVM, T2FLS and FNs, however focused mainly on the performance of the hybrid models combining these techniques rather than the detailed comparative analysis among the individual techniques.

Though the work of Anifowose and Abdulraheem ([6]) addressed a comprehensive comparative performance of various AI techniques including SVM, T2FLS and FN under different data size conditions, the main focus was to suggest a clear demarcation to what constitutes a small, medium and large data sets rather than a rigorous analysis of the comparative performance of the techniques applied in the paper. Olatunji et al. ([33], [34]) who compared the performance of T2FLS with other commonly used techniques in the prediction of permeability and PVT properties of carbonate reservoirs also focused mainly on the ability of T2FLS to handle uncertainties rather than a robust comparison of the techniques in terms of other parameters such as ability to handle data sets of different sizes especially on the scale proposed by Anifowose and Abdulraheem ([6]).

This study is the first, to the best of our knowledge, which presents a rigorous and robust comparison of SVM, T2FLS and FN: three techniques that have proven to be popularly used in various CI applications especially in oil and gas reservoir characterisation.

4. Description of data and research methodology

4.1 Description of data

Well logs for porosity and permeability from six oil and gas wells were used for the design, training and validation of this work. The three well logs for porosity were petrographic measurements obtained from a drilling site in the Northern Marion Platform of North America (Site 1), and the other three for permeability were log measurements obtained from a giant reservoir in the Middle East (Site 2). The data sets from Site 1 have five predictor variables for porosity viz. top interval, grain density, grain volume, length and diameter, while the data sets from Site 2 have eight predictor variables for permeability viz. gamma ray log, porosity log, density log, water saturation, deep resistivity, micro-spherically focused log, neutron porosity log and caliper log.

A well log is a report that describes the chemical, physical, nuclear and electrical composition of the geological formation of an oil and gas well. Well logging is, therefore, the technique of taking measurements in drill holes with probes designed to measure the physical and chemical properties of rocks and the fluids contained in them. Much information can be obtained from samples of rock brought to the surface in cores or bit cuttings, or from other clues while drilling, such as penetration rate, however, the greatest amount of information comes from well logs.

The available data for each well were divided into training and testing subsets by using a stratified sampling technique. With this technique, a random 70% of the entire samples were used for training, and the remaining 30% for testing. In essence, the training subsets effectively represent the section of the wells or reservoirs that have complete log and core data samples, while the testing subsets represent uncored sections of the wells or reservoirs. The major challenge is to use the CI techniques to predict the core values (porosity and permeability in the case of this study) for the uncored sections. To further ensure fairness and integrity of the results obtained, several iterations were made, and the average of the runs was obtained. Table 1 shows the two well logs with their sizes and divisions.

Table 1 Division of data sets into training and testing.

	Site 1 (porosity)	Site 2 (permeability)
Wells	1	2	3	1	2	3
Data size	415	285	23	355	477	387
Training (70%)	291	200	16	249	334	271
Testing (30%)	124	85	7	106	143	116

4.2 Experimental design

The methodology in this work is based on the standard CI approach. The individual models were designed and implemented with their respective optimal tuning parameters.

We implemented the FN technique using the associative rule, the relationship between the input data and the output target, denoted by {xi, yi} where xi is the set of inputs and yi is the output vector, can be mathematically expressed as:

(8)

Graph

where z is the output of the FN model. This is equivalent to:

(9)

Graph

where fi(·) and gi(·) are unknown neural functions.

Using the standard form (φj,ϕj) of a polynomial family, the neuron functions can be expressed as:

(10)

Graph

and

(11)

Graph

where the coefficients aj and bj are the parameters of the FN and p and q are the orders of the polynomial (Bruen & Yang, [8]).

The main objective was to find the set of values z that would give the minimum error.

For the implementation of the SVM model, the objective was to learn a mapping x → y where x is a set of input variables {x1, x2,..., xn} and y is the output vector such that:

(12)

Graph

where α are the parameters of the function.

The goal is to minimise the training error given by:

(13)

Graph

where l is the 0–1 loss function, and Remp(α) is the empirical risk function.

The overall risk of the testing error given by:

(14)

Graph

where P(x,y) is the unknown joint distribution function of x and y (Peng & Wang, [35]).

We implemented the T2FLS model by using the Zadeh's extension principle (Mendel, [29]). Using this principle, T2FLS is defined as follows:

For a set of p inputs, x1, x2,..., xp, and one output y, the rule base is of the general form:

(15)

Graph

For l = 1,..., m.

The firing strength of the ith rule is expressed as:

(16)

Graph

The output of the T2FLS model with the extension principle is expressed as:

(17)

Graph

where M is the number of fired rules which represents the chosen t-norm.

Finally, the defuzzified output is given by:

(18)

Graph

More details about the structure and the mathematical basis of T2FLS as well as its proofs can be found in Mendel ([29]).

The optimised parameters of the three techniques are summarised in Table 2.

Table 2 Summary of optimised parameters used in the implementation of models.

CI techniques	Optimised parameters
Functional networks	Fitting algorithm: least-square based backward-forward. (Ye & Xiong, 2007)Best model selection principle: minimum length description (Grünwald, 2005)Degree of polynomial: 1Source of code: Software repository of the AI Research Group of Enrique Castillo (Castillo, 2012).
SVMs	Kernel function: polynomialError goal, ϵ: 0.001Regularisation parameter, C = 450lambda = 1e − 7; epsilon = 0.2kerneloption = 0.30verbose = 1.Source of code: Least squares-SVM, basic version available online,
Type-2 fuzzy logic	Learning algorithm: Steepest Descent (Jang, 1993)Rule extraction: all data samplessn2 = sn1 = standard deviation of the input dataLearning parameter, alpha4 = alpha = 0.1Source of code: The interval Type-2 version and the MATLAB codes provided at Mendel's software repository()

For validity of comparison, the same training and testing data subsets were used for the training and testing of the three techniques, hence ensuring that the techniques were subjected to the same data and processing conditions.

4.3 Criteria for performance evaluation

In order to establish a valid evaluation for this work, we have used the correlation coefficient (CC) and execution time (ET) as the criteria for measuring the performance. The CC measures the statistical correlation between the predicted and actual values. A value of "1" means perfect statistical correlation and a "0" means there is no correlation at all. The ET is simply the total CPU time taken (measured in seconds) for a model to run from the beginning to the end of the desired process, and it is computed as: T2 − T1, where T2 is the CPU time at the end of the process and T1 is the CPU time at the beginning.

As some of the results of this study relate to the size of data, the scale proposed by Anifowose and Abdulraheem ([6]) was used to categorise the data sets into small and large. According to this scale, data sets that consist of less than 150 samples are considered as small, more than 250 samples are considered as large and between 150 and 250 are categorised as medium-sized.

The computing environment used for this simulation study consists of MATLAB version 2010a that runs on a Personal Computer with Windows 7 Professional version with Service Park 1. The processor is based on Intel Pentium Duo technology with a speed of 2.0 GHz and a RAM size of 2 GB. The basic MATLAB codes cited in Table 2 were customised with the NETLAB toolbox (Neural Computing Research Group, [31]) and augmented with other user-defined functions for increased functionality and better performance.

4.4 Implementation strategy

The codes for the identified techniques were gathered and arranged in a single MATLAB m-file. This was done to ensure that the three techniques operated under the same data and processing conditions. With this implementation style, the same data stratification was used on all the three techniques, and the evaluation criteria were used to measure their performance simultaneously. Thus, for each data set, the CC and ET were measured. The run of the three techniques on each data set was repeated a number of times using a loop and the average of the CC and ET measurements were recorded. This ensured integrity of the results, fairness in the comparison and to arrive at a just conclusion and recommendation.

As the data stratification was random, each run of the techniques on each data set used different bootstrap samples of the data. This further ensured fairness and increased the integrity of the results obtained. This is in contrast to some petroleum engineering application studies that used a fixed stratification where a part of each data set, usually the first 70%, was used for training and last 30% for testing. These would result in skewed results that are lacking in integrity. With our stratification approach, each data sample has equal chance of being selected for training or testing and there is no occurrence of bias.

5. Experimental results and discussion

5.1 Experimental results

After the implementation of each of the techniques in the prediction of porosity and permeability using the training and testing data described in Section 4.1, several iterations of the implementation were made, and the averages of the results, which comprise the CCs and ET were taken. This step is necessary in order to ensure fairness in the distribution of the training and testing data sets.

As this study focuses on the comparative performance of the techniques, the comparative results that are relevant to the objective of this study were plotted and shown in Figures 12345. To further provide an objective and clear evaluation of the merit of the comparison, the actual and predicted values of porosity and permeability with respect to depth are shown in Figures 67891011. The permeability plots (Figures 91011) would be cluttered if we had plotted them in the normal scale as the permeability values ranged from 0 to about 3000. To make them clearer and less decongested, more especially the testing permeability predictions, we plotted them in the log scale. Their rigorous analysis and discussion revealed more understanding about their behaviours and respective areas of strength and weakness.

Graph: Figure 1 CCs comparisons for porosity training and testing.

Graph: Figure 2 ETs comparisons for porosity Well 1 and 2 training testing.

Graph: Figure 3 ET comparison for Well 3 porosity training and testing.

Graph: Figure 4 CCs comparisons for permeability training and testing.

Graph: Figure 5 ET comparisons for permeability training and testing.

Graph: Figure 6 Training and testing predictions by depth for all techniques (Site 1 Well 1).

Graph: Figure 7 Training and testing predictions by depth for all techniques (Site 1 Well 2).

Graph: Figure 8 Training and testing predictions by depth for all techniques (Site 1 Well 3).

Graph: Figure 9 Training and testing predictions by depth for all techniques (Site 2 Well 1).

Graph: Figure 10 Training and testing predictions by depth for all techniques (Site 2 Well 2).

Graph: Figure 11 Training and testing predictions by depth for all techniques (Site 2 Well 3).

5.2 Discussion of results

The results showed that the each of the techniques exhibited its competitive performance except at instances where they otherwise revealed their respective comparative strong and weak points. This study is a case of very stiff competition among popularly used AI techniques and the most recent advances in CI.

Generally, SVM and FN demonstrated competitive performance, while T2FLS also showed a good measure-up but with lower performance rating. In terms of CC, SVM performed exceptionally well in both porosity training and testing for Site 1 Well 3 (Figure 1(a),(b) and permeability testing for Site 2 Well 2 (Figure 4(b)). FN outperformed SVM in porosity testing for Site 1 Well 1 (Figure 1(b)), permeability training for Site 2 Well 2 (Figure 4(a)) and permeability testing for Site 2 Well 3 (Figure 4(b)). T2FLS took its turn for the best performance in permeability training and testing for Site 2 Well 1 (Figure 4(a),(b)). A very stiff competition was exhibited between SVM and FN with porosity training and testing for Site 1 Well 2 (Figure 1(a),(b)), and permeability training and testing for Site 2 Well 1 (Figure 4(a),(b)). A very close tie of the three techniques was observed in the porosity training only for Site 1 Well 1 and Well 2 (Figure 1(a)) and in the permeability training and testing for Site 2 Well 2 (Figure 4(a),(b)).

The outperformance of SVM in Site 1 Well 3 (Figure 1(a),(b)) is a special characteristic demonstrated by SVM's ability to withstand a small data size while T2FLS demonstrated otherwise. This observation agrees with the results of the previous studies in the literature (Anifowose, [5]; Al-Anazi & Gates, [3]). This also agrees with the major weakness of T2FLS demonstrated by its inability to handle small amount of data set as reported in Karnik and Mendel ([26]), Mendel ([29]) and Zarandi et al. ([47]). The average performance of both SVM and FN was better than that of T2FLS. This is probably due to the fact that there is no fuzziness or uncertainty in the data sets used in this study. T2FLS has been reported to be more suitable in the handling of uncertainties in input data (Mendel, [29]; Mohsen et al., [30]; Olatunji et al., [33], [34]). However, that is not included in the objective of this study. If the data sets were to have contained some uncertainties, then based on reports in the literature (Mendel, [29]; Mohsen et al., [30]; Olatunji et al., [33], [34]), T2FLS would have performed excellently well. Both FN and SVM have not been reported to perform well in handling uncertainties.

In terms of ET, the results showed that FN is the fastest in terms of both training and testing, followed by SVM. T2FLS took the most time for both training and testing due to its algorithmic complexity (Karnik & Mendel, [26]; Mendel, [29]). This could be attributed to the fact that the algorithms of SVM and FN are based on the least square error reduction while that of T2FLS is based on steepest gradient learning algorithm which is an exhaustive search procedure that needs more time for effective convergence.

From the actual and predicted values comparison shown in Figures 67891011, the superior performance of the SVM technique could be noticed in most of them by the best closeness of its predicted values (green line) to the actual ones (black line) in addition to the competitiveness of the other techniques. The excellent performance of the SVM technique is followed by that of FN and then T2FLS. The exceptional ability of SVM to generalise with small data set is clearly noticed with the Site 1 Well 3 data set (Figure 8(b)) where there was a very close match between the SVM-predicted values and the actual ones.

For the permeability prediction comparisons, though the correlations are generally lower but SVM is still clearly shown to have the best closeness to the actual permeability values than the other techniques. This shows that, in the overall, it would be most preferred to use SVM for most petroleum reservoir characterisation problems. Of course, T2FLS would be most preferred when dealing with data containing uncertainties.

6. Conclusion

The performance of the three popular CI techniques, namely T2FLS, SVMs and FNs, has been rigorously compared using the prediction of porosity and permeability of oil and gas reservoirs as a case study. The parameters of the techniques were optimised using the training data subset, and their generalisation capability was validated using the testing subset based on six well log data sets containing three each of porosity and permeability data sets. The major objective was to perform a rigorous comparative analysis on their respective performance indices in order to arrive at a recommendable outcome.

The results of the study can be summarised as follows:

Basically, the techniques performed competitively and only exhibited minor differences in the extreme case of small data size. In the latter scenario, SVM and FN had higher CCs than T2FLS.

The smallness of data was determined using the data size categorisation scale proposed by Anifowose and Abdulraheem ([6]).

In the overall, SVM and FN were found to be light-weight in terms of speed of execution, while T2FLS can be described as heavy-weight due to its structural and computational complexity.

The choice of each technique would depend on the nature of the problem, the size of data and the processing environment.

SVM is desirable when the data size is small, FN is recommended when low processing power is required, while T2FLS is ideal when the data set contains uncertainties.

In our future work, we intend to carry out a similar study to determine the respective capabilities of these techniques in the identification of lithological properties of oil and gas reservoirs as well as for history matching.

Acknowledgements

The authors would like to acknowledge the support of King Fahd University of Petroleum and Minerals for the facilities used in the conduct of this study.

References 1 Abe , S. (2004, April 28–30). Fuzzy LP-SVMs for multiclass problems. Electronic Proceedings of the 12th European symposium on artificial neural networks (ESANN'2004) (pp. 429 – 434), Bruges, Belgium. Retrieved from https://www.elen.ucl.ac.be/esann/proceedings/papers.php?ann=2004. 2 Al-Anazi , A. F. , & Gates , I. D. (2010). Support vector regression for porosity prediction in a heterogeneous reservoir: A comparative study. Computers and Geosciences , 36 , 1494 – 1503. 3 Al-Anazi , A. F. , & Gates , I. D. (2012). Support vector regression to predict porosity and permeability: Effect of sample size. Computers and Geosciences , 39 , 64 – 76. 4 Ali , J. K. (1994). Neural networks: A new tool for the petroleum industry. Proceedings of the European petroleum computer conference (pp. 217 – 231). Aberdeen, UK: Society of Petroleum Engineers. doi:10.2118/27561-MS. 5 Anifowose , F. A. (2009). Hybrid AI models for the characterization of oil and gas reservoirs: Concept, design and implementation. Berlin : VDM Verlag. 6 Anifowose , F. A. , & Abdulraheem , A. (2010, December). How small is a small data? Proceedings of the 2nd Saudi conference on oil & gas exploration & production (pp. 18 – 20), Dhahran, Saudi Arabia. Retrieved from http://www.docstoc.com/docs/124372315/2010-Oil-and-Natural-Gas-Conference. 7 Anifowose , F. A. , & Abdulraheem , A. (2011). Fuzzy logic-driven and SVM-driven hybrid computational intelligence models applied to oil and gas reservoir characterization. Journal of Natural Gas Science and Engineering , 3 , 505 – 517. 8 Bruen , M. , & Yang , J. (2005). Functional networks in real-time flood forecasting – A novel application. Advances in Water Resources , 28 , 899 – 909. 9 Castillo , E. (1998). Functional networks. Neural Processing Letters , 7 , 151 – 159. Castillo , E. , Gutierrez , J. M. , Hadi , A. S. , & Lacruz , B. (2001). Some applications of functional networks in statistics and engineering. Technometrics , 43 , 10 – 24. Castillo , E. , Hadi , A. S. , & Lacruz , B. (2001). Optimal transformations in multiple linear regression using functional networks. Proceedings of the international work-conference on artificial and natural neural networks (Vol. 2084, pp. 316 – 324). Lecture Notes in Computer Science. Castillo, E. (2012). Software repository. Artificial Intelligence Research Group. Retrieved from http://ccaix3.unican.es/∼AIGroup. Chen , X. , Li , Y. , Harrison , R. , & Zhang , Y. Q. (2007). Type-2 fuzzy logic based classifier fusion for support vector machines. Applied Soft Computing Journal. 10.1016/j.asoc.2007.02.019. Cristianini , N. , & Shawe-Taylor , J. (2000). An introduction to support vector machines and other kernel-based learning methods (1st ed.). London : Cambridge University Press. Diao , L. , Yang , C. , & Wang , H. (2012). Training SVM email classifiers using very large imbalanced dataset. Journal of Experimental & Theoretical Artificial Intelligence , 24 , 193 – 210. Duda , R. O. , Hart , P. T. , & Stock , D. G. (2001). Pattern classification (2nd ed.). New York : Wiley. El-Sebakhy , E. A. (2009). Forecasting PVT properties of crude oil systems based on support vector machines modeling scheme. Journal of Petroleum Science and Engineering , 64 , 25 – 34. El-Sebakhy , E. A. (2011). Functional networks as a novel data mining paradigm in forecasting software development efforts. Expert Systems with Applications , 38 , 2187 – 2194. Fang , J. H. , & Chen , H. C. (1997). Fuzzy modeling and the prediction of porosity and permeability from the compositional and textural attributes of sandstone. Journal of Petroleum Geology , 20 , 185 – 204. Grünwald , P. (2005). A tutorial introduction to the minimum description length principle. In P. Grünwald , I. J. Myung , & M. Pitt (Eds.), Advances in minimum description length: Theory and applications (pp. 5 – 71). Cambridge, MA : MIT Press. Helmy , T. , & Anifowose , F. (2010). Hybrid computational intelligence models for porosity and permeability prediction of petroleum reservoirs. International Journal of Computational Intelligence and Applications , 9 , 313 – 337. Helmy , T. , Anifowose , F. , & Faisal , K. (2010). Hybrid computational models for the characterization of oil and gas reservoirs. International Journal of Expert Systems with Applications , 37 , 5353 – 5363. Jang , J. S. R. (1993). ANFIS: Adaptive-network-based fuzzy inference systems. IEEE Transactions on Systems, Man, and Cybernetics , 23 , 665 – 685. Jian , H. , & Wenfen , H. (2006). Novel approach to predict potentiality of enhanced oil recovery, SPE Paper 99261. Proceedings of the society of petroleum engineers intelligent energy conference and exhibition, Amsterdam, The Netherlands. Jong-Se , L. (2005). Reservoir properties determination using fuzzy logic and neural networks from well data in offshore Korea. Journal of Petroleum Science and Engineering , 49 , 182 – 192. Karnik , N. , & Mendel , J. (1999). Type-2 fuzzy logic systems. IEEE Transactions on Fuzzy Systems , 7 , 643 – 658. Maqsood , A. , & Adwait , C. (2000). Using artificial intelligence to predict permeability from petrographic data. Computers and Geosciences , 26 , 915 – 925. McCullagh , P. , & Nelder , A. J. A. (1999). Generalized linear models (2nd ed.). Washington, DC : Chapman & Hall/CRC. Mendel , J. M. (2003). Type-2 fuzzy sets: Some questions and answers. IEEE Connections, Newsletter of the IEEE Neural Networks Society , 1 , 10 – 13. Mohsen , S. , Morteza , A. , & Alli , Y. V. (2007). Design of neural networks using genetic algorithm for the permeability estimation of the reservoir. Journal of Petroleum Science and Engineering , 59 , 97 – 105. Neural Computing Research Group, Information Engineering, Aston University, Birmingham B4 7ET, United Kingdom. (2012). Retrieved from http://www.ncrg.aston.ac.uk/netlab. Nguyen , N. C. (2006, October 3). A note on Tikhonov regularization of linear ill-posed problems (Special Report). Cambridge, MA : Massachusetts Institute of Technology. Olatunji , S. O. , Selamat , A. , & Abdulraheem , A. (2011a). Modeling the permeability of carbonate reservoir using Type-2 fuzzy logic systems. Computers in Industry , 62 , 147 – 163. Olatunji , S. O. , Selamat , A. , & Abdulraheem , A. (2011b). Predicting correlations properties of crude oil systems using Type-2 fuzzy logic systems. Expert Systems with Applications , 38 , 10911 – 10922. Peng , X. , & Wang , Y. (2009). A normal least squares support vector machine (NLS-SVM) and its learning algorithm. Neurocomputing , 72 , 3734 – 3741. Petrus , J. B. , Thuijsman , F. , & Weijters , A. J. (1995). Artificial neural networks: An introduction to ANN theory and practice. Berlin : Springer. Rusu , C. , & Rusu , V. (2006). Artificial intelligence in theory and practice. In M. Bramer (Ed.), International federation for information processing (pp. 119 – 128). Boston : Springer. Schlumberger. (2007a). Excellence in educational development. Science lab project. Retrieved from www.seed.slb.com/en/scictr/lab/permeability/index.htm. Schlumberger. (2007b). Excellence in educational development. Science lab project. Retrieved from www.seed.slb.com/en/scictr/lab/porosity/index.htm. Sewell, M. (2008). Structural risk minimization (unpublished doctoral dissertation). London: University College London, Department of Computer Science. Taboada , J. , Matias , J. M. , Ordonez , C. , & Garcia , P. J. (2007). Creating a quality map of a slate deposit using support vector machines. Journal of Computational and Applied Mathematics , 204 , 84 – 94. Toshinori , M. (2008). Fundamentals of the new artificial intelligence neural, evolutionary, fuzzy and more (2nd ed.). London : Springer Science-in-Business Media. Vapnik , V. N. (2000). The nature of statistical learning theory (2nd ed.). New York : Springer-Verlag. Vapnik , V. N. , & Chevronenkis , A. Y. (1974). Teoriya raspoznavaniya obrazov: statisticheskie problemy obucheniya. [Theory of pattern recognition: Statistical problems of learning]. Moscow : Nauka. (Russian). Wu , Y. , & Krishnan , S. (2011). Combining least-squares support vector machines for classification of biomedical signals: A case study with knee-joint vibroarthrographic signals. Journal of Experimental & Theoretical Artificial Intelligence , 23 , 63 – 77. Ye , J. , & Xiong , T. (2007). SVM versus least squares SVM. Proceedings of the eleventh international conference on artificial intelligence and statistics (AISTATS-07). Journal of Machine Learning Research – Proceedings Track , 2 , 644 – 651. Zerandi , H. , Rezaee , B. , Turksen , I. B. , & Neshat , E. (2007). A Type-2 fuzzy rule-based expert system model for stock price analysis. Expert Systems with Applications. 10.1016/j.eswa.2007.09.034.

By Fatai Anifowose; Suli Adeniye and Abdulazeez Abdulraheem

Reported by Author; Author; Author

Titel:	Recent advances in the application of computational intelligence techniques in oil and gas reservoir characterisation: a comparative study
Autor/in / Beteiligte Person:	Adeniye, Suli ; Anifowose, Fatai ; Abdulraheem, Abdulazeez
Link:	Volltext (PDF) https://explore.openaire.eu/search/publication?articleId=doi_________::ff2c74de83098d4a8d21fda2b49be7f4 https://doi.org/10.1080/0952813x.2014.924577
Zeitschrift:	Journal of Experimental & Theoretical Artificial Intelligence, Jg. 26 (2014-06-17), S. 551-570
Veröffentlichung:	Informa UK Limited, 2014
Medientyp:	unknown
ISSN:	1362-3079 (print) ; 0952-813X (print)
DOI:	10.1080/0952813x.2014.924577
Schlagwort:	Fuzzy logic system Correlation coefficient business.industry Computer science Computational intelligence computer.software_genre Machine learning Petroleum reservoir Theoretical Computer Science Functional networks Support vector machine Permeability (earth sciences) Small data sets Artificial Intelligence Data mining Artificial intelligence business computer Software
Sonstiges:	Nachgewiesen in: OpenAIRE

Klicken Sie ein Format an und speichern Sie dann die Daten oder geben Sie eine Empfänger-Adresse ein und lassen Sie sich per Email zusenden.

BibTeX Citavi, JabRef, u.a.
(Literaturverwaltung)

PDF kein Volltext!
(Merkzettel, Notizen)

RIS Endnote, Citavi u.a.
(Literaturverwaltung)

MODS
(XML zur Weiterverarbeitung)

oder

Wählen Sie das für Sie passende Zitationsformat und kopieren Sie es dann in die Zwischenablage, lassen es sich per Mail zusenden oder speichern es als PDF-Datei.

Gewünschter Zitations-Stil:

oder

Bitte prüfen Sie, ob die Zitation formal korrekt ist, bevor Sie sie in einer Arbeit verwenden. Benutzen Sie gegebenenfalls den "Exportieren"-Dialog, wenn Sie ein Literaturverwaltungsprogramm verwenden und die Zitat-Angaben selbst formatieren wollen.