A comparative study of the predictive capabilities of recent advances in computational intelligence (CI) is presented. This study used the machine learning paradigm to evaluate the CI techniques while applying them to the prediction of porosity and permeability of heterogeneous petroleum reservoirs using six diverse well data sets. Porosity and permeability are the major petroleum reservoir properties that serve as indicators of reservoir quality and quantity. The results showed that support vector machines (SVM) and functional networks (FN) performed competitively better than the Type-2 fuzzy logic system (T2FLS) in terms of correlation coefficient. In terms of execution time, FN and SVM were faster than T2FLS, which took the most time for both training and testing. The results also demonstrated the capability of SVM to handle small data sets. This work will assist artificial intelligence practitioners in determining the most appropriate technique to use, especially under conditions of limited data and low processing power.
Keywords: machine learning; computational intelligence; support vector machines; functional networks; Type-2 fuzzy logic system; petroleum reservoir characterisation
Support vector machines (SVM), functional networks (FN) and Type-2 fuzzy logic system (T2FLS) are the three popular computational intelligence (CI) techniques that have individually featured in many successful applications published in recent times (Fang & Chen, [
A petroleum reservoir is an important component of a complete petroleum system and is composed of a body of rock, commonly sedimentary. The rock is buried beneath the earth's surface with sufficient porosity and permeability to hold and transmit fluids such as oil, gas and water. Porosity and permeability are the two fundamental reservoir properties that complementarily relate to the amount of fluid contained in a reservoir and its ability to flow. These properties make significant impacts on petroleum field operations and reservoir management (Jong-Se, [
Porosity is the percentage of voids and open spaces (called pores) in a rock or sedimentary deposit, while permeability is the ease with which the rock fluids are transmitted through the pore spaces (Schlumberger, [
Several other properties of oil and gas reservoirs also need to be studied and predicted, such as pressure-volume-temperature (PVT) behaviour, water saturation, drive mechanism, structure and seal, well spacing, well-bore stability and lithofacies. However, porosity and permeability mutually serve as the major indicator of the potential of a reservoir as well as its viability for exploration and exploitation. It is therefore necessary not only to understand these two properties but also to predict them as accurately as possible.
A good number of studies have been carried out on the use of various CI techniques to predict various properties of oil and gas reservoirs. SVM, FN and T2FLS were chosen for this study due to their promising performance in their respective areas of applications (Al-Anazi & Gates, [
The major motivation for this study is the need for rigorous and robust comparative studies of various CI techniques so that researchers can apply and extend their experience with these techniques to different fields using real-life data sets. A deep understanding of these techniques and their behaviour under different data and operating conditions will assist researchers in determining which of them to use in their respective operating scenarios. This is particularly necessary in areas where real-life data are scarce, such as petroleum engineering. As the availability of relevant data, in quantity and quality, is a major limitation in petroleum engineering research, the results of this study will assist CI practitioners in determining the appropriate techniques to use under conditions of scarce good-quality data in the industry and low processing power in academia.
To the best of our knowledge, there is no previous study that has rigorously compared these recently used techniques in order to reveal their respective performance in different operational and data scenarios. The main objective of using CI in oil and gas reservoir characterisation is to use the data sets of known wells or formations to train models in order to predict the desired properties of new wells or formations whose values were previously unknown. The predicted properties will, in turn, be used to populate full-field simulation models for better reservoir exploration, production and management.
The main problem this study addresses is that most petroleum engineering applications of CI have focused only on the traditional artificial neural network (ANN). Despite the several published reports such as Petrus, Thuijsman, and Weijters ([
- To review the literature on the three selected state-of-the-art CI techniques.
- To apply them on one of the real-life and most challenging problems in the petroleum industry.
- To carry out a robust and rigorous comparison of the performance of these techniques.
- Based on the results of the comparison, to recommend the most appropriate techniques for different data and processing conditions.
The rest of this paper is organised as follows: Section 2 presents an overview of SVM, FN and T2FLS. A detailed survey of literature on the application of CI in the oil and gas industry is presented in Section 3. Section 4 describes the data and tools, along with the detailed design methodology and implementation strategy. Results and their detailed discussions are presented in Section 5, while conclusion and a plan for future work are presented in Section 6.
SVMs are a set of related supervised learning methods used for classification and regression. They belong to a family of generalised linear classifiers (GLCs). GLCs are classifiers implemented by applying a nonlinear transformation to the input data before feeding it to a linear classifier. The transformation may result in an increase or decrease in the number of input features. The classifier essentially thresholds a basis-function regression and can have arbitrarily shaped decision boundaries (Duda, Hart, & Stork, [
SVM can also be considered as a special case of Tikhonov Regularisation as it maps input vectors to a higher dimensional space where a maximal separating hyperplane is constructed (Jian & Wenfen, [
Aμ = f

where A is a linear compact injective operator between Hilbert spaces U and F, with the solution μ belonging to U and the data f belonging to F.
Instead of the exact data f, suppose a noisy data fδ is available such that:
‖f − fδ‖ ≤ δ

where δ > 0 is the noise level.
The regularised solution μα is then obtained by minimising the Tikhonov functional:

μα = arg minμ { ‖Aμ − fδ‖² + α‖μ‖² }
where the regularisation parameter α is found such that:
‖Aμα − fδ‖ = δ
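For illustration, the regularisation scheme above can be sketched numerically. The following Python fragment (our own toy example, not part of the paper's MATLAB implementation) builds an ill-conditioned operator A, computes the Tikhonov solution, and selects α by the discrepancy condition above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Ill-conditioned forward operator A (a Hilbert matrix) and exact data f = A mu.
n = 50
A = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
mu_true = np.ones(n)
f = A @ mu_true

# Noisy data f_delta with known noise level delta.
noise = rng.normal(scale=1e-4, size=n)
f_delta = f + noise
delta = np.linalg.norm(noise)

def tikhonov(alpha):
    """Regularised solution mu_alpha = (A'A + alpha I)^(-1) A' f_delta."""
    return np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ f_delta)

# Discrepancy condition: take the largest alpha whose residual drops to delta.
for alpha in 10.0 ** np.arange(0, -16, -1):
    mu_alpha = tikhonov(alpha)
    if np.linalg.norm(A @ mu_alpha - f_delta) <= delta:
        break
```

Without the α-term, the solve would amplify the noise through the near-zero singular values of A; the discrepancy condition stops the fit once the residual matches the noise level.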
The generalisation ability of SVMs is ensured by the special properties of the optimal hyperplane that maximises the distance to training examples in a high dimensional feature space. SVMs were initially introduced for the purpose of classification until 1995 when Vapnik et al., as reported in Fang and Chen ([
Structural risk minimisation, first introduced by Vapnik and Chervonenkis (VC) ([
- Use a priori knowledge of the domain.
- Choose a class of functions such as a degree of polynomial.
- Divide the class of functions into a hierarchy of nested subsets in order of increasing complexity.
- Perform empirical risk minimisation on each subset.
- Select the model in the series whose sum of empirical risk and VC confidence is minimal.
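These steps can be made concrete with a toy Python sketch (ours, not from the paper): the nested subsets are polynomial classes of increasing degree, and a simple parameter-count penalty stands in for the VC confidence term:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data from a cubic target with additive noise.
x = rng.uniform(-1, 1, size=200)
y = x ** 3 - x + rng.normal(scale=0.05, size=200)

def structural_risk(degree):
    """Empirical risk (training MSE) on one subset of the hierarchy, plus a
    toy complexity penalty standing in for the VC confidence term."""
    coeffs = np.polyfit(x, y, degree)
    empirical_risk = np.mean((np.polyval(coeffs, x) - y) ** 2)
    return empirical_risk + 0.005 * (degree + 1)

# Nested subsets in order of increasing complexity: degrees 1, 2, ..., 8.
best_degree = min(range(1, 9), key=structural_risk)
```

Low degrees underfit (high empirical risk); high degrees fit the noise but pay a larger confidence penalty, so the sum is minimised at an intermediate complexity.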
The notion of the VC-dimension applies to any parameterised class F of functions x → f (θ, x) from some domain D into {0, 1}, where θ ranges over some given parameter space, for example R
For the implementation of this study, the polynomial kernel function used is of the form:
K(xi, xj) = (xi · xj + 1)^param
where param is the kernel parameter in the SVM feature space (Cristianini & Shawe-Taylor, [
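The following Python sketch (our own illustration; kernel ridge regression stands in here for the full SVM optimisation) shows the polynomial kernel in use:

```python
import numpy as np

def poly_kernel(X1, X2, param=3):
    """K(xi, xj) = (xi . xj + 1)^param -- the polynomial kernel form above."""
    return (X1 @ X2.T + 1.0) ** param

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(80, 2))
y = X[:, 0] ** 2 + X[:, 1]          # a simple nonlinear target

# Kernel ridge regression as a stand-in for SVR training:
# weights = (K + lam I)^(-1) y.
K = poly_kernel(X, X)
lam = 1e-3
weights = np.linalg.solve(K + lam * np.eye(len(X)), y)

X_test = rng.uniform(-1, 1, size=(20, 2))
y_pred = poly_kernel(X_test, X) @ weights
```

The degree-3 kernel implicitly maps the two inputs into a feature space containing all monomials up to degree 3, so the quadratic target is recovered without ever computing that mapping explicitly.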
Further details on SVM can be found in Abe ([
A functional network (FN) is an extension of ANN. Like ANN, it consists of different layers of neurons connected by links, but unlike ANN, each computing unit or neuron performs a simple calculation: a scalar, typically monotone, function f of a weighted sum of inputs. The function f associated with the neurons is fixed, and the weights are learned from data using well-known algorithms. An FN consists of a layer of input units containing the input data; a layer of output units containing the output data; and one or several layers of neurons or computing units that evaluate a set of input values coming from the previous layer and deliver a set of output values to the next layer of neurons or output units. The computing units are connected to each other, in the sense that the output from one unit can serve as part of the input to another neuron or to the units in the output layer. Once the input values are given, the output is determined by the neuron type, which can be defined by a function (Castillo, [
A commonly used functional equation is the generalised associative functional equations, which can be expressed as (Castillo, Gutierrez, et al., [
F(x, G(y, z)) = H(N(x, y), z)
If the model has a d-dimensional input vector x and a one-dimensional output variable y, a generalised associative FN can be written as:
y = f1(x1) + f2(x2) + ⋯ + fd(xd)
where the fi are univariate functions learned from the data rather than fixed in advance.
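As a concrete Python sketch of such an additive model (our own illustration, assuming a cubic polynomial basis for the fi):

```python
import numpy as np

rng = np.random.default_rng(2)

# d = 3 inputs; the target is a sum of unknown univariate functions f_i.
X = rng.uniform(-1, 1, size=(200, 3))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.5 * X[:, 2]

# Approximate each f_i by the basis (x, x^2, x^3). The additive model
# y = b + sum_i f_i(x_i) is then linear in its coefficients, so fitting
# is a single least-squares solve -- no backpropagation, no hidden-layer
# tuning, and no local optima.
design = np.hstack([np.ones((len(X), 1))] + [X ** k for k in (1, 2, 3)])
coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
y_hat = design @ coeffs
```

Because the unknowns enter linearly once the basis is chosen, the fit is global and deterministic, which is precisely the property that distinguishes FN training from ANN training.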
More detailed description of FN along with the functional equations derivations and simplification can be found in Castillo, Gutierrez, et al. ([
FN evolved due to the limitations of ANN (Petrus et al., [
- The number of hidden layers and hidden neurons in the network architecture is usually determined by trial and error.
- A large number of data samples are usually required to fit a good network structure.
- ANN is often trapped in local optima, which results in instability when it is executed several times under the same data and operating conditions.
- Despite the wide range of applications of ANN, there is still no general framework or procedure through which the appropriate network for a specific task can be designed.
FN was introduced to address some of the limitations of ANN stated above.
Some of the features that have made FN more desirable than ANN are (Mohsen et al., [
- FN does not require a large number of data to fit a good network structure.
- The number of hidden layers and the weights are learned automatically and directly from data. Hence, there is no need for trial-and-error tuning as in ANN.
- As functional equations are used in FN rather than backpropagation of errors, it does not suffer from the problem of local optima.
Fuzzy Sets (FSs) have been around for nearly 40 years, and they include Type-1 FS (fuzzy) and Type-2 FS (fuzzy fuzzy). Type-2 FS was introduced by Zadeh, as reported in Maqsood and Adwait ([
The major difference between T2FLS and Type-1 FLS is in the output processing block, which contains the defuzzifier. The defuzzifier steps the Type-2 intermediate output down to the Type-1 final output. As in Type-1 FLS, the fuzzifier maps the crisp input into a FS; this FS can be a Type-2 set. The other distinction between Type-1 and Type-2 is associated with the nature of the membership functions (MFs), which is not essential when constructing rules; hence, the structure of the rules in Type-2 remains the same as in Type-1. The inference process in Type-2 FLS combines rules and gives a mapping from input Type-2 FSs to output Type-2 FSs. To do this, one needs to find the unions and intersections of Type-2 sets, as well as compositions of Type-2 relations. Extended versions of these operations (based on Zadeh's extension principle) can be used to obtain a Type-1 FS. As this process takes one from the Type-2 output sets of the FLS to Type-1 sets, it is called type reduction, and it produces a type-reduced set. To obtain a crisp output from a Type-2 FLS, the type-reduced set needs to be defuzzified, and the most natural way of doing this is to find the centroid of the type-reduced set (Karnik & Mendel, [
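For the interval Type-2 case, the type-reduction and defuzzification steps can be sketched in Python (a brute-force illustration of our own; enumerating switch points reproduces what the iterative Karnik-Mendel procedure computes):

```python
import numpy as np

def it2_centroid(x, lower_mu, upper_mu):
    """Centroid interval [c_l, c_r] of an interval Type-2 set, found by
    enumerating the Karnik-Mendel switch points over a sorted domain x."""
    c_l = min(
        (x[:k] @ upper_mu[:k] + x[k:] @ lower_mu[k:])
        / (upper_mu[:k].sum() + lower_mu[k:].sum())
        for k in range(len(x) + 1)
    )
    c_r = max(
        (x[:k] @ lower_mu[:k] + x[k:] @ upper_mu[k:])
        / (lower_mu[:k].sum() + upper_mu[k:].sum())
        for k in range(len(x) + 1)
    )
    return c_l, c_r

# Symmetric triangular set centred at 5 with a uniform footprint of uncertainty.
x = np.linspace(0, 10, 101)
upper = np.clip(1 - np.abs(x - 5) / 5, 0, 1)
lower = 0.6 * upper
c_l, c_r = it2_centroid(x, lower, upper)
crisp = 0.5 * (c_l + c_r)   # defuzzified output: centroid of the type-reduced set
```

The width of the interval [c_l, c_r] reflects the uncertainty captured by the footprint between the lower and upper MFs; a Type-1 set would collapse it to a single point.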
More details on T2FLS can be found in Karnik and Mendel ([
The application of CI techniques has been widely studied in the oil and gas industry as well as in other fields. Some of the areas of petroleum technology in which CI has been used with success include seismic pattern recognition, porosity and permeability predictions, identification of sandstone lithofacies, drill bit diagnosis, and analysis and improvement of oil and gas well production (Ali, [
The application of CI in petroleum reservoir characterisation was pioneered by Ali ([
Jian and Wenfen ([
Al-Anazi and Gates ([
Though the work of Anifowose and Abdulraheem ([
This study is the first, to the best of our knowledge, which presents a rigorous and robust comparison of SVM, T2FLS and FN: three techniques that have proven to be popularly used in various CI applications especially in oil and gas reservoir characterisation.
Well logs for porosity and permeability from six oil and gas wells were used for the design, training and validation of this work. The three well logs for porosity were petrographic measurements obtained from a drilling site in the Northern Marion Platform of North America (Site 1), and the other three for permeability were log measurements obtained from a giant reservoir in the Middle East (Site 2). The data sets from Site 1 have five predictor variables for porosity viz. top interval, grain density, grain volume, length and diameter, while the data sets from Site 2 have eight predictor variables for permeability viz. gamma ray log, porosity log, density log, water saturation, deep resistivity, micro-spherically focused log, neutron porosity log and caliper log.
A well log is a report that describes the chemical, physical, nuclear and electrical composition of the geological formation of an oil and gas well. Well logging is, therefore, the technique of taking measurements in drill holes with probes designed to measure the physical and chemical properties of rocks and the fluids contained in them. Much information can be obtained from samples of rock brought to the surface in cores or bit cuttings, or from other clues while drilling, such as penetration rate; however, the greatest amount of information comes from well logs.
The available data for each well were divided into training and testing subsets using a stratified random sampling technique: a random 70% of the samples were used for training and the remaining 30% for testing. In essence, the training subsets represent the sections of the wells or reservoirs that have complete log and core data, while the testing subsets represent the uncored sections. The major challenge is to use the CI techniques to predict the core values (porosity and permeability in this study) for the uncored sections. To further ensure the fairness and integrity of the results, several runs were made and their results averaged. Table 1 shows the data sets with their sizes and divisions.
Table 1 Division of data sets into training and testing.
|                | Site 1 (porosity) Well 1 | Well 2 | Well 3 | Site 2 (permeability) Well 1 | Well 2 | Well 3 |
|----------------|----:|----:|---:|----:|----:|----:|
| Data size      | 415 | 285 | 23 | 355 | 477 | 387 |
| Training (70%) | 291 | 200 | 16 | 249 | 334 | 271 |
| Testing (30%)  | 124 |  85 |  7 | 106 | 143 | 116 |
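The stratification described above can be sketched in Python (an illustrative re-implementation; function names are our own):

```python
import random

def split_70_30(samples, seed=None):
    """Randomly route 70% of a well's samples to training (the 'cored'
    section) and the remaining 30% to testing (the 'uncored' section)."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    cut = round(0.7 * len(samples))
    train = [samples[i] for i in idx[:cut]]
    test = [samples[i] for i in idx[cut:]]
    return train, test

# Site 1, Well 1 has 415 samples (Table 1).
train, test = split_70_30(list(range(415)), seed=42)
```

Table 1 reports 291/124 for this well; the exact counts depend only on the rounding convention applied at the 70% cut.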
The methodology in this work is based on the standard CI approach. The individual models were designed and implemented with their respective optimal tuning parameters.
We implemented the FN technique using the associative rule, with the relationship between the d input variables x1, ..., xd and the output target expressed as:

z = F(x1, x2, ..., xd)

where z is the output of the FN model. This is equivalent to:

z = f1(x1) + f2(x2) + ⋯ + fd(xd)

where each fi is a univariate function to be estimated from the data. Using the standard form (φij being the elements of a chosen basis, e.g. polynomials), each fi is approximated as:

fi(xi) = Σj aij φij(xi)

and

z = Σi Σj aij φij(xi)

where the coefficients aij are estimated from the training data by least squares.
The main objective was to find the coefficient values that give the minimum error in the predicted output z.
For the implementation of the SVM model, the objective was to learn a mapping x → y, where x is the vector of input variables (x1, ..., xd) and y is the target, using a parameterised function:

y = f(x, α)

where α are the parameters of the function.

The goal is to minimise the training error given by:

Remp(α) = (1/n) Σs l(f(xs, α), ys)

where l is the 0–1 loss function and Remp is the empirical risk over the n training samples.

The overall risk, the expected testing error, is given by:

R(α) = ∫ l(f(x, α), y) dP(x, y)
where P(x,y) is the unknown joint distribution function of x and y (Peng & Wang, [
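The two risk quantities can be illustrated with a small Python sketch (ours, using a linear decision function and the 0–1 loss):

```python
import numpy as np

rng = np.random.default_rng(3)

def f(x, alpha):
    """A parameterised decision function: sign of a linear score."""
    return (x @ alpha > 0).astype(int)

def empirical_risk(alpha, X, y):
    """R_emp(alpha): average 0-1 loss over a finite sample."""
    return float(np.mean(f(X, alpha) != y))

# Toy distribution: label = 1 exactly when x1 + x2 > 0.
X = rng.normal(size=(500, 2))
y = (X @ np.array([1.0, 1.0]) > 0).astype(int)

r_train = empirical_risk(np.array([1.0, 1.0]), X, y)  # training error

# The overall risk R(alpha) is the expectation under the unknown P(x, y);
# here we estimate it with a fresh, larger sample from the same source.
X_new = rng.normal(size=(5000, 2))
y_new = (X_new @ np.array([1.0, 1.0]) > 0).astype(int)
r_overall = empirical_risk(np.array([1.0, 1.0]), X_new, y_new)
```

In practice P(x, y) is unknown, so the testing subset plays the role of the fresh sample when estimating the overall risk.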
We implemented the T2FLS model using Zadeh's extension principle (Mendel, [
For a set of p inputs x1, ..., xp and a single output y, the lth rule of the T2FLS takes the form:

Rl: IF x1 is F̃1l AND ... AND xp is F̃pl, THEN y is G̃l

for l = 1, ..., m.
The firing strength of the lth rule is expressed as:

Fl(x) = μF̃1l(x1) ⋆ ⋯ ⋆ μF̃pl(xp)

where ⋆ denotes the chosen t-norm.
The output of the T2FLS model with the extension principle is expressed as:
Y(x) = Σl Fl(x) G̃l / Σl Fl(x), l = 1, ..., M

where M is the number of fired rules.
Finally, the defuzzified output is given by:
y(x) = (yl + yr) / 2

where yl and yr are the end-points of the type-reduced set.
More details about the structure and the mathematical basis of T2FLS as well as its proofs can be found in Mendel ([
The optimised parameters of the three techniques are summarised in Table 2.
Table 2 Summary of optimised parameters used in the implementation of models.
| CI technique        | Optimised parameters |
|---------------------|----------------------|
| Functional networks | Fitting algorithm: least-squares-based backward-forward (Ye & Xiong, [ |
| SVMs                | Kernel function: polynomial; error goal ϵ: 0.001; regularisation parameter |
| Type-2 fuzzy logic  | Learning algorithm: steepest descent (Jang, [ |
For validity of comparison, the same training and testing data subsets were used for the training and testing of the three techniques, hence ensuring that the techniques were subjected to the same data and processing conditions.
In order to establish a valid evaluation for this work, we used the correlation coefficient (CC) and execution time (ET) as the criteria for measuring performance. The CC measures the statistical correlation between the predicted and actual values: a value of 1 means perfect statistical correlation and 0 means no correlation at all. The ET is simply the total CPU time (measured in seconds) taken by a model from the beginning to the end of the desired process, computed as ET = Tend − Tstart.
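Both criteria are straightforward to compute; a minimal Python sketch (illustrative only, not the MATLAB code used in the study):

```python
import time
import numpy as np

def correlation_coefficient(actual, predicted):
    """Pearson CC between actual and predicted values: 1 is perfect, 0 is none."""
    return float(np.corrcoef(actual, predicted)[0, 1])

def timed(fn, *args):
    """Return (result, ET) where ET = T_end - T_start, in seconds."""
    t_start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - t_start

actual = np.array([1.0, 2.0, 3.0, 4.0])
cc = correlation_coefficient(actual, 2 * actual + 1)  # any exact linear map gives CC = 1
```

Note that the CC rewards the shape of the prediction rather than its absolute values, which is why the depth plots in Figures 6-11 are also inspected directly.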
As some of the results of this study relate to the size of data, the scale proposed by Anifowose and Abdulraheem ([
The computing environment used for this simulation study consists of MATLAB version 2010a running on a personal computer with Windows 7 Professional, Service Pack 1. The processor is based on Intel Pentium Duo technology with a speed of 2.0 GHz, and the RAM size is 2 GB. The basic MATLAB codes cited in Table 2 were customised with the NETLAB toolbox (Neural Computing Research Group, [
The codes for the identified techniques were gathered and arranged in a single MATLAB m-file. This ensured that the three techniques operated under the same data and processing conditions: the same data stratification was used for all three techniques, and the evaluation criteria were applied to measure their performance simultaneously. Thus, for each data set, the CC and ET were measured. The run of the three techniques on each data set was repeated a number of times in a loop, and the averages of the CC and ET measurements were recorded. This ensured the integrity of the results and fairness in the comparison, and helped to arrive at a sound conclusion and recommendation.
As the data stratification was random, each run of the techniques on each data set used a different random subsample of the data. This further ensured fairness and increased the integrity of the results obtained. This is in contrast to some petroleum engineering studies that used a fixed stratification in which a part of each data set, usually the first 70%, was used for training and the last 30% for testing; such an approach can produce skewed results. With our stratification approach, each data sample has an equal chance of being selected for training or testing, and bias is avoided.
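The repeated-run scheme can be sketched as follows (illustrative Python; `toy_score` is a hypothetical stand-in for training a model and measuring its CC):

```python
import random
import statistics

def average_over_runs(data, score_fn, n_runs=10):
    """Repeat the random 70/30 split n_runs times and average the per-run
    score, so no single lucky or unlucky split dominates the comparison."""
    scores = []
    for run in range(n_runs):
        rows = data[:]
        random.Random(run).shuffle(rows)       # a fresh random split per run
        cut = round(0.7 * len(rows))
        train, test = rows[:cut], rows[cut:]
        scores.append(score_fn(train, test))
    return statistics.mean(scores)

def toy_score(train, test):
    """Hypothetical stand-in for train-then-evaluate; returns the test fraction."""
    return len(test) / (len(train) + len(test))

avg = average_over_runs(list(range(100)), toy_score)
```

Averaging over fresh random splits estimates the expected performance of each technique rather than its performance on one arbitrary partition of the well data.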
After implementing each of the techniques for the prediction of porosity and permeability using the training and testing data described in Section 4.1, several iterations were made and the averages of the resulting CCs and ETs were taken. This step is necessary to ensure fairness in the distribution of the training and testing data sets.
As this study focuses on the comparative performance of the techniques, the comparative results relevant to the objective of this study are plotted in Figures 1-5. To further provide an objective and clear evaluation of the comparison, the actual and predicted values of porosity and permeability with respect to depth are shown in Figures 6-11. The permeability plots (Figures 9-11) would be cluttered if plotted on a linear scale, as the permeability values range from 0 to about 3000; to make them clearer and less congested, especially the testing permeability predictions, we plotted them on a log scale. Rigorous analysis and discussion of these plots revealed more about the behaviour of the techniques and their respective areas of strength and weakness.
Graph: Figure 1 CCs comparisons for porosity training and testing.
Graph: Figure 2 ETs comparisons for porosity Well 1 and 2 training testing.
Graph: Figure 3 ET comparison for Well 3 porosity training and testing.
Graph: Figure 4 CCs comparisons for permeability training and testing.
Graph: Figure 5 ET comparisons for permeability training and testing.
Graph: Figure 6 Training and testing predictions by depth for all techniques (Site 1 Well 1).
Graph: Figure 7 Training and testing predictions by depth for all techniques (Site 1 Well 2).
Graph: Figure 8 Training and testing predictions by depth for all techniques (Site 1 Well 3).
Graph: Figure 9 Training and testing predictions by depth for all techniques (Site 2 Well 1).
Graph: Figure 10 Training and testing predictions by depth for all techniques (Site 2 Well 2).
Graph: Figure 11 Training and testing predictions by depth for all techniques (Site 2 Well 3).
The results showed that each of the techniques performed competitively, except in the instances where they revealed their respective comparative strengths and weaknesses. This study thus represents a case of very stiff competition among popularly used CI techniques.
Generally, SVM and FN demonstrated competitive performance, while T2FLS also measured up well but with a lower performance rating. In terms of CC, SVM performed exceptionally well in both porosity training and testing for Site 1 Well 3 (Figure 1(a),(b)) and permeability testing for Site 2 Well 2 (Figure 4(b)). FN outperformed SVM in porosity testing for Site 1 Well 1 (Figure 1(b)), permeability training for Site 2 Well 2 (Figure 4(a)) and permeability testing for Site 2 Well 3 (Figure 4(b)). T2FLS took its turn for the best performance in permeability training and testing for Site 2 Well 1 (Figure 4(a),(b)). A very stiff competition was exhibited between SVM and FN in porosity training and testing for Site 1 Well 2 (Figure 1(a),(b)) and in permeability training and testing for Site 2 Well 1 (Figure 4(a),(b)). A very close tie among the three techniques was observed in porosity training for Site 1 Wells 1 and 2 (Figure 1(a)) and in permeability training and testing for Site 2 Well 2 (Figure 4(a),(b)).
The outperformance of SVM on Site 1 Well 3 (Figure 1(a),(b)) demonstrates SVM's special ability to withstand a small data size, whereas T2FLS demonstrated the opposite. This observation agrees with the results of previous studies in the literature (Anifowose, [
In terms of ET, FN was the fastest in both training and testing, followed by SVM. T2FLS took the most time for both training and testing due to its algorithmic complexity (Karnik & Mendel, [
From the comparisons of actual and predicted values shown in Figures 6-11, the superior performance of the SVM technique can be seen in most of them from the closeness of its predicted values (green line) to the actual ones (black line), notwithstanding the competitiveness of the other techniques. The excellent performance of SVM is followed by that of FN and then T2FLS. The exceptional ability of SVM to generalise from a small data set is clearly seen with the Site 1 Well 3 data set (Figure 8(b)), where there was a very close match between the SVM-predicted values and the actual ones.
For the permeability prediction comparisons, though the correlations are generally lower, SVM still clearly shows the best closeness to the actual permeability values among the techniques. This suggests that, overall, SVM would be the most preferred choice for most petroleum reservoir characterisation problems, while T2FLS would be most preferred when dealing with data containing uncertainties.
The performance of the three popular CI techniques, namely T2FLS, SVMs and FNs, has been rigorously compared using the prediction of porosity and permeability of oil and gas reservoirs as a case study. The parameters of the techniques were optimised using the training data subset, and their generalisation capability was validated using the testing subset based on six well log data sets containing three each of porosity and permeability data sets. The major objective was to perform a rigorous comparative analysis on their respective performance indices in order to arrive at a recommendable outcome.
The results of the study can be summarised as follows:
- The techniques performed competitively, exhibiting notable differences only in the extreme case of small data size, where SVM and FN had higher CCs than T2FLS.
- The smallness of data was determined using the data size categorisation scale proposed by Anifowose and Abdulraheem ([6]).
- Overall, SVM and FN were found to be light-weight in terms of speed of execution, while T2FLS can be described as heavy-weight due to its structural and computational complexity.
- The choice of each technique would depend on the nature of the problem, the size of data and the processing environment.
- SVM is desirable when the data size is small, FN is recommended when low processing power is required, while T2FLS is ideal when the data set contains uncertainties.
In our future work, we intend to carry out a similar study to determine the respective capabilities of these techniques in the identification of lithological properties of oil and gas reservoirs as well as for history matching.
The authors would like to acknowledge the support of King Fahd University of Petroleum and Minerals for the facilities used in the conduct of this study.
By Fatai Anifowose; Suli Adeniye and Abdulazeez Abdulraheem