Samuel Furfari, Professor at the Université libre de Bruxelles,
and Henri Masson, Professor (emeritus) at the University of Antwerp
Was it the rise in temperature over the period 1980-2000 that triggered the strong interest in the climate change issue? And which temperatures are we actually talking about, and how reliable are the corresponding data?
1/ Measurement errors
Temperatures have been recorded with thermometers for about 250 years at most, and with electronic sensors or satellites for only a few decades. For older data, one relies on “proxies” (tree rings, stomata, or other geological evidence requiring time and amplitude calibration, historical chronicles, almanacs, etc.). Each method has its own experimental error: about 0.1°C for a thermometer, much more for proxies. Switching from one method to another (for example from thermometer to electronic sensor, or from electronic sensor to satellite data) requires calibration and adjustment of the data, which are not always perfectly documented in the records. Also, as shown later in this paper, the length of the measurement window is of paramount importance when drawing conclusions about a possible trend in climate data. Some compromise is required between the accuracy of the data and their representativeness.
2/ Time averaging errors
If one considers only “reliable” measurements made with thermometers, one needs to define daily, weekly, monthly and annual average temperatures. But before electronic sensors allowed almost continuous recording, these measurements were made by hand at a few fixed times a day. The daily averaging algorithm changes from country to country and over time, in a way that is not perfectly documented in the data, which induces some errors (Limburg, 2014). Also, the temperature follows seasonal cycles, linked to solar activity and to the local exposure to it (angle of incidence of the solar radiation), which means that a monthly average combines temperatures (from the beginning and the end of the month) corresponding to different points on the seasonal cycle. Finally, as any experienced gardener knows, the cycles of the Moon also have some detectable effect on temperature (a 14-day cycle is apparent in local temperature data, corresponding to the second harmonic of the lunar month; Frank, 2010). There are about 13 lunar cycles of 28 days in a solar year of 365 days, but the solar year is divided into 12 months, which induces some biases and fake trends (Masson, 2018).
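The dependence of the daily mean on the averaging convention can be made concrete with a small sketch. Everything in it is an assumption chosen for illustration: the synthetic diurnal cycle, its amplitudes, and the three conventions compared (24 hourly readings, the midpoint of the daily minimum and maximum, and a weighted mean of readings at 07:00, 14:00 and 21:00). It is not the procedure of any particular weather service.

```python
import math

def diurnal_temperature(hour):
    """Synthetic, slightly asymmetric diurnal cycle in °C (illustrative assumption)."""
    theta = 2 * math.pi * (hour - 15) / 24  # warmest at 15:00
    return 10.0 + 5.0 * math.cos(theta) + 1.5 * math.cos(2 * theta)

# One day of hourly values from the same underlying signal
hourly = [diurnal_temperature(h) for h in range(24)]

# Three hypothetical conventions for "the" daily mean
mean_hourly = sum(hourly) / len(hourly)
mean_min_max = (min(hourly) + max(hourly)) / 2
mean_fixed_hours = (diurnal_temperature(7) + diurnal_temperature(14)
                    + 2 * diurnal_temperature(21)) / 4

for name, value in [("24 hourly readings", mean_hourly),
                    ("(min + max) / 2", mean_min_max),
                    ("07:00, 14:00, 2 x 21:00", mean_fixed_hours)]:
    print(f"{name:>24}: {value:.2f} °C")
```

The same day of data yields different “daily means” depending on the convention, so a change of convention over time shows up as a step in the record even if the weather itself did not change.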
3/ Spatial averaging
First of all, the IPCC considers temperatures averaged over the whole globe, despite the fact that temperature is an intensive variable, a category of variables that has only a local thermodynamic meaning, and despite the fact that the Earth is well known to have distinct, well-documented climatic zones.
The data come from weather station records and are supposed to be representative of a zone surrounding each station, defined as the locus of all points closer to that station than to any other (Voronoi algorithm). As the stations are not evenly distributed and as their number has changed considerably over time, “algorithmic errors” are associated with this spatial averaging method.
“In mathematics, a Voronoi diagram is a partitioning of a plane into regions based on distance to points in a specific subset of the plane. That set of points (called seeds, sites, or generators) is specified beforehand, and for each seed there is a corresponding region consisting of all points closer to that seed than to any other. These regions are called Voronoi cells.”
As the number of seed points changes, their corresponding cells change in shape, size and number (Figs. 1 and 2).
Fig. 1 : Example of a Voronoi diagram and Fig. 2 : Same with a reduced number of seeds
(constructed with http://alexbeutel.com/webgl/voronoi.html).
In climatology, the seed points are the weather stations, and their number has been considerably reduced over time, which changes the number and size of the corresponding cells (see Figs. 3, 4 and 5).
Figs. 3 and 4 : Number of stations as a function of time (left) and station locations over time (right).
The data in Figure 3 are from the GISS Surface Temperature Analysis (GISTEMP v4).
Source : https://data.giss.nasa.gov/gistemp/station_data_v4_globe/
Fig. 5 : Evolution of the number of land stations and of global temperature (Easterbrook 2016). From the beginning of 1990, thousands of weather stations located in cooler rural zones (e.g. Siberia, northern Canada) stopped recording data (source: ftp://ftp.ncdc.noaa.gov/pub/data:ghcn/v2/v2.temperature.readme). Note that Fig. 3 is based on GHCNv4 (June 2019) and Fig. 5 on GHCNv2 (before 2011). All stations have been reanalyzed, which is why the two figures differ.
The average temperature (actually its anomaly; see below) is calculated as a weighted average: the individual data from the different stations are summed, each with a weight proportional to the area of its corresponding cell.
As the sizes of the cells have changed over time, the weights of the seed points (the weather stations) have also changed, which induces a bias in the calculation of the global average value.
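The weighting scheme described above can be sketched on a toy example. The station coordinates and anomaly values below are hypothetical, the domain is a flat unit square rather than a sphere, and the cell areas are approximated by assigning the points of a regular grid to their nearest station. This is only meant to show how the weighted mean shifts when a station disappears; it is not the procedure used by any agency.

```python
def voronoi_weights(stations, n=100):
    """Approximate each station's Voronoi cell area on the unit square by
    assigning every point of an n x n grid to its nearest station."""
    counts = [0] * len(stations)
    for i in range(n):
        for j in range(n):
            x, y = (i + 0.5) / n, (j + 0.5) / n
            nearest = min(
                range(len(stations)),
                key=lambda k: (stations[k][0] - x) ** 2 + (stations[k][1] - y) ** 2,
            )
            counts[nearest] += 1
    return [c / (n * n) for c in counts]

def weighted_mean(values, weights):
    """Area-weighted average of the station values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical stations (x, y) and their anomalies in °C
stations = [(0.2, 0.2), (0.8, 0.2), (0.2, 0.8), (0.8, 0.8), (0.5, 0.5)]
anomalies = [0.1, 0.3, -0.2, 0.0, 0.8]

w_all = voronoi_weights(stations)
mean_all = weighted_mean(anomalies, w_all)

# Drop the coolest station; the readings of the others are unchanged
kept = [k for k in range(len(stations)) if k != 2]
w_reduced = voronoi_weights([stations[k] for k in kept])
mean_reduced = weighted_mean([anomalies[k] for k in kept], w_reduced)

print(f"mean with 5 stations: {mean_all:.3f}")
print(f"mean with 4 stations: {mean_reduced:.3f}")
```

In this toy case the area formerly attributed to the removed station is shared among its neighbours, so the average moves although no individual reading has changed.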
Applying a Voronoi algorithm clearly opens the door to any kind of manipulation in zones where the data are sparse, as shown in Figure 6.
Figure 6. Source : https://www.ncei.noaa.gov/news/global-climate-201907
4/ Urban heat island effect
Also, many weather stations were initially located in the countryside, but these locations became progressively urbanized, causing an “urban heat island” effect that artificially increases the measured temperature (Figs 7 and 8).
Figs 7 and 8. Causes and illustration of ‘urban heat island effect’.
As a consequence, for land station data to be useful, it is essential that any non-climatic temperature jumps, such as the urban heat island effect, are eliminated. Such jumps may also be induced by moving the stations or by updating the equipment. In the adjusted data from GISTEMP v4, the effect of such non-climatic influences is eliminated whenever possible. Originally, only documented cases were adjusted; the current procedure used by NOAA/NCEI, however, applies an automated system based on systematic comparisons with neighboring stations to deal with documented and undocumented fluctuations that are not directly related to climate change. The protocols used and the evaluation of these procedures are described in numerous publications, for instance [2, 3].
However, correcting an urban heat island effect by homogenizing the data with those from nearby weather stations that have remained away from urbanisation has a perverse effect that leads to fake conclusions. Indeed, just as the data from the “urbanized” station are tempered by the data from the other stations, the data from those stations are in turn affected by the data from the urbanized station and become somewhat corrupted by the urban heat island effect. This mutual influence, its perverse effects, and the fake conclusions reached are well illustrated, in a stepwise manner, by the case described in this video:
5/ Sea surface temperature
And what about the temperature over the oceans (which represent about 70% of the Earth’s surface)? Until very recently, these temperatures were only sparsely reported, as the SST (Sea Surface Temperature) data came from vessels following a limited number of commercial routes (Fig. 9).
Fig. 9 : SST (Sea Surface temperature) measurements and (non-) representativity
More recently Argo floats (buoys) were spread over the oceans, allowing a more representative spatial coverage of SST.
6/ Temperature anomalies
Secondly, at a given moment, the temperature on Earth may vary by as much as 100°C (between spots located in polar and equatorial regions). To overcome this problem, the IPCC does not refer to absolute temperatures but to what are called “temperature anomalies”. For that, the average temperature is first calculated over fixed reference periods of 30 years: 1931-1960, 1961-1990. The next period will be 1991-2020. Each annual temperature is then compared with the average temperature over the closest reference period. Presently, and up to the year 2021, the anomaly is the difference between the current temperature and the average over the period 1961-1990.
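The anomaly calculation described above reduces to subtracting a reference-period mean. A minimal sketch follows; the function name and the annual series are hypothetical and serve only to show the arithmetic.

```python
def temperature_anomalies(annual_means, ref_start, ref_end):
    """Return {year: temperature - baseline}, where the baseline is the mean
    over the reference period [ref_start, ref_end] (both years inclusive)."""
    reference = [t for year, t in annual_means.items()
                 if ref_start <= year <= ref_end]
    baseline = sum(reference) / len(reference)
    return {year: t - baseline for year, t in annual_means.items()}

# Hypothetical annual mean temperatures in °C (made-up values, not observations)
annual_means = {year: 14.0 + 0.01 * (year - 1961) for year in range(1961, 2001)}

anomalies = temperature_anomalies(annual_means, 1961, 1990)
print(f"anomaly for 2000 relative to 1961-1990: {anomalies[2000]:+.3f} °C")
```

By construction, the anomalies average to zero over the reference period itself, so the choice of that period fixes the zero line against which every later year is judged.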
This method is based on the implicit hypothesis that the “natural” temperature remains constant and that any trend detected is caused by anthropogenic activities. But even so, one may expect to have to make some adjustments when switching from one reference period to another, a task that interferes with the compensation of a possible “urban heat island” effect or of a change in the number of weather stations between reference periods, two effects we have identified as sources of error and bias.
But the key problem is actually that local temperature records undergo natural fluctuations that are polycyclic, not exactly periodic, and not synchronized. The fact that those fluctuations are not exactly periodic makes it mathematically impossible to “detrend” the data by subtracting a sinusoid, as is commonly done, for example, when eliminating seasonal effects from data.
The lengths of these cycles range from one day to annual, decennial, centennial and millennial components, and beyond, up to tens of thousands of years (the Milankovitch cycles).
Of particular interest for our discussion are decennial cycles, as their presence has a triple consequence:
Firstly, as they are aperiodic, they cannot be correctly detrended, so they interfere with and amplify any anthropogenic effect that is tracked in the anomalies.
Secondly, they induce biases and fake anomalies in the calculation of the average temperature over the reference period, as shown in Figure 10 below (Masson, 2018).
Fig. 10. Anomalies and periodic signals of period comparable to the length of the reference period.
Comment on Fig. 10
The figure shows the problems associated with defining an anomaly when the signal contains a periodic component whose period is comparable to the reference period used for calculating this anomaly. To keep the case simple, a sinusoid with a period of 180 years is considered (a common periodicity detected in climate-related signals), so that 360° = 180 years and 60° = 30 years (the length of the reference period used by the IPCC for calculating anomalies). For our purpose, 3 reference periods of 60° (30 years) have been considered along the sinusoid (the red horizontal lines marked reference 1, 2 and 3). On the right side of the picture, the corresponding anomalies (measurement over the next 30 years minus the average value over the reference period) are represented. It can be seen that the anomalies exhibit different trends. All those trends are fake, because the real signal is a sinusoid whose overall mean value is zero. In other words, there is no trend, only a periodic behaviour.
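The construction in the comment above can be reproduced numerically. The sketch below assumes a unit-amplitude sinusoid with a 180-year period, sampled once a year, and three reference periods starting at years 0, 60 and 120. These starting points are hypothetical and do not necessarily match those drawn in Fig. 10.

```python
import math

PERIOD = 180  # years, period of the sinusoid
WINDOW = 30   # years, length of a reference period

def signal(year):
    """Trend-free sinusoid with zero long-term mean."""
    return math.sin(2 * math.pi * year / PERIOD)

def window_mean(start):
    """Mean of the yearly signal over the WINDOW years beginning at `start`."""
    return sum(signal(start + k) for k in range(WINDOW)) / WINDOW

def apparent_anomaly(ref_start):
    """Mean over the 30 years following the reference period,
    minus the mean over the reference period itself."""
    return window_mean(ref_start + WINDOW) - window_mean(ref_start)

reference_starts = [0, 60, 120]  # hypothetical positions along the sinusoid
results = {start: apparent_anomaly(start) for start in reference_starts}
long_term_mean = sum(signal(y) for y in range(PERIOD)) / PERIOD

for start, value in results.items():
    print(f"reference period starting at year {start:3d}: anomaly {value:+.2f}")
print(f"mean over one full period: {long_term_mean:+.2e}")
```

Depending on where the reference period falls on the cycle, the apparent anomaly comes out positive or negative, while the mean over a full period stays at zero.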
Thirdly, a fundamental criticism of the way the IPCC handles temperature data relates to its choice to rely exclusively on linear regression trend lines, despite the fact that any data scientist knows that one must at least consider a time window exceeding 5 times the period of a cyclic component present in the data to avoid “border effects”. Unfortunately for the IPCC, most climate data show significant cyclic components with (approximate) periods of 11, 60 and 180 years, while it considers time windows of 30 years for calculating its anomalies.
And so the IPCC creates an artificial “global warming acceleration” by calculating short-term linear trends from data exhibiting a cyclic signature. Commenting on Figure 11, taken from FAQ 3.1 of Chapter 3 of the IPCC AR4 (2007) report, the IPCC states: “Note that for shorter recent periods, the slope is greater, indicating accelerated warming”.
Fig. 11. Fake conclusions reached by IPCC.
The following graph (Fig. 12) illustrates the issue.
Fig. 12. Global temperatures compared to the average global temperature over the period 1901-2000.
Comment on Fig. 12.
The graph shows the average annual global temperature since 1880, compared not with a 30-year reference period (as is done for calculating anomalies) but with the long-term average from 1901 to 2000. The zero line represents the long-term average for the whole planet; the bars show the global (but long-term) “anomalies” above or below this average versus time. The claimed linear trend shown on the left part of the figure is (more than probably), as shown on the right part of the figure, nothing other than the ascending branch of a sinusoid with a period of 180 years. This is also another way (the correct and simplest one?) to explain the “pause” or “hiatus” that has been observed over the last 20 years: the “pause” corresponds to the maximum of the sinusoid and, consequently, a period of global cooling could be expected during the coming years.
7/ Linear trend lines and data having a cyclic signature
Finally, the following graphs (Figs 13–15, from Masson, 2018) illustrate the “border effect” mentioned previously for a schematic case, and show the potential errors that can be made when linear regression methods are applied to data having a cyclic component with a (pseudo-)period comparable to the length of the time window considered. The sinusoid remains exactly the same (and shows no trend), but if one calculates the linear regression (by the least squares method) over one period of the sinusoid, a FAKE trend line is generated, and its slope depends on the initial phase of the time window considered.
Figs 13–15. Linear regression line over a single period of a sinusoid.
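The phase dependence shown in these figures can be checked with an ordinary least squares fit over exactly one period. The sketch below assumes a unit-amplitude sinusoid with x in radians. The function names and the sampling density are choices made for illustration, not taken from the figures.

```python
import math

def ols_slope(xs, ys):
    """Slope of the ordinary least squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def fake_trend_slope(phase_deg, n_points=1000):
    """Regression slope of sin(x + phase) over exactly one period, x in radians."""
    phase = math.radians(phase_deg)
    xs = [2 * math.pi * i / n_points for i in range(n_points + 1)]
    ys = [math.sin(x + phase) for x in xs]
    return ols_slope(xs, ys)

for phase_deg in (0, 90, 180, 270):
    print(f"initial phase {phase_deg:3d} deg: slope {fake_trend_slope(phase_deg):+.3f}")
```

The signal is the same trend-free sinusoid in every case. Only the starting point of the window changes, and with it the sign and size of the fitted slope.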
Regression lines for a sinusoid.
To illustrate the problem associated with the “border effect” when drawing the regression line for a sinusoid, let us consider a simple sinusoid and calculate the regression line over one, two, five, … many cycles (Figs 16–18).
The sinusoid being stationary, the true regression line is a horizontal line (with a slope = 0).
Taking an initial phase of 180° (to generate a regression line with a positive slope), let us see how the slope of the regression line changes with the number of periods:
Figs 16–18. Regression line for sinusoids with one, two, five cycles.
The corresponding regression equation is given on each figure. In this equation, the coefficient of x gives the slope of the “fake” regression line. The value of this slope changes with the number of periods, as shown in Fig. 19. As a rule of thumb, data scientists recommend considering at least 6 periods.
Fig. 19. Slope of the regression line vs number of cycles (see text for explanation).
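The decrease of the fake slope with the number of cycles can be sketched in the same way. The assumptions here are a unit-amplitude sinusoid, x in radians, and an initial phase of 180°; the units on the figures may differ, so only the shape of the decrease should be compared. Under these assumptions, the least squares slope for a continuous signal works out to 3 / (π² n²) for n full cycles, which the sampled fit approaches.

```python
import math

def ols_slope(xs, ys):
    """Slope of the ordinary least squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def slope_over_cycles(n_cycles, points_per_cycle=1000):
    """Regression slope of sin(x + 180 deg) over n_cycles full periods."""
    n_points = n_cycles * points_per_cycle
    length = 2 * math.pi * n_cycles
    xs = [length * i / n_points for i in range(n_points + 1)]
    ys = [math.sin(x + math.pi) for x in xs]
    return ols_slope(xs, ys)

for n_cycles in (1, 2, 5, 6, 10):
    fitted = slope_over_cycles(n_cycles)
    # Continuous-signal value under the stated assumptions
    expected = 3 / (math.pi ** 2 * n_cycles ** 2)
    print(f"{n_cycles:2d} cycles: slope {fitted:.5f} (continuous limit {expected:.5f})")
```

The slope falls off as the inverse square of the number of cycles, which is consistent with a rule of thumb that asks for several periods before a regression line is trusted.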
See also this Excel illustration (here).
8/ An illustrative case
The considerations developed earlier in this paper may well look obvious to experienced data scientists, but it seems that most climatologists are not aware of (or try to hide?) the problems associated with the length of the time window considered and its starting point. As a final illustration, let us consider “official” climate data and see what happens when the length of the time window and its starting point are changed (Figs 20–22). From this example, it is obvious that linear trend lines applied to (poly-)cyclic data with a period similar to the length of the time window considered open the door to any kind of fake conclusions, if not manipulations aimed at pushing one political agenda or another.
Fig. 20. An example of “official” global temperature anomalies.
Fig. 21. Effect of the length and initial instant of a time window on linear trend lines.
Fig. 22. Effect of the length and initial instant of a time window on linear trend lines (cont’d).
- IPCC projections result from mathematical models that need to be calibrated using data from the past. The accuracy of the calibration data is of paramount importance, as the climate system is highly non-linear, and so are the (Navier-Stokes) equations and the (Runge-Kutta integration) algorithms used in the IPCC computer models. Consequently, the system, and also the way the IPCC represents it, are highly sensitive to tiny changes in the values of parameters or initial conditions (the calibration data in the present case), which must therefore be known with high accuracy. This is not the case, which casts serious doubt on any conclusion that could be drawn from model projections.
- Most of the mainstream climate-related data used by the IPCC are indeed generated from data collected at land weather stations. This has two consequences:
(i) The spatial coverage of the data is highly questionable, as the temperature over the oceans, which represent 70% of the Earth’s surface, is mostly neglected or “guesstimated” by interpolation;
(ii) The number and location of these land weather stations have changed considerably over time, inducing biases and fake trends.
- The key indicator used by the IPCC is the global temperature anomaly, obtained by spatially averaging, as well as possible, local anomalies. A local anomaly is the difference between the present local temperature and the average local temperature calculated over a previous fixed reference period of 30 years, which changes every 30 years (1931-1960, 1961-1990, etc.). The concept of local anomaly is highly questionable, due to the presence of poly-cyclic components in the temperature data, which induce considerable biases and false trends when the “measurement window” is shorter than at least 6 times the longest period detectable in the data, which is unfortunately the case with temperature data.
- Linear trend lines applied to (poly-)cyclic data with a period similar to the length of the time window considered open the door to any kind of fake conclusions, if not manipulations aimed at pushing one political agenda or another.
- Consequently, it is highly recommended to abandon the concept of global temperature anomaly and to focus on unbiased local weather data in order to detect a possible change in the local climate, which is a physically meaningful concept, and which is after all what really matters for local people, agriculture, industry, services, business, health and welfare in general.
[1] The GISS Surface Temperature Analysis (GISTEMP v4) is an estimate of global surface temperature change. It is computed using data files from NOAA GHCN v4 (meteorological stations) and ERSST v5 (ocean areas), combined as described in Hansen et al. (2010) and Lenssen et al. (2019) (see: https://data.giss.nasa.gov/gistemp/). In June 2019, the number of terrestrial stations was 8781 in the GHCNv4 unadjusted dataset; in June 1880, it was only 281.
[2] Menne M.J., Williams C.N. Jr., Palecki M.A. (2010) On the reliability of the U.S. surface temperature record. Journal of Geophysical Research, 115, D11108, doi:10.1029/2009JD013094.
[3] Venema V.K.C. et al. (2012) Benchmarking homogenization algorithms for monthly data. Clim. Past, 8, 89-115.
[4] Ewert F.K. (2011) FUSION 32, Nr. 3, p. 31.
[5] Masson H. (2018) Complexity, Causality and Dynamics inside the Climate System. Proceedings of the 12th annual EIKE Conference, Munich, November 2018.
[6] IPCC, http://www.ipcc.ch/pdf/assessment-report/ar4/wg1/ar4-wg1-chapter3.pdf
[7] Easterbrook D.J. (2016) Evidence-based climate science. Data opposing CO2 emissions as the primary source of global warming. Second Edition. Elsevier, Amsterdam, 418 pp.