
Data FAQ

This FAQ focuses on questions concerning data, e.g. data format, data processing, CMIP and CORDEX data.

Each ESGF question is sorted into exactly one topic. See also ESGF General, which holds questions of general interest and questions matching several topics.

What are ensembles?

Results of climate model runs depend on the starting point of the calculation, on the initialisation method and on the model physics. Ensemble calculations make it possible to quantify the variability of the simulation data of a single model. In the CMIP and CORDEX projects, ensemble members are named in the rip nomenclature: r for realization (starting point), i for initialization and p for physics, each followed by an integer, e.g. "r1i1p1".
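The rip nomenclature can be taken apart mechanically. The following is a minimal sketch, assuming labels of exactly the form shown above (r, i and p, each followed by an integer); the function name parse_rip is hypothetical and not part of any ESGF tool.

```python
import re

# Pattern for ensemble member labels such as "r1i1p1".
RIP_PATTERN = re.compile(r"^r(\d+)i(\d+)p(\d+)$")


def parse_rip(label):
    """Split an ensemble member label into its three integer indices."""
    match = RIP_PATTERN.match(label)
    if match is None:
        raise ValueError(f"not a valid rip label: {label!r}")
    realization, initialization, physics = (int(g) for g in match.groups())
    return {
        "realization": realization,        # r: starting point
        "initialization": initialization,  # i: initialisation method
        "physics": physics,                # p: model physics
    }


if __name__ == "__main__":
    print(parse_rip("r1i1p1"))
```

Sorting or grouping members by one of the three indices then becomes a plain dictionary lookup.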

Do means over ensembles exist?

Means over several ensemble members are not in the ESGF. You may download the individual ensemble members and calculate the mean with tools such as the Climate Data Operators (CDO).
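The arithmetic behind an ensemble mean is an element-by-element average over the members. The sketch below illustrates this with plain Python lists; the member labels and values are made up for illustration, and for real NetCDF files one would more likely use a dedicated tool (CDO, for instance, provides an ensmean operator).

```python
def ensemble_mean(members):
    """Average equally long series element by element.

    `members` maps an ensemble member label to a list of values,
    one value per time step.
    """
    series = list(members.values())
    if not series:
        raise ValueError("at least one ensemble member is required")
    length = len(series[0])
    if any(len(values) != length for values in series):
        raise ValueError("all ensemble members must cover the same time steps")
    # zip(*series) groups the values of all members for each time step.
    return [sum(step_values) / len(series) for step_values in zip(*series)]


if __name__ == "__main__":
    # Illustrative numbers only, not real model output.
    example = {
        "r1i1p1": [1.0, 2.0, 3.0],
        "r2i1p1": [3.0, 4.0, 5.0],
    }
    print(ensemble_mean(example))
```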

Does an internationally common or recommended model exist?

A generally recommended model doesn't exist in the CMIP and CORDEX projects. Many researchers take data from more than one model, and from more than one ensemble member per model, and calculate a mean or plot them together to get a measure of the spread between the models.

Where can I find model descriptions and associated literature?

Go to the es-doc portal. Information about European models can also be found in the ENES portal, page European Earth System Models and Modelling Groups.

How can I find the definition of a variable?

CMIP and CORDEX variables follow the CF standard (CF for "Climate and Forecast"), including the CF Standard Name Table. This table also contains a definition of each variable. Additional information is in the variable requirement tables of the projects; see the next question.

What is the relation between Variable, CF Standard Name and Variable Long Name?

The search category Variable contains only abbreviations. The CF Standard Name follows the CF Standard Name Table (CF for "Climate and Forecast"). For CMIP5 variables, the relation between all three (Variable, CF Standard Name and Variable Long Name) is tabulated in the CMIP5 Standard Output document; for CORDEX variables, see the CORDEX Variables Requirement Table.

Where can I find CMIP5 scenarios?

In the CMIP5 project, Near-Term (10-30 years) and Long-Term (century and longer) climate simulations have been performed; many models ran both. Some of the decadal experiments are Near-Term future scenarios. The CMIP5 Long-Term scenarios are the Representative Concentration Pathways (RCPs), which span the full range of future emission trajectories for the years 2006-2100, with some continued until 2300.

Where are CMIP5 historicalAA data?

CMIP5 historicalAA, historical data with anthropogenic aerosol forcing only, can be found in the historicalMisc experiment. Select historicalMisc and look for "Forcing = AA" in the metadata of the search results.

An overview of which CMIP5 data for historicalAA and other forcings should exist can be found in the tables of Gavin Schmidt; for CCSM and CESM models, see the updated table of Gary Strand.

Where are CMIP5 historicalLU data?

CMIP5 historicalLU, historical data with land-use change forcing only, can be found in the historicalMisc experiment. Select historicalMisc and look for "Forcing = LU" in the metadata of the search results.

An overview of which CMIP5 data for historicalLU and other forcings should exist can be found in the tables of Gavin Schmidt; for CCSM and CESM models, see the updated table of Gary Strand.

Where are CMIP5 historicalSl data?

CMIP5 historicalSl, historical data with solar forcing only, can be found in the historicalMisc experiment. Select historicalMisc and look for "Forcing = Sl" in the metadata of the search results.

An overview of which CMIP5 data for historicalSl and other forcings should exist can be found in the tables of Gavin Schmidt.

Where can I find AR5 data?

Climate model output used in the IPCC's Fifth Assessment Report (AR5) is a subset of CMIP5 data. Two snapshots of these data were taken for documentation. Both are based on the status of CMIP5 data on March 15, 2013, the cutoff date for literature to be included in the Working Group I report CLIMATE CHANGE 2013, The Physical Science Basis. Data updates since March 15, 2013, are not included in the snapshots. A more detailed description, including links to the access points for the two snapshots, can be found on the AR5 GCM data page of the Data Distribution Centre (DDC).

Unless you really need the frozen data with deadline March 15, 2013, we recommend CMIP5 data because erroneous CMIP5 data have usually been corrected by publication of a new version. CMIP5 data can be downloaded from ESGF.

Where are the SRES scenarios A1B, A2 and B1?

The SRES scenarios (Special Report on Emission Scenarios, for the Third Assessment Report) belong to CMIP3. CMIP3 data are in the ESGF now. In ESGF search, select project=CMIP3 and, for example, experiment=sresa1b.

Where can I find more CORDEX 3-hourly or 6-hourly data?

CORDEX 3hr and 6hr data are usually not in the ESGF but locally stored at the modeling centers according to the CORDEX Archive Design. Please contact the modeling groups.

Where are CORDEX regional climate models described?

A central database with descriptions of CORDEX regional climate models does not exist. Nevertheless, every CORDEX data file has a header with a global attribute "references", which usually contains a web address. You may see this and other attributes without a file download: Simply select a CORDEX file in an ESGF portal, follow the OPENDAP link and search the section "Global Attributes".

Where can I find the land sea mask or landfrac?

"landfrac" is not a variable name in ESGF. Please look for variable "sftlf", standard name "land_area_fraction". This is the land sea mask of the model in the projects CMIP5, CORDEX, GeoMIP, LUCID, PMIP3, ...

Which grid resolutions do CMIP5 models have?

The following table lists the grid resolutions, i.e. the distance between adjacent grid points in degrees.

Model  Atmospheric Grid Latitude  Atmospheric Grid Longitude  Ocean Grid Latitude  Ocean Grid Longitude
ACCESS1.0 1.25 1.875 lat(i,j) lon(i,j)
ACCESS1.3 1.25 1.875 lat(i,j) lon(i,j)
BCC-CSM1.1 2.7906 2.8125 0.3333, 1 1
BCC-CSM1.1(m) 2.7906 2.8125 0.3333, 1 1
BNU-ESM 2.7906 2.8125 0.3344, 1 1
CCSM4 0.9424 1.25 lat(i,j) lon(i,j)
CESM1(BGC) 0.9424 1.25 lat(i,j) lon(i,j)
CESM1(CAM5) 0.9424 1.25 lat(i,j) lon(i,j)
CESM1(FASTCHEM) 0.9424 1.25 only time-independent ocean data
CESM1(WACCM) 1.8848 2.5 lat(i,j) lon(i,j)
CFSv2-2011 1 1 0.5 0.5
CMCC-CESM 3.4431 3.75 lat(i,j) lon(i,j)
CMCC-CM 0.7484 0.75 lat(i,j) lon(i,j)
CMCC-CMS 3.7111 3.75 lat(i,j) lon(i,j)
CNRM-CM5 1.4008 1.40625 lat(i,j) lon(i,j)
CNRM-CM5-2 1.4008 1.40625 lat(i,j) lon(i,j)
CSIRO-Mk3.6.0 1.8653 1.875 0.9327, 0.9457 1.875
CSIRO-Mk3L-1-2 3.1857 5.625 only time-independent ocean data
CanAM4 2.7906 2.8125 no ocean data
CanCM4 2.7906 2.8125 0.9303, 1.1407 1.40625
CanESM2 2.7906 2.8125 0.9303, 1.1407 1.40625
EC-EARTH 1.1215 1.125 lat(i,j) lon(i,j)
FGOALS-g2 2.7906 2.8125 0.5, 1 1
FGOALS-gl 4.1026 5 1 1
FGOALS-s2 1.6590 2.8125 0.5, 1 1
GEOS-5 2 2.5 1 1
GFDL-CM2.1 2.0225 2.5 0.3344, 1 1
GFDL-CM3 2 2.5 0.3344, 1 1
GFDL-ESM2G 2.0225 2 0.375, 0.5 1
GFDL-ESM2M 2.0225 2.5 0.3344, 1 1
GISS-E2-H 2 2.5 1 1
GISS-E2-H-CC 2 2.5 1 1
GISS-E2-R 2 2.5 1 1.25
GISS-E2-R-CC 2 2.5 1 1.25
HadCM3 2.5 3.75 1.25 1.25
HadGEM2-A 1.25 1.875 no ocean data
HadGEM2-AO 1.25 1.875 0.3396, 1 1
HadGEM2-CC 1.25 1.875 0.3396, 1 1
HadGEM2-ES 1.25 1.875 0.3396, 1 1
INM-CM4 1.5 2 0.5 1
IPSL-CM5A-LR 1.8947 3.75 lat(i,j) lon(i,j)
IPSL-CM5A-MR 1.2676 2.5 lat(i,j) lon(i,j)
IPSL-CM5B-LR 1.8947 3.75 lat(i,j) lon(i,j)
MIROC-ESM 2.7906 2.8125 0.5582, 1.7111 1.40625
MIROC-ESM-CHEM 2.7906 2.8125 0.5582, 1.7111 1.40625
MIROC4h 0.5616 0.5625 0.1875 0.28125
MIROC5 1.4008 1.40625 0.5, 0.5 1.40625
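MPI-ESM-LR 1.8653 1.875 lat(i,j) lon(i,j) (orthogonal curvilinear coordinates)
MPI-ESM-MR 1.8653 1.875 lat(i,j) lon(i,j) (orthogonal curvilinear coordinates)
MPI-ESM-P 1.8653 1.875 lat(i,j) lon(i,j) (orthogonal curvilinear coordinates)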
MRI-AGCM3-2H 0.562 0.5625 no ocean data
MRI-AGCM3-2S 0.188 0.1875 no ocean data
MRI-CGCM3 1.12148 1.125 0.5, 0.5 1
MRI-ESM1 1.12148 1.125 0.5, 1.125 1
NorESM1-M 1.8947 2.5 lat(i,j) lon(i,j)
NorESM1-ME 1.8947 2.5 lat(i,j) lon(i,j)

For the atmospheric grid, the tabulated latitude resolution is valid only near the equator. At higher latitudes deviations may occur.

Ocean models have their own, finer grid. If two values are given for the latitude resolution of the ocean grid, the resolution is not constant. The first value is that for the equator, the second for the poles (maximum for the two poles if different). In case of rotated poles the resolutions for the rotated coordinates rlon and rlat are tabulated. If latitude and longitude are defined with two indices i and j, the resolution cannot simply be read out. In this case lat(i,j) and lon(i,j) have been entered.

How can the MPI-M ocean grid be remapped?

MPI-M ocean data are upside down because MPI-M has historically stored the data from north to south (positive to negative latitude values). Additionally, a curvilinear grid with the North Pole over Greenland is used.

Solution: Use Climate Data Operator remapbil:

cdo remapbil,r240x220 <infile> <outfile>

More details in the CDO documentation.

Do all CMIP5 models use the same calendar?

No, see the table below.

Model Calendar For experiments
ACCESS1.0 proleptic_gregorian all
ACCESS1.3 proleptic_gregorian all
BCC-CSM1.1 365_day all
BCC-CSM1.1(m) 365_day all
BNU-ESM 365_day all
CanAM4 365_day all
CanCM4 365_day all
CanESM2 365_day all
CCSM4 365_day all
CESM1(BGC) 365_day all
CESM1(CAM5) 365_day all
CESM1(FASTCHEM) 365_day all
CESM1(WACCM) 365_day all
CFSv2-2011 gregorian all
CMCC-CESM gregorian all
CMCC-CM gregorian all
CMCC-CMS gregorian all
CNRM-CM5 gregorian all
CNRM-CM5-2 gregorian all
CSIRO-Mk3.6.0 365_day all
CSIRO-Mk3L-1-2 365_day all
EC-EARTH gregorian all
GEOS-5 gregorian all
FGOALS-g2 365_day all
FGOALS-gl 365_day all
FGOALS-s2 365_day all
GFDL-CM2.1 julian all
GFDL-CM3 365_day all but amip: julian
GFDL-ESM2G 365_day all
GFDL-ESM2M 365_day all
GISS-E2-H 365_day all
GISS-E2-H-CC 365_day all
GISS-E2-R 365_day all
GISS-E2-R-CC 365_day all
HadCM3 360_day all
HadGEM2-A 360_day all
HadGEM2-AO 360_day all
HadGEM2-CC 360_day all
HadGEM2-ES 360_day all
INM-CM4 365_day all
IPSL-CM5A-LR 365_day all but aqua4K, aqua4xCO2, aquaControl, past1000: 360_day
IPSL-CM5A-MR 365_day all
IPSL-CM5B-LR 365_day all but aquaControl: 360_day
MIROC-ESM proleptic_gregorian 1pctCO2, abrupt4xCO2, past1000
MIROC-ESM gregorian esmControl, esmFixClim2, esmHistorical, lgm, midHolocene, piControl, rcp26, rcp45, rcp60, rcp85, esmrcp85, historical, historicalGHG, historicalNat
MIROC-ESM-CHEM gregorian all
MIROC4h gregorian all but piControl: 365_day
MIROC5 360_day aqua4K, aqua4xCO2, aquaControl
MIROC5 365_day 1pctCO2, abrupt4xCO2, amip, amip4K, amip4xCO2, amipFuture, historical, piControl, rcp26, rcp45, rcp60, rcp85, sstClim, sstClim4xCO2, sstClimAerosol, sstClimSulfate
MIROC5 gregorian decadals
MPI-ESM-LR proleptic_gregorian all
MPI-ESM-MR proleptic_gregorian all
MPI-ESM-P proleptic_gregorian all
MRI-AGCM3-2H gregorian all
MRI-AGCM3-2S gregorian all
MRI-CGCM3 gregorian all
MRI-ESM1 gregorian all
NorESM1-M 365_day all
NorESM1-ME 365_day all

The values in the table have been taken from the calendar attributes of the NetCDF files. Since the calendars "standard" and "gregorian" are identical as well as "noleap" and "365_day", only the latter are used in the table. CMIP5 calendars are defined in the CF standard and in the CMIP5 Model Output Requirements.
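The practical consequence of the different calendars is that a model year does not have the same number of days everywhere. The sketch below maps the calendar names used in the table to a year length, following the calendar definitions of the CF conventions; the function name days_in_year is hypothetical, and the handling of the mixed "gregorian" calendar around the year 1582 reflects the CF definition rather than anything stated in this FAQ.

```python
import calendar as gregorian_rules


def days_in_year(cf_calendar, year):
    """Return the number of days in `year` under a CF calendar name."""
    name = cf_calendar.lower()
    julian_leap = year % 4 == 0                     # Julian rule: every 4th year
    gregorian_leap = gregorian_rules.isleap(year)   # Gregorian rule

    if name == "360_day":
        return 360                                  # twelve months of 30 days
    if name in ("365_day", "noleap"):
        return 365                                  # no leap years at all
    if name == "julian":
        return 366 if julian_leap else 365
    if name == "proleptic_gregorian":
        return 366 if gregorian_leap else 365
    if name in ("gregorian", "standard"):
        # Mixed calendar: Julian before the 1582 switch, Gregorian after it.
        if year < 1582:
            return 366 if julian_leap else 365
        if year == 1582:
            return 355                              # ten days were skipped in October
        return 366 if gregorian_leap else 365
    raise ValueError(f"unknown calendar: {cf_calendar!r}")


if __name__ == "__main__":
    for name in ("360_day", "365_day", "gregorian", "julian"):
        print(name, days_in_year(name, 2000))
```

A check like this is useful before comparing daily data from two models, because time axes with different year lengths cannot be aligned day by day.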

How may I cite ESGF and CMIP5 data in my paper?

Our most up to date paper describing ESGF can be found here.

CMIP5 in general: Taylor, K.E., R.J. Stouffer, G.A. Meehl: "An Overview of CMIP5 and the Experiment Design." Bull. Amer. Meteor. Soc., 93, 485-498, doi:10.1175/BAMS-D-11-00094.1, 2012.

For many CMIP5 data a DataCite DOI has been assigned providing persistent citation information. These data may therefore be cited. Two ways are possible to find the corresponding DOIs:

  • Via Data Distribution Centre (DDC): Follow one of the green links in the tables to the landing page of the DOI.
  • Via CMIP5 Citation Information Service: An existing DOI can be found with this service using the tracking_id of the data. The tracking_id is part of the global attributes section of the NetCDF file header.

May CMIP5 historical and RCP data be combined to one long time series?

Yes, if you select matching ensemble members. Look into the header of the RCP data file: the attributes parent_experiment_id and parent_experiment_rip name the ensemble member to combine it with.
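The member selection can be expressed as a small lookup. The following sketch assumes the two header attributes have already been read into a dictionary (for example from ncdump output); the function name, the example attribute values and the expectation that parent_experiment_id equals "historical" are assumptions made for illustration.

```python
def parent_historical_member(rcp_attributes, available_historical_members):
    """Return the historical ensemble member that an RCP file continues.

    `rcp_attributes` holds global attributes from the RCP file header;
    `available_historical_members` lists the rip labels you have downloaded.
    """
    parent_experiment = rcp_attributes.get("parent_experiment_id")
    parent_member = rcp_attributes.get("parent_experiment_rip")

    # Assumption: an RCP run that continues a historical run names it here.
    if parent_experiment != "historical":
        raise ValueError(
            f"parent experiment is {parent_experiment!r}, not 'historical'"
        )
    if parent_member not in available_historical_members:
        raise LookupError(
            f"historical member {parent_member!r} has not been downloaded"
        )
    return parent_member


if __name__ == "__main__":
    # Hypothetical header values for illustration.
    header = {
        "parent_experiment_id": "historical",
        "parent_experiment_rip": "r2i1p1",
    }
    print(parent_historical_member(header, ["r1i1p1", "r2i1p1"]))
```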

Which height levels do the data have?

CORDEX data: The height level is part of the short variable name. For example, ta500 is the air temperature at the 500 hPa pressure level.
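Because the level is encoded in the name, it can be recovered with a simple pattern. This is a minimal sketch, assuming that a short name consists of lowercase letters optionally followed by the pressure level in hPa, as in the ta500 example; the function name split_level is hypothetical, and names that do not follow this pattern are returned without a level.

```python
import re

# Lowercase letters followed by digits, e.g. "ta500".
LEVEL_PATTERN = re.compile(r"([a-z]+)(\d+)")


def split_level(short_name):
    """Split a short variable name into (base name, level in hPa or None)."""
    match = LEVEL_PATTERN.fullmatch(short_name)
    if match is None:
        return short_name, None   # no level encoded in the name
    return match.group(1), int(match.group(2))


if __name__ == "__main__":
    print(split_level("ta500"))
    print(split_level("tas"))
```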

Before download with OPeNDAP: Expand the dataset you need with "Show Files" and click on "OPENDAP". In the OPeNDAP Dataset Access Form look for lev and enable it. Click on "Get ASCII" and login. The lev array with the height levels will be listed.

After download: Use local software, for example ncdump, which is a command line tool belonging to NetCDF software.

ncdump -c <filename>

The option -c causes ncdump to output header and coordinate arrays.

Which height boundaries do CORDEX cloud layers have?

CORDEX offers cloud fraction variables for the following three height layers.

Layer          Variable name  Lower boundary in Pa  Upper boundary in Pa
Low Clouds     cll            100000                68000
Medium Clouds  clm            68000                 44000
High Clouds    clh            44000                 0


The height boundaries for the three layers are given as pressure levels and are defined in the CORDEX Archive Design document. The height boundaries of the layer are also stored in the netCDF file in the variable plev_bnds.

In which sequence are the data ordered inside a NetCDF file?

The Network Common Data Form (NetCDF) is a binary data format for the exchange of scientific data and consists of a header and a data part. In addition to attributes, the header describes the structure of the data part. The data themselves are stored in arrays in the data part, which enables quick access.

Data variables are defined by means of coordinate variables, for example the near-surface air temperature tas is defined as a function of time, latitude and longitude.

tas(time, lat, lon)

Inside a data array the data are ordered as follows:

For Mathematicians: The order inside the array corresponds to the lexicographic order of its index set. The index set of the data variable is the Cartesian product of the index sets of the coordinate variables, for example

   I_tas = I_time × I_lat × I_lon

The definition of the data variable in the file header contains the manner and sequence of the coordinate variables.

For Programmers: The first value in the tas array is the value for the first time, first lat and first lon. The second value is that for the first time, first lat and second lon, followed by the tas values for the remaining longitudes. If there are only 2 longitudes, the value for the first time, second lat and first lon comes next. If there are also only 2 latitudes, the first tas value for the second time appears in position 5.

   | 1 1 1 | 1 1 2 | 1 2 1 | 1 2 2 | 2 1 1 | ...

Technically speaking, the values are written to the array in a nested loop. The innermost loop runs over lon, the outermost over time, with lat in between.
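The nested loop and the resulting position in the array can be written down directly. The sketch below is an illustration in pure Python, not code from the NetCDF library; it uses zero-based indices, so the "position 5" of the example above corresponds to offset 4, and the function names are hypothetical.

```python
def flat_position(t, j, i, n_lat, n_lon):
    """Zero-based offset of tas[t, j, i] when lon varies fastest and time slowest."""
    return (t * n_lat + j) * n_lon + i


def write_order(n_time, n_lat, n_lon):
    """Index triples (time, lat, lon) in the order the nested loop visits them."""
    return [
        (t, j, i)
        for t in range(n_time)   # outermost loop: time
        for j in range(n_lat)    # middle loop: lat
        for i in range(n_lon)    # innermost loop: lon
    ]


if __name__ == "__main__":
    # Two latitudes and two longitudes, as in the example above.
    for offset, (t, j, i) in enumerate(write_order(2, 2, 2)):
        # Printed one-based to match the "| 1 1 1 | 1 1 2 | ..." notation.
        print(offset + 1, (t + 1, j + 1, i + 1))
```

The same formula explains why reading a full time series at a single grid point is slow: consecutive time steps of that point lie n_lat * n_lon values apart.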

How can I verify that the data have not been updated since I downloaded them?

Solution 1: You may compare the version of the data. The version is part of the metadata and can be found in the ESGF portals. It is also printed in the NetCDF header.

Solution 2: ESGF offers a convenient comparison using Wget scripting. Keep your Wget script after the download and run it again with the -u option.

bash <wget_script> -u

This does not repeat the download but creates a new version of the download script. The old and the new script versions are compared, and this comparison includes the checksums in the download file lists of both scripts. A changed file checksum indicates a new dataset version.

Solution 3: Data producers sometimes replace data without updating the version number when the changes are minor. In ESGF this is not allowed and is fortunately rare. Ruling out such hidden changes is tedious: you may compare the checksum of your downloaded file with that of a freshly downloaded file. Checksums may be calculated with md5sum:

md5sum <filename>
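Where md5sum is not available, the same checksum can be computed with Python's standard hashlib module. This is a minimal sketch; the function names are hypothetical, and the file is read in chunks so that large NetCDF files do not have to fit into memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 checksum of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_match(old_path, new_path):
    """True if both files have the same MD5 checksum."""
    return md5_of_file(old_path) == md5_of_file(new_path)
```

Calling files_match on the old download and a fresh download corresponds to the comparison described above.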


How can I read or process downloaded data?

Data downloaded from ESGF are usually in NetCDF format. NetCDF is a header-based binary format and can be read and processed by NetCDF-aware software, for example ncdump or the Climate Data Operators (CDO).

An exception is the OPeNDAP download of NetCDF data. Here you can get ASCII CSV, i.e. readable text (Comma Separated Values), or dodc (binary OPeNDAP data format). ASCII CSV can be imported directly into, for example, Microsoft Excel.

I can not process downloaded data

There might be several reasons and solutions for this issue:

Solution 1: If you have downloaded the file with your browser's download manager (following an HTTPServer link), compare the checksum of your downloaded file with that in the metadata. If the checksums differ, repeat the download, since the file may have been corrupted during download. ESGF Wget scripts perform this check automatically.

Solution 2: Many data, especially CORDEX data, are stored in the format NetCDF4 or compressed NetCDF4. Ensure that your local software can handle this relatively new data format.
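Whether a file is classic NetCDF or NetCDF4 can usually be seen from its first bytes: classic files start with the characters "CDF" followed by a version byte, while NetCDF4 files are HDF5 files and start with the HDF5 signature. The sketch below checks only these leading bytes; the function names are hypothetical, and an HDF5 signature alone does not prove that the file is valid NetCDF4.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"

# Version byte that follows the "CDF" magic in classic-family files.
CLASSIC_VARIANTS = {
    1: "NetCDF classic",
    2: "NetCDF 64-bit offset",
    5: "NetCDF 64-bit data (CDF-5)",
}


def netcdf_flavour(header):
    """Classify a file by its leading bytes (at least the first 8)."""
    if header.startswith(HDF5_SIGNATURE):
        return "HDF5-based (NetCDF4)"
    if len(header) >= 4 and header[:3] == b"CDF":
        # Indexing a bytes object yields the integer value of the byte.
        return CLASSIC_VARIANTS.get(header[3], "unknown CDF variant")
    return "not recognised as NetCDF"


def netcdf_flavour_of_file(path):
    """Read the first 8 bytes of `path` and classify them."""
    with open(path, "rb") as handle:
        return netcdf_flavour(handle.read(8))
```

If the result is HDF5-based and your software reports an unknown format, the software was most likely built without NetCDF4 support.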

How can I calculate a multi-year average for each month of year?

A multi-year average for each month of year can easily be calculated with CDO ymonavg. Example:

cdo splityear <infile> OH_   # split into years
cdo cat <yearly files> <outfile>  # concatenate to a file containing 10 years
cdo ymonavg <infile> <outfile>  # calculate multi-year average for each month

More details are in the CDO documentation.
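What the multi-year average computes can be illustrated without CDO: all values belonging to the same month of the year are averaged across the years. The following is a minimal sketch on plain (year, month, value) records; the function name and the sample numbers are made up for illustration, and missing values are not handled.

```python
from collections import defaultdict


def multi_year_monthly_mean(records):
    """Average values per month of year over all years.

    `records` is an iterable of (year, month, value) tuples;
    the result maps month (1-12) to the mean over all years.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for _year, month, value in records:
        sums[month] += value
        counts[month] += 1
    return {month: sums[month] / counts[month] for month in sorted(sums)}


if __name__ == "__main__":
    # Illustrative numbers only.
    sample = [
        (2000, 1, 1.0), (2000, 2, 5.0),
        (2001, 1, 3.0), (2001, 2, 7.0),
    ]
    print(multi_year_monthly_mean(sample))
```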

How can CORDEX data on a grid with rotated poles be rotated back?

Some native CORDEX grids have rotated poles, for example the native European domains EUR-44 and EUR-11. They can easily be regridded (rotated back).

Solution 1: Use interpolated data
Interpolated data are in the domains with "i" at the end, e.g. EUR-44i. These data already have a grid which has been rotated back.

Solution 2: Use cf-python
cf-python uses the ESMF regridding library as its regridding engine and currently provides first-order conservative (the default) or bilinear spherical regridding. CORDEX data are usually NetCDF/CF compliant, so cf-python only needs the following commands:

import cf
rotated_fields = cf.read('<file with rotated fields>')
unrotated_field = cf.read('<file with unrotated field>')
regridded_fields = rotated_fields.regrids(unrotated_field)

The rotated_fields may have more dimensions than just rotated longitude (X) and rotated latitude (Y). The above command will regrid each X-Y slice, so regridded_fields will have the same rank as the original.

More details are in the cf-python documentation.

Solution 3: Use CDO
Climate Data Operators (CDO) offer different ways of regridding, for example cdo rotuvb can perform a backward transformation of velocity components U and V from a rotated spherical system to a geographical system. More details in the CDO documentation.

Last Update: March 17, 2017, 1:41 p.m. by Torsten Rathmann