# Hydrographic Climate Database

The Ocean Science Hydrographic Climate Database (Climate) is a comprehensive, open-access collection of temperature and salinity data for the Northwest Atlantic and Eastern Arctic, an area defined by 35°N to 80°N and 42°W to 100°W. The data come from a variety of sources including hydrographic bottles, Conductivity Temperature Depth (CTD) casts, profiling floats, spatially and temporally averaged Batfish tows, and expendable, digital or mechanical bathythermographs. Near real-time observations of temperature and salinity from the Global Telecommunications System (GTS) are also included. The database currently consists of approximately 850,000 profiles and 35 million individual observations from 1910 to January 2010. Vertical resolution varies from 1 m (for CTDs) to roughly 10 to 100 m for traditional bottle casts. Climate was updated monthly and approximately 20,000 new profiles were added each year.

## Validation

Initial validation is carried out by the originating institute or organization. All data, whether from Canadian or foreign sources, are also validated by the Marine Environmental Data Service (MEDS), the national data center for Fisheries and Oceans Canada. The primary validation procedures at MEDS are described in the Intergovernmental Oceanographic Commission (IOC) publication Global Temperature-Salinity Pilot Project (GTSPP) Real Time Quality Control Manual.

At the Bedford Institute of Oceanography (BIO), the data are subjected to a set of final tests before being incorporated into the database. One of the primary functions of this validation is the determination and elimination of duplicate profiles.

For climatological purposes, we have defined a duplicate as any profile that is within 0.02° of latitude and 0.03° of longitude (roughly 3 km) and within 30 minutes of another profile.
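The duplicate definition above can be sketched as a simple pairwise test. This is an illustration only, not the code used at BIO: the function name `is_duplicate` is hypothetical, and treating the three limits as inclusive is an assumption.

```python
from datetime import datetime, timedelta

# Tolerances quoted in the text above.
MAX_DLAT = 0.02                  # degrees of latitude
MAX_DLON = 0.03                  # degrees of longitude
MAX_DT = timedelta(minutes=30)   # time separation


def is_duplicate(lat1, lon1, t1, lat2, lon2, t2):
    """Return True if two profiles fall within all three tolerances.

    Assumption: the limits are inclusive; the text does not say.
    """
    return (abs(lat1 - lat2) <= MAX_DLAT
            and abs(lon1 - lon2) <= MAX_DLON
            and abs(t1 - t2) <= MAX_DT)


t = datetime(2005, 6, 1, 12, 0)
print(is_duplicate(44.00, -63.00, t, 44.01, -63.02, t + timedelta(minutes=10)))
```

All three conditions must hold at once, so two casts at the same position taken hours apart are treated as separate profiles.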

Determining which duplicate to select is based on a data type hierarchy. A CTD down cast is at the highest level, down through bottle casts, the various Bathythermograph (BT) types, and finally the low resolution Integrated Global Ocean Services System (IGOSS) Tesac and Bathy messages. Low resolution Bathy and Tesac messages get replaced with the higher resolution CTD or eXpendable Bathythermograph (XBT) data as they become available after having worked their way through the collecting agency, national data center if outside of Canada, and finally to MEDS. This process may take a number of years before we receive the final version of the data to replace the near real-time IGOSS data. If data type does not reconcile duplicate selection, selection is based on a progression of selecting Canadian data over foreign, selecting the profile with the greatest number of observations, and finally, selecting the profile with the greatest depth.
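One way to picture the selection order is as a sort key, where each later criterion only matters when the earlier ones tie. The sketch below is illustrative: the type labels, dictionary keys and the function name `select_profile` are hypothetical, and the several BT types are collapsed into a single rank for brevity.

```python
# Lower rank = preferred. The order follows the text; the labels are hypothetical.
TYPE_RANK = {"CTD": 0, "BOTTLE": 1, "BT": 2, "IGOSS": 3}


def select_profile(duplicates):
    """Pick one profile from a list of duplicate profiles (dicts)."""
    return min(
        duplicates,
        key=lambda p: (
            TYPE_RANK[p["type"]],       # 1. data type hierarchy
            0 if p["canadian"] else 1,  # 2. Canadian over foreign
            -p["n_obs"],                # 3. most observations
            -p["max_depth"],            # 4. greatest depth
        ),
    )


near_real_time = {"type": "IGOSS", "canadian": True, "n_obs": 20, "max_depth": 500.0}
delayed_ctd = {"type": "CTD", "canadian": False, "n_obs": 300, "max_depth": 300.0}
print(select_profile([near_real_time, delayed_ctd])["type"])
```

Here the delayed CTD profile replaces the near real-time message even though the message is Canadian and deeper, because data type is compared first.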

Profiles that have been flagged as having failed the MEDS quality control are individually examined to determine if any may be salvaged. There is no attempt to correct erroneous data; however, individual data levels may be discarded and a portion of a profile retained.

In addition, the entire database is subjected to various ongoing subjective and objective tests to improve the overall confidence in the data. Stations with individual observations more than three standard deviations from the seasonal mean value, derived on a 1° grid and averaged over depth ranges from 25 meters at the surface to 500 meters for the deep ocean, have been individually examined and removed when deemed appropriate.
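The objective part of the three-standard-deviation test can be sketched as below. The function name `flag_outliers` is hypothetical, and the cell mean and standard deviation are assumed to have been computed already for the relevant 1° square, depth range and season. The sketch only flags values; as the text notes, flagged stations were examined individually before any removal.

```python
def flag_outliers(values, cell_mean, cell_std, n_sigma=3.0):
    """Mark observations more than n_sigma standard deviations from the cell mean."""
    return [abs(v - cell_mean) > n_sigma * cell_std for v in values]


# Hypothetical temperatures in one grid cell with mean 5.0 and std 1.0.
print(flag_outliers([4.0, 5.0, 9.5], cell_mean=5.0, cell_std=1.0))
```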

## Interpolated values

During the early 1970s, data were sent to MEDS from BIO as inflection points based on a linear regression tolerance of 0.01° or 0.01 psu. Also during the period 1969-89, MEDS had a limit of 99 levels for a single CTD profile, which they ensured by using a similar reduction technique. As a consequence, much of the CTD data during this period was at a much reduced resolution. As a one-time correction, all CTD data in the database with an average depth resolution (maximum depth/ # of observations) of less than 5 meters were interpolated to include values at standard oceanographic depths.

## The Climate Application

This application extracts information from the Ocean Sciences hydrographic database according to user specified spatial and temporal criteria. Output results can be either statistical summaries of the data or the actual data stored in the database (for input into your own analyses). The query is performed off-line. You will be contacted by email when your results become available. Results should normally be available within a few hours (depending upon the size of your query and the number of requests ahead of you). If you haven't had a reply within 24 hours, contact us and we will try to determine what happened.

The query screen is the "home base" for the application. Most of the fields on the query form are linked to help text that can be displayed at any time.

Processing options include the ability to select only those records that contain both temperature and salinity observations and an option to average the values within a profile according to the depth specification. This reduces the resolution of highly sampled data to more closely resemble observations sampled much less frequently.

Users can request a number of different data products which include a station index of latitude, longitude and date/time for each profile selected, individual observations making up the profile, time series based on monthly averages within the latitude, longitude, depth volume, or a seasonal cycle based on averages over all months from the time series statistics.

## A Brief Tour

### (A) Query Identification

All of your queries are assigned a unique query number and saved under your user name. You can re-run existing queries or edit them and submit them as new queries. You can assign a name under TITLE. This name is saved internally in the results files for your reference.

### (B) Area Selection Type

You may define a geographical area in one of three ways:

1. Choose from a list of predefined polygons (multiple selections are permitted). The predefined polygons are shown on the System Polygons page.
2. Provide your own polygon definition by latitude/longitude coordinates. To do this, select the Define Area button at the top of your screen, follow the instructions to create a new polygon, and then select it from the list of predefined polygons.
3. Define a rectangle by latitude/longitude coordinates. The blocks parameter subdivides the entire rectangle into x by y blocks. For example, specifying latitude from 42° to 45° and longitude from -62° to -65° with 1° blocks in both latitude and longitude would define 9 separate 1° grid squares for which statistics would be generated.

Note that the convention for longitude is positive East. Latitude and Longitude must be specified as decimal degrees, but we provide a converter if you prefer degrees, minutes, seconds.
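For readers who want to prepare coordinates ahead of time, a conversion from degrees, minutes and seconds to signed decimal degrees might look like the sketch below. It is not the converter provided with the application; the function name `dms_to_decimal` and the hemisphere-letter argument are assumptions made for illustration.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees.

    hemisphere is one of "N", "S", "E", "W". West and South are returned
    as negative values, matching the positive-East convention.
    """
    value = degrees + minutes / 60.0 + seconds / 3600.0
    return -value if hemisphere.upper() in ("W", "S") else value


print(dms_to_decimal(63, 30, 0, "W"))   # a longitude west of Greenwich
print(dms_to_decimal(44, 15, 0, "N"))
```

Because the whole database region lies west of Greenwich, every longitude entered on the form should be negative.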

A note on blocking (gridding):

If latitude/longitude blocking is not requested (i.e., the lat_block and lon_block fields are left blank), the output will contain all the data within the search rectangle, including points that lie on any boundary of the rectangle.

If blocking is requested then the following algorithm will be used.

Bins of the user-specified size are created, originating from point (0,0). The data that belong to the user-specified rectangle are then distributed among these bins. Data on the left and upper boundaries of a bin are excluded from that bin and assigned to the adjacent bin. The final output includes only the data that belong to bins whose centers are within or on the boundary of the rectangle.

As a result, depending on the rectangle co-ordinates and the block size, the output may exclude part of the data that belong to the specified rectangle.

#### Scenario a, b and c

For example, in scenario (a) of the above diagram, the boundaries of the specified rectangle align exactly on the gridlines. The points along the dotted boundaries are excluded because the centers of the bins that contain these points lie outside the rectangle.

In example (b), the boundaries of the rectangle are on the center of the surrounding grid. All the points that are within the rectangle as well as those on the boundaries of the rectangle will be included. However, it is important to note that the bins along the boundaries would include only the data for the section of the grid that is within the rectangle.

In case (c), the output would exclude the data on the dotted lines and in the shaded area as the centers of the bins that contain these points lie outside the rectangle.
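The blocking rules can be sketched in a few lines. This is one reading of the description, not the application's code: the function names are hypothetical, "left" is taken to mean the western edge and "upper" the northern edge, so each bin is assumed to cover west < lon <= east and south <= lat < north.

```python
import math


def assign_bin(lat, lon, lat_block, lon_block):
    """Return the (lat, lon) centre of the origin-aligned bin holding a point.

    Assumption: a bin excludes its left (western) and upper (northern)
    edges; points on those edges fall into the adjacent bin.
    """
    i_lat = math.floor(lat / lat_block)      # lower edge included
    i_lon = math.ceil(lon / lon_block) - 1   # left edge excluded
    return ((i_lat + 0.5) * lat_block, (i_lon + 0.5) * lon_block)


def blocked_output(points, lat_min, lat_max, lon_min, lon_max,
                   lat_block, lon_block):
    """Group points by bin, keeping bins whose centre is in or on the rectangle."""
    bins = {}
    for lat, lon in points:
        if not (lat_min <= lat <= lat_max and lon_min <= lon <= lon_max):
            continue  # outside the search rectangle
        c_lat, c_lon = assign_bin(lat, lon, lat_block, lon_block)
        if lat_min <= c_lat <= lat_max and lon_min <= c_lon <= lon_max:
            bins.setdefault((c_lat, c_lon), []).append((lat, lon))
    return bins


# Rectangle aligned on the gridlines, as in scenario (a).
points = [(43.2, -63.7), (43.2, -65.0), (45.0, -63.7), (43.2, -62.0)]
result = blocked_output(points, 42.0, 45.0, -65.0, -62.0, 1.0, 1.0)
print(sorted(result))
```

Under these assumptions the points on the western edge (-65.0) and the northern edge (45.0) are dropped, because they fall into bins centred outside the rectangle, while the point on the eastern edge (-62.0) is kept.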

### (C) Time Specification

Specify a continuous time period from (month/day/year) to (month/day/year) and/or a seasonal window. Months are inclusive. Months six to nine means June to September. Months 12 to two would select only December to February. Default is all months for the entire time period for which there are data.
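The seasonal window, including the case that wraps over the year end, can be expressed as a small test on the month number. The function name `month_in_window` is hypothetical and the sketch is only meant to illustrate the inclusive, wrapping behaviour described above.

```python
def month_in_window(month, first, last):
    """True if month (1-12) lies in the inclusive window first..last.

    A window whose first month is later than its last month wraps over
    the year end, so 12..2 covers December, January and February.
    """
    if first <= last:
        return first <= month <= last
    return month >= first or month <= last


print([m for m in range(1, 13) if month_in_window(m, 6, 9)])
print([m for m in range(1, 13) if month_in_window(m, 12, 2)])
```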

### (D) Depth Specification

Depth ranges are specified by entering a string of comma-separated fields defining: First_bin, Bin_size, Bin_interval, Last_bin.

The string is read as "Starting with First_bin, plus/minus Bin_size, repeat every Bin_interval until Last_bin". Multiple depth specifications can be included for each query.

Example:

0, 5, 25, 100
150, 10, 50, 250

This will generate statistics for the bin ranges 0±5, 25±5, 50±5, 75±5, 100±5, 150±10, 200±10, 250±10.

If not specified, the last two values (Bin_interval and Last_bin) default to 0 and First_bin respectively. For example, 500,500 will extract all data for 500±500 metres. This is the same as the old specification.

The numbers do not have to be integers. Real number entries will be rounded to the second decimal place during query processing. If you are specifying multiple depth specifications, separate each set by pressing the enter key, ensuring that each set appears on a line by itself.
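To check how a depth specification will expand before submitting a query, the rules above can be sketched as follows. The function name `parse_depth_spec` is hypothetical and this is not the parser used by the application; it simply follows the stated defaults and the rounding to two decimal places.

```python
def parse_depth_spec(text):
    """Expand depth specification lines into (low, high) bin ranges.

    Each line is "First_bin, Bin_size[, Bin_interval, Last_bin]".
    Bin_interval defaults to 0 and Last_bin defaults to First_bin.
    """
    ranges = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue  # ignore blank lines between specifications
        fields = [round(float(f), 2) for f in line.split(",")]
        first, size = fields[0], fields[1]
        interval = fields[2] if len(fields) > 2 else 0.0
        last = fields[3] if len(fields) > 3 else first
        centre = first
        while centre <= last:
            # Literal centre ± size bounds; 0±5 therefore gives (-5, 5).
            ranges.append((centre - size, centre + size))
            if interval <= 0:
                break  # single bin; avoids looping forever
            centre = round(centre + interval, 2)
    return ranges


for low, high in parse_depth_spec("0, 5, 25, 100\n150, 10, 50, 250"):
    print(low, high)
```

The example input reproduces the eight bins listed above, and `parse_depth_spec("500,500")` yields the single range 0 to 1000 metres.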

### (E) Processing Options

There are two optional selection criteria. One or both may be selected.

The TS Only option selects only records that contain both temperature and salinity observations. Because we have a lot of XBT data, there are many more temperature observations than salinity. The statistics for sigma-t are based on individual computations. They are not based on the overall statistics of temperature and salinity.

The Bin Averaging option is appropriate only if you have requested the complete profile data. This option averages the values within a profile according to the depth specification, reducing the resolution of highly sampled data to something more closely resembling observations sampled much less frequently. Suppose, for example, that you intended to use the individual observations to optimally estimate temperature. CTD data, sampled at every metre, would dominate bottle data sampled every 25 metres. Specifying bin averaging and depth ranges of 10 or 20 metres would result in a single average CTD observation for the 10 (or 20) metre level, much closer to the 25 to 50 metre resolution you would expect from a bottle.
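The effect of bin averaging on a single profile can be illustrated with a short sketch. The function name `bin_average` is hypothetical, the bin edges are assumed to be inclusive, and the profile values are made up for the example.

```python
def bin_average(depths, values, centres, half_width):
    """Average one profile's values inside each bin centre ± half_width."""
    averaged = []
    for centre in centres:
        in_bin = [v for d, v in zip(depths, values)
                  if centre - half_width <= d <= centre + half_width]
        if in_bin:  # skip bins with no observations
            averaged.append((centre, sum(in_bin) / len(in_bin)))
    return averaged


# A made-up CTD profile sampled every metre from 0 to 40 m.
depths = list(range(0, 41))
temps = [float(d) for d in depths]  # placeholder values, not real data
print(bin_average(depths, temps, centres=[10, 30], half_width=5))
```

The 41 one-metre samples collapse to two values, one per requested bin, so the CTD cast carries about the same weight as a bottle cast in later analysis.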

### (F) Product Selection

The result set files returned to you depend upon what options you request.

The standard deviation is calculated using the "nonbiased" or "n-1" method using the following formula:

$s=\left(\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{i}-\overline{x}\right)^{2}\right)^{\frac{1}{2}}$
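As a quick check of the "n-1" formula, the sketch below computes it directly and compares the result with Python's `statistics.stdev`, which uses the same definition. The function name `sample_std` and the sample values are made up for illustration.

```python
import math
import statistics


def sample_std(values):
    """Standard deviation with the n-1 denominator."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


temps = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up values
print(sample_std(temps))
print(statistics.stdev(temps))  # same n-1 definition
```

Note that the formula is undefined for a single observation (n = 1), so a bin with one value has no standard deviation.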

The Seasonal Cycle, Timeseries option returns both the Timeseries of monthly statistics (average, minimum, maximum and count of observations (T, S, sigma) for each year, month and depth level for which there are data) and the Seasonal Cycle (average, minimum, maximum, standard deviation, count of observations and count of months in the average). The Seasonal Cycle values are determined by an unweighted average over all months from the time series statistics.
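One reading of "unweighted average over all months" is that each monthly mean in the time series counts equally, regardless of how many observations went into it. The sketch below follows that reading for the average only; the function name `seasonal_cycle` and the input layout are hypothetical.

```python
from collections import defaultdict


def seasonal_cycle(monthly_means):
    """Average monthly means by calendar month.

    monthly_means maps (year, month) -> monthly average.
    Returns month -> (average, count of months in the average).
    """
    by_month = defaultdict(list)
    for (year, month), mean in monthly_means.items():
        by_month[month].append(mean)
    return {m: (sum(v) / len(v), len(v)) for m, v in sorted(by_month.items())}


# Made-up monthly temperature averages for one depth bin.
series = {(2000, 1): 2.0, (2001, 1): 4.0, (2000, 7): 15.0}
print(seasonal_cycle(series))
```

With this approach a January built from three observations influences the seasonal mean as much as a January built from three hundred.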

The Complete Profile option extracts every value of temperature and salinity referenced to depth, latitude, longitude and date. These files can be very large. Many people request it simply because they can. Make sure it is what you really require before requesting it. There are also some file size restrictions. See Caveats.

The Station Index option lists latitude, longitude and date/time for each profile selected.

The complete result set can consist of up to five ASCII text files. All files are of the form qry_xx.txt, where xx is your unique query identifier. Files are comma delimited with a header label for easy import into a spreadsheet or database application. The definition and explanation of each file follows (see detailed file description).

### (G) Run

After the query specification is complete, selecting Run will submit the query.