Cleans and converts laboratory data to the SNIRH (National Information System on Water Resources) import format. It handles data validation, unit conversions, station validation, and formatting according to SNIRH standards.
Arguments
- data
A data.frame or data.table containing the original laboratory data. Must contain the following columns in order: snirh_entity, station_name, station_id, sampling_date, parameter, unit, value.
- matrix
Character string specifying the type of matrix being processed. Must be one of: "surface.water" or "biota".
- validate_stations
Logical. Whether to validate station IDs against the SNIRH database. Defaults to TRUE. Set to FALSE for offline use, testing, or matrices that don't support validation.
Value
A data.table formatted for SNIRH import with the following structure:
- First row contains the network specification (REDE=NETWORK_NAME)
- Station identifiers (ESTACAO=STATION_ID) before each group of measurements
- Date/time stamps in DD/MM/YYYY HH:MM format
- Parameter values in SNIRH-compatible units and symbols
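As a minimal sketch of the timestamp and station-marker conventions listed above, the following base R snippet formats a POSIXct value as DD/MM/YYYY HH:MM and builds an ESTACAO= line. The exact column layout of the real output is not shown on this page, so this only illustrates the two string formats; the UTC time zone is an assumption made to keep the result reproducible.

```r
# Format a sampling timestamp the way the SNIRH import format expects
# (DD/MM/YYYY HH:MM). The time zone is fixed to UTC here only so the
# result does not depend on the local machine (an assumption).
sampling_date <- as.POSIXct("2024-01-15 10:30:00", tz = "UTC")
stamp <- format(sampling_date, "%d/%m/%Y %H:%M", tz = "UTC")

# Build the station marker that precedes each group of measurements.
station_line <- paste0("ESTACAO=", "01F/01")

print(stamp)
print(station_line)
```
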
Details
The function performs several key operations:
- Validates the input data structure and removes empty rows/columns
- Validates station IDs against the SNIRH database (for surface.water and biota)
- Checks for duplicate measurements (same station, date, and parameter)
- Extracts pH temperature measurements when present
- Converts measurement values to SNIRH-compatible units
- Handles measurement flags (<, >, =) and special values
- Formats output according to SNIRH import specifications
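The duplicate check in the list above can be sketched in base R as follows. This is an illustration of the idea (same station, date, and parameter), not the function's actual implementation; the second station ID and all values are made-up sample data.

```r
# Sample data; "02G/03" and the values are illustrative placeholders.
lab_data <- data.frame(
  station_id    = c("01F/01", "01F/01", "02G/03"),
  sampling_date = as.POSIXct(
    c("2024-01-15 10:30:00", "2024-01-15 10:30:00", "2024-01-15 11:00:00"),
    tz = "UTC"
  ),
  parameter     = c("pH - Campo", "pH - Campo", "pH - Campo"),
  value         = c("7.2", "7.4", "6.9"),
  stringsAsFactors = FALSE
)

# A measurement is a duplicate when station, date, and parameter all repeat.
key_cols <- c("station_id", "sampling_date", "parameter")
is_dup <- duplicated(lab_data[key_cols]) |
  duplicated(lab_data[key_cols], fromLast = TRUE)

# Keep every row involved in a duplicate so the user can see both values.
duplicates <- lab_data[is_dup, ]
print(duplicates)
```

Flagging both members of a duplicated pair (via `fromLast = TRUE`) makes it easier to decide which value to keep before rerunning the conversion.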
Station Validation
For surface.water and biota matrices, the function validates that:
- All station IDs exist in the SNIRH database
- All stations have status "ATIVA" (active)
- An internet connection is available for downloading station data
If validation fails, the function will stop and provide details about invalid stations that need to be corrected in the database.
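A rough sketch of this kind of check, assuming the station list has already been downloaded into a data.frame. The helper `check_stations()`, the `stations` table, its column names, the extra station IDs, and the non-active status label are all hypothetical and used only for illustration; only the "ATIVA" status comes from the description above.

```r
# Hypothetical stand-in for the station list obtained from SNIRH.
stations <- data.frame(
  station_id = c("01F/01", "02G/03"),
  status     = c("ATIVA", "EXTINTA"),  # "EXTINTA" is a placeholder label
  stringsAsFactors = FALSE
)

# Hypothetical helper: stop with details when any ID is unknown or not active.
check_stations <- function(ids, stations) {
  unknown  <- setdiff(ids, stations$station_id)
  known    <- stations[stations$station_id %in% ids, ]
  inactive <- known$station_id[known$status != "ATIVA"]
  if (length(unknown) > 0 || length(inactive) > 0) {
    stop(
      "Invalid stations. Unknown: ", paste(unknown, collapse = ", "),
      "; not active: ", paste(inactive, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Capture the error message instead of aborting, to show what is reported.
result <- tryCatch(
  check_stations(c("01F/01", "02G/03", "99Z/99"), stations),
  error = function(e) conditionMessage(e)
)
print(result)
```
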
Input Data Requirements
The input data must be a data.frame/data.table with exactly these columns:
- snirh_entity
Entity responsible for the data
- station_name
Human-readable station name
- station_id
Unique station identifier (must match SNIRH database)
- sampling_date
Date and time of sampling (POSIXct recommended)
- parameter
Parameter name as used in laboratory
- unit
Unit of measurement as used in laboratory
- value
Measured value (may include flags like <, >)
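Before calling the function, it can help to confirm that the columns are present in the expected order and to see how a flagged value splits into flag and number. The snippet below is a sketch in base R; the regular expressions are an assumption about how flags could be separated, not the package's own parsing code.

```r
required_cols <- c("snirh_entity", "station_name", "station_id",
                   "sampling_date", "parameter", "unit", "value")

lab_data <- data.frame(
  snirh_entity  = "APA",
  station_name  = "River station 1",
  station_id    = "01F/01",
  sampling_date = as.POSIXct("2024-01-15 10:30:00", tz = "UTC"),
  parameter     = "pH - Campo",
  unit          = "Escala Sorensen",
  value         = "7.2",
  stringsAsFactors = FALSE
)

# TRUE only when the names match exactly and in the same order.
cols_ok <- identical(names(lab_data), required_cols)

# Illustrative raw values with optional leading flags.
raw_values <- c("7.2", "<0.05", "> 100", "=3")

# Extract the leading flag (empty string when there is none) ...
flag <- sub("^[[:space:]]*([<>=]?).*$", "\\1", raw_values)
# ... and the numeric part that follows it.
number <- as.numeric(sub("^[[:space:]]*[<>=]?[[:space:]]*", "", raw_values))

print(cols_ok)
print(data.frame(raw_values, flag, number))
```
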
Parameter Conversion
The function relies on an internal parameters dataset that maps laboratory parameter names and units to their SNIRH equivalents. This dataset must contain conversion factors and SNIRH symbols for all parameters in the input data.
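The lookup described here can be pictured as a join between the laboratory data and a mapping table. The sketch below uses a made-up `param_map` table: its column names, the SNIRH symbols, the conversion factors, and the extra parameters are invented placeholders, since the real internal dataset is not shown on this page.

```r
# Hypothetical mapping table (placeholder symbols and factors).
param_map <- data.frame(
  parameter    = c("pH - Campo", "Nitrato"),
  unit         = c("Escala Sorensen", "ug/L"),
  snirh_symbol = c("PH_CAMPO", "NO3"),
  conv_factor  = c(1, 0.001),
  stringsAsFactors = FALSE
)

# Illustrative laboratory values; "Cloreto" has no entry in the mapping.
lab_data <- data.frame(
  parameter = c("pH - Campo", "Nitrato", "Cloreto"),
  unit      = c("Escala Sorensen", "ug/L", "mg/L"),
  value     = c(7.2, 1500, 12),
  stringsAsFactors = FALSE
)

# Left join on parameter and unit; unmatched rows get NA in the map columns.
merged <- merge(lab_data, param_map, by = c("parameter", "unit"), all.x = TRUE)

# Parameters without a mapping would need to be added before conversion.
unmapped <- merged$parameter[is.na(merged$conv_factor)]

# Apply the conversion factor to obtain the value in the target unit.
merged$snirh_value <- merged$value * merged$conv_factor

print(merged)
print(unmapped)
```

Matching on both parameter name and unit means the same parameter reported in two different laboratory units can carry two different conversion factors.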
Examples
# Example data structure
# \donttest{
lab_data <- data.table::data.table(
  snirh_entity = "APA",
  station_name = "River station 1",
  station_id = "01F/01", # Must be valid SNIRH station ID
  sampling_date = as.POSIXct("2024-01-15 10:30:00"),
  parameter = "pH - Campo",
  unit = "Escala Sorensen",
  value = "7.2"
)
# Convert surface water data (with station validation)
snirh_data <- convert_to_snirh(lab_data, "surface.water")
# Skip station validation if needed (not recommended)
snirh_data <- convert_to_snirh(lab_data, "surface.water",
                               validate_stations = FALSE)
# }
