8  A guide for BMS sampling design with R

In this section, we will provide an overview and explanation of the code used for the final step of the BMS sampling design. This report is designed to be self-contained and provides all the necessary steps for processing the data layers in conjunction with the GRTS algorithm to perform the sampling. The complete procedure, including code for data preparation and analysis, is organized in sequential R scripts and can be found in the GitHub repository. This specific report refers to the R script 08_runGRTS.R.

Setup

Below we list the packages used in the sampling process. Note that the code described in this report is only compatible with spsurvey R package versions 5.0 and above. For consistent results, we recommend loading the same environment used in this project, which is saved in the renv.lock file. The following steps can be used to load it:

Code
# install `renv` package if necessary
if (!require(renv)) install.packages('renv')
# Activate the project and restore the environment recorded in `renv.lock`
renv::activate(project = "/path/to/project")
renv::restore(project = "/path/to/project")

The first line of code loads (and installs if necessary) the R package renv, which manages the versions of the dependencies. The activate() function configures the project so that R uses a project-local library, and restore() installs all required packages with the versions recorded in renv.lock. If you are not already in the project directory, make sure to replace /path/to/project with the path to the directory where the project is stored.

Code
library(raster)
library(exactextractr)
library(tidyverse)
library(spatstat)
library(sf)
library(spsurvey)
library(MBHdesign)

# to reproduce the same results
set.seed(0.0)
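
Since the code requires spsurvey version 5.0 or above, it can be useful to confirm the installed version before going further. The check below is an optional sketch and is not part of the original script:

Code
# stop early if the installed spsurvey version is too old
stopifnot(packageVersion('spsurvey') >= '5.0.0')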

Custom parameters

This is the main section the user needs to interact with. Below is a list of all the variables that can be used to customize the sampling process, each accompanied by detailed comments.

Code
# Threshold to determine whether a hexagon is suitable for sampling is based on
# the proportion of NA pixels (non-natural habitat).
# `0.8` means a hexagon has to have at least 20% of habitat pixels to be
# suitable for sampling
prop_na = 0.8

# Sample effort (proportion of hexagons to sample; 0.02 = 2%)
sample_effort = 0.02

# Total sample size for SSU within a hexagon (Main + Over)
# It must be an even number
ssu_N = 6

# Number of replications when selecting the PSU with the GRTS algorithm
nb_rep = 15

# Code of ecoregions to sample
eco_sim = c(
    '7', '28', '30', '31',
    '46', '47', '48', '49',
    '73', '77', '78', '86',
    '72', '74', '75', '76',
    '96', '99', '100', '101',
    '102', '103', '117',
    '216', '217'
)

# File path of the csv file used to store the legacy sites
# E.g. https://github.com/willvieira/sampling_BMS/blob/main/data/legacySites.csv
# `lat` and `lon` define the names of the columns in the csv from which the
# legacy site coordinates are extracted
legacyFile = file.path('..', 'data', 'legacySites.csv')
lat = 'latitude'
lon = 'longitude'

# Buffer size (in Km) to adjust inclusion probability of hexagons around the
# legacy sites using the MBHdesign R package
bufferSize_p = 10

# Buffer size (in Km) to adjust the sample size of an ecoregion as a function of
# the number and distribution of its legacy sites
# The value provided can either be a single number applied to all ecoregions or
# a file path containing the buffer size information for each ecoregion.
# E.g.https://github.com/willvieira/sampling_BMS/blob/main/data/bufferSize_N.csv
bufferSize_N = file.path('..', 'data', 'bufferSize_N.csv')
# If you want to assign the same buffer size (in Km) for all ecoregions:
# bufferSize_N = 15

# Distance between SSU centroids (in meters)
ssu_dist = 294

# Path to save the shapefiles with the selected PSU and SSU
outputFolder = file.path('output', 'selection2023')

# suffix to add to each output layer
# e.g.: PSU-SOBQ_ALL-SUFFIX.shp
fileSuffix = 'V2023'

Prepare layers for sampling

The first step is loading the complete list of hexagons within the study area. While loading, we keep only the hexagons from the ecoregions of interest (eco_sim) and remove the hexagons with too high a proportion of NA pixels (non-natural habitat). We then compute the hexagon inclusion probability from the habitat and cost layers, normalized over the study area.

Code
hexas <- readRDS(file.path('..', 'data', 'hexa_complete.RDS')) |>
    filter(propNA <= prop_na) |>
    filter(ecoregion %in% eco_sim) |>
    mutate(
        p = (hab_prob * cost_prob) / sum(hab_prob * cost_prob)
    ) |>
    filter(p != 0)

The legacy sites are the second layer of information needed to run the GRTS. Below, the code loads the list of coordinate points of the legacy sites and creates a data frame with the number of legacy sites per hexagon. This procedure scales the point data to the hexagon level, allowing us to modify the inclusion probability of the neighbouring hexagons and adjust the ecoregion sample size.

Code
# function to transform latitude & longitude legacy site points into a table
# with the number of points per hexagon ID (ET_Index)
import_legacySites <- function(File, lat_name, lon_name)
{
    # transform hexagons projection to the same of the legacy points
    hx <- hexas |>
        st_transform(4326)
    
    # read legacy csv file
    lg <- read_csv(File, show_col_types = FALSE) |>
        rename(
            lat = all_of(lat_name),
            lon = all_of(lon_name)
        ) |>
        st_as_sf(
            coords = c('lon', 'lat'),
            crs = st_crs(hx)
        )

    # intersect legacy points with hexagon polygons
    nbLegacy <- hx |>
        st_contains(lg, sparse = FALSE) |>
        apply(1, sum)

    # Return a data.frame keeping only
    # the hexagons that contain legacy sites
    tibble(
        ET_Index = hx$ET_Index,
        nbLegacySites = nbLegacy
    ) |>
    filter(nbLegacySites > 0)
}

# load and transform legacy sites (slow function)
legacySites <- import_legacySites(
    File = legacyFile,
    lat_name = lat,
    lon_name = lon
)

# merge legacy sites data.frame into hexagon object
hexas <- hexas |>
    left_join(legacySites) |>
    mutate(nbLegacySites = replace_na(nbLegacySites, 0))
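
As an optional check (not part of the original script), a quick summary of the merged object shows how many suitable hexagons remain per ecoregion and how many of them contain at least one legacy site:

Code
# number of suitable hexagons and hexagons with legacy sites per ecoregion
hexas |>
    st_drop_geometry() |>
    group_by(ecoregion) |>
    summarise(
        nbHexagons = n(),
        nbWithLegacy = sum(nbLegacySites > 0)
    )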

Adjust sample size and inclusion probability as a function of legacy sites

After merging the hexagons and legacy sites, the next step is to adjust the sample size and modify the inclusion probabilities. The sample size, expressed as a number of hexagons, is defined as a function of the size of the ecoregion, the sampling effort (sample_effort), and the number of legacy sites. The input bufferSize_N quantifies the effect of each legacy site in reducing the final sample size. The value provided can either be a single number applied to all ecoregions or a file path to a file containing the buffer size for each individual ecoregion. For more details on defining the buffer size used to adjust the sample size, please refer to Chapter 5.
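
When bufferSize_N points to a csv file, the code below expects one row per ecoregion with the columns ecoregion and bufferSize (in km). For illustration only, here is a hypothetical example of that structure, together with a worked example of the sample size formula used in the function below:

Code
# hypothetical example of the expected `bufferSize_N` csv structure:
# one row per ecoregion, buffer size in km
tibble(
    ecoregion  = c('7', '28', '30'),
    bufferSize = c(15, 10, 20)
)

# worked example of the adjusted sample size (made-up numbers):
# 1000 suitable hexagons, 150 with their centroid inside a legacy buffer,
# and a 2% sampling effort -> round((1000 - 150) * 0.02) = 17 hexagons
round((1000 - 150) * 0.02)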

Code
# function to get sample size for a specific ecoregion given:
# number of hexagons, number of legacy sites, bufferSize, and sample effort
get_sampleSize <- function(eco, hx, bf_N, sample_e)
{
    # get the hexagon centroids for a specific ecoregion
    hexa_eco <- hx |>
        filter(ecoregion == eco) |>
        st_centroid()

    if(nrow(subset(bf_N, ecoregion == eco)) > 0) {
        # create a buffer of size `bufferSize_N` around the legacy site centroids
        hexa_legacy_bf <- hexa_eco |>
            filter(nbLegacySites > 0) |>
            st_buffer(subset(bf_N, ecoregion == eco)$bufferSize * 1000) |>
            st_union()

        # Compute the number of hexagons from the ecoregion in which their
        # centroid falls within the buffer around the legacy sites
        nbHexas_legacy <- hexa_eco |>
            st_intersects(hexa_legacy_bf) |>
            unlist() |>
            sum()
    }else{
        nbHexas_legacy = 0
    }

    # Compute the adjusted sample size using only the total amount of hexagons
    # that do not fall within the legacy buffer
    adj_sampleSize <- round((nrow(hexa_eco) - nbHexas_legacy) * sample_e, 0)

    # If the ecoregion is too small to accommodate 2 hexagons with the defined
    # sampling effort, ensure to sample at least two hexagons.
    if((nrow(hexa_eco) * sample_e) < 2 & nbHexas_legacy < 1)
        adj_sampleSize = 2

    return(adj_sampleSize)
}
Code
# Load buffer size information regardless of the `bufferSize_N` class
if(is.character(bufferSize_N)) {
    if(file.exists(bufferSize_N)) {
        buffSizeN <- read_csv(bufferSize_N, show_col_types = FALSE) |>
            mutate(ecoregion = as.character(ecoregion))
    }else{
        stop(
            paste0('File "', bufferSize_N,
            '" does not exist. Please check if the name is correct.')
        )
    }
}else if(is.numeric(bufferSize_N)){
    buffSizeN <- tibble(
        ecoregion = eco_sim,
        bufferSize = bufferSize_N
    )
}else{
    stop('The type of `bufferSize_N` must be either numeric or a character')
}
Code
# Run the function to define sample size for all selected ecoregions
sampleSize <- map_dbl(
    setNames(eco_sim, paste0('eco_', eco_sim)),
    get_sampleSize,
    hx = hexas,
    bf_N = buffSizeN,
    sample_e = sample_effort
)

# Define the stratum object with the ecoregions that have
# a sample size of at least 1
Stratdsgn  <- sampleSize[sampleSize > 0]

# Because we are sampling an extra hexagon for each selected one,
# make sure that each ecoregion has at least twice as many available
# hexagons as its sample size
eco_to_remove <- hexas |>
    st_drop_geometry() |>
    group_by(ecoregion) |>
    summarise(nbHexas = n()) |>
    left_join(
        sampleSize |>
            enframe() |>
            mutate(
                ecoregion = as.character(parse_number(name)),
                sampleSize_extra = value * 2
            ) |>
            select(ecoregion, sampleSize_extra)
    ) |>
    mutate(
        diff = nbHexas - sampleSize_extra
    ) |>
    filter(diff < 0) |>
    pull(ecoregion)

Stratdsgn <- Stratdsgn[!names(Stratdsgn) %in% paste0('eco_', eco_to_remove)]

# Prepare sample frame to be used in GRTS
# scale inclusion probability to the total sample size
# Use only the centroid of the hexagon as a sample point
sampleFrame <- hexas |>
    filter(ecoregion %in% parse_number(names(Stratdsgn))) |>
    mutate(    
        eco_name = paste0('eco_', ecoregion), # to match design name
        mdcaty  = sum(Stratdsgn) * p/sum(p),
        geometry = sf::st_geometry(sf::st_centroid(geometry))
)   

The last step involves adjusting the inclusion probabilities of the neighbouring hexagons around legacy sites. To do this, we use the MBHdesign R package and adjust the inclusion probability based on the bufferSize_p parameter.

Code
# Coordinates of all hexagons in matrix format for MBHdesign
coord_mt <- sampleFrame |>
    st_coordinates()

# Coordinates of legacy hexagons
legacySites <- sampleFrame |>
    filter(nbLegacySites > 0) |>
    st_coordinates()

# Adjust inclusion probability of neighbouring hexagons as a function of legacy sites
sampleFrame$adj_p <- MBHdesign::alterInclProbs(
    legacy.sites = legacySites,
    potential.sites = coord_mt,
    inclusion.probs = sampleFrame$mdcaty,
    sigma = bufferSize_p * 1000
)
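
As an optional sanity check (a sketch, not part of the original script), the adjusted probabilities should still sum to approximately the total sample size, while hexagons containing legacy sites should see their probability reduced:

Code
# total inclusion probability before and after the legacy adjustment
sum(sampleFrame$mdcaty)
sum(sampleFrame$adj_p)

# average probability of legacy hexagons before and after the adjustment
sampleFrame |>
    st_drop_geometry() |>
    filter(nbLegacySites > 0) |>
    summarise(
        mean_mdcaty = mean(mdcaty),
        mean_adj_p = mean(adj_p)
    )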

Primary Sampling Unit (PSU)

The spsurvey R package was used to implement the GRTS algorithm for sampling the PSUs of each ecoregion stratum. The sample size of each stratum, as defined in Stratdsgn, is used to sample the hexagon points from the sampleFrame object. Note that the same stratum sample sizes are used for sampling the replacement (n_over) sites for each ecoregion stratum. Within each stratum, the sample selection is weighted by the inclusion probability adj_p, which is derived from the habitat, cost, and legacy site layers. We replicate this sampling process nb_rep times, sum the total cost of the selected hexagons in each replication, and keep the replication with the lowest cost.

Code
# list to store the hexagons ID (ET_Index) for the main and the replacement
grts_main <- grts_over <- list()
# Run GRTS selection over nb_rep times
for(Rep in 1:nb_rep)
{
    out_sample <- spsurvey::grts(
        sframe = sampleFrame,
        n_base = Stratdsgn,
        n_over = Stratdsgn,
        stratum_var = 'eco_name',
        aux_var = 'adj_p'
    )

    grts_main[[paste0('Rep_', Rep)]] <- out_sample$sites_base$ET_Index
    grts_over[[paste0('Rep_', Rep)]] <- out_sample$sites_over$ET_Index
}

# Select cheapest replication (based on 'main' selection)
cheapest_rep <- map_df(
    grts_main,
    ~ hexas |>
        st_drop_geometry() |>
        filter(ET_Index %in% .x) |>
        summarise(totalCost = sum(costSum))
) |>
pull(totalCost) |>
which.min()

# Save selected main and replacement hexagons
selected_main <- hexas |>
    filter(ET_Index %in% grts_main[[cheapest_rep]])
selected_over <- hexas |>
    filter(ET_Index %in% grts_over[[cheapest_rep]])
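
An optional check (not in the original script) compares the number of selected main hexagons per ecoregion with the planned sample sizes stored in Stratdsgn:

Code
# number of selected main hexagons per ecoregion stratum
table(selected_main$ecoregion)
# planned sample size per stratum
Stratdsgn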

Because the replacement hexagons are randomly selected over the whole stratum, we also select an extra layer of replacement hexagons that are direct neighbours of each main hexagon. For each selected main hexagon, we extract all neighbouring hexagons and keep the one with the highest inclusion probability. If a selected main hexagon has no available neighbour, we select a random hexagon from the ecoregion instead.

Code
# object to store extra layer of selected hexagons
selected_extra <- hexas[0, ]

# loop over ecoregions to make sure neighbours are from the same ecoregion
for(eco in parse_number(names(Stratdsgn)))
{
    hexas_eco <- subset(hexas, ecoregion == eco)
    hexas_sel_eco <- subset(selected_main, ecoregion == eco)

    # Extract neighbouring hexagons
    neigh_mt <- hexas_eco |>
        st_centroid() |>
        st_intersects(
            y = st_buffer(hexas_sel_eco, dist = 3500),
            sparse = FALSE
        )

    # Remove the focal hexagon (the selected one)
    rr = match(hexas_sel_eco$ET_Index, hexas_eco$ET_Index)
    cc = seq_along(rr)
    neigh_mt[rr + nrow(neigh_mt) * (cc - 1)] <- FALSE

    # Select the extra hexagon based on the highest p
    best_p <- apply(
        neigh_mt,
        2,
        function(x)
            hexas_eco$ET_Index[x][which.max(hexas_eco$p[x])]
    )

    # if a selected hexagon has no neighbour, select a random one from the ecoregion
    toCheck <- unlist(lapply(best_p, length))
    if(any(toCheck == 0)) {
        # which hexagons were not selected? 
        nonSelected_hexas <- setdiff(hexas_eco$ET_Index, hexas_sel_eco$ET_Index)
        
        # sample from non selected hexas
        best_p[which(toCheck == 0)] <- sample(
            nonSelected_hexas, sum(toCheck == 0)
        )
    }
    
    selected_extra <- rbind(
        selected_extra,
        hexas_eco[match(best_p, hexas_eco$ET_Index), ]
    )
}

Secondary Sampling Unit (SSU)

For each of the selected PSU hexagons (main, over, and extra), we generate a grid of SSU points equally spaced at a distance defined by ssu_dist. The following function used to generate the SSUs comes from Ontario's sampling approach (source code).

Code
genSSU <- function(h, spacing)
{
    ch <- as_tibble(st_coordinates(h))
    top_point <- ch[which.max(ch$Y),]
    bottom_point <- ch[which.min(ch$Y),]
    gridsize <- 2*floor(abs(top_point$Y-bottom_point$Y)/spacing)+3
    rowAngle <- tanh((top_point$X-bottom_point$X)/(top_point$Y-bottom_point$Y))

    cent <- st_centroid(h) %>%
        bind_cols(as_tibble(st_coordinates(.))) %>%
        st_drop_geometry %>%
        dplyr::select(ET_Index, X, Y)

    genRow <- function(cX, cY, sp,...){
        tibble(rowid = seq(-gridsize,gridsize)) %>%
        mutate(X = sin(60*pi/180+rowAngle) *sp*rowid + {{cX}},
                Y = cos(60*pi/180+rowAngle) *sp*rowid  + {{cY}})
    }

    centroids <- tibble(crowid=seq(-gridsize,gridsize)) %>%
        mutate(cY = cos(rowAngle) *spacing*crowid + cent$Y,
            #spacing/2*crowid + cent$Y,
            cX =  sin(rowAngle) *spacing*crowid + cent$X) %>%
        #cent$X + crowid* sqrt(spacing**2-(spacing/2)**2)) %>%
        rowwise() |>
        mutate(row = list(genRow(cX = cX,cY = cY,sp = spacing))) %>%
        unnest(row) |>
        dplyr::select(X,Y) %>%
        st_as_sf(coords = c("X", "Y"), crs = st_crs(h)) %>%
        st_filter(h) %>%
        mutate(
            ET_Index = h$ET_Index,
            ecoregion = h$ecoregion,
            ssuID = row_number()
        )
    return(centroids)
}
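
For example, applying genSSU() to a single selected hexagon returns an sf object of SSU centroids with their ET_Index, ecoregion, and ssuID attributes. The following usage example is for illustration only:

Code
# SSU grid for the first selected main hexagon
ssu_example <- genSSU(selected_main[1, ], spacing = ssu_dist)
nrow(ssu_example) # number of SSU centroids falling inside the hexagon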

The next step is to define a function to sample the SSUs within each selected PSU hexagon. SSU sampling is done sequentially: after each SSU is selected, the first and second rings of neighbouring points (6 and 12 points, respectively) are made unavailable for the next draw, which the code implements as an exclusion buffer of twice ssu_dist (plus a 10% tolerance) around the sampled point. We repeat this process until we reach the total sample size (ssu_N) or there are no available SSUs left to sample.

Code
sample_SSU <- function(ssuid, prob, geom, filtered, ssuDist, N)
{
    # check if N is even
    if(N %% 2 != 0)
        stop('`ssu_N` must be an even number.')

    filtered_1 <- filtered
    out <- rep(0, length(filtered))

    # loop to sample N SSUs
    count = 1
    while(
        count <= N &
        sum(get(paste0('filtered_', count))) > 1
    ){
        # sample point
        assign(
            paste0('sample_', count),
            sample(
                ssuid[get(paste0('filtered_', count))],
                size = 1,
                prob = prob[get(paste0('filtered_', count))]
            )
        )

        # remove points around the sampled SSU before drawing the next one
        toKeep <- !st_intersects(
            geom,
            st_buffer(
                geom[which(get(paste0('sample_', count)) == ssuid)],
                dist = ssuDist * 2 + ssuDist * 0.1),
                sparse = FALSE
        )[, 1]

        # update available points
        assign(
            paste0('filtered_', count + 1),
            get(paste0('filtered_', count)) & toKeep
        )

        # assign point to output vector
        out[get(paste0('sample_', count))] = count
        
        count = count + 1
    }
    return( out )
}

With these two functions, we can loop over the main, over, and extra selected hexagons to create and sample SSU points. First, we create the SSU points for all selected hexagons. Then, we generate a buffer around each SSU point with a distance of ssu_dist/2. We use the habitat pixels within each buffer to calculate the inclusion probability and the proportion of NA pixels. To save computation time, we compute and store the six neighbour points around each SSU point. Before sampling the SSU points, we classify each point as available or not following these rules:

  • it must have 6 neighbours (less than that means it’s a border SSU)
  • it must have at least 1 - prop_na of non-empty pixels
  • 4 out of 6 neighbours must also meet these rules

In the final step, we assign a specific code to each selected SSU point and its neighbours, following the A_B pattern, where A is the SSU sample ID (ranging from 1 to ssu_N) and B is the neighbour ID (ranging from 0 to 6). Here, 0 represents the focal point, and 1 to 6 represent the six neighbours, starting from the top point and moving clockwise. Please refer to Chapter 7 for more details.
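
The exported attribute table keeps codeA and codeB as separate columns; if a single A_B label is needed during post-processing, it can be composed as in this hypothetical example (not part of the script):

Code
# e.g. focal point (B = 0) of the second sampled SSU (A = 2) -> "2_0"
paste(2, 0, sep = '_')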

Code
# Habitat raster used to calculate inclusion probability
land_ca <- raster(file.path('..', 'data', 'landcover_ca_30m.tif'))
# Distribution of habitat types per ecoregion
prev_all <- readRDS(file.path('..', 'data', 'prev_all.RDS'))

for(lyr in c('main', 'over', 'extra'))
{
    sel_lyr <- get(paste0('selected_', lyr))

    # Generate SSU points
    SSUs <- map_dfr(
        seq_len(nrow(sel_lyr)),
        ~ genSSU(sel_lyr[.x, ], spacing = ssu_dist)
    )

    # Buffer of half `ssu_dist` to compute habitat inclusion prob
    SSU_bf <- st_buffer(SSUs, dist = ssu_dist/2)

    # extract pixels for each SSU polygon
    hab_pixels <- exactextractr::exact_extract(
        land_ca,
        SSU_bf,
        progress = FALSE
    )
    rm(SSU_bf)

    # get the frequency of each habitat class
    count_hab <- Map(
                function(x, y) {
                    freq <- table(x$value)
                    if(length(freq) > 0) {
                        data.frame(freq, ecoregion = y)
                    }else{
                        NA
                    }
                },
                x = hab_pixels,
                y = SSUs$ecoregion
            )

    # merge with inclusion probability
    # and calculate inclusion probability for each NON empty polygon
    SSUs$incl_prob <- unlist(
            lapply(
                count_hab,
                function(x) {
                    if(is.data.frame(x)) {
                        mg_df <- merge(
                            x,
                            subset(
                                prev_all, ID_ecoregion == x$ecoregion[1]
                            ),
                            by.x = "Var1",
                            by.y = "code",
                            all.x = TRUE
                        )
                        sum(mg_df$Freq * mg_df$incl_prob)
                    }else{
                        NA
                    }
                }
            )
        )
    rm(count_hab)

    SSUs <- SSUs |>
        group_by(ET_Index) |>
        mutate(
            incl_prob = incl_prob/sum(incl_prob, na.rm = TRUE)
        ) |>
        ungroup()

    # Calculate proportion of NA
    SSUs$propNA <- map_dbl(
        hab_pixels,
        ~ sum(is.na(.x$value))/nrow(.x)
    )
    rm(hab_pixels)

    # neighbours matrix
    neighbour_ls <- list()
    for(hx in unique(SSUs$ET_Index))
    {
        ssuhx <- subset(SSUs, ET_Index == hx)

        neighbour_ls[[hx]] <- st_intersects(
            ssuhx,
            st_buffer(ssuhx, dist = ssu_dist + ssu_dist * 0.1),
            sparse = FALSE
        )
    }

    # Rules for an SSU to be suitable for sampling:
    # - Must have 6 neighbours (fewer than that means it's a border SSU)
    # - Must have at least 1 - `prop_na` of non-empty pixels
    # - 4 out of 6 neighbours must also respect the rules above
    SSU_selected = SSUs |>
        group_by(ET_Index) |>
        mutate(
            nbNeighb = map_dbl(
                ssuID,
                ~ sum(neighbour_ls[[unique(ET_Index)]][, .x]) - 1
            ),
            nbPropNA = map_int(
                ssuID,
                .f = function(x, pNA, ETI)
                    sum(
                        pNA[
                            setdiff(which(neighbour_ls[[ETI]][, x]), x)
                        ] <= prop_na
                    ),
                pNA = propNA,
                ETI = unique(ET_Index)
            ),
            sampled = sample_SSU(
                ssuid = ssuID,
                prob = incl_prob,
                geom = geometry,
                filtered = propNA <= prop_na & nbNeighb == 6  & nbPropNA >= 4,
                ssuDist = ssu_dist,
                N = ssu_N
            )
        ) |>
        ungroup()

    # Prepare selected SSU and their specific neighbours with code like `A_B`
    # `A` is for the SSU sample ID (1:`ssu_N`)
    # `B` is for the neighbour ID (0:6 where zero is the focal point, and
    # 1:6 are the 6 neighbours starting from the top point moving clockwise)
    SSU_lyr <- subset(SSU_selected, sampled > 0)
    SSU_lyr_ls <- list()

    for(i in 1:nrow(SSU_lyr))
    {
        # get neighbours for specific row
        nei_hx <- SSU_selected |> 
            filter(ET_Index == SSU_lyr$ET_Index[i]) |> 
            filter(
                ssuID %in% which(
                    neighbour_ls[[SSU_lyr$ET_Index[i]]][, SSU_lyr$ssuID[i]]
                )
            )

        # code A 
        nei_hx$codeA <- SSU_lyr$sampled[i]
        
        # code B
        nei_hx$codeB <- c(4, 3, 5, 0, 2, 6, 1)

        SSU_lyr_ls[[i]] <- nei_hx
    }

    # save
    assign(
        paste0('SSU_all_', lyr),
        SSU_selected
    )
    assign(
        paste0('SSU_', lyr),
        do.call(rbind, SSU_lyr_ls)
    )
}
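
After the loop, each layer has two objects: SSU_all_<layer> contains every generated SSU point, while SSU_<layer> contains only the selected points and their neighbours. An optional check (not in the original script) verifies that no hexagon exceeds ssu_N sampled SSUs:

Code
# number of sampled SSUs per hexagon in the main layer (should be <= ssu_N)
SSU_all_main |>
    st_drop_geometry() |>
    group_by(ET_Index) |>
    summarise(nSampled = sum(sampled > 0)) |>
    arrange(desc(nSampled))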

Export PSU and SSU samples in shapefiles

The purpose of this final section is to export all PSU and SSU points in shapefile format. This format was chosen to facilitate post-processing, but it can be changed to fit different needs. The first code chunk saves all PSU and SSU points into a single file, while the second code chunk splits the same sampled points by ecoregion.

Grouped ecoregions

Code
savePath = file.path(outputFolder, 'allEcoregion')
dir.create(savePath, recursive = TRUE)

varsToRm = c(
    'OBJECTID', 'Join_Count', 'TARGET_FID',
    'ET_ID', 'ET_ID_Old', 'ET_IDX_Old'
)

hexas_save <- hexas |>
    select(-all_of(varsToRm))

# abbreviate attribute table column names to a maximum of 10 characters (shapefile limit)
names(hexas_save) <- abbreviate(names(hexas_save), minlength = 10)

# add coordinates
coords <- hexas_save |>
    sf::st_centroid() |>
    sf::st_transform(4326) |>
    sf::st_coordinates() |>
    as.data.frame()

hexas_save <- hexas_save |>
    mutate(
        latitude = coords$Y,
        longitude = coords$X
    )

# save all PSU
hexas_save |>
    write_sf(
        file.path(
            savePath,
            paste0('PSU-SOBQ_ALL-', fileSuffix, '.shp')
        )
    )

# Selected PSUs
for(lyr in c('main', 'over', 'extra'))
{
    hexas_save |>
        filter(
            ET_Index %in% get(paste0('selected_', lyr))$ET_Index
        ) |>
        write_sf(
            file.path(
                savePath,
                paste0('PSU-SOBQ_', lyr, '-', fileSuffix, '.shp')
            )
        )
}

# Save all SSU
for(lyr in c('main', 'over', 'extra'))
{
    # all SSU points for this layer
    SSU_lyr <- get(paste0('SSU_all_', lyr)) |>
        select(-c('nbNeighb', 'nbPropNA', 'sampled'))

    coords <- SSU_lyr |>
        sf::st_transform(4326) |>
        sf::st_coordinates() |>
        as.data.frame()

    SSU_lyr |>
        mutate(
            latitude = coords$Y,
            longitude = coords$X
        ) |>
        write_sf(
            file.path(
                savePath,
                paste0('SSU-SOBQ_ALL_', lyr, '-', fileSuffix, '.shp')
            )
        )
}

# Save selected SSU
for(lyr in c('main', 'over', 'extra'))
{
    # selected SSU points for this layer
    SSU_lyr <- get(paste0('SSU_', lyr)) |>
        select(-c('nbNeighb', 'nbPropNA', 'sampled'))

    coords <- SSU_lyr |>
        sf::st_transform(4326) |>
        sf::st_coordinates() |>
        as.data.frame()

    SSU_lyr |>
        mutate(
            latitude = coords$Y,
            longitude = coords$X
        ) |>
        write_sf(
            file.path(
                savePath,
                paste0('SSU-SOBQ_selected_', lyr, '-', fileSuffix, '.shp')
            )
        )
}

Split ecoregions

Code
for(eco in eco_sim)
{
    # create folder
    eco_path <- file.path(
        outputFolder,
        'byEcoregion',
        paste0('ecoregion_', eco)
    )
    dir.create(eco_path, recursive = TRUE)

    # PSU
    ###########################################
    hexas_save |>
        filter(ecoregion == eco) |>
        write_sf(
            file.path(
                eco_path,
                paste0('PSU-SOBQ_eco', eco, '_ALL-', fileSuffix, '.shp')
            )
        )

    for(lyr in c('main', 'over', 'extra'))
    {
        hexas_save |>
            filter(
                ET_Index %in% subset(
                    get(paste0('selected_', lyr)),
                    ecoregion == eco
                )$ET_Index
            ) |>
            write_sf(
                file.path(
                    eco_path,
                    paste0(
                        'PSU-SOBQ_eco',
                        eco, '_',
                        lyr, '-',
                        fileSuffix,
                        '.shp'
                    )
                )
            )
    }

    # SSU
    ###########################################

    # Save all SSU
    for(lyr in c('main', 'over', 'extra'))
    {
        # all SSU points for this layer and ecoregion
        SSU_lyr <- get(paste0('SSU_all_', lyr)) |>
            filter(ecoregion == eco) |>
            select(-c('nbNeighb', 'nbPropNA', 'sampled'))

        coords <- SSU_lyr |>
            sf::st_transform(4326) |>
            sf::st_coordinates() |>
            as.data.frame()

        SSU_lyr |>
            mutate(
                latitude = coords$Y,
                longitude = coords$X
            ) |>
            write_sf(
                file.path(
                    eco_path,
                    paste0(
                        'SSU-SOBQ_eco',
                        eco,
                        '_ALL_',
                        lyr, '-',
                        fileSuffix,
                        '.shp'
                    )
                )
            )
    }

    # Save selected SSU
    for(lyr in c('main', 'over', 'extra'))
    {
        # selected SSU points for this layer and ecoregion
        SSU_lyr <- get(paste0('SSU_', lyr)) |>
            filter(ecoregion == eco) |>
            select(-c('nbNeighb', 'nbPropNA', 'sampled'))

        coords <- SSU_lyr |>
            sf::st_transform(4326) |>
            sf::st_coordinates() |>
            as.data.frame()

        SSU_lyr |>
            mutate(
                latitude = coords$Y,
                longitude = coords$X
            ) |>
            write_sf(
                file.path(
                    eco_path,
                    paste0(
                        'SSU-SOBQ_eco',
                        eco,
                        '_selected_',
                        lyr, '-',
                        fileSuffix,
                        '.shp'
                    )
                )
            )
    }
}
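
To verify the exports, the shapefiles can be read back with sf. Below is a minimal example for the grouped PSU layer, assuming the paths defined above:

Code
# read back the PSU layer containing all hexagons
psu_all <- sf::read_sf(
    file.path(
        outputFolder, 'allEcoregion',
        paste0('PSU-SOBQ_ALL-', fileSuffix, '.shp')
    )
)
nrow(psu_all)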