Package 'itsdm'

Title: Isolation Forest-Based Presence-Only Species Distribution Modeling
Description: Collection of R functions to do purely presence-only species distribution modeling with isolation forest (iForest) and its variations such as Extended isolation forest and SCiForest. See the details of these methods in references: Liu, F.T., Ting, K.M. and Zhou, Z.H. (2008) <doi:10.1109/ICDM.2008.17>, Hariri, S., Kind, M.C. and Brunner, R.J. (2019) <doi:10.1109/TKDE.2019.2947676>, Liu, F.T., Ting, K.M. and Zhou, Z.H. (2010) <doi:10.1007/978-3-642-15883-4_18>, Guha, S., Mishra, N., Roy, G. and Schrijvers, O. (2016) <https://proceedings.mlr.press/v48/guha16.html>, Cortes, D. (2021) <arXiv:2110.13402>. Additionally, Shapley values are used to explain model inputs and outputs. See details in references: Shapley, L.S. (1953) <doi:10.1515/9781400881970-018>, Lundberg, S.M. and Lee, S.I. (2017) <https://dl.acm.org/doi/abs/10.5555/3295222.3295230>, Molnar, C. (2020) <ISBN:978-0-244-76852-2>, Štrumbelj, E. and Kononenko, I. (2014) <doi:10.1007/s10115-013-0679-x>. itsdm also provides functions to diagnose variable response, analyze variable importance, draw spatial dependence of variables and examine variable contribution. As utilities, the package includes a few functions to download bioclimatic variables including 'WorldClim' version 2.0 (see Fick, S.E. and Hijmans, R.J. (2017) <doi:10.1002/joc.5086>) and 'CMCC-BioClimInd' (see Noce, S., Caporaso, L. and Santini, M. (2020) <doi:10.1038/s41597-020-00726-5>).
Authors: Lei Song [aut, cre], Lyndon Estes [ths]
Maintainer: Lei Song
License: MIT + file LICENSE
Version: 0.2.1
Built: 2024-11-02 03:57:46 UTC
Source: https://github.com/lleisong/itsdm

Help Index


Isolation forest-based presence-only species distribution modeling

Description

This package is a wrapper for a few packages including isotree, outliertree, fastshap, etc. It does purely presence-only species distribution modeling with isolation forest and variations such as SCiForest and EIF. It also provides functions to make response curves and to analyze variable importance, variable dependence, and variable contribution. As utilities, the package includes a few functions to download bioclimatic variables, including WorldClim version 2.0 and CMCC-BioClimInd. There are also functions to detect outliers in the occurrence dataset for data cleaning.

Details

This package provides multiple features.

  1. Download bioclimatic variables and reduce their dimensions. This includes historic and future climatic indicators from two sources, WorldClim and CMCC-BioClimInd.

  2. Detect suspicious environmental outliers.

  3. Fit an isolation forest-based SDM.

  4. Perform presence-only evaluation.

  5. Generate response curves of environmental variables including marginal and independent responses and analyze interactions between environmental variables.

  6. Analyze variable importance using Shapley values.

  7. Convert predicted environmental suitability to presence-absence map.

  8. Analyze variable contributions to any specific observation.

Author(s)

Lei Song

Maintainer: Lei Song

References

Please check references in R documentation of each specific function.


Download historic Bioclimatic indicators (BIOs) named CMCC-BioClimInd.

Description

Download historic CMCC-BioClimInd bioclimatic indicators, optionally clipped to a boundary, with a few other options.

Usage

cmcc_bioclim(bry = NULL, path = NULL, nm_mark = "clip", return_stack = TRUE)

Arguments

bry

(sf or sp) The boundary to mask the downloaded original data. If NULL, the global map is downloaded. If not NULL, it can take sf, sfc, SpatialPolygonsDataFrame, SpatialPolygons, etc. The default is NULL.

path

(character) The path to save the downloaded imagery. If NULL, the current working directory is used. The default is NULL.

nm_mark

(character) The name mark of clipped images. The default is "clip". It is ignored if bry is NULL.

return_stack

(logical) If TRUE, stack the imagery together and return the stack. If the area is large and the resolution is high, it is better not to stack them. The default is TRUE.

Details

Web page for this dataset

Value

If return_stack is TRUE, the images are returned as a stars object. Otherwise, nothing is returned, but a message reports where the images are saved.

Note

The function is experimental at the moment because the download server of this dataset is not yet as stable as WorldClim's. If it fails due to a slow internet connection, try setting a larger timeout option, e.g., using options(timeout = 1e3).

References

Noce, Sergio, Luca Caporaso, and Monia Santini. "A new global dataset of bioclimatic indicators." Scientific data 7.1 (2020): 1-12. doi:10.1038/s41597-020-00726-5

Examples

## Not run: 
library(dplyr)
library(sf)
library(itsdm)
bry <- st_polygon(
  list(rbind(c(29.34, -11.72), c(29.34, -0.95),
             c(40.31, -0.95), c(40.31, -11.72),
             c(29.34, -11.72)))) %>%
  st_sfc(crs = 4326)

cmcc_bios <- cmcc_bioclim(bry = bry,
  nm_mark = 'tza', path = tempdir())

## End(Not run)

Convert predicted suitability to presence-absence map.

Description

Use a threshold-based, logistic, or linear conversion method to convert a predicted suitability map to a presence-absence map.

Usage

convert_to_pa(
  suitability,
  method = "logistic",
  beta = 0.5,
  alpha = -0.05,
  a = 1,
  b = 0,
  species_prevalence = NA,
  threshold = 0.5,
  seed = 10L,
  visualize = TRUE
)

Arguments

suitability

(stars or RasterLayer) The suitability raster.

method

(character) The conversion method, must be one of 'threshold', 'logistic', and 'linear'. The default is 'logistic'.

beta

(numeric) Works for the 'threshold' and 'logistic' methods. If method is 'threshold', beta is the cutoff threshold. If method is 'logistic', it is the sigmoid midpoint. The default is 0.5.

alpha

(numeric) Works for the 'logistic' method. It is the logistic growth rate, or steepness of the curve. The default is -0.05.

a

(numeric) Works for linear method. It is the slope of the line. The default is 1.

b

(numeric) Works for linear method. It is the intercept of the line. The default is 0.

species_prevalence

(numeric or NA) Works for all three methods. It is the species prevalence used to classify the suitability map. If NA, it is calculated automatically based on the other arguments. The default is NA.

threshold

(numeric) The threshold used to convert the probability of occurrence to a presence-absence map. It must be within [0, 1]. The default is 0.5.

seed

(integer) The seed for the random process. The default is 10L.

visualize

(logical) If TRUE, plot map of suitability, probability of occurrence, and presence-absence together. The default is TRUE.

Details

Methods and arguments can be combined to perform the conversion.
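As a rough illustration of how the three methods map suitability to a probability of occurrence, the sketch below assumes the formulas used by convertToPA in package virtualspecies: logistic 1 / (1 + exp((x - beta) / alpha)), linear a * x + b clamped to [0, 1], and a step at beta for 'threshold'. It then applies the threshold argument to the probabilities. The helper names suit_to_prob and prob_to_pa are hypothetical and not part of itsdm. The real convert_to_pa also handles species_prevalence and raster inputs.

```r
# Sketch of suitability -> probability -> presence-absence conversion.
# Assumes the virtualspecies::convertToPA formulas; helper names are
# hypothetical and not part of itsdm.
suit_to_prob <- function(suit, method = c("logistic", "threshold", "linear"),
                         beta = 0.5, alpha = -0.05, a = 1, b = 0) {
  method <- match.arg(method)
  switch(method,
    # Step function: suitability above beta becomes probability 1.
    threshold = as.numeric(suit > beta),
    # Logistic curve with midpoint beta; a negative alpha makes it increasing.
    logistic = 1 / (1 + exp((suit - beta) / alpha)),
    # Straight line a * suit + b, clamped into [0, 1].
    linear = pmin(pmax(a * suit + b, 0), 1)
  )
}

prob_to_pa <- function(prob, threshold = 0.5) {
  # Presence (1) where the probability of occurrence reaches the threshold.
  as.integer(prob >= threshold)
}

suit <- c(0.1, 0.45, 0.5, 0.55, 0.9)
p_log <- suit_to_prob(suit, "logistic")
print(round(p_log, 3))
print(prob_to_pa(p_log))
```

At the midpoint (suitability equal to beta), the logistic curve gives a probability of exactly 0.5. The slope there is -1 / (4 * alpha), so an alpha closer to zero makes the transition steeper.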

Value

(PAConversion) A list of

  • suitability (stars) The input suitability map

  • probability_of_occurrence (stars) The map of occurrence probability

  • pa_conversion (list) A list of conversion arguments

  • pa_map (stars) The presence-absence map

References

convertToPA in package virtualspecies

See Also

plot.PAConversion

Examples

# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)

# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"

# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = "presence_only")

env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12, 16))

# With imperfect_presence mode,
mod <- isotree_po(
  obs_mode = "imperfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 5,
  sample_size = 0.8, ndim = 1L,
  nthreads = 1,
  seed = 123L, response = FALSE,
  spatial_response = FALSE,
  check_variable = FALSE)

# Threshold conversion
pa_thred <- convert_to_pa(mod$prediction,
                          method = 'threshold', beta = 0.5, visualize = FALSE)
pa_thred
plot(pa_thred)

## Not run: 
# Logistic conversion
pa_log <- convert_to_pa(mod$prediction, method = 'logistic',
                        beta = 0.5, alpha = -.05)

# Linear conversion
pa_lin <- convert_to_pa(mod$prediction, method = 'linear',
                        a = 1, b = 0)

## End(Not run)

Detect areas influenced by a changing environment variable.

Description

Use Shapley values to detect potential areas where a change in an environmental variable will affect the species distribution. It only works on continuous variables.

Usage

detect_envi_change(
  model,
  var_occ,
  variables,
  target_var,
  bins = NULL,
  shap_nsim = 10,
  seed = 10,
  var_future = NULL,
  variables_future = NULL,
  pfun = .pfun_shap,
  method = "gam",
  formula = y ~ s(x)
)

Arguments

model

(isolation_forest or other model) It can be the model item of a POIsotree object made by function isotree_po. It can also be any other user-fitted model, as long as pfun works on it.

var_occ

(data.frame, tibble) The data.frame style table that includes values of environmental variables at occurrence locations.

variables

(stars) The stars of environmental variables. It should have multiple attributes instead of dims. If you have a raster object instead, you can use st_as_stars to convert it to stars, or use read_stars to read the source data directly as a stars object. You can also use the variables item of a POIsotree object made by function isotree_po.

target_var

(character) The selected variable to process.

bins

(integer) The number of bins used to cut the target variable for the analysis. If NULL, no cut is applied. The default is NULL.

shap_nsim

(integer) The number of Monte Carlo repetitions in the SHAP method used to estimate each Shapley value. When the number of variables is large, a smaller shap_nsim can be used. Be cautious: computing SHAP-based spatial dependence is slow because the Monte Carlo computation runs for all pixels, but it is worth the time because it is much more informative. See details in the documentation of function explain in package fastshap. The default is 10. Usually a value of 10 to 20 is enough.

seed

(integer) The seed for any random process. The default is 10L.

var_future

(numeric or stars) A number to apply to the current variable or a stars layer as the future variable. It can be NULL if variables_future is set.

variables_future

(stars) A stars raster stack for future variables. It could be NULL if var_future is set.

pfun

(function) The predict function that requires two arguments, object and newdata. It is only required when model is not isolation_forest. The default is the wrapper function designed for iForest model in itsdm.

method

Argument passed on to geom_smooth to fit the line. Note that the same arguments are used for all target variables; to set them separately, run the function for one variable at a time. The default is "gam".

formula

Argument passed on to geom_smooth to fit the line. Note that the same arguments are used for all target variables; to set them separately, run the function for one variable at a time. The default is y ~ s(x).

Details

The values show how a change in an environmental variable affects the model prediction in space. These maps can help to answer where the prediction will be affected by a changing variable.
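To give intuition for what each Monte Carlo repetition (shap_nsim) does, the sketch below estimates a single Shapley value by permutation sampling in the style of Štrumbelj and Kononenko (2014), which the package cites. It is written for illustration only and is not the fastshap code. The function name shapley_mc and the toy linear model are hypothetical. The pfun(object, newdata) call mirrors the pfun argument described above.

```r
# Monte Carlo Shapley value of one feature for one observation, using the
# permutation-sampling scheme of Strumbelj and Kononenko (2014).
# A sketch for intuition, not the fastshap implementation.
shapley_mc <- function(object, pfun, x, X_bg, feature, nsim = 10, seed = 10) {
  set.seed(seed)
  cols <- colnames(X_bg)
  x <- x[cols]                          # align the observation with X_bg columns
  j <- match(feature, cols)
  to_df <- function(v) as.data.frame(as.list(v))
  contrib <- numeric(nsim)
  for (k in seq_len(nsim)) {
    z <- setNames(X_bg[sample.int(nrow(X_bg), 1), ], cols)  # random background row
    perm <- sample.int(length(cols))                         # random feature order
    before <- perm[seq_len(which(perm == j) - 1)]            # features preceding j
    x_minus <- z
    x_minus[before] <- x[before]        # preceding features taken from x
    x_plus <- x_minus
    x_plus[j] <- x[j]                   # same, plus the target feature from x
    contrib[k] <- pfun(object, to_df(x_plus)) - pfun(object, to_df(x_minus))
  }
  mean(contrib)                         # average marginal contribution
}

# Toy additive "model": a named weight vector and a matching pfun.
w <- c(bio1 = 2, bio12 = -1, bio5 = 0.5)
lin_pfun <- function(object, newdata) sum(object * unlist(newdata))
x_obs <- c(bio1 = 1, bio12 = 3, bio5 = 4)
X_bg <- matrix(c(0, 1, 2), nrow = 1, dimnames = list(NULL, names(w)))
phi <- sapply(names(w), function(v) shapley_mc(w, lin_pfun, x_obs, X_bg, v))
print(phi)
```

For an additive model with a single background row, the estimate is exact: each value equals w_j * (x_j - b_j), and the values sum to f(x) - f(b). With a real background sample and a non-additive model such as an isolation forest, each estimate is noisy, which is why shap_nsim trades accuracy against run time.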

Value

(EnviChange) A list of

  • A figure of fitted variable curve

  • A map of variable contribution change

  • Tipping points of variable contribution

  • A stars of variable contribution under current and future condition, and the detected changes

See Also

shap_spatial_response

Examples

# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)
# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = "presence_only")
env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 12))
# With imperfect_presence mode,
mod <- isotree_po(
  obs_mode = "imperfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 5,
  sample_size = 0.8, ndim = 1L,
  nthreads = 1,
  seed = 123L, response = FALSE,
  spatial_response = FALSE,
  check_variable = FALSE)

# Use a fixed value
bio1_changes <- detect_envi_change(
  model = mod$model,
  var_occ = mod$vars_train,
  variables = mod$variables,
  shap_nsim = 1,
  target_var = "bio1",
  var_future = 5)

## Not run: 
# Use a future layer
## Read the future Worldclim variables
future_vars <- system.file(
  'extdata/future_bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  split() %>% select(bioc1, bioc12)
# Rename the bands
names(future_vars) <- paste0("bio", c(1, 12))

## Just use the target future variable
climate_changes <- detect_envi_change(
  model = mod$model,
  var_occ = mod$vars_train,
  variables = mod$variables,
  shap_nsim = 1,
  target_var = "bio1",
  var_future = future_vars %>% select("bio1"))

## Use the whole future variable stack
bio12_changes <- detect_envi_change(
  model = mod$model,
  var_occ = mod$vars_train,
  variables = mod$variables,
  shap_nsim = 1,
  target_var = "bio12",
  variables_future = future_vars)

print(bio12_changes)

##### Use Random Forest model as an external model ########
library(randomForest)

# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>%
  filter(usage == "train")

env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12)) %>%
  split()

model_data <- stars::st_extract(
  env_vars, at = as.matrix(obs_df %>% select(x, y))) %>%
  as.data.frame()
names(model_data) <- names(env_vars)
model_data <- model_data %>%
  mutate(occ = obs_df[['observation']])
model_data$occ <- as.factor(model_data$occ)

mod_rf <- randomForest(
  occ ~ .,
  data = model_data,
  ntree = 200)

pfun <- function(X.model, newdata) {
  # for data.frame
  predict(X.model, newdata, type = "prob")[, "1"]
}

# Use a fixed value
bio5_changes <- detect_envi_change(
  model = mod_rf,
  var_occ = model_data %>% select(-occ),
  variables = env_vars,
  target_var = "bio5",
  bins = 20,
  var_future = 5,
  pfun = pfun)

plot(bio5_changes)

## End(Not run)

Remove environmental variables that have high correlation with others.

Description

Select environmental variables whose pairwise Pearson correlations are lower than a user-defined threshold. NOTE that it only works on numeric variables, not on categorical variables.

Usage

dim_reduce(
  img_stack = NULL,
  threshold = 0.5,
  preferred_vars = NULL,
  samples = NULL
)

Arguments

img_stack

(stars or RasterStack) The image stack to work on.

threshold

(numeric) The Pearson correlation threshold that indicates two variables are strongly correlated. The default is 0.5.

preferred_vars

(vector of character) The preferred variables, in order of priority. They are moved to the beginning before the reduction, so list them in order of preference. Setting preferred_vars does not guarantee that they are kept: for example, a preferred variable placed later is dropped if it is strongly correlated with an earlier preferred variable.

samples

(sf or sp) The sample points used for the dimension reduction. If not NULL, it can take sf, sfc, SpatialPointsDataFrame, SpatialPoints, etc. If NULL, the whole raster stack is used. The default is NULL.

Value

(ReducedImageStack) A list of

  • threshold (numeric) The threshold set in function inputs

  • img_reduced (stars) The image stack after dimension reduction

  • cors_original (data.frame) A table of Pearson correlations between all variables.

  • cors_reduced (data.frame) A table of Pearson correlations between variables after dimension reduction.
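The selection rule described above can be pictured as a greedy filter. Preferred variables go first, and each variable is kept only if it is weakly correlated with every variable already kept. The sketch below assumes absolute Pearson correlation and works on a plain data frame of sampled values. It is an illustration of the idea, not dim_reduce's actual implementation, and reduce_by_correlation is a hypothetical name.

```r
# Greedy correlation filter: a sketch of the idea behind dim_reduce(),
# not its actual implementation. Assumes absolute Pearson correlation.
reduce_by_correlation <- function(X, threshold = 0.5,
                                  preferred_vars = character(0)) {
  vars <- colnames(X)
  # Move preferred variables to the front, keeping the order given.
  vars <- c(intersect(preferred_vars, vars), setdiff(vars, preferred_vars))
  cors <- abs(cor(X[, vars], use = "pairwise.complete.obs"))
  keep <- character(0)
  for (v in vars) {
    # Keep v only if it is weakly correlated with every variable kept so far.
    if (all(cors[v, keep] < threshold)) keep <- c(keep, v)
  }
  keep
}

X <- data.frame(a = 1:10, b = 2 * (1:10) + 1, c = rep(c(1, -1), 5))
print(reduce_by_correlation(X, threshold = 0.5))
print(reduce_by_correlation(X, threshold = 0.5, preferred_vars = "b"))
```

Here a and b are perfectly correlated, so only whichever comes first survives. Listing b in preferred_vars flips the choice, which also shows why a later preferred variable can still be dropped.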

Examples

library(sf)
library(itsdm)
library(stars)
library(dplyr)
env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars()
img_reduced <- dim_reduce(env_vars, threshold = 0.7,
  preferred_vars = c('bio1', 'bio12'))

Evaluate the model based on presence-only data.

Description

This function calculates two major types of evaluation metrics for presence-only data. The first type is presence-only customized metrics, such as the Contrast Validation Index (CVI), the continuous Boyce index (CBI), and ROC_ratio. The second type is presence-background evaluation metrics, computed by using background points as pseudo-absence observations.

Usage

evaluate_po(
  model,
  occ_pred,
  bg_pred = NULL,
  var_pred,
  threshold = NULL,
  visualize = FALSE
)

Arguments

model

(isolation_forest) The extended isolation forest SDM. It can be the model item of a POIsotree object made by function isotree_po.

occ_pred

(vector of numeric) A vector containing predicted values at occurrence locations.

bg_pred

(vector of numeric) A vector containing predicted values at the background points, one value per point. The default is NULL.

var_pred

(vector of numeric) A vector containing predicted values over the whole area. It takes a vector to keep this function flexible for multiple types of output.

threshold

(numeric or NULL) The threshold to calculate threshold-based evaluation metrics. If NULL, a recommended threshold will be calculated based on optimal TSS value. The default is NULL.

visualize

(logical) If TRUE, plot the evaluation figures. The default is FALSE.

Details

  • CVI is the proportion of presence points falling in cells with habitat suitability above a threshold (e.g., 0.5) minus the proportion of all cells above that threshold. Here three thresholds are used: 0.25, 0.5, and 0.75.

  • The continuous Boyce index (CBI) is computed using moving windows at a resolution of 100 and the Kendall correlation method.

  • The ROC_ratio curve plots the proportion of presences falling above a range of thresholds against the proportion of cells falling above the same thresholds. The area under this modified ROC curve is called AUC_ratio.

  • Sensitivity (TPR) = TP/(TP + FN)

  • Specificity (TNR) = TN/(TN + FP)

  • True skill statistic (TSS) = Sensitivity + specificity - 1

  • Jaccard's similarity index = TP/(FN + TP + FP)

  • Sørensen's similarity index (F-measure) = 2TP/(FN + 2TP + FP)

  • Overprediction rate = FP/(TP + FP)

  • Underprediction rate = FN/(TP + FN)
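The threshold-based formulas above are easy to check by hand. The sketch below transcribes them from confusion-matrix counts and adds CVI at a single threshold. It assumes "above the threshold" means greater than or equal to. The function names pb_metrics and cvi are hypothetical, and this is not evaluate_po's internal code.

```r
# Presence-background metrics written out from the formulas above.
# A sketch for checking numbers by hand, not itsdm's internal code.
pb_metrics <- function(tp, fp, tn, fn) {
  sens <- tp / (tp + fn)                    # sensitivity (TPR)
  spec <- tn / (tn + fp)                    # specificity (TNR)
  c(sensitivity     = sens,
    specificity     = spec,
    TSS             = sens + spec - 1,
    jaccard         = tp / (fn + tp + fp),
    sorensen        = 2 * tp / (fn + 2 * tp + fp),
    overprediction  = fp / (tp + fp),
    underprediction = fn / (tp + fn))
}

# CVI at one threshold: share of presences at or above the threshold minus
# the share of all cells at or above it (">=" is an assumption).
cvi <- function(occ_pred, var_pred, threshold = 0.5) {
  mean(occ_pred >= threshold) - mean(var_pred >= threshold)
}

m <- pb_metrics(tp = 40, fp = 20, tn = 30, fn = 10)
print(round(m, 3))
occ <- c(0.9, 0.8, 0.6, 0.3)
cells <- c(0.9, 0.8, 0.6, 0.3, 0.2, 0.1, 0.4, 0.1)
print(sapply(c(0.25, 0.5, 0.75), function(t) cvi(occ, cells, t)))
```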

Value

(POEvaluation) A list of

  • po_evaluation is presence-only evaluation metrics. It is a list of

    • cvi (list) A list of CVI with 0.25, 0.5, and 0.75 as threshold

    • boyce (list) A list of items related to continuous Boyce index (CBI)

    • roc_ratio (list) A list of ROC ratio and AUC ratio

  • pb_evaluation is presence-background evaluation metrics. It is a list of

    • confusion matrix (table) A table of confusion matrix. The columns are true values, and the rows are predicted values.

    • sensitivity (numeric) The sensitivity or TPR

    • specificity (numeric) The specificity or TNR

    • TSS (list) A list of info related to true skill statistic (TSS)

      • cutoff (vector of numeric) A vector of cutoff threshold values

      • tss (vector of numeric) A vector of TSS for each cutoff threshold

      • Recommended threshold (numeric) A recommended threshold according to TSS

      • Optimal TSS (numeric) The best TSS value

    • roc (list) A list of ROC values and AUC value

    • Jaccard's similarity index (numeric) The Jaccard's similarity index

    • Sørensen's similarity index (numeric) The Sørensen's similarity index or F-measure

    • Overprediction rate (numeric) The Overprediction rate

    • Underprediction rate (numeric) The Underprediction rate

References

  • Peterson, A. Townsend, Monica Papeş, and Jorge Soberón. "Rethinking receiver operating characteristic analysis applications in ecological niche modeling." Ecological modelling 213.1 (2008): 63-72. doi:10.1016/j.ecolmodel.2007.11.008

  • Hirzel, Alexandre H., et al. "Evaluating the ability of habitat suitability models to predict species presences." Ecological modelling 199.2 (2006): 142-152. doi:10.1016/j.ecolmodel.2006.05.017

  • Hirzel, Alexandre H., and Raphaël Arlettaz. "Modeling habitat suitability for complex species distributions by environmental-distance geometric mean." Environmental management 32.5 (2003): 614-623. doi:10.1007/s00267-003-0040-3

  • Leroy, Boris, et al. "Without quality presence-absence data, discrimination metrics such as TSS can be misleading measures of model performance." Journal of Biogeography 45.9 (2018): 1994-2002. doi:10.1111/jbi.13402

See Also

print.POEvaluation, plot.POEvaluation

Examples

# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)

data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"

# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = "presence_only")

env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12, 16))

# With perfect_presence mode,
# which should be very rare in reality.
mod <- isotree_po(
  obs_mode = "perfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 10,
  sample_size = 0.8, ndim = 2L,
  seed = 123L, nthreads = 1,
  response = FALSE,
  spatial_response = FALSE,
  check_variable = FALSE)

# Without background samples or absences
eval_train <- evaluate_po(
  mod$model,
  occ_pred = mod$pred_train$prediction,
  var_pred = na.omit(as.vector(mod$prediction[[1]])))
print(eval_train)

# With background samples
bg_pred <- st_extract(
  mod$prediction, mod$background_samples) %>%
  st_drop_geometry()
eval_train <- evaluate_po(
  mod$model,
  occ_pred = mod$pred_train$prediction,
  bg_pred = bg_pred$prediction,
  var_pred = na.omit(as.vector(mod$prediction[[1]])))
plot(eval_train)

Format the occurrence dataset for usage in itsdm

Description

This function formats the dataset while keeping it as close to the original as possible. Users can modify the data as needed before passing it to this function.

Usage

format_observation(
  obs_df,
  eval_df = NULL,
  split_perc = 0.3,
  seed = 123,
  obs_crs = 4326,
  eval_crs = 4326,
  x_col = "x",
  y_col = "y",
  obs_col = "observation",
  obs_type = "presence_only"
)

Arguments

obs_df

(data.frame) The data.frame style table that includes the x and y coordinates and the observations of the training dataset. This parameter is required because it is the training dataset. Note: it only takes data.frame to reduce the risk of column name mismatch between data.frame and other formats such as tibble.

eval_df

(data.frame or NULL) The data.frame style table that includes the x and y coordinates and the observations of the evaluation dataset. Note: it only takes data.frame to reduce the risk of column name mismatch between data.frame and other formats such as tibble.

split_perc

(numeric) A number between 0 and 1 giving the proportion of data used to evaluate the models. Only required if eval_df is NULL.

seed

(integer) The seed to split train and evaluation set. The default value is 123. Only required if eval_df is NULL.

obs_crs

(integer, numeric, character, or crs) The EPSG code, CRS string, or sf::crs object of the coordinate system of the training dataset. It corresponds to x_col and y_col in obs_df.

eval_crs

(integer, numeric, character, or crs) The EPSG code, CRS string, or sf::crs object of the coordinate system of the evaluation dataset. Only required if eval_df is not NULL. It corresponds to x_col and y_col in eval_df if any.

x_col

(character) The name of the column holding the x coordinate in obs_df, and in eval_df if not NULL.

y_col

(character) The name of the column holding the y coordinate in obs_df, and in eval_df if not NULL.

obs_col

(character) The name of the column holding the observations in obs_df, and in eval_df if not NULL.

obs_type

(character) The type of observation to format to. Must be one of c("presence_only", "presence_absence"). Note that if "presence_only" is set, the absences in obs_df are deleted. This only affects obs_df; eval_df keeps its original type, whether it is an independent dataset or split from obs_df.

Value

(FormatOccurrence) A list of

  • obs (sf) The formatted observation points. The column of observations is "observation".

  • obs_type (character) The type of the observations, presence_only or presence_absence.

  • has_eval (logical) Whether an evaluation dataset is set or generated.

  • eval (sf) The formatted observation points for evaluation, if any. The column of observations is "observation".

  • eval_type (character) The type of the observations for evaluation, presence_only or presence_absence.
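When eval_df is NULL, split_perc and seed control how part of obs_df is held out for evaluation. The sketch below shows a reproducible random split of that kind on a plain data frame. It illustrates the idea only; format_observation may split differently (for example, by observation type), and split_obs is a hypothetical name.

```r
# Reproducible random train/evaluation split driven by split_perc and seed.
# Illustrates the idea only; format_observation() may split differently.
split_obs <- function(obs_df, split_perc = 0.3, seed = 123) {
  set.seed(seed)
  n <- nrow(obs_df)
  eval_idx <- sample.int(n, round(n * split_perc))
  # Logical mask avoids the obs_df[-integer(0), ] pitfall when split_perc = 0.
  is_eval <- seq_len(n) %in% eval_idx
  list(obs = obs_df[!is_eval, , drop = FALSE],
       eval = obs_df[is_eval, , drop = FALSE])
}

df <- data.frame(x = 1:10, y = 1:10, observation = 1)
parts <- split_obs(df, split_perc = 0.3)
print(c(train = nrow(parts$obs), eval = nrow(parts$eval)))
```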

See Also

print.FormatOccurrence

Examples

library(dplyr)
library(itsdm)
data("occ_virtual_species")

# obs + eval, presence-absence
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
obs_type <- "presence_absence"

obs <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = obs_type)

# obs + eval, presence-only
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
obs_type <- "presence_only"

obs <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = obs_type)

# obs + eval, different crs, presence-only
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
obs_crs <- 4326
# Fake one
eval_crs <- 20935
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
obs_type <- "presence_only"

obs <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  obs_crs = obs_crs, eval_crs = eval_crs,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = obs_type)

# obs + split, presence-absence
obs_df <- occ_virtual_species
split_perc <- 0.5
seed <- 123
obs_crs <- 4326
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
obs_type <- "presence_absence"

obs <- format_observation(
  obs_df = obs_df, split_perc = split_perc,
  x_col = x_col, y_col = y_col,
  obs_col = obs_col, obs_type = obs_type)

# obs, presence-only, no eval
obs_df <- occ_virtual_species
eval_df <- NULL
split_perc <- 0
seed <- 123
obs_crs <- 4326
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
obs_type <- "presence_only"

obs <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  split_perc = split_perc,
  x_col = x_col, y_col = y_col,
  obs_col = obs_col, obs_type = obs_type)

Download future Bioclimatic indicators (BIOs) named CMCC-BioClimInd.

Description

Download future CMCC-BioClimInd bioclimatic indicators projected by different Earth System Models (ESMs), optionally clipped to a boundary, with a few other options.

Usage

future_cmcc_bioclim(
  bry = NULL,
  path = NULL,
  esm = "CMCC-CESM",
  rcp = 85,
  interval = "2040-2079",
  nm_mark = "clip",
  return_stack = TRUE
)

Arguments

bry

(sf or sp) The boundary to mask the downloaded original data. If NULL, the global map is downloaded. If not NULL, it can take sf, sfc, SpatialPolygonsDataFrame, SpatialPolygons, etc. The default is NULL.

path

(character) The path to save the downloaded imagery. If NULL, the current working directory is used. The default is NULL.

esm

(character) The option for Earth System Models (ESMs). Should be one of "CMCC-CESM", "GFDL-ESM2M", "HadGEM2-ES", "IPSL-CM5A-LR", "MIROC-ESM-CHEM", "NorESM1-M". The default is "CMCC-CESM".

rcp

(numeric) The option of Representative Concentration Pathways (RCPs). Should be 45 or 85. Only 85 is available for CMCC-CESM. The default is 85.

interval

(character) The option for time interval. Should be one of "2040-2079", "2060-2099". The default is "2040-2079".

nm_mark

(character) The name mark of clipped images. The default is "clip". It is ignored if bry is NULL.

return_stack

(logical) If TRUE, stack the imagery together and return the stack. If the area is large and the resolution is high, it is better not to stack them. The default is TRUE.

Details

https://doi.pangaea.de/10.1594/PANGAEA.904278?format=html#download

Value

If return_stack is TRUE, the images are returned as a stars object. Otherwise, nothing is returned, but a message reports where the images are saved.

Note

The function is experimental at the moment because the download server of this dataset is not yet as stable as WorldClim's. If it fails due to a slow internet connection, try setting a larger timeout option, e.g., using options(timeout = 1e3).

References

Noce, Sergio, Luca Caporaso, and Monia Santini. "A new global dataset of bioclimatic indicators." Scientific data 7.1 (2020): 1-12. doi:10.1038/s41597-020-00726-5

Examples

## Not run: 
library(itsdm)
future_cmcc_bioclim(path = tempdir(),
  esm = 'GFDL-ESM2M', rcp = 45,
  interval = "2040-2079", return_stack = FALSE)

## End(Not run)

Download future climate data from WorldClim version 2.1.

Description

This function downloads WorldClim version 2.1 future climate files, optionally clipped to a boundary, with a few other options.

Usage

future_worldclim2(
  var = "tmin",
  res = 10,
  gcm = "BCC-CSM2-MR",
  ssp = "ssp585",
  interval = "2021-2040",
  bry = NULL,
  path = NULL,
  nm_mark = "clip",
  return_stack = TRUE
)

Arguments

var

(character) The option for the variable to download. Should be one of tmin, tmax, prec, bioc. The default is tmin.

res

(numeric) The option for the resolution of image to download. Should be one of 0.5, 2.5, 5, 10. The default is 10.

gcm

(character) The option for global climate models. Check https://www.worldclim.org for all available GCMs. The default is "BCC-CSM2-MR".

ssp

(character) The option for Shared Socio-economic Pathways. Should be one of "ssp126", "ssp245", "ssp370", "ssp585". The default is "ssp585".

interval

(character) The option for time interval. Should be one of "2021-2040", "2041-2060", "2061-2080", "2081-2100". The default is "2021-2040".

bry

(sf or sp) The boundary to mask the downloaded original data. If NULL, the global map is downloaded. If not NULL, it can take sf, sfc, SpatialPolygonsDataFrame, SpatialPolygons, etc. The default is NULL.

path

(character) The path to save the downloaded imagery. If NULL, the current working directory is used. The default is NULL.

nm_mark

(character) The name mark of clipped images. The default is "clip". It is ignored if bry is NULL.

return_stack

(logical) If TRUE, stack the imagery together and return the stack. If the area is large and the resolution is high, it is better not to stack them. The default is TRUE.

Details

Web page page for this dataset

Value

if return_stack is TRUE, the images would be returned as a stars. Otherwise, nothing to return, but the user would receive a message of where the images are.

Note

If the download fails due to a slow internet connection, try setting a larger timeout option, e.g., options(timeout = 1e3).
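Because a large download can fail partway through, it can help to check the options before calling future_worldclim2. The sketch below uses a hypothetical helper, check_wc2_args() (not part of itsdm), that validates var, res, ssp and interval against the values documented above, and raises the timeout as the Note suggests.

```r
# Hypothetical helper (not part of itsdm): check future_worldclim2()
# options against the documented values before a long download starts.
check_wc2_args <- function(var = "tmin", res = 10, ssp = "ssp585",
                           interval = "2021-2040") {
  var <- match.arg(var, c("tmin", "tmax", "prec", "bioc"))
  ssp <- match.arg(ssp, c("ssp126", "ssp245", "ssp370", "ssp585"))
  interval <- match.arg(interval, c("2021-2040", "2041-2060",
                                    "2061-2080", "2081-2100"))
  if (length(res) != 1 || !res %in% c(0.5, 2.5, 5, 10)) {
    stop("res must be one of 0.5, 2.5, 5, 10")
  }
  list(var = var, res = res, ssp = ssp, interval = interval)
}

# Raise the download timeout, as the Note suggests, then check the options
options(timeout = 1e3)
wc_args <- check_wc2_args("prec", 2.5, "ssp245", "2041-2060")
```

An invalid value such as res = 3 stops with an error before anything is downloaded.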

References

Fick, Stephen E., and Robert J. Hijmans. "WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas." International Journal of Climatology 37.12 (2017): 4302-4315. doi:10.1002/joc.5086

Examples

## Not run: 
future_worldclim2("tmin", 10, "BCC-CSM2-MR",
  "ssp585", "2021-2040",
  path = tempdir(), return_stack = FALSE)

## End(Not run)

Calculate independent responses of each variable.

Description

Calculate the independent responses of each variable within the model.

Usage

independent_response(model, var_occ, variables, si = 1000, visualize = FALSE)

Arguments

model

(Any predictive model) Here it is an isolation forest (isolation_forest). It could be the model item of a POIsotree object made by function isotree_po.

var_occ

(data.frame, tibble) The data.frame-style table that includes values of environmental variables at occurrence locations.

variables

(stars) The stars object of environmental variables. It should store the variables as multiple attributes rather than as dimensions. If you have a raster object instead, you could use st_as_stars to convert it to stars, or use read_stars to read the source data directly as stars. You could also use the variables item of a POIsotree object made by function isotree_po.

si

(integer) The number of samples to generate response curves. If it is too small, the response curves might be biased. The default value is 1000.

visualize

(logical) If TRUE, plot the response curves. The default is FALSE.

Details

The values show how each environmental variable independently affects the model prediction: they show how the prediction made using only this variable changes as the variable is varied.

Value

(IndependentResponse) A list of

  • responses_cont (list) A list of response values of continuous variables

  • responses_cat (list) A list of response values of categorical variables

References

  • Elith, Jane, et al. "The evaluation strip: a new and robust method for plotting predicted responses from species distribution models." Ecological Modelling 186.3 (2005): 280-289. doi:10.1016/j.ecolmodel.2004.12.007

See Also

plot.IndependentResponse

Examples

# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)

data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"

# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = "presence_only")

env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12, 16))

# With imperfect_presence mode,
mod <- isotree_po(
  obs_mode = "imperfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 10,
  sample_size = 0.8, ndim = 2L,
  seed = 123L, nthreads = 1,
  response = FALSE,
  spatial_response = FALSE,
  check_variable = FALSE)

independent_responses <- independent_response(
  model = mod$model,
  var_occ = mod$vars_train,
  variables = mod$variables)
plot(independent_responses)

Build an isolation forest species distribution model and explain the model and its outputs.

Description

Call isolation forest and its variations to do species distribution modeling and, optionally, a collection of other functions to explain the model.

Usage

isotree_po(
  obs_mode = "imperfect_presence",
  obs,
  obs_ind_eval = NULL,
  variables,
  categ_vars = NULL,
  contamination = 0.1,
  ntrees = 100L,
  sample_size = 1,
  ndim = 1L,
  seed = 10L,
  ...,
  offset = 0,
  response = TRUE,
  spatial_response = TRUE,
  check_variable = TRUE,
  visualize = FALSE
)

Arguments

obs_mode

(string) The mode of observations for training. It should be one of c("perfect_presence", "imperfect_presence", "presence_absence"). "perfect_presence" means presence-only occurrences without errors/uncertainties/bias, which should be rare in reality. "imperfect_presence" means presence-only occurrences with errors/uncertainties/bias, which should be the most common case. "presence_absence" means presence-absence observations regardless of quality. See details to learn how to set it. The default is "imperfect_presence".

obs

(sf) The sf of observation for training. It is recommended to call function format_observation to format the occurrence (obs) before passing it here. Otherwise, make sure there is a column named "observation" for observation.

obs_ind_eval

(sf or NULL) Optional sf of observations for independent test. It is recommended to call function format_observation to format the occurrence (obs) before passing it here. Otherwise, make sure there is a column named "observation" for observation. If NULL, no independent test set will be used. The default is NULL.

variables

(RasterStack or stars) The stack of environmental variables.

categ_vars

(vector of character or NULL) The names of categorical variables. They must match the names in variables. The default is NULL.

contamination

(numeric) The proportion of abnormal cases within a dataset. iForest is an outlier detection algorithm: it separates abnormal cases (much fewer) from normal cases. This argument sets how many abnormal cases there should be, if the user has the power to control it. See details for how to set it. The value should be less than 0.5; here it is constrained to (0, 0.3]. The default value is 0.1.

ntrees

(integer) The number of trees for the isolation forest. It must be an integer; use as.integer to convert if needed. The default is 100L.

sample_size

(numeric) The sampling rate, in [0, 1]. The default is 1.0.

ndim

(integer) ExtensionLevel for isolation forest, i.e., the number of variables combined in each split. It must be an integer; use as.integer to convert if needed. It must be no larger than the number of environmental variables. When it is 1, the model is a traditional isolation forest; otherwise, it is an extended isolation forest. The default is 1.

seed

(integer) The random seed used in the modeling. It should be an integer. The default is 10L.

...

Other arguments that isolation.forest needs.

offset

(numeric) The offset to adjust fitted suitability. The default is zero. It is highly recommended to leave it at the default.

response

(logical) If TRUE, generate response curves. The default is TRUE.

spatial_response

(logical) If TRUE, generate spatial response maps. The default is TRUE, although this step might be slow. NOTE that SHAP-based maps are not generated here because they are slow. To map them, call function spatial_response.

check_variable

(logical) If TRUE, check the variable importance. The default is TRUE.

visualize

(logical) If TRUE, generate the essential figures related to the model. The default is FALSE.
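The ndim argument controls how many variables enter each split. To illustrate the idea behind the extended isolation forest (Hariri et al. 2019), here is a minimal sketch of one random split with ndim non-zero coefficients. eif_split() is a hypothetical name, and isotree's own split rule (coefficient distribution, scaling) may differ.

```r
# Sketch of one extended-isolation-forest split: a random hyperplane with
# `ndim` non-zero normal coefficients through a random point in the data range.
eif_split <- function(X, ndim = 1L) {
  X <- as.matrix(X)
  p <- ncol(X)
  stopifnot(ndim >= 1L, ndim <= p)  # ndim cannot exceed the number of variables
  slope <- numeric(p)
  used <- sample.int(p, ndim)       # variables that enter this split
  slope[used] <- rnorm(ndim)
  # Random intercept point drawn within the range of each variable
  lo <- apply(X, 2, min)
  hi <- apply(X, 2, max)
  intercept <- runif(p, lo, hi)
  # TRUE = left branch, FALSE = right branch
  drop((X - matrix(intercept, nrow(X), p, byrow = TRUE)) %*% slope <= 0)
}

set.seed(1)
X <- matrix(rnorm(200), ncol = 4)   # 50 points, 4 hypothetical variables
left <- eif_split(X, ndim = 2L)
table(left)
```

With ndim = 1L only one coefficient is non-zero, so the hyperplane is perpendicular to a single axis; this is why ndim = 1 corresponds to the traditional isolation forest.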

Details

For "perfect_presence", a user-defined proportion (contamination) of samples is taken from the background so that iForest can function normally.

For "imperfect_presence", no further action is required.

If the obs_mode is "presence_absence", a contamination percent of absences will be randomly selected and work together with all presences to train the model.

NOTE: obs_mode only applies to obs; obs_ind_eval follows its own structure.
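The sampling rules above can be sketched in base R as follows. prepare_training() is hypothetical, and the way the number of added rows is derived from contamination is an assumption, not itsdm's documented rule.

```r
# Hypothetical sketch (not itsdm's code) of how training rows could be
# assembled for each obs_mode. ASSUMPTION: added background points or
# absences are sized so they form a `contamination` share of the training set.
prepare_training <- function(pres, extra, obs_mode, contamination = 0.1) {
  obs_mode <- match.arg(obs_mode, c("perfect_presence", "imperfect_presence",
                                    "presence_absence"))
  stopifnot(contamination > 0, contamination <= 0.3)
  # imperfect_presence: the presences are used as they are
  if (obs_mode == "imperfect_presence") return(pres)
  # perfect_presence: `extra` holds background samples;
  # presence_absence: `extra` holds absences
  n_extra <- min(round(nrow(pres) * contamination / (1 - contamination)),
                 nrow(extra))
  rbind(pres, extra[sample.int(nrow(extra), n_extra), , drop = FALSE])
}

set.seed(42)
pres <- data.frame(bio1 = rnorm(90, 22, 2), observation = 1)
absn <- data.frame(bio1 = rnorm(200, 15, 5), observation = 0)
train <- prepare_training(pres, absn, "presence_absence", contamination = 0.1)
table(train$observation)  # 90 presences plus 10 sampled absences
```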

Please read details of algorithm isolation.forest on https://github.com/david-cortes/isotree, and the R documentation of function isolation.forest.
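For background, the original isolation forest (Liu, Ting and Zhou 2008) scores a point from its mean path length across trees. The sketch below implements that published formula with hypothetical function names; isotree's standardized outlier score follows the same idea, but how itsdm maps scores to suitability is not shown here.

```r
# Anomaly score of the original isolation forest (Liu, Ting and Zhou 2008):
#   s(x, n) = 2^(-E[h(x)] / c(n)),
# where E[h(x)] is the mean path length of x over all trees and c(n) is the
# average path length of an unsuccessful search in a binary search tree of n points.
harmonic <- function(i) log(i) + 0.5772156649  # approximation of H(i)
c_n <- function(n) {
  ifelse(n > 2, 2 * harmonic(n - 1) - 2 * (n - 1) / n,
         ifelse(n == 2, 1, 0))
}
anomaly_score <- function(mean_path_length, n) {
  2^(-mean_path_length / c_n(n))
}

# Short paths (easy to isolate) give scores near 1; a path of average
# length c(n) gives exactly 0.5.
anomaly_score(c(2, c_n(256), 12), n = 256)
```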

Value

(POIsotree) A list of

  • model (isolation.forest) The trained isolation forest model

  • variables (stars) The formatted image stack of environmental variables

  • observation (sf) A sf of training occurrence dataset

  • background_samples (sf) A sf of background points for training dataset evaluation or SHAP dependence plot

  • independent_test (sf or NULL) A sf of test occurrence dataset

  • background_samples_test (sf or NULL) A sf of background points for test dataset evaluation or SHAP dependence plot

  • vars_train (data.frame) A data.frame with values of each environmental variable for training occurrences

  • pred_train (data.frame) A data.frame with values of prediction for training occurrences

  • eval_train (POEvaluation) A list of presence-only evaluation metrics based on the training dataset. See details of POEvaluation in evaluate_po

  • var_test (data.frame or NULL) A data.frame with values of each environmental variable for test occurrences

  • pred_test (data.frame or NULL) A data.frame with values of prediction for test occurrences

  • eval_test (POEvaluation or NULL) A list of presence-only evaluation metrics based on the test dataset. See details of POEvaluation in evaluate_po

  • prediction (stars) The predicted environmental suitability

  • marginal_responses (MarginalResponse or NULL) A list of marginal response values of each environmental variable. See details in marginal_response

  • offset (numeric) The offset value set as input.

  • independent_responses (IndependentResponse or NULL) A list of independent response values of each environmental variable. See details in independent_response

  • shap_dependences (ShapDependence or NULL) A list of variable dependence values of each environmental variable. See details in shap_dependence

  • spatial_responses (SpatialResponse or NULL) A list of spatial variable dependence values of each environmental variable. See details in spatial_response

  • variable_analysis (VariableAnalysis or NULL) A list of variable importance analysis based on multiple metrics. See details in variable_analysis

References

  • Liu, Fei Tony, Kai Ming Ting, and Zhi-Hua Zhou. "Isolation forest." 2008 Eighth IEEE International Conference on Data Mining. IEEE, 2008. doi:10.1109/ICDM.2008.17

  • Liu, Fei Tony, Kai Ming Ting, and Zhi-Hua Zhou. "Isolation-based anomaly detection." ACM Transactions on Knowledge Discovery from Data (TKDD) 6.1 (2012): 1-39. doi:10.1145/2133360.2133363

  • Liu, Fei Tony, Kai Ming Ting, and Zhi-Hua Zhou. "On detecting clustered anomalies using SCiForest." Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, Berlin, Heidelberg, 2010. doi:10.1007/978-3-642-15883-4_18

  • Hariri, Sahand, Matias Carrasco Kind, and Robert J. Brunner. "Extended isolation forest." IEEE Transactions on Knowledge and Data Engineering (2019). doi:10.1109/TKDE.2019.2947676

  • https://github.com/david-cortes/isotree

  • References of related feature such as response curves and variable importance will be listed under their own functions

See Also

evaluate_po, marginal_response, independent_response, shap_dependence, spatial_response, variable_analysis, isolation.forest

Examples

########### Presence-absence mode #################
library(dplyr)
library(sf)
library(stars)
library(itsdm)

# Load example dataset
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
obs_type <- "presence_absence"

# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = obs_type)

# Load variables
env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12))

# Modeling
mod_virtual_species <- isotree_po(
  obs_mode = "presence_absence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 10,
  sample_size = 0.6, ndim = 1L,
  seed = 123L, nthreads = 1)

# Check results
## Evaluation based on training dataset
print(mod_virtual_species$eval_train)
plot(mod_virtual_species$eval_train)

## Response curves
plot(mod_virtual_species$marginal_responses)
plot(mod_virtual_species$independent_responses,
     target_var = c('bio1', 'bio5'))
plot(mod_virtual_species$shap_dependences)

## Relationships between target var and related var
plot(mod_virtual_species$shap_dependences,
     target_var = c('bio1', 'bio5'),
     related_var = 'bio12', smooth_span = 0)

# Variable importance
mod_virtual_species$variable_analysis
plot(mod_virtual_species$variable_analysis)

########### Presence-only mode ##################
# Load example dataset
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"

# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = "presence_only")

# Modeling with perfect_presence mode
mod_perfect_pres <- isotree_po(
  obs_mode = "perfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 10,
  sample_size = 0.6, ndim = 1L,
  seed = 123L, nthreads = 1)

# Modeling with imperfect_presence mode
mod_imperfect_pres <- isotree_po(
  obs_mode = "imperfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 10,
  sample_size = 0.6, ndim = 1L,
  seed = 123L, nthreads = 1)

Boundary of mainland Africa

Description

The overall continental boundary of mainland Africa, queried from rnaturalearth and then processed.

Usage

mainland_africa

Format

An sf with one row and 2 fields, plus a geometry column

name

(character) The name of the polygon: Africa

area

(units) The overall area in km2, stored with units. This is not a consensus area, just the area calculated at this resolution.

geometry

(sfc) The simple polygon feature of the boundary

Source

rnaturalearth


Calculate marginal responses of each variable.

Description

Calculate the marginal responses of each variable within the model.

Usage

marginal_response(model, var_occ, variables, si = 1000, visualize = FALSE)

Arguments

model

(Any predictive model) In this package, it is an isolation forest (isolation_forest). It could be the model item of a POIsotree object made by function isotree_po.

var_occ

(data.frame, tibble) The data.frame-style table that includes values of environmental variables at occurrence locations.

variables

(stars) The stars object of environmental variables. It should store the variables as multiple attributes rather than as dimensions. If you have a raster object instead, you could use st_as_stars to convert it to stars, or use read_stars to read the source data directly as stars. You could also use the variables item of a POIsotree object made by function isotree_po.

si

(integer) The number of samples to generate response curves. If it is too small, the response curves might be biased. The default value is 1000.

visualize

(logical) If TRUE, plot the response curves. The default is FALSE.

Details

The values show how each environmental variable affects the model prediction: they show how the prediction changes as one variable is varied while all other environmental variables are kept at their sample mean. They might be hard to interpret if there are strongly correlated variables; the user could use function dim_reduce to remove strong correlations from the original environmental variable stack.
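A minimal sketch of this procedure, assuming continuous variables only and a hypothetical toy prediction function in place of the isolation forest:

```r
# Sketch of a marginal response curve: vary one variable over `si` evenly
# spaced values across its observed range while holding all other
# (continuous) variables at their sample mean. `predict_fun` is a stand-in
# for the model's prediction function.
marginal_curve <- function(predict_fun, var_occ, var, si = 1000) {
  grid <- seq(min(var_occ[[var]]), max(var_occ[[var]]), length.out = si)
  newdata <- as.data.frame(lapply(var_occ, function(col) rep(mean(col), si)))
  newdata[[var]] <- grid
  data.frame(value = grid, response = predict_fun(newdata))
}

# Hypothetical toy model: suitability peaks at bio1 = 22 and bio12 = 1000
toy_predict <- function(d) {
  exp(-((d$bio1 - 22) / 5)^2 / 2) * exp(-((d$bio12 - 1000) / 200)^2 / 2)
}
set.seed(1)
var_occ <- data.frame(bio1 = runif(100, 10, 30), bio12 = runif(100, 500, 1500))
curve_bio1 <- marginal_curve(toy_predict, var_occ, "bio1", si = 50)
```

Because bio12 is pinned at its mean, the curve only reflects bio1; if bio1 and bio12 were strongly correlated, such fixed-at-mean combinations could be unrealistic, which is the interpretation caveat mentioned above.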

Value

(MarginalResponse) A nested list of

  • responses_cont (list) A list of response values of continuous variables

  • responses_cat (list) A list of response values of categorical variables

References

  • Elith, Jane, et al. "The evaluation strip: a new and robust method for plotting predicted responses from species distribution models." Ecological Modelling 186.3 (2005): 280-289. doi:10.1016/j.ecolmodel.2004.12.007

See Also

plot.MarginalResponse

Examples

# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)

# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"

# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = "presence_only")

env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12, 16))

# With imperfect_presence mode,
mod <- isotree_po(
  obs_mode = "imperfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 10,
  sample_size = 0.8, ndim = 2L,
  seed = 123L, nthreads = 1,
  response = FALSE,
  spatial_response = FALSE,
  check_variable = FALSE)

marginal_responses <- marginal_response(
  model = mod$model,
  var_occ = mod$vars_train,
  variables = mod$variables)
plot(marginal_responses)

Occurrence dataset of a virtual species

Description

A pseudo presence-absence occurrence dataset of a virtual species made by package virtualspecies.

Usage

occ_virtual_species

Format

A data.frame with 300 rows and 4 fields

x

(numeric) The x coordinates of the records in WGS84 geographic coordinate system

y

(numeric) The y coordinates of the records in WGS84 geographic coordinate system

observation

(numeric) The observations of presence and absence.

usage

(character) The usage of the occurrences: either "train" for the training set or "eval" for the test set.

Details

The environmental niche of the virtual species is defined by its response functions to annual temperature and annual precipitation in mainland Africa. The response to annual temperature is a normal distribution with mean = 22 and standard deviation = 5. The response to annual precipitation is a normal distribution with mean = 1000 and standard deviation = 200. The suitability is then converted to a presence-absence map by logistic conversion with beta = 0.7, alpha = -0.05, and species prevalence = 0.27. Finally, 500 presence-absence points are sampled across the whole region and randomly split into a training set (0.7) and a test set (0.3).
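The simulation steps above can be approximated in base R. This is a sketch under stated assumptions, not the code used to build occ_virtual_species; the covariate values are random stand-ins for the African climate layers.

```r
# Sketch of the recipe above (not the exact virtualspecies code).
# ASSUMPTIONS: each normal response is rescaled to a peak of 1 and the two are
# multiplied; the logistic conversion is P = 1 / (1 + exp((s - beta) / alpha)),
# the form used by virtualspecies; the prevalence adjustment is omitted.
suitability <- function(temp, prec) {
  r_temp <- dnorm(temp, mean = 22, sd = 5) / dnorm(22, mean = 22, sd = 5)
  r_prec <- dnorm(prec, mean = 1000, sd = 200) / dnorm(1000, mean = 1000, sd = 200)
  r_temp * r_prec
}
prob_occurrence <- function(s, beta = 0.7, alpha = -0.05) {
  1 / (1 + exp((s - beta) / alpha))
}

set.seed(123)
n <- 500
temp <- runif(n, 10, 35)      # hypothetical annual temperature values
prec <- runif(n, 200, 2000)   # hypothetical annual precipitation values
observation <- rbinom(n, 1, prob_occurrence(suitability(temp, prec)))
# Random 70/30 split into training and test sets
usage <- ifelse(seq_len(n) %in% sample.int(n, round(0.7 * n)), "train", "eval")
```

With a negative alpha, the probability of occurrence rises with suitability and equals 0.5 where suitability equals beta.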

Source

virtualspecies


Display the figure and map of the EnviChange object.

Description

Show the response curve and the map of contribution change from detect_envi_change.

Usage

## S3 method for class 'EnviChange'
plot(x, ...)

Arguments

x

(EnviChange) An EnviChange object to plot. It could be the return of function detect_envi_change.

...

Not used.

Value

The same object that was passed as input.

See Also

detect_envi_change

Examples

# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)
# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = "presence_only")
env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12))
# With imperfect_presence mode,
mod <- isotree_po(
  obs_mode = "imperfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 10,
  sample_size = 0.8, ndim = 1L,
  seed = 123L, response = FALSE,
  spatial_response = FALSE,
  check_variable = FALSE)

# Use a fixed value
bio1_changes <- detect_envi_change(
  model = mod$model,
  var_occ = mod$vars_train,
  variables = mod$variables,
  shap_nsim = 1,
  target_var = "bio1",
  var_future = 5)

plot(bio1_changes)

Exhibit suspicious outliers in an observation dataset.

Description

Display observations and potential outliers diagnosed by function suspicious_env_outliers in a dataset.

Usage

## S3 method for class 'EnvironmentalOutlier'
plot(x, overlay_raster = NULL, pts_alpha = 0.5, ...)

Arguments

x

(EnvironmentalOutlier) The EnvironmentalOutlier object to plot. It could be the return of function suspicious_env_outliers.

overlay_raster

(RasterLayer or stars) The environmental raster to plot together with points.

pts_alpha

(numeric) The alpha (transparency) used by geom_sf to draw points. The default is 0.5.

...

Not used.

Value

A ggplot2 figure of the outlier distribution among all observations.

See Also

suspicious_env_outliers, print.EnvironmentalOutlier

Examples

library(dplyr)
library(sf)
library(stars)
library(itsdm)

data("occ_virtual_species")
env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12, 16))

occ_outliers <- suspicious_env_outliers(
  occ = occ_virtual_species, variables = env_vars,
  z_outlier = 3.5, outliers_print = 4L)

plot(occ_outliers)
plot(occ_outliers,
  overlay_raster = env_vars %>% slice('band', 1))

Show independent response curves.

Description

Plot independent response curves using ggplot2, optionally setting target variable(s).

Usage

## S3 method for class 'IndependentResponse'
plot(x, target_var = NA, smooth_span = 0.3, ...)

Arguments

x

(IndependentResponse) The independent response curve object to plot. It could be the return of function independent_response.

target_var

(vector of character) The target variable(s) to plot. It could be NA; if it is NA, all variables are plotted.

smooth_span

(numeric) The span value for the smooth fit in ggplot2. When it is 0, no smoothing is applied. The default is 0.3.

...

Not used.
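To see what smooth_span does, the sketch below smooths a noisy curve with stats::loess, assuming this matches the span-based smoother that ggplot2 draws; smooth_response() is a hypothetical helper.

```r
# Sketch of span-controlled smoothing, assuming a loess fit like the one
# ggplot2::geom_smooth(method = "loess", span = smooth_span) draws; a span of
# 0 returns the raw values, matching "no smoothing" above.
smooth_response <- function(x, y, smooth_span = 0.3) {
  if (smooth_span == 0) return(y)
  fit <- stats::loess(y ~ x, span = smooth_span)
  stats::predict(fit)
}

set.seed(7)
x <- seq(0, 10, length.out = 100)
y <- sin(x) + rnorm(100, sd = 0.3)   # noisy hypothetical response values
y_smooth <- smooth_response(x, y, smooth_span = 0.3)
```

A smaller span follows local wiggles more closely; a larger span gives a flatter, more averaged curve.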

Value

ggplot2 figure of response curves

See Also

independent_response

Examples

# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)

# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"

# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = "presence_only")

env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12, 16))

# With imperfect_presence mode,
mod <- isotree_po(
  obs_mode = "imperfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 20,
  sample_size = 0.8, ndim = 2L,
  seed = 123L, response = FALSE,
  spatial_response = FALSE,
  check_variable = FALSE)

independent_responses <- independent_response(
  model = mod$model,
  var_occ = mod$vars_train,
  variables = mod$variables)
plot(independent_responses)

Show marginal response curves.

Description

Plot marginal response curves using ggplot2, optionally setting target variable(s).

Usage

## S3 method for class 'MarginalResponse'
plot(x, target_var = NA, smooth_span = 0.3, ...)

Arguments

x

(MarginalResponse) The marginal response curve object to plot. It could be the return of function marginal_response.

target_var

(vector of character) The target variable(s) to plot. It could be NA; if it is NA, all variables are plotted.

smooth_span

(numeric) The span value for the smooth fit in ggplot2. When it is 0, no smoothing is applied. The default is 0.3.

...

Not used.

Value

ggplot2 figure of response curves

See Also

marginal_response

Examples

# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)

# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"

# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = "presence_only")

env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12, 16))

# With imperfect_presence mode,
mod <- isotree_po(
  obs_mode = "imperfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 20,
  sample_size = 0.8, ndim = 2L,
  seed = 123L, response = FALSE,
  spatial_response = FALSE,
  check_variable = FALSE)

marginal_responses <- marginal_response(
  model = mod$model,
  var_occ = mod$vars_train,
  variables = mod$variables)
plot(marginal_responses, target_var = 'bio1')

Display results of conversion to presence-absence (PA).

Description

Display the rasters of suitability and probability of occurrence, and the binary presence-absence map, produced by presence-absence (PA) conversion.
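As a rough illustration of the threshold method used in the example below (method = 'threshold', beta = 0.5), here is a sketch on a plain matrix. threshold_to_pa() is hypothetical, and whether convert_to_pa uses >= or > at the threshold is an assumption.

```r
# Sketch of threshold-based presence-absence conversion. ASSUMPTION: cells
# with suitability at or above `beta` become presences (1) and the rest
# absences (0); missing cells stay NA. convert_to_pa may apply its own rule.
threshold_to_pa <- function(suitability, beta = 0.5) {
  pa <- suitability
  pa[] <- as.numeric(suitability >= beta)  # keeps the matrix dimensions
  pa
}

suit <- matrix(c(0.1, 0.5, 0.9, NA), nrow = 2)  # hypothetical suitability grid
threshold_to_pa(suit, beta = 0.5)
```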

Usage

## S3 method for class 'PAConversion'
plot(x, ...)

Arguments

x

(PAConversion) The PAConversion object to plot. It could be the return of function convert_to_pa.

...

Not used.

Value

A patchwork of ggplot2 figures of suitability, probability of occurrence, and the presence-absence binary map.

See Also

convert_to_pa, print.PAConversion

Examples

# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)

# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"

# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = "presence_only")

env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12, 16))

# With imperfect_presence mode,
mod <- isotree_po(
  obs_mode = "imperfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 20,
  sample_size = 0.8, ndim = 2L,
  seed = 123L, response = FALSE,
  spatial_response = FALSE,
  check_variable = FALSE)

# Threshold conversion
pa_thred <- convert_to_pa(mod$prediction,
  method = 'threshold', beta = 0.5)
plot(pa_thred)

Show model evaluation.

Description

Display informative and detailed figures of continuous Boyce index, AUC curves, and TSS curve.
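The continuous Boyce index compares how often presences fall in a suitability window with how often the whole study area does. The sketch below is a simplified version (after Hirzel et al. 2006) on hypothetical data; boyce_index() is a hypothetical helper, its window count and width are assumptions that may differ from evaluate_po's settings, and its argument names mirror occ_pred and var_pred from the example.

```r
# Simplified continuous Boyce index. ASSUMPTIONS: 100 overlapping windows,
# each one tenth of the suitability range wide.
boyce_index <- function(occ_pred, var_pred, n_windows = 100) {
  rng <- range(var_pred)
  width <- diff(rng) / 10
  starts <- seq(rng[1], rng[2] - width, length.out = n_windows)
  ratio <- sapply(starts, function(s) {
    p <- mean(occ_pred >= s & occ_pred <= s + width)  # predicted frequency
    e <- mean(var_pred >= s & var_pred <= s + width)  # expected frequency
    if (e > 0) p / e else NA
  })
  mids <- starts + width / 2
  keep <- !is.na(ratio)
  # Spearman rank correlation between window midpoints and P/E ratios
  stats::cor(mids[keep], ratio[keep], method = "spearman")
}

set.seed(1)
bg <- runif(5000)                      # hypothetical predictions for all cells
pres <- sample(bg, 300, prob = bg^2)   # presences favour high suitability
cbi <- boyce_index(pres, bg)
```

Values near 1 indicate that presences become increasingly over-represented as predicted suitability rises; values near 0 indicate a model no better than random.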

Usage

## S3 method for class 'POEvaluation'
plot(x, ...)

Arguments

x

(POEvaluation) The presence-only evaluation object to plot. It could be the return of function evaluate_po.

...

Not used.

Value

A patchwork of ggplot2 figures of AUC_ratio, AUC_background and CBI.

See Also

evaluate_po, print.POEvaluation

Examples

# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)

# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"

# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = "presence_only")

env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12, 16))

# With imperfect_presence mode,
mod <- isotree_po(
  obs_mode = "imperfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 20,
  sample_size = 0.8, ndim = 2L,
  seed = 123L, response = FALSE,
  spatial_response = FALSE,
  check_variable = FALSE)

eval_train <- evaluate_po(
  mod$model,
  occ_pred = mod$pred_train$prediction,
  var_pred = na.omit(as.vector(mod$prediction[[1]])))

plot(eval_train)

Show variable dependence plots and variable interaction plots obtained from Shapley values.

Description

Plot Shapley value-based variable dependence curves using ggplot2, optionally selecting target variable(s). It can also plot the interaction between a related variable and the selected variable(s).

Usage

## S3 method for class 'ShapDependence'
plot(
  x,
  target_var = NA,
  related_var = NA,
  sample_prop = 0.3,
  sample_bin = 100,
  smooth_line = TRUE,
  seed = 123,
  ...
)

Arguments

x

(ShapDependence) The variable dependence object to plot. It could be the return of function shap_dependence.

target_var

(vector of character) The target variable(s) to plot. It could be NA; if it is NA, all variables are plotted.

related_var

(character) The dependent variable to plot together with target variables. It could be NA. If it is NA, no related variable will be plotted.

sample_prop

(numeric) The proportion of points to sample for plotting. It will be ignored if the number of points is less than 1000. The default is 0.3.

sample_bin

(integer) The number of bins to use for stratified sampling. The default is 100.

smooth_line

(logical) Whether to fit a smooth line. It will be ignored if the number of points is less than 1000. The default is TRUE.

seed

(integer) The seed for sampling. It will be ignored if the number of points is less than 1000. The default is 123.

...

Other arguments passed on to geom_smooth, mainly method and formula to fit the smooth line. Note that the same arguments are used for all target variables; to set them separately, plot the variables one at a time.

Details

If the number of samples is more than 1000, stratified sampling is used to thin the sample pool, and only the sampled subset is plotted. The user can set the proportion to sample (sample_prop) and the number of bins for stratified sampling (sample_bin).
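The thinning step can be sketched as follows. stratified_thin() is hypothetical and assumes equal-width bins on the target variable, which may not be exactly how the plot method stratifies.

```r
# Hypothetical sketch of stratified thinning: split the target variable into
# `sample_bin` equal-width bins and keep `sample_prop` of the points in each
# (at least one per non-empty bin). Pools of 1000 points or fewer are returned
# unchanged. Note that set.seed() here changes the global random state.
stratified_thin <- function(df, var, sample_prop = 0.3, sample_bin = 100,
                            seed = 123) {
  if (nrow(df) <= 1000) return(df)
  set.seed(seed)
  bins <- cut(df[[var]], breaks = sample_bin, include.lowest = TRUE)
  idx <- unlist(lapply(split(seq_len(nrow(df)), bins), function(i) {
    if (length(i) == 0) return(integer(0))
    i[sample.int(length(i), max(1, round(length(i) * sample_prop)))]
  }))
  df[sort(idx), , drop = FALSE]
}

set.seed(99)
df <- data.frame(bio1 = rnorm(5000), shap = rnorm(5000))  # hypothetical SHAP table
thin <- stratified_thin(df, "bio1")
nrow(thin)  # roughly 30% of 5000
```

Sampling within bins keeps sparse extremes of the target variable represented, which simple random thinning could drop.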

Value

ggplot2 figure of dependent curves

See Also

shap_dependence

Examples

# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)

# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"

# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = "presence_only")

env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12, 16))

# With imperfect_presence mode,
mod <- isotree_po(
  obs_mode = "imperfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 20,
  sample_size = 0.8, ndim = 2L,
  seed = 123L, response = FALSE,
  spatial_response = FALSE,
  check_variable = FALSE)

var_dependence <- shap_dependence(
  model = mod$model,
  var_occ = mod$vars_train,
  variables = mod$variables)
plot(var_dependence, target_var = 'bio1', related_var = 'bio12')

Display Shapley values-based spatial variable dependence maps.

Description

Plot Shapley values-based spatial variable dependence maps using ggplot2 by optionally setting target variable(s). This only works for SHAPSpatial even though it is part of SpatialResponse.
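Shapley values such as those mapped here are usually approximated by Monte Carlo sampling (Štrumbelj and Kononenko 2014); shap_nsim presumably sets the number of such repetitions. A minimal sketch for a single feature at a single location, with a hypothetical helper shapley_mc() and a toy model:

```r
# Monte Carlo estimate of one Shapley value, following the sampling scheme of
# Strumbelj and Kononenko (2014): average the change in prediction when
# `feature` joins a random coalition, using random background rows.
# `predict_fun` stands in for the model's prediction function.
shapley_mc <- function(predict_fun, X, x, feature, nsim = 100) {
  j <- match(feature, names(X))
  contrib <- numeric(nsim)
  for (m in seq_len(nsim)) {
    z <- X[sample.int(nrow(X), 1), , drop = FALSE]  # random background row
    perm <- sample.int(ncol(X))                     # random feature order
    before <- perm[seq_len(which(perm == j) - 1)]   # features ordered before j
    x_plus <- z
    x_plus[, c(before, j)] <- x[, c(before, j)]     # j and predecessors from x
    x_minus <- x_plus
    x_minus[, j] <- z[, j]                          # same, but j from background
    contrib[m] <- predict_fun(x_plus) - predict_fun(x_minus)
  }
  mean(contrib)
}

set.seed(1)
X <- data.frame(bio1 = rnorm(200, 22, 5), bio12 = rnorm(200, 1000, 200))
toy_fun <- function(d) exp(-((d$bio1 - 22) / 5)^2 / 2)  # ignores bio12
shapley_mc(toy_fun, X, X[1, ], "bio12", nsim = 50)      # 0, since bio12 is unused
```

Repeating this for every cell and target variable yields maps like those plotted here; more repetitions reduce Monte Carlo noise at the cost of run time.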

Usage

## S3 method for class 'SHAPSpatial'
plot(x, target_var = NA, ...)

Arguments

x

(SHAPSpatial) The spatial variable dependence object to plot. It could be the return of function shap_spatial_response.

target_var

(vector of character) The target variable(s) to plot. It could be NA; if it is NA, all variables are plotted.

...

Not used.

Value

ggplot2 figure of dependent maps

See Also

spatial_response

Examples

# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)

# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"

# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = "presence_only")

env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12))

# Fit the model in imperfect_presence mode
mod <- isotree_po(
  obs_mode = "imperfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 20,
  sample_size = 0.8, ndim = 2L,
  seed = 123L, response = FALSE,
  spatial_response = FALSE,
  check_variable = FALSE)

shap_spatial <- shap_spatial_response(
  model = mod$model,
  target_vars = c("bio1", "bio12"),
  var_occ = mod$vars_train,
  variables = mod$variables,
  shap_nsim = 1)

plot(shap_spatial)
plot(shap_spatial, target_var = "bio1")

Display spatial variable dependence maps.

Description

Plot spatial variable dependence maps using ggplot2, optionally restricted to selected target variable(s).

Usage

## S3 method for class 'SpatialResponse'
plot(x, target_var = NA, ...)

Arguments

x

(SpatialResponse) The spatial variable dependence object to plot, typically the object returned by spatial_response.

target_var

(character vector) The target variable(s) to plot. If NA (the default), all variables are plotted.

...

Not used.

Value

A ggplot2 figure of dependence maps.

See Also

spatial_response

Examples

# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)

# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"

# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = "presence_only")

env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12, 16))

# Fit the model in imperfect_presence mode
mod <- isotree_po(
  obs_mode = "imperfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 20,
  sample_size = 0.8, ndim = 2L,
  seed = 123L, response = FALSE,
  spatial_response = FALSE,
  check_variable = FALSE)

spatial_responses <- spatial_response(
  model = mod$model,
  var_occ = mod$vars_train,
  variables = mod$variables,
  shap_nsim = 10)
plot(spatial_responses)
plot(spatial_responses, target_var = 'bio1')

Display variable importance.

Description

Display detailed figures of variable importance from a VariableAnalysis object.

Usage

## S3 method for class 'VariableAnalysis'
plot(x, ...)

Arguments

x

(VariableAnalysis) The variable importance object to plot, typically the object returned by variable_analysis.

...

Not used.

Value

A patchwork of ggplot2 figures showing variable importance under multiple metrics.

See Also

variable_analysis, print.VariableAnalysis

Examples

# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)

# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"

# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = "presence_only")

env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12, 16))

# Fit the model in imperfect_presence mode
mod <- isotree_po(
  obs_mode = "imperfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 20,
  sample_size = 0.8, ndim = 2L,
  seed = 123L, response = FALSE,
  spatial_response = FALSE,
  check_variable = FALSE)

var_analysis <- variable_analysis(
  model = mod$model,
  pts_occ = mod$observation,
  pts_occ_test = mod$independent_test,
  variables = mod$variables)
plot(var_analysis)

Display variable contribution for target observations.

Description

Use ggplot2 to plot variable contribution for each target observation separately or summarize the overall variable contribution across all selected observations.

Usage

## S3 method for class 'VariableContribution'
plot(x, plot_each_obs = FALSE, num_features = 5, ...)

Arguments

x

(VariableContribution) The VariableContribution object to plot, typically the object returned by variable_contrib.

plot_each_obs

(logical) The plot type. If TRUE, plot the variable contribution of each observation separately. Otherwise, draw a violin plot summarizing variable contribution across all observations.

num_features

(integer) The number of most important features to plot. Only used when plot_each_obs is TRUE.

...

Not used.
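With plot_each_obs = TRUE, only the num_features most important variables are shown for each observation. The sketch below illustrates that selection step under the assumption that importance is ranked by absolute contribution; top_features() is a hypothetical helper, not part of itsdm.

```r
# Hypothetical helper (not part of itsdm): keep the k variables with the
# largest absolute contribution for one observation.
# Assumption: importance is ranked by absolute contribution.
top_features <- function(contrib, k = 5) {
  stopifnot(is.numeric(contrib), !is.null(names(contrib)))
  k <- min(k, length(contrib))
  ord <- order(abs(contrib), decreasing = TRUE)
  contrib[ord[seq_len(k)]]
}

contrib <- c(bio1 = 0.12, bio5 = -0.30, bio12 = 0.05, bio16 = -0.01)
top3 <- top_features(contrib, k = 3)
names(top3)  # "bio5" "bio1" "bio12"
```

Ranking by absolute value keeps strongly negative contributions, which are as informative as positive ones.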

Value

A ggplot2 figure of variable contribution.

See Also

variable_contrib

Examples

# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)

# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"

# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = "presence_only")

env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12, 16))

# Fit the model in imperfect_presence mode
mod <- isotree_po(
  obs_mode = "imperfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 20,
  sample_size = 0.8, ndim = 2L,
  seed = 123L, response = FALSE,
  spatial_response = FALSE,
  check_variable = FALSE)

var_contribution <- variable_contrib(
  model = mod$model,
  var_occ = mod$vars_train,
  var_occ_analysis = mod$vars_train %>% slice(1:10))

# Plot variable contribution to each observation
plot(var_contribution,
     plot_each_obs = TRUE,
     num_features = 3)

# Plot the summarized contribution
plot(var_contribution)

Print summary information from an EnviChange object.

Description

Display the detected tipping points and the percentage of area affected by a changing variable, as computed by function detect_envi_change.

Usage

## S3 method for class 'EnviChange'
print(x, ...)

Arguments

x

(EnviChange) An EnviChange object to print, typically the object returned by detect_envi_change.

...

Not used.

Value

The same object that was passed as input.
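Returning the input is the usual R convention for print methods: the method writes its summary and then returns x invisibly, so the call does not echo the object a second time while the result can still be assigned. The sketch below shows that general pattern with a hypothetical ToyResult class; it is not itsdm's code, and whether itsdm's methods use invisible() exactly this way is an assumption.

```r
# Hypothetical S3 class, used only to illustrate the print-method pattern.
print.ToyResult <- function(x, ...) {
  cat("ToyResult with", length(x$values), "values\n")
  invisible(x)  # return the input unchanged, without auto-printing it
}

res <- structure(list(values = 1:3), class = "ToyResult")
out <- print(res)    # prints "ToyResult with 3 values"
identical(out, res)  # TRUE
```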

See Also

detect_envi_change

Examples

# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)

# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"

# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = "presence_only")

env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12))

# Fit the model in imperfect_presence mode
mod <- isotree_po(
  obs_mode = "imperfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 10,
  sample_size = 0.8, ndim = 1L,
  seed = 123L, response = FALSE,
  spatial_response = FALSE,
  check_variable = FALSE)

# Use a fixed value
bio1_changes <- detect_envi_change(
  model = mod$model,
  var_occ = mod$vars_train,
  variables = mod$variables,
  shap_nsim = 1,
  target_var = "bio1",
  var_future = 5)

print(bio1_changes)

Print summary information from an EnvironmentalOutlier object.

Description

Display the environmental variable values of the detected outliers in the observations, compared with the corresponding mean values.

Usage

## S3 method for class 'EnvironmentalOutlier'
print(x, ...)

Arguments

x

(EnvironmentalOutlier) An EnvironmentalOutlier object to print, typically the object returned by suspicious_env_outliers.

...

Not used.

Value

The same object that was passed as input.

See Also

suspicious_env_outliers, plot.EnvironmentalOutlier

Examples

library(dplyr)
library(sf)
library(stars)
library(itsdm)

data("occ_virtual_species")
env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12, 16))

occ_outliers <- suspicious_env_outliers(
  occ = occ_virtual_species, variables = env_vars,
  z_outlier = 5, outliers_print = 4L)

print(occ_outliers)

Print summary information from a FormatOccurrence object.

Description

Display the observation type and the number of training and evaluation records in the formatted observations returned by function format_observation.

Usage

## S3 method for class 'FormatOccurrence'
print(x, ...)

Arguments

x

(FormatOccurrence) A FormatOccurrence object to print, typically the object returned by format_observation.

...

Not used.

Value

The same object that was passed as input.

See Also

format_observation

Examples

library(dplyr)
library(itsdm)
data("occ_virtual_species")

# obs + eval, presence-absence
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
obs_type <- "presence_absence"

obs_formatted <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = obs_type)

print(obs_formatted)

Print summary information from a PAConversion object.

Description

Display the equation and parameters of a PAConversion object.
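For method = 'threshold' (as in the example below), one common rule marks a cell as presence when its suitability reaches beta. The sketch assumes that rule rather than reproducing convert_to_pa(); threshold_pa() is a hypothetical helper, and whether the comparison is >= or > is also an assumption.

```r
# Hypothetical threshold conversion: 1 = presence, 0 = absence.
# Assumed rule: presence when suitability >= beta; NA cells stay NA.
threshold_pa <- function(suitability, beta = 0.5) {
  stopifnot(is.numeric(beta), beta >= 0, beta <= 1)
  as.integer(suitability >= beta)
}

threshold_pa(c(0.1, 0.5, 0.8, NA))  # 0 1 1 NA
```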

Usage

## S3 method for class 'PAConversion'
print(x, ...)

Arguments

x

(PAConversion) A PAConversion object to print, typically the object returned by convert_to_pa.

...

Not used.

Value

The same object that was passed as input.

See Also

convert_to_pa, plot.PAConversion

Examples

# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)

# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"

# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = "presence_only")

env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12, 16))

# Fit the model in imperfect_presence mode
mod <- isotree_po(
  obs_mode = "imperfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 20,
  sample_size = 0.8, ndim = 2L,
  seed = 123L, response = FALSE,
  spatial_response = FALSE,
  check_variable = FALSE)

# Threshold conversion
pa_thred <- convert_to_pa(mod$prediction, method = 'threshold', beta = 0.5)
print(pa_thred)

Print summary information from model evaluation object (POEvaluation).

Description

Display the most general and informative characteristics of a model evaluation object.
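Presence-only evaluation often uses a Boyce-type index, which asks whether presences fall in high-suitability cells more often than random background cells would. As an illustration only, the sketch computes a simplified fixed-bin Boyce index; boyce_fixed() and the simulated data are hypothetical, and this is not the computation evaluate_po() performs.

```r
# Hypothetical fixed-bin Boyce index. occ_pred: suitability at presences;
# bg_pred: suitability over all background cells.
boyce_fixed <- function(occ_pred, bg_pred, nbins = 10) {
  breaks <- seq(min(bg_pred), max(bg_pred), length.out = nbins + 1)
  bins_occ <- cut(occ_pred, breaks, include.lowest = TRUE)
  bins_bg <- cut(bg_pred, breaks, include.lowest = TRUE)
  p <- as.numeric(table(bins_occ)) / length(occ_pred)  # predicted frequency
  e <- as.numeric(table(bins_bg)) / length(bg_pred)    # expected frequency
  mids <- (head(breaks, -1) + tail(breaks, -1)) / 2
  keep <- e > 0
  # Spearman correlation between the P/E ratio and bin suitability
  cor(p[keep] / e[keep], mids[keep], method = "spearman")
}

set.seed(42)
bg_pred <- runif(1000)
occ_pred <- rbeta(200, 5, 1)  # presences concentrated at high suitability
cbi <- boyce_fixed(occ_pred, bg_pred)  # close to 1 for a well-ranked model
```

Values near 1 mean the P/E ratio rises steadily with suitability; values near 0 mean the model ranks presences no better than random.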

Usage

## S3 method for class 'POEvaluation'
print(x, ...)

Arguments

x

(POEvaluation) A presence-only evaluation object to print, typically the object returned by evaluate_po.

...

Not used.

Value

The same object that was passed as input.

See Also

evaluate_po, plot.POEvaluation

Examples

# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)

# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"

# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = "presence_only")

env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12, 16))

# Fit the model in imperfect_presence mode
mod <- isotree_po(
  obs_mode = "imperfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 20,
  sample_size = 0.8, ndim = 2L,
  seed = 123L, response = FALSE,
  spatial_response = FALSE,
  check_variable = FALSE)

eval_train <- evaluate_po(mod$model,
  occ_pred = mod$pred_train$prediction,
  var_pred = na.omit(as.vector(mod$prediction[[1]])))

print(eval_train)

Print summary information from a POIsotree object.

Description

Display the most general and informative characteristics of a fitted POIsotree object. It includes the model information, model evaluation, variable analysis, etc.

Usage

## S3 method for class 'POIsotree'
print(x, ...)

Arguments

x

(POIsotree) The POIsotree object to print, typically the object returned by isotree_po.

...

Not used.

Value

The same object that was passed as input.

See Also

isotree_po

Examples

# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)

# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"

# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = "presence_only")

env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12, 16))

# Fit the model in imperfect_presence mode
mod <- isotree_po(
  obs_mode = "imperfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 20,
  sample_size = 0.8, ndim = 2L,
  seed = 123L, response = FALSE,
  spatial_response = FALSE,
  check_variable = FALSE)
print(mod)

Print summary information from a ReducedImageStack object.

Description

Display the most general and informative characteristics of a ReducedImageStack object, including the correlation threshold, the original variables, the selected variables, and the correlations between the selected variables.
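The idea of keeping weakly correlated variables while favoring preferred ones can be sketched as a greedy pass over the correlation matrix. This is an assumed algorithm for illustration: greedy_select() is hypothetical, and dim_reduce() may select variables differently (for example, in how it treats preferred variables that are correlated with each other).

```r
# Hypothetical greedy selection: visit preferred variables first, then the
# rest, keeping a variable only if its absolute Pearson correlation with
# every variable already kept is below `threshold`.
greedy_select <- function(df, threshold = 0.7, preferred = character(0)) {
  r <- abs(cor(df, use = "pairwise.complete.obs"))
  candidates <- c(preferred, setdiff(colnames(df), preferred))
  keep <- character(0)
  for (v in candidates) {
    if (length(keep) == 0 || all(r[v, keep] < threshold)) {
      keep <- c(keep, v)
    }
  }
  keep
}

set.seed(1)
a <- rnorm(100)
df <- data.frame(a = a, b = a + rnorm(100, sd = 0.05), c = rnorm(100))
greedy_select(df, threshold = 0.7, preferred = "b")  # "b" "c"
```

Because b is preferred and almost identical to a, a is dropped while the independent c is kept.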

Usage

## S3 method for class 'ReducedImageStack'
print(x, ...)

Arguments

x

(ReducedImageStack) A ReducedImageStack object to print, typically the object returned by dim_reduce.

...

Not used.

Value

The same object that was passed as input.

See Also

dim_reduce

Examples

library(itsdm)
library(dplyr)
library(stars)
env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars()

img_reduced <- dim_reduce(env_vars, threshold = 0.7,
  preferred_vars = c('bio1', 'bio12'))

print(img_reduced)

Print summary information from variable importance object (VariableAnalysis).

Description

Display the non-graphical (text) summary of a VariableAnalysis object returned by function variable_analysis.

Usage

## S3 method for class 'VariableAnalysis'
print(x, ...)

Arguments

x

(VariableAnalysis) A variable importance object to print, typically the object returned by variable_analysis.

...

Not used.

Details

For the Jackknife test, positive values are printed as "/" and negative values as "\". For the Shapley value-based test, values are printed as "#", because Shapley-based importance has no negative values and the different symbol distinguishes it from the Jackknife test.
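The symbol convention can be sketched as a small text-bar helper. text_bar() is hypothetical, and scaling bar length by the largest absolute value is an assumption; print.VariableAnalysis may scale its bars differently.

```r
# Hypothetical helper (not itsdm's code): a text bar whose length is
# proportional to |value| / max_abs (assumed scaling).
# Jackknife values use "/" (positive) or "\" (negative); Shapley values use "#".
text_bar <- function(value, max_abs, width = 20, shap = FALSE) {
  n <- round(abs(value) / max_abs * width)
  symbol <- if (shap) "#" else if (value >= 0) "/" else "\\"
  strrep(symbol, n)
}

text_bar(0.5, max_abs = 1, width = 10)               # "/////"
text_bar(-0.2, max_abs = 1, width = 10)              # two backslashes
text_bar(0.3, max_abs = 1, width = 10, shap = TRUE)  # "###"
```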

Value

The same object that was passed as input.

See Also

variable_analysis, plot.VariableAnalysis

Examples

# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)

# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"

# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = "presence_only")

env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12, 16))

# Fit the model in imperfect_presence mode
mod <- isotree_po(
  obs_mode = "imperfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 20,
  sample_size = 0.8, ndim = 2L,
  seed = 123L, response = FALSE,
  spatial_response = FALSE,
  check_variable = FALSE)

var_analysis <- variable_analysis(
  model = mod$model,
  pts_occ = mod$observation,
  pts_occ_test = mod$independent_test,
  variables = mod$variables)

print(var_analysis)

Estimate suitability on a stars object using a trained isolation_forest model.

Description

Apply an isolation_forest model to a stars object to calculate environmental suitability, then apply a quantile stretch to [0, 1].
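One plausible reading of the quantile stretch is a linear rescaling in which a lower and an upper quantile of the predictions map to 0 and 1, with values beyond them clamped. The sketch assumes that form; quantile_stretch() and its probs argument are hypothetical, not itsdm's internal code.

```r
# Hypothetical quantile stretch: map the lower/upper quantiles of x to 0 and 1
# and clamp the tails. probs = c(0, 1) reduces to a plain min-max rescaling.
quantile_stretch <- function(x, probs = c(0, 1)) {
  q <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  out <- (x - q[1]) / (q[2] - q[1])
  pmin(pmax(out, 0), 1)  # NA inputs stay NA
}

scores <- c(0.2, 0.4, 0.5, 0.7, NA)
quantile_stretch(scores)  # approximately 0, 0.4, 0.6, 1, NA
```

A trimmed stretch such as probs = c(0.01, 0.99) would limit the influence of a few extreme pixels.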

Usage

probability(x, vars, offset = 0)

Arguments

x

(isolation_forest) The fitted model, for example the model item of a POIsotree object returned by isotree_po.

vars

(stars) The stack of environmental variables. It must have only x and y dimensions, with each variable stored as a separate attribute; otherwise, the function stops with an error.

offset

(numeric) The offset used to adjust the fitted suitability. The default is zero, and it is highly recommended to keep the default.

Value

A stars object of predicted habitat suitability.
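For background, the isolation forest of Liu, Ting and Zhou (2008) scores a point by its average isolation depth E[h(x)] across trees, normalized by c(n), the average path length of an unsuccessful binary-search-tree search over n points: s(x, n) = 2^(-E[h(x)] / c(n)). The sketch below computes that score. Treating suitability as 1 - s is an illustrative assumption, and, as the Description notes, probability() additionally stretches its output to [0, 1].

```r
# c(n): average path length of an unsuccessful BST search over n points
# (harmonic number approximated by log(i) + Euler's constant).
euler_gamma <- 0.5772156649
c_n <- function(n) {
  ifelse(n > 2, 2 * (log(n - 1) + euler_gamma) - 2 * (n - 1) / n,
         ifelse(n == 2, 1, 0))
}

# Anomaly score s(x, n) = 2^(-E[h(x)] / c(n)); values near 1 are anomalous.
anomaly_score <- function(mean_depth, n) 2^(-mean_depth / c_n(n))

n <- 256
anomaly_score(c_n(n), n)  # 0.5: average depth equal to c(n)

# Illustrative assumption: suitability as 1 - anomaly score, so points that
# take longer to isolate (deeper) look more suitable.
suitability <- 1 - anomaly_score(c(2, 8, 14), n)
```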

See Also

isotree_po

Examples

## Not run: 
# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)

# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"

# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = "presence_only")

env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 16))

# Fit the model in imperfect_presence mode
mod <- isotree_po(
  obs_mode = "imperfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 10,
  sample_size = 0.8, ndim = 2L,
  seed = 123L, nthreads = 1,
  response = FALSE,
  spatial_response = FALSE,
  check_variable = FALSE)

suit <- probability(mod$model, mod$variables)

## End(Not run)

Calculate Shapley value-based variable dependence.

Description

Calculate how a species responds to environmental variables using Shapley values.

Usage

shap_dependence(
  model,
  var_occ,
  variables,
  si = 1000,
  shap_nsim = 100,
  visualize = FALSE,
  seed = 10,
  pfun = .pfun_shap
)

Arguments

model

(isolation_forest or other model) The SDM, for example the model item of a POIsotree object returned by isotree_po. It can also be another user-fitted model, as long as pfun can make predictions with it.

var_occ

(data.frame, tibble) A data.frame-style table that includes the values of environmental variables at occurrence locations.

variables

(stars) The stars object of environmental variables. Variables should be stored as multiple attributes rather than as a dimension. If you have a raster object instead, convert it with st_as_stars, or read the source data directly as stars with read_stars. You can also use the variables item of a POIsotree object returned by isotree_po.

si

(integer) The number of samples used to generate response curves. If it is too small, the response curves might be biased. The default value is 1000.

shap_nsim

(integer) The number of Monte Carlo repetitions used by the SHAP method to estimate each Shapley value. When the number of variables is large, a smaller shap_nsim can be used. See details in the documentation of function explain in package fastshap. The default is 100.

visualize

(logical) If TRUE, plot the variable dependence plots. The default is FALSE.

seed

(integer) The seed for any random process. The default is 10.

pfun

(function) The predict function that requires two arguments, object and newdata. It is only required when model is not isolation_forest. The default is the wrapper function designed for iForest model in itsdm.
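A pfun for an external model only needs to accept object and newdata and return one numeric prediction per row of newdata. As a hedged example, the sketch wraps a logistic GLM fitted to simulated data; the data, mod_glm and pfun_glm are all hypothetical.

```r
# Simulated, hypothetical training data (not the package's dataset)
set.seed(1)
train <- data.frame(bio1 = rnorm(50), bio12 = rnorm(50))
train$occ <- rbinom(50, 1, plogis(1.5 * train$bio1))
mod_glm <- glm(occ ~ bio1 + bio12, data = train, family = binomial)

# pfun: two arguments, object and newdata; returns a numeric vector with
# one predicted probability per row of newdata.
pfun_glm <- function(object, newdata) {
  as.numeric(predict(object, newdata = newdata, type = "response"))
}

pred <- pfun_glm(mod_glm, train[1:5, c("bio1", "bio12")])
```

Such a function would then be supplied as pfun = pfun_glm together with model = mod_glm, in the same way as the randomForest example in this section.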

Details

The values show how each environmental variable independently affects the model prediction: they trace how the Shapley value of each variable changes as the value of that variable varies.
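The Monte Carlo estimate controlled by shap_nsim follows the sampling idea of Štrumbelj and Kononenko (2014): for each repetition, draw a random feature ordering and a random background row, then take the difference in prediction with and without the target feature's actual value. The sketch is a from-scratch illustration for a single observation; itsdm relies on explain in package fastshap for the real computation, and mc_shapley() is hypothetical.

```r
# Hypothetical, from-scratch Monte Carlo Shapley estimate for one feature of
# one observation x (a one-row data.frame), using `background` rows
# (e.g. var_occ) as reference values.
mc_shapley <- function(pfun, object, x, background, feature, nsim = 100) {
  j <- match(feature, names(background))
  phi <- numeric(nsim)
  for (s in seq_len(nsim)) {
    z <- background[sample.int(nrow(background), 1), , drop = FALSE]
    perm <- sample.int(ncol(background))
    # Features that precede j in the random ordering take values from x
    before <- perm[seq_len(match(j, perm) - 1)]
    x_plus <- z
    x_minus <- z
    x_plus[c(before, j)] <- x[c(before, j)]
    if (length(before) > 0) x_minus[before] <- x[before]
    phi[s] <- pfun(object, x_plus) - pfun(object, x_minus)
  }
  mean(phi)
}

# Check on a linear model, where each draw gives beta_j * (x_j - z_j)
pfun_lin <- function(object, newdata) sum(unlist(newdata) * object)
beta <- c(a = 2, b = -1, c = 0.5)
x <- data.frame(a = 1, b = 3, c = 2)
background <- data.frame(a = 0, b = 0, c = 0)
set.seed(10)
mc_shapley(pfun_lin, beta, x, background, "b", nsim = 10)  # -3
```

Larger nsim reduces Monte Carlo noise at a proportional cost in model predictions, which is why the argument matters when many variables are involved.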

Value

(ShapDependence) A list of

  • dependences_cont (list) A list of Shapley values of continuous variables

  • dependences_cat (list) A list of Shapley values of categorical variables

  • feature_values (data.frame) A table of feature values

See Also

plot.ShapDependence, explain in fastshap

Examples

# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)

data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"

# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = "presence_only")

env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12, 16))

# Fit the model in imperfect_presence mode
mod <- isotree_po(
  obs_mode = "imperfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 10,
  sample_size = 0.8, ndim = 2L,
  seed = 123L, nthreads = 1,
  response = FALSE,
  spatial_response = FALSE,
  check_variable = FALSE)

var_dependence <- shap_dependence(
  model = mod$model,
  var_occ = mod$vars_train,
  variables = mod$variables)
plot(var_dependence, target_var = "bio1", related_var = "bio16")


## Not run: 
##### Use Random Forest model as an external model ########
library(randomForest)
# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>%
  filter(usage == "train")

env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12)) %>%
  split()

model_data <- stars::st_extract(
  env_vars, at = as.matrix(obs_df %>% select(x, y))) %>%
  as.data.frame()
names(model_data) <- names(env_vars)
model_data <- model_data %>%
  mutate(occ = obs_df[['observation']])
model_data$occ <- as.factor(model_data$occ)

mod_rf <- randomForest(
  occ ~ .,
  data = model_data,
  ntree = 200)

pfun <- function(X.model, newdata) {
  # for data.frame
  predict(X.model, newdata, type = "prob")[, "1"]
}

shap_dependences <- shap_dependence(
  model = mod_rf,
  var_occ = model_data %>% select(-occ),
  variables = env_vars,
  visualize = FALSE,
  seed = 10,
  pfun = pfun)

## End(Not run)

Calculate Shapley value-based spatial response.

Description

Calculate SHAP-based spatial response maps. They can help to diagnose both how and where the species responds to environmental variables.

Usage

shap_spatial_response(
  model,
  var_occ,
  variables,
  target_vars = NULL,
  shap_nsim = 10,
  seed = 10,
  pfun = .pfun_shap
)

Arguments

model

(isolation_forest or other model) The fitted model, for example the model item of a POIsotree object returned by isotree_po. It can also be another user-fitted model, as long as pfun can make predictions with it.

var_occ

(data.frame, tibble) A data.frame-style table that includes the values of environmental variables at occurrence locations.

variables

(stars) The stars object of environmental variables. Variables should be stored as multiple attributes rather than as a dimension. If you have a raster object instead, convert it with st_as_stars, or read the source data directly as stars with read_stars. You can also use the variables item of a POIsotree object returned by isotree_po.

target_vars

(character vector) The selected variables to process. If NULL, all variables are used.

shap_nsim

(integer) The number of Monte Carlo repetitions used by the SHAP method to estimate each Shapley value. When the number of variables is large, a smaller shap_nsim can be used. Be aware that computing SHAP-based spatial dependence is slow because the Monte Carlo computation runs for every pixel, but it is usually worth the time because the result is much more informative. See details in the documentation of function explain in package fastshap. The default is 10; a value of 10 to 20 is usually enough.

seed

(integer) The seed for any random process. The default is 10L.

pfun

(function) The predict function that requires two arguments, object and newdata. It is only required when model is not isolation_forest. The default is the wrapper function designed for iForest model in itsdm.

Details

The values show how each environmental variable affects the model prediction in space. These maps can help answer where, geographically, the species responds to each environmental variable.

Value

(SHAPSpatial) A list of stars objects of the spatial SHAP-based response of each variable.

See Also

spatial_response

Examples

# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)

# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"

# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = "presence_only")

env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12))

# Fit the model in imperfect_presence mode
mod <- isotree_po(
  obs_mode = "imperfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 10,
  sample_size = 0.8, ndim = 2L,
  seed = 123L, nthreads = 1,
  response = FALSE,
  spatial_response = FALSE,
  check_variable = FALSE)

# Compute SHAP-based responses for all variables
shap_spatial <- shap_spatial_response(
  model = mod$model,
  var_occ = mod$vars_train,
  variables = mod$variables,
  shap_nsim = 1)

# Restrict the computation to selected variables
shap_spatial <- shap_spatial_response(
  model = mod$model,
  target_vars = c("bio1", "bio12"),
  var_occ = mod$vars_train,
  variables = mod$variables,
  shap_nsim = 1)

## Not run: 
##### Use Random Forest model as an external model ########
library(randomForest)

# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>%
  filter(usage == "train")

env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12)) %>%
  split()

model_data <- stars::st_extract(
  env_vars, at = as.matrix(obs_df %>% select(x, y))) %>%
  as.data.frame()
names(model_data) <- names(env_vars)
model_data <- model_data %>%
  mutate(occ = obs_df[['observation']])
model_data$occ <- as.factor(model_data$occ)

mod_rf <- randomForest(
  occ ~ .,
  data = model_data,
  ntree = 200)

pfun <- function(X.model, newdata) {
  # for data.frame
  predict(X.model, newdata, type = "prob")[, "1"]
}

shap_spatial <- shap_spatial_response(
  model = mod_rf,
  target_vars = c("bio1", "bio12"),
  var_occ = model_data %>% select(-occ),
  variables = env_vars,
  shap_nsim = 10,
  pfun = pfun)

## End(Not run)

Calculate spatial response or dependence figures.

Description

Calculate spatial marginal, independent, and SHAP-based response maps. They can help to diagnose both how and where the species responds to environmental variables.

Usage

spatial_response(
  model,
  var_occ,
  variables,
  shap_nsim = 0,
  seed = 10L,
  visualize = FALSE
)

Arguments

model

(isolation_forest) The fitted model, for example the model item of a POIsotree object returned by isotree_po.

var_occ

(data.frame, tibble) A data.frame-style table that includes the values of environmental variables at occurrence locations.

variables

(stars) The stars object of environmental variables. Variables should be stored as multiple attributes rather than as a dimension. If you have a raster object instead, convert it with st_as_stars, or read the source data directly as stars with read_stars. You can also use the variables item of a POIsotree object returned by isotree_po.

shap_nsim

(integer) The number of Monte Carlo repetitions used by the SHAP method to estimate each Shapley value. Set it to 0 to skip SHAP-based spatial dependence. When the number of variables is large, a smaller shap_nsim can be used. Be aware that computing SHAP-based spatial dependence is slow because the Monte Carlo computation runs for every pixel, but it is usually worth the time because the result is much more informative. See details in the documentation of function explain in package fastshap. The default is 0; a value of 10 to 20 is usually enough.

seed

(integer) The seed for any random process. The default is 10L.

visualize

(logical) If TRUE, plot the response curves. The default is FALSE.

Details

The values show how each environmental variable affects the model prediction in space. These maps can help answer where, geographically, the species responds to each environmental variable. Compared with marginal or independent dependence maps, SHAP-based maps are much more informative because SHAP-based dependence explains the contribution of each variable to the final prediction.
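A spatial marginal response can be sketched as follows, under the assumption that all variables other than the target are held at their mean over the occurrence table while the target takes each pixel's value. spatial_marginal() and the linear test model are hypothetical; itsdm's own procedure may differ in detail.

```r
# Hypothetical sketch: predict at every pixel's value of `target`, with
# the other variables fixed at their means in `var_occ`.
spatial_marginal <- function(pfun, object, var_occ, target, pixels) {
  npix <- length(pixels)
  newdata <- data.frame(lapply(var_occ, function(v) rep(mean(v), npix)))
  newdata[[target]] <- as.vector(pixels)  # column-major, like the matrix
  matrix(pfun(object, newdata), nrow = nrow(pixels))
}

# Toy linear model and a 2 x 2 "raster" of the target variable
pfun_lin <- function(object, newdata) {
  as.numeric(as.matrix(newdata) %*% object)
}
var_occ <- data.frame(a = c(0, 1, 2), b = c(4, 5, 6))
pixels <- matrix(c(0, 1, 2, NA), nrow = 2)
resp <- spatial_marginal(pfun_lin, c(2, 1), var_occ, "a", pixels)
# resp holds 5, 7, 9 and NA (where the pixel is missing), in matrix layout
```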

Value

(SpatialResponse) A list of

  • spatial_marginal_response (list) A list of stars objects of the spatial marginal response of each variable

  • spatial_independent_response (list) A list of stars objects of the spatial independent response of each variable

  • spatial_shap_dependence (list) A list of stars objects of the spatial SHAP-based response of each variable

See Also

plot.SpatialResponse

Examples

# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)

# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"

# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = "presence_only")

env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12))

# Fit the model in imperfect_presence mode
mod <- isotree_po(
  obs_mode = "imperfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 20,
  sample_size = 0.8, ndim = 1L,
  seed = 123L, nthreads = 1,
  response = FALSE,
  spatial_response = FALSE,
  check_variable = FALSE)

spatial_responses <- spatial_response(
  model = mod$model,
  var_occ = mod$vars_train,
  variables = mod$variables,
  shap_nsim = 1)
plot(spatial_responses)

Function to detect suspicious outliers based on environmental variables.

Description

Run outlier.tree to detect suspicious outliers in observations.

Usage

suspicious_env_outliers(
  occ,
  occ_crs = 4326,
  variables,
  rm_outliers = FALSE,
  seed = 10L,
  ...,
  visualize = TRUE
)

Arguments

occ

(data.frame, sf, SpatialPointsDataFrame) The occurrence dataset for training. There must be columns x and y for coordinates if it is a regular data.frame.

occ_crs

(numeric or crs) The EPSG number or crs object of the occurrence CRS. The default value is 4326, the WGS 84 geographic coordinate system.

variables

(RasterStack or stars) The stack of environmental variables.

rm_outliers

(logical) The option to remove the suspicious outliers or not. The default is FALSE.

seed

(integer) The random seed used in the modeling. It should be an integer. The default is 10L.

...

Other arguments passed to function outlier.tree in package outliertree.

visualize

(logical) If TRUE, plot the result. The default is TRUE.

Details

See the R documentation of function outlier.tree in package outliertree, and its GitHub repository, for more details.
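outlier.tree flags values that are unusual given the other variables rather than unusual overall, and the z_outlier argument passed through ... in the Examples sets the z-score threshold for flagging. The base-R sketch below is a heavily simplified illustration of that idea, not the outliertree algorithm: it assumes a single hand-picked split on one variable (a hypothetical bio1 < 22) in place of fitted trees, and plain within-branch z-scores. The data and the planted suspicious value are synthetic.

```r
# Synthetic occurrences: bio12 depends strongly on which bio1 "branch" a point is in
set.seed(10)
d <- data.frame(bio1 = c(runif(50, 15, 20), runif(50, 25, 30)))
d$bio12 <- ifelse(d$bio1 < 22, rnorm(100, 1200, 50), rnorm(100, 400, 50))
d$bio12[3] <- 400  # planted value: typical overall, odd for its branch

# Flag rows whose target value has |z| > z_outlier within its branch
# (a hypothetical helper; outlier.tree derives branches from fitted trees)
flag_conditional <- function(d, split_var, split_at, target, z_outlier = 3.5) {
  branch <- d[[split_var]] < split_at
  z <- numeric(nrow(d))
  for (b in c(TRUE, FALSE)) {
    idx <- which(branch == b)
    vals <- d[[target]][idx]
    z[idx] <- (vals - mean(vals)) / sd(vals)
  }
  which(abs(z) > z_outlier)
}

flagged <- flag_conditional(d, "bio1", 22, "bio12")

# A plain, unconditional z-score would not single out row 3
global_z <- (d$bio12 - mean(d$bio12)) / sd(d$bio12)
abs(global_z[3])  # well below 3.5
```

The point of the conditioning is visible here: the planted value sits near the overall mean of bio12, so only a check within its environmental context exposes it.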

Value

(EnvironmentalOutlier) A list that contains

  • outliers (sf) The sf points of outliers

  • outlier_details (tibble) A table of outlier details returned from function outlier.tree in package outliertree

  • pts_occ (sf) The sf points of occurrence. If rm_outliers is TRUE, outliers are deleted from points of occurrence. If FALSE, the full observations are returned.

See Also

print.EnvironmentalOutlier, plot.EnvironmentalOutlier, outlier.tree in package outliertree

Examples

library(dplyr)
library(sf)
library(stars)
library(itsdm)

data("occ_virtual_species")
env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12))

occ_outliers <- suspicious_env_outliers(
  occ = occ_virtual_species, variables = env_vars,
  z_outlier = 3.5, outliers_print = 4L, nthreads = 1)

occ_outliers
plot(occ_outliers)

Function to evaluate relative importance of each variable.

Description

Evaluate relative importance of each variable within the model using the following methods:

  • Jackknife test based on AUC ratio and on Pearson correlation between the predictions of the model using all variables and those of models using one variable singly or excluding one variable

  • SHapley Additive exPlanations (SHAP) according to Shapley values

Usage

variable_analysis(
  model,
  pts_occ,
  pts_occ_test = NULL,
  variables,
  shap_nsim = 100,
  visualize = FALSE,
  seed = 10
)

Arguments

model

(isolation_forest) The extended isolation forest SDM. It could be the item model of POIsotree made by function isotree_po.

pts_occ

(sf) The sf-style table that includes training occurrence locations.

pts_occ_test

(sf, or NULL) The sf-style table that includes test occurrence locations. If NULL, it is set to pts_occ. The default is NULL.

variables

(stars) The stars object of environmental variables. It should have multiple attributes instead of dims. If you have a raster object instead, you can use st_as_stars to convert it to stars, or use read_stars to read the source data directly as stars.

shap_nsim

(integer) The number of Monte Carlo repetitions used by the SHAP method to estimate each Shapley value. See details in the documentation of function explain in package fastshap. The default is 100.

visualize

(logical) If TRUE, plot the analysis figures. The default is FALSE.

seed

(integer) The seed for any random process. The default is 10L.

Details

The Jackknife test measures variable importance as the change in model performance when an environmental variable is used singly or is excluded from the pool of environmental variables. This function uses Pearson correlation and AUC ratio as the performance measures.

Here, Pearson correlation is the correlation between the predictions of each Jackknife model (one variable used singly, or one variable excluded) and the predictions of the full model, and it serves as the measure of model performance.
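The Pearson-correlation Jackknife can be sketched in a few lines of base R. This is an illustration under stated assumptions, not itsdm's code: a linear model (lm) stands in for the isolation forest, the data are synthetic, and the variable names bio1, bio5 and bio12 are hypothetical. For each variable, the sketch refits the model using only that variable and again without it, then correlates each set of predictions with the full model's predictions.

```r
# Synthetic data: bio1 matters a lot, bio12 a little, bio5 not at all
set.seed(10)
n <- 200
d <- data.frame(bio1 = rnorm(n), bio5 = rnorm(n), bio12 = rnorm(n))
d$score <- 2 * d$bio1 + 0.5 * d$bio12 + rnorm(n, sd = 0.5)
vars <- c("bio1", "bio5", "bio12")

# Fit the stand-in model on a subset of variables and predict on all rows
fit_predict <- function(v) {
  m <- lm(reformulate(v, response = "score"), data = d)
  predict(m, newdata = d)
}

full_pred <- fit_predict(vars)

# Correlation with the full model: variable used singly vs. variable excluded
jackknife <- do.call(rbind, lapply(vars, function(v) {
  data.frame(
    variable = v,
    only = cor(fit_predict(v), full_pred),
    without = cor(fit_predict(setdiff(vars, v)), full_pred)
  )
}))
print(jackknife)
```

An important variable shows a high "only" correlation and a noticeably lower "without" correlation; an uninformative one shows the opposite pattern.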

The area under the ROC curve (AUC) is a threshold-independent measure of model performance that requires both presence and absence data. A ROC curve is generated by plotting, for all thresholds, the proportion of correctly predicted presences on the y-axis against 1 minus the proportion of correctly predicted absences on the x-axis. Several approaches have been used to evaluate the accuracy of presence-only models. Peterson et al. (2008) modified AUC by plotting the proportion of presences falling above a range of thresholds against the proportion of cells of the whole area falling above the same thresholds. This is the so-called AUC ratio used in this package.
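A minimal base-R sketch of the presence-only curve described above, under simplifying assumptions: no omission-error cutoff is applied, and the area under the curve is divided by 0.5, the area under the diagonal that a random prediction would trace. itsdm's actual implementation may differ in thresholds and normalization, and the suitability values here are synthetic.

```r
# Synthetic predicted suitability for all cells and for presence locations
set.seed(10)
cells <- runif(1000)           # predictions over the whole study area
presences <- rbeta(100, 5, 2)  # predictions at presence locations (skewed high)

auc_ratio <- function(pres, bg, n_thr = 100) {
  thr <- seq(1, 0, length.out = n_thr + 1)  # strict to lenient thresholds
  # y: proportion of presences at or above each threshold
  y <- sapply(thr, function(t) mean(pres >= t))
  # x: proportion of study-area cells at or above each threshold
  x <- sapply(thr, function(t) mean(bg >= t))
  # Trapezoidal area under y plotted against x
  auc <- sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
  auc / 0.5  # relative to the random-prediction diagonal
}

auc_ratio(presences, cells)
```

A value near 1 means presences are predicted no better than random cells of the study area; values above 1 mean presences fall disproportionately in highly ranked cells.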

SHapley Additive exPlanations (SHAP) uses Shapley values to evaluate variable importance. The larger the absolute Shapley value, the more important the variable. Positive Shapley values mean a positive effect on the prediction, while negative Shapley values mean a negative effect. See the SHAP references listed in the package description (e.g., Lundberg and Lee 2017; Štrumbelj and Kononenko 2014) for more details.
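The role of shap_nsim can be illustrated with a simplified permutation-sampling estimator in the style of Štrumbelj and Kononenko, the approach fastshap's explain is documented to use. The sketch below is written from scratch in base R and is not fastshap's code; the linear model, its weights and the variable names are hypothetical. Each repetition draws a random background row and a random feature order, then measures how much the prediction changes when feature j is switched from the background value to the observed value; averaging nsim such differences estimates the Shapley value, with error shrinking roughly as 1/sqrt(nsim).

```r
# Monte Carlo estimate of the Shapley value of feature j for instance x
mc_shapley <- function(f, x, X, j, nsim = 100) {
  p <- length(x)
  phi <- numeric(nsim)
  for (s in seq_len(nsim)) {
    z <- X[sample.int(nrow(X), 1), ]   # random background instance
    perm <- sample.int(p)              # random feature ordering
    before <- perm[seq_len(which(perm == j) - 1)]
    b_minus <- z
    b_minus[before] <- x[before]       # features before j from x, j from z
    b_plus <- b_minus
    b_plus[j] <- x[j]                  # same, but j also from x
    phi[s] <- f(b_plus) - f(b_minus)
  }
  mean(phi)
}

# Toy linear model and background data (all hypothetical)
set.seed(10)
X <- matrix(rnorm(3000), ncol = 3,
            dimnames = list(NULL, c("bio1", "bio5", "bio12")))
w <- c(2, -1, 0.5)
f <- function(v) sum(w * v)
x <- c(bio1 = 1, bio5 = 1, bio12 = 1)

phi <- sapply(1:3, function(j) mc_shapley(f, x, X, j, nsim = 2000))
names(phi) <- colnames(X)
phi
```

For a linear model the exact answer is w_j * (x_j - mean(X_j)), so bio1 gets a large positive value, bio5 a negative one, and bio12 a small positive one; ranking variables by mean absolute Shapley value across observations gives an importance ordering.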

Value

(VariableAnalysis) A list of

  • variables (vector of character) The names of environmental variables

  • pearson_correlation (tibble) A table of Jackknife test based on Pearson correlation

  • full_AUC_ratio (tibble) A table of AUC ratios of the training and test datasets using all variables, which serve as the reference for the Jackknife test

  • AUC_ratio (tibble) A table of Jackknife test based on AUC ratio

  • SHAP (tibble) A table of Shapley values of training and test dataset separately

See Also

plot.VariableAnalysis, print.VariableAnalysis, explain in fastshap

Examples

# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)

data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"

# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = "presence_only")

env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12, 16))

# Fit the model in imperfect_presence mode
mod <- isotree_po(
  obs_mode = "imperfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 10,
  sample_size = 0.8, ndim = 2L,
  seed = 123L, nthreads = 1,
  response = FALSE,
  spatial_response = FALSE,
  check_variable = FALSE)

var_analysis <- variable_analysis(
  model = mod$model,
  pts_occ = mod$observation,
  pts_occ_test = mod$independent_test,
  variables = mod$variables)
plot(var_analysis)

Evaluate variable contributions for targeted observations.

Description

Evaluate variable contribution for targeted observations according to SHapley Additive exPlanations (SHAP).

Usage

variable_contrib(
  model,
  var_occ,
  var_occ_analysis,
  shap_nsim = 100,
  visualize = FALSE,
  seed = 10,
  pfun = .pfun_shap
)

Arguments

model

(isolation_forest or other model) The SDM. It could be the item model of POIsotree made by function isotree_po. It also could be other user-fitted models as long as the pfun can work on it.

var_occ

(data.frame, tibble) The data.frame-style table that includes values of environmental variables at occurrence locations.

var_occ_analysis

(data.frame, tibble) The data.frame-style table that includes values of environmental variables at occurrence locations for analysis. It can be var_occ, a subset of it, or any new dataset.

shap_nsim

(integer) The number of Monte Carlo repetitions used by the SHAP method to estimate each Shapley value. See details in the documentation of function explain in package fastshap. The default is 100.

visualize

(logical) If TRUE, plot the variable contributions. The default is FALSE.

seed

(integer) The seed for any random process. The default is 10L.

pfun

(function) The prediction function, which takes two arguments, object and newdata. It is only required when model is not an isolation_forest. The default is the wrapper function designed for iForest models in itsdm.
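For a model that is not an isolation_forest, pfun only needs the (object, newdata) signature described above. A minimal sketch, assuming a user-fitted linear model on synthetic data with hypothetical variable names, and assuming pfun should return one numeric prediction per row of newdata (the form fastshap's explain expects from its prediction wrapper):

```r
# Synthetic training data and a user-fitted stand-in model
set.seed(10)
train <- data.frame(bio1 = rnorm(50), bio12 = rnorm(50))
train$suit <- 0.8 * train$bio1 - 0.3 * train$bio12 + rnorm(50, sd = 0.1)
fit <- lm(suit ~ bio1 + bio12, data = train)

# Prediction wrapper: takes object and newdata, returns a plain numeric vector
pfun_lm <- function(object, newdata) {
  as.numeric(predict(object, newdata = as.data.frame(newdata)))
}

pfun_lm(fit, train[1:3, c("bio1", "bio12")])
```

Such a wrapper would then presumably be passed as pfun = pfun_lm, together with model = fit and tables of the predictor columns for var_occ and var_occ_analysis.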

Value

(VariableContribution) A list of

  • shapley_values (data.frame) A table of Shapley values of each variable for all observations

  • feature_values (tibble) A table of values of each variable for all observations
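One way to read shapley_values is to rank variables within each observation by absolute Shapley value, keeping the sign as the direction of the effect. The sketch below uses a made-up table shaped like shapley_values (rows are observations, columns are hypothetical variables); how plot.VariableContribution selects its num_features may differ.

```r
# Made-up Shapley values: two observations, three variables
shap <- data.frame(bio1 = c(0.12, -0.05),
                   bio5 = c(-0.30, 0.02),
                   bio12 = c(0.04, 0.20))

# Names of the num_features most influential variables per observation
top_features <- function(shap, num_features = 2) {
  lapply(seq_len(nrow(shap)), function(i) {
    v <- unlist(shap[i, ])
    head(names(sort(abs(v), decreasing = TRUE)), num_features)
  })
}

top_features(shap)
```

In the first observation bio5 dominates and lowers the prediction; in the second bio12 dominates and raises it.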

See Also

plot.VariableContribution, explain in fastshap

Examples

# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)

# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"

# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = "presence_only")

env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12))

# Fit the model in imperfect_presence mode
mod <- isotree_po(
  obs_mode = "imperfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 5,
  sample_size = 0.8, ndim = 1L,
  seed = 123L, nthreads = 1,
  response = FALSE,
  spatial_response = FALSE,
  check_variable = FALSE)

var_contribution <- variable_contrib(
  model = mod$model,
  var_occ = mod$vars_train,
  var_occ_analysis = mod$vars_train %>% slice(1:2))
## Not run: 
plot(var_contribution,
  num_features = 3,
  plot_each_obs = TRUE)

# Plot together
plot(var_contribution)

## End(Not run)

Download environmental variables made by worldclim version 2.1.

Description

Download historical WorldClim version 2.1 variables, optionally clipped to a boundary, with a few other options.

Usage

worldclim2(
  var = "tmin",
  res = 10,
  bry = NULL,
  path = NULL,
  nm_mark = "clip",
  return_stack = TRUE
)

Arguments

var

(character) The variable to download. It should be one of tavg, tmin, tmax, prec, srad, wind, vapr and bio. The default is 'tmin'.

res

(numeric) The resolution of the images to download, in arc-minutes. It should be one of 0.5, 2.5, 5 and 10. The default is 10.

bry

(sf or sp) The boundary used to mask the downloaded original data. If NULL, the global map is returned. If not NULL, it can be sf, sfc, SpatialPolygonsDataFrame, SpatialPolygons, etc. The default is NULL.

path

(character) The path to save the downloaded images. If NULL, the current working directory is used. The default is NULL.

nm_mark

(character) The name mark of clipped images. The default is "clip". It is ignored if bry is NULL.

return_stack

(logical) If TRUE, stack the images together and return them. If the area is large and the resolution is high, it is better not to stack them. The default is TRUE.
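The var and res options combine into one WorldClim 2.1 file per request. The helper below is hypothetical and not part of itsdm; it validates both options and builds a file name assuming WorldClim 2.1's published naming pattern, wc2.1_<res>_<var>.zip, where 0.5 arc-minutes is written as 30s. itsdm's internal download logic may construct names and URLs differently.

```r
# Hypothetical helper: validate var/res and build a WorldClim 2.1 file name
wc_file_name <- function(var = "tmin", res = 10) {
  vars <- c("tavg", "tmin", "tmax", "prec", "srad", "wind", "vapr", "bio")
  # Map resolution in arc-minutes to the token used in file names
  res_token <- c("0.5" = "30s", "2.5" = "2.5m", "5" = "5m", "10" = "10m")
  if (!var %in% vars) stop("var must be one of: ", paste(vars, collapse = ", "))
  key <- as.character(res)
  if (!key %in% names(res_token)) stop("res must be one of 0.5, 2.5, 5, 10")
  sprintf("wc2.1_%s_%s.zip", res_token[[key]], var)
}

wc_file_name("tmin", 10)
wc_file_name("bio", 0.5)
```

Checking the options before any network call makes a typo such as "tvag" fail immediately instead of after a slow download attempt.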

Details

Web page for this dataset

Value

If return_stack is TRUE, the images are returned as a stars object. Otherwise, nothing is returned, and a message reports where the images are saved.

Note

If the download fails due to a slow internet connection, try setting a larger timeout, e.g., options(timeout = 1e3).

References

Fick, Stephen E., and Robert J. Hijmans. "WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas." International Journal of Climatology 37.12 (2017): 4302-4315. doi:10.1002/joc.5086

Examples

## Not run: 
library(sf)
library(itsdm)

bry <- sf::st_polygon(
  list(rbind(c(29.34, -11.72), c(29.34, -0.95),
             c(40.31, -0.95), c(40.31, -11.72),
             c(29.34, -11.72)))) %>%
  st_sfc(crs = 4326)

bios <- worldclim2(var = "tmin", res = 10,
  bry = bry, nm_mark = 'exp', path = tempdir())

## End(Not run)