diff --git a/.Rbuildignore b/.Rbuildignore
index 91114bf..96ece55 100644
--- a/.Rbuildignore
+++ b/.Rbuildignore
@@ -1,2 +1,5 @@
 ^.*\.Rproj$
 ^\.Rproj\.user$
+^\.travis\.yml$
+^data-raw$
+.gitlab-ci.yml
diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
new file mode 100644
index 0000000..4d8aab9
--- /dev/null
+++ b/.gitlab-ci.yml
@@ -0,0 +1,41 @@
+document:
+  stage: document
+  script:
+  - R --vanilla -e 'library(devtools); devtools::document(); devtools::document()'
+  artifacts:
+    paths:
+    - man/
+    - NAMESPACE
+
+test:
+  stage: test
+  script:
+  - R --vanilla -e 'library(devtools); library(testthat); devtools::test(reporter
+    = StopReporter)'
+  dependencies:
+  - document
+
+build:
+  stage: build
+  script:
+  - R --vanilla -e 'library(devtools); devtools::build(path = "./")'
+  artifacts:
+    paths:
+    - '*.tar.gz'
+  dependencies:
+  - document
+
+check:
+  stage: check
+  script:
+  - R --vanilla -e 'library(devtools); tar_file <- file.path(getwd(), list.files(".",
+    pattern = ".tar.gz")); results <- devtools::check_built(tar_file); stopifnot(sum(length(results$errors), length(results$warnings)) <= 0)'
+  dependencies:
+  - build
+
+stages:
+- document
+- test
+- build
+- check
+
diff --git a/.travis.yml b/.travis.yml
new file mode 100644
index 0000000..8d139ac
--- /dev/null
+++ b/.travis.yml
@@ -0,0 +1,5 @@
+# R for travis: see documentation at https://docs.travis-ci.com/user/languages/r
+
+language: R
+sudo: false
+cache: packages
diff --git a/NAMESPACE b/NAMESPACE
index c047479..f652966 100644
--- a/NAMESPACE
+++ b/NAMESPACE
@@ -4,4 +4,6 @@ export(biased_trans_matrix)
 export(cbrw)
 import(dplyr)
 import(rlang)
+importFrom(stats,setNames)
 importFrom(tidyr,spread)
+importFrom(utils,combn)
diff --git a/R/cbrw.R b/R/cbrw.R
index 11b5aff..3b2b6c5 100644
--- a/R/cbrw.R
+++ b/R/cbrw.R
@@ -1,7 +1,7 @@
 #' @title Coupled Biased Random Walks
-#' TODO
+#' @description TODO
 #' @param data a data.frame containing catgorical data
-#' @value the input data frame with an additional \emph{score} variable representing
+#' @return the input data frame with an additional \emph{score} variable representing
 #' relative outlier-ness of the observation
 #' @export
 cbrw <- function(data) {
diff --git a/R/count.R b/R/count.R
index e8fdff0..bf317aa 100644
--- a/R/count.R
+++ b/R/count.R
@@ -1,11 +1,11 @@
 # TODO: Probably consolidate, or just build an R6 class
-#' @title Counting helper functions
-#' @name counting_helpers
+# @title Counting helper functions
+# @name counting_helpers
 
-#' @rdname counting_helpers
-#' @param data a tibble of categorical data
-#' @return a tibble containing all unique combinations of feature values
-#' with columns: \emph{u}, \emph{v}, \emph{freq}, emph{p}, and \emph{feature}
+# @rdname counting_helpers
+# @param data a tibble of categorical data
+# @return a tibble containing all unique combinations of feature values
+# with columns: \emph{u}, \emph{v}, \emph{freq}, emph{p}, and \emph{feature}
 intra_freq <- function(data) {
 
   var_quos <- lapply(names(data), as.name)
@@ -30,12 +30,14 @@ intra_freq <- function(data) {
   return(out)
 }
 
-#' Helper function to calculate all bivariate frequencies
-#' within the dataset
-#' @param data a tibble of categorical data
-#' @return a tibble containing all unique combinations of feature values
-#' with columns: \emph{u}, \emph{v}, \emph{freq}, and \emph{group}
-#' @rdname counting_helpers
+# Helper function to calculate all bivariate frequencies
+# within the dataset
+# @param data a tibble of categorical data
+# @return a tibble containing all unique combinations of feature values
+# with columns: \emph{u}, \emph{v}, \emph{freq}, and \emph{group}
+# @rdname counting_helpers
+#' @importFrom utils combn
+#' @importFrom stats setNames
 inter_freq <- function(data) {
 
   # Calculate all 2 variable combinations of variables
diff --git a/R/data.R b/R/data.R
index fdb3617..e703df1 100644
--- a/R/data.R
+++ b/R/data.R
@@ -1,3 +1,4 @@
 #' @title CBRW paper exmaple dataset
+#' @description See dataset in cited paper (TODO)
 #' Example dataset as shown on the first page of the paper
 "cbrw_example"
\ No newline at end of file
diff --git a/R/matrix.R b/R/matrix.R
index 9ac6ecc..e0d02f3 100644
--- a/R/matrix.R
+++ b/R/matrix.R
@@ -1,4 +1,8 @@
 #' @title Generate Biased Transition Matrix
+#' @description
+#' Takes in a data.frame of categorical data and returns a weighted transition matrix
+#' which characterizes the edge weights of a directed graph representation of the inter-feature value couplings
+#'
 #' @param data a data.frame containing mosly categorical data
 #' @param all_data a boolean (default: FALSE) on whether to return additional node and edge data (see Note)
 #' @return a biased transition matrix of dim [k,k] where k is the number of unique feature values
diff --git a/README.md b/README.md
index 5567dc8..9daa768 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-# cbRw, Coupled Biased Random Walks
+# cbRw, Coupled Biased Random Walks [![Build Status](https://travis-ci.org/beansrowning/cbRw.svg?branch=master)](https://travis-ci.org/beansrowning/cbRw)
 Anomaly detection for complex categorical data
 
 ## Overview
@@ -7,7 +7,15 @@ Described by [Pang, Cao, and Chen (2016)](https://www.ijcai.org/Proceedings/16/P
 Also based on work by Daniel Kaslovsky in the Python implementation, [Coupled-Biased-Random-Walks
 ](https://github.com/dkaslovsky/Coupled-Biased-Random-Walks)
 
-## Work in progress
+## Installation
+
+*Note that this is still very much developmental*
+
+```r
+# Installation is straightforward with devtools
+# install.packages("devtools")
+devtools::install_github("beansrowning/cbRw")
+```
 
 ## Public Domain
 This repository constitutes a work of the United States Government and is not
diff --git a/man/biased_trans_matrix.Rd b/man/biased_trans_matrix.Rd
index 6673df2..e7f0732 100644
--- a/man/biased_trans_matrix.Rd
+++ b/man/biased_trans_matrix.Rd
@@ -14,6 +14,10 @@ biased_trans_matrix(data, all_data = FALSE)
 \value{
 a biased transition matrix of dim [k,k] where k is the number of unique feature values
 }
+\description{
+Takes in a data.frame of categorical data and returns a weighted transition matrix
+which characterizes the edge weights of a directed graph representation of the inter-feature value couplings
+}
 \section{Note}{
 
 If all_data is `TRUE`, A list with three members will be returned:
diff --git a/man/cbrw.Rd b/man/cbrw.Rd
new file mode 100644
index 0000000..9c645e9
--- /dev/null
+++ b/man/cbrw.Rd
@@ -0,0 +1,18 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/cbrw.R
+\name{cbrw}
+\alias{cbrw}
+\title{Coupled Biased Random Walks}
+\usage{
+cbrw(data)
+}
+\arguments{
+\item{data}{a data.frame containing catgorical data}
+}
+\value{
+the input data frame with an additional \emph{score} variable representing
+ relative outlier-ness of the observation
+}
+\description{
+TODO
+}
diff --git a/man/cbrw_example.Rd b/man/cbrw_example.Rd
index c26f2a8..e4544a7 100644
--- a/man/cbrw_example.Rd
+++ b/man/cbrw_example.Rd
@@ -3,10 +3,13 @@
 \docType{data}
 \name{cbrw_example}
 \alias{cbrw_example}
-\title{CBRW paper exmaple dataset
-Example dataset as shown on the first page of the paper}
+\title{CBRW paper exmaple dataset}
 \format{An object of class \code{spec_tbl_df} (inherits from \code{tbl_df}, \code{tbl}, \code{data.frame}) with 12 rows and 5 columns.}
 \usage{
 cbrw_example
 }
+\description{
+See dataset in cited paper (TODO)
+Example dataset as shown on the first page of the paper
+}
 \keyword{datasets}
diff --git a/man/counting_helpers.Rd b/man/counting_helpers.Rd
deleted file mode 100644
index 629176c..0000000
--- a/man/counting_helpers.Rd
+++ /dev/null
@@ -1,28 +0,0 @@
-% Generated by roxygen2: do not edit by hand
-% Please edit documentation in R/count.R
-\name{counting_helpers}
-\alias{counting_helpers}
-\alias{intra_freq}
-\alias{inter_freq}
-\title{Counting helper functions}
-\usage{
-intra_freq(data)
-
-inter_freq(data)
-}
-\arguments{
-\item{data}{a tibble of categorical data}
-
-\item{data}{a tibble of categorical data}
-}
-\value{
-a tibble containing all unique combinations of feature values
- with columns: \emph{u}, \emph{v}, \emph{freq}, emph{p}, and \emph{feature}
-
-a tibble containing all unique combinations of feature values
- with columns: \emph{u}, \emph{v}, \emph{freq}, and \emph{group}
-}
-\description{
-Helper function to calculate all bivariate frequencies
-within the dataset
-}
diff --git a/tests/testthat.R b/tests/testthat.R
new file mode 100644
index 0000000..58e1382
--- /dev/null
+++ b/tests/testthat.R
@@ -0,0 +1,4 @@
+library(testthat)
+library(cbRw)
+
+test_check("cbRw")
diff --git a/tests/testthat/test_cbrw.R b/tests/testthat/test_cbrw.R
new file mode 100644
index 0000000..94753ef
--- /dev/null
+++ b/tests/testthat/test_cbrw.R
@@ -0,0 +1,27 @@
+context("Algorithm accuracy")
+
+test_that("Score values are equal to canonical", {
+
+  # Canonical values pulled from running cbrw test data through
+  # the python CBRW package, dkaslovsky/Coupled-Biased-Random-Walks
+  canonical <- c(
+    0.123556072282254,
+    0.0504085179712585,
+    0.0476111020550283,
+    0.0495085666704524,
+    0.0844538406306188,
+    0.0468666953948249,
+    0.0476111020550283,
+    0.0493436236504131,
+    0.0459772220233669,
+    0.0694800337106498,
+    0.0495038674807341,
+    0.0668533395540888
+  )
+
+  # Process example dataset and compare values
+  data(cbrw_example)
+  cbrw_example <- cbrw(cbrw_example)
+
+  expect_equal(cbrw_example$score, canonical)
+})
\ No newline at end of file