Unmapped Concepts: Multi Site, Anomaly Detection, Cross-Sectional Analysis

Publisher

PEDSnet

Abstract

This check provides raw data and visualizations to help a user evaluate whether unmapped values are present in a dataset of interest. It summarizes the proportion of rows and patients with unmapped values, as well as the median number of unmapped rows per patient.
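As a rough illustration of these per-site summaries, the base-R sketch below computes them for one variable on a tiny invented dataset. It is not the package's implementation. The cohort and records tables and the summarise_site function are hypothetical. Two conventions are also assumed rather than taken from this page: a row is unmapped when its concept_id is 0 (the OMOP CDM "No matching concept" convention), and the "with zeros" median counts every cohort patient, including those with no rows for the variable, while the "without zeros" median counts only patients with evidence of the variable.

```r
# Hypothetical toy input: cohort membership per site, plus one row per
# clinical record for a single variable. ASSUMPTION: concept_id == 0 marks
# an unmapped row (OMOP-style convention, not confirmed for this package).
cohort <- data.frame(
  site      = c("A", "A", "A", "A", "B", "B", "B"),
  person_id = c(1, 2, 3, 6, 4, 5, 7)
)
records <- data.frame(
  site       = c("A", "A", "A", "A", "B", "B", "B"),
  person_id  = c(1, 1, 2, 3, 4, 4, 5),
  concept_id = c(0, 123, 456, 0, 0, 0, 789)
)

# Hypothetical helper: per-site row/patient counts, proportions, and medians
summarise_site <- function(s) {
  df       <- records[records$site == s, ]
  pt_ids   <- cohort$person_id[cohort$site == s]
  unmapped <- df$concept_id == 0
  # Unmapped-row count for every cohort patient (0 if the patient has no rows)
  counts  <- vapply(pt_ids, function(p) sum(unmapped[df$person_id == p]),
                    integer(1))
  # Patients with evidence of the variable (at least one row of any kind)
  has_var <- pt_ids %in% df$person_id
  data.frame(
    site                  = s,
    total_rows            = nrow(df),
    total_pt              = sum(has_var),
    unmapped_rows         = sum(unmapped),
    unmapped_pt           = sum(counts > 0),
    unmapped_row_prop     = mean(unmapped),
    unmapped_pt_prop      = sum(counts > 0) / sum(has_var),
    median_site_with0s    = median(counts),          # all cohort patients
    median_site_without0s = median(counts[has_var])  # patients with the variable
  )
}

site_summary <- do.call(rbind, lapply(c("A", "B"), summarise_site))
print(site_summary)
```

For site A, two of four rows are unmapped (proportion 0.5) and two of the three patients with evidence of the variable have an unmapped row (proportion 2/3); the median unmapped-row count is 0.5 across all four cohort patients and 1 across the three patients with the variable.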

Probe

Clinical Assessment

Access Package

# install.packages("devtools")
devtools::install_github('ssdqa/unmappedconcepts')

Visualization Output

This check outputs a dot plot showing, for each site and variable, the proportion of patients or rows with unmapped values, and flags proportions that are anomalous. Dot size represents the mean proportion for the variable across sites, dot color represents the site's proportion of unmapped values, and a star replaces the dot when the value is anomalous. On hover, a tooltip shows metadata for the variable and site along with precise values for the proportion, mean proportion, median proportion, standard deviation, and MAD (median absolute deviation).
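The visual encoding described above can be approximated as a static base-R sketch. The data, variable names ("conditions", "drugs"), anomaly flags, color palette, and size scaling are all invented for illustration, and the interactive tooltip is omitted; the package's own plotting code is not shown on this page and may use a different library and scales.

```r
# Hypothetical per-site proportions for two invented variables
plot_df <- data.frame(
  site     = rep(c("A", "B", "C"), times = 2),
  variable = rep(c("conditions", "drugs"), each = 3),
  prop     = c(0.02, 0.03, 0.25, 0.10, 0.12, 0.11),
  anomaly  = c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)  # assumed flags
)

# Cross-site mean proportion per variable drives dot size
plot_df$mean_val <- ave(plot_df$prop, plot_df$variable)
plot_df$cex <- 1 + 10 * plot_df$mean_val

# Site-level proportion drives dot color (light = low, dark = high)
pal <- colorRampPalette(c("lightyellow", "darkred"))(100)
plot_df$col <- pal[pmax(1, ceiling(plot_df$prop * 100))]

# Anomalous values are drawn as a star-like symbol (pch 8) instead of a dot
plot_df$pch <- ifelse(plot_df$anomaly, 8, 19)

site_f <- factor(plot_df$site)
var_f  <- factor(plot_df$variable)

pdf(NULL)  # null device so the sketch runs headless; omit when interactive
plot(as.integer(site_f), as.integer(var_f),
     pch = plot_df$pch, col = plot_df$col, cex = plot_df$cex,
     xlim = c(0.5, nlevels(site_f) + 0.5), ylim = c(0.5, nlevels(var_f) + 0.5),
     xaxt = "n", yaxt = "n", xlab = "Site", ylab = "Variable")
axis(1, at = seq_len(nlevels(site_f)), labels = levels(site_f))
axis(2, at = seq_len(nlevels(var_f)), labels = levels(var_f))
invisible(dev.off())
```

Because dot size comes from the cross-site mean, every dot in a row shares one size (0.10 for "conditions", about 0.11 for "drugs"), while color varies by site and the flagged value at site C is drawn as a star.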

Raw Output

This check produces a raw data output containing the following 26 columns:

Column | Data Type | Definition
site | character | the name of the site being targeted
variable | character | the name of the variable being investigated for unmapped values
total_rows | numeric | the total number of rows associated with the variable
total_pt | numeric | the total number of patients associated with the variable
unmapped_rows | numeric | the number of unmapped rows associated with the variable
unmapped_pt | numeric | the number of patients with at least one unmapped row associated with the variable
unmapped_row_prop | numeric | the proportion of rows that are unmapped
unmapped_pt_prop | numeric | the proportion of patients with at least one unmapped row
median_all_with0s | numeric | the median number of unmapped rows per patient, for all patients, across all sites
median_all_without0s | numeric | the median number of unmapped rows per patient, for only patients with evidence of the variable, across all sites
median_site_with0s | numeric | the median number of unmapped rows per patient, for all patients, for a specific site
median_site_without0s | numeric | the median number of unmapped rows per patient, for only patients with evidence of the variable, for a specific site
mean_val | numeric | the mean proportion of patients or rows (based on user selection) for each group across sites
median_val | numeric | the median proportion of patients or rows (based on user selection) for each group across sites
sd_val | numeric | the standard deviation of the proportion of patients or rows (based on user selection) for each group across sites
mad_val | numeric | the median absolute deviation of the proportion of patients or rows (based on user selection) for each group across sites
cov_val | numeric | the coefficient of variation of the proportion of patients or rows (based on user selection) for each group across sites
max_val | numeric | the maximum proportion of patients or rows (based on user selection) for each group across sites
min_val | numeric | the minimum proportion of patients or rows (based on user selection) for each group across sites
range_val | numeric | the range of the proportion of patients or rows (based on user selection) for each group across sites
total_ct | numeric | the total number of group members
analysis_eligible | character | a string indicating whether the group is eligible for anomaly detection analysis
lower_tail | numeric | the lower bound used to identify low anomalies
upper_tail | numeric | the upper bound used to identify high anomalies
anomaly_yn | character | a string indicating whether the value is anomalous
output_function | character | a string indicating the type of visualization that should be generated by uc_output
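The cross-site summary columns can be sketched in base R for one variable's per-site proportions. The input values and the function summarise_across_sites are hypothetical. This page does not state how lower_tail and upper_tail are derived, so the sketch substitutes a plain median plus or minus k times MAD rule (k = 2) as a placeholder; it is not the package's anomaly-detection method. The "yes"/"no" labels for anomaly_yn are likewise assumed, and R's mad() applies its default scale factor of 1.4826, which the package may or may not use.

```r
# Hypothetical per-site values of one proportion (e.g., unmapped_pt_prop)
# for a single variable; these numbers are invented for illustration.
props <- c(A = 0.02, B = 0.03, C = 0.025, D = 0.028, E = 0.21)

# Sketch of the cross-site summary columns. The tail bounds use a
# PLACEHOLDER median +/- k * MAD rule, not the package's documented method.
summarise_across_sites <- function(x, k = 2) {
  med   <- median(x)
  mad_x <- mad(x)  # R default: scaled by 1.4826
  out <- list(
    mean_val   = mean(x),
    median_val = med,
    sd_val     = sd(x),
    mad_val    = mad_x,
    cov_val    = sd(x) / mean(x),  # coefficient of variation
    max_val    = max(x),
    min_val    = min(x),
    range_val  = max(x) - min(x),
    total_ct   = length(x),
    lower_tail = med - k * mad_x,
    upper_tail = med + k * mad_x
  )
  # Hypothetical "yes"/"no" labels for values outside the placeholder bounds
  out$anomaly_yn <- ifelse(x < out$lower_tail | x > out$upper_tail, "yes", "no")
  out
}

res <- summarise_across_sites(props)
str(res)
```

With these inputs the median is 0.028 and the placeholder bounds fall at roughly 0.019 and 0.037, so only site E is flagged. A median/MAD rule is used here only because it is robust to the very outlier it is trying to detect; the package's actual tail calculation should be taken from its own documentation.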

Funder(s)

This research was made possible through the generous support of the Patient-Centered Outcomes Research Institute (PCORI). The statements presented in this work are solely the responsibility of the author(s) and do not necessarily represent the views of PCORI, its Board of Governors, or its Methodology Committee.

Creative Commons license

Except where otherwise noted, this item is licensed under a Creative Commons Attribution 4.0 (CC BY 4.0) License.

Cite this Data Quality Check

Wieand, K. & Razzaghi, H. (2026, March). Unmapped Concepts: Multi Site, Anomaly Detection, Cross-Sectional Analysis. [D Q Check]. PEDSpace Knowledge Bank. https://doi.org/10.24373/pdsp-641