Load and Test Details

This DQe-c report was generated from completeness tests on OMOPV5_0 data from the University of Washington on 2019-03-20.

Table 1. List and Status of Common Data Model (CDM) – here, OMOPV5_0 – Tables in this load

The table below lists the CDM tables provided (and not provided) in the data load.

The source data for this table and the following graphics in this section is tablelist_OMOPV5_0_University of Washington_20-03-2019.csv

Figure 1. Available Tables, Compared to all CDM (OMOPV5_0) Tables

This figure shows which of the CDM tables are loaded and/or available.

Figure 2. File Size and Row Numbers by Table in the (OMOPV5_0) Load

Figure 3. Loaded tables against CDM (OMOPV5_0) Relational Model.

The figure below shows a network visualization of the CDM data model and highlights the tables that are available in this load (the legend is the same as in Figure 1).

Completeness Results

Table 2. The Master Completeness Results Table

The table below provides the results of the completeness test at the value/cell level.

  • TabNam = OMOPV5_0 table name
  • ColNam = Column name
  • DQLVL = Level of importance for completeness test (X: Extremely Important, H: Highly Important, L: Low Importance)
  • FRQ = Frequency of rows
  • UNIQFRQ = Frequency of unique values in each column
  • MS1_FRQ = Frequency of cells with NULL/NA values or empty strings in each column
  • MS2_FRQ = Frequency of cells with characters in each column that don’t represent meaningful data – including ‘+’, ‘-’, ‘_’, ‘#’, ‘$’, ‘*’, ‘\’, ‘?’, ‘.’, ‘&’, ‘^’, ‘%’, ‘!’, ‘@’, and ‘NI’.
  • MSs_PERC = Percentage of overall missing data in each column
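
As an illustration of how the columns defined above relate to one another, here is a minimal Python sketch that profiles a single column. It is not the DQe-c implementation. The function name `completeness_profile` is hypothetical, and several details are assumptions: NULL/NA is represented as `None`, whitespace-only strings count as empty, UNIQFRQ ignores NULL cells, the backslash is included as a placeholder token, and MSs_PERC is taken to be (MS1_FRQ + MS2_FRQ) / FRQ expressed as a percentage.

```python
# Placeholder tokens counted as "presence of nonsense" (MS2_FRQ).
# The set mirrors the list in the text; the backslash entry is an assumption.
MS2_TOKENS = {"+", "-", "_", "#", "$", "*", "\\", "?", ".",
              "&", "^", "%", "!", "@", "NI"}


def completeness_profile(values):
    """Return Table 2 style counts for one column given as a list of cells."""
    frq = len(values)
    # MS1_FRQ: NULL/NA (None here) or empty strings.
    # Treating whitespace-only strings as empty is an assumption.
    ms1 = sum(
        1 for v in values
        if v is None or (isinstance(v, str) and v.strip() == "")
    )
    # MS2_FRQ: cells whose entire content is one of the placeholder tokens.
    ms2 = sum(
        1 for v in values
        if isinstance(v, str) and v.strip() in MS2_TOKENS
    )
    # UNIQFRQ: distinct values; skipping NULL cells is an assumption.
    uniq = len({v for v in values if v is not None})
    # MSs_PERC: assumed to combine both kinds of missingness.
    ms_perc = round(100 * (ms1 + ms2) / frq, 2) if frq else 0.0
    return {
        "FRQ": frq,
        "UNIQFRQ": uniq,
        "MS1_FRQ": ms1,
        "MS2_FRQ": ms2,
        "MSs_PERC": ms_perc,
    }


# Made-up column: two absent cells (None, "") and two nonsense cells ("?", "NI").
column = ["M", "F", None, "", "?", "NI", "F", "M"]
profile = completeness_profile(column)
print(profile)
```

With the made-up column above, four of the eight cells are missing in one of the two senses, so the sketch reports an MSs_PERC of 50.0.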

Data for this table is generated from DQ_Master_Table_OMOPV5_0_University of Washington_20-03-2019.csv, saved under the report directory.

Figure 4. Changes in Primary Keys Across Loads

The figure below profiles changes in primary keys across loads as a measure of change in the number of patients/records over time.

Data for the figure is stored in FRQ_comp_trberg_20-03-2019.csv

Figure set 1. Proportion of Missing Data by Type in Loaded Tables

The figures below show the proportion of missing cells/values in each column of each loaded table. The figures are generated from Table 2.

  • MS1_FRQ = Frequency of cells with NULL/NA values or empty strings in each column – presence of absence
  • MS2_FRQ = Frequency of cells with characters in each column that don’t represent meaningful data – presence of nonsense

Data Model Tests

Figure set 2. Common Key Variables

The figures below visualize the number of unique key variables that are common to multiple OMOPV5_0 tables.

  • The Reference column on the right comes from the table in which the variable is a primary key, and therefore is a reference for all other tables.

  • Count_Out shows the number of unique key variables that are not present in the reference table – e.g., a person id from the observation table that does not exist in the person table.

  • Count_In shows the number of unique key variables that are present in the reference table – e.g., a person id from the observation table that also exists in the person table.
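
The Count_In and Count_Out definitions above amount to a set comparison between a table's unique keys and the keys in the reference table. The following Python sketch shows that comparison under stated assumptions: the function name `key_overlap` and the sample ids are hypothetical, and keys are compared by exact equality. It is not taken from the DQe-c source.

```python
def key_overlap(reference_keys, table_keys):
    """Count a table's unique keys that are in or out of the reference table."""
    reference = set(reference_keys)
    unique_keys = set(table_keys)  # duplicates collapse: only unique keys count
    return {
        # Keys that also exist in the reference table.
        "Count_In": len(unique_keys & reference),
        # Keys with no matching row in the reference table (orphans).
        "Count_Out": len(unique_keys - reference),
    }


# Made-up example: person ids in the person table (the reference)
# versus person ids found in the observation table.
person_ids = [1, 2, 3, 4]
observation_person_ids = [1, 1, 2, 5, 6, 6]

result = key_overlap(person_ids, observation_person_ids)
print(result)
```

Here ids 5 and 6 appear in the observation table but not in the person table, so they contribute to Count_Out, while ids 1 and 2 contribute to Count_In.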

Test of Completeness in Key Clinical Indicators

Figure 5. Common Key Variables

Figure 5 shows the percentage of patients missing specific key clinical indicators.
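
One way such a percentage could be computed is sketched below in Python. The report does not state how DQe-c defines "missing" for an indicator, so this sketch assumes that a patient is missing an indicator when no record of that indicator exists for the patient. The function name `percent_missing`, the indicator names, and the patient ids are all hypothetical.

```python
def percent_missing(patient_ids, patients_with_indicator):
    """Percentage of patients with no record of a given indicator."""
    patients = set(patient_ids)
    if not patients:
        return 0.0
    # Assumption: "missing" means the patient never appears with the indicator.
    missing = patients - set(patients_with_indicator)
    return round(100 * len(missing) / len(patients), 2)


# Made-up data: five patients and, per indicator, the ids that have a record.
patients = [1, 2, 3, 4, 5]
indicators = {
    "height": [1, 2, 3],
    "blood_pressure": [1, 2, 3, 4, 5],
    "smoking": [2],
}

report = {name: percent_missing(patients, ids) for name, ids in indicators.items()}
print(report)
```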

Info

This report is from DQe-c version 3.2.

Ask questions or report issues: trberg@uw.edu or kstephen@uw.edu

This tool was funded by ITHS and CD2H. For citation, see https://www.ncbi.nlm.nih.gov/pubmed/29069394