The Link King
Record Linkage and Consolidation Software
Performance Statistics: Duplicate Record Analysis
The chart below reflects processing time and memory requirements for consolidation of a
single dataset consisting of 100,000 to 4,000,000 records.
Unduplication of a client database consolidates multiple records for a single client under a
unique client identifier.
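As a rough illustration of what consolidation means in practice, the sketch below assigns a shared client identifier to records that have already been judged to refer to the same person. The record numbers, the list of linked pairs, and the helper name assign_client_ids are all hypothetical. This is a generic union-find approach, not a description of how The Link King works internally.

```python
def assign_client_ids(n_records, links):
    """Give every record a client ID; linked records share one ID.

    links is a list of (i, j) record-index pairs judged to be the same client.
    Hypothetical helper, for illustration only.
    """
    parent = list(range(n_records))

    def find(x):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in links:
        ri, rj = find(i), find(j)
        if ri != rj:
            # Keep the smaller index as the root so IDs are deterministic.
            parent[max(ri, rj)] = min(ri, rj)

    return [find(i) for i in range(n_records)]


# Records 0, 2 and 3 are the same client; records 1 and 4 stand alone.
client_ids = assign_client_ids(5, [(0, 2), (2, 3)])
print(client_ids)  # [0, 1, 0, 0, 4]
```

Each distinct value in the result plays the role of a unique client identifier: the three linked records share identifier 0, while the unlinked records keep their own.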
The chart below reflects processing time for consolidation of a single dataset containing 100,000 to 4,000,000 client records under
two of the three blocking scenarios that The Link King offers. The least resource-intensive level uses blocking
rules developed by MEDSTAT for the Substance Abuse and Mental Health Services Administration (SAMHSA). The more resource-intensive
criteria were developed at Washington State's Division of Alcohol and Substance Abuse.
Although the more extensive criteria take considerably more time, they usually find only 3-4% more links than the
minimum recommended blocking criteria (developed for SAMHSA by MEDSTAT). Still, when it is important to capture all record
linkages, the time invested in using the more extensive criteria may be worthwhile.
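To illustrate why broader blocking criteria cost more time, the sketch below counts the candidate record pairs produced by one narrow blocking key and then by that key plus a second, looser one. The records, the field layout, and both blocking keys are made up for illustration. They are not the MEDSTAT or Washington State criteria, which are not listed here.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical client records: (last name, birth year, last 4 digits of SSN).
records = [
    ("SMITH", 1970, "1234"),
    ("SMYTH", 1970, "1234"),
    ("SMITH", 1970, "9999"),
    ("JONES", 1985, "5555"),
    ("JONES", 1985, "5555"),
]


def candidate_pairs(records, key_funcs):
    """Return the set of record-index pairs that share at least one blocking key."""
    pairs = set()
    for key_func in key_funcs:
        blocks = defaultdict(list)
        for idx, rec in enumerate(records):
            blocks[key_func(rec)].append(idx)
        for members in blocks.values():
            # Only records that fall in the same block are ever compared.
            pairs.update(combinations(members, 2))
    return pairs


# Narrow blocking (assumed): exact last name + birth year.
narrow = [lambda r: (r[0], r[1])]
# Broader blocking (assumed): additionally block on birth year + SSN digits.
broad = narrow + [lambda r: (r[1], r[2])]

narrow_pairs = candidate_pairs(records, narrow)
broad_pairs = candidate_pairs(records, broad)
all_pairs = len(records) * (len(records) - 1) // 2

print(len(narrow_pairs), len(broad_pairs), all_pairs)  # 2 3 10
```

In this toy data the broader criteria add one candidate pair (SMITH/SMYTH) that the narrow key misses, at the price of more comparisons. Both settings still compare far fewer pairs than the 10 an exhaustive comparison would need.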
Hard Disc Requirements for Consolidation of a Single Dataset
The chart below reflects hard disc requirements for consolidation of a single dataset containing 100,000 to 2,000,000 client records under
the 2 different blocking scenarios described above.
Performance Statistics: Linkage of Two Tables
The chart below reflects processing time and memory requirements for record linkage of two datasets: a dataset consisting of
10,000 clients (a.k.a. the sample) is linked to datasets consisting of 100,000 to 4,000,000 clients. This application of record
linkage technology is used to track clients across multiple data systems.
These estimates were developed using The Link King's more extensive blocking criteria (see discussion above).
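The two-table case can be sketched in the same spirit: index the larger dataset by a blocking key once, then look up each sample record in that index instead of comparing it with every record. The function name link_sample_to_master, the toy records, and the blocking key (exact last name + birth year) are assumptions for illustration, not The Link King's actual criteria.

```python
from collections import defaultdict


def link_sample_to_master(sample, master, key_func):
    """Return (sample_index, master_index) candidate pairs sharing a blocking key."""
    index = defaultdict(list)
    for m_idx, rec in enumerate(master):
        index[key_func(rec)].append(m_idx)

    pairs = []
    for s_idx, rec in enumerate(sample):
        # Only master records in the same block become candidates.
        for m_idx in index.get(key_func(rec), []):
            pairs.append((s_idx, m_idx))
    return pairs


# Hypothetical records: (last name, birth year).
sample = [("SMITH", 1970), ("DOE", 1990)]
master = [("JONES", 1985), ("SMITH", 1970), ("SMITH", 1970), ("DOE", 1991)]

pairs = link_sample_to_master(sample, master, key_func=lambda r: r)
print(pairs)  # [(0, 1), (0, 2)]

# Without blocking, the largest case above would need every sample record
# compared with every record in the larger dataset.
exhaustive = 10_000 * 4_000_000
print(exhaustive)  # 40000000000
```

The exhaustive figure of 40 billion comparisons for a 10,000-record sample against 4,000,000 clients shows why some form of blocking is needed at this scale.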
System requirements and software performance for record linkage/unduplication tasks vary depending on the magnitude of the
linking/unduplication task, the size of the analytic dataset, processor speed, and the release of SAS. Power or patience!
Copyright Camelot Consulting 2004