sql - how to valiate Millions of data? -


how validate scenario?

scenario 1:

source file flat file contains millions of data. data source file loaded target table in data base.

now question how validate if data loaded in target table correctly??

note: can’t use xls validate have millions of records in it.

there lots of ways 1 can validate data. of depends on 3 things:

  1. how time have validation?

  2. what processing capabilities?

  3. is data on qa or production sql server?

if in qa , have lots of processing power, can basic checks:

  • where there warnings or errors during data load?
  • count total number of items in database vs. raw file
  • count total number of null records in database
  • check total number of columns vs. raw file
  • check length of variables. expected?
  • are character columns unexpectedly truncated?
  • are numeric columns out correct number of significant digits?
  • are dates reasonable? example, if expected dates 2004, 1970?
  • how many duplicates there?
  • check if data in columns make sense. few questions can ask: rows "shifted?" numeric variables in numeric columns? key column key? column names make sense? check of null records should detect these things.
  • can manually calculate columns , compare calculation 1 in file?

if low on processing power or on production server , not want risk degrading performance other users, can many of above checks simple random sample. take, say, 100,000 rows @ time.; or, stratify if needed.

these few checks can do. more comparisons , sanity checks, better off are.

most importantly, communicate these findings , anything seems strange file owner. should able give additional insight data load correct, or if gave right file in first place.

you're loading data , providing many reasonable checks possible. if they're satisfied outcome, , you're satisfied outcome, should consider data valid.


Comments

Popular posts from this blog

Hatching array of circles in AutoCAD using c# -

ios - UITEXTFIELD InputView Uipicker not working in swift -

Python Pig Latin Translator -