Strengthening Dashboard Integrity with a Real-Time Data Quality Framework
Author – Roopali Bhatt
Business Impact
- 85% Fewer Incident Tickets: Dashboard-related issue reports dropped significantly.
- 70% Faster Issue Detection: Data problems were caught earlier, before impacting decisions.
- 60% Higher Dashboard Trust: User confidence improved based on internal surveys.
Problem Statement
Our client, a New York-based e-commerce company with over $100 million in annual revenue, relied on dashboards for critical business decisions. But behind the scenes, data was flowing in from multiple sources—each with its own refresh schedule, dependencies, and potential failure points.
The problem? There was no easy way to tell if all data sources had loaded correctly. In some cases, one source would fail while others succeeded, leading to incomplete or misleading insights. Teams would unknowingly make decisions based on outdated or partial data, resulting in operational delays, reporting inconsistencies, and lost business opportunities.
Clients often raised concerns due to unexpected spikes, drops, or zeros in dashboard data. How could they ensure their dashboards always reflected a complete, accurate, and up-to-date view of the business?
Background & Challenges:
- There was no standard way to check data freshness, since a single dashboard loaded data from multiple sources.
- Anomalies could occur and needed to be detected in time.
- Even when dashboards showed numbers, business users had no way to confirm they were correct without contacting the development team.
- If the sum, average, or count of amount, profit, sales, or visits changed drastically, there were no alerts to flag it.
Solution Implemented:
- Developed a modular PySpark Data Quality (DQ) framework, integrated directly within AWS Glue jobs for dynamic validation during data pipeline execution (a minimal sketch follows this list).
- Leveraged Spark SQL and DataFrame APIs to compute key DQ metrics, including record counts, freshness checks, and source-to-target validations.
- Captured DQ results in centralized reporting tables, making them available for downstream analytics and visualization.
- Created a “Data Health” page in each Power BI dashboard, providing users with transparent insights into data freshness, completeness, and overall quality.
- Enabled optional real-time alerts via Email and Microsoft Teams, allowing quick response to any pipeline failures or data quality issues.
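To make the idea concrete, here is a minimal sketch of what such an in-line check class could look like inside a Glue PySpark job. The class name `DQCheck`, the S3 paths, and the column names are illustrative assumptions, not the client's actual implementation.

```python
# Minimal, illustrative sketch only: class name, S3 paths, and columns are assumptions.
from datetime import datetime

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq-framework-sketch").getOrCreate()


class DQCheck:
    """Collects data quality metrics during pipeline execution and logs them centrally."""

    def __init__(self, dataset_name, results_path):
        self.dataset_name = dataset_name
        self.results_path = results_path
        self.results = []

    def record_count(self, df):
        self.results.append(("record_count", float(df.count())))
        return self

    def freshness(self, df, ts_col):
        # Hours elapsed since the most recent record, computed inside Spark.
        age = df.agg(
            ((F.unix_timestamp(F.current_timestamp()) - F.unix_timestamp(F.max(ts_col))) / 3600.0)
            .alias("age_hours")
        ).collect()[0]["age_hours"]
        self.results.append(("freshness_hours", age))
        return self

    def publish(self):
        # Persist results to a centralized reporting table that feeds the "Data Health" page.
        rows = [(self.dataset_name, metric, value, datetime.utcnow().isoformat())
                for metric, value in self.results]
        out = spark.createDataFrame(rows, ["dataset", "metric", "value", "checked_at"])
        out.write.mode("append").parquet(self.results_path)


# In-line usage inside a Glue job, after the main transformation step:
orders = spark.read.parquet("s3://example-bucket/curated/orders/")  # illustrative path
DQCheck("orders", "s3://example-bucket/dq_results/") \
    .record_count(orders).freshness(orders, "updated_at").publish()
```

In the actual framework, the same results also drove the optional Email and Microsoft Teams alerts; that alerting hook is omitted from the sketch for brevity.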

Comprehensive Data Quality Validation Framework:
To deliver complete transparency and build user confidence in dashboard reporting, we implemented a layered Data Quality (DQ) framework with three core types of validation checks:
1. Null Check
- Purpose: Ensure that critical columns do not contain unexpected null or missing values (see the sketch below).
- Benefit:
  - Catches incomplete data early.
  - Prevents misleading metrics and blank values on reports.
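As an illustration, a null check of this kind might be expressed in PySpark roughly as follows; the table path and column names are assumptions for the sketch, not the client's actual schema.

```python
# Illustrative null check: count nulls in critical columns and flag any violation.
# Table path and column names are assumptions for the sketch.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
orders = spark.read.parquet("s3://example-bucket/curated/orders/")

critical_columns = ["order_id", "order_date", "amount"]
null_counts = orders.select(
    [F.sum(F.col(c).isNull().cast("int")).alias(c) for c in critical_columns]
).collect()[0].asDict()

failed = {col: n for col, n in null_counts.items() if n and n > 0}
status = "FAIL" if failed else "PASS"  # result is logged to the DQ reporting table
print(status, failed)
```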

2. Reconciliation (Recon) Check
- Purpose: Verify that record counts and critical aggregations match between source and target systems after data movement (see the sketch below).
- Benefit:
  - Confirms end-to-end data consistency.
  - Detects data loss or duplication during pipeline execution.
- Key use: Catch dropped, duplicated, or incorrectly transformed records.
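A reconciliation check could be sketched along these lines, comparing row counts and a key aggregate between a source extract and the loaded target; the paths, the amount column, and the tolerance are assumed for illustration.

```python
# Illustrative source-to-target reconciliation: compare row counts and a key aggregate.
# Paths, the amount column, and the tolerance are assumptions for the sketch.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

source = spark.read.parquet("s3://example-bucket/raw/orders/")      # assumed source extract
target = spark.read.parquet("s3://example-bucket/curated/orders/")  # assumed target table

src = source.agg(F.count("*").alias("rows"), F.sum("amount").alias("total")).collect()[0]
tgt = target.agg(F.count("*").alias("rows"), F.sum("amount").alias("total")).collect()[0]

rows_match = src["rows"] == tgt["rows"]
# Small tolerance on the aggregate to allow for rounding differences.
amount_match = abs((src["total"] or 0) - (tgt["total"] or 0)) < 0.01

status = "PASS" if rows_match and amount_match else "FAIL"
print(status, {"source_rows": src["rows"], "target_rows": tgt["rows"]})
```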

3. Trend Check
- Purpose: Compare current data volume or key metrics against historical trends to detect unusual spikes, drops, or anomalies (see the sketch below).
- Benefit:
  - Identifies subtle data issues that might not be obvious through basic validations.
  - Protects against silent data corruption over time.
- Two variants:
  - Rolling average (e.g., last 7 or 30 days).
  - Weekday average (compare the same day of the week).
- Detects spikes or drops beyond a configured threshold.
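The rolling-average variant might look roughly like this: the latest daily volume is compared against the average of the preceding seven days, and deviations beyond a threshold are flagged. The 30% threshold, table path, and date column are illustrative assumptions.

```python
# Illustrative trend check: compare the latest daily volume to a 7-day rolling average.
# The 30% threshold, table path, and date column are assumptions for the sketch.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
orders = spark.read.parquet("s3://example-bucket/curated/orders/")

daily = (orders.groupBy(F.to_date("order_date").alias("day"))
               .agg(F.count("*").alias("rows"))
               .orderBy(F.desc("day"))
               .limit(8)            # latest day plus the previous 7 days
               .collect())

if len(daily) < 2:
    raise ValueError("Not enough history for a trend check")

latest, history = daily[0], daily[1:]
baseline = sum(r["rows"] for r in history) / len(history)
deviation = abs(latest["rows"] - baseline) / baseline if baseline else 0.0

status = "FAIL" if deviation > 0.30 else "PASS"   # threshold would come from config
print(status, {"day": str(latest["day"]), "rows": latest["rows"], "baseline": round(baseline, 1)})
```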

Architecture / Tools
- AWS Glue + PySpark: Scalable validation layer.
- Config-Driven Design: Allows adding new checks without rewriting code (an example configuration follows this list).
- Result Logging: Stored to reporting tables, visualized in Power BI.
- Power BI Integration: Automatically refreshed with ETL runs.
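To illustrate the config-driven idea, a configuration could declare the checks per table and let the framework dispatch on a type key; the structure and key names below are assumptions, not the actual config schema.

```python
# Illustrative configuration: each table declares the checks it needs, and the framework
# dispatches on the "type" key. Keys and values here are assumptions, not the real schema.
DQ_CONFIG = {
    "orders": [
        {"type": "null_check", "columns": ["order_id", "order_date", "amount"]},
        {"type": "recon_check",
         "source": "s3://example-bucket/raw/orders/",
         "target": "s3://example-bucket/curated/orders/",
         "amount_col": "amount"},
        {"type": "trend_check", "date_col": "order_date", "window_days": 7, "threshold": 0.30},
    ],
    "web_visits": [
        {"type": "null_check", "columns": ["visit_id", "visit_ts"]},
        {"type": "trend_check", "date_col": "visit_ts", "window_days": 30, "threshold": 0.25},
    ],
}

# Dispatch sketch: adding a new table or check only touches the config above.
CHECK_REGISTRY = {}  # e.g. {"null_check": run_null_check, ...}, populated by the framework


def run_all(config, registry):
    for table, checks in config.items():
        for check in checks:
            runner = registry.get(check["type"])
            if runner:
                runner(table, check)  # each runner logs PASS/FAIL to the reporting table
```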
Challenges Faced
- Glue Data Catalog DQ tools required creating separate jobs for each check, which was inefficient and costly.
- To address this, we created a single custom class that plugs into pipelines for flexible, in-line validations (a usage sketch follows).
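A brief usage sketch of that idea, reusing the hypothetical DQCheck class from the earlier sketch: the same validator is invoked in-line at several stages of one Glue job instead of running a separate Glue Data Quality job per check.

```python
# Illustrative in-line usage: one reusable class, invoked at several stages of the same
# Glue job, instead of a separate job per check.
# from dq_framework import DQCheck   # hypothetical module housing the class sketched earlier
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
RESULTS = "s3://example-bucket/dq_results/"  # assumed reporting location

raw = spark.read.parquet("s3://example-bucket/raw/orders/")
DQCheck("orders_raw", RESULTS).record_count(raw).publish()            # validate after extract

curated = raw.dropDuplicates(["order_id"]).filter("amount is not null")
DQCheck("orders_curated", RESULTS).record_count(curated) \
    .freshness(curated, "updated_at").publish()                       # validate after transform

curated.write.mode("overwrite").parquet("s3://example-bucket/curated/orders/")
```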
Results and Benefits
- Automation Replaced Manual Effort: No more ad-hoc Athena/SQL checks.
- Improved Confidence in Reports: Clients trust the validations visible in the dashboard.
- Scalable and Maintainable: Adding new checks or tables is lightweight.
- Early Detection of Anomalies: Trend checks catch subtle issues not visible in counts.
- Improved QA Workflow: Integrated alerts and standard playbooks for resolution.
Conclusion: Building Trust Through Data Transparency
By implementing a comprehensive Data Quality framework embedded directly into the data pipeline, we transformed how users interact with and trust the dashboards.
The addition of Null, Reconciliation, and Trend checks provided end-to-end validation across multiple data sources and layers, ensuring that any inconsistencies were immediately detected and surfaced.
With the Data Health page offering real-time visibility into the status of data loads, freshness, and quality, business users no longer had to second-guess the accuracy of the information they relied on for decision-making.
This approach delivered three major impacts:
- Increased Confidence: Users gained assurance that dashboard insights were accurate, complete, and up-to-date.
- Faster Issue Resolution: Early detection and proactive alerts allowed data engineering teams to address issues before they impacted reporting.
- Empowered Decision-Making: Reliable data empowered leaders to make faster, data-driven decisions without hesitation or manual data validation overhead.
Overall, this solution significantly strengthened trust in reporting systems, reduced operational risk, and improved the organization’s agility in using data as a competitive advantage.