Data Program Level 2 Reference Solution

Purpose

This page contains forward-looking content and may not accurately reflect current-state or planned feature sets or capabilities.

Public companies need to reliably and predictably share key financial, customer, and growth metrics as well as analyze lead-to-cash and product idea-to-adoption processes to continually improve business performance. These activities are supported by capabilities defined in Level 2 of the Data Capability Model. To provide a realistic example and to serve as a reference for future development, this page presents the Level 2 Data Solution for ‘Product Geolocation Analysis’.

Solution Overview - Product Geolocation Analysis

Understanding where your product is used around the world is an important step towards developing a more complete understanding of your customers, your product’s global reach, and related location-aware insights.

This data solution delivers three Self-Service Data capabilities:

  1. Dashboard Viewer - a new Sisense dashboard to visualize Denomas deployments by Country, Territory, Month, and Year (Self-Service Dashboard)
  2. Dashboard Developer - a new Sisense data model containing the complete dimensional model components to build new dashboards and link existing dashboards to geolocation data
  3. SQL Developer - a new Enterprise Dimensional Model subject area

From a Data Platform perspective, the solution delivers:

  1. an extension to the Enterprise Dimensional Model for GeoLocation analysis
  2. a new dim_country table
  3. testing and data validation extensions to the Data Pipeline Health dashboard
  4. ERDs, dbt models, and related platform components

Finally, this is the long-term automated solution for several ad-hoc issues completed over the past year, including:

Knowledge Assessment & Certificates

The Self-Service Data Certificate program is based on the Learning and Development Certification program. The Self-Service Data program provides individual Certificates for each subject-oriented Dashboard Developer or SQL Developer Knowledge Assessment successfully completed. Links to the Knowledge Assessments are located in the appropriate sections below.

Data Classification

ORANGE

  • IP Address

Solution Ownership

  • Source System Owner: @rparker2
  • Source System Subject Matter Expert: @jeromezng
  • Data Team Subject Matter Expert: @rparker2

Key Terms

Key Metrics, KPIs, and PIs

  • number of Usage Pings By Country or Region By Month and Year
  • number of Denomas.com Page Views By Country or Region By Month and Year
  • KPIs - none defined
  • PIs - none defined

Self-Service Data Solution

Self-Service Dashboard Viewer

Dashboard Purpose
Worldwide Product Growth Visualize the adoption of Denomas by country, region, and time.
Data Health Dashboard for Geolocation Data Data Health of the Geolocation data used to support this solution.

Self-Service Dashboard Developer

Data Space Description
Global Contains a data model containing a 1-1 relationship with the Product Geolocation Analysis model detailed below

Self-Service Dashboard Developer Certificate

To receive a Certificate, you will need to earn 100% on the Self-Service Dashboard Developer Knowledge Assessment and upload a screenshot of your new dashboard when prompted. Upon completion of the Knowledge Assessment, you will be emailed your responses and this email will serve as your Certificate.

Self-Service SQL Analysis

Key Fields and Business Logic

  • Data is sourced from usage-ping and snowplow.
  • Usage-ping includes information about where the instance is HOSTED, not where the users of the host originate.
  • Snowplow includes the best information we have available about where users are originating from. Because some users deploy Denomas behind VPNs or a similar proxy solution, the IP-to-Geolocation mapping from these sources may not be accurate. But it is still the best data source we have and represents our best source of truth.

Entity Relationship Diagrams

Diagram/Entity Grain Purpose Keywords
Product Geolocation Analysis Activity By Day Dimensions and Facts that can be used to analyze Denomas usage by country, territory, and time. dim_date, dim_country, fct_country_activity_by_day
dim_date Day Central dimension for all dates.
dim_country ISO_Country Central dimension for all countries and territories, sourced from ISO-3166 and Denomas Sales Territories

Reference SQL

All production SQL in Sisense or dbt must adhere to our SQL Style Guide for legibility and maintainability.

Page views by country by day for NORAM
1
2
3
4
5
6
7
8
SELECT
  f.date_key            AS date_key,
  dc.country_name       AS country_name,
  SUM(f.num_page_views) AS number_of_page_views
FROM fct_country_activity_by_day f
JOIN dim_country dc
  ON f.country_key = f.country_key
WHERE dc.reporting_region = 'NORAM'
Top 100 namespaces by country in 2020
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
SELECT
  dc.country_name       AS country_name,
  f.namespace_key       AS namespace_key,
  SUM(f.num_page_views) AS number_of_total_page_views
FROM fct_country_activity_by_day f
JOIN dim_country dc
  ON f.country_key = f.country_key
WHERE dd.year = 2020
GROUP BY dc.country_name, f.namespace_key
PARTITION BY dc.country_name LIMIT 100

Self-Service SQL Developer Certificate

To receive a Certificate, you will need to earn 100% on the Self-Service SQL Developer Knowledge Assessment and upload a screenshot of your new SQL statement when prompted. Upon completion of the Knowledge Assessment, you will be emailed your responses and this email will serve as your Certificate.

Data Platform Solution

The overall solution adheres to our Enterprise Dimensional Model guidelines.

Data Lineage

See dbt documentation for a complete lineage graph.

graph LR
    A[usage_ping-14dRR] --> 1{ETL}
 B[raw.snowplow-24hRR] --> 1{ETL}
 1 --> C[analytics.snowplow_unnested_events_90-24hRR]
 C --> 2{ETL}
 F[edm.dim_date-static] --> 2{ETL}
 G[edm.dim_user-24hRR] --> 2{ETL}
 2 --> H[edm.fct_country_activity_by_day-24hRR]

DBT Solution

The dbt solution generates a dimensional model from RAW source data.

Trusted Data Solution

Enterprise Dimensional Model Validations

Validation Expected Result
1 Total number of countries mapped does not exceed 300.
2 Percentage of traffic from APAC is not greater than AMER.
3 >40,000 # New fct_country_activity_by_day rows added by Day.

Source Data Pipeline validations

Validation Expected Result
1 New usage_ping data has been uploaded in last 14 days.
2 Total # of accounts represented by usage_ping data >= expected result.
3 Total # of accounts represented by snowplow data >= expected result.
Last modified December 1, 2023: bulk update (176cf9ec)