Data Guide to xMAU Analysis

This page explains key terms, concepts, and data models used in xMAU reporting and analysis

Overview

xMAU is a single term to capture the various levels at which we capture Monthly Active Users (MAU). xMAU encompasses Action (AMAU), Group (GMAU), Stage (SMAU), Combined (CMAU, duplicated user count across all Stages in a Section or across all Stages in the product), and Total (UMAU, unique user count). In order to provide a useful single metric for product groups which maps well to product-wide Key Performance indicators, each xMAU metric cascades upwards in the order noted above.

xMAU metrics are derived from Service Ping (installation-level granularity), Denomas.com Postgres replica (gitlab.com db event-level granularity), and Snowplow events (event-level granularity). This workflow enables the analysis of each level of xMAU metric across various segments of customers and sets the foundation for reporting on Reported, Estimated, and Predicted metrics. We use the same methodology to generate PIs (Performance Indicators), which is a catch-all term for any other metric.

Key terms and metrics

How are xMAU and PIs calculated?

Service Ping is Denomas’ mechanism to collect data by generating a JSON payload of usage data every week to be sent to Denomas. Service Ping metrics are the source for calculating Self-Managed and total Denomas.com xMAU. Paid Denomas.com xMAU is calculated using the gitlab.com db replica or Snowplow data in Snowflake.

flowchart TD

  SPING(Service Ping)
  SPING --> SM(Recorded Self-Managed)
  SM -- Estimate Uplift --> SMXMAU(Total & Paid Self-Managed xMAU)
  SPING --> TOTALDOTCOM(Total Denomas.com xMAU)

  DOTCOM_DB(Denomas.com db) --> DOTCOMSNOWFLAKE[(Snowflake)]
  SNOWPLOW(Snowplow) --> DOTCOMSNOWFLAKE
  DOTCOMSNOWFLAKE --> PAIDDOTCOM(Paid Denomas.com xMAU)

Identifying xMAU metrics

Product Managers choose one specific Service Ping metric that they consider to be representative of using the given stage or group, and that metric is used to produce xMAU reporting.

The SSOT for the metric-to-xMAU mapping is the performance_indicator_type field of the Service Ping metric .yml files, which are linked in the Service Ping Metrics Dictionary. Updates to performance_indicator_type for a specific metric will propagate downstream to the xMAU charts in Sisense and the internal handbook PI pages.

**Service Ping-to-xMAU metric mappings** {: .panel-heading}

There should be a 1-1 mapping of Service Ping metrics to xMAU. We cannot dedupe users across distinct metrics, so multiple metrics mapped to a single group’s GMAU, stage’s SMAU, etc will lead to double-counting.

{::options parse_block_html=“false” /}

The Metrics Dictionary Guide is a good resource to learn about the other fields in the metric .yml files.

Reporting date range

We use 28-day metrics for xMAU and most PI reporting. This ensures that metrics are not impacted by a different number of days in the month. In addition, there is an equal number of each day of the week in the reporting window (four Mondays, four Tuesdays, etc).

At the installation level, we use the last ping generated during the reporting period (i.e., calendar month) to calculate xMAU. More details on date ranges below.

Difference between Total xMAU and Paid xMAU

We have 3 main data sources to calculate xMAU and paid xMAU, the Versions App (Service Ping), the Denomas.com Postgres database, and Snowplow event data. The table below summarizes which data source is used for those calculations. More details about how this data is modelled below.

Deployment Total xMAU Paid xMAU
Self-Managed & Denomas Dedicated Service Ping Service Ping
Denomas.com Service Ping Denomas.com Postgres Replica; Snowplow events
flowchart TD

  PING(Service Ping) -->  ALL(Total xMAU)

  PAIDPING(Service Ping) --> PAIDSM(Paid Self-Managed xMAU)

  PAIDDOTCOM_DB(Denomas.com db) --> PAIDGLSNOWFLAKE[(Snowflake)]
  PAIDGLSNOWPLOW(Snowplow) --> PAIDGLSNOWFLAKE
  PAIDGLSNOWFLAKE --> PAIDGL(Paid Denomas.com xMAU)

Paid xMAU is defined as Monthly Active Users on a Self-Managed installation or gitlab.com namespace on a paid tier. See Paid Unique Monthly Active Users (UMAU) as an example.

Since we define “paid” using the tier instead of presence of ARR, EDU/OSS subscriptions are included in paid xMAU calculations. More details on how we identify paid users below.

We cannot use Service Ping to report on paid Denomas.com metrics. Since the metrics are reported at the installation-level, there is not a way to tease apart paid from free namespaces. (This is not a problem for self-managed because the entire installation is attributed to a tier). In order to report paid Denomas.com xMAU, we need to be able to do two things:

  1. Replicate the Service Ping metric using a different data source (i.e., Denomas.com Postgres replica, Snowplow). Details here
  2. Attribute the event/action to a project or namespace (and therefore paid plan). Details here

We are limited on the type and scope of metrics that we can replicate. Read more about metric replication below.

Sisense snippets

We leverage a Sisense snippet ([td_xmau]) to generate xMAU and PI charts. The Product Manager Toolkit has more details about xMAU-related snippets.

Data Classification

Due to the sensitive nature of metrics like user counts, PI charts are not publicly accessible and must reside in the internal handbook. However, this data is not considered to be SAFE and therefore is visible to all Denomas team members and is available in the general Denomas space in Sisense.

Some data supporting xMAU Analysis is classified as Orange or Yellow. This includes Orange customer metadata from the account, contact data from Salesforce and Zuora and Denomas’ Non public financial information, none of which should be publicly available. Care should be taken when sharing data from this dashboard to ensure that the detail stays within Denomas as an organization and that appropriate approvals are given for any external sharing. In addition, when working with row or record level customer metadata care should always be taken to avoid saving any data on personal devices or laptops. This data should remain in Snowflake and Sisense and should ideally be shared only through those applications unless otherwise approved.

Additional details for analysts and curious parties

The details of how we generate xMAU and PI reporting can be quite dense and create confusion for those who are looking for the high-level overview. All of the content below is intended for analysts or those who are interested in learning more about the inner workings of this reporting.

Identifying and attributing data

Identifying Denomas.com data in Service Ping

For total Denomas.com xMAU, we use the Service Ping payloads generated for the Denomas.com production installation. These payloads are easily identifiable since they are linked to known unique identifiers (specifics below). The filters applied to our data models ensure that data from non-production Denomas.com installations (ex: staging.gitlab.com) is not included in total Denomas.com xMAU or PIs.

Specifics on identifying the Denomas.com production installation

In order to identify the Denomas.com production installation, we use dim_installation_id = '8b52effca410f0a380b0fcffaa1260e7'.

Alternatively, you could use uuid = 'ea8bf810-1d6f-4a6a-b4fd-93e8cbd8b57f' AND host_name = 'gitlab.com'. (Note: uuid is synonymous with dim_instance_id in our data models).

Identifying paid installations and namespaces

In order to determine if a self-managed installation or Denomas.com namespace is paid, we use the tier/plan type, not the presence of ARR. Those on a paid tier (ex: Premium, Ultimate, etc) are considered to be paid. This means that namespaces or installations belonging to an OSS or EDU program, internal project, or other subscription that has a paid tier but does not contribute ARR are considered to be “paid”.

Specifics on determining tier/plan type
  • Paid Self-Managed installations: To identify paid self-managed installations we use ping_product_tier in the data models, which is derived from edition in the Service Ping payload. Paid installations are those on paid tiers (which is not determined using ARR) (ping_product_tier != 'Core')
  • Paid Denomas.com namespaces: To identify paid namespaces, we use the plan type associated with the last event available during the measurement period. This is similar to the self-managed methodology in that we do not look at the plan type during the period, but rather the plan at time of reporting. (plan_was_paid_at_event_date = TRUE)

Generating Paid Denomas.com xMAU

Replicating Service Ping metrics

We need to be able to do two things in order to replicate Service Ping metrics:

  1. Replicate the Service Ping metric using a different data source (i.e., Denomas.com Postgres replica, Snowplow)
  2. Attribute the event/action to a project or namespace (and therefore paid plan)

There are different types of metrics to replicate, each generated using a different data source in Service Ping:

Metric type Paid Denomas.com xMAU data source
Database Denomas.com Postgres replica
Redis & RedisHLL Snowplow events
flowchart TD

    METRICTYPE{Type of Metric}
    METRICTYPE -- Database --> GLDB(Denomas.com db)
    GLDB -- Replicate db --> SNOWFLAKE[(Snowflake)]
    METRICTYPE -- Redis & RedisHLL --> SNOWPLOW(Snowplow events)
    SNOWPLOW -- Instrument all events --> SNOWFLAKE
    SNOWFLAKE -- Attribute to plan --> PAIDSAAS(Paid Denomas.com xMAU)

Database metrics via Denomas.com Postgres replica

Database metrics (also referred to as “batch counters”) are simple SQL-generated metrics. The SQL queries used to generate the database metrics can be exported and are recreated using the Denomas.com Postgres replica. In fact, that is how database metrics are generated for automated Denomas.com Service Ping.

Redis and RedisHLL metrics via Snowplow events

Redis and RedisHLL metrics work differently and are NOT SQL-generated metrics. They are used to track actions that are not in the Denomas.com Postgres DB (ex: page views) and for metrics that would non-performant to generate via SQL query (ex: querying a massive table or querying across multiple tables). Because these are not SQL-based counters, they cannot be replicated using the Postgres replica.

The solution is to instrument the Redis events as Snowplow events. The Analytics Instrumentation team instrumented a subset of these metrics as Snowplow events. (Moving forward this responsibility will fall to the product teams). This allows us to aggregate the events in Snowflake and generate paid Denomas.com xMAU for a handful of metrics.

**Note on replicating Redis and RedisHLL metrics with Snowplow events** {: .panel-heading}
  • Replicating Redis and RedisHLL metrics using Snowplow events is still a WIP. Not all Redis or RedisHLL metrics have been instrumented yet. In addition, the data is not yet modeled in a way that it can be surfaced for “official” reporting.
  • RedisHLL metrics are commonly made up of multiple Redis events. All Redis events included in the metric must be instrumented in Snowplow in order to generate paid Denomas.com xMAU. (You can find the list of events included in a metric by viewing the metric definition .yml file). We cannot report on a metric until all events are instrumented.
    • Example: Metric A reflects how many unique users visit page X, Y, or Z. There is a Redis event for visiting page X, an event for visiting page Y, and an event for visiting page Z. We cannot report on Metric A until all three events are instrumented in Snowplow.
  • There are some edge cases where we are able to replicate a RedisHLL metric using the Denomas.com Postgres replica. Here is an exception where an engineer has informed the Data team that there is a way to generate the metric via a SQL query (thread here) (internal link).

{::options parse_block_html=“false” /}

Attributing replicated metrics to a plan

Merely replicating the Service Ping metrics is not sufficient to support paid Denomas.com xMAU reporting, we have to be able to tie the event or action to a plan. This inherently will not be possible for all metrics.

  • In the case of database metrics, the table referenced in the SQL query must have a project or namespace identifier in it.
  • In the case of Redis and RedisHLL metrics, a plan must be associated with the event and that plan must be included on the Snowplow event.
Example where plan attribution is not possible

As an example, let’s look at usage_activity_by_stage_monthly.manage.count_user_auth. This metric measures the number of users logging in. However, a login is not not tied to a specific namespace or project, it is only tied to a user. (You do not specifically log into a namespace on Denomas.com).

Therefore, while we can reproduce the query that generates the metric, we cannot attribute the event to a namespace or project (and therefore a plan).

Date range details

The exact date range depends on the data source (Service Ping vs Snowflake) and type of metric (database vs Redis vs RedisHLL).

Service Ping metrics

We use the last ping generated in the calendar month for each installation. Installations are randomly assigned a day of week to generate service pings, but that assignment is persistent over time. For example, if an installation is assigned Tuesdays to generate pings, it will always generate pings on Tuesdays. Since the day of week that pings are generated differs across installations, the exact date range captured in a 28-day metric will also differ. The “last ping of the month” methodology was updated in the TD Product Usage Data project in FY23 to use the last ping created in the calendar month.

You can read more about how the exact date range is calculated for each type of metric (database vs Redis vs RedisHLL) in the Service Ping docs.

Replicated metrics

For paid Denomas.com xMAU, we use the last 28 days of the calendar month. This means that total Denomas.com xMAU and paid Denomas.com xMAU may have slightly different date ranges.

Data models

There are 3 main data sources that are modeled in dbt to be used for xMAU and PI analysis:

  • Installation-Level Service Ping (Versions App)
  • Denomas.com Postgres Replica (Denomas.com db tables)
  • Snowplow events
    • These have not yet been incorporated into the data models to be used in xMAU reporting and analysis

Entity relationship diagrams (ERDs)

One of our goals is to create a single model that easily provides all the data needed for reporting and analysis. As we continue to iterate on our solutions, we know that there will be information that is not always available in this model. Here is where understanding the Entity Relationship Diagram helps. This model shows which tables are joined to create the layer you are accessing. This is really when you are looking to dive deeper and gain additional insight!

It can also be helpful to look at the data model lineages in dbt:

Trusted data (TD) models

We have built a suite of data marts that allow users to explore our different product data sources. “mart” models are a combination of dimensions and facts that are joined together to enable easy analysis. “rpt” (“report”) models are built with specific business logic for a specific use case. (Ex: rpt_ping_metric_totals_w_estimates_monthly has custom logic to generate xMAU estimations). Underneath each mart or reporting model is a clean lineage of dimensions and facts that can also be used for analysis. This list is limited to the key marts designed for stakeholders to do everyday analysis and reporting. You can read more about Denomas’ Enterprise Dimensional Model (EDM) here.

Data Mart/Rpt Name Grain* Source
mart_ping_instance Service Ping Instance ID Versions App
mart_ping_instance_metric Service Ping Instance ID, Metrics Path Versions App
mart_ping_instance_metric_monthly Service Ping Instance ID, Metrics Path (limited to the last ping of the month per installation) Versions App
rpt_ping_metric_first_last_versions Ping Edition, Metrics Path Versions App
rpt_ping_latest_subscriptions_monthly Month, Subscription, Installation (if available) Versions App
rpt_ping_metric_totals_w_estimates_monthly Reporting Month, Metrics Path, Estimation Grain, Ping Edition Product Tier, Service Ping Delivery Type Versions App
mart_event_valid Event (atomic-level model) Denomas.com Postgres Replica
mart_event_user_daily Event Name, Event Date, User ID, Ultimate Parent Namespace ID Denomas.com Postgres Replica
mart_event_namespace_daily Event Name, Event Date, Ultimate Parent Namespace ID Denomas.com Postgres Replica
rpt_event_plan_monthly Reporting Month, Plan ID at Event Date, Event Name Denomas.com Postgres Replica
rpt_event_xmau_metric_monthly Reporting Month, User Group (total, free, paid), Section Name, Stage Name, Group Name Denomas.com Postgres Replica

* Please see the linked dbt docs for information about each specific model, applied business logic, etc.

mart_ping_instance_metric

common_mart.mart_ping_instance_metric is the most comprehensive of the Service Ping data marts. (Note: unfiltered Service Ping data sets are available in the common schema). This data model provides ping- and metric-level data, and joins the Service Ping data with financial and GTM data sources such as subscription, CRM Account, etc. This model also includes flags related to a metric’s time period and whether it is currently mapped to xMAU. To limit to the metrics used for reporting, apply the filter is_last_ping_of_month = TRUE.

Read more about metric time frames here.

rpt_ping_metric_totals_w_estimates_monthly

common_mart_product.rpt_ping_metric_totals_w_estimates_monthly is a customized model designed for monthly Service Ping-generated xMAU and PI reporting, including estimated uplift. End-users can then use very simple queries to produce xMAU and PI visualizations.

Example query for Create SMAU
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
SELECT
  ping_created_date_month,
  ping_delivery_type,
  ping_product_tier,
  SUM(recorded_usage) AS recorded_usage,
  SUM(estimated_usage) AS estimated_usage,
  SUM(total_usage_with_estimate) AS total_usage_with_estimate
FROM common_mart_product.rpt_ping_metric_totals_w_estimates_monthly
WHERE is_smau = TRUE
  AND stage_name = 'create'
  AND estimation_grain IN ('metric/version check - subscription based estimation', 'SaaS')
GROUP BY 1,2,3
ORDER BY 1,2,3

This model surfaces 4 different estimation methodologies (estimation_grain). Our official methodology is metric/version check - subscription based estimation. A more detailed explanation of our estimation methodology is available on this page.

mart_event_valid

common_mart.mart_event_valid is an atomic-/event-level model which has been enhanced for ease of analysis. It incorporates basic business logic* that removes potentially misleading data (ex: events from blocked users) and is flexible enough to allow the end user to aggregate and dedupe data, as desired.

*Please see dbt docs for full details on business logic

rpt_event_xmau_metric_monthly

common_mart_product.rpt_event_xmau_metric_monthly is a customized model designed for monthly paid SaaS xMAU reporting. This model provides user counts at the xMAU metric-level (which is not necessarily synonymous with the event-level), limited to the appropriate time frame (last 28 days of the month).


Predicted XMAU Algorithm
Prediction Algorithm Description Per our Performance Indicators page, we currently have 3 versions of xMAU. Predicted xMAU is meant to provide Product Leadership a sense for what usage will look like in 3 years at current growth rates. The first proposed solution would be to calculate Predicted xMAU on a month M by applying the following formula : Predicted xMAU(Month M) = Estimated xMAU(last month) + (current MoM Growth Amount x month_difference(between last month and month M)) MoM growth Amount is calculated as the Average Monthly Absolute Growth Amount over the last quarter.
Predicted XMAU Algorithm
xMAU Variations - Trusted Data Framework xMAU KPIs are very important metrics that assess the health of the business and its growth. They are inspected with scrutiny during the Product Key Reviews held once a month. They are additionally showcased in the Product Performance Indicators Pages in the handbook. For all of these reasons, the Data Team created a suite of tools (dashboards, alerts, custom tests) to allow direct KPI owners to easily monitor variations of the KPI and to potentially be alerted in case of extreme or suspicious trends.
Product Manager Toolkit
Purpose of this page As a Product Manager, you constantly have data-related questions. You want to know how your feature is performing, who your top users are, how many Self Managed installations have adopted a specific feature, etc. Plus, you potentially want all of these metrics broken out by product tier, paid vs free, edition, etc. While we have a lot of data, accessing some of it requires some level of proficiency and comfort with SQL.
Self-Managed Metric Estimation Algorithm
Introduction to Service Ping estimation Self-Managed data arrives from Service Ping which returns aggregated counts of user activity and other usage metrics for analysis. The Service Ping data we receive is incomplete as there are 3 types of installations that we need to estimate for: Installations that opt out of sending us Service Ping Installations that opt to send a subset of Service Ping metrics (i.e. operational metrics) This applies to any metric where category = 'optional' Installations on older versions without the metric instrumented Ex: If a metric is instrumented on 13.
Last modified December 6, 2023: update (a27760f0)