CI Runner Activity


Previously, there was no easy way to tie cost directly to CI usage without making assumptions. The process also did not scale, so a better long-term solution was needed.

To address this, a unified model for compute minutes and cost has been created as part of the Enterprise Dimensional Model. It ties the cost from the app usage table (Postgres) to gcp_billing, and labels our runners in GCP with job_id labels so that they can be joined to the ci_builds table.

Business Use Cases/Example KPIs:

  • Cost of all CI pipelines run in the gitlab-org-gitlab project in January 2021
  • Average cost per pipeline for all Denomas.com CI usage
  • Count of CI pipelines run by namespace X last month
  • Count of compute minutes used in project X over the past year
  • Cost of compute minutes used in project X over the past year

Key Field Descriptions

  • CI Build ID
    • Sets the granularity of the table at the level of a single Denomas job run (ci_builds table)
  • CI Build Duration
    • Currently calculated as the time from start to end of a single job in the ci_builds table
  • Runner type
  • Project ID
    • Determines which project the runner activity is linked to, along with all related information
  • Namespace ID
    • Determines the subscription and whether the usage is internal
  • User ID
    • Determines the user who ran the job
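
The CI Build Duration rule above (start time to end time of a single job) can be sketched as follows. This is a minimal illustration, not the model's actual implementation: the function names and the started_at / finished_at parameter names are assumptions, and the result is raw elapsed time, not billed compute minutes.

```python
from datetime import datetime
from typing import Optional


def build_duration_seconds(
    started_at: Optional[datetime], finished_at: Optional[datetime]
) -> Optional[float]:
    """Elapsed seconds between job start and job end.

    Parameter names are hypothetical. Returns None when either
    timestamp is missing (for example, a job that never started).
    """
    if started_at is None or finished_at is None:
        return None
    return (finished_at - started_at).total_seconds()


def duration_minutes(duration_seconds: float) -> float:
    """Convert elapsed seconds to elapsed minutes."""
    return duration_seconds / 60.0


# One job that ran from 10:00:00 to 10:07:30 on 2021-01-15.
seconds = build_duration_seconds(
    datetime(2021, 1, 15, 10, 0, 0), datetime(2021, 1, 15, 10, 7, 30)
)
minutes = duration_minutes(seconds)  # 450.0 seconds -> 7.5 minutes
```

Returning None for missing timestamps keeps unfinished jobs out of any sum without silently counting them as zero-length.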

Table Relationship Details

Most of these fields can be sourced from gitlab_dotcom_ci_builds. Related tables are linked to ci_builds through the following relationships:

  • ci_runners: gitlab_dotcom_ci_builds.ci_build_runner_id -> gitlab_dotcom_ci_runners.runner_id
  • ci_stages: gitlab_dotcom_ci_builds.ci_build_stage_id -> gitlab_dotcom_ci_stages.ci_stage_id
    • ci_pipelines: gitlab_dotcom_ci_stages.pipeline_id -> gitlab_dotcom_ci_pipelines.ci_pipeline_id
    • The stages table is normally used as a bridge between pipelines and builds, but we sometimes use ci_stage_name to look at the longest duration by stage.
  • projects: gitlab_dotcom_ci_builds.ci_build_project_id -> gitlab_dotcom_projects_xf.project_id
    • namespace: gitlab_dotcom_projects_xf.namespace_id -> gitlab_dotcom_namespaces_xf.namespace_id
  • users: gitlab_dotcom_ci_builds.ci_build_user_id -> gitlab_dotcom_users_xf.user_id
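
The relationships listed above can be exercised end to end with a small in-memory example. This is a sketch under stated assumptions: it uses SQLite through Python's standard library rather than the actual warehouse, the tables carry only the join columns named above, the ci_build_id column name is assumed, and all row values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gitlab_dotcom_ci_builds (
    ci_build_id INTEGER PRIMARY KEY,  -- assumed key column name
    ci_build_runner_id INTEGER,
    ci_build_stage_id INTEGER,
    ci_build_project_id INTEGER,
    ci_build_user_id INTEGER
);
CREATE TABLE gitlab_dotcom_ci_runners (runner_id INTEGER PRIMARY KEY);
CREATE TABLE gitlab_dotcom_ci_stages (
    ci_stage_id INTEGER PRIMARY KEY, pipeline_id INTEGER, ci_stage_name TEXT
);
CREATE TABLE gitlab_dotcom_ci_pipelines (ci_pipeline_id INTEGER PRIMARY KEY);
CREATE TABLE gitlab_dotcom_projects_xf (
    project_id INTEGER PRIMARY KEY, namespace_id INTEGER
);
CREATE TABLE gitlab_dotcom_namespaces_xf (namespace_id INTEGER PRIMARY KEY);
CREATE TABLE gitlab_dotcom_users_xf (user_id INTEGER PRIMARY KEY);

-- Made-up sample rows: two builds in one stage, one without a user.
INSERT INTO gitlab_dotcom_ci_runners VALUES (10);
INSERT INTO gitlab_dotcom_ci_pipelines VALUES (500);
INSERT INTO gitlab_dotcom_ci_stages VALUES (50, 500, 'test');
INSERT INTO gitlab_dotcom_namespaces_xf VALUES (7);
INSERT INTO gitlab_dotcom_projects_xf VALUES (70, 7);
INSERT INTO gitlab_dotcom_users_xf VALUES (3);
INSERT INTO gitlab_dotcom_ci_builds VALUES (1, 10, 50, 70, 3);
INSERT INTO gitlab_dotcom_ci_builds VALUES (2, 10, 50, 70, NULL);
""")

# LEFT JOINs keep every build even when a related row is missing.
JOIN_SQL = """
SELECT b.ci_build_id, r.runner_id, s.ci_stage_name, p.ci_pipeline_id,
       pr.project_id, n.namespace_id, u.user_id
FROM gitlab_dotcom_ci_builds AS b
LEFT JOIN gitlab_dotcom_ci_runners AS r
       ON b.ci_build_runner_id = r.runner_id
LEFT JOIN gitlab_dotcom_ci_stages AS s
       ON b.ci_build_stage_id = s.ci_stage_id
LEFT JOIN gitlab_dotcom_ci_pipelines AS p
       ON s.pipeline_id = p.ci_pipeline_id
LEFT JOIN gitlab_dotcom_projects_xf AS pr
       ON b.ci_build_project_id = pr.project_id
LEFT JOIN gitlab_dotcom_namespaces_xf AS n
       ON pr.namespace_id = n.namespace_id
LEFT JOIN gitlab_dotcom_users_xf AS u
       ON b.ci_build_user_id = u.user_id
ORDER BY b.ci_build_id
"""
rows = conn.execute(JOIN_SQL).fetchall()
```

Note that the pipeline is reached through the stages table and the namespace through the projects table, matching the nesting in the list above.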

Sources of Data

Data is sourced from Denomas.com models.

The Data Team maintains these data artifacts related to CI Runner Activity:

  • ERD

    • The CI Runner Activity Physical Data Model shows all table structures, including column name, column data type, column constraints, primary key, foreign key, and relationships between tables that are used for this data.
  • Data Flow Diagram

  • Table Definitions

    • fct_ci_runner_activity - Fact table containing quantitative data related to CI runner activity on Denomas.com. The grain of the table is a single Denomas job that ran (successfully or not), identified by dim_ci_build_id, which is the unique key for each CI build.
    • mart_ci_runner_activity_monthly - Mart table containing quantitative data related to CI runner activity on Denomas.com. These metrics are aggregated at a monthly grain per dim_namespace_id. Additional identifier/key fields (dim_ci_runner_id, dim_ci_pipeline_id, dim_ci_stage_id) are included for reporting purposes. Only activity since 2020-01-01 is processed because of the high volume of data.
    • mart_ci_runner_activity_daily - Mart table containing quantitative data related to CI runner activity on Denomas.com. These metrics are aggregated at a daily grain per dim_project_id. Additional identifier/key fields (dim_ci_runner_id, dim_ci_pipeline_id, dim_ci_stage_id) are included for reporting purposes. Only activity since 2020-01-01 is processed because of the high volume of data.
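
To make the monthly grain concrete, the sketch below rolls job-level rows up to one entry per month and dim_namespace_id, skipping activity before 2020-01-01. It is an illustration only and not the dbt model: the started_at and duration_s field names, the aggregate_monthly function, and the sample rows are all hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime

# Only activity on or after this date is processed, as stated above.
CUTOFF = date(2020, 1, 1)

# Made-up job-level rows, one per dim_ci_build_id.
fact_rows = [
    {"dim_ci_build_id": 1, "dim_namespace_id": 7,
     "started_at": datetime(2019, 12, 31, 23, 0), "duration_s": 600},
    {"dim_ci_build_id": 2, "dim_namespace_id": 7,
     "started_at": datetime(2021, 1, 5, 9, 0), "duration_s": 120},
    {"dim_ci_build_id": 3, "dim_namespace_id": 7,
     "started_at": datetime(2021, 1, 20, 9, 0), "duration_s": 180},
    {"dim_ci_build_id": 4, "dim_namespace_id": 8,
     "started_at": datetime(2021, 2, 1, 9, 0), "duration_s": 60},
]


def aggregate_monthly(rows):
    """Group rows by (YYYY-MM, dim_namespace_id); count builds, sum minutes."""
    totals = defaultdict(lambda: {"build_count": 0, "duration_minutes": 0.0})
    for row in rows:
        started = row["started_at"]
        if started.date() < CUTOFF:
            continue  # before the processing window
        key = (started.strftime("%Y-%m"), row["dim_namespace_id"])
        totals[key]["build_count"] += 1
        totals[key]["duration_minutes"] += row["duration_s"] / 60.0
    return dict(totals)


monthly = aggregate_monthly(fact_rows)
# monthly[("2021-01", 7)] == {"build_count": 2, "duration_minutes": 5.0}
```

A daily variant would follow the same pattern, keyed on the calendar date and dim_project_id instead.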

Self-Service Capabilities

The data solution delivers two Self-Service Data capabilities:

  1. Dashboard Developer: The existing CI-related Sisense data models, which contain various widget charts, now use the complete dimensional model components built for CI Runner Activity data.
  2. SQL Developer: An Enterprise Dimensional Model subject area. Refer to the R2A Objects tab.

Data Platform Components

From a Data Platform technology perspective, the solution delivers:

  1. An extension to the Enterprise Dimensional Model for CI Runner Activity data
  2. Testing and data validation extensions to the Data Pipeline Health dashboard
  3. ERD, Data Flow diagram, dbt models, and related platform components

Self-Service Data Solution

Self-Service Dashboard Developer

A great way to get started building charts in Sisense is to watch this 10-minute Data Onboarding Video from Sisense. After you have built your dashboard, you will want to be able to find it again easily. Topics are a good way to organize dashboards in one place. You can add a dashboard to a topic by clicking the add to topics icon in the top right of the dashboard, and a dashboard can belong to more than one relevant topic. Topics include Finance, Marketing, Sales, Product, Engineering, and Growth, to name a few.

Trusted Data Solution

See the overview at Trusted Data Framework.

Refer to the dbt guide examples for details on implementing further tests.

Last modified November 29, 2023: big update (17188382)