CI Runner Activity
Previously, there was no easy way to tie cost directly to CI usage without making assumptions, and the process did not scale.
To address this, a unified model for compute minutes and cost has been created as part of the Enterprise Dimensional Model. It ties cost from the app usage table (Postgres) to gcp_billing, and labels our runners in GCP with job_id labels so they can be joined to the ci_builds table.
Business Use Cases/Example KPIs:
- Cost of All CI Pipelines run in gitlab-org-gitlab project in January 2021
- Average Cost per pipeline for all Denomas.com CI usage
- Count of CI pipelines run by namespace X last month
- Count of compute minutes used in project X over past year
- Cost of compute minutes used in project X over past year
Key Field Descriptions
CI Build ID - Sets the grain of the table at a single Denomas job run (ci_builds table)
CI Build Duration - Currently calculated from start time -> end time of a single job in the ci_builds table
Runner type - Determines the scope of a runner. See https://docs.gitlab.com/ee/ci/runners/runners_scope.html for more details. Denomas only pays for shared runners that customers use, as well as group/specific runners that are used within our own projects. (ci_runners table)
Project ID - Determines which project the runner activity is linked to, along with all related project info
Namespace ID - Determines the subscription and whether the usage is internal or not
User ID - Determines the user that ran the job
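As an illustration of the CI Build Duration field, here is a minimal Python sketch of a start-to-end calculation for a single job. The argument names `started_at` and `finished_at` are hypothetical stand-ins for the job timestamps, and the seconds-to-minutes conversion is a plain division; the actual model may apply further adjustments that this page does not describe.

```python
from datetime import datetime


def ci_build_duration_seconds(started_at: str, finished_at: str) -> float:
    """Elapsed seconds between two ISO-8601 timestamps for one job."""
    start = datetime.fromisoformat(started_at)
    end = datetime.fromisoformat(finished_at)
    return (end - start).total_seconds()


def compute_minutes(duration_seconds: float) -> float:
    """Plain conversion from seconds to minutes (assumption: no other factor)."""
    return duration_seconds / 60.0


# Hypothetical job that ran for 7 minutes 30 seconds.
duration = ci_build_duration_seconds("2021-01-15T10:00:00", "2021-01-15T10:07:30")
print(duration)                   # 450.0
print(compute_minutes(duration))  # 7.5
```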
Table Relationship Details
Most of these fields can be sourced from gitlab_dotcom_ci_builds. Related tables are linked to ci_builds through the relationships below:
- ci_runners: gitlab_dotcom_ci_builds.ci_build_runner_id -> gitlab_dotcom_ci_runners.runner_id
- ci_stages: gitlab_dotcom_ci_builds.ci_build_stage_id -> gitlab_dotcom_ci_stages.ci_stage_id
- ci_pipelines: gitlab_dotcom_ci_stages.pipeline_id -> gitlab_dotcom_ci_pipelines.ci_pipeline_id
  - Stages table is normally used to bridge pipeline -> builds but we sometimes use ci_stage_name to look at the longest duration by stage.
- projects: gitlab_dotcom_ci_builds.ci_build_project_id -> gitlab_dotcom_projects_xf.project_id
- namespace: gitlab_dotcom_projects_xf.namespace_id -> gitlab_dotcom_namespaces_xf.namespace_id
- users: gitlab_dotcom_ci_builds.ci_build_user_id -> gitlab_dotcom_users_xf.user_id
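The relationships above can be sketched as a chain of joins. The example below uses Python's built-in sqlite3 module with tiny in-memory tables purely for illustration. The table names and join keys come from the list above, while the `ci_build_id` and `runner_type` columns, the sample rows, and the choice of LEFT JOIN are assumptions rather than the warehouse's actual definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gitlab_dotcom_ci_builds (
    ci_build_id INTEGER PRIMARY KEY,   -- assumed column name
    ci_build_runner_id INTEGER,
    ci_build_stage_id INTEGER,
    ci_build_project_id INTEGER,
    ci_build_user_id INTEGER
);
CREATE TABLE gitlab_dotcom_ci_runners (runner_id INTEGER PRIMARY KEY, runner_type TEXT);
CREATE TABLE gitlab_dotcom_ci_stages (
    ci_stage_id INTEGER PRIMARY KEY, pipeline_id INTEGER, ci_stage_name TEXT
);
CREATE TABLE gitlab_dotcom_ci_pipelines (ci_pipeline_id INTEGER PRIMARY KEY);
CREATE TABLE gitlab_dotcom_projects_xf (project_id INTEGER PRIMARY KEY, namespace_id INTEGER);
CREATE TABLE gitlab_dotcom_namespaces_xf (namespace_id INTEGER PRIMARY KEY);
CREATE TABLE gitlab_dotcom_users_xf (user_id INTEGER PRIMARY KEY);

-- Hypothetical sample rows: two jobs in one pipeline of one project.
INSERT INTO gitlab_dotcom_ci_runners VALUES (10, 'shared');
INSERT INTO gitlab_dotcom_ci_pipelines VALUES (500);
INSERT INTO gitlab_dotcom_ci_stages VALUES (100, 500, 'build'), (101, 500, 'test');
INSERT INTO gitlab_dotcom_namespaces_xf VALUES (42);
INSERT INTO gitlab_dotcom_projects_xf VALUES (1000, 42);
INSERT INTO gitlab_dotcom_users_xf VALUES (7), (8);
INSERT INTO gitlab_dotcom_ci_builds VALUES (1, 10, 100, 1000, 7), (2, 10, 101, 1000, 8);
""")

# One row per build; stages bridge builds to pipelines, projects bridge to namespaces.
query = """
SELECT b.ci_build_id, r.runner_type, s.ci_stage_name, p.ci_pipeline_id,
       pr.project_id, n.namespace_id, u.user_id
FROM gitlab_dotcom_ci_builds AS b
LEFT JOIN gitlab_dotcom_ci_runners AS r ON b.ci_build_runner_id = r.runner_id
LEFT JOIN gitlab_dotcom_ci_stages AS s ON b.ci_build_stage_id = s.ci_stage_id
LEFT JOIN gitlab_dotcom_ci_pipelines AS p ON s.pipeline_id = p.ci_pipeline_id
LEFT JOIN gitlab_dotcom_projects_xf AS pr ON b.ci_build_project_id = pr.project_id
LEFT JOIN gitlab_dotcom_namespaces_xf AS n ON pr.namespace_id = n.namespace_id
LEFT JOIN gitlab_dotcom_users_xf AS u ON b.ci_build_user_id = u.user_id
ORDER BY b.ci_build_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

LEFT JOINs are used here so that a build with a missing runner, stage, or user still appears in the result, which keeps the output at one row per build.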
Sources of Data
Data is sourced from Denomas.com models.
Related Data Artifacts
The Data Team maintains these Data artifacts related to CI Runner Activity:
- ERD
  - The CI Runner Activity Physical Data Model shows all table structures, including column name, column data type, column constraints, primary key, foreign key, and relationships between tables that are used for this data.
- Data Flow Diagram
  - The CI Runner Activity Data Flow diagram provides a high level overview of how the Data flows into the fact model fct_ci_runner_activity and the Mart models mart_ci_runner_activity_monthly and mart_ci_runner_activity_daily from Prep/Intermediate and other Dimension tables.
- Table Definitions
  - fct_ci_runner_activity - Fact table containing quantitative data related to CI runner activity on Denomas.com. The grain of the table would be a single Denomas job that ran (successful or not) determined by dim_ci_build_id, which is the unique key for each CI build.
  - mart_ci_runner_activity_monthly - Mart table containing quantitative data related to CI runner activity on Denomas.com. These metrics are aggregated at a monthly grain per dim_namespace_id. Additional identifier/key fields - dim_ci_runner_id, dim_ci_pipeline_id, dim_ci_stage_id have been included for Reporting purposes. Only activity since 2020-01-01 is being processed due to the high volume of the data.
  - mart_ci_runner_activity_daily - Mart table containing quantitative data related to CI runner activity on Denomas.com. These metrics are aggregated at a daily grain per dim_project_id. Additional identifier/key fields - dim_ci_runner_id, dim_ci_pipeline_id, dim_ci_stage_id have been included for Reporting purposes. Only activity since 2020-01-01 is being processed due to the high volume of the data.
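To make the change of grain concrete, the following Python sketch rolls job-level rows (the fact grain) up to a monthly grain per dim_namespace_id and skips activity before 2020-01-01. The row layout, the metric names `ci_build_count` and `compute_minutes`, and the sample values are hypothetical; the real mart is a dbt model and carries additional key fields that are left out here.

```python
from collections import defaultdict
from datetime import date

# Hypothetical fact rows:
# (dim_ci_build_id, dim_namespace_id, job start date, duration in seconds)
fact_rows = [
    (1, 42, date(2021, 1, 5), 300.0),
    (2, 42, date(2021, 1, 20), 600.0),
    (3, 42, date(2021, 2, 1), 120.0),
    (4, 99, date(2021, 1, 9), 60.0),
    (5, 99, date(2019, 12, 31), 900.0),  # before the cutoff, so it is skipped
]

CUTOFF = date(2020, 1, 1)  # only activity since this date is processed


def aggregate_monthly(rows):
    """Roll job-level rows up to one entry per (namespace, month)."""
    totals = defaultdict(lambda: {"ci_build_count": 0, "compute_minutes": 0.0})
    for _build_id, namespace_id, started_on, duration_seconds in rows:
        if started_on < CUTOFF:
            continue
        key = (namespace_id, started_on.replace(day=1))  # first day of the month
        totals[key]["ci_build_count"] += 1
        totals[key]["compute_minutes"] += duration_seconds / 60.0
    return dict(totals)


monthly = aggregate_monthly(fact_rows)
for (namespace_id, month), metrics in sorted(monthly.items()):
    print(namespace_id, month.isoformat(), metrics)
```

A daily grain per dim_project_id would follow the same pattern with the project as the key and the start date left untruncated.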
Self-Service Capabilities
The data solution delivers two Self-Service Data capabilities:
- Dashboard Developer: The existing CI-related Sisense data models, which contain various widget charts, now use the complete dimensional model components built for CI Runner Activity data.
- SQL Developer: An Enterprise Dimensional Model subject area. Refer to the R2A Objects tab.
Data Platform Components
From a Data Platform technology perspective, the solution delivers:
- An extension to the Enterprise Dimensional Model for CI Runner Activity data
- Testing and data validation extensions to the Data Pipeline Health dashboard
- ERD, Data Flow diagram, dbt models, and related platform components
Quick Links
Self-Service Data Solution
Self-Service Dashboard Developer
A great way to get started building charts in Sisense is to watch this 10-minute Data Onboarding Video from Sisense. After you have built your dashboard, you will want to be able to find it again easily. Topics are a great way to organize dashboards in one place. You can add a topic by clicking the add to topics icon in the top right of the dashboard. A dashboard can be added to every topic it is relevant to. Topics include Finance, Marketing, Sales, Product, Engineering, and Growth, to name a few.
Trusted Data Solution
See overview at Trusted Data Framework
Refer to the dbt guide examples for details on implementing further tests.
