Installation & Configuration
Here you will find the instructions for setting up this project.
This section is not finalized and is likely to change.
Setup
...[TO DO]...
Required Variables
This package assumes that you have an existing dbt project with a BigQuery profile and a BigQuery GCP instance with GA4 event data loaded. Source data is located using the following variables, which must be set in your dbt_project.yml file.
vars:
  project: '<gcp_project>'
  dataset: '<ga4_dataset>'
  start_date: 'YYYYMMDD'
  frequency: 'daily'
Project
The GCP project where your GA4 data is located.
vars:
  project: '<gcp_project>'
Dataset
The GCP dataset where your GA4 data is located.
vars:
  dataset: '<ga4_dataset>'
Start Date
The date to start pulling data from.
vars:
  start_date: 'YYYYMMDD'
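The 'YYYYMMDD' format matches the date suffix of GA4 export tables in BigQuery (for example, events_20210120). The following is a minimal sketch, not the package's own code, of how such a value could be validated and compared against a table suffix. The function names parse_start_date and on_or_after_start are hypothetical.

```python
from datetime import date, datetime


def parse_start_date(value: str) -> date:
    """Parse a 'YYYYMMDD' string; raise ValueError for any other format."""
    if len(value) != 8 or not value.isdigit():
        raise ValueError(f"start_date must be 'YYYYMMDD', got {value!r}")
    return datetime.strptime(value, "%Y%m%d").date()


def on_or_after_start(table_suffix: str, start_date: str) -> bool:
    """Return True if a table's date suffix is on or after start_date."""
    return parse_start_date(table_suffix) >= parse_start_date(start_date)
```

With start_date set to '20210120', a table suffixed 20210121 would be included and one suffixed 20210119 would not.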
Frequency
Set the frequency of the data export. This should match the frequency of the export configured in GA4.
Options:
daily
: Daily export.
streaming
: Streaming export.
daily+streaming
: Appends today's intraday data to daily data.
vars:
  frequency: 'daily'
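GA4 writes daily exports to tables named events_YYYYMMDD and streaming exports to tables named events_intraday_YYYYMMDD. As a rough illustration of what each frequency option implies, the sketch below picks the matching tables from a list of table names. It is an assumption about the selection logic, not the package's implementation, and select_tables is a hypothetical helper.

```python
import re

# Table naming used by the GA4 BigQuery export.
DAILY = re.compile(r"^events_\d{8}$")
INTRADAY = re.compile(r"^events_intraday_\d{8}$")


def select_tables(table_names, frequency):
    """Return the table names relevant to the given frequency option."""
    if frequency == "daily":
        wanted = (DAILY,)
    elif frequency == "streaming":
        wanted = (INTRADAY,)
    elif frequency == "daily+streaming":
        wanted = (DAILY, INTRADAY)
    else:
        raise ValueError(f"unknown frequency: {frequency!r}")
    return [name for name in table_names if any(p.match(name) for p in wanted)]
```

Anchoring the daily pattern on the eight-digit suffix matters here, because a bare events_ prefix would also match the intraday tables.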
If you don't have any GA4 data of your own, you can connect to Google's public dataset with the following settings:
vars:
  project: 'bigquery-public-data'
  dataset: 'ga4_obfuscated_sample_ecommerce'
  start_date: '20210120' # Only using data from 2021-01-20 for testing purposes.
Find more info about the GA4 obfuscated dataset here.
Optional Variables
These variables are also NOT finalized & are LIKELY to change.
vars:
  conversion_events: ['download_gated', 'download_ungated', 'form_submit', 'search', 'social_click']
  consideration_events: ['click', 'cta_click', 'navigation_click', 'view_search_results']
  excluded__events: ['session_start']
  excluded__event_params: ['ga_session_id', 'page_location', 'ga_session_number', 'session_engaged']
  excluded__columns: ['event_previous_timestamp', 'event_bundle_sequence_id', 'event_server_timestamp_offset']
  excluded__user_props: []
  included__query_params: ['utm_source', 'utm_medium', 'utm_campaign', 'gclid', 'fbclid']
  funnel_stages: ['begin_checkout', 'add_shipping_info', 'add_payment_info', 'purchase']
Query Parameter Exclusions
Setting query_parameter_exclusions removes the listed query string parameters from the page_location field for all downstream processing. Original parameters are captured in a new original_page_location field. Ex:
vars:
  query_parameter_exclusions: ['gclid', 'fbclid', '_ga']
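To make the effect concrete, here is a small sketch of the transformation described above, written with Python's standard urllib.parse rather than the package's SQL. The function name clean_page_location is hypothetical, and details such as how the package treats blank values are assumptions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def clean_page_location(url, exclusions):
    """Drop excluded query parameters and keep the original URL alongside."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in exclusions
    ]
    cleaned = urlunsplit(parts._replace(query=urlencode(kept)))
    return {"page_location": cleaned, "original_page_location": url}
```

For the URL https://example.com/page?gclid=abc&id=7&_ga=1.2 and the exclusions listed above, page_location becomes https://example.com/page?id=7 while original_page_location keeps the full URL.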
Conversion Events
Specific events can be set as conversions with the conversion_events variable in your dbt_project.yml file. These events will be counted against each session and included in the final mart models. Ex:
vars:
  conversion_events: ['purchase', 'download']
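"Counted against each session" can be pictured as a per-session tally of the configured event names. The sketch below is illustrative only: the input shape (pairs of session ID and event name) and the function name count_conversions are assumptions, not the package's model logic.

```python
from collections import Counter


def count_conversions(events, conversion_events):
    """Tally configured conversion events per session.

    events is an iterable of (session_id, event_name) pairs.
    """
    counts = {}
    for session_id, event_name in events:
        if event_name in conversion_events:
            counts.setdefault(session_id, Counter())[event_name] += 1
    return counts
```

Events that are not listed in conversion_events, such as page_view, are simply ignored by the tally.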
Consideration Events
Specific events can be set as considerations with the consideration_events variable in your dbt_project.yml file. These events will be counted against each session and included in the final mart models. Ex:
vars:
  consideration_events: ['cta_click', 'view_search_results']
Funnel Stages
Set specific events to be stages in a funnel.
vars:
  funnel_stages: ['begin_checkout', 'add_shipping_info', 'add_payment_info', 'purchase']
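This section does not describe how the package turns stages into a funnel, so the following is only one plausible reading: count how many sessions fired each stage's event, in the configured order. Both the input shape and the function name funnel_counts are hypothetical.

```python
def funnel_counts(session_events, funnel_stages):
    """Count sessions that fired each funnel stage event, in stage order.

    session_events maps a session ID to the set of event names it fired.
    """
    return [
        (stage, sum(1 for fired in session_events.values() if stage in fired))
        for stage in funnel_stages
    ]
```

A stricter funnel would also require that each session passed through the earlier stages first; this sketch does not enforce that.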
Excluded Events
Exclude specific events from the final tables.
vars:
  excluded__events: ['session_start']
Excluded Event Parameters
Exclude specific event parameters from the final tables.
vars:
  excluded__event_params: ['ga_session_id', 'page_location', 'ga_session_number', 'session_engaged', 'engagement_time_msec', 'entrances', 'page_title', 'page_referrer', 'source', 'medium', 'campaign', 'debug_mode', 'term', 'clean_event', 'value', 'tax', 'coupon', 'promotion_name', 'transaction_id']
Excluded Columns
Exclude specific default columns from the final tables.
vars:
  excluded__columns: ['event_previous_timestamp', 'event_bundle_sequence_id', 'event_server_timestamp_offset', 'user_id', 'user_pseudo_id', 'stream_id', 'ga_session_id', 'privacy_info', 'event_dimensions', 'app_info']
Excluded User Properties
Exclude specific user properties from the final tables.
vars:
  excluded__user_props: ['logged_in']
Included Query Parameters
Include specific query parameters in the final tables.
vars:
  included__query_params: ['utm_source', 'utm_medium', 'utm_campaign', 'utm_content', 'utm_term', 'gclid', 'fbclid', 'gclsrc', '_ga']
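As an illustration of what including a query parameter might look like, the sketch below pulls the listed parameters out of a URL into named values, with None for any that are absent. It assumes one output value per parameter and uses a hypothetical function name, extract_query_params. The package's actual column naming is not specified here.

```python
from urllib.parse import parse_qs, urlsplit


def extract_query_params(url, included):
    """Return the first value of each included query parameter, or None."""
    query = parse_qs(urlsplit(url).query)
    return {name: (query[name][0] if name in query else None) for name in included}
```

Parameters present in the URL but missing from the included list are left out of the result.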