Azure Databricks
This guide takes you through forwarding logs from an Azure Databricks cluster to Apica Ascent. Before you proceed with this setup, ensure that you meet the following prerequisites.
A private VNet
An Azure Databricks cluster deployed in the private VNet
An Apica Ascent endpoint
Configuring your Databricks cluster to forward logs
To configure your Azure Databricks cluster to forward logs to your Apica Ascent endpoint, do the following.
Navigate to the Compute section of your Azure Databricks workspace.
Click Create Cluster.
Choose your cluster size.
Click Advanced options > SSH and paste your public key under SSH public key. You can generate a key pair by running the following command. You will use the private key to log into the machine later.
ssh-keygen -t rsa -b 4096 -C "<email-id>"
Next, on the Azure portal, under Network security group, add an inbound rule allowing port 2200 for the machines that the Databricks cluster spun up, as shown in the CLI sketch below.
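If you prefer the command line, the following is a minimal sketch of creating that inbound rule with the Azure CLI. The resource group, NSG name, rule name, and priority here are placeholder assumptions; substitute the values from your own environment.
az network nsg rule create \
  --resource-group <resource-group> \
  --nsg-name <databricks-worker-nsg> \
  --name AllowDatabricksSSH \
  --direction Inbound \
  --access Allow \
  --protocol Tcp \
  --destination-port-ranges 2200 \
  --priority 1000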
Installing and configuring Fluent Bit
To install and configure Fluent Bit on your Databricks cluster, do the following.
Log into the machine using the following command.
ssh ubuntu@<machine-ip> -p 2200 -i <private_key_file_path>
Install Fluent Bit according to the version of Ubuntu running on the machine. For detailed installation instructions, refer to the Fluent Bit documentation.
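As an example, the following sketch installs the td-agent-bit package from the official Fluent Bit repository on Ubuntu 20.04 (focal); this assumes your machine runs that release, so adjust the distribution codename to match yours.
wget -qO - https://packages.fluentbit.io/fluentbit.key | sudo apt-key add -
echo "deb https://packages.fluentbit.io/ubuntu/focal focal main" | sudo tee /etc/apt/sources.list.d/fluentbit.list
sudo apt-get update
sudo apt-get install -y td-agent-bit
sudo systemctl enable --now td-agent-bit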
Use the following Fluent Bit configuration file. Note that the tail paths below assume your cluster's log delivery destination is set to dbfs:/cluster-logs.
[SERVICE]
    Flush        1
    Parsers_File /etc/td-agent-bit/parsers.conf
    Log_Level    debug

[INPUT]
    Name            tail
    Path            /dbfs/cluster-logs/*/driver/stdout*
    Tag             driver-stdout
    Buffer_Max_Size 1MB
    Ignore_Older    5m

[INPUT]
    Name            tail
    Path            /dbfs/cluster-logs/*/driver/*.log
    Tag             driver-log4j
    Buffer_Max_Size 1MB
    Ignore_Older    5m

[INPUT]
    Name            tail
    Path            /dbfs/cluster-logs/*/driver/stderr*
    Tag             driver-stderr
    Buffer_Max_Size 1MB
    Ignore_Older    5m

[INPUT]
    Name            tail
    Path            /dbfs/cluster-logs/*/eventlog/*/*/eventlog
    Tag             eventlog
    Buffer_Max_Size 1MB
    Ignore_Older    5m

[INPUT]
    Name            tail
    Path            /dbfs/cluster-logs/*/executor/*/*/stdout*
    Tag             executor-stdout
    Buffer_Max_Size 1MB
    Ignore_Older    5m

[INPUT]
    Name            tail
    Path            /dbfs/cluster-logs/*/executor/*/*/stderr*
    Tag             executor-stderr
    Buffer_Max_Size 1MB
    Ignore_Older    5m

[FILTER]
    Name   record_modifier
    Match  driver-stdout
    Record AppName driver-stdout

[FILTER]
    Name   record_modifier
    Match  eventlog
    Record AppName eventlog

[FILTER]
    Name   record_modifier
    Match  driver-stderr
    Record AppName driver-stderr

[FILTER]
    Name   record_modifier
    Match  driver-log4j
    Record AppName driver-log4j

[FILTER]
    Name   record_modifier
    Match  executor-stdout
    Record AppName executor-stdout

[FILTER]
    Name   record_modifier
    Match  executor-stderr
    Record AppName executor-stderr

[FILTER]
    Name   record_modifier
    Match  *
    Record cluster_id Linux
    Record linuxhost ${HOSTNAME}
    Record namespace Databricks-worker

[FILTER]
    Name   modify
    Match  *
    Rename ident AppName
    Rename procid proc_id
    Rename pid proc_id

[FILTER]
    Name         parser
    Match        *
    Key_Name     data
    Parser       syslog-rfc3164
    Reserve_Data On
    Preserve_Key On

[OUTPUT]
    name  stdout
    match *

[OUTPUT]
    name          http
    match         *
    host          <logiq-endpoint>
    port          443
    URI           /v1/json_batch
    Format        json
    tls           on
    tls.verify    off
    net.keepalive off
    compress      gzip
    Header        Authorization Bearer <TOKEN>
In the Fluent Bit configuration file above, substitute the following details based on your implementation.
<logiq-endpoint>: your Apica Ascent endpoint
<TOKEN>: the ingest token sent in the Authorization header
Databricks-worker: the namespace value under which these logs are grouped in Apica Ascent
Next, replace the existing configuration at /etc/td-agent-bit/td-agent-bit.conf with the modified file. Finally, restart Fluent Bit by running the following command.
sudo systemctl restart td-agent-bit
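To confirm that Fluent Bit started cleanly and is tailing the log paths, you can check the service status and follow its output. These are standard systemd commands, not Apica-specific ones.
sudo systemctl status td-agent-bit
sudo journalctl -u td-agent-bit -f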
Now, when you log into your Apica Ascent UI, you should see the logs from your Azure Databricks cluster being ingested. Navigate to the Explore section to view the logs.
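If logs do not appear, one quick way to rule out network or token problems is to probe the ingest endpoint from the Databricks machine. The sketch below only checks HTTPS reachability with your token attached; it does not send a real log batch, so any HTTP status code in the response simply confirms the endpoint is reachable.
curl -s -o /dev/null -w "%{http_code}\n" \
  -H "Authorization: Bearer <TOKEN>" \
  https://<logiq-endpoint>/v1/json_batch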