Databricks cross-workspace administration with Powershell and WPF

For quite some time now, I have worked on cloud data analytics platforms in which Databricks plays a major role. Databricks is a cross-cloud product (AWS, Azure, GCP) developed by the creators of Apache Spark; it provides a user interface where notebooks can be shared and clusters managed. The widely used Premium version also includes, among other features, user management and role-based access control (RBAC) for notebooks, clusters, jobs and tables.

Even a minimal Databricks setup includes at least three workspaces: Dev, Test and Prod. In practice many more are created, since some are set up specifically for Engineers, others for Data Scientists or Analysts, and each of those exists in every environment. A dozen workspaces spread across cloud environments and subscriptions is quite common.

Challenges & Solutions

With that in mind, challenges quickly arise around administering the different user groups and auditing all these workspaces, because Databricks’ own UI is workspace-specific. This is where PowerShell scripts can help. However, command-line scripts can only go so far: what is really needed is a cross-workspace GUI for administration and oversight. Enter WPF (Windows Presentation Foundation), where the UI is written in XAML and the various widgets (WPF controls) can be dragged and dropped in Visual Studio’s graphical designer. All that remains is to integrate PowerShell with WPF/XAML while keeping code and GUI separate:

In the XAML file, we use the x:Name attribute. For example, to process an ‘Add’ button:

<Button x:Name="Add" Content="Add" />

In the PowerShell file, we generate a variable named var_Add, which gives the script access to the Add button and its events:

# ----------------WPF-XAML WINDOW UI----------------------------------

Add-Type -AssemblyName PresentationCore, PresentationFramework, WindowsBase, System.Windows.Forms

$xamlFile = '.\files\MainWindow.xaml'

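# Create window: read the XAML as a single string and adapt it for XamlReader
$inputXML = Get-Content $xamlFile -Raw -Force
# - remove the mc:Ignorable="d" attribute
# - turn x:Name into Name, so the XPath query further down (//*[@Name]) finds the controls
# - reduce the opening line (<Window x:Class=...) to a bare <Window
$inputXML = $inputXML -replace 'mc:Ignorable="d"', '' -replace "x:N", 'N' -replace '^<Win.*', '<Window'
[XML]$xaml = $inputXML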

#Read XAML
$reader = (New-Object System.Xml.XmlNodeReader $xaml)
try {
    $window = [Windows.Markup.XamlReader]::Load( $reader )
} catch {
    Write-Warning $_.Exception
    throw
}

# Create variables based on form control names.
# Variable will be named as 'var_<control name>'

$xaml.SelectNodes("//*[@Name]") | ForEach-Object {
    #"trying item $($_.Name)"
    try {
        Set-Variable -Name "var_$($_.Name)" -Value $window.FindName($_.Name) -ErrorAction Stop
    } catch {
        throw
    }
}
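# Enumerate the generated variables (output is discarded; drop Out-Null to print them when debugging)
Get-Variable var_*  | Out-Null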
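
To see what the clean-up and the XPath query do without opening a window, the same two steps can be run against a small XAML string. This is only a sketch: the sample markup, the DbxAdmin.MainWindow class name and the UserName text box are made up for illustration and stand in for files\MainWindow.xaml.

```powershell
# Hypothetical sample XAML, standing in for files\MainWindow.xaml.
$sampleXaml = @'
<Window x:Class="DbxAdmin.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="Sample">
    <StackPanel>
        <TextBox x:Name="UserName" />
        <Button x:Name="Add" Content="Add" />
    </StackPanel>
</Window>
'@

# Same clean-up chain as in the loader script.
$cleaned = $sampleXaml -replace 'mc:Ignorable="d"', '' -replace 'x:N', 'N' -replace '^<Win.*', '<Window'
[xml]$doc = $cleaned

# Names of the variables the loader would create, one per named control.
$variableNames = $doc.SelectNodes('//*[@Name]') | ForEach-Object { "var_$($_.Name)" }
$variableNames
```

For this sample the list contains var_UserName and var_Add. The Title attribute is ignored because only elements carrying a Name attribute are selected.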

The same logic works for every WPF control we want to drive from PowerShell. The ‘Add’ button click event is then handled this way:

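$var_Add.Add_Click({
    # process here
})

# Assumption: once all handlers are registered, display the window (blocks until it is closed)
$null = $window.ShowDialog()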

Tool design

Now that we know how to create a PowerShell-based UI tool, what is left is to use the Databricks REST API, either directly or through existing Databricks PowerShell modules that wrap it.
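
As a rough illustration of the direct route, a call to the REST API boils down to a workspace URL, a path under /api/2.0 and a bearer token. The helper below is a sketch, not the code used by the tool: New-DatabricksRequest is a hypothetical function name, and the workspace URL and token are placeholders. It only builds the parameters, so it can be tried without a live workspace.

```powershell
# Hypothetical helper: builds the parameters for one Databricks REST API call.
function New-DatabricksRequest {
    param(
        [Parameter(Mandatory)] [string] $WorkspaceUrl,  # e.g. https://adb-<id>.<n>.azuredatabricks.net
        [Parameter(Mandatory)] [string] $Token,         # personal access token
        [Parameter(Mandatory)] [string] $Path,          # path below /api/2.0, e.g. clusters/list
        [string] $Method = 'Get'
    )
    @{
        Uri     = '{0}/api/2.0/{1}' -f $WorkspaceUrl.TrimEnd('/'), $Path.TrimStart('/')
        Method  = $Method
        Headers = @{ Authorization = "Bearer $Token" }
    }
}

# Placeholder URL and token, for illustration only.
$request = New-DatabricksRequest -WorkspaceUrl 'https://adb-1234567890123456.7.azuredatabricks.net/' `
                                 -Token 'dapi-example' -Path 'clusters/list'

# With a real URL and token, splat the hashtable into Invoke-RestMethod:
# $clusters = (Invoke-RestMethod @request).clusters
```

Keeping the request construction separate from the call means the same helper can be reused for every workspace in the list, with only the URL and token changing.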

We end up with a tree-like design that goes from retrieving the Cloud subscriptions, to listing all the Databricks workspaces, groups, users, clusters, etc. For the Azure Cloud, it will look like the following:
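
In PowerShell terms, the top two levels of that tree (subscriptions, then workspaces) could be enumerated roughly as follows. This is a sketch under assumptions: it relies on the Az.Accounts and Az.Resources modules and a prior Connect-AzAccount, and the function names ConvertTo-WorkspaceNode and Get-DatabricksWorkspaceTree are hypothetical, not taken from the tool.

```powershell
# Hypothetical helper: flattens one Azure resource into a tree node.
function ConvertTo-WorkspaceNode {
    param(
        [Parameter(Mandatory)] [string] $SubscriptionName,
        [Parameter(Mandatory)] $Resource
    )
    [pscustomobject]@{
        Subscription  = $SubscriptionName
        ResourceGroup = $Resource.ResourceGroupName
        Workspace     = $Resource.Name
        # workspaceUrl is stored without a scheme, so add one here
        Url           = 'https://{0}' -f $Resource.Properties.workspaceUrl
    }
}

# Hypothetical walker: subscriptions -> Databricks workspaces.
# Assumes Connect-AzAccount has already been run.
function Get-DatabricksWorkspaceTree {
    foreach ($subscription in Get-AzSubscription) {
        $null = Set-AzContext -SubscriptionId $subscription.Id
        Get-AzResource -ResourceType 'Microsoft.Databricks/workspaces' -ExpandProperties |
            ForEach-Object { ConvertTo-WorkspaceNode -SubscriptionName $subscription.Name -Resource $_ }
    }
}

# Usage (requires an Azure login):
# Get-DatabricksWorkspaceTree | Format-Table
```

Each node carries the workspace URL, which is what the lower levels of the tree (groups, users, clusters) need in order to query the Databricks REST API of that workspace.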

This design is the basis for creating DBX-Admin, a Databricks cross-workspace administration and auditing tool:

DBX-Admin Main Window

The tool has been open-sourced and is available on GitHub. Feel free to use it or contribute.

Published on Java Code Geeks with permission by Tony Sicilian, partner at our JCG program. See the original article here: Databricks cross-workspace administration with Powershell and WPF
