# Dremio <-> Lake Formation Demo

Starting in Dremio EE 19.0, Dremio can honor permissions configured directly in AWS Lake Formation for AWS Glue sources. This demo walks you through the following steps:

* Step 0: Bootstrapping a basic IdP
  * This demo assumes that you are starting from scratch
  * Feel free to skip this section if you already have an available IdP service (e.g. AAD)
* Step 1: Configure Dremio's LDAP connection
* Step 2: Connect your IdP to AWS using SAML
* Step 3: Set up permissions in Lake Formation
* Step 4: Connect Dremio to your Glue/Lake Formation source
* Step 5: Explore
* Step 6: Cleanup

Note: Dremio's Lake Formation permission integration requires Dremio EE (Enterprise Edition).

Note: Dremio currently only supports database- and table-level access permissions. Fine-grained Lake Formation permissions like row- and column-level permissions are not yet supported by Dremio.

Note: This demo uses passwords that start with `changeme` in several places. Please choose secure passwords instead.

## Step 0: Bootstrapping a basic IdP

This demo assumes that you are starting from scratch. Feel free to skip this section if you already have an available IdP service (e.g. AAD) that supports LDAP and SAML.

This section sets up and configures the following services:
* LDAP server using OpenLDAP
* Keycloak IdP to enable SAML connection with AWS
* Postgres database for Keycloak persistence

### 0.0: Create a VM for your IdP

Both Dremio and AWS need to communicate with the IdP server, so run these steps on an externally accessible machine. A small cloud VM is a good candidate. Testing was performed on a GCP Compute Engine `e2-medium` instance (2 vCPU, 4 GB RAM) costing ~$0.03/hour.

Clone this repository onto the VM.

### 0.1: Start Docker images using provided docker-compose.yaml

```sh
docker compose up -d
```

### 0.2: Bootstrap OpenLDAP with users and groups

```sh
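# The container name may differ by Compose version (Compose v2 uses hyphens:
# dremio-lake-formation-demo-openldap-1); check with `docker ps`.
docker exec -it dremio-lake-formation-demo_openldap_1 bash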
# Inside the container:
cd /bootstrap
./bootstrap.sh
exit
```

### 0.3: Synchronize Keycloak Users with OpenLDAP

1. Open browser to Keycloak `http://<KEYCLOAK_IP_OR_HOSTNAME>:8080/auth`
2. Click "Administration Console"
3. Log in
   1. Username: `admin`
   2. Password: `changeme-keycloak`
4. Click "Master" (realm dropdown)
5. Click "Add realm"
6. Name the realm `dremio`
7. Click "Create"
8. Click "User Federation"
9. Choose "ldap" from the dropdown
10. Apply settings:

    Setting | Value
    -|-
    Vendor | `Other`
    Username LDAP attribute | `cn`
    RDN LDAP attribute | `cn`
    UUID LDAP attribute | `uid`
    User Object Classes | `inetOrgPerson, organizationalPerson`
    Connection URL | `ldap://openldap:1389`
    Users DN | `ou=users,dc=example,dc=org`
    Bind DN | `cn=admin,dc=example,dc=org`
    Bind Credential | `changeme-ldap`

11. Optionally, click "Test connection" and "Test authentication" to validate settings
12. Click "Save"
13. Click "Synchronize all users"
14. Optionally, set periodic "Sync Settings"

### 0.4: Synchronize Keycloak Groups with OpenLDAP

15. Continuing from the LDAP User Federation page
16. Click "Mappers" tab
17. Click "Create"
18. Apply settings:

    Setting | Value
    -|-
    Name | `group-ldap-mapper` (can be anything)
    Mapper Type | `group-ldap-mapper`
    LDAP Groups DN | `ou=groups,dc=example,dc=org`
    Group Name LDAP Attribute | `cn`
    Group Object Classes | `groupOfNames`

19. Click "Save"
20. Click "Sync LDAP Groups to Keycloak"

## Step 1: Configure Dremio's LDAP connection

### 1.0: Stop Dremio

```sh
bin/dremio stop
```

### 1.1: Edit `dremio.conf`

Add the following settings:

* `services.coordinator.web.auth.type: "ldap"`
* `services.coordinator.web.auth.ldap_config: "ad.json"`
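One way to apply these settings from the command line is sketched below. It assumes both keys live under `services` and relies on HOCON merging dotted keys appended at the end of the file into the existing `services` object. The function name `enable_ldap_auth` is hypothetical, and the location of `dremio.conf` depends on your install, so the path is passed as an argument.

```shell
# Append the LDAP auth settings to a dremio.conf file (hypothetical helper).
# $1: path to dremio.conf (for example conf/dremio.conf; depends on your install)
enable_ldap_auth() {
    conf_file="$1"
    # Keep a backup so the change is easy to revert.
    cp "$conf_file" "$conf_file.bak" || return 1
    # Dotted keys at the root merge into the existing `services` object.
    cat >> "$conf_file" <<'EOF'

# LDAP authentication (added for the Lake Formation demo)
services.coordinator.web.auth.type: "ldap"
services.coordinator.web.auth.ldap_config: "ad.json"
EOF
}
```

Run it with Dremio stopped, for example `enable_ldap_auth conf/dremio.conf`, and restore `dremio.conf.bak` if you want to undo the change.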

### 1.2: Create `ad.json` file

Put `ad.json` in the same directory as `dremio.conf`.

```json
{
    "connectionMode": "PLAIN",
    "servers": [
        {
            "hostname": "<LDAP_IP_OR_HOSTNAME>",
            "port": 1389
        }
    ],
    "names": {
        "baseDN": "dc=example,dc=org",
        "bindDN": "cn=admin,dc=example,dc=org",
        "bindPassword": "changeme-ldap",
        "userFilter": "(&(objectClass=inetOrgPerson))",
        "userAttributes": {
            "baseDNs": [
                "ou=users,dc=example,dc=org"
            ],
            "searchScope": "SUB_TREE",
            "firstname": "cn",
            "id": "cn",
            "lastname": "sn",
            "email": "cn"
        },
        "userGroupRelationship": "GROUP_ENTRY_LISTS_USERS",
        "groupEntryListsUsers": {
            "userEntryUserIdAttribute": "dn",
            "groupEntryUserIdAttribute": "member"
        },
        "groupDNs": [
            "CN={0},ou=groups,dc=example,dc=org"
        ],
        "groupFilter": "(objectClass=groupOfNames)",
        "autoAdminFirstUser": true
    }
}
```
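Before restarting Dremio, it can help to confirm that the bind DN, base DNs, and filters in `ad.json` return what you expect. The sketch below assumes the `ldapsearch` client (from OpenLDAP's client tools) is installed on the Dremio host. `LDAP_HOST` and the function names are placeholders introduced here, not part of the demo repository.

```shell
# Placeholder host; set LDAP_HOST to your <LDAP_IP_OR_HOSTNAME>.
LDAP_HOST="${LDAP_HOST:-localhost}"
LDAP_URL="ldap://${LDAP_HOST}:1389"

# List user entries using the same bind DN, base DN, and filter as ad.json.
list_ldap_users() {
    ldapsearch -x -H "$LDAP_URL" \
        -D "cn=admin,dc=example,dc=org" -w "changeme-ldap" \
        -b "ou=users,dc=example,dc=org" -s sub \
        "(&(objectClass=inetOrgPerson))" cn sn
}

# List groups and their `member` attributes, matching groupFilter in ad.json.
list_ldap_groups() {
    ldapsearch -x -H "$LDAP_URL" \
        -D "cn=admin,dc=example,dc=org" -w "changeme-ldap" \
        -b "ou=groups,dc=example,dc=org" -s sub \
        "(objectClass=groupOfNames)" cn member
}
```

Calling `list_ldap_users` should print entries for `user00` through `user99` if the bootstrap in Step 0.2 succeeded. An empty result or a bind error points at the LDAP settings rather than at Dremio.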

### 1.3: Verify that you can log into Dremio using the admin account and userXX accounts

```sh
bin/dremio start
```

1. Open browser to Dremio `http://<DREMIO_IP_OR_HOSTNAME>:8080`
2. Log in as admin
   * Username: `admin`
   * Password: `changeme-ldap`
3. Log in as one or more users (user00 through user99)
   * Username: `user00`
   * Password: `changeme`

The admin user will have admin permissions throughout Dremio; user accounts will have basic access only.

## Step 2: Connect your IdP to AWS using SAML

1. Download `descriptor.xml` metadata from `http://<KEYCLOAK_IP_OR_HOSTNAME>:8080/auth/realms/dremio/protocol/saml/descriptor` (or from your existing IdP)
2. Log in to the AWS Console
3. Open IAM Service
4. Click "Identity providers"
5. Click "Add provider"
6. Use the default SAML type
7. Give the provider a name
   1. Remember this name, it is used later as `<PROVIDER_NAME_IN_AWS>`
8. Upload the `descriptor.xml` file
9. Click "Add provider"
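The console steps above can also be scripted. The following is a minimal sketch, assuming `curl` and a configured AWS CLI are available. The function names are hypothetical, and the descriptor URL is the Keycloak one from step 1.

```shell
# Print the SAML descriptor URL for the `dremio` realm on a Keycloak host.
# $1: Keycloak IP or hostname
descriptor_url() {
    printf 'http://%s:8080/auth/realms/dremio/protocol/saml/descriptor\n' "$1"
}

# Download the descriptor and register it as a SAML identity provider in IAM.
# $1: Keycloak IP or hostname
# $2: provider name (used later as <PROVIDER_NAME_IN_AWS>)
register_saml_provider() {
    curl -fsS -o descriptor.xml "$(descriptor_url "$1")" &&
    aws iam create-saml-provider \
        --saml-metadata-document file://descriptor.xml \
        --name "$2"
}
```

For example, `register_saml_provider keycloak.example.org keycloak` would create a provider named `keycloak`. Both values are illustrative, so substitute your own host and name.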

## Step 3: Set up permissions in Lake Formation

### 3.0: Create a table

If you don't already have tables in AWS Glue / Lake Formation, you can create one or more now.

1. Still in AWS Console, open Lake Formation Service
2. Click "Tables"
3. Click "Create table"
4. Fill in settings as desired
   1. Create a Database if needed
   2. Create an S3 bucket if needed

### 3.1: Add permissions

1. Still in AWS Console, open Lake Formation Service
2. Click "Data permissions"
3. Click "Grant"
4. Apply settings:

    Setting | Value
    -|-
    Principals | SAML users and groups
    SAML and Amazon QuickSight users and groups | `arn:aws:iam::<AWS_ACCOUNT_ID>:saml-provider/<PROVIDER_NAME_IN_AWS>:user/user00` OR `arn:aws:iam::<AWS_ACCOUNT_ID>:saml-provider/<PROVIDER_NAME_IN_AWS>:group/group0` (Note: `userX0` through `userX9` are members of `groupX` for `X` from 0 through 9)
    LF-Tags or catalog resources | Named data catalog resources
    Databases | `<DATABASE_NAME>`
    Tables | `<TABLE_NAME>`
    Table and column permissions | `Select` and/or `Super` (All), up to you
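The principal ARNs follow a fixed pattern, so they are easy to build and to grant from the AWS CLI instead of the console. This is a sketch under the assumption that the AWS CLI is configured with Lake Formation admin rights. `saml_principal` and `grant_select` are hypothetical helper names.

```shell
# Build a Lake Formation SAML principal ARN.
# $1: AWS account ID, $2: provider name in AWS, $3: user|group, $4: name
saml_principal() {
    case "$3" in
        user|group) ;;
        *) echo "kind must be 'user' or 'group'" >&2; return 1 ;;
    esac
    printf 'arn:aws:iam::%s:saml-provider/%s:%s/%s\n' "$1" "$2" "$3" "$4"
}

# Grant SELECT on one table to a principal.
# $1: principal ARN, $2: database name, $3: table name
grant_select() {
    aws lakeformation grant-permissions \
        --principal "DataLakePrincipalIdentifier=$1" \
        --permissions SELECT \
        --resource "{\"Table\": {\"DatabaseName\": \"$2\", \"Name\": \"$3\"}}"
}
```

For example, `grant_select "$(saml_principal 123456789012 keycloak group group0)" mydb mytable` would grant `Select` to every member of `group0`. The account ID, provider, database, and table names here are illustrative.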

## Step 4: Connect Dremio to your Glue/Lake Formation source

1. Open browser to Dremio `http://<DREMIO_IP_OR_HOSTNAME>:8080`
2. Click the `+` button next to "Data Lakes"
3. Click "Amazon Glue Catalog"
4. Fill out the "General" tab including Name and Authentication
5. Click "Advanced Options" tab
   1. Enable "Enforce AWS Lake Formation access permissions on datasets"
   2. Fill in the user and group prefix settings per [Lake Formation Permissions Reference](https://docs.aws.amazon.com/lake-formation/latest/dg/lf-permissions-reference.html)
      * For this demo, use SAML:
      * User prefix: `arn:aws:iam::<AWS_ACCOUNT_ID>:saml-provider/<PROVIDER_NAME_IN_AWS>:user/`
      * Group prefix: `arn:aws:iam::<AWS_ACCOUNT_ID>:saml-provider/<PROVIDER_NAME_IN_AWS>:group/`
6. Optionally, under the "Privileges" tab
   1. Enable "Select" privileges for "All Users". This will allow other users, not just the admin account, to access this AWS Glue source.
7. Click "Save"

## Step 5: Explore

1. Open browser to Dremio `http://<DREMIO_IP_OR_HOSTNAME>:8080`
2. Log in as admin or one of the user accounts (user00 through user99)
   * Note: `userX0` through `userX9` are members of `groupX` for `X = [0,9]`
3. Click on the AWS Glue source that was added
4. Explore the table(s) that you have created, noting that Lake Formation permissions are enforced
5. Try logging in as other users as well
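When picking accounts to test with, it helps to know which group a given user belongs to. The membership rule above (`userX0` through `userX9` belong to `groupX`) can be sketched as a small helper. The function name `group_for_user` is hypothetical.

```shell
# Print the demo group for a user named userXY, where X and Y are digits.
group_for_user() {
    case "$1" in
        user[0-9][0-9])
            digits="${1#user}"                 # "user37" -> "37"
            printf 'group%s\n' "${digits%?}"   # drop the last digit -> "group3"
            ;;
        *)
            echo "expected a name like user00 through user99" >&2
            return 1
            ;;
    esac
}

group_for_user user00   # prints group0
group_for_user user37   # prints group3
```

If you granted access to `group0` only, `user00` through `user09` should see the table while `user37` should not.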

## Step 6: Cleanup

Don't forget to shut down or delete your IdP VM when you are done using it to reduce costs.