Configuration

To configure Crail, use the *.template files as a basis and modify them to match your environment. Set the $CRAIL_HOME environment variable to your Crail deployment’s path.

cd $CRAIL_HOME/conf
mv crail-site.conf.template crail-site.conf
mv crail-env.sh.template crail-env.sh
mv core-site.xml.template core-site.xml
mv slaves.template slaves
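The mv commands above replace the templates with your working copies. If you would rather keep the templates for reference and never overwrite a file you have already edited, here is a minimal sketch that copies instead of moving. init_crail_conf is a hypothetical helper, not part of Crail.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Crail): create the working config files
# from their templates, keeping the templates and any file that already exists.
init_crail_conf() {
  local conf_dir="${1:?usage: init_crail_conf CRAIL_HOME}/conf"
  local f
  for f in crail-site.conf crail-env.sh core-site.xml slaves; do
    if [ -e "$conf_dir/$f" ]; then
      echo "keeping existing $conf_dir/$f"
    else
      cp "$conf_dir/$f.template" "$conf_dir/$f" || return 1
    fi
  done
}
```

For example, run init_crail_conf "$CRAIL_HOME" once $CRAIL_HOME is set.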

Note: Docker containers can also be configured using the configuration files above. However, this is only recommended for complex configurations. See Docker for details.

The purpose of each of these files is:

  • crail-site.conf: Configuration of the file system, data tiers and RPC

  • crail-env.sh: Passes additional JVM arguments

  • core-site.xml: Configuration of the HDFS adapter

  • slaves: Used by the start-crail.sh script to ease running Crail on multiple machines

crail-site.conf

There are general file system properties and properties specific to the individual storage tiers. Typical properties you might want to change are:

Property                Default Value           Description
crail.namenode.address  crail://localhost:9060  Namenode hostname and port
crail.cachelimit        1073741824              Size (bytes) of client buffer cache
crail.cachepath         /dev/hugepages/cache    Hugepage path to client buffer cache
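To change these properties, edit crail-site.conf. The sketch below assumes the format used by crail-site.conf.template: one property per line, with the name and value separated by whitespace. set_conf is a hypothetical helper that replaces a property's line or appends it if it is missing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set KEY to VALUE in a crail-site.conf-style file
# (one "key value" pair per line), replacing an existing entry or appending one.
set_conf() {
  local file=$1 key=$2 value=$3
  touch "$file"
  # Keep every line whose first field is not KEY, then add the new entry.
  awk -v k="$key" '$1 != k' "$file" > "$file.tmp" || return 1
  printf '%s %s\n' "$key" "$value" >> "$file.tmp"
  mv "$file.tmp" "$file"
}
```

For example, set_conf "$CRAIL_HOME/conf/crail-site.conf" crail.namenode.address crail://namenode-host:9060 points clients at a namenode on the (example) host namenode-host. The same call with crail.cachelimit or crail.cachepath adjusts the client buffer cache. Rewriting the whole line avoids leaving duplicate entries for the same key.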

Advanced properties (only modify these if you know what you are doing):

Property                       Default Value                              Description
crail.directorydepth           16                                         Maximum depth of directory tree
crail.tokenexpiration          10                                         Seconds write token is valid
crail.blocksize                1048576                                    Size (bytes) of block
crail.user                     crail                                      Username used for HDFS adapter
crail.debug                    false                                      Enable debug output
crail.statistics               true                                       Collect statistics
crail.rpctimeout               1000                                       RPC timeout in milliseconds
crail.datatimeout              1000                                       Data operation timeout in milliseconds
crail.buffersize               1048576                                    Size (bytes) of buffer (buffered stream)
crail.slicesize                524288                                     Size (bytes) of slice (transfer unit)
crail.singleton                true                                       Only create a single instance of the FS
crail.regionsize               1073741824                                 Size (bytes) of allocation unit (cache)
crail.directoryrecord          512                                        Size (bytes) of directory entry
crail.directoryrandomize       true                                       Randomize iteration of directories
crail.cacheimpl                org.apache.crail.memory.MappedBufferCache  Client buffer cache implementation
crail.namenode.fileblocks      16                                         File
crail.namenode.blockselection  roundrobin                                 Block selection algorithm: roundrobin or random

RPC

Crail’s modular architecture allows different RPC implementations to be plugged in. The crail.namenode.rpctype property selects the RPC implementation. We currently offer two implementations:

  • A TCP implementation based on narpc (default): org.apache.crail.namenode.rpc.tcp.TcpNameNode

  • An RDMA implementation based on darpc: org.apache.crail.namenode.rpc.darpc.DaRPCNameNode
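The RPC implementation is selected by writing one of the two class names above into crail.namenode.rpctype. A minimal sketch, assuming the whitespace-separated "key value" layout of crail-site.conf.template. set_rpctype and its short names tcp and darpc are hypothetical, chosen only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: select the namenode RPC implementation by short name
# and write the matching class name into a crail-site.conf-style file.
set_rpctype() {
  local file=$1 class
  case $2 in
    tcp)   class=org.apache.crail.namenode.rpc.tcp.TcpNameNode ;;
    darpc) class=org.apache.crail.namenode.rpc.darpc.DaRPCNameNode ;;
    *)     echo "unknown RPC type: $2 (expected tcp or darpc)" >&2; return 1 ;;
  esac
  touch "$file"
  # Drop any existing crail.namenode.rpctype entry, then append the new one.
  awk '$1 != "crail.namenode.rpctype"' "$file" > "$file.tmp" || return 1
  printf 'crail.namenode.rpctype %s\n' "$class" >> "$file.tmp"
  mv "$file.tmp" "$file"
}
```

For example, set_rpctype "$CRAIL_HOME/conf/crail-site.conf" darpc switches to the RDMA implementation. Rejecting unknown names avoids writing a class name that does not exist.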

Logging

To allow shutting down the namenode without losing data, Crail offers namenode logging. It can be enabled by setting crail.namenode.log to the path of the log file.

Note: this feature is experimental and should be used with caution.
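Logging is enabled with a crail-site.conf line such as "crail.namenode.log /var/lib/crail/namenode.log" (the path is only an example). The sketch below reads the configured path and creates its parent directory in case it does not exist yet. Whether Crail creates missing directories itself is not stated here, so treat this as a precaution. get_conf and prepare_namenode_log_dir are hypothetical helpers, and the whitespace-separated "key value" file format is assumed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of KEY from a crail-site.conf-style file
# (first matching "key value" line; values must not contain spaces).
get_conf() {
  awk -v k="$2" '$1 == k { print $2; exit }' "$1"
}

# Create the directory that will hold the namenode log, if logging is enabled.
prepare_namenode_log_dir() {
  local log
  log=$(get_conf "$1" crail.namenode.log)
  if [ -z "$log" ]; then
    echo "crail.namenode.log is not set; namenode logging is disabled" >&2
    return 0
  fi
  mkdir -p "$(dirname "$log")"
}
```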

Storage Tiers

Crail offers multiple types of datanodes to suit your network and storage requirements:

  1. TCP storage tier backed by DRAM (default)

  2. RDMA storage tier backed by DRAM

  3. NVMe over Fabrics storage tier, typically backed by NVMe drives

Crail allows multiple storage tier types to be used together, e.g. to store hot data in DRAM and cold data on NVMe, or to extend DRAM capacity with NVMe storage. Storage types are configured as a comma-separated list in the crail.storage.types property, using the following class names:

  1. TCP: org.apache.crail.storage.tcp.TcpStorageTier

  2. RDMA: org.apache.crail.storage.rdma.RdmaStorageTier

  3. NVMf: org.apache.crail.storage.nvmf.NvmfStorageTier

Each storage type in the list defines a storage class, starting from storage class 0. A type can appear multiple times to define multiple storage classes for that type. The maximum number of storage classes must be specified with the crail.storage.classes property (default = 1). In the default configuration, storage classes are used in incremental order, i.e. storage class 0 is used until no more space is left, then storage class 1 is used, and so on. However, file system nodes (e.g. files) can also be created on a particular storage class and can be configured to inherit the storage class of their container. The default storage class of / is 0; however, it can be configured via crail.storage.rootclass.
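As an example of these rules, setting crail.storage.types to org.apache.crail.storage.rdma.RdmaStorageTier,org.apache.crail.storage.nvmf.NvmfStorageTier and crail.storage.classes to 2 makes RDMA/DRAM storage class 0 and NVMf storage class 1. The sketch below prints that mapping for a given crail-site.conf and flags a list longer than crail.storage.classes. list_storage_classes is hypothetical, and it assumes the "key value" file format and no spaces around the commas.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "class-index class-name" for each entry of
# crail.storage.types and fail if there are more entries than
# crail.storage.classes allows (documented default: 1).
list_storage_classes() {
  local file=$1 types classes i=0 t
  local -a list
  types=$(awk '$1 == "crail.storage.types" { print $2; exit }' "$file")
  classes=$(awk '$1 == "crail.storage.classes" { print $2; exit }' "$file")
  classes=${classes:-1}
  # Prints nothing if crail.storage.types is unset (Crail then uses its default).
  IFS=, read -r -a list <<< "$types"
  for t in "${list[@]}"; do
    echo "$i $t"
    i=$((i + 1))
  done
  if [ "$i" -gt "$classes" ]; then
    echo "warning: $i storage types but crail.storage.classes is $classes" >&2
    return 1
  fi
}
```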

Storage tiers send keep-alive messages to the namenode to indicate that they are still running and that no error has occurred. The interval between keep-alive messages can be configured in seconds with crail.storage.keepalive.

Some of the configuration properties can be set via the command line when starting a storage tier. Refer to Run for details.

TCP Tier

The TCP storage tier (org.apache.crail.storage.tcp.TcpStorageTier) is backed by DRAM. The following properties can be set to configure the storage tier:

Property                        Default Value        Description
crail.storage.tcp.interface     eth0                 Network interface to bind to
crail.storage.tcp.storagelimit  1073741824           Size (bytes) of DRAM to provide; multiple of allocation size
crail.storage.tcp.datapath      /dev/hugepages/data  Hugepage path to data

Advanced properties:

Property                          Default Value     Description
crail.storage.tcp.port            50020             Port to listen on
crail.storage.tcp.allocationsize  crail.regionsize  Allocation unit
crail.storage.tcp.queuedepth      16                Data operation queue depth (single connection)
crail.storage.tcp.cores           1                 Threads to process requests
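The storage limit is given in bytes as a multiple of the allocation size, which defaults to crail.regionsize (1073741824 bytes, i.e. 1 GiB). The same rule is stated for the RDMA tier. A small sketch of the arithmetic check. check_storagelimit is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that a storage limit (bytes) is a positive multiple
# of the allocation size (bytes; defaults to the documented crail.regionsize
# default of 1073741824) and report how many allocation units it provides.
check_storagelimit() {
  local limit=$1 alloc=${2:-1073741824}
  if [ "$limit" -le 0 ] || [ $((limit % alloc)) -ne 0 ]; then
    echo "storagelimit $limit is not a multiple of allocation size $alloc" >&2
    return 1
  fi
  echo "ok: $((limit / alloc)) allocation units"
}
```

For example, check_storagelimit 8589934592 (8 GiB) reports 8 allocation units, while 1610612736 (1.5 GiB) is rejected with the default 1 GiB allocation size.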

RDMA Tier

The RDMA storage tier (org.apache.crail.storage.rdma.RdmaStorageTier) is backed by DRAM. The following properties can be set to configure the storage tier:

Property                         Default Value        Description
crail.storage.rdma.interface     eth0                 Network interface to bind to
crail.storage.rdma.storagelimit  1073741824           Size (bytes) of DRAM to provide; multiple of allocation size
crail.storage.rdma.datapath      /dev/hugepages/data  Hugepage path to data

Advanced properties:

Property                           Default Value     Description
crail.storage.rdma.port            50020             Port to listen on
crail.storage.rdma.allocationsize  crail.regionsize  Allocation unit
crail.storage.rdma.localmap        true              Use mmap if client is colocated with data tier
crail.storage.rdma.queuesize       32                Data operation queue depth (single connection)
crail.storage.rdma.type            passive           Operation type: passive or active (see DiSNI)
crail.storage.rdma.persistent      false             Allow restarting a data tier if namenode logging is used
crail.storage.rdma.backlog         100               Listen backlog
crail.storage.rdma.connecttimeout  1000              Connect timeout in milliseconds
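Both the RDMA and TCP tiers default to the interface eth0, which many hosts do not have. A Linux-only sketch that checks whether the interface named in crail.storage.rdma.interface (or crail.storage.tcp.interface) exists and, if the ip tool is installed, prints its IPv4 addresses. check_interface is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper (Linux only): verify that a network interface exists
# and print its IPv4 address(es) when the iproute2 "ip" tool is available.
check_interface() {
  local iface=$1
  if [ ! -e "/sys/class/net/$iface" ]; then
    echo "interface $iface not found; available: $(ls /sys/class/net | tr '\n' ' ')" >&2
    return 1
  fi
  if command -v ip >/dev/null 2>&1; then
    ip -4 -o addr show dev "$iface" | awk '{ print $4 }'
  fi
}
```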

NVMf Tier

The NVMf storage tier (org.apache.crail.storage.nvmf.NvmfStorageTier) is typically backed by NVMe drives. However, some target implementations support any block device. Unlike the RDMA and TCP storage tiers, the NVMf storage tier is not involved in data operations; it only provides metadata. Crail uses the jNVMf library to connect to a standard NVMf target, obtain metadata about the storage, and pass this information to the namenode. Clients connect directly to the NVMf target. Crail has been tested with the Linux kernel target, the SPDK target, and the Mellanox ConnectX-5 offloading target.

The following properties can be set to configure the storage tier:

Property                      Default Value               Description
crail.storage.nvmf.ip         localhost                   IP/hostname of NVMf target
crail.storage.nvmf.port       50025                       Port of NVMf target
crail.storage.nvmf.nqn        nqn.2017-06.io.crail:cnode  NVMe qualified name of NVMf controller
crail.storage.nvmf.namespace  1                           Namespace of NVMe device
crail.storage.nvmf.hostnqn    <random 128bit UUID>        NVMe qualified name of host

Advanced properties:

Property                             Default Value     Description
crail.storage.nvmf.allocationsize    crail.regionsize  Allocation unit
crail.storage.nvmf.queueSize         64                NVMf submission queue size
crail.storage.nvmf.stagingcachesize  262144            Staging cache size (bytes) for read-modify-write operations
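The nqn and hostnqn values must be NVMe Qualified Names. The NVMe specification defines the form nqn.yyyy-mm.<reverse domain>:<name> and limits an NQN to 223 bytes. The sketch below is a rough syntax check under those rules. It does not verify the date or domain, and it does not check which NQNs your target actually exports. is_valid_nqn is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough syntax check for an NVMe Qualified Name, e.g. the
# value of crail.storage.nvmf.nqn or crail.storage.nvmf.hostnqn.
# Length is counted in characters, which equals bytes for ASCII names.
is_valid_nqn() {
  local nqn=$1
  [ "${#nqn}" -le 223 ] || return 1
  [[ $nqn =~ ^nqn\.[0-9]{4}-[0-9]{2}\.[^:]+:.+$ ]]
}
```

For example, the default nqn.2017-06.io.crail:cnode passes, while io.crail:cnode (missing the nqn.yyyy-mm. prefix) is rejected.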

crail-env.sh

Modify crail-env.sh to pass additional JVM arguments to the crail and start-crail.sh scripts.

For large deployments, we recommend increasing the heap size (e.g. -Xmx24g) and the young generation size (e.g. -Xmn16g) of the namenodes and TCP datanodes to improve performance.
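A sketch of those heap settings as crail-env.sh lines. It assumes the template passes extra JVM options through a variable named CRAIL_EXTRA_JAVA_OPTIONS. That name is an assumption, so use whichever variable your copy of crail-env.sh.template defines. The sizes are the examples from the text, not tuned values.

```shell
# Sketch for crail-env.sh. ASSUMPTION: the template defines
# CRAIL_EXTRA_JAVA_OPTIONS; check your crail-env.sh.template for the real name.
# Append to any existing options instead of replacing them:
# -Xmx sets the maximum heap size, -Xmn the young generation size.
CRAIL_EXTRA_JAVA_OPTIONS="${CRAIL_EXTRA_JAVA_OPTIONS:+$CRAIL_EXTRA_JAVA_OPTIONS }-Xmx24g -Xmn16g"
export CRAIL_EXTRA_JAVA_OPTIONS
```

Keep the -Xmn value below -Xmx, because the young generation is part of the heap.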

core-site.xml

To configure the HDFS adapter, modify core-site.xml. For example, the Crail shell crail fs uses the HDFS adapter and therefore requires core-site.xml to be set up. Set fs.defaultFS to match crail.namenode.address in crail-site.conf. The default is:

<property>
  <name>fs.defaultFS</name>
  <value>crail://localhost:9060</value>
</property>
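Because fs.defaultFS and crail.namenode.address must agree, a quick consistency check can save debugging time. The sketch below is a simplification, not an XML parser: it assumes <name> and <value> sit on separate lines as in the default property above, and that crail-site.conf uses whitespace-separated "key value" lines. check_default_fs is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare fs.defaultFS in core-site.xml with
# crail.namenode.address in crail-site.conf (both in the given conf directory).
check_default_fs() {
  local conf_dir=$1 address default_fs
  address=$(awk '$1 == "crail.namenode.address" { print $2; exit }' "$conf_dir/crail-site.conf")
  # Fall back to the documented default if the property is not set.
  address=${address:-crail://localhost:9060}
  # Take the first <value> line after <name>fs.defaultFS</name> and strip the tags.
  default_fs=$(awk '/<name>fs.defaultFS<\/name>/ { found = 1; next }
    found && /<value>/ { gsub(/.*<value>|<\/value>.*/, ""); print; exit }' "$conf_dir/core-site.xml")
  if [ "$address" != "$default_fs" ]; then
    echo "mismatch: crail.namenode.address=$address, fs.defaultFS=$default_fs" >&2
    return 1
  fi
  echo "ok: $address"
}
```

For example, check_default_fs "$CRAIL_HOME/conf" prints the shared address or reports both values on a mismatch.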

slaves

The slaves file simplifies starting Crail on larger deployments; refer to Run for details. Each line contains the hostname of a machine on which a storage tier should be started. Make sure passwordless ssh connections to each of these hosts are possible. Note that the hostnames are not used by the storage tiers themselves, only by the start/stop-crail.sh scripts to start and stop storage tiers. The IP address or hostname of a storage tier, and any other configuration options, are passed either as command line arguments or via crail-site.conf. Command line arguments can be added in the slaves file after the hostname.
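Since command line arguments may follow the hostname, only the first field of each line names a host. A minimal sketch that extracts those hostnames and tests passwordless ssh to each. BatchMode makes ssh fail instead of prompting for a password. slave_hosts and check_passwordless_ssh are hypothetical helpers, not part of Crail's scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the hostname (first field) of every non-empty
# line of a slaves file; any arguments after the hostname are ignored.
slave_hosts() {
  awk 'NF > 0 { print $1 }' "$1"
}

# Hypothetical helper: check that every host accepts ssh without a password.
check_passwordless_ssh() {
  local host rc=0
  for host in $(slave_hosts "$1"); do
    # </dev/null keeps ssh from reading the caller's stdin.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null; then
      echo "ok: $host"
    else
      echo "FAILED: $host" >&2
      rc=1
    fi
  done
  return $rc
}
```

For example, check_passwordless_ssh "$CRAIL_HOME/conf/slaves" reports each host before you run start-crail.sh.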