Archive for the 'Projects' category

Enterprise iSCSI storage with OpenSolaris and COMSTAR

October 28, 2009 4:11 pm

The goal of this project is to build enterprise-grade iSCSI storage that is modular enough to meet any iSCSI needs.

I chose OpenSolaris for the flexibility of ZFS, which most people have at least heard of, and also for its Common Multiprotocol SCSI Target (COMSTAR) project. I’ll only be discussing the iSCSI target portion of this project, but I recommend reading about what COMSTAR can do outside of the iSCSI space.

HARDWARE

Since we’re talking about building our own storage array, let’s look at some hardware options. My personal preference for a chassis is the Supermicro SC846, for its redundant power, its option for two internal disks (so that all 24 hot-swap bays can be used purely for storage), and its ability to take both 3.5″ and 2.5″ drives.

If you only plan to use 2.5″ drives, you might want to check out the SC216, which also holds 24 disks but in only 2 rack units of space.

The next important decision is the HBA(s) you will use. Since we’ll be using ZFS for this project, we do not want a RAID card. We want a plain JBOD HBA instead, so that ZFS sees the individual disks and handles redundancy itself. Trust me on this one: RAID cards can turn out to be a nightmare, and JBOD HBAs are cheaper. My personal preference in HBA for OpenSolaris is the LSI 3081 card, for its Fusion-MPT chip. You don’t have to buy the LSI brand, but I definitely recommend an HBA with a Fusion-MPT chip. Each of these provides 8 SATA ports, so you’ll need 3 of them to support 24 disks.

The last major hardware decision is networking. My preference for 1 Gb Ethernet is a NIC with the 82571EB chip. Intel offers a single-port card, the Intel PRO/1000 MT, and a dual-port version, the Intel PRO/1000 PT.

For 10 Gb Ethernet, I recommend a NIC with the 82598EB chip. For dual-port CX4, I use the EXPX9502CX4 card; if you prefer dual-port SR fiber, go with the EXPX9502AFXSR card.

Just to be clear, I currently use all of the hardware recommended here in OpenSolaris servers, and all of it is confirmed by Sun to be supported on OpenSolaris.

CONFIGURATION

Let’s start with a fresh install of OpenSolaris.

Mirror OS disk

First, let’s mirror our OS disk for added reliability. In this example, OpenSolaris was installed on disk c9d0s0. Our second OS disk is c10d0s0.

1. Create a Solaris partition on the second disk

# format c10d0

Select “fdisk” then “create 100% Standard Solaris Partition over the full Disk”

2. Next, we need to copy the Solaris slice layout from the OS disk to the second disk. (Note that we use s2, the slice that spans the entire disk; this is very important.)

# prtvtoc /dev/rdsk/c9d0s2 | fmthard -s - /dev/rdsk/c10d0s2

3. Next, we’ll attach the mirror disk to the OS zpool

# zpool attach -f rpool c9d0s0 c10d0s0

4. Last, we need to make the second disk bootable

# installgrub -m /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c10d0s0
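If you mirror more than one box, it can help to wrap the four steps in a helper that only prints the commands for a given pair of disks, so you can review them before anything touches a disk. This is a sketch under assumptions: the function name `mirror_cmds` is hypothetical, and step 1 uses `fdisk -B` (x86 only, one Solaris partition spanning the whole disk) in place of the interactive `format` session.

```shell
#!/bin/sh
# mirror_cmds OSDISK NEWDISK
# Hypothetical helper: prints (does not run) the commands that mirror the
# root pool from OSDISK (e.g. c9d0) onto NEWDISK (e.g. c10d0).
mirror_cmds() {
    os=$1
    new=$2
    # Step 1: one Solaris fdisk partition covering the whole new disk (x86)
    echo "fdisk -B /dev/rdsk/${new}p0"
    # Step 2: copy the slice table; s2 is the slice that spans the whole disk
    echo "prtvtoc /dev/rdsk/${os}s2 | fmthard -s - /dev/rdsk/${new}s2"
    # Step 3: attach slice 0 of the new disk as a mirror in the root pool
    echo "zpool attach -f rpool ${os}s0 ${new}s0"
    # Step 4: install GRUB so the second disk can boot on its own
    echo "installgrub -m /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/${new}s0"
}
```

Run `mirror_cmds c9d0 c10d0` to review the output, then `mirror_cmds c9d0 c10d0 | sh -x` as root. `zpool status rpool` shows resilver progress. It is prudent to let the resilver finish before relying on the second disk.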

Static IP
If you don’t want to rely on always getting the same DHCP IP, you’ll probably want to statically configure the IP of your storage server.

First, we need to disable the NetworkAutomagic service

# svcadm disable network/physical:nwam

Next, enable the config file-based networking service

# svcadm enable network/physical:default

Now we must configure the IP statically. This is done by creating an /etc/hostname.<interface> file. In this example, I’ll use the e1000g0 interface.

# vi /etc/hostname.e1000g0
192.168.1.200

Configure the netmask for the management IP

# vi /etc/netmasks
192.168.1.0 255.255.255.0

Configure the default gateway

# vi /etc/defaultrouter
192.168.1.1

Tell the system to use DNS (after the local files) for host lookups

# cp /etc/nsswitch.dns /etc/nsswitch.conf

Now, configure the DNS servers

# vi /etc/resolv.conf
nameserver 192.168.1.4

Configure IP Multi-Pathing (IPMP)
If you went with a dual-port card, or two cards, it’s advisable to use IPMP so that a single failed link doesn’t make your iSCSI volumes inaccessible.

In this example I’m using two e1000g interfaces and creating the IPMP interface iscsi0.

# vi /etc/hostname.iscsi0
ipmp group san0 192.168.1.200 up

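The primary interface of the IPMP group is e1000g0. This replaces the /etc/hostname.e1000g0 contents from the static IP step, since the data address now lives on iscsi0.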

# vi /etc/hostname.e1000g0
group san0 -failover up

The backup interface is e1000g1

# vi /etc/hostname.e1000g1
group san0 -failover standby up
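The three IPMP files follow a fixed pattern, so they can be generated from the group name, the data address and the two NICs. Below is a sketch using a hypothetical `write_ipmp` function and the same hypothetical `ROOT` prefix for rehearsing in a scratch directory. It writes exactly the lines shown above and nothing else. If your build ships `ipmpstat`, `ipmpstat -g` should list the group after a reboot.

```shell
#!/bin/sh
# write_ipmp GROUP IPMPIF ADDR PRIMARY STANDBY
# Hypothetical helper that writes the three hostname files shown above.
# ROOT (hypothetical) prefixes every path; empty means the real /etc.
write_ipmp() {
    group=$1 ipmpif=$2 addr=$3 primary=$4 standby=$5
    etc="${ROOT:-}/etc"
    # The data address lives on the IPMP interface, not on a physical NIC
    echo "ipmp group $group $addr up" > "$etc/hostname.$ipmpif"
    # Underlying interfaces only join the group; the second one is standby
    echo "group $group -failover up" > "$etc/hostname.$primary"
    echo "group $group -failover standby up" > "$etc/hostname.$standby"
}
```

For this example: `write_ipmp san0 iscsi0 192.168.1.200 e1000g0 e1000g1`.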

Enable COMSTAR
Install stmf (library and service for COMSTAR)

# pkg install SUNWstmf

Now install the iSCSI toolset

# pkg install SUNWiscsit

At this point, reboot your machine before continuing on.

After rebooting, we will enable the stmf service

# svcadm enable stmf

Creating your zpool
I went with a chassis that supports up to 24 disks to leave room for expansion. Based on your needs, you can fill all 24 hot-swap trays with raw storage to be exported as one or more iSCSI volumes, or you can dedicate some of them to SSDs and take advantage of the performance benefits of a hybrid pool.

If you are unfamiliar with the term hybrid pool, I suggest reading up on ZIL and L2ARC. Here are a few links to get you started:
ZIL: SLOG BLOG
L2ARC: ZFS L2ARC

For the purposes of this example, I’ll reserve four drive bays for SSDs (a pair for the ZIL and a pair for L2ARC), leaving 20 bays. We can then fill those 20 bays with four 5-disk RAIDZ vdevs. I’ll start by configuring one, then explain how to grow your zpool when adding your second RAIDZ for storage expansion.
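The bay budget above is simple arithmetic. Single-parity RAIDZ gives up one disk per vdev to parity, so four 5-disk vdevs leave 4 × (5 - 1) = 16 disks' worth of usable space. The hypothetical `raidz_layout` helper below reproduces that count for other layouts. It is only a rough guide: it ignores ZFS metadata overhead and the difference between decimal and binary capacity units.

```shell
#!/bin/sh
# raidz_layout BAYS RESERVED WIDTH PARITY
# Hypothetical helper: how many RAIDZ vdevs of WIDTH disks fit after
# RESERVED bays are set aside, and how many disks' worth of space holds data.
raidz_layout() {
    bays=$1 reserved=$2 width=$3 parity=$4
    free=$((bays - reserved))
    vdevs=$((free / width))
    # Each vdev loses PARITY disks' worth of space to parity
    data=$((vdevs * (width - parity)))
    echo "vdevs=$vdevs data_disks=$data spare_bays=$((free - vdevs * width))"
}
```

`raidz_layout 24 4 5 1` describes the plan in this post. `raidz_layout 24 4 6 2` would show how a double-parity (RAIDZ2) layout of 6-disk vdevs compares.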

Before we go any further, now is a good time to demonstrate two useful commands. First, we can use devfsadm to scan for newly added disks (-C also cleans up stale device links, and -v reports what changed).

# devfsadm -Cv

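Second, we can use the format command to list all recognized disks. Redirecting its input from /dev/null makes it print the disk list and exit instead of waiting for you to pick a disk.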

# format < /dev/null
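When you only want the device names, for example to paste into `zpool create`, you can filter that listing. Below is a sketch using a hypothetical `list_disks` filter. It assumes each disk entry in the `format` output begins with its index number and a period (as in `0. c7t0d0 <...>`) and takes the second field of those lines.

```shell
#!/bin/sh
# list_disks: read `format < /dev/null` output on stdin and print one disk
# name per line. Hypothetical helper; assumes disk entries look like
# "       0. c7t0d0 <vendor info>" and skips every other line.
list_disks() {
    awk '$1 ~ /^[0-9]+\.$/ { print $2 }'
}
```

Usage: `format < /dev/null | list_disks`, or `format < /dev/null | list_disks | grep '^c7t'` to see only the disks on controller 7.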

My first 5 storage disks were recognized on controller 7 (c7). I'll create my initial zpool, named "iSCSIdisks", as a RAIDZ using all 5 disks.

# zpool create iSCSIdisks raidz c7t0d0 c7t1d0 c7t2d0 c7t3d0 c7t4d0

There we go: we now have storage for creating iSCSI volumes. Now I'm going to create a 20 GB ZFS volume (zvol) that will be used as the disk for a virtual machine.

# zfs create -V 20G iSCSIdisks/vm1_hdd

Next, I need to make a LUN (Logical Unit) out of this volume.

# sbdadm create-lu /dev/zvol/rdsk/iSCSIdisks/vm1_hdd

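Now that we have created a logical unit, we need to find its GUID so that we can tell COMSTAR to expose it over iSCSI. Here's how you list all LUs that have been created, along with their GUIDs.

# sbdadm list-lu

Then add a view for the LU, using the GUID from that listing. COMSTAR does not make an LU visible to initiators until it has a view. A view with no host or target group, as here, exposes it to all of them.

# stmfadm add-view <GUID>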
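COMSTAR only presents an LU to initiators once it has a view (`stmfadm add-view <GUID>`), and copying a 32-character GUID by hand is easy to get wrong. Below is a sketch using a hypothetical `view_cmd` helper. It reads `sbdadm list-lu` output, finds the row whose source is a given zvol, and prints the matching `stmfadm add-view` command for you to review. It assumes the GUID is the first column and the source path is the last.

```shell
#!/bin/sh
# view_cmd ZVOL_PATH: read `sbdadm list-lu` output on stdin and print the
# stmfadm command that exposes the LU backed by ZVOL_PATH.
# Hypothetical helper; assumes GUID is column 1 and SOURCE the last column.
view_cmd() {
    src=$1
    guid=$(awk -v src="$src" '$NF == src { print $1 }')
    if [ -z "$guid" ]; then
        echo "no LU found for $src" >&2
        return 1
    fi
    echo "stmfadm add-view $guid"
}
```

Usage: `sbdadm list-lu | view_cmd /dev/zvol/rdsk/iSCSIdisks/vm1_hdd`. Check the printed command, then pipe it to `sh` as root.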

If you haven't already enabled the iSCSI target service, now is a good time to do so (-r also enables the services it depends on).

# svcadm enable -r svc:/network/iscsi/target:default

I'm going to create a basic iSCSI target configuration that leaves this storage wide open to anyone on the network; I suggest you secure yours. To do so, read the itadm man page.

# itadm create-target

You can now see your newly created iSCSI target, and all previously created ones, using the itadm command.

# itadm list-target
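To check the result from a client, the target's IQN from `itadm list-target` is what an initiator logs in to. Below is a sketch that assumes a Linux client running open-iscsi (`iscsiadm`), which is not covered in this post; other initiators use different tools. The hypothetical `login_cmds` helper reads the `itadm list-target` output and prints a discovery command plus one login command per target. It assumes target names start with `iqn.` in the first column.

```shell
#!/bin/sh
# login_cmds PORTAL: read `itadm list-target` output on stdin and print
# open-iscsi commands a Linux initiator could run against PORTAL.
# Hypothetical helper; assumes each target row starts with its "iqn." name.
login_cmds() {
    portal=$1
    # SendTargets discovery asks the portal which targets it offers
    echo "iscsiadm -m discovery -t sendtargets -p $portal"
    awk -v p="$portal" '$1 ~ /^iqn\./ { print "iscsiadm -m node -T " $1 " -p " p " --login" }'
}
```

Usage on the storage server: `itadm list-target | login_cmds 192.168.1.200`. Then run the printed lines on the Linux client.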

You're all set to access this storage remotely.

The last thing I want to come back to is how to grow the underlying storage as we need to expand. Following the previous example of a 5-disk RAIDZ, I'll just add a second 5-disk RAIDZ to the zpool iSCSIdisks.

Since I have 3 LSI HBAs, each with 8 ports, my next 5 disks will consume the last 3 ports of my first HBA and the first 2 ports of my second one. I plug in the 5 new disks, run "devfsadm -Cv" then run "format < /dev/null" to ensure they have been recognized. Now I'm ready to add them.

# zpool add iSCSIdisks raidz c7t5d0 c7t6d0 c7t7d0 c8t0d0 c8t1d0
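The port arithmetic above (slots 5-9 spill from the first HBA onto the second) can be captured in a small mapping. Below is a sketch with three assumptions you should verify against `format < /dev/null` on your own system: 8 ports per HBA, HBAs enumerated as consecutive controllers starting at c7, and target number equal to port number. The function names are hypothetical.

```shell
#!/bin/sh
# Assumptions (check on your system): 8 ports per HBA, controllers
# numbered consecutively from c7, target number == port number.
PORTS_PER_HBA=8
FIRST_CTRL=7

# disk_for_slot SLOT: 0-based storage slot -> device name
disk_for_slot() {
    slot=$1
    ctrl=$((FIRST_CTRL + slot / PORTS_PER_HBA))
    port=$((slot % PORTS_PER_HBA))
    echo "c${ctrl}t${port}d0"
}

# disks_for_slots FIRST LAST: space-separated names for a range of slots
disks_for_slots() {
    s=$1 list=""
    while [ "$s" -le "$2" ]; do
        list="$list $(disk_for_slot "$s")"
        s=$((s + 1))
    done
    echo "${list# }"
}
```

With these assumptions, `disks_for_slots 5 9` yields the five devices used above. `zpool add -n iSCSIdisks raidz $(disks_for_slots 5 9)` previews the resulting layout without changing the pool.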

And that’s it, your zpool is now grown and ready to be sliced up into more iSCSI targets.
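The four SSD bays reserved earlier are added the same way, as a log vdev (ZIL) and cache devices (L2ARC). Below is a sketch using a hypothetical `hybrid_cmds` helper that prints the two `zpool add` commands for review. The device names in the usage line are also hypothetical. The log pair is mirrored so that one failed SSD does not take out the ZIL. Cache devices cannot be mirrored, and their contents are only copies of pool data anyway.

```shell
#!/bin/sh
# hybrid_cmds POOL LOG1 LOG2 CACHE1 CACHE2
# Hypothetical helper: prints (does not run) the commands that add a
# mirrored log (ZIL) pair and two L2ARC cache devices to POOL.
hybrid_cmds() {
    pool=$1
    # Mirrored separate intent log built from the first SSD pair
    echo "zpool add $pool log mirror $2 $3"
    # Cache devices are added individually; ZFS does not mirror them
    echo "zpool add $pool cache $4 $5"
}
```

Usage, with hypothetical device names: `hybrid_cmds iSCSIdisks c9t4d0 c9t5d0 c9t6d0 c9t7d0`.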

Enjoy your new enterprise iSCSI array, and don’t forget to check out ZIL and L2ARC!

WebVirt status update

February 23, 2009 9:08 pm

As you can probably guess, I’ve been rather busy.

WebVirt has become a very exciting project, so I’ve been spending almost all of my computing time coding. I have posted some screenshots but I have to admit that they’re already a bit dated with some of the new features I’ve implemented.

Currently, WebVirt can connect to remote libvirt nodes, but only with no authentication. This means you’re limited to using a connection string like this:

qemu+tcp://192.168.1.2/?name=qemu:///session

That being said, I’m planning on using the Red Hat package python-nss for key creation and management, so that should follow, hopefully, shortly.

Once connected, you can start and destroy (stop) both virtual networks and domains that are currently defined on a remote physical node. I should point out that there is still a bug in libvirt that “may” undefine a network on the remote machine when you destroy it. Undefining means removing the config from the remote node. This does not affect domains, however.

You can also push network and domain configurations created in WebVirt to your remote nodes.

You can toggle whether or not each virtual domain and network autostarts. This means that when the libvirt daemon is started or restarted on the remote node, these virtual domains and networks will automatically start.

Lastly, there is importing previously defined virtual domains and networks from remote nodes. To elaborate on this feature, let’s say you’re like me: you’re so excited about libvirt that you’re already using it to manage virtual domains and networks, and you just can’t wait for a full release of WebVirt.

Fear not! Currently, WebVirt can only import very basic virtual networks, and importing virtual domains is little more than a placeholder (as of CVS check-in 14, only the name and UUID are read, as a proof of concept). This feature is of utmost importance to the project and will be fully implemented in time.

WebVirt now at FedoraHosted

January 26, 2009 9:53 am

WebVirt is the project I started after my recent obsession with libvirt. After two weekends’ worth of work, it’s already talking to remote libvirt boxes, if only to poll capabilities, defined networks, and defined domains.

Now that the project’s architecture has been laid out, I decided it was time to give it more attention than my website can provide.

Since I’ve been developing it on Fedora and intend on using it with Fedora, RHEL, and CentOS, I went with FedoraHosted.

Go check out the project Trac page: fedorahosted.org/webvirt

Libvirt kinda caught my attention

January 18, 2009 9:14 pm

I’ve been working a lot with running virtual machines on Linux recently. An inevitable stop along the way was the libvirt project.

I started playing with creating XML configs for new domains, networks and storage. I had been planning on 1. learning Python and 2. playing with Django, so this inspired me to do both at once by starting WebVirt, a web-based front-end to libvirt for managing virtual machines.

I spent this weekend completing my first two goals: coding basic XML generation for libvirt resources, and GPLing the code and creating the repo.

Building home OpenBSD router – Part 6

December 6, 2008 6:07 pm

Start at Part 1

The Multi Router Traffic Grapher (MRTG)

Reference: Tobi Oetiker’s MRTG – The Multi Router Traffic Grapher

To borrow a phrase from Tobi Oetiker, “You have a router, you want to know what it does all day long? Then MRTG is for you.” The goal here is to track the OpenBSD router’s traffic over time. This practice is important for detecting trends in traffic, helpful for finding bottlenecks, and useful for establishing a baseline so you can recognize abnormal changes in traffic.

So let’s get on with it. For this example, I use OpenBSD’s MRTG package. I’ll also install two Perl packages for IPv6 support, which the OpenBSD MRTG package requires:

mschenck ~# sudo pkg_add http://mirror.rit.edu/pub/OpenBSD/4.3/packages/i386/mrtg-2.15.2p1.tgz
mschenck ~# sudo pkg_add http://mirror.rit.edu/pub/OpenBSD/4.3/packages/i386/p5-Socket6-0.19.tgz
mschenck ~# sudo pkg_add http://mirror.rit.edu/pub/OpenBSD/4.3/packages/i386/p5-IO-INET6-2.01p0.tgz

I should point out that you need an SNMP daemon running for MRTG to poll stats from your router. To enable it, first add the following to “/etc/rc.conf.local”:

snmpd_flags=""          # for normal use: ""

This makes snmpd start automatically on reboot. In the meantime, let’s start it ourselves:

mschenck ~#  sudo  /usr/sbin/snmpd

By default, OpenBSD’s SNMP daemon (snmpd(8)) listens only on localhost, and the default community string is “public”. You can change these settings by modifying “/etc/snmpd.conf” (see snmpd.conf(5)), but for this example we’ll stick with the defaults.

Now let’s get an HTTP daemon running to display these graphs. OpenBSD comes with Apache, so let’s enable it and start it up. Add the following line to /etc/rc.conf.local:

httpd_flags=""          # for normal use: "" (or "-DSSL" after reading ssl(8))

Again, let’s avoid the reboot and just start the daemon manually:

mschenck ~# sudo /usr/sbin/httpd
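Both daemons above were enabled by appending a `*_flags=""` line to /etc/rc.conf.local. If you script this, make the append idempotent so that re-running the script never adds duplicate lines. Below is a minimal sketch using a hypothetical `enable_daemon` function. The `RC` variable is also an assumption: it defaults to /etc/rc.conf.local and can point at a scratch file for testing.

```shell
#!/bin/sh
# enable_daemon NAME: append NAME_flags="" to rc.conf.local unless a
# NAME_flags line is already there, so repeated runs add nothing new.
# Hypothetical helper; RC (hypothetical) overrides the target file.
enable_daemon() {
    rc=${RC:-/etc/rc.conf.local}
    # grep fails if the line is absent (or the file does not exist yet)
    if ! grep -q "^${1}_flags=" "$rc" 2>/dev/null; then
        echo "${1}_flags=\"\"" >> "$rc"
    fi
}
```

From a root shell: `enable_daemon snmpd; enable_daemon httpd`.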

Now, let’s back up Apache’s original document root and create a new one for displaying our MRTG graphs:

mschenck ~# sudo mv /var/www/htdocs /var/www/htdocs-orig
mschenck ~# sudo mkdir -p /var/www/htdocs/cfg

Now that we have MRTG and the Perl modules it requires, an SNMP daemon running, and an HTTP daemon up to display our graphs, we’re ready to start configuring.

mschenck ~# sudo cfgmaker --global 'WorkDir: /var/www/htdocs'  \
          --global 'Options[_]: bits,growright' \
          --output /var/www/htdocs/cfg/mrtg.cfg    \
           public@localhost

Now let’s schedule the polling of our NICs’ stats for the MRTG graphs. I’m going to put the task in root’s crontab:

mschenck ~# sudo crontab -u root -e

And then add the following cron schedule:

*/5 * * * *  /usr/local/bin/mrtg /var/www/htdocs/cfg/mrtg.cfg --logging /var/log/mrtg.log
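The `*/5` in the minute field is a cron step value. It matches every minute from 0 to 59 that is a multiple of 5, which gives 12 runs per hour and lines up with MRTG's default 5-minute interval. The hypothetical `cron_step_minutes` helper below expands a `*/N` minute field, in case you want to see what a different step would do.

```shell
#!/bin/sh
# cron_step_minutes N: list the minutes matched by a "*/N" minute field,
# i.e. 0, N, 2N, ... up to 59. Hypothetical helper for illustration.
cron_step_minutes() {
    if [ "$1" -lt 1 ]; then
        echo "step must be at least 1" >&2
        return 1
    fi
    m=0 out=""
    while [ "$m" -le 59 ]; do
        out="$out $m"
        m=$((m + $1))
    done
    echo "${out# }"
}
```

For example, `cron_step_minutes 5` lists the 12 minutes at which the entry above fires.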

Now, just watch the data start collecting.