Farmshare+1

This is meant to be a design document for the next version of farmshare. Hopefully we can fill it in with as much design guidance and implementation details as possible. It draws inspiration from our SNI VDI OpenStack system, our FarmVNC scripts and our ProclusVNC scripts.

One key factor that should drive the design is our users' needs. The system should be a certain way because it benefits the users, not because "this is how we do things" or "we've always done it this way".

Discoverability needs to be a higher priority than similarity to the previous system. Even if a user has used the old system and says, "I used to do X in Y way", it's not important to keep it working the same way as long as they can easily figure out how to do X the new way.

Many of the users are looking for newer software when they log in to FarmShare, so we need to track Ubuntu more closely, following the six-month Ubuntu release schedule.

Many of the users are looking for more performance than their laptop. As a baseline, take a 2015 MacBook Air or 13" MacBook Pro. When a user logs in to FarmShare+1, they should have resources available that are greater than their MacBook's. With approximately 3000 unique users per month (but probably only a couple of hundred unique per day), we need something on the order of 200x the MacBook's power in our cluster. Ten new R630s would probably be enough to start (assuming we then add the 10 new Dells we already have). Interestingly, local disk is not a priority; I think the order is RAM first, then CPU.

If a new MacBook is about 2 cores and about 8GB of RAM, a 24-core, 128GB machine can support approximately 12 users. So 10 such machines can support approximately 100 simultaneous users, if each is using slightly more than a MacBook's worth of resources. At roughly $5k per standard Dell node, that's about $50k.

If I had to rank the user requirements, I would probably order them as: software, resources, discoverability. They need to be able to run something bigger than their laptop, and/or different from their laptop (whether Free Software or proprietary), and then they also need to be able to figure out how to do that.

In terms of minimal overall system size, the corn and barley machines are almost always heavily loaded, so I don't think it makes sense to try to roll out a service that is the same size or smaller. We really want it to be bigger than FarmShare. Computational power is cheap and we should be providing heaps of it to all Stanford people.

After the design meets the user needs, there are also admin needs to consider. Mainly we need to be able to easily build new machines, reboot the machines, manage the configuration of the machines, and also track performance metrics and logs for security. And we're actually really close there, with the main problem being the "technical debt" we have in the form of the oldest software parts of FarmShare: AFS, Stanford puppet modules and the other AFS-based tools (tripwire, log rotation, filter-syslog).

storage

The new system really hinges on not having AFS home directories. And not having any system utilities or configuration depend on AFS.

It's important that the storage system not be the same system where user workload runs, as that would not make for a stable system. So we need some additional storage hardware for the new home directories, in addition to keeping AFS available.

I think it's very reasonable to shoot for a 10GB default quota per user. At 3k active users, that's only about 30TB of allocated storage, especially as not everyone will bump up against the quota limit. But we probably want something like 100 IOPS per user (if we're trying to equate to an old laptop), or more like an SSD's worth of IOPS per user if we're trying to match a modern MacBook. So we're talking 30TB at, say, 0.3M IOPS? Unfortunately nothing cheap comes to mind for those requirements. If budget is the main issue, we have to give up the high-availability requirement.

Let's face it, we haven't had an HA fileserver for FarmShare in many years, and things have held up OK, with only a couple of downtimes. So a SuperMicro-based ZFS/NFS box with a few SSDs could probably meet our 0.3M IOPS (cached) and 30TB usable requirement for about $20k, and without requiring a separate backend storage network.
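As a rough sketch of what such a box could look like (pool name, device names, and export options below are placeholders, not a spec):

 # mirrored HDDs for capacity, SSDs for ZIL (log) and L2ARC (cache)
 zpool create tank mirror sda sdb mirror sdc sdd \
     log mirror nvme0n1 nvme1n1 cache nvme2n1
 zfs create -o mountpoint=/export/home tank/home
 zfs set userquota@jdoe=10G tank/home    # 10GB quota, set per user (jdoe is an example)
 echo '/export/home *.stanford.edu(rw,sync,no_subtree_check)' >> /etc/exports
 exportfs -ra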

network

If we're getting new hardware, we can start all over with the network. Provision a whole new VLAN and IP range and probably keep the central firewall (with a new vsys). The key is not new network hardware, but just a new VLAN/IP range that fits all the farmshare machines.

I think it's OK to put the IPMI controllers on the same VLAN as the public IPs; just give them shadow-net IPs. At some point there was a lot of concern about FarmShare users having network access to the machines' IPMI controllers, but I think that concern is overblown. If it's a hard requirement, then we need a separate management network, which costs more.

misc

Logging: throw out the entire old logging infrastructure. Start with modern default Ubuntu rsyslog config and just add a Splunk destination.
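Something like the following is probably all the local configuration we need; the Splunk hostname and port below are placeholders:

 # forward everything to the Splunk syslog listener (@@ = TCP, @ = UDP);
 # hostname and port are placeholders for whatever the Splunk team gives us
 cat > /etc/rsyslog.d/90-splunk.conf <<'EOF'
 *.* @@splunk.stanford.edu:514
 EOF
 systemctl restart rsyslog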

Logwatch and "root mail" aka sysadmins keeping an eye on stuff: throw out all the filter-syslog / newsyslog / AFS stuff and just run logwatch or make some Splunk dashboards.

Tripwire: throw out the entire existing tripwire infrastructure from 15 years ago, and go with modern OSSEC, if necessary.

batch jobs: Maybe we need to move to SLURM. Should be fine. The only question is which execution hosts to use. Can we spec new ones? The main challenge with using the existing barleys is the network hardware and config.

configuration management: we can use the Stanford central puppet infrastructure, but we'd have to start over from scratch without the existing Stanford modules in order to meet the logging/tripwire/Splunk goals above, as most of those modules depend on AFS. This is where the challenge is; I'd estimate the build system plus configuration management effort at about half a person-year total.

package management

We need the regular upstream Ubuntu repos. Then we need an easy way to add additional repos and the packages from them; the Stanford repo is just one example. Finally, we need a way to add packages outside the OS, so that would be a separate /share/sw (or similar) tree, just like we have /farmshare/software now. Perhaps we standardize on Lmod + EasyBuild + fpm(?) for that tree.
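As a sketch of how those three layers could fit together (package names, paths, and the easyconfig are placeholders, not decisions):

 # 1) OS packages straight from Ubuntu (or any extra apt repo we add):
 apt-get install build-essential
 # 2) non-OS software built into /share/sw with EasyBuild and exposed via Lmod:
 eb GCC-5.4.0.eb --prefix=/share/sw --robot
 module use /share/sw/modules/all
 module load GCC/5.4.0
 # 3) fpm to wrap one-off builds as .debs if we'd rather ship them through apt:
 fpm -s dir -t deb -n mytool -v 1.0 /share/sw/mytool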

Ruth's suggestions

  • re-branding: maybe we call it the SRCC "student environment" or something
  • hand off the cardinal systems to another group; leave those as-is
  • retire ryes if we can just put a GPU into each new machine
  • is it still the case that all fully-sponsored SUNetIDs need access to this system, or should it be more limited?
  • can we get a faculty committee involved in scheduling/policy determinations?
  • can we also add Hadoop capability?

Naras's suggestions

  • self-service provisioning of corn-like resources (OS, auth and AFS, not licensed software)
  • should allow for separate billing for dedicated resources for e.g. a class

2015-06 Alex notes

farmshare sessions

guiding principles: lightweight desktop - maybe https://www.gnu.org/software/guix/manual/html_node/System-Installation.html

limited service options - just manage sessions and access files

no AFS

5yr project lifetime, need to have loose coupling with hardware/network

Example user session: ssh to farmshare-session-manager (aka fsm), authenticate with Duo, and land in a limited shell that only allows custom commands like the following (a rough sketch of one is below the list):

  • fsm-newsession
  • fsm-listsession
  • fsm-killsession
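A rough sketch of what fsm-newsession could be, assuming SLURM underneath; the partition name, resource sizes, and script path are placeholders:

 #!/bin/bash
 # fsm-newsession (sketch): submit a fixed-size session job and tell the user what happens next.
 set -e
 existing=$(squeue -h -u "$USER" -n vnc-session -o %i)
 if [ -n "$existing" ]; then
     echo "You already have session job $existing; run fsm-killsession first." >&2
     exit 1
 fi
 sbatch --job-name=vnc-session --partition=interactive \
        --cpus-per-task=2 --mem=8G --time=8:00:00 \
        /opt/fsm/vnc-session.sbatch
 echo "Session submitted; connection instructions will be e-mailed shortly."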

Similar to current FarmVNC, but with additional limits, maybe start with one session per user.

Then we connect to the "session". Maybe VNC is OK? How can we make it easier without SSH tunneling? Compare to the current setups on proclus, farmshare, and army. Maybe generate a new VNC password for every session.

2015-09 Alex notes

The stack is: Lubuntu (Linux, KVM, cgroups), SLURM, and VirtualGL/TurboVNC (vglrun -> TurboVNC). Use the standard TurboVNC security instructions: https://cdn.rawgit.com/TurboVNC/turbovnc/2.0.x/doc/index.html

We can use wrapper scripts like "start session" that are really wrappers around submitting a SLURM job, which will pick a node, start a vncserver there, and print the instructions for connecting to it via SSH-tunneled VNC.
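A sketch of what that job script could look like, assuming TurboVNC in its default /opt/TurboVNC location; the display-number scheme, gateway hostname, and mail step are all assumptions:

 #!/bin/bash
 #SBATCH --job-name=vnc-session
 # vnc-session.sbatch (sketch): start a TurboVNC server on the allocated node and
 # e-mail the user tunnelling instructions.
 display=$(( SLURM_JOB_ID % 100 + 1 ))   # crude scheme to avoid display collisions
 port=$(( 5900 + display ))
 /opt/TurboVNC/bin/vncserver ":${display}"
 node=$(hostname -f)
 mail -s "Your FarmShare session is ready" "$USER" <<EOF
 On your own machine, run:
   ssh -L ${port}:${node}:${port} ${USER}@fsm.stanford.edu
 then point the TurboVNC viewer at localhost:${port}.
 EOF
 sleep infinity    # keep the allocation (and pam_slurm access) alive until scancel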

Testing proposal: a single GCE or AWS host with Ubuntu + Lubuntu packages, plus a local single-node SLURM install (master, DB, and exec host all on the same node), and then one can test this whole thing.
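A minimal single-node slurm.conf along those lines (hostname, CPU/RAM figures, and the config path / service names are placeholders and may differ by Ubuntu release):

 cat > /etc/slurm-llnl/slurm.conf <<'EOF'
 ClusterName=farmshare-test
 ControlMachine=testbox
 SlurmUser=slurm
 AuthType=auth/munge
 ProctrackType=proctrack/cgroup
 TaskPlugin=task/cgroup
 SelectType=select/cons_res
 SelectTypeParameters=CR_Core_Memory
 NodeName=testbox CPUs=4 RealMemory=15000 State=UNKNOWN
 PartitionName=interactive Nodes=testbox Default=YES MaxTime=1-00:00:00 State=UP
 EOF
 systemctl restart slurmctld slurmd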

user experience

Here's how I envision the user experience. First, the user needs an SSH client, and they use it to connect to a gateway/orchestrator machine. That system only lets them manage their sessions, not actually do any work. Perhaps it could even not mount their home directory, but drop them into an empty one. It will require two-factor auth, and maybe even have a minimal PATH so that not very many commands are available. The user can start/stop/check sessions; minimally, we can start with a maximum of one session, of a fixed size. Second, the user needs to have TurboVNC installed, and in their SSH session they will get a printout of the command to run on their local machine to connect to the VNC session that started up somewhere. We may need to generate a new VNC password each time, and also use pam_slurm to limit SSH logins. The session is actually an sbatch job underneath that runs a fixed job script and then e-mails the user the instructions when it starts up.
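For concreteness, the e-mailed instructions would boil down to something like this on the user's side (gateway hostname, node name, and display/port are placeholders):

 # tunnel the VNC port through the gateway...
 ssh -L 5901:compute-node-7:5901 sunetid@fsm.stanford.edu
 # ...then, in another terminal (or via the TurboVNC GUI), connect through the tunnel:
 /opt/TurboVNC/bin/vncviewer localhost:5901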

We actually already have pretty much this implementation: https://web.stanford.edu/group/proclus/cgi-bin/mediawiki/index.php/VNC We just need to make it more "user friendly".


What if we skip the whole session-starting step and just launch sessions based on attempted connections? I guess that's too fraught with danger. Well, what about making it part of a startup script?

2015-10 Alex notes

systemd-cgls https://wiki.freedesktop.org/www/Software/systemd/ControlGroupInterface/

systemctl set-property httpd.service CPUShares=500 MemoryLimit=500M

'systemd-run --scope' may be used to easily launch a command in a new scope unit from the command line.

systemd-run --scope -p MemoryLimit=1G --description=test --uid=chekh tmux new-session -d
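A quick way to check that the limit actually landed on the new unit (the scope name is whatever systemd-run prints, e.g. run-12345.scope):

 systemd-cgls                                    # find the new scope in the cgroup tree
 systemctl show -p MemoryLimit run-12345.scope   # confirm the configured limit
 systemctl status run-12345.scope -l             # cgroup section shows the processes inside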


farmvnc:
1) start up Xorg with config and display number and log file
2) start up VNC server pointing at that display
3) start up gnome session in that display
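Roughly, those three steps are something like the following (the display number, config path, and the choice of x11vnc as the VNC server are assumptions):

 Xorg :7 -config /etc/X11/xorg.farmvnc.conf -logfile /tmp/Xorg.7.log &
 x11vnc -display :7 -rfbauth ~/.vnc/passwd -forever &
 DISPLAY=:7 gnome-session &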


On 9/29/15 2:06 PM, Alex Chekholko wrote:
> Simple example:
> https://schnouki.net/posts/2013/12/19/resource-control-with-systemd/
> 
> See cgroups:
> 
> systemctl status whatever -l
> look for cgroups section
> 
> with ps: ps xawf -eo pid,user,cgroup,args
> 
> https://linuxaria.com/article/how-to-manage-processes-with-cgroup-on-systemd 
> 
> 
> in the config file, in the [service] section
> MemoryLimit=XG
> 
> more detail: https://www.kernel.org/doc/Documentation/cgroups/memory.txt
> 
> Test:
> create slice
> set memory limit for slice
> launch process inside a slice
> 
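Spelled out as commands, that test could look like this (the slice name and the stress workload are placeholders):

 # create a slice with a memory limit, then launch a memory hog inside it
 cat > /etc/systemd/system/fstest.slice <<'EOF'
 [Slice]
 MemoryLimit=1G
 EOF
 systemctl daemon-reload
 systemd-run --slice=fstest.slice stress --vm 1 --vm-bytes 1500M
 systemctl status fstest.slice -l   # the cgroup section shows the stress process,
                                    # which should be OOM-killed past the 1G limit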
