
Anomaly ~ G. Wade Johnson

January 19, 2014

Novice Example: Remote File Copy, Better Problem Breakdown

In the last post, we spent a small amount of time understanding a problem faced by our hypothetical novice programmer, Ned.

As a reminder, let's revisit Novice Ned's breakdown of the problem.

  1. Launch FTP
  2. Connect to the host
  3. Log in with the foobar user and password
  4. Change to the appropriate remote directory
  5. Put each of the files to the remote machine
  6. Log off from the remote machine
  7. Repeat for all servers

The problem with this breakdown is that it focuses on how the problem is currently being solved. Because Ned was handed a manual process, his breakdown is very manual. To help Ned, we're going to take a different approach and break the problem down into its fundamental functional pieces:

  1. Remotely access a machine
  2. Copy files from local machine
  3. ... to remote machine
  4. Repeat the above steps for dozens of machines

The original, manual process requires multiple steps for each of the first three pieces. This is mostly because of the tool chosen to solve the problem: the FTP command-line tool. This new breakdown focuses on the functional requirements of the problem, not on a particular tool.

In the original process, logging in to the remote machine with FTP is one (multi-step) way to remotely access a machine. If we could find a more automatable way to copy files to a remote machine, we might be able to automate sending them to multiple machines. As a first pass, here are some approaches to copying files to a remote system. (I'm leaving out special-purpose configuration management tools and novel solutions like netcat. We'll just look at solutions that our novice is likely to master relatively quickly.)

  • Script the ftp session
  • rsync
  • sftp
  • rcp
  • scp
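For comparison, the first option, scripting the FTP session, could look roughly like the sketch below. It generates the same commands Ned types by hand and pipes them into ftp. This assumes a classic command-line ftp client that reads commands from standard input, where -n stops it from attempting auto-login so the user command can supply the credentials. The FTP_PASSWORD variable is hypothetical. Having to keep the password in plain text somewhere is one reason this option is less attractive than scp.

```shell
#!/bin/bash
# Sketch: generate the commands Ned currently types in each FTP session.
# Assumes a classic ftp client; FTP_PASSWORD (below) is a hypothetical
# variable holding the foobar password in plain text.
ftp_commands() {
    local password="$1" f
    printf 'user foobar %s\n' "$password"   # log in as foobar
    printf 'cd data\n'                      # relative to the login (home) directory
    for f in foobar.conf input1.txt input2.xml; do
        printf 'put %s\n' "$f"              # upload each file
    done
    printf 'bye\n'                          # log off
}

# Example (not run here): feed the commands to ftp for one server.
# ftp_commands "$FTP_PASSWORD" | ftp -n server1
```

Looping that last line over every server would automate the job, but the password would have to be stored somewhere in the clear.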

A quick check with Ned shows that he uses SSH to access all of these machines for other purposes. This means that scp, which copies files over SSH, is probably a good choice. Since Ned had not seen scp before, we would suggest he spend some time with its man page. The key point is that scp works a lot like the Unix cp command, except that it can copy securely to and from a remote machine.

Assuming the files to copy are in the current directory, the command to copy the files is now:

scp foobar.conf input1.txt input2.xml foobar@{server}:data

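Here, {server} is replaced with each server name, one by one. Because the remote path data has no leading slash, scp treats it as relative to the foobar user's home directory. Assuming that home directory is /home/foobar, the files land in /home/foobar/data. This still requires us to type a password for each server, but it's much less work than before. (We'll address the password problem next time.) To automate this for all of the servers in our list, we just need to write a script.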

I'll use bash, because it's what I'm comfortable with. If you are using a different shell, the commands should still be similar. Create the file cpfilestoremotes.sh with a text editor and type in the following (replacing server1, etc. with the correct server names).


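#!/bin/bash

# Copy the three files into the data directory of the foobar account
# on each server. Run this from the directory that holds the files.
scp foobar.conf input1.txt input2.xml foobar@server1:data
scp foobar.conf input1.txt input2.xml foobar@server2:data
scp foobar.conf input1.txt input2.xml foobar@server3:data
scp foobar.conf input1.txt input2.xml foobar@server4:data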


I created the four lines by copying and pasting, then changed the server name (right after the @) on each line to match a different machine. Obviously, the script needs one line per machine. Make the file executable with:

chmod +x cpfilestoremotes.sh
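The script assumes all three files are in the current directory. If one is missing or misspelled, scp will complain, but you may only find out after typing passwords for many servers. A small pre-flight check is one way to catch that early. The check_files function below is a hypothetical helper and is not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper: report every named file that is missing from the
# current directory, and fail if any are missing.
check_files() {
    local f status=0
    for f in "$@"; do
        if [[ ! -f "$f" ]]; then
            echo "missing: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example use near the top of cpfilestoremotes.sh:
# check_files foobar.conf input1.txt input2.xml || exit 1
```

With that line at the top of the script, running ./cpfilestoremotes.sh from the wrong directory would stop before a single scp command runs.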

First Step, Conclusion

At this stage, our solution has a number of problems. A real sysadmin-type would be suggesting tools better suited to mass configuration. A programmer-type would be twitching about the code duplication.

Everyone would have to agree that typing your password for each of the machines is suboptimal. Also, if you need to change the files to be copied, you will have to edit every line of the script. Likewise, the list of server names is spread throughout the script file.

On the plus side, the amount of typing Ned needs to do has been vastly reduced. Now all he has to do is run the shell script and type the passwords, instead of working through all of the FTP commands one by one. For a large number of servers, this could cut a multi-hour process to well under an hour. It's still boring, but at least it's not boring for as long.
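A loop can reduce two of the problems above, the duplicated scp lines and the server names scattered through the script. The following is only a sketch of one way to do it, not necessarily what the next post does. It takes the server names as command-line arguments and keeps the file list and remote directory in one place. It also reports which servers failed instead of stopping or silently moving on. It still prompts for a password per server.

```shell
#!/bin/bash
# Sketch: cpfilestoremotes.sh without the duplicated lines.
# Usage: ./cpfilestoremotes.sh server1 server2 ...
files=(foobar.conf input1.txt input2.xml)
remote_dir=data   # relative to foobar's home, i.e. /home/foobar/data

# Copy every file to one server as the foobar user (scp still asks for the password).
copy_to_server() {
    scp "${files[@]}" "foobar@$1:$remote_dir"
}

# Copy to each server in turn, remembering which ones failed.
copy_to_all() {
    local server
    local failed=()
    for server in "$@"; do
        copy_to_server "$server" || failed+=("$server")
    done
    if (( ${#failed[@]} > 0 )); then
        echo "failed: ${failed[*]}" >&2
        return 1
    fi
}

copy_to_all "$@"
```

For example, ./cpfilestoremotes.sh server1 server2 server3 server4. If the names were kept one per line in a hypothetical servers.txt file, ./cpfilestoremotes.sh $(cat servers.txt) would copy to all of them. Run with no arguments, the script does nothing.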

Next time, we'll start fixing some of the above problems.

Posted by GWade at 11:28 AM.

January 15, 2014

Novice Example: Copy Files, Describing the Problem

In an earlier entry, I covered the first thing a programming novice needs to know: how to break down a problem. A short while later, I was talking with a guy at our local Perl user group, and he presented me with an actual novice problem he was fighting.

After some thought, I realised that combining his issue with a problem I worked on about a year ago would make a really good multi-step example, for novices, of how to break a problem into pieces the computer can solve.

One thing that I want to make really clear is that this is not necessarily the best solution to the problem. People with more experience will probably be able to suggest many tools or approaches that are more efficient or maintainable. Since I am aiming the example at a programming novice, I'm focusing on a simpler solution that the novice can modify.

The (Initial) Problem

Let's assume that we are trying to help a hypothetical novice named Ned solve the problem.

Ned has been given access to a large number of machines that are running some variant of Unix. About once a week, he needs to copy a small number of files to specific directories on each of these machines. These files will be used in some processing that is outside of his control.

To make this simpler to start with, I'm going to define the problem to have 3 files copied to the same directory on every machine. The files are foobar.conf, input1.txt, and input2.xml. Ned has to copy all three files to the directory /home/foobar/data. He has access to the foobar user account on each of these machines.

When Ned first got this assignment, there were about 30 machines that he needed to work with, and he was shown how to FTP to each machine and put the files where they needed to go. Unfortunately, the farm has grown to 100 servers, and he knows it is expected to double in the next year. There is no reasonable way for him to do this by hand every week and still keep up with his other duties.

Breaking Down the Problem

The current process is pretty straightforward, but repetitive. For each server, he needs to:

  1. Launch FTP
  2. Connect to the host
  3. Log in with the foobar user and password
  4. Change to the appropriate remote directory
  5. Put each of the files to the remote machine
  6. Log off from the remote machine

He had simplified this a little by reading the ftp man page and discovering that he could pass the host on the ftp command line, combining steps 1 and 2. Asking around online turned up a lot of programs and terms that Ned does not understand, and he's afraid that he'll run down a rabbit hole chasing something that won't help solve the problem.

As an aside, this is a problem often faced by novice programmers when they ask for help from more senior developers. We have a tendency to try to come up with an optimal solution. (Sometimes it's just showing off.) We also often have a lot of knowledge that we can apply to the problem. A more senior developer is used to looking at multiple solutions and making tradeoffs. At this point, all the novice needs is a single solution. By providing lots of options, we make it easy for a novice to become overwhelmed and stop trying.

So, at this point, Ned has hit a roadblock. Each step seems separated into its most fundamental form, but the problem is still very manual. It's not obvious how to eliminate the manual steps, because Ned doesn't have enough of the right kind of information.

Next Steps

Next time, we'll revisit Ned's breakdown of the problem to make it easier to automate. We'll explore some other approaches, sticking with solutions that Ned can understand with a small amount of work.

Posted by GWade at 03:04 PM.