I haven't looked at the book link posted above, but I have dealt with digital signal processing recently and I can share how I understand it's approached (and thus spare you the task of trying to piece it together through Wikipedia and bits of Matlab documentation like I did). One possibility is to actually take the Fourier transform as others have suggested. This is only viable if you have all of the data available from the start and the Fourier transform allows you to apply a filter essentially on all the data at once. If you're streaming data, I don't know that that can be made to work. This is where digital filters come in. And lots of math.
We usually operate in the Laplace domain for continuous filters with something called the transfer function. In the most basic sense, our filter has an input, x(t), that's a function of time, and an output, y(t) that's a function of time. If we take the Laplace transform of both of these, we get functions X(s) and Y(s) in the frequency domain (where s is a complex number sometimes referred to as the complex frequency). I don't want to spend too much time on what the Laplace transform is or what it actually does but it is similar to the Fourier transform (in fact, the Fourier transform is the Laplace transform evaluated for values of s along the imaginary axis, so you can think of the Laplace transform as being an expanded Fourier transform). At this point we introduce something called the transfer function, H(s), which we define as H(s) = Y(s) / X(s).
Most filters are described in terms of their transfer function. Roughly speaking, the transfer function basically tells you what the output of the filter would be if you put in a signal with a particular frequency. But don't worry about that too much, this part is really just a means to an end. The first thing we're going to start with is the transfer function of a low-pass filter. It turns out that if we have that, we can create any kind of filter we want through some simple transformations. For the time being, I'm going to assume that the cut-off frequency of our low-pass filter is 1 rad/s. Later on a warping factor will show up that will allow us to adjust this.
But, first off, the transformations from a low-pass transfer function, H
low(s), to other transfer functions:
High pass:
 = H_{low}\left( \frac{1}{s} \right))
Band pass:
 = H_{low}\left( Q ( s + \frac{1}{s} ) \right))
Where the Q-factor is computed from the fractional bandwidth of the filter. This also assumes that the center frequency of the filter (the geometric mean of the two cutoff frequencies) is 1 (more on this later).
Band stop:
 = H_{low}\left( \frac{1}{Q ( s + \frac{1}{s} )} \right))
So, what does a low-pass filter transfer function actually look like? Well, the simplest one that I know of is a
Butterworth low-pass filter (this is actually a first order Butterworth low-pass filter,
higher order filters are possible):
 = \frac{1}{1 + s})
So, we can get the equivalent Butterworth high pass filter:
 = \frac{1}{1 + \frac{1}{s}} = \frac{s}{1 + s})
I'll leave it as an exercise to do the other ones.
Okay, so now we have a transfer function, how do we take a stream of sampled numbers and filter them using this? Well, the first step is to transform this transfer function from the continuous time domain, to the discrete time domain (after all, we're working with discrete samples, the waveform is only recorded at regular intervals). For that we need something called the
bilinear transform. Given some analog filter (continuous time) transfer function, H
a(s), we want to find its digital (discrete time) transfer function, H
d(z). Now, the Wikipedia article uses s = 2 / T ( z - 1 ) / ( z + 1 ) as its transform. I'm going to do something slightly different (recall when I mentioned that I was going to assume the cutoff frequency of the low-pass filter was always 1, well here's the part where we take care of that). Instead of 2 / T, I'm going to introduce a warping factor, that we'll call c for now.
As
the article discusses, this transformation, since it's only an approximation, introduces a frequency warping, which causes the cut-off frequency to move. Replacing that 2 / T with c and following the algebra (easy enough since that 2 / T factor always stays in front), we arrive at the following relation:
)
Where T is our sampling interval (the amount of time that passes between discrete samples in our input). Now, this relation tells us that, for some frequency, ω, the frequency that
actually corresponds to in the analog filter is ω
a. Our frequencies get warped! Now, what we'd like is for some specific frequency to map to another specific frequency. For example, since our analog low-pass filter has a cutoff frequency is 1, we'd like the
desired cutoff frequency to map to 1:
)
})
So our high and low-pass filters have a cutoff frequency (the frequency above or below which we want to attenuate the response). But what about our band-pass and band-stop filters? Those really have two cutoff frequencies, a low and a high frequency, that defines their bands. Well, this is a good time to talk about that Q factor and the center frequency. First, the center frequency of a band-pass or band-stop filter is defined as the geometric mean of its high and low cutoff frequencies. That is:

Now, you might be wondering why it's not simply the average of the two. Well, technically it is the average of the two frequencies...in log space. If you take the logarithms of the lower frequency and the upper frequency, then the logarithm of the center frequency will be their average. Now let's get to that Q factor. The Q factor is defined as:

One short aside, the warping factor (that c term) for band-pass and band-stop filters will be the center frequency (as I mentioned briefly before, the transformations I used above assume that the center frequency is 1, so we want our actual center frequency to map to 1).
Okay, we still haven't full moved into the digital domain, so let's continue with our high-pass filter example. We left off at:
 = \frac{s}{1 + s})
Let's apply our bilinear transform:
 = H_a\left(c \frac{z - 1}{z + 1}\right) = \frac{c \frac{z - 1}{z + 1}}{1 + c \frac{z - 1}{z + 1}})
Doing some algebra:
 = \frac{ c z - c }{ z + 1 + c z - c } = \frac{ c z - c }{ (1 + c) z + (1 - c) } = \frac{ c - c z^{-1} }{(1 + c) + (1 - c)z^{-1}})
Now, we have a digital transfer function. Like the continuous transfer function, it's defined as the ratio of the z-transforms (the discrete time equivalent of the Laplace transform) of the output and the input:
 = Y(z) / X(z) = \frac{ c - c z^{-1} }{(1 + c) + (1 - c)z^{-1}})
We're much closer now, a little more algebra to go:
 + ( 1 - c )z^{-1}\right] Y(z) = \left[ c - c z^{-1} \right] X(z))
 Y(z) + (1 - c) z^{-1} Y(z) = c X(z) - c z^{-1} X(z))
Now, we're going to take the inverse z-transform. The only tricks we need to know are that the z-transform is linear, so constant multiples (like c or 1 + c) will just pop out and that the inverse of z
-n X(z) is x[k - n] where k is the current time sample (meaning that this basically refers to previous time samples). So, here we go:
 y[k] + (1 - c) y[k - 1] = c x[k] - c x[k - 1])
 y[k - 1] \right))
So what does this tell us? This tells us that the value of the output at the current sample, k, is a combination of the current and previous inputs, x[k] and x[k - 1], and the previous output, y[k - 1]. So, to implement our high-pass filter in code, it might look like this:
var den, c;
c = 1 / tan( pi * cutoffFrequency_Hz / samplingFrequency_Hz );
output = ( c * ( input - lastInput ) - ( 1 - c ) * lastOutput ) / ( 1 + c );
lastInput = input;
lastOutput = output;
Note: I've changed the expression for c to use frequencies rather than angular frequency and interval, it results in a slightly simpler expression.
So now we can just step through the samples one at a time computing each output. One important note is that the cutoff frequency should be greater than 0 and less than half the sampling frequency. You should never try to filter frequencies above half the sampling frequency (essentially because of aliasing, discrete data simply can't capture frequencies that high). I'll leave it as an exercise for others to go through the derivations of the other filter types.
As a bit of interesting trivia, let's analyze our filter to see if it makes sense. Let's let the cutoff frequency tend towards 0. Since this is a high-pass filter, that means we're essentially telling it to not attenuate any frequencies, so our output should match our input.
Well, if cutoff frequency goes to 0, we can see that c tends towards infinity. If we take that limit in our expression for the output, we get this:
output ~= input - lastInput + lastOutput
Basically, it's taking the difference between the current and previous inputs (how much they've changed) and adding it to the previous output. This will make the output follow the input, exactly as we expect.
If, instead, we go the other direction and let c tend towards 0 (the case where we let the cutoff frequency approach half the sampling frequency), we find the following:
output ~= lastOutput
meaning that the filter doesn't ever change!