My Favourite 20 Words

Geoff wrote:

I was updating my profile at the UK social networking site, Friends Reunited, and decided to pay attention to a question that I’d hitherto ignored: “What are your 20 favourite words?” This is what I came up with:

love, peace, new, unexpected, honest, whimsy, intense, tranquil, touch, trust, humanism, experiment, imagination, learn, reflection, evolution, revolution, reason, create, teach

…and then, taking a far more practical approach to the matter, it struck me that a more accurate impression of myself might be had by counting the words in all my blog postings combined.

So I thought for a moment, engaged leet-drive and typed:

$ find articles/ archive/ -type f -print |
pargs cat |
perl -pe 's!<[^>]*?>!!go' |
tr -cd 'a-zA-Z \010\012' |
tr ' A-Z' '\012a-z' |
tally |
sort -rn |
head -50 |
cat -n

…and it did pretty much what I wanted, first time; pargs is a better xargs of my own devising, and tally is an awk script which counts instances of records which pipe through it. The only goof was not to expunge blank lines before running tally, but that’s a simple grep -v '^$'.
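For anyone without pargs and tally to hand, the same idea can be sketched with standard tools only. This is an approximation under stated assumptions, not the incantation as typed: `find -exec cat {} +` stands in for `find | pargs cat`, sed stands in for the perl tag-stripper, an inline awk counter stands in for tally, and the blank-line fix is folded in. The function name `wordfreq` is made up for illustration.

```shell
#!/bin/sh
# Sketch of the word-count pipeline using only standard tools.
# Assumptions: "find -exec cat {} +" replaces "find | pargs cat", and the
# inline awk counter replaces tally. wordfreq is a hypothetical name.
# Usage: wordfreq N DIR...   prints the N most frequent words under DIR...
wordfreq() {
    n=$1
    shift
    find "$@" -type f -exec cat {} + |
    LC_ALL=C sed 's/<[^>]*>//g' |       # strip HTML tags, line by line
    LC_ALL=C tr -cd 'a-zA-Z \012' |     # keep only letters, spaces, newlines
    LC_ALL=C tr ' A-Z' '\012a-z' |      # one lowercase word per line
    grep -v '^$' |                      # drop blank lines before counting
    awk '{ count[$0]++ } END { for (w in count) print count[w], w }' |
    sort -rn |                          # highest count first
    head -n "$n"
}
```

Something like `wordfreq 50 articles/ archive/ | cat -n` would then give a numbered top 50, much as in the original command.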

The results are decidedly boring:

index count word
1 15982 the
2 9082 to
3 8502 a
4 8322 and
5 7685 of
6 5577 i
7 4813 in
8 3628 that
9 3288 is
10 3182 for
11 3105 it
12 2421 on
13 2169 with
14 1950 this
15 1940 was
16 1927 my
17 1917 be
18 1827 at
19 1670 you
20 1615 have

…and things get only marginally more interesting further down:

index count word
71 485 security
91 349 work
124 254 bike
167 182 police
202 155 british
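One way to pull rows like these out of the full ranked list is a small awk filter. This is only a sketch: `rank_of` is a made-up name, and it assumes its input is "count word" lines already sorted by count, highest first, as in the listings here.

```shell
#!/bin/sh
# rank_of WORD...
# Reads "count word" lines on stdin (assumed already sorted by count,
# highest first) and prints "index count word" for each requested word.
# rank_of is a hypothetical helper, not one of the original scripts.
rank_of() {
    awk -v words="$*" '
        BEGIN { split(words, w, " "); for (i in w) want[w[i]] = 1 }
        ($2 in want) { print NR, $1, $2 }   # NR is the position in the ranking
    '
}
```

Fed the full sorted tally, `rank_of security work bike police british` would print one line per word that appears, with its position in the ranking.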

“Sex” is not even in the top 500, alas. Nor is “love”. The only small crumb of comfort comes from the fact that in 30 seconds I created an incantation that would stump most people in the real world.

Being an old-style unix hacker is still useful.

Comments

3 responses to “My Favourite 20 Words”

  1. bridget
    re: My Favourite 20 Words

    love, love, love, love, love, love, love,love, love, love, love, love, love, love,love, love, love, love, love, love, love,love, love, love, love, love, love, love,love, love, love, love, love, love, love,love, love, love, love, love, love, love,love, love, love, love, love, love, love,love, love, love, love, love, love, love,love, love, love, love, love, love, love,love, love, love, love, love, love, love,love, love, love, love, love, love, love,love, love, love, love, love, love, love,love, love, love, love, love, love, love,love, love, love, love, love, love, love,love, love, love, love, love, love, love,love, love, love, love, love, love, love

    I’m upping the lurve quota on your blog so you can do a new search. How New Age of me!!!

  2. Toby
    re: My Favourite 20 Words

    Sure, being an old-style unix hacker has its rewards, but have you published the source of tally and pargs anywhere yet?

  3. alecm
    re: My Favourite 20 Words

    $ wc -l ~/scripts/tally ~/scripts/pargs

    9 /Users/alecm/scripts/tally

    29 /Users/alecm/scripts/pargs

    …tally you should be able to work out for yourself; the only reason it is 9 lines long is coz of linebreaks, without which it’d be three lines.

    #!/usr/bin/awk -f

    { foo[$0]++; }

    END { for (i in foo) { print foo[i], i; } }

    …as for pargs, I cannot be arsed. The thing is: although I am all for open-source, I am even more for people having the ability to do things for themselves.

    What I want to know is why can’t everybody do the above?
