The best way I have found to count words in Java is to use a StreamTokenizer. This class dates back to the earliest versions of Java but is often overlooked in favour of String and its split() method. With its default settings, it easily distinguishes words from numbers and punctuation, as the example below shows. If more specialized parsing is required, check the API docs for methods such as wordChars, ordinaryChars and whitespaceChars. The source code follows.

import java.io.*;

public class WordCount {
    public static void main(String[] args) {
        // Eight words; the comma and the number 1 are not counted.
        StringReader in = new StringReader("The quick, brown fox jumps over 1 lazy dog");
        System.out.println(new WordCount().getWordCount(in)); // prints 8
    }

    public int getWordCount(Reader in) {
        int result = 0;
        try {
            StreamTokenizer st = new StreamTokenizer(in);

            // Count only word tokens; numbers (TT_NUMBER) and
            // punctuation (ordinary characters) are skipped.
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    result++;
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                in.close();
            } catch (IOException e) {
                // ignore: nothing useful can be done if close fails
            }
        }
        return result;
    }

}
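
The defaults do have some quirks for ordinary prose: the tokenizer treats ' and " as string quotes and / as a comment character, so text such as "Don't" or "and/or" is not split the way a word counter would want. The following is a minimal sketch of how the syntax-table methods mentioned above might be used to adjust this. The class name CustomWordCount and the method countWords are hypothetical names chosen for illustration, and the choice to treat an apostrophe as part of a word and a slash as a separator is an assumption, not the only reasonable one.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical example class; not part of the original WordCount source.
public class CustomWordCount {

    public static void main(String[] args) {
        // Words: Don't, stop, and, or, go
        System.out.println(countWords(new StringReader("Don't stop and/or go"))); // prints 5
    }

    // Counts word tokens after adjusting the tokenizer's syntax table.
    public static int countWords(Reader in) {
        int result = 0;
        // try-with-resources closes the reader when counting is done
        try (Reader reader = in) {
            StreamTokenizer st = new StreamTokenizer(reader);

            // Clear the default "string quote" meaning of ' and ".
            st.ordinaryChar('\'');
            st.ordinaryChar('"');
            // Assumption: an apostrophe belongs to the word (Don't, it's).
            st.wordChars('\'', '\'');
            // Assumption: a slash separates words instead of starting a comment.
            st.whitespaceChars('/', '/');

            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    result++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```

Calling ordinaryChar before wordChars clears the quote flag first, so the apostrophe ends up as a plain word character. The same pattern should work for any other character that needs a different role.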