F# Dictionary Binary Search

During a recent assignment, I found the need to create a quick word lookup from an ordered dictionary file. I post the following:

open System
open System.IO
let rec line (s : StreamReader) (sawEol : bool) (acc : string) = 
    match (char.ConvertFromUtf32(s.BaseStream.ReadByte())) with
    | "n" when (not sawEol) -> line s true ""
    | "n" when sawEol ->  acc
    | str -> line s sawEol (acc + str)   
let rec binarySearch (s : StreamReader) (target : string) 
                     (low : System.Int64) (high : System.Int64) =
    let mid = Int64.shift_right (low + high) 1 in
    let pos = s.BaseStream.Seek(mid, SeekOrigin.Begin) in 
    let cur = (line s false "") in
    printf "Current: %sn" cur;
    if (high < low) then
    elif (String.CompareOrdinal(cur, target) > 0) then
        binarySearch s target low (mid - Int64.one)
    elif (String.CompareOrdinal(cur, target) < 0) then
        binarySearch s target (mid + Int64.one) high
let isValidWordFromFile (filename: string) (word : string) =
    let fileInfo = new FileInfo(filename) in
    let size = fileInfo.Length in
    use stream = fileInfo.OpenText() in
    let res = (binarySearch stream word Int64.zero size) in
    not (res = Int64.zero)
let output = isValidWordFromFile "dictionary.txt" "entropies";;
printf "Found: %sn" (any_to_string output);;

Let me describe what is happening. I have found that as I learn to write stuff functionally, I tend to do things from a top down approach but write it bottom up.  The isValidWordFromFile function simply gets everything set up. Note the use keyword in 

use stream = fileInfo.OpenText() in

This is similar in nature to the using statement in C#: It ensures that the stream is disposed of after use. The two functions of interest are the binarySearch and line functions. The rec keyword makes both functions recursive in F#. The binarySearch function in essence keeps "halving" the dictionary file by seeing if the word is either a match, too "large" or too "small". If it is too large, it resumes the "halving" process but on the lower half of the file. If it is too small, it resumes the "halving" process on the upper half of the file.

When I originally started, I attempted to use the Stream.BaseStream.Seek method in conjunction with the Stream.ReadLine() in the binarySearch function. This was not functioning correctly. After calling Seek on the BaseStream, the current file position was set correctly. Calling Seek thereafter yielded no results. In comes the line method. I like this method because it truly shows the power of a functional language. It in essence takes a stream and spits out a complete line. Since it is possible that the file Position is in the middle of a line, it is important to actually retrieve the next complete line. The acc variable is an accumulator for each byte that is going to be the eventual string. The sawEol variable simply tells the function whether or not it saw an end of line character. The match construct takes the byte at the current position and sort of does a C# switch on it. I say sort of because this match construct is a lot more powerful than a simple switch. The first case is when it sees a "n" AND has not seen an end of line character yet. This means that it is not yet time to start accumulating characters. If we see a "n" AND have seen an end of line already, that means we have completely read an entire line. This essentially ends the recursion. If it is some other character, add it to the accumulator variable and keep going.

In summary, I am extremely impressed with the functional programming paradigm. It seems simpler when I'm finished with something useful. The problem I have been encountering is wrapping my imperative mind functionally.

Your email address will not be published. Required fields are marked *