duminică, 30 ianuarie 2011

Implementation of the prefix tree(trie) data structure in C#

Update: all the source code for this article can be accessed from here

A prefix tree is a data structure with many uses. I won't go into details about this data structure Wikipedia does a better job than I could. What I will do is post a C# implementation of this data structure and in a following post I will show how to use this data structure when implementing an auto-complete solution.

These are the properties of a prefix tree:
  • it stores strings(finite sequences of chars)
  • it had nodes :) (doooooooh!)
  • each node stores a character
  • each node can have a number of subnodes(0,1,2, ... infinity)
  • the char stored in a subnode is the one following the char stored by the node in a string
  • each node has a flag which is raised if a word ends there(this doesn't mean it can't have subnodes for example a prefix tree can store the words "cat" and "catastrophy" in this case node 3 stores 't' and has a subnode that stores 'a')

And now the code:
The node:

namespace PrefixTree
{
 /// 
 /// this is the class that represents a node in the prefix tree
 /// the tree stores:
 /// 1. a char
 /// 2. references to it's subnodes
 /// 3. a flag which is raised if any word ends with the char stored in this node
 /// 
 class Node
 {

     /// 
     /// char stored by the node
     /// 
     public char Char;
     /// 
     /// references to it's subnodes
     /// 
     public List Subtree;
     /// 
     /// the flag which is raised if any word ends with the char stored in this node
     /// 
     public bool AWordEndsHere;

     /// 
     /// constructor
     /// 
     /// 

     public Node(char c)
     {
         Char = c;
         Subtree = new List();
     }
     /// 
     /// gets a reference to this node's subnode that stores the specified char
     /// 
     /// 
the specified char
     /// reference to the subnode that stores the specified char
     public Node GetChild(char c)
     {
         if (Subtree.Count != 0)
             foreach (var node in Subtree)
                 if (node.Char == c)
                     return node;
         return null;
     }
 }
}


The tree:

namespace PrefixTree
{

   /// 
   /// this is the main prefix tree class
   /// 
   class Tree
   {
       private List subsequentStrings;

       private Node _root;
       public Node Root
       {
           get
           {
               return _root;
           }
       }

       /// 
       /// Create a prefix tree - the root node should always store ' '
       /// 
       public Tree()
       {
           _root = new Node(' ');
       }

       /// 
       /// Add a word(series of characters) into the prefix tree
       /// 
       /// 

       public void AddWord(string word)
       {
           char[] chars = word.ToCharArray();
           Node currentNode = _root;
           Node child = null;
           //cycle through the chars in the string
           for (int counter = 0; counter < child =" currentNode.GetChild(word[counter]);" child ="="" newnode =" new" currentnode =" newNode;" currentnode =" child;" counter ="="" awordendshere =" true;">
       /// return wether a string can be found in the tree
       /// 
       /// 
a string
       /// true or false
       public bool Find(string query)
       {
           char[] chars = query.ToCharArray();
           Node currentNode = _root;
           Node child = null;
           //search throughout the child nodes of the current node for the current char
           for (int counter = 0; counter < child =" currentNode.GetChild(chars[counter]);" child ="="" currentnode =" child;">


The main program:

Tree prefixTree = new Tree();
          
prefixTree.AddWord("abc");
prefixTree.AddWord("abcd");
prefixTree.AddWord("abcde");
prefixTree.AddWord("abcdef");
prefixTree.AddWord("ab123cd");
prefixTree.AddWord("abc123d");
prefixTree.AddWord("abc132d");
          
string word = "abc";


if (prefixTree.Find(word))
Console.WriteLine("found");
else
Console.WriteLine("not found");

2 comentarii:

  1. The AddWord and Find function seem to be incomplete. I dont understand why they have the XML nodes and the Node class ends with . Am I missing something?

    RăspundețiȘtergere
  2. @Mujahid Shariff yes there seems to be some kind of an error with the code highlighting script
    I'll try and fix it first time I get

    RăspundețiȘtergere