tag:blogger.com,1999:blog-17898782167455438972024-03-13T08:54:02.646-07:00Paraschiv Andrei Technical BlogParaschiv Andreihttp://www.blogger.com/profile/18109912331169645081noreply@blogger.comBlogger26125tag:blogger.com,1999:blog-1789878216745543897.post-21430691836574590862016-06-27T07:17:00.001-07:002016-06-27T07:17:41.713-07:00The Trie: A Neglected Data Structure<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
From the very first days in our lives as <a href="https://www.toptal.com/freelance" style="border: 0px; box-sizing: border-box; color: #3976cb; margin: 0px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">programmers</a>, we’ve all dealt with data structures: Arrays, linked lists, trees, sets, stacks and queues are our everyday companions, and the <a href="https://www.toptal.com/freelance/the-traveling-engineers-survival-guide" style="border: 0px; box-sizing: border-box; color: #3976cb; margin: 0px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">experienced programmer</a> knows when and why to use them. In this article we’ll see how an oft-neglected data structure, the <em style="border: 0px; box-sizing: border-box; margin: 0px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">trie</em>, really shines in application domains with specific features, like word games.</div>
<h2 id="word-games" style="background-color: white; border: 0px; box-sizing: border-box; color: #3863a0; font-family: "Proxima Nova", Arial, sans-serif; line-height: 1.3em; margin: 2em 0px 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
Word Games</h2>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
For starters, let’s consider a simple word puzzle: find all the valid words in a 4x4 letter board, connecting adjacent letters horizontally, vertically, or diagonally. For example, in the following board, we see the letters ‘W’, ‘A’, ‘I’, and ‘T’ connecting to form the word “WAIT”.</div>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
<img alt="simple word puzzle" src="https://assets.toptal.io/uploads/blog/image/103/toptal-blog-2_E.png" style="border: 0px; box-sizing: border-box; display: block; margin: 0px auto 7px; max-width: 100%; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;" /></div>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
The naive solution to finding all valids words would be to explore the board starting from the upper-left corner and then moving depth-first to longer sequences, starting again from the second letter in the first row, and so on. In a 4x4 board, allowing vertical, horizontal and diagonal moves, there are 12029640 sequences, ranging in length from one to sixteen characters.</div>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
Now, our goal is to find the best data structure to implement this valid-word checker, i.e., our vocabulary. A few points to keep in mind:</div>
<ul style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 20px; list-style: none; margin: 0px 0px 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
<li style="border: 0px; box-sizing: border-box; line-height: 1.5em; list-style-type: disc; margin: 0px 0px 0.75em 30px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">We only need a single copy of each word, i.e., our vocabulary is a set, from a logical point of view.</li>
<li style="border: 0px; box-sizing: border-box; line-height: 1.5em; list-style-type: disc; margin: 0px 0px 0.75em 30px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">We need to answer the following questions for any given word:<ul style="border: 0px; box-sizing: border-box; font-size: 1em; list-style: none; margin: 1em 0px 0px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
<li style="border: 0px; box-sizing: border-box; line-height: 1.5em; list-style-type: circle; margin: 0px 0px 0.75em 30px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">Does the current character sequence comprise a valid word?</li>
<li style="border: 0px; box-sizing: border-box; line-height: 1.5em; list-style-type: circle; margin: 0px 0px 0px 30px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">Are there longer words that begin with this sequence? If not, we can abandon our depth-first exploration, as going deeper will not yield any valid words.</li>
</ul>
</li>
</ul>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
To illustrate the second point, consider the following board: There’s no point in exploring subsequent moves, since there are no words in the dictionary that start with “ASF”.</div>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
<img alt="nothing starts with asf" src="https://assets.toptal.io/uploads/blog/image/104/toptal-blog-2_D.png" style="border: 0px; box-sizing: border-box; display: block; margin: 0px auto 7px; max-width: 100%; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;" /></div>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
We’d love our data structure to answer these questions as quickly as possible. ~O(1) access time (for checking a sequence) would be ideal!</div>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
We can define the Vocabulary interface like this (see <a href="https://github.com/acbellini/juzzle/blob/master/TrieDictionary/src/net/webeggs/juzzle/Vocabulary.java" rel="noopener noreferrer" style="border: 0px; box-sizing: border-box; color: #3976cb; margin: 0px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;" target="_blank">here</a> for the GitHub repo):</div>
<pre lang="java" style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-size: 15px; line-height: 20px; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;"><code style="background: rgb(255, 255, 252); border-radius: 3px; border: 1px solid rgb(238, 238, 238); box-sizing: border-box; display: inline-block; font-size: 0.9em; line-height: 1.5em; margin: 0px; max-width: 100%; min-height: 0px; min-width: 0px; overflow-x: auto; padding: 2px 5px; vertical-align: text-bottom; width: 864px;">public interface Vocabulary {
boolean add(String word);
boolean isPrefix(String prefix);
boolean contains(String word);
}
</code></pre>
<h2 id="candidate-data-structures" style="background-color: white; border: 0px; box-sizing: border-box; color: #3863a0; font-family: "Proxima Nova", Arial, sans-serif; line-height: 1.3em; margin: 2em 0px 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
Candidate Data Structures</h2>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
Implementing the <code style="background: rgb(255, 255, 252); border-radius: 3px; border: 1px solid rgb(238, 238, 238); box-sizing: border-box; display: inline-block; font-size: 0.8em; line-height: 1em; margin: 0px; max-width: 100%; min-height: 0px; min-width: 0px; padding: 2px 5px; vertical-align: text-bottom;">contains()</code> method requires a backing data structure that lets you find elements efficiently, while the <code style="background: rgb(255, 255, 252); border-radius: 3px; border: 1px solid rgb(238, 238, 238); box-sizing: border-box; display: inline-block; font-size: 0.8em; line-height: 1em; margin: 0px; max-width: 100%; min-height: 0px; min-width: 0px; padding: 2px 5px; vertical-align: text-bottom;">isPrefix()</code> method requires us to find the “next greater element”, i.e. we need to keep the vocabulary sorted in some way.</div>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
We can easily exclude hash-based sets from our list of candidates: while such a structure would give us constant-time checking for <code style="background: rgb(255, 255, 252); border-radius: 3px; border: 1px solid rgb(238, 238, 238); box-sizing: border-box; display: inline-block; font-size: 0.8em; line-height: 1em; margin: 0px; max-width: 100%; min-height: 0px; min-width: 0px; padding: 2px 5px; vertical-align: text-bottom;">contains()</code>, it would perform quite poorly on <code style="background: rgb(255, 255, 252); border-radius: 3px; border: 1px solid rgb(238, 238, 238); box-sizing: border-box; display: inline-block; font-size: 0.8em; line-height: 1em; margin: 0px; max-width: 100%; min-height: 0px; min-width: 0px; padding: 2px 5px; vertical-align: text-bottom;">isPrefix()</code>, in the worst case requiring that we scan the whole set.</div>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
For quite the opposite reason, we can also exclude sorted linked-lists, as they require scanning the list at least up to the first element that is greater than or equal to the searched word or prefix.</div>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
Two valid options are using a sorted array-backed list or a binary tree.<br style="box-sizing: border-box;" />On the sorted array-backed list we can use binary search to find the current sequence if present or the next greater element at a cost of <em style="border: 0px; box-sizing: border-box; margin: 0px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">O(log2(n))</em>, where <em style="border: 0px; box-sizing: border-box; margin: 0px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">n</em> is the number of words in the dictionary.</div>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
We can implement an array-backed vocabulary that always maintains ordering of like <a href="https://github.com/acbellini/juzzle/blob/master/TrieDictionary/src/net/webeggs/juzzle/vocabulary/ListVocabulary.java" rel="noopener noreferrer" style="border: 0px; box-sizing: border-box; color: #3976cb; margin: 0px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;" target="_blank">this</a>, using standard <code style="background: rgb(255, 255, 252); border-radius: 3px; border: 1px solid rgb(238, 238, 238); box-sizing: border-box; display: inline-block; font-size: 0.8em; line-height: 1em; margin: 0px; max-width: 100%; min-height: 0px; min-width: 0px; padding: 2px 5px; vertical-align: text-bottom;">java.util.ArrayList</code> and <code style="background: rgb(255, 255, 252); border-radius: 3px; border: 1px solid rgb(238, 238, 238); box-sizing: border-box; display: inline-block; font-size: 0.8em; line-height: 1em; margin: 0px; max-width: 100%; min-height: 0px; min-width: 0px; padding: 2px 5px; vertical-align: text-bottom;">java.util.Collections.binarySeach</code>:</div>
<pre lang="java" style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-size: 15px; line-height: 20px; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;"><code style="background: rgb(255, 255, 252); border-radius: 3px; border: 1px solid rgb(238, 238, 238); box-sizing: border-box; display: inline-block; font-size: 0.9em; line-height: 1.5em; margin: 0px; max-width: 100%; min-height: 0px; min-width: 0px; overflow-x: auto; padding: 2px 5px; vertical-align: text-bottom; width: 864px;">public class ListVocabulary implements Vocabulary {
private List<String> words = new ArrayList<String>();
/**
* Constructor that adds all the words and then sorts the underlying list
*/
public ListVocabulary(Collection<String> words) {
this.words.addAll(words);
Collections.sort(this.words);
}
public boolean add(String word) {
int pos = Collections.binarySearch(words, word);
// pos > 0 means the word is already in the list. Insert only
// if it's not there yet
if (pos < 0) {
words.add(-(pos+1), word);
return true;
}
return false;
}
public boolean isPrefix(String prefix) {
int pos = Collections.binarySearch(words, prefix) ;
if (pos >= 0) {
// The prefix is a word. Check the following word, because we are looking
// for words that are longer than the prefix
if (pos +1 < words.size()) {
String nextWord = words.get(pos+1);
return nextWord.startsWith(prefix);
}
return false;
}
pos = -(pos+1);
// The prefix is not a word. Check where it would be inserted and get the next word.
// If it starts with prefix, return true.
if (pos == words.size()) {
return false;
}
String nextWord = words.get(pos);
return nextWord.startsWith(prefix);
}
public boolean contains(String word) {
int pos = Collections.binarySearch(words, word);
return pos >= 0;
}
}
</code></pre>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
If we decided to use a binary tree, the implementation could be even shorter and more elegant (again, here’s a <a href="https://github.com/acbellini/juzzle/blob/master/TrieDictionary/src/net/webeggs/juzzle/vocabulary/TreeVocabulary.java" rel="noopener noreferrer" style="border: 0px; box-sizing: border-box; color: #3976cb; margin: 0px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;" target="_blank">link to the code</a>):</div>
<pre lang="java" style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-size: 15px; line-height: 20px; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;"><code style="background: rgb(255, 255, 252); border-radius: 3px; border: 1px solid rgb(238, 238, 238); box-sizing: border-box; display: inline-block; font-size: 0.9em; line-height: 1.5em; margin: 0px; max-width: 100%; min-height: 0px; min-width: 0px; overflow-x: auto; padding: 2px 5px; vertical-align: text-bottom; width: 864px;">public class TreeVocabulary extends TreeSet<String> implements Vocabulary {
public TreeVocabulary(Collection<String> c) {
super(c);
}
public boolean isPrefix(String prefix) {
String nextWord = ceiling(prefix);
if (nextWord == null) {
return false;
}
if (nextWord.equals(prefix)) {
Set<String> tail = tailSet(nextWord, false);
if (tail.isEmpty()) {
return false;
}
nextWord = tail.iterator().next();
}
return nextWord.startsWith(prefix);
}
/**
* There is a mismatch between the parameter types of vocabulary and TreeSet, so
* force call to the upper-class method
*/
public boolean contains(String word) {
return super.contains(word);
}
}
</code></pre>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
In both cases, we can expect <em style="border: 0px; box-sizing: border-box; margin: 0px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">O(log n)</em> performance for each access method (<code style="background: rgb(255, 255, 252); border-radius: 3px; border: 1px solid rgb(238, 238, 238); box-sizing: border-box; display: inline-block; font-size: 0.8em; line-height: 1em; margin: 0px; max-width: 100%; min-height: 0px; min-width: 0px; padding: 2px 5px; vertical-align: text-bottom;">contains()</code> and <code style="background: rgb(255, 255, 252); border-radius: 3px; border: 1px solid rgb(238, 238, 238); box-sizing: border-box; display: inline-block; font-size: 0.8em; line-height: 1em; margin: 0px; max-width: 100%; min-height: 0px; min-width: 0px; padding: 2px 5px; vertical-align: text-bottom;">isPrefix()</code>). As for space requirements, both the array-backed implementation and the tree-backed implementation require<em style="border: 0px; box-sizing: border-box; margin: 0px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">O(n+M)</em> where n is the number of words in the dictionary and M is the bytesize of the dictionary, i.e. the sum of the length of the strings in the dictionary.</div>
<h2 id="enter-the-trie" style="background-color: white; border: 0px; box-sizing: border-box; color: #3863a0; font-family: "Proxima Nova", Arial, sans-serif; line-height: 1.3em; margin: 2em 0px 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
Enter, the Trie</h2>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
Logarithmic performance and linear memory isn’t bad. But there are a few more characteristics of our application domain that can lead us to better performance:</div>
<ul style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 20px; list-style: none; margin: 0px 0px 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
<li style="border: 0px; box-sizing: border-box; line-height: 1.5em; list-style-type: disc; margin: 0px 0px 0.75em 30px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">We can safely assume that all words are lowercase.</li>
<li style="border: 0px; box-sizing: border-box; line-height: 1.5em; list-style-type: disc; margin: 0px 0px 0.75em 30px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">We accept only a-z letters—no punctuation, no hyphens, no accents, etc.</li>
<li style="border: 0px; box-sizing: border-box; line-height: 1.5em; list-style-type: disc; margin: 0px 0px 0.75em 30px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">The dictionary contains many inflected forms: plurals, conjugated verbs, composite words (e.g., house –> housekeeper). Therefore, <span style="border: 0px; box-sizing: border-box; font-weight: 600; margin: 0px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">many words share the same stem</span>.</li>
<li style="border: 0px; box-sizing: border-box; line-height: 1.5em; list-style-type: disc; margin: 0px 0px 0.75em 30px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">Words have a limited length. For example, if we are working on a 4x4 board, all words longer than 16 chars can be discarded.</li>
</ul>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
This is where the trie (pronounced “try”) comes in. But what exactly <em style="border: 0px; box-sizing: border-box; margin: 0px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">is</em> a trie? Tries are neglected data structures, found in books but rarely in standard libraries.</div>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
For motivation, let’s first consider Computer Science’s poster child: the binary tree. Now, when we analyze the performance of a binary tree and say operation <em style="border: 0px; box-sizing: border-box; margin: 0px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">x</em> is <em style="border: 0px; box-sizing: border-box; margin: 0px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">O(log(n))</em>, we’re constantly talking log base 2. But what if, instead of a binary tree, we used a ternary tree, where every node has three children (or, a fan-out of three). Then, we’d be talking log base 3. (That’s a performance improvement, albeit only by a constant factor.) Essentially, our trees would become wider but shorter, and we could perform fewer lookups as we don’t need to descend quite so deep.</div>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
Taking things a step further, what if we had a tree with fan-out equal to the number of possible values of our datatype?</div>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
This is the motivation behind the trie. And as you may have guessed, a trie is indeed a tree!</div>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
But, contrary to most binary-trees that you’d use for sorting strings, those that would store entire words in their nodes, each node of a trie holds a single character (and not even that, as we’ll see soon) and has a maximum fan-out equal to the length of the alphabet. In our case, the length of the alphabet is 26; therefore the nodes of the trie have a maximum fan-out of 26. And, while a balanced binary tree has <em style="border: 0px; box-sizing: border-box; margin: 0px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">log2(n)</em> depth, the maximum depth of the trie is equal to the maximum length of a word! (Again, wider but shorter.)</div>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
Within a trie, words with the same stem (prefix) share the memory area that corresponds to the stem.</div>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
To visualize the difference, let’s consider a small dictionary made of five words. Assume that the Greek letters indicate pointers, and note that in the trie, <span style="border: 0px; box-sizing: border-box; font-weight: 600; margin: 0px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">red characters indicate nodes holding valid words</span>.</div>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
<img alt="visualizing the trie" src="https://assets.toptal.io/uploads/blog/image/106/toptal-blog-3_F.png" style="border: 0px; box-sizing: border-box; display: block; margin: 0px auto 7px; max-width: 100%; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;" /></div>
<h2 id="the-implementation" style="background-color: white; border: 0px; box-sizing: border-box; color: #3863a0; font-family: "Proxima Nova", Arial, sans-serif; line-height: 1.3em; margin: 2em 0px 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
The Implementation</h2>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
As we know, in the tree the pointers to the children elements are usually implemented with a left and right variable, because the maximum fan-out is fixed at two.</div>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
In a trie indexing an alphabet of 26 letters, each node has 26 possible children and, therefore, 26 possible pointers. Each node thus features an array of 26 (pointers to) sub-trees, where each value could either be null (if there is no such child) or another node.</div>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
How, then, do we look-up a word in a trie? Here is the method that, given a <code style="background: rgb(255, 255, 252); border-radius: 3px; border: 1px solid rgb(238, 238, 238); box-sizing: border-box; display: inline-block; font-size: 0.8em; line-height: 1em; margin: 0px; max-width: 100%; min-height: 0px; min-width: 0px; padding: 2px 5px; vertical-align: text-bottom;">String s</code>, will identify the node that corresponds to the last letter of the word, if it exists in the tree:</div>
<pre lang="java" style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-size: 15px; line-height: 20px; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;"><code style="background: rgb(255, 255, 252); border-radius: 3px; border: 1px solid rgb(238, 238, 238); box-sizing: border-box; display: inline-block; font-size: 0.9em; line-height: 1.5em; margin: 0px; max-width: 100%; min-height: 0px; min-width: 0px; overflow-x: auto; padding: 2px 5px; vertical-align: text-bottom; width: 864px;">public LowercaseTrieVocabulary getNode(String s) {
LowercaseTrieVocabulary node = this;
for (int i = 0; i < s.length(); i++) {
int index = LOWERCASE.getIndex(s.charAt(i));
LowercaseTrieVocabulary child = node.children[index];
if (child == null) {
// There is no such word
return null;
}
node = child;
}
return node;
}
</code></pre>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
The <code style="background: rgb(255, 255, 252); border-radius: 3px; border: 1px solid rgb(238, 238, 238); box-sizing: border-box; display: inline-block; font-size: 0.8em; line-height: 1em; margin: 0px; max-width: 100%; min-height: 0px; min-width: 0px; padding: 2px 5px; vertical-align: text-bottom;">LOWERCASE.getIndex(s.charAt(i))</code> method simply returns the position of the <em style="border: 0px; box-sizing: border-box; margin: 0px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">ith</em> character in the alphabet. On the returned node, a Boolean property <code style="background: rgb(255, 255, 252); border-radius: 3px; border: 1px solid rgb(238, 238, 238); box-sizing: border-box; display: inline-block; font-size: 0.8em; line-height: 1em; margin: 0px; max-width: 100%; min-height: 0px; min-width: 0px; padding: 2px 5px; vertical-align: text-bottom;">node</code> indicates that the node corresponds to the last letter of a word, i.e. a letter marked in red in the previous example. Since each node keeps a counter of the number of children, if this counter is positive then there are longer words in the dictionary that have the current string as a prefix. <em style="border: 0px; box-sizing: border-box; margin: 0px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">Note: the node does not really need to keep a reference to the character that it corresponds to, because it’s implicit in its position in the trie.</em></div>
<div class="embeddable_form-wrapper for-post" data-role="blog_subscribe" data-view="blog_subscribe#form" style="background-color: white; border-radius: 6px; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 15px; line-height: 20px; margin: 0px; min-height: 0px; min-width: 0px; padding: 30px 0px; vertical-align: baseline;">
<form action="https://www.toptal.com/blog/subscription" class="embeddable_form" data-entity="blog_subscription" data-remote="" data-view="form#form" method="post" style="border: 0px; box-sizing: border-box; margin: 0px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
<div class="embeddable_form-step is-confirmation" data-place="@blog_subscribe" data-role="confirmation" style="border: 0px; box-sizing: border-box; height: 0px; margin: 0px; min-height: 0px; min-width: 0px; overflow: hidden; padding: 0px; vertical-align: baseline;">
<div class="embeddable_form-row is-label is-success" style="border: 0px; box-sizing: border-box; margin: 0px 0px 10px; min-height: 0px; min-width: 0px; padding: 0px; position: relative; vertical-align: baseline; width: 864px;">
<div class="embeddable_form-label_title" style="border: 0px; box-sizing: border-box; color: #2557a1; display: inline-block; font-size: 17px; font-weight: 700; line-height: 23px; margin: 0px 10px 0px 0px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
</div>
<div class="embeddable_form-label" style="border: 0px; box-sizing: border-box; color: #313131; display: inline-block; font-size: 17px; font-style: italic; line-height: 19px; margin: 0px 10px 0px 0px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
</div>
</div>
<div class="embeddable_form-row is-success" style="border: 0px; box-sizing: border-box; margin: 0px 0px 10px; min-height: 0px; min-width: 0px; padding: 0px; position: relative; vertical-align: baseline; width: 864px;">
<div class="embeddable_form-label is-header" style="border: 0px; box-sizing: border-box; color: #313131; display: inline-block; font-size: 17px; font-style: italic; line-height: 19px; margin: 0px 10px 0px 0px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
</div>
<div class="embeddable_form-label" style="border: 0px; box-sizing: border-box; color: #313131; display: inline-block; font-size: 17px; font-style: italic; line-height: 19px; margin: 0px 10px 0px 0px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
<a data-role="preferences_link" href="https://www.toptal.com/java/the-trie-a-neglected-data-structure#" style="border: 0px; box-sizing: border-box; color: #3976cb; display: inline; margin: 0px; min-height: 0px; min-width: 0px; padding: 0px; text-decoration: none; transition: color 150ms, transform, text-shadow, -webkit-transform; vertical-align: baseline;"></a></div>
</div>
<div class="embeddable_form-row is-success" style="border: 0px; box-sizing: border-box; margin: 0px; min-height: 0px; min-width: 0px; padding: 0px; position: relative; vertical-align: baseline; width: 864px;">
<ul class="social_share is-horizontal is-loaded" data-rss-url="https://www.toptal.com/developers/blog.rss" data-twitter-username="@toptalllc" data-url="https://www.toptal.com/developers/blog" data-view="layout#social_share" data-youtube-channel-url="https://www.youtube.com/channel/UCNqm_euTHZz3o5OnKhUS-oA" style="-webkit-box-direction: normal; -webkit-box-orient: horizontal; -webkit-box-pack: start; border: 0px; box-sizing: border-box; display: flex; flex-direction: row; font-size: 1.2em; justify-content: flex-start; list-style: none; margin: 0px; max-width: 300px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
<li class="social_share-item is-counter" style="background-color: #3863a0; border-bottom-color: rgb(236, 236, 236); border-bottom-style: solid; border-left-color: rgb(236, 236, 236); border-left-style: solid; border-radius: 4px 0px 0px 4px; border-right-color: rgb(56, 99, 160) !important; border-top-color: rgb(236, 236, 236); border-top-style: solid; border-width: 1px 0px 1px 1px; box-sizing: border-box; color: white; flex-shrink: 0; height: 50px; line-height: 15px; list-style-type: disc; margin-bottom: 0.75em; margin-left: 30px; margin-right: 0px !important; margin-top: 0px !important; min-height: 0px; min-width: 0px; padding: 10px 0px; position: relative; text-align: center; text-shadow: rgba(0, 0, 0, 0.498039) 0px 1px 2px; transition: box-shadow 0.2s; vertical-align: baseline; width: 50px;" title="Total number of shares"><span class="social_share-item_num" data-role="counter_num" style="border: 0px; box-sizing: border-box; display: block; font-size: 14px; margin: 0px; min-height: 0px; min-width: 0px; opacity: 1; padding: 0px; transition: opacity 0.3s; vertical-align: baseline;"></span><span class="social_share-item_text" data-role="counter_text" style="border: 0px; box-sizing: border-box; display: block; font-size: 9px; margin: 0px; min-height: 0px; min-width: 0px; opacity: 1; padding: 0px; text-transform: uppercase; transition: opacity 0.3s; vertical-align: baseline;"></span></li>
<li class="social_share-item" style="border-bottom-color: rgb(236, 236, 236); border-bottom-style: solid; border-left-color: rgb(236, 236, 236); border-left-style: solid; border-top-color: rgb(236, 236, 236); border-top-style: solid; border-width: 1px 0px 1px 1px; box-sizing: border-box; flex-shrink: 0; height: 50px; line-height: 1.5em; list-style-type: disc; margin-bottom: 0.75em; margin-left: 30px; margin-right: 0px !important; margin-top: 0px !important; min-height: 0px; min-width: 0px; padding: 0px; text-shadow: rgba(0, 0, 0, 0.498039) 0px 1px 2px; transition: box-shadow 0.2s; vertical-align: baseline; width: 50px;"><a class="social_share-item_link" data-role="link" data-type="facebook" href="https://www.blogger.com/null" style="border: 0px; box-sizing: border-box; color: #3976cb; cursor: pointer; margin: 0px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;" title="Share on Facebook"><img class="social_share-item_image" height="50" src="https://assets.toptal.io/assets/front/static/public/primitives/social/share_bar/facebook_dc66c9.png" style="border: 0px; box-sizing: border-box; display: block; height: 50px; margin: 0px auto 7px; max-width: 100%; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline; width: 50px;" width="50" /></a></li>
<li class="social_share-item" style="border-bottom-color: rgb(236, 236, 236); border-bottom-style: solid; border-left-color: rgb(236, 236, 236); border-left-style: solid; border-top-color: rgb(236, 236, 236); border-top-style: solid; border-width: 1px 0px 1px 1px; box-sizing: border-box; flex-shrink: 0; height: 50px; line-height: 1.5em; list-style-type: disc; margin-bottom: 0.75em; margin-left: 30px; margin-right: 0px !important; margin-top: 0px !important; min-height: 0px; min-width: 0px; padding: 0px; text-shadow: rgba(0, 0, 0, 0.498039) 0px 1px 2px; transition: box-shadow 0.2s; vertical-align: baseline; width: 50px;"><a class="social_share-item_link" data-role="link" data-type="google_plus" href="https://www.blogger.com/null" style="border: 0px; box-sizing: border-box; color: #3976cb; cursor: pointer; margin: 0px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;" title="Share on Google Plus"><img class="social_share-item_image" height="50" src="https://assets.toptal.io/assets/front/static/public/primitives/social/share_bar/google_plus_355fb0.png" style="border: 0px; box-sizing: border-box; display: block; height: 50px; margin: 0px auto 7px; max-width: 100%; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline; width: 50px;" width="50" /></a></li>
<li class="social_share-item" style="border-radius: 0px 4px 4px 0px; border: 1px solid rgb(236, 236, 236); box-sizing: border-box; flex-shrink: 0; height: 50px; line-height: 1.5em; list-style-type: disc; margin-bottom: 0px; margin-left: 30px; margin-right: 0px !important; margin-top: 0px !important; min-height: 0px; min-width: 0px; padding: 0px; text-shadow: rgba(0, 0, 0, 0.498039) 0px 1px 2px; transition: box-shadow 0.2s; vertical-align: baseline; width: 50px;"><a class="social_share-item_link" data-role="link" data-type="twitter_follow" href="https://www.blogger.com/null" style="border: 0px; box-sizing: border-box; color: #3976cb; cursor: pointer; margin: 0px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;" title="Follow Toptal on Twitter"><img class="social_share-item_image" height="50" src="https://assets.toptal.io/assets/front/static/public/primitives/social/share_bar/twitter_83c6d4.png" style="border: 0px; box-sizing: border-box; display: block; height: 50px; margin: 0px auto 7px; max-width: 100%; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline; width: 50px;" width="50" /></a></li>
</ul>
</div>
</div>
</form>
</div>
<h2 id="analyzing-performance" style="background-color: white; border: 0px; box-sizing: border-box; color: #3863a0; font-family: "Proxima Nova", Arial, sans-serif; line-height: 1.3em; margin: 2em 0px 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
Analyzing Performance</h2>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
What makes the trie really perform well in these situations is that the cost of looking up a word or prefix is fixed and dependent only on the number of characters in the word and not on the size of the vocabulary.</div>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
In our specific domain, since we have strings that are at most 16 characters, exactly 16 steps are necessary to find a word that is in the vocabulary, while any negative answer, i.e. the word or prefix is not in the trie, can be obtained in at most 16 steps as well! Considering that we have previously ignored the length of the string when calculating running time complexity for both the array-backed sorted list and the tree, which is hidden in the string comparisons, we can as well ignore it here and safely state that lookup is done in <em style="border: 0px; box-sizing: border-box; margin: 0px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">O(1)</em> time.</div>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
Considering space requirements (and remembering that we have indicated with <em style="border: 0px; box-sizing: border-box; margin: 0px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">M</em> the bytesize of the dictionary), the trie could have <em style="border: 0px; box-sizing: border-box; margin: 0px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">M</em> nodes in the worst case, if no two strings shared a prefix. But since we have observed that there is high degree of redundancy in the dictionary, there is a lot of compression to be done. The English dictionary that is used in the example code is 935,017 bytes and requires 250,264 nodes, with a compression ratio of about 73%.</div>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
However, despite the compression, a trie will usually require more memory than a tree or array. This is because, for each node, at least 26 x <code style="background: rgb(255, 255, 252); border-radius: 3px; border: 1px solid rgb(238, 238, 238); box-sizing: border-box; display: inline-block; font-size: 0.8em; line-height: 1em; margin: 0px; max-width: 100%; min-height: 0px; min-width: 0px; padding: 2px 5px; vertical-align: text-bottom;">sizeof(pointer)</code> bytes are necessary, plus some overhead for the object and additional attributes. On a 64-bit machine, each node requires more than 200 bytes, whereas a string character requires a single byte, or two if we consider UTF strings.</div>
<h2 id="performance-tests" style="background-color: white; border: 0px; box-sizing: border-box; color: #3863a0; font-family: "Proxima Nova", Arial, sans-serif; line-height: 1.3em; margin: 2em 0px 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
Performance Tests</h2>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
So, what about performance? The vocabulary implementations were tested in two different situations: checking for 20,000,000 random strings and finding all the words in 15,000 boards randomly generated from the same word list.</div>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
Four data structures were analyzed: an array-backed sorted list, a binary tree, the trie described above, and a trie using arrays of bytes corresponding to the alphabet-index of the characters themselves (a minor and easily implemented performance optimization). Here are the results, in ms:</div>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
<img alt="performance results" src="https://assets.toptal.io/uploads/blog/image/101/toptal-blog-1_A.png" style="border: 0px; box-sizing: border-box; display: block; margin: 0px auto 7px; max-width: 100%; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;" /></div>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
The average number of moves made to solve the board is 2,188. For each move, a word lookup and a prefix lookup are done, i.e., for checking all the boards, more than 32M word lookups and 32M prefix lookups were performed. <em style="border: 0px; box-sizing: border-box; margin: 0px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">Note: these could be done in a single step, I kept them separated for clarity in the exposition. Compacting them in a single step would cut the time for solving the boards almost in half, and would probably favour the trie even more.</em></div>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
As can be seen above, the word lookup perform better with the trie even when using strings, and is even faster when using alphabet indexes, with the latter performing more than twice as fast as a standard binary tree. The difference in solving the boards is even more evident, with the fast trie-alphabet-index solution being more than four times as fast as the list and the tree.</div>
<h2 id="conclusions" style="background-color: white; border: 0px; box-sizing: border-box; color: #3863a0; font-family: "Proxima Nova", Arial, sans-serif; line-height: 1.3em; margin: 2em 0px 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
Conclusions</h2>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
The trie is a very specialized data structure that requires much more memory than trees and lists. However, when specific domain characteristics apply, like a limited alphabet and high redundancy in the first part of the strings, it can be very effective in addressing performance optimization.</div>
<h2 id="references" style="background-color: white; border: 0px; box-sizing: border-box; color: #3863a0; font-family: "Proxima Nova", Arial, sans-serif; line-height: 1.3em; margin: 2em 0px 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
References</h2>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
An extensive explanation of tries and alphabets can be found in chapter 5 of Robert Sedgewick’s book “Algorithms, 4th edition”. The companion website at Princeton has the <a href="http://algs4.cs.princeton.edu/code/" rel="noopener noreferrer" style="border: 0px; box-sizing: border-box; color: #3976cb; margin: 0px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;" target="_blank">code for an implementation</a> of Alphabet and TrieST that is more extensive than my example.</div>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
Description of the trie and implementations for various languages can also be found on <a href="https://en.wikipedia.org/wiki/Trie" rel="noopener noreferrer" style="border: 0px; box-sizing: border-box; color: #3976cb; margin: 0px; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;" target="_blank">Wikipedia</a>.</div>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #303030; font-family: "Proxima Nova", Arial, sans-serif; font-size: 1.2em; line-height: 1.5em; margin-bottom: 1em; min-height: 0px; min-width: 0px; padding: 0px; vertical-align: baseline;">
This article was written by <a href="https://www.toptal.com/resume/anna-chiara-bellini" style="background-color: #436ba8; border: 0px; box-sizing: border-box; color: white; font-size: 19px; font-weight: 600; line-height: 20px; margin: 0px; min-height: 0px; min-width: 0px; padding: 0px; text-align: center; text-decoration: none; vertical-align: baseline;">Anna Chiara Bellini</a>, a <a href="https://www.toptal.com/java/the-trie-a-neglected-data-structure">Toptal</a><a href="https://www.toptal.com/java"> Java developer</a>.</div>
Anonymousnoreply@blogger.com0tag:blogger.com,1999:blog-1789878216745543897.post-65001211922186432342015-02-01T13:08:00.000-08:002015-02-01T13:08:01.889-08:00Choosing a license for your code on GithubI added licensing information to all my open source code on Github and Sourceforge after a request from someone who wanted to use parts of it.<br/>
Turns out if someone wants to use your code those licences that you keep seeing everywhere are pretty important. Even if it is just a small, apparently unimportant bit of code.<br/>
The person in question also provided me with a <a href="http://choosealicense.com/">link</a> that explains the differences between the various licences.<br/>
After some consideration decided to go for the MIT license.<br/>
Why MIT?<br/>
Because and I quote <blockquote> It lets people do anything they want with your code as long as they provide attribution back to you and don’t hold you liable</blockquote>. Sounds good to me at this moment in time.<br/>
Found <a href="http://choosealicense.com/">Choosealicense.com</a> to be a very interesting website.Paraschiv Andreihttp://www.blogger.com/profile/18109912331169645081noreply@blogger.com0tag:blogger.com,1999:blog-1789878216745543897.post-72394946952186534732013-03-11T13:45:00.001-07:002013-03-11T13:46:34.834-07:00Contribuited to my second open source projectI was looking for an application that could migrate issues from one of my Bitbucket repositories to one of my Github repositories.<br/>
I couldn't find anything already written but I found something close <a href="https://github.com/tekstop/Git2BitIssues">Git2BitIssues</a> by <a href="https://github.com/tekstop">tekstop</a>. His application basically did the opposite of what I wanted. Moved things from Git to Bit not from Bit to Git. I built on it and achieved the desired result.
You can check out the code if you need something similar done.<br/>
This has been my first experience using collaborative coding on GitHub and I was amazed by how smoothly everything went.
The idea behind improving on someone else's code is that you first <a href="https://help.github.com/articles/fork-a-repo">fork</a> what he did. Basically you get a copy of his repository on your own account. You push your changes to this account and then you ask the author if he is ok with your changes via a <a href="https://help.github.com/articles/using-pull-requests">pull request</a>. It is all really easy and kind of fun.<br/>
Anyway now you can move your issues from one system to another with this application if you need to.<br/>
Cheers!<br/>Paraschiv Andreihttp://www.blogger.com/profile/18109912331169645081noreply@blogger.com0tag:blogger.com,1999:blog-1789878216745543897.post-88874967635560265682012-10-20T10:36:00.001-07:002012-10-20T10:36:55.259-07:00My first open source project - Seringa: The SQLi FrameworkI publically launched my first open source project today. It's hosted at github.<br/>
<a href="https://github.com/paratechnical/Seringa">https://github.com/paratechnical/Seringa</a><br/>
A short description copied from the Wiki:<br/>
Seringa(Romanian for seringe) is an SQL injection framework featuring high customizability and a user-friendly interface. It is completely open source. It uses the .NET 4.0 framework and Windows Presentation Foundation(WPF) for the GUI. With regard to design it utilizes the Strategy Pattern to distinguish between various SQLi strategies whilst storing other relevant data such as exploits, payloads and patterns in xml files so that the framework can be easily customized from the outside(a manifestation of the Open-Closed Principle).<br/>
Seringa allows you to:<br/>
<ul>
<li>scan Google search results given a search string</li>
<li>test search results for SQLi vulnerability</li>
<li>test a single url for vulnerability</li>
<li>extract a database structure(databases,tables,columns) in a tree form</li>
<li>execute given payloads and receive results(some predefined queries include current database name, current database user, current database version etc)</li>
<li>save your penetration testing process to a file(mapping file) and load it later</li>
<li>use a proxy(regular or socks) when testing</li>
</ul>
<br/>
Everyone is welcomed to contribute.
Paraschiv Andreihttp://www.blogger.com/profile/18109912331169645081noreply@blogger.com3tag:blogger.com,1999:blog-1789878216745543897.post-70059362618558328032012-10-20T10:31:00.001-07:002012-10-20T10:31:35.522-07:00Contribuited to my first open source project - jQGridI had some problems with a jqGrid that contained a table within it. Turned out there was a bug in jQGrid. I fixed the problem locally and I thought others might be having the same problem as well so I tried to commit my changes to the jQGrid source repository. That didn't exactly go as planned but I managed to push the code throgh eventually. <a href="http://www.trirand.com/blog/?page_id=393/bugs/html-td-bug/">Check it out</a>.<br/>
Anyway <a href="http://www.trirand.com/blog/">jqGrid</a> is great. I really recommend it.Paraschiv Andreihttp://www.blogger.com/profile/18109912331169645081noreply@blogger.com0tag:blogger.com,1999:blog-1789878216745543897.post-53873591789862944682012-01-30T00:43:00.000-08:002015-02-01T13:25:05.783-08:00JMultiLocGMap - joomla module that I createdI created a new <a href="http://www.joomla.org/">Joomla </a>module.<br />It's purpose is to display a Google map with multiple locations on it.<div>Locations can be grouped into categories.</div><div>All configuration is done through an xml inside it.</div><div>The module was born out of necessity. I needed it for a website I was working on for my father.</div><div>The requirement was to have a select box and a Google map and upon changing the selected item in the selection box a new set of Google markers would appear on the map.</div><div>For example the selection box could have "schools" and "kintergardens" and when modifying the selection to "schools" the schools on the map would be displayed and when choosing "kintergardens" the kindergartens on the map would be displayed.</div><div>The xml handles all configuration for this.</div><div>This is the first version(0.0.1). The project is still in it's infancy of course.</div><div>Hope it helps someone. Everything is open-source.</div><div>The project is hosted <a href="https://github.com/paratechnical/JMultiLocGMap">here</a>.</div>Paraschiv Andreihttp://www.blogger.com/profile/18109912331169645081noreply@blogger.com1tag:blogger.com,1999:blog-1789878216745543897.post-52953022704294240502011-03-05T12:48:00.000-08:002012-06-04T08:43:36.518-07:00Merging 2 word 2007(open xml) documentsI had to merge 2 word documents once.<br />
<div>
They were both in word 2007(open xml) format.</div>
<div>
I searched online everywhere for a way to merge them. </div>
<div>
All the links were about using the altChunk element. I have a problem with that. The entire document gets put in there and that is a lot of unneeded data(like header and footer etc.).</div>
<div>
<br /></div>
<div>
So I did it another way:</div>
<div>
1. cloned the second document's body</div>
<div>
2. cloned all it's children and added them to the first document's main part</div>
<br />
<br />
<script src="https://gist.github.com/2869131.js?file=MergeDocs"></script>Paraschiv Andreihttp://www.blogger.com/profile/18109912331169645081noreply@blogger.com2tag:blogger.com,1999:blog-1789878216745543897.post-74296514434390422952011-03-05T10:46:00.000-08:002013-03-19T04:10:30.905-07:00Radix tree implementation in C#Update: you can find the entire source code <a href="https://github.com/paratechnical/RadixTree">here</a> <br />
<br />
This is my implementation of a <a href="http://en.wikipedia.org/wiki/Radix_tree">radix tree</a> data structure. This is a data structure very similar to a <a href="http://en.wikipedia.org/wiki/Trie">trie</a>. Basically instead of having each one of your nodes hold a char you have each one of your nodes store a string or as wikipedia puts it, it is a trie where "each node with only one child is merged with its child".<br />
<div>
<div>
As I understand it the names radix tree and crit bit tree are only applied to trees storing integers and Patricia trie is retained for more general inputs.</div>
<div>
As usual I am outputting the code for each class.</div>
</div>
<div>
<br /></div>
First, the node:<br />
<script src="https://gist.github.com/paratechnical/2869155.js"></script>
<br />
Then the actual tree<br />
<script src="https://gist.github.com/paratechnical/2869170.js"></script>
And the test program:<br />
<script src="https://gist.github.com/paratechnical/2869203.js"></script>Paraschiv Andreihttp://www.blogger.com/profile/18109912331169645081noreply@blogger.com5tag:blogger.com,1999:blog-1789878216745543897.post-69225996358472397212011-02-26T07:00:00.000-08:002011-02-26T07:28:25.710-08:00PHP sitemap using a tree data structureI was once asked to create a sitemap given a specific table structure.<div>This was the structure(the table):</div><br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/-xgmtObrsirE/TWkWcYgk9lI/AAAAAAAAACg/cseeH6L-aeE/s1600/a.jpg"><img style="cursor:pointer; cursor:hand;width: 152px; height: 230px;" src="http://1.bp.blogspot.com/-xgmtObrsirE/TWkWcYgk9lI/AAAAAAAAACg/cseeH6L-aeE/s320/a.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5578014290346309202" /></a><br /><div><br /></div><div>The resulting sitemap as I understood it was supposed to look something like this:</div><div><br /></div><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/-wzSMFGOOqsY/TWkXQ8OXUYI/AAAAAAAAACo/umpFITltw0w/s1600/b.jpg"><img style="cursor:pointer; cursor:hand;width: 79px; height: 180px;" src="http://2.bp.blogspot.com/-wzSMFGOOqsY/TWkXQ8OXUYI/AAAAAAAAACo/umpFITltw0w/s320/b.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5578015193286791554" /></a><div><br /></div><div>The parenthesis after the titles actually hold the value of the id of the node so they weren't in the original design but whatever.</div><div><br /></div><div>So I implemented this in PHP using a tree data structure where each node held 3 pieces of information:</div><div>id - the current node's id as read from the database</div><div>parent_id - the current node's parent id as read from the database</div><div>title - the title</div><div><br /></div><div>A real-world implementation would also have to include a link and print out <a> elements but this is good enough for educational purposes.</a></div><div><a><br /></a></div><div><a>Each node also holds an array of subnodes.</a></div><div><a><br /></a></div><div><a>This is the database information read to create the tree:</a></div><div><a><br /></a></div><div><a><br /></a></div><br /><pre class="brush: sql"><br />CREATE TABLE IF NOT EXISTS `sitepages` (<br />`id` int(11) unsigned NOT NULL AUTO_INCREMENT,<br />`id_parent` int(11) unsigned NOT NULL,<br />`title` varchar(100) NOT NULL,<br />PRIMARY KEY (`id`)<br />) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=3 ;<br /><br />INSERT INTO `sitepages` (`id`, `id_parent`, `title`) VALUES<br />(1, 0, 'Home'),<br />(2, 1, 'Page 1'),<br />(3, 0, 'Page 2'),<br />(4, 1, 'Page 3'),<br />(5, 3, 'Page 4'),<br />(6, 1, 'Page 5'),<br />(7, 4, 'Page 6'),<br />(8, 7, 'Page 7'),<br />(9, 5, 'Page 8');<br /></pre><br /><br />I used to php classes to represent the tree. One to represent a node of the tree and one for the tree itself:<br /><br />This is for the node:<br /><br /><pre class="brush: php"><br /><br />//a node of a tree that is not binary, each nodes has multiple subnodes <br />class Node <br />{ <br />public $id; <br />public $id_parent; <br />public $childnodes; <br />public $title; <br /><br />public function __construct($id,$id_parent,$title) <br />{ <br />//in the test script I use mysql_fetch_object which populates the members and then //calls the constuctor with all <br />//parameters as null <br />//the following checks are here to make sure no members are accidentally overriden<br /> <br />if(!isset($this--->id))<br /> $this->id = $id;<br /><br /> if(!isset($this->id_parent)) <br /> $this->id_parent = $id_parent;<br /><br /> if(!isset($this->title)) <br /> $this->title = $title;<br /><br /> if(!isset($this->childnodes)) <br /> $this->childnodes = array();<br />}<br /><br />//check if this node has subnodes or not<br />public function hasChildren()<br />{<br /> if($this->childnodes === null)<br /> return false;<br /><br /> if(!isset($this->childnodes)) <br /> return false;<br /><br /> if(count($this->childnodes) == 0)<br /> return false;<br /><br /> return true;<br />}<br />}<br /><br /></pre><br /><br />This is the tree:<br /><br /><pre class="brush: php"><br />require("node.class.php"); <br />class Tree <br />{ <br /><br />private $root; <br /><br />//construct a tree with only one root node public function __construct() <br />{ <br />$this->root = new Node(0,0);<br />}<br /><br />//recursively traverse the tree looking for the appropiate parent id<br />//the node to be inserted($node) must have a parent id equal to one of the ids in the nodes of the tree<br />//if such an id is found in a node the node is included as a subnode of that node<br />private function addNodeRecursive(&$nextnode,$node)<br />{<br /> if($nextnode == null)<br /> return false;<br /> <br /> if($nextnode->id == $node->id_parent)<br /> {<br /> array_push($nextnode->childnodes,$node);<br /> return true;<br /> }<br /> else<br /> {<br /> if($nextnode->hasChildren())<br /> {<br /> foreach ($nextnode->childnodes as $childnode)<br /> {<br /> $this->addNodeRecursive($childnode,$node);<br /> }<br /> return false;<br /> }<br /> else<br /> return false;<br /> }<br />}<br /><br />//add a node to the tree<br />//check if it is a valid node start at the root and recursively traverse the tree looking for the appropiate parent id<br />public function addNode($node)<br />{<br /> if(!($node instanceof Node))<br /> return;<br /><br /> $nextnode = $this->root;<br /><br /> $this->addNodeRecursive($nextnode,$node);<br />}<br /><br />//print the tree in sitemap style<br />public function printSiteMap()<br />{<br /> foreach($this->root->childnodes as $childnode)<br /> {<br /> $this->printTree($childnode,0);<br /> }<br />}<br /><br />//print the tree recursively<br />//first print the current node's value and then print the cycle through it's subnodes<br />public function printTree(&$nextnode,$level)<br />{<br /> $this->printNodeAccordingToLevel($nextnode,$level);<br /><br /> if($nextnode->hasChildren())<br /> {<br /> foreach($nextnode->childnodes as $childnode)<br /> {<br /> $this->printTree($childnode,$level+1);<br /> }<br /> }<br />}<br /><br />//add the appropiate number of before each node value so as to obtain a nicely formatted sitemap<br />//each node value is indented with respect to the node's level<br />private function printNodeAccordingToLevel($node,$level)<br />{<br /> for($i=0;$i<$level;$i++) echo " "; echo $node->title."(".$node->id.")";<br /> <br /> echo "<br />";<br />}<br /><br />}<br /><br /><br /><br /></pre><br /><br />And this is a test just to make sure everything works:<br /><br /><pre class="brush: php"><br /><br />require("tree.class.php"); <br />$tree = new Tree(); <br />if(!($conn = mysql_connect("localhost","root",""))) <br />{ <br />echo "Could not connect"; <br />exit; <br />} <br />if(!mysql_selectdb("sitemap")) <br />{ <br />echo "No such database"; <br />exit; <br />} <br />if($result = mysql_query("SELECT id,id_parent,title FROM sitepages")) <br />{ <br />while ($page = mysql_fetch_object($result,'Node')) <br />$tree->addNode($page);<br />}<br /><br />$tree->printSiteMap();<br /><br /></pre>Paraschiv Andreihttp://www.blogger.com/profile/18109912331169645081noreply@blogger.com0tag:blogger.com,1999:blog-1789878216745543897.post-71733090254153224632011-02-23T13:39:00.000-08:002015-02-01T13:44:50.652-08:00Prefix tree(trie) implementation in F#Update: all the code can be downloaded from <a href="https://github.com/paratechnical/PrefixTreeFSharp">here</a> <br />
<br />
This is my implementation of a a prefix tree data structure in F#.<br/>
This structure is described in more detail in a previous <a href="http://paratechnical.blogspot.com/2011/01/implementation-of-prefix-tree-data.html">post of mine</a>.<br/>
I just describe the F# related stuff in the comments.<br/>
I'm sure this is not the best implementation. I am aware this is not the most functional implementation ever.<br/>
I am open to all constructive criticism of course.<br />
<br />
<script src="https://gist.github.com/paratechnical/0d963ec15691f025fe66.js"></script>Paraschiv Andreihttp://www.blogger.com/profile/18109912331169645081noreply@blogger.com1tag:blogger.com,1999:blog-1789878216745543897.post-48319506232906303382011-01-31T11:57:00.000-08:002015-02-01T14:06:38.063-08:00Autocomplete functionality using prefix trees<div>
Update: all the source code for this article can be accessed from <a href="https://github.com/paratechnical/PrefixTreeCSharp">here</a> <br />
<br />
This is a follow up to my <a href="http://paratechnical.blogspot.com/2011/01/c-implementation-of-parallel-sieve-of.html">previous post</a> on implementing a prefix tree in C#.</div>
<div>
This expands on that idea and uses the data structure to generate a list of autocomplete matches for a given input character sequence.</div>
These are the modifications I brought to the code:<br />
<br />
<script src="https://gist.github.com/paratechnical/54e2971d02df1364f089.js"></script>
<br />
<br />
<script src="https://gist.github.com/paratechnical/ca12ee27e344abd79ad5.js"></script>
<br />
<br />
This is the main program:<br />
<br />
<script src="https://gist.github.com/paratechnical/73e23553a8533489ffcc.js"></script>Paraschiv Andreihttp://www.blogger.com/profile/18109912331169645081noreply@blogger.com0tag:blogger.com,1999:blog-1789878216745543897.post-73569764328225210032011-01-30T12:52:00.000-08:002012-05-26T13:10:11.791-07:00Implementation of the prefix tree(trie) data structure in C#Update: all the source code for this article can be accessed from <a href="https://github.com/paratechnical/PrefixTreeCSharp">here</a> <br />
<br />
A <a href="http://en.wikipedia.org/wiki/Trie">prefix tree</a> is a data structure with many <a href="http://stackoverflow.com/questions/296618/what-is-the-most-common-use-of-the-trie-data-structure">uses</a>. I won't go into details about this data structure Wikipedia does a better job than I could. What I will do is post a C# implementation of this data structure and in a <a href="http://paratechnical.blogspot.com/2011/01/autocomplete-functionality-using-prefix.html">following post</a> I will show how to use this data structure when implementing an auto-complete solution.<br />
<div>
<br /></div>
<div>
These are the properties of a prefix tree:</div>
<div>
<ul>
<li>it stores strings(finite sequences of chars)</li>
<li>it had nodes :) (doooooooh!)</li>
<li>each node stores a character</li>
<li>each node can have a number of subnodes(0,1,2, ... infinity)</li>
<li>the char stored in a subnode is the one following the char stored by the node in a string</li>
<li>each node has a flag which is raised if a word ends there(this doesn't mean it can't have subnodes for example a prefix tree can store the words "cat" and "catastrophy" in this case node 3 stores 't' and has a subnode that stores 'a')</li>
</ul>
<div>
<br /></div>
</div>
And now the code:<br />
<div>
The node:<br />
<br />
<pre class="brush: csharp">
namespace PrefixTree
{
/// <summary>
/// this is the class that represents a node in the prefix tree
/// the tree stores:
/// 1. a char
/// 2. references to it's subnodes
/// 3. a flag which is raised if any word ends with the char stored in this node
/// </summary>
class Node
{
/// <summary>
/// char stored by the node
/// </summary>
public char Char;
/// <summary>
/// references to it's subnodes
/// </summary>
public List<node> Subtree;
/// <summary>
/// the flag which is raised if any word ends with the char stored in this node
/// </summary>
public bool AWordEndsHere;
/// <summary>
/// constructor
/// </summary>
/// <param name="c" />
public Node(char c)
{
Char = c;
Subtree = new List<node>();
}
/// <summary>
/// gets a reference to this node's subnode that stores the specified char
/// </summary>
/// <param name="c" />
the specified char
/// <returns>reference to the subnode that stores the specified char</returns>
public Node GetChild(char c)
{
if (Subtree.Count != 0)
foreach (var node in Subtree)
if (node.Char == c)
return node;
return null;
}
}
}</node></node></pre>
<br />
<br />
The tree:<br />
<br />
<pre class="brush: csharp">
namespace PrefixTree
{
/// <summary>
/// this is the main prefix tree class
/// </summary>
class Tree
{
private List<string> subsequentStrings;
private Node _root;
public Node Root
{
get
{
return _root;
}
}
/// <summary>
/// Create a prefix tree - the root node should always store ' '
/// </summary>
public Tree()
{
_root = new Node(' ');
}
/// <summary>
/// Add a word(series of characters) into the prefix tree
/// </summary>
/// <param name="word" />
public void AddWord(string word)
{
char[] chars = word.ToCharArray();
Node currentNode = _root;
Node child = null;
//cycle through the chars in the string
for (int counter = 0; counter < child =" currentNode.GetChild(word[counter]);" child ="="" newnode =" new" currentnode =" newNode;" currentnode =" child;" counter ="="" awordendshere =" true;">
/// return wether a string can be found in the tree
///
/// <param name="query" />
a string
/// <returns>true or false</returns>
public bool Find(string query)
{
char[] chars = query.ToCharArray();
Node currentNode = _root;
Node child = null;
//search throughout the child nodes of the current node for the current char
for (int counter = 0; counter < child =" currentNode.GetChild(chars[counter]);" child ="="" currentnode =" child;"></string></pre>
<br /></div>
<br />
The main program:<br />
<br />
<pre class="brush: csharp">
Tree prefixTree = new Tree();
prefixTree.AddWord("abc");
prefixTree.AddWord("abcd");
prefixTree.AddWord("abcde");
prefixTree.AddWord("abcdef");
prefixTree.AddWord("ab123cd");
prefixTree.AddWord("abc123d");
prefixTree.AddWord("abc132d");
string word = "abc";
if (prefixTree.Find(word))
Console.WriteLine("found");
else
Console.WriteLine("not found");</pre>Paraschiv Andreihttp://www.blogger.com/profile/18109912331169645081noreply@blogger.com2tag:blogger.com,1999:blog-1789878216745543897.post-32510547184395483722011-01-18T12:43:00.000-08:002011-01-18T12:53:14.916-08:00How to concatenate data from multiple rows into a single rows<div>I had a problem once. One table had a lot of data in a column on multiple rows. I had to display it in a view but I needed all the data on that column to be displayed in a single cell on a row. This view was supposed to be the data source for a report ... long story ...</div><div>Anyway, the basic idea is this:</div><div><br /></div><br /><br /><pre class="brush: sql"><br />SELECT some_column = (your select query<br /> for xml path(''))<br /></pre><br /><br /><div><br /></div><div>this only works in sql server 2008 because earlier versions don't have the "xml" keyword.</div><div>Here is an example:</div><div><br /></div><br /><br /><pre class="brush: sql"><br />SELECT Documente = (SELECT coalesce(td.Descriere+',','') as [text()] FROM Document d <br /> INNER JOIN TipDocument td ON d.TipDocumentId = td.Id for xml path(''))<br /><br /></pre>Paraschiv Andreihttp://www.blogger.com/profile/18109912331169645081noreply@blogger.com0tag:blogger.com,1999:blog-1789878216745543897.post-10978964847340710302011-01-16T10:22:00.000-08:002011-01-16T10:24:37.330-08:00Unable to start debugging on the web server error<p class="MsoNormal"><span style="font-size: 10pt; font-family: Arial, sans-serif; color: black; ">Just in case you also received this error:</span></p><p class="MsoNormal"><b style="mso-bidi-font-weight:normal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black">Unable to start debugging on the web server. Debugging failed because<br />integrated Windows authentication is not enabled. Please see Help for<br />assistance</span></b></p><p class="MsoNormal"><span style="font-size: 10pt; font-family: Arial, sans-serif; color: black; ">Here is what I did to not get it any more</span></p> <ol style="margin-top:0in" start="1" type="1"> <li class="MsoNormal" style="color:black;mso-list:l0 level1 lfo1"><span style="font-size:10.0pt;font-family:"Arial","sans-serif"">Open IIS Manager (Internbet Information Services)</span></li><li class="MsoNormal" style="color:black;mso-list:l0 level1 lfo1"><span style="font-size:10.0pt;font-family:"Arial","sans-serif"">Right-click the web site (in case you run it locally you only have<br /> Default web site) and pick Properties</span></li><li class="MsoNormal" style="color:black;mso-list:l0 level1 lfo1"><span style="font-size:10.0pt;font-family:"Arial","sans-serif"">Choose "Directory Security" tab and click Edit on "Anonymous access and<br /> authentication control"</span></li><li class="MsoNormal" style="color:black;mso-list:l0 level1 lfo1"><span style="font-size:10.0pt;font-family:"Arial","sans-serif"">In the opening window, uncheck "Allow Anonymous access" and check<br /> "Integrated Windows Authentication" <o:p></o:p></span></li></ol>Paraschiv Andreihttp://www.blogger.com/profile/18109912331169645081noreply@blogger.com0tag:blogger.com,1999:blog-1789878216745543897.post-33908894263682965352011-01-16T10:18:00.000-08:002011-01-16T10:20:41.643-08:00Finding the path to an installed application in C#Well, you can find the path to a locally installed application by querying the windows registry.<div>The information is here:</div><div><p class="MsoNormal">HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\App Paths\<application> </p> <p class="MsoNormal">Example photoshop.exe</p><p class="MsoNormal">I wrote this function in C# to do this for me:</p></div><br /><br /><pre class="brush: csharp"><br />/// <summary><br /> /// get the path to a specific application from the registry <br /> /// </summary><br /> /// <param name="ExeName">name of the application exe</param><br /> /// <returns></returns><br /> public static string GetApplicationPath(string ExeName)<br /> {<br /> try<br /> {<br /> RegistryKey OurKey = Registry.LocalMachine;<br /> OurKey = OurKey.OpenSubKey(@"Software\Microsoft\Windows\CurrentVersion\App Paths\"+ExeName, true);<br /> if (OurKey != null)<br /> return OurKey.GetValue("").ToString();<br /> else<br /> return "";<br /> }<br /> catch(Exception ex)<br /> {<br /> return "";<br /> }<br /> }<br /><br /></pre>Paraschiv Andreihttp://www.blogger.com/profile/18109912331169645081noreply@blogger.com0tag:blogger.com,1999:blog-1789878216745543897.post-46123185391114492762011-01-16T09:59:00.000-08:002011-01-16T10:14:55.749-08:00Exporting MS SQL data as sql scriptsIn <a href="http://www.phpmyadmin.net/home_page/index.php">PhpMyAdmin</a> exporting data to sql statements not to .mdb files or other databases is very straight forward. When I first had to do this with <a href="http://en.wikipedia.org/wiki/SQL_Server_Management_Studio">Sql Management Studio</a> I didn't know how at first. <div>Then I found out. </div><div>Basically, what you have to do is this:</div><div>1.right click on a database item</div><div>2.go to tasks -> generate scripts in the menu that appears</div><div><br /></div><div>You now have a wizard that will help you generate scripts for the objects you need. You can either generate one script for all the objects in your database(tables, triggers, stored procedures etc) or one script for just some of the object(you choose which) or each object you choose in it's own script. You can either generate these resulting scripts in files or in new query windows. Everything is configurable from the options window(the wizard step with the title "Choose script options"). </div><div><br /></div><div>Here is an example: </div><div>I wanted to just script the triggers in a db. Each in it's own file.</div><div>In the options window I set "script triggers" to true and in the following wizard form(with the title "Output options") I chose "file per object".</div><div><br /></div><div><br /></div>Paraschiv Andreihttp://www.blogger.com/profile/18109912331169645081noreply@blogger.com3tag:blogger.com,1999:blog-1789878216745543897.post-21042814623443926422011-01-16T09:55:00.000-08:002011-01-16T09:59:02.880-08:00Reseting visual studio<p class="MsoNormal"><b></b></p><p class="MsoNormal"><b><b style="mso-bidi-font-weight:normal">Visual Studio needs to be reset(in order to change a setting) / is frozen / doesn’t work right<o:p></o:p></b></b></p><b> <p class="MsoNormal"><b style="mso-bidi-font-weight:normal"><o:p> </o:p></b></p> <p class="MsoNormal">1. Close all instances of VS</p> <p class="MsoNormal">2. Click Start > Run...</p> <p class="MsoNormal">3. Type "devenv /resetuserdata"</p> <p class="MsoNormal">4. Open task manager and wait until devenv.exe process is finished executing</p> <p class="MsoNormal">5. Restart VS2005</p> <p class="MsoNormal"><o:p> </o:p></p> <p class="MsoNormal">In Win 7:</p> <p class="MsoNormal">search for the folder in which devenv.exe can be found</p> <p class="MsoNormal">follow the steps above but instead of typing “devenv /resetuserdata” type “devenv.exe /resetuserdata”</p></b><p></p>Paraschiv Andreihttp://www.blogger.com/profile/18109912331169645081noreply@blogger.com0tag:blogger.com,1999:blog-1789878216745543897.post-46786934504016661602011-01-16T07:05:00.000-08:002011-01-16T09:55:02.395-08:00Finding the first non consecutive value in an identity columnThis is an interesting SQL problem that presented itself to me once.<div>You have this table. It has of course a primary key which is an identity column which has a seed of 1. </div><div>Then you insert a some data and then delete some data. </div><div>The primary key column(let's call it id from now on) will no longer contain only consecutive numbers.</div><div> So the question is: "How do you find the first non consecutive value in this column?"</div><div>Well, the first thing that came to my mind at the time was making a cursor. That works too but it's not very efficient.</div><div>Here is one implementation using a self join which worked for me:</div><br /><br /><pre class="brush: sql"/><br />select top 1 t1.id from [table] t1 left join [table] t2 on t1.id+1 = t2.id where t2.id is NULL<br /></pre>Paraschiv Andreihttp://www.blogger.com/profile/18109912331169645081noreply@blogger.com0tag:blogger.com,1999:blog-1789878216745543897.post-89577624737621220832011-01-14T13:01:00.000-08:002011-01-14T13:04:55.846-08:00Opening compressed databases or .mdf files in SQL Server Management Studio<p class="MsoNormal">If when you try to open a database or attaching a .mdf file you get the error:</p><p class="MsoNormal"></p><blockquote></blockquote><p class="MsoNormal"><em><span lang="EN-GB" style="font-size:10.0pt;font-family: "Trebuchet MS","sans-serif";color:red;mso-ansi-language:EN-GB"></span></em></p><blockquote><em><span lang="EN-GB" style="font-size:10.0pt;font-family: "Trebuchet MS","sans-serif";color:red;mso-ansi-language:EN-GB">is compressed but does not reside in a read-only database or filegroup. The file must be decompressed.</span></em></blockquote><p></p><p class="MsoNormal"></p><p class="MsoNormal">So here is the algorithm I followed to solve this issue:</p><p class="MsoNormal"></p><p class="MsoNormal">1. only if an existing database gives this error detach the database – right click -> Detach</p><p><span style="color: black; ">2. closed all SQL Server Management Studio instances<o:p></o:p></span></p><p><span style="color: black; ">3. Stop the SQL Server service.. (Start->Run -> services.msc)<o:p></o:p></span></p><p><span style="color: black; ">4. Selected .mdf file(should be blue if compressed ) Right click properties -> Advanced -> Uncheck <span></span>"Compress files to save disk space"<o:p></o:p></span></p><p><span style="color: black; ">5.(re)attach .mdf file</span></p><p></p><p></p>Paraschiv Andreihttp://www.blogger.com/profile/18109912331169645081noreply@blogger.com0tag:blogger.com,1999:blog-1789878216745543897.post-33745273933145115152011-01-14T12:58:00.000-08:002011-01-14T13:00:25.054-08:00Setting the week's start day in MS SQL<p class="MsoNormal">SET DATEFIRST 1 --start day is Monday</p> <p class="MsoNormal">SET DATEFIRST 7 --start day is Sunday – default</p> <p class="MsoNormal">To test it PRINT <span style="font-size:8.0pt;font-family: "Verdana","sans-serif";color:black">@@DATEFIRST <o:p></o:p></span></p>Paraschiv Andreihttp://www.blogger.com/profile/18109912331169645081noreply@blogger.com0tag:blogger.com,1999:blog-1789878216745543897.post-41542741969762696582011-01-14T12:57:00.000-08:002011-01-14T12:58:09.370-08:00Making ASP.NET write to a file<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"; mso-no-proof:yes">Select the file right click->”Properties”->click the”Security” Tab->click the “Add” tab->write YOURCOMPUTERNAME/ASPNET replacting YOURCOMPUTERNAME with your computer’s name->check the “full control” checkbox below-> click “OK”</span></p> <p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"; mso-no-proof:yes">When calling the file from the code write with two “\” where one is found in the path for example c:\\q.xml.<o:p></o:p></span></p>Paraschiv Andreihttp://www.blogger.com/profile/18109912331169645081noreply@blogger.com0tag:blogger.com,1999:blog-1789878216745543897.post-19733702248481546552011-01-14T12:53:00.000-08:002011-01-14T12:56:34.855-08:00"Converting" an XmlNode to and XmlDocument in C#<pre class="brush: csharp"><br />private void Output(XmlNode xnodeOutput,bool bOutputToFile)<br /> {<br /> try<br /> {<br /> <br /> XmlDocument xdocOutput = new XmlDocument();<br /><br /> XmlNode xnodeTemp = xdocOutput.ImportNode(xnodeOutput, true);<br /><br /> XmlElement xWC = xdocOutput.CreateElement("whatever");<br /><br /> xdocOutput.AppendChild(xnodeTemp);<br /><br /></pre><br /><br />The first element will never by shown anyway(assuming you do xdocOutput.Save(afile.xml)) so it can have any name.Paraschiv Andreihttp://www.blogger.com/profile/18109912331169645081noreply@blogger.com0tag:blogger.com,1999:blog-1789878216745543897.post-26544894587508024042011-01-14T12:50:00.000-08:002011-01-14T12:52:39.017-08:00Encode Sensitive parameters ASP.NET<p class="MsoNormal">Use:</p> <p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"; color:#2B91AF;mso-no-proof:yes">Convert</span><span style="font-size:10.0pt; font-family:"Courier New";mso-no-proof:yes">.ToBase64String(System.Text.<span style="color:#2B91AF">Encoding</span>.Unicode.GetBytes(<o:p></o:p></span></p> <p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"; mso-no-proof:yes">To Encode</span></p> <p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"; mso-no-proof:yes">System.Text.<span style="color:#2B91AF">Encoding</span>.Unicode.GetString(<span style="color:#2B91AF">Convert</span>.FromBase64String(<o:p></o:p></span></p> <p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"; mso-no-proof:yes">To decode<o:p></o:p></span></p><p class="MsoNormal"><span class="Apple-style-span" >This should work for both GET and POST</span></p><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"; mso-no-proof:yes"><br /></span></p>Paraschiv Andreihttp://www.blogger.com/profile/18109912331169645081noreply@blogger.com0tag:blogger.com,1999:blog-1789878216745543897.post-59545140791994664732011-01-14T12:43:00.000-08:002011-01-14T12:53:04.883-08:00Query string delimiters<p class="MsoNormal">Don’t use:</p> <p class="MsoNormal"><span style="mso-spacerun:yes"> </span>“#”</p> <p class="MsoNormal">Use:</p> <p class="MsoNormal">Anything else</p> <p class="MsoNormal">Reason:</p> <p class="MsoNormal">Browsers interpret # as end of string </p> <p class="MsoNormal">Don’t take my word for it test it in the address bar <a href="http://www.google.com/#www.microsoft.com">http://www.google.com#www.microsoft.com</a></p>Paraschiv Andreihttp://www.blogger.com/profile/18109912331169645081noreply@blogger.com0tag:blogger.com,1999:blog-1789878216745543897.post-77646435609510664702011-01-14T12:41:00.000-08:002011-01-14T12:53:52.126-08:00display/hide divs<p class="MsoNormal">To display/hide runat="server" divs in asp.net<b><o:p></o:p></b></p> <p class="MsoNormal">Don’t use:</p> <p class="MsoNormal">Divid.visible = true/false</p> <p class="MsoNormal">Use:</p> <p class="MsoNormal">Divid.attributes[“style”] = “display:none”/ “display:block”</p> <p class="MsoNormal">Reason:</p> <p class="MsoNormal">You won’t have javascript access to those divs</p>Paraschiv Andreihttp://www.blogger.com/profile/18109912331169645081noreply@blogger.com0