Technical Outlet

Friday, October 26, 2012

[InterviewStreet Problem] Find Strings

Update
The mistake was in counting the number of children. I have now passed all but two test cases, which probably require that I implement the Suffix Tree using lists instead of arrays.
----------------------------------------------------------------------------------------------------------------------------

So I have been trying to solve problems under the "Strings" section in InterviewStreet. Find Strings is one of those questions and I am here to discuss a possible strategy to solve the problem. ~~I'd like to make it clear from the beginning that my own solution has not been accepted yet, but I am unable to see the flaw in the approach and am expecting someone to point it out.~~

So, the question requires us to find the number of unique substrings in a given set of strings. At first I thought as the question unfolds and since I use C++ for most problems, I was thinking in terms of the available data structures in STL. So, at first I thought of storing the unique substrings (of the order of the square of the length of the string) in a 'set'. The set takes care of the uniqueness, but the problem arises if I try to find the 'k-th' smallest string. This would take really long, for each query.

So, the things that led me to another solution were the following realizations :-

First solution takes too long inserting each string
Insertion of the substrings of a string like "aab", will require steps to insert each of {a, aa, aab, ab, b}, i.e. 5 different insertion steps. Here, I thought if I could simply mark {a,aa} as unique substrings as well when inserting aab
Searching for a string in a set (when checking for uniqueness) requires comparison of almost the entire string even if a similar string has already been compared with. For instance, in a set of {abc, abe}, I am checking if "abd" exists, then I would need to carry out 6 comparisons (between characters). This wastes the earlier learnt lesson that the first two characters are 'ab'.

These things led me to think in terms of tries and consequently I related that it is possible find substrings of a string in its Suffix Tree. A suffix tree is, basically, a trie of suffixes of a string. So, if I create one suffix tree as the set of the unique substrings, I have the following advantage :-

I have to insert only the m suffixes of a string of length m. This reduces the number of insertions to linear from quadratic in the previous algorithm.
While checking if the string to be inserted is already present in the set, I need only O(d) comparisons, where d is the length of the string to be inserted.
While checking for the k-th smallest string (lexicographically), I can make use of the information about the number of children of a node (not the number of leaves it leads). The number of children of a node can be related to the number of unique "substrings" (not suffixes, which is given by the number of leaves).

So, eventually, I ended up coding up a SuffixTree and the solution I submitted can be found below (or follow this link). ~~But again, this solution has been able to pass only one of the test cases. So, either there's some serious flaw with the approach or I've failed to code what I had in mind.~~

I'd be glad to hear suggestions for improvements in the algorithm or the code.

Tuesday, July 3, 2012

Converting a file descriptor to an I/O Stream in Linux [C++]

Recently, if one can call something that happened over 2 months ago 'recent', I decided to do a Networks assignment entirely in C++ and using streams. Why, you ask ? Well, the simplest answer is, "I was bored." I had done another network assignment in C, through which I had seen quite a lot of functions that are required to connect, send/receive messages, etc. So, this was something like a challenge.

And that's exactly what it turned out to be, a challenge. Four hours into the assignment, which was obviously just 2 days before the submission, I was struggling to understand how could I convert all the file descriptors that were being returned from various C based functions into C++ streams.

As it turns out, due to difference in uses and implications of file descriptors in different operating systems, there was no standard function to convert a File Descriptor to a stream. That was what I learnt after hours of scouring over StackOverflow and running into answers that explained "How to get the underlying FILE* from fstream?" until I finally found out a solution that works on G++.

This may definitely not be the best approach, but a hack.

I used it here and built upon it. Besides the problem of handling Unicode characters (which I understand was because I was using string and not wstring ), I did not face any problem with streams; so I would assume that the solution worked well.

If you are aware of a smarter, quicker and cleaner method of doing this, please share it in comments.

Monday, January 9, 2012

Alias a URL on Apache

Many a times I have needed to share a page using Apache, without copying the page into the '/var/www' folder. Discussed below is one of the solutions that I often use.

Lets say you have a folder '/home/user/images' that you would like to share over Apache but you might not want to put it over into '/var/www' maybe because you have other scripts performing on this folder or taking backup from time to time.

In such a scenario, Apache's mod_alias comes to our rescue. This module allows us to provide an alias URL for the required directory. To achieve the same, insert the following lines in the '/etc/apache2/httpd.conf'

Alias /image /ftp/pub/image

<Directory /ftp/pub/image> Order allow,deny Allow from all </Directory>

Care must be taken regarding the trailing slash ('/') in the directory's name. Among the options available, I suggest not using the trailing slash in any path (neither the alias, nor the actual path).

This is not the end to the process though. One important step is to ensure that the path is 'executable' by other users as well. Each of the parent directory should be also be executable by other users.

Use 'chmod 711' on each of the parent directories.

Voila, now you are done. Just restart apache and visit the URL

site-address/alias

Friday, January 6, 2012

Chrome becomes the most popular browser in India

TechRaga reported in September, "Chrome Beats Internet Explorer And Becomes #2 Browser in India". The latest statistics from StatCounter show that Chrome is now the most popular browser in India, closely followed by Firefox.

Source: StatCounter Global Stats - Browser Market Share

The above image depicts the share of browsers in the last 10 weeks, and it clearly shows that Firefox and Chrome are fighting for the top spot.

Despite the 3rd spot in India, IE enjoys the first spot worldwide, while Chrome stands at the 2nd spot after recently overtaking Firefox globally.

Source: StatCounter Global Stats - Browser Market Share

According to StatCounter, among various browser versions Chrome 15.0 has already taken over the global market followed by IE 8.0.

Source: StatCounter Global Stats - Browser Version Market Share

These browser wars remind us of the IE-Netscape war which was won by IE, but it looks like Open Source Browsers are becoming more popular, even among ordinary users.

Thursday, November 10, 2011

Lets grow with Facebook

Now, this is not some expert advice; just giving words to the thoughts of many. Many of us have realized, at some point of this year or another, that Facebook is not just another site, not anymore. With a user base of over 800 million, there's barely anyone left out of the social networking site.

So what does that mean to us ? Firstly, we must realize that all your dreams to start now and grow up to compete against Facebook, might be a little too far-fetched. I am, generally, not a pessimistic person, so I would like to add that former statement holds for most of the people out there planning to go up against Facebook; you might be an exception.

Secondly, and in my opinion more importantly, the large user base of Facebook gives others the idea to make use of the user base. We need to realize that the number of 'active' users on Facebook is a good indicator of 'active' internet users. These are the people any one wants to attract to their own products. But, if the number of people on facebook includes the people you want to target, why not target them directly ? Facebook now offers plethora of APIs giving developers a whole new world still left to be explored.

The point I am trying to convey should be clear from the following example. Say, you are a developer/entrepreneur with a idea that you think could change the way people think about 'something'. But again, your idea is heavily dependent on the input from your users; their participation is what drives your idea. In that case, you need a lot of users to visit your site. Mark Zuckerberg, on the other hand, is trying to make sure that people spend more time on Facebook and have less reasons to go outside. Given the resources, including the user base and money, they are expected to do a good job at that. I suggest, that we help him do this. How ? Why do you need a new site, when you can just create an application 'within' facebook. Looking at various apps such as The Guardian and Washington Post Social Reader, one can only begin to imagine the possibilities of such applications.

So what's the advantage ? For you, users; for users, the fact that they get the content without leaving facebook. At first, it might sound naive, but think about it and remember what they say,

"If you can't beat them, join them."