Friday, September 19, 2008

Pune Tech

Since coming back to Pune, I have been trying to get the hang of the tech scene here. From the campus recruitments at IITB, my view is that Pune has a good tech ecosystem developing. We have had some startups working on exciting products coming to the campus. This just seems to be the right time to be in Pune.

Last week, I attended the monthly meet of the Pune Open Coffee Club (POCC). POCC is a group which seeks to promote startups and provide a platform for people with ideas to meet, seek advice, get in touch with VCs and be noticed. Last week's meet was a good experience. There was a panel discussion on bootstrapping of startups with Anand Soman, Tarun Malaviya, and Shridhar Shukla, and a showcase of a new startup in Pune. You can find a liveblog of the proceedings here.

A very useful resource to keep track of the tech scene in Pune is punetech.com. On the wiki, you can find links to various companies based in Pune and interest groups like the Pune Google Technology Users group. There is also a calendar, which lists all the tech events scheduled in and around Pune. All this is of course community edited, so it's for us techies in Pune to build on this initiative.

Saturday, March 22, 2008

About Fibonacci Numbers

Discovered a few new facts about Fibonacci numbers. All I knew about Fibonacci numbers was that they form the sequence 1, 1, 2, 3, 5, 8, 13, 21, ..., where the nth number is the sum of the (n-1)th and (n-2)th Fibonacci numbers. Well, I have always wondered what makes these numbers special. Turns out they count the number of ways n can be written as an ordered sum of 1's and 2's: for n = 3 there are three ways (1+1+1, 1+2 and 2+1), which is the (n+1)th number in the sequence above. That gives a combinatorial interpretation to the Fibonacci numbers, and raises them from being outcomes of a mere uninteresting addition process. I got to know about this fact during a fascinating talk at IIT Bombay by Prof. Manjul Bhargava from Princeton. (More about that talk later.)

And that's not all about the Fibonacci numbers. It turns out that any positive integer can be represented as a sum of distinct Fibonacci numbers. That means Fibonacci numbers can serve as a base system. However, there can be more than one way of representing the same number as a sum of Fibonacci numbers. Check this to know more.

To finish this post, Fibonacci numbers were first described by the Indian linguist Hemachandra about 50 years before Fibonacci described them, and probably by Pingala around 200 B.C. too.

Thursday, October 04, 2007

Information Retrieval Books

A list of good books on IR here:
http://researchonsearch.blogspot.com/2005/12/information-retrieval-textbooks.html

Wednesday, September 26, 2007

Compiling GIZA++

Compiling GIZA++ can be a pain, since the available source was written for an older version of gcc. Newer versions of gcc comply more strictly with standard C++ template syntax and semantics, so the old GIZA++ does not compile on gcc 4.2. With a lot of fixes, I was able to get GIZA++ working, but the word class creation tool mkcls proved a very tough nut to crack. Luckily, I found a link to source code that compiles with gcc 4.2 here on the StatMT site. Hope that helps anybody looking for GIZA++ and mkcls.

Saturday, September 15, 2007

Batch conversion of doc files to text files

It's kind of irritating when you need text files to run language processing applications on and your corpora are in the form of Word documents. Here is a way to convert those doc files to txt files without having to open the document editor and do a 'Save As' for each corpus document: OpenOffice macros to the rescue. You can write a macro to save the file as a text file. The macro can then be invoked from the command line by starting OpenOffice in invisible mode. And you can wrap all this in a nice shell script to do any filtering/cleaning after saving the documents as text files. A note of caution: I observed that OpenOffice saves the text files asynchronously, so the file might not yet be available for processing by the script lines that follow. It is better to pause for a while, while OpenOffice saves the document. You can find more about this nifty timesaver here:

http://www.xml.com/pub/a/2006/01/11/from-microsoft-to-openoffice.html?page=2

Saturday, August 11, 2007

Attending Conferences

Here is an interesting post, Conferences: Costs and Benefits. I have never been to a conference, but having sat through many a talk, it seems only a fraction of the time spent is really useful. But I should experience at least one conference before I comment. Undoubtedly, going around asking questions is not something you can do while reading conference proceedings.

Saturday, July 07, 2007

You and your Research

A highly enlightening and inspiring talk by Richard Hamming about what it takes to do significant research, that I read again after a long time. Find it here.

Sunday, July 01, 2007

Writing code at compile time

How do you write a C program which, at compile time, lets you type in another program at the terminal? When you run the first program, the code that you typed in at the terminal should execute.

Something like:

bash#gcc -o 1 1.c
#include <stdio.h>
int main(void)
{
printf("Let a thousand ideas bloom :)");
}
^D
bash#./1
Let a thousand ideas bloom :)


So write the code for 1.c. Hint: assume gcc as the compiler and any UNIX variant as the OS.

As it turns out, the solution is a one-liner.

#include "/dev/tty"

While compiling, the preprocessor tries to open and read from the file /dev/tty, just as it would for any other include like stdio.h. Since /dev/tty is the terminal, you can now type in the second program at the terminal. It gets compiled and, bingo, you execute code written at compile time.

Saturday, June 30, 2007

Where does science begin?

Haven't you seen a lot of scientific work built on fundamental assumptions? A problem I have is in accepting the unintuitive assumptions made in a lot of research. Especially in a nascent field like Natural Language Processing, you find a lot of these. These are not like Euclidean axioms, which seem pretty reasonable. Yet, research without such assumptions seems to be an impossibility. The problem is then to determine the right set of axioms to serve as a basis on which to build the theory. Some insights I found in a PhD thesis:

Any philosophical system, any science has to start with assumptions, axioms which cannot be really proved or disproved, which are fundamentally arbitrary but hopefully convincing. In his Tractatus Logico-Philosophicus, Wittgenstein [1918] writes that the only true philosophy would be to utter proven scientific facts, to use nothing but defined symbols of a defined formalism – i.e. to renounce on metaphysics and thus on philosophy:

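Die richtige Methode der Philosophie wäre eigentlich die: Nichts zu sagen, als was sich sagen lässt, also Sätze der Naturwissenschaft – also etwas, was mit Philosophie nichts zu tun hat–, und dann immer, wenn ein anderer etwas Metaphysisches sagen wollte, ihm nachzuweisen, dass er gewissen Zeichen in seinen Sätzen keine Bedeutung gegeben hat. ... (Wittgenstein 1918: 85, § 6.53)
[In English: The right method of philosophy would really be this: to say nothing except what can be said, that is, propositions of natural science, that is, something that has nothing to do with philosophy, and then, whenever someone else wanted to say something metaphysical, to demonstrate to him that he had given no meaning to certain signs in his propositions.]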

Wittgenstein is aware that the problems with this suggestion are, however, that every definition necessitates a definition of the defining terms until we reach the unprovable maxims. If we refuse to accept these fundamental maxims, the cornerstones of meaning, we cannot state anything and are condemned to remain silent.

Wovon man nicht sprechen kann, darüber muss man schweigen. (Wittgenstein 1918: 85, § 7)
[In English: Whereof one cannot speak, thereof one must be silent.]

These maxims have transcendental, metaphysical quality, only they make any meaning possible and can thus instantiate our questions and answers in life.

Wir fühlen, dass, selbst wenn alle möglichen wissenschaftlichen Fragen beantwortet sind, unsere Lebensprobleme noch gar nicht berührt sind. Freilich bleibt dann eben keine Frage mehr; und eben dies ist die Antwort. (Wittgenstein 1918: 85, § 6.52)
[In English: We feel that even when all possible scientific questions have been answered, the problems of our life have still not been touched at all. Of course there is then no question left, and just this is the answer.]

Only the transcendental character of metaphysical philosophy can really give answers and assert meaning. If we only utter scientific proven facts we can only replace meaningless utterances with one another. E.g. in semantics we can step from language to metalanguage to meta-meta-etc.-language, but this does not bring us an inch closer to real meaning. On the other hand, because we cannot define the maxims we use, we remain incompetent about them nevertheless. Again, Wittgenstein’s famous quote applies:

Wovon man nicht sprechen kann, darüber muss man schweigen. (Wittgenstein 1918: 85, § 7)
[In English: Whereof one cannot speak, thereof one must be silent.]

We are therefore in principle disqualified from speaking, from stating anything meaningful or even “scientific”. If we accept a minimal set of maxims on which everybody agrees science seems to be possible nevertheless, as long as we can base everything on these maxims.


A New Beginning

For a long time this blog lived a dormant existence under the name 'Machine Learning Chronicle'. Over the last year, caught between baffling Gaussian tosses and intricate kernel machines which go by the fancy name of Support Vector Machines, I have not been able to write much. Ideas are not restricted to one genre or science, and hence I thought of making this blog more broad-based, covering more topics in science and technology. So, here I rechristen this blog as 'Let a thousand ideas bloom!' - for ideas are the heart of science.

Over the last year, I have picked up interests in information retrieval, natural language processing, cognitive sciences and the Web 2.0 phenomenon. In addition, programming, physics and space talk are perennial favourites of mine. So that is what this place is about ... facts, thoughts and ideas.

So let me set the ball rolling ...

Wednesday, September 27, 2006

Tools on MSN adCenter labs

Found a set of interesting tools on the MSN AdCenter Labs site. They can give you insight into search trends in interesting ways, much like Google Trends. Some of the tools available are:

  • Content Categorization
  • Keyword Categorization
  • Demographic Prediction

Friday, August 11, 2006

My new blog ...

My interests primarily being in machine learning and data mining, a lot of my courses are related to these fields. There is so much happening, so much to learn, to think about and to create. It surely warrants a space of its own to record the interesting information and thoughts I happen to stumble upon. So, here I present my new blog, Machine Learning Chronicle, for the purpose.