Saturday, June 27, 2009

XP Array

Physically, storage disks are divided into RAID groups. A layer of virtualization sits between the RAID groups and LDEVs: an LDEV (logical device) is the logical unit of storage, and users reference only LDEVs. When an LDEV is referenced, the data is first looked up in the cache. If it is not found there, it is read from the RAID group where the referenced data actually lives. In an XP array the disks are arranged in a frame, also called the DKC, and every DKC serial number is unique. The array has multiple ports through which users access the disks, and each processor serves two ports. Disk arrays are big power consumers.
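The read path described above (cache first, then the RAID group behind the LDEV) can be modeled in a few lines of Java. This is a minimal sketch, not how the XP firmware works. The class name LdevReadSketch, the LDEV and RAID group names, and the string standing in for data are all hypothetical. A real array also has to evict old entries from its limited cache.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: each LDEV is backed by a RAID group, and reads check a cache first.
class LdevReadSketch {
    // Which RAID group backs each LDEV (hypothetical mapping).
    private final Map<String, String> ldevToRaidGroup = new HashMap<>();
    // Cached data keyed by "ldev:block".
    private final Map<String, String> cache = new HashMap<>();
    // Counts how often we had to go to the physical disks.
    private int raidReads = 0;

    void mapLdev(String ldev, String raidGroup) {
        ldevToRaidGroup.put(ldev, raidGroup);
    }

    String read(String ldev, long block) {
        String key = ldev + ":" + block;
        String cached = cache.get(key);
        if (cached != null) {
            return cached;  // cache hit: the disks are not touched
        }
        String group = ldevToRaidGroup.get(ldev);
        if (group == null) {
            throw new IllegalArgumentException("unknown LDEV " + ldev);
        }
        String data = readFromRaidGroup(group, block);  // cache miss: read from the RAID group
        cache.put(key, data);                           // keep it for the next reference
        return data;
    }

    // Stand-in for reading a block from the disks of a RAID group.
    private String readFromRaidGroup(String group, long block) {
        raidReads++;
        return group + "/block" + block;
    }

    int raidReads() {
        return raidReads;
    }

    public static void main(String[] args) {
        LdevReadSketch array = new LdevReadSketch();
        array.mapLdev("LDEV-00:01", "RG-1-1");
        array.read("LDEV-00:01", 42);  // first reference: miss, goes to the RAID group
        array.read("LDEV-00:01", 42);  // second reference: served from cache
        System.out.println("RAID group reads: " + array.raidReads());
    }
}
```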

Friday, June 26, 2009

last.fm

It is the most popular music database site. It builds a music profile for each user by collecting information through its audio player software, which is compatible with most music players. On Linux, Amarok can submit information about the songs you play on its own.
You just need to sign up and install the software. The profile is built completely automatically as you play songs, and the resulting profile is really cool. It also suggests songs you may like, shows your favorite artists and tracks, and can prepare many kinds of charts based on your taste. You can compare your music taste with your friends'. The installed software also shows information about the currently playing track. It keeps track of every song you have played and shows statistics of your played tracks weekly, monthly, yearly and overall. I am a great fan of it.

Audio Players

I have seen many people regularly using video players to play songs. I really don't understand it. Music players are made specifically for listening to songs, so they have special audio-related features you can't get in any video player. For example, they sort your music library and help you find songs faster, not only by name but also by other tags such as genre, artist, album and date. Then there is building your music profile through Last.fm. Last.fm is a really cool site: you just make an account, start listening to songs in your music player, and it does the rest. It builds a complete database of the songs you have listened to. You can compare your taste in music with your friends' and see their preferred artists and songs.
There are a lot of free music players to choose from, so if you don't like one, try another; just avoid using video players.
Choosing an audio player is mostly a matter of personal taste, because almost all the major ones meet the basic criteria: speed, file type support, song ratings, smart playlists, tag editing, searching and displaying by tags, hotkeys and a mini player mode.
Then there are special benefits of specific players, such as Genius (iTunes), podcasts, being lightweight, ease of use and the EQ.
Some of the best audio players, which I prefer:

iTunes: My favorite audio player. It has everything I need.

Amarok: Second best. On Linux I used only Amarok. It is lightweight (generally true for Linux software) and has a lot of features.

Winamp
It has a lot of very good GUI themes. Third best.


Foobar
Lightweight and quick, with low memory usage but fewer interface options and features. Best EQ and best sound quality.

Windows Media Player
There are many funny skins, but the default is the best option, and this time it is really good. That is not so common with Microsoft products; they seem to try hard to make user interfaces that are not so pretty or useful.

B.S. Player
Lightweight, good sound quality, fewer options. Also a good video player with support for many file types.

Tuesday, June 23, 2009

Installation on Windows


Hadoop is made mainly for Linux (:]). On Windows it can be run through Cygwin. You need to install Cygwin with ssh and sshd included. Installing Cygwin is a bit weird: you need to select OpenSSH specifically during installation. If you already have Cygwin installed, you can check whether it includes OpenSSH by typing the command ssh.
Then you need the JDK from Sun. Put the Java folder in your C drive (or any other drive you wish). I am specific about this because if any folder on the path to the JDK has a space in its name, you need to use double quotes or \[space] for that folder name.
Get a stable release of Hadoop and, in conf/hadoop-env.sh, uncomment the JAVA_HOME line and set it to your JDK path:
export JAVA_HOME=path to jdk
For example, if everything is installed in the default location:
export JAVA_HOME=/cygdrive/c/Program\ Files/Java/jdk[version]
Configure hadoop-site.xml
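The conf/hadoop-site.xml file is basically a properties file that lets you configure all sorts of HDFS and MapReduce parameters on a per-machine basis. Each setting is a <property> element with a <name> and a <value> inside the <configuration> element; for a single-machine setup the important ones are fs.default.name (the address of the NameNode), mapred.job.tracker (the address of the JobTracker) and dfs.replication.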


After these changes you need to run the dos2unix command on hadoop-env.sh; otherwise you get an error saying it doesn't recognise the sequence '\r', or something of that sort. This happens because editing the file on Windows gave it Windows line endings (\r\n). Whenever you get an error of this kind, just run:
dos2unix hadoop_folder/conf/*
dos2unix hadoop_folder/bin/*
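What dos2unix fixes here is simple: Windows editors end lines with \r\n, and bash then treats the trailing \r as part of each command. A rough Java equivalent, assuming you only need the CRLF-to-LF conversion (the real dos2unix has many more options), might look like this; the class name CrlfToLf is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper that mimics the basic dos2unix conversion.
class CrlfToLf {
    // Replace every Windows line ending (\r\n) with a Unix one (\n).
    static String toUnix(String text) {
        return text.replace("\r\n", "\n");
    }

    // Rewrite a file in place with Unix line endings.
    // ISO-8859-1 maps each byte to exactly one char and back, so other bytes are preserved.
    static void convertFile(Path file) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.ISO_8859_1);
        Files.write(file, toUnix(text).getBytes(StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            convertFile(Paths.get(arg));  // e.g. conf/hadoop-env.sh
        }
    }
}
```

For example, java CrlfToLf conf/hadoop-env.sh would rewrite that one file in place.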
Now you need to configure ssh in Cygwin
Hadoop uses SSH to allow the master computer(s) in a cluster to start and stop processes on the slave computers. So you need to generate a public-private key pair for your user on each cluster machine and exchange each machine's public key with the other machines in the cluster.
To generate a key pair, open Cygwin and issue the following commands ($> is the command prompt):
$> ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
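Now authorize the key. The first time you ssh to a machine, you are asked whether to add its host key permanently; answer yes. That only records the machine in ~/.ssh/known_hosts, though. Passwordless login needs your public key appended to ~/.ssh/authorized_keys on the machine you log in to, so for the local machine do: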

$> cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Now, you should be able to SSH into your local machine using the following command:
$> ssh localhost


To quit the SSH session and go back to your regular terminal, use:
$> exit

Now that you have public and private key pairs on each machine in your cluster, you need to share your public keys around to permit passwordless login from one machine to the other. Once a machine has a public key, it can safely authenticate a request from a remote machine that is encrypted using the private key that matches that public key.

If you have problems connecting to localhost with the ssh localhost command, the reason may be that the firewall is blocking port 22, which ssh uses.
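To tell a firewall problem from an ssh problem, it can help to test whether anything answers on the port at all. Below is a small Java sketch (the class name PortCheck is hypothetical) that tries a plain TCP connection with a timeout. It says nothing about ssh itself, only whether the port is reachable, so the same check also works for the ports Hadoop uses.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical diagnostic: is anything accepting TCP connections on host:port?
class PortCheck {
    // Returns true if a TCP connection to host:port succeeds within timeoutMs milliseconds.
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;  // refused, timed out, or blocked (for example by a firewall)
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 22;  // 22 = ssh
        boolean ok = isReachable(host, port, 2000);
        System.out.println(host + ":" + port + (ok ? " is reachable" : " is NOT reachable"));
    }
}
```

Run it as java PortCheck localhost 22. A "NOT reachable" result means either nothing is listening (for example sshd is not running) or something such as the firewall is blocking the connection.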

Configure your hosts file: not strictly required if the systems that run Hadoop have static IP addresses, since you can then use the addresses directly.
Open your Windows hosts file, located at c:\windows\system32\drivers\etc\hosts, and add lines like the following:
[master IP address] master
[slave IP address] slave
It already has a line that assigns the IP address 127.0.0.1 to localhost.

You are done! To check it, just create a Java project in Eclipse (Eclipse because you need to export it as a jar file with all classes and dependencies). There is a program named WordCount.java in the Hadoop release; use it as your example project. Add the Hadoop jar to the Eclipse project:
In Eclipse, right-click on your project, go to Build Path, then Add External Archives. Browse to the Hadoop folder and select the file hadoop-[version]-core.jar.
Now export the project as a jar from Eclipse (let us call it WordCount.jar).


To start your cluster, make sure you’re in Cygwin on the master and have changed to your Hadoop installation directory. To fully start your cluster, you’ll need to start DFS first and then MapReduce.
Starting DFS

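If this is the first time you are starting DFS, format the filesystem first (this erases any data already in HDFS):

$> bin/hadoop namenode -format

Then issue the following command:

$> bin/start-dfs.sh

Starting MapReduce

Issue the following command:

$> bin/start-mapred.sh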


Running the job
Launch Cygwin and go to the hadoop-[version] directory.
Put WordCount.jar in your C drive, then create a folder named input there and create a long text file inside it named input.txt.
Copy input files into HDFS
The job reads its input from the Hadoop Distributed File System (DFS), not from your local disk, so the input folder has to be copied there.
To copy the input folder from your local disk into DFS, do the following:

bin/hadoop dfs -copyFromLocal ../input .
. = the current folder (your home directory in DFS); you may use some other location.

Finally, to run the job, execute the following command (input and output are paths in DFS, relative to your DFS home directory):

./bin/hadoop jar ../WordCount.jar WordCount input output

If you have not started Hadoop, or the firewall is preventing Hadoop from using its port, you get an error saying it already tried N times to connect, or that it is not able to connect. In that case do:
bin/stop-dfs.sh
bin/start-dfs.sh
If the problem persists, the port is probably not open.

This will place the result files in a directory called "output" in DFS. You can then copy these files back to your local disk by executing the following (the reverse of what you did for the input files):
bin/hadoop dfs -copyToLocal output output

Note that one output file is produced for each reduce task you run. The WordCount example uses the system-configured number of reduce tasks, so on a cluster do not be surprised to see 10-20 output files (the exact number depends on the number of cluster nodes running and their configuration). You can control this number programmatically via the setNumReduceTasks() method of the JobConf class in the Hadoop API. Refer to the MapReduce tutorial for more details on running MapReduce jobs.
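Which output file a word ends up in is decided by the partitioner. As far as I know, Hadoop's default HashPartitioner computes (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks; the Java sketch below applies the same formula to String keys to show how words are spread over the part files. It is an illustration only: Hadoop hashes its Text keys with their own hashCode, so the actual assignment in a real job will differ, and the class name PartitionSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of how keys are spread over reduce tasks (one part file each).
class PartitionSketch {
    // Same formula as Hadoop's default HashPartitioner; masking the sign bit keeps it non-negative.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Split word counts into one sorted map per reduce task, i.e. one part-NNNNN file each.
    static List<Map<String, Integer>> partition(Map<String, Integer> counts, int numReduceTasks) {
        List<Map<String, Integer>> parts = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            parts.add(new TreeMap<>());
        }
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            parts.get(partitionFor(e.getKey(), numReduceTasks)).put(e.getKey(), e.getValue());
        }
        return parts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : "the quick brown fox jumps over the lazy dog".split(" ")) {
            counts.merge(word, 1, Integer::sum);
        }
        List<Map<String, Integer>> parts = partition(counts, 3);  // like setNumReduceTasks(3)
        for (int i = 0; i < parts.size(); i++) {
            System.out.printf("part-%05d: %s%n", i, parts.get(i));
        }
    }
}
```

Every occurrence of a word goes to the same reduce task, which is why each word appears in exactly one part file, and with a single reduce task everything lands in part-00000.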

When you are finished with the output files, you should delete the output directory. Hadoop will not do this automatically, and it will throw an error if you run the job while an old output directory exists. To do this, execute:

bin/hadoop dfs -rmr output
Retrieve the results
The results are now in a local folder called output. With a single reduce task there should be one file, named part-00000, which lists all the words in your input file along with their occurrence counts.


To stop MapReduce, issue the following command on the master:

$> bin/stop-mapred.sh


To stop DFS, issue the following command on the master:

$> bin/stop-dfs.sh

To stop all Hadoop services, do:
$> bin/stop-all.sh

Problems I faced
The first is related to files being converted to DOS format, which is why you need the dos2unix command. You get an error like command not found '\r'.
The second problem is can't connect to localhost:port_num, already tried (num) time(s).
After running Hadoop successfully the first time, I got this problem again the next day. Again it consumed a lot of time, because ssh localhost was working fine. I also tried the exact IP address in conf/hadoop-site.xml so that the loopback address is not used, but it didn't solve the problem. I don't know exactly how I got past it, because I was just playing around when it started working again. In between I had reformatted the DFS:
$ bin/hadoop namenode -format

Monday, June 22, 2009

Entering Command Line Arguments in Netbeans

All about Eclipse

Where is my Project Explorer window?
If you have closed it, you can open it again through Window -> Show View -> Project Explorer. Many other views are also available there, such as Console, Javadoc and Search.

Relative path in Eclipse
src is not the home folder of a project in Eclipse. By default, Eclipse runs a project with its top-level directory as the working directory, so if you want to use an input file in your project, put it in the main project folder; then you can access it in your code simply by its file name. If you want to put the input file in src, specify src/filename (without a leading slash, which would make it an absolute path).
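A quick way to see which folder relative paths are resolved against is to print the user.dir system property, which java.io.File uses for relative names. In a default Eclipse run configuration this should be the project folder. The class name WorkingDirDemo and the file names are just for illustration.

```java
import java.io.File;

// Hypothetical demo: relative file names are resolved against the working directory (user.dir).
class WorkingDirDemo {
    // Resolve a relative name the same way java.io.File does.
    static File resolve(String relativeName) {
        return new File(relativeName).getAbsoluteFile();
    }

    public static void main(String[] args) {
        System.out.println("Working directory: " + System.getProperty("user.dir"));
        System.out.println("input.txt     -> " + resolve("input.txt"));
        System.out.println("src/input.txt -> " + resolve("src/input.txt"));
    }
}
```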
Export tool
If you have a large (resource-hungry) project to run, this tool is very helpful. Go to File -> Export, choose Runnable JAR file, choose a destination name to store it, click Finish and you are done. You will get a runnable jar file that can run on any system with a JVM. Just do:
java -jar filename.jar
or on Linux, if your system is set up to execute jar files directly (for example through binfmt_misc), do:
chmod 744 filename.jar
./filename.jar
The jar file will start executing. If your program uses other files as input, relative paths are resolved against the directory you run the command from (the working directory), so if you launch it from the folder containing the jar, you can put the other files relative to that folder. Creating jars is a JDK feature (the jar tool), so you can also create a jar file from the command line, just like javadoc.
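If you want input files to be found next to the jar no matter where java -jar is started from, the program can look up its own location instead of relying on the working directory. This is a sketch, assuming the class is loaded from a jar or a classes folder on disk; the class name JarLocation is hypothetical, and because getProtectionDomain().getCodeSource() may report no file location in some environments, the method returns null in that case.

```java
import java.io.File;
import java.net.URISyntaxException;
import java.net.URL;
import java.security.CodeSource;

// Hypothetical helper: find the folder the running jar (or classes folder) lives in.
class JarLocation {
    // Folder containing the jar this class was loaded from, or its classes folder when run
    // from an IDE; null if the JVM does not report a file location.
    static File codeDirectory() {
        CodeSource source = JarLocation.class.getProtectionDomain().getCodeSource();
        URL url = (source == null) ? null : source.getLocation();
        if (url == null) {
            return null;
        }
        try {
            File location = new File(url.toURI());
            return location.isFile() ? location.getParentFile() : location;
        } catch (URISyntaxException | IllegalArgumentException e) {
            return null;  // not a plain file: URL
        }
    }

    public static void main(String[] args) {
        File jarDir = codeDirectory();
        System.out.println("Working directory (relative paths resolve here): " + System.getProperty("user.dir"));
        System.out.println("Folder containing the jar: " + jarDir);
        if (jarDir != null) {
            // Look up an input file next to the jar, wherever `java -jar` was started from.
            System.out.println("input.txt next to the jar: " + new File(jarDir, "input.txt"));
        }
    }
}
```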

Sunday, June 21, 2009

Hadoop

A great open source project by Apache. Using MapReduce, it divides a resource-consuming job across multiple systems. It has a master and slaves: the tracker on the master (the JobTracker) divides the job and assigns the pieces to the slaves. It uses the Hadoop Distributed File System (HDFS) to distribute the data and then processes it in parallel. Yahoo is the biggest user and also the largest contributor to Hadoop. Other users include Facebook, Amazon, Last.fm, IBM, the New York Times, Veoh and Joost.
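To make the MapReduce idea concrete, here is a single-machine Java sketch of what the WordCount job computes: a map step that emits (word, 1) for each token and a reduce step that sums the counts per word. It does not use the Hadoop API at all; there is no distribution across nodes and no HDFS here, and the class name LocalWordCount is hypothetical. Like Hadoop's WordCount example, it splits lines on whitespace with StringTokenizer.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical local simulation of the map and reduce steps of WordCount.
class LocalWordCount {
    // Map step: for one line of input, emit a (word, 1) pair per whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            pairs.add(new SimpleEntry<>(tokens.nextToken(), 1));
        }
        return pairs;
    }

    // Reduce step: group the pairs by word and sum the counts.
    // A TreeMap keeps the words sorted, like the keys in a part-00000 file.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        // On a cluster, different map tasks would handle different splits of the input on different nodes.
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = wordCount(Arrays.asList("hello hadoop", "hello world"));
        counts.forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

Running main prints hadoop 1, hello 2 and world 1 on separate lines, one word and its count per line separated by a tab.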

Netbeans

Developed and maintained by Sun Microsystems. It supports C, C++, Java and Ruby.

Thursday, June 18, 2009

Eclipse

Two years ago I hated using IDEs. I didn't like the suggestions they start giving as you type, or the errors they show on an incomplete line or method. Then there was the package thing and running from the main class, which I didn't understand; actually I was just trying to open files I had already created. So I think IDEs are not for beginners, who most of the time copy-paste or write randomly in the same file.
Although Vi is the best editor, sometimes using an IDE really helps. Eclipse is an open source IDE initiated by IBM, but now it is independent and has a good developer community of its own. Many applications have plugins for Eclipse; I have worked with the Hadoop and Flex plugins. It is also lighter than NetBeans. I am a fan of the export tool: with it you can export a runnable jar file that runs on any other system, which is helpful when you are doing a project that needs a lot of resources. I am nowhere near proficient with it, so I sometimes get confused; it really has a lot of things to help you. I have used it only for Java, but it is also used for Perl, Python, C and C++.

Friday, June 12, 2009

cool shortcuts

One way to access programs faster is to add the folders containing your most-used program executables to PATH. You can do this easily through Environment Variables in My Computer's Properties, under the Advanced tab. After that, just press Win + R (the shortcut for Run), type the name of the executable (which is generally the name of the program) and hit Enter.
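Roughly, the lookup works like this: each folder listed in PATH is searched in order for the name you typed, trying the extensions listed in the PATHEXT variable (.EXE, .BAT and so on). The Java sketch below models only that part; Windows has further lookup rules that it ignores, and the class name PathLookup is hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified model of finding an executable through PATH.
class PathLookup {
    // Return the first dir/name+ext that exists as a file, scanning PATH folders in order.
    static Optional<File> which(String name, String path, List<String> extensions) {
        for (String dir : path.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;  // skip empty PATH entries
            }
            for (String ext : extensions) {
                File candidate = new File(dir, name + ext);
                if (candidate.isFile()) {
                    return Optional.of(candidate);
                }
            }
        }
        return Optional.empty();
    }

    // Extensions to try: the bare name, then each PATHEXT entry (Windows), e.g. ".COM;.EXE;.BAT".
    static List<String> extensionsFromEnv() {
        List<String> exts = new ArrayList<>();
        exts.add("");
        String pathExt = System.getenv("PATHEXT");
        if (pathExt != null) {
            for (String e : pathExt.split(";")) {
                if (!e.isEmpty()) {
                    exts.add(e);
                }
            }
        }
        return exts;
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "notepad";  // program to look up
        String path = System.getenv("PATH");
        Optional<File> found = which(name, path == null ? "" : path, extensionsFromEnv());
        System.out.println(found.map(f -> name + " -> " + f).orElse(name + " not found on PATH"));
    }
}
```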

Alt + Space bar opens the window menu (used to minimize/maximize/close/move/resize)
Alt + Esc to switch between open applications
Win + Pause/Break to open System Properties
Win + F to search the computer
Win + E to open an Explorer window
Alt + Enter shows the properties of the selected item in Explorer
F4 in Explorer displays the address bar list
F6 to cycle through the elements on screen (e.g. from Quick Launch to the taskbar)
Shift + F10 displays the right-click menu
Ctrl + Shift + Esc to open Task Manager
Shift + Tab moves in the opposite direction of Tab
Spacebar checks a box when filling in forms
---------------------
Win + L to lock the computer
F2 to rename a file/folder
Alt + underlined character to open a menu item (if you look carefully, each menu item has an underlined character)
Win + D to show the desktop
Win + M to minimize all windows
Win + Shift + M to restore the minimized windows
Ctrl + N opens another Explorer/browser window with the same address; in Mozilla it is just another window (not with the same address)
The less you use the mouse, the faster you will be. Cool shortcuts look impressive too.

Thursday, June 11, 2009

Win-XP

I really hate using Windows. Is it the viruses or its speed? Maybe I am biased. One reason is expectations, since it gets so much from its users. When something goes wrong on Windows it really gets on my nerves; each second I lose feels like an hour. On Linux things are different, because I enjoy experimenting there. Things always take time on Linux; I have seen people spend days installing some software and end up learning to live without it.
Windows is a necessity because of games. My worst experiences on Windows are setting the PATH variable and deleting files that the operating system claims are being used by something, even right after a restart. Then comes the speed: if you don't format/defragment the disk or clean the registry, it becomes really slow after 5 to 6 months of use, and searching, starting up and shutting down take a lot of time. Actually, Windows helps you here by crashing at regular intervals.
Once I spent 2 hours trying to run a Java program that uses the Oracle JDBC driver; I had set it in the path variable but had not restarted the DOS window (my fault :P).
Sometimes it decides not to end a program, no matter what utility you use. Then there are problems deleting, and sometimes copying, files/folders that it claims are being used. Damn... I have shut down all processes except system-critical ones and it is still being used. Once a friend came to me with a folder on his desktop that he was not able to delete because Windows said the path did not exist. The folder was empty and it was on his desktop. We tried some third-party software to delete it, which after restarting the computer said the same thing (no folder with this path). Finally he had to reinstall Vista.
Then there is the amazing, so-called user-friendly user interface! And the icons in shell32. The OS comes with ridiculous themes that consume considerable resources; the old classic theme is still the best. If you are a Mac user, it looks like an ages-old interface. Microsoft is successful only because people are stuck with it and some good software, especially games, runs only on its OS.