Tuesday, November 30, 2010

Jajuk

Writing a music player for the lazy is an overwhelmingly tough job. Lazy people expect all kinds of automatic stuff: play by genre, ambience, mood, ratings. Handle untagged songs nicely instead of dumping them under one "Unknown" label. Automatically tag songs whenever possible. Automatically fetch lyrics, covers and wiki information. Detect which songs the user doesn't like and play them less and less. File-tree view, album-tree view, artist-tree view. Automatic purging of duplicates. Music discovery via similar artists or people-who-liked-this-also-liked-this. Internet radio for the ones who are bored of their collection. And some ego-boosting features for music junkies, like all kinds of graphs and statistics about your collection and the genres you like the most.

Overall, I have way too many expectations of a music player. It is more like a music manager for me. To date I hadn't gotten over my first love, Amarok-1.4, completely. I didn't like the direction Amarok-1.9 and 2.0 took, so I dumped Amarok long ago. iTunes and all its Linux clones are altogether disappointing too, so do NOT mention them. Finally, today I came across this Java-based player called "Jajuk". I have to say it is the player I have liked most since Amarok-1.4. It has brilliant features and an amazingly intuitive interface which somehow hides all the complexity behind the features it provides. There is just an endless list of features, and they actually work! It is my new default music player.

There are some catches though:
1) Being Java-based, it is heavy. Once I got used to it though, it is not a problem.
2) Global hotkeys are not supported. Now this is a *must* have. I have filed a feature request. Let's see whether it gets traction :-D.

Give Jajuk a try. It is good.
Find it here: http://jajuk.info/index.php/Main_Page

Monday, November 22, 2010

Traffic Shaping

Today was a rather weird day and I had a searing pain in my head, so I decided on a suitably masochistic thing to learn: traffic shaping. Since I am working on a project that needs exactly this, I reckoned it would be helpful too.

I am sure most of you have come across this term. In case you haven't, the wiki page here provides a good introduction. Traffic shaping is mostly used by ISPs and people who manage routers. It is a way to give priority to certain users or certain types of data. The term "giving priority" hugely understates the complexity of the task: e.g. real-time streaming values timely delivery, whereas other traffic might tolerate some delay. In general, once traffic shaping gets near QoS, it becomes a difficult problem to solve.

However, here I am going to rant about the ISP aspect of traffic shaping (i.e. allocating bandwidth per user, per service, etc). "tc" is a very powerful Linux command for implementing traffic shaping. I won't attempt a full tutorial because tc is quite complex; apart from a couple of small sketches below, I will (if anyone ever ends up using this post, which I strongly suspect is going to be my future self) give out links to the tutorials I used for my setup.

Some introduction to tc is in order. With tc you can create queues per device, multiple queues even. You can choose which queueing discipline to use; the queueing discipline determines how the queue is managed. You can have a standard FIFO queue, a Token Bucket Filter (neatly manages the bandwidth difference between two lines) or Stochastic Fair Queueing (ensures fairness). Queues can be defined in a hierarchy and they can inherit each other's properties. E.g. I might define a queue for www traffic that gets 40% of the bandwidth, with the other 60% for everything else. I can then define a queue for ssh as a child of the "other" queue; this means the ssh queue can borrow bandwidth from its parent and go beyond its own limit to occupy up to 60% of the bandwidth. This way you guarantee a minimum service. On routers, each queue can represent a customer or a chunk of customers; high-paying customers get a bigger share of the bandwidth. After we are done defining these classes of traffic, the next step is to define rules saying which traffic belongs to which class. tc provides a wide variety of filters for the purpose: you can match on almost any of the TCP or IP headers, on which interface the packet comes from, etc. For all practical purposes, this set of filters proves to be sufficient.
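To make that concrete, here is a minimal HTB sketch of the www/ssh example above. This is only a sketch: it assumes an eth0 device and a 1mbit link, the class IDs and rates are illustrative, and ssh is modelled as a sibling class with a 600kbit ceiling rather than a nested child (HTB serves packets only from leaf classes, so the ceiling plays the role of the borrowing cap).

# root qdisc: HTB, unclassified traffic goes to class 1:20
tc qdisc add dev eth0 root handle 1: htb default 20
# parent class capped at the link rate
tc class add dev eth0 parent 1: classid 1:1 htb rate 1mbit
# www: guaranteed 40% of the link
tc class add dev eth0 parent 1:1 classid 1:10 htb rate 400kbit
# everything else: guaranteed 500kbit, may use up to 60%
tc class add dev eth0 parent 1:1 classid 1:20 htb rate 500kbit ceil 600kbit
# ssh: guaranteed 100kbit, may borrow up to the full 60%
tc class add dev eth0 parent 1:1 classid 1:21 htb rate 100kbit ceil 600kbit
# filters: port 80 traffic to the www class, port 22 to the ssh class
tc filter add dev eth0 parent 1: protocol ip u32 match ip dport 80 0xffff flowid 1:10
tc filter add dev eth0 parent 1: protocol ip u32 match ip dport 22 0xffff flowid 1:21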

Now that we know how tc works, let's see what tc can do:
1) Provide only a defined amount of bandwidth to a particular user (write a filter based on IP address; see the filter line after this list)
2) Provide a defined amount of bandwidth to a particular service (write a filter based on port number)
3) Enable flexibility, i.e. if there is a burst of traffic in one class, accommodate the burst instead of enforcing hard boundaries (the burst and cburst parameters control that)
4) Accept rates specified as a percentage of the total as well as absolute values like 8kbit, etc.
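For instance, item 1 maps onto a u32 filter on the source address, continuing the sketch above (the host IP is a placeholder):

# steer one host's traffic (matched by source IP) into a capped class
tc filter add dev eth0 parent 1: protocol ip u32 match ip src 192.168.1.42/32 flowid 1:10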

An addition to tc can do even more wonderful stuff. It is called "netem", which stands for network emulator. netem has tunable knobs for just about any parameter you can conceive of to help you simulate a WAN. Here is what we tried today (a few representative commands follow the list):
1) Simulate delays. If you are simulating a WAN at home, the major issue is that delays don't get simulated. With tc combined with netem, you can introduce fake delays. You can even vary the delay about a point, e.g. you can have it probabilistically vary the delay as 100ms +- 10ms, and you can even choose the probability distribution by which it randomises the delay.
2) Simulate losses. In a WAN, losses are inevitable, whether due to network congestion or corruption. netem with tc can simulate both. You can simulate percentage losses, losses in bursts, losses following a specific probability distribution, losses following a particular pattern. Anything under the sun.
3) Packet duplication, packet corruption, packet reordering.
4) Introduce latency on only a defined type of traffic (to test QoS).
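A few representative netem invocations. Again just a sketch: it assumes eth0, and the numbers are illustrative.

# 100ms delay with 10ms normally-distributed jitter
tc qdisc add dev eth0 root netem delay 100ms 10ms distribution normal
# replace with 1% loss, each loss 25% correlated with the previous one
tc qdisc change dev eth0 root netem loss 1% 25%
# duplicate and corrupt a small fraction of packets
tc qdisc change dev eth0 root netem duplicate 0.1% corrupt 0.05%
# reordering: 25% of packets jump ahead of the 10ms delay queue
tc qdisc change dev eth0 root netem delay 10ms reorder 25% 50%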

Anyone who knows a little about networks realises how difficult each one of these is to implement. Tuning all these knobs appropriately can give you an awesome simulation of the internet. I am working on a project which needs me to simulate network congestion at home; with a line-speed router and a speed of 100mbps, it is really tough to create congestion. We used the tc command exclusively (with loss and corruption values derived from actual studies) to successfully simulate the internet at home, and we now get a very good TCP congestion window graph. The things that tc can do are pretty amazing. Unfortunately, tc is a complex command and needs a lot of background knowledge before you can start using it. Fret not. Here are links to some tutorials that can get you going:

The classic TLDP tutorial
HTB tutorial: contains some excellent explanations and practical examples with commands.
netem Introduction

Learning tc is a process and the end result is pretty satisfying. \m/ to the author of tc.

Tuesday, October 26, 2010

gdb-heap

The newer gdb (gdb 7 and above) supports memory debugging.
Quoting from http://fedoraproject.org/wiki/Features/MemoryDebuggingTools

"The new "gdb-heap" package adds a new "heap" command to /usr/bin/gdb.

The command allows you to get a breakdown of how that process is using
dynamic memory.

It allows for unplanned memory usage debugging: if a process
unexpectedly starts using large amounts of memory you can attach to it
with gdb, and use the heap command to figure out where the memory is
going. You should also be able to use it on core dumps.

We believe this approach is entirely new, and is unique to Fedora 14. "


Sounds promising!
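Going by that description, using it should be as simple as this (a sketch; 1234 is a placeholder PID):

$ gdb -p 1234        # attach to the process that is unexpectedly eating memory
(gdb) heap           # get the breakdown of its dynamic memory usage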

Saturday, October 2, 2010

Writing cache efficient programs

Just recently I discovered how much effect cache misses can have on the running time of your program! It's truly amazing how much you can save, yet I had ignored it for more than 4 years.

Consider the matrix multiplication program again


for (i = 0; i < MAX_NUM; i++) {
    for (j = 0; j < MAX_NUM; j++) {
        for (k = 0; k < MAX_NUM; k++)
            Y[i][j] += A[i][k] * B[k][j];
    }
}


Note that on a cache miss (depending on your cache properties), you fetch 32 bytes of consecutive data from main memory. Now look at the B[][] array in the program: we are accessing it column-wise. Once we access B[0][0], we fetch B[0][0] to B[0][7] (eight 4-byte elements), but we only end up using B[0][0] before jumping to the next row. Only 12.5% cache usage. REALLY bad!

When MAX_NUM = 1000, this code takes ~6.778s to run.

Now, how can we optimise this for good cache usage? One method is to transpose the B[][] matrix and use the transpose in the multiplication (transpose = interchange rows and columns). This way, we access B[][] by its rows and improve cache usage.

New code will look something like this (For the moment, ignore the time taken to transpose the matrix. We'll come to it later):


for (i = 0; i < MAX_NUM; i++) {
    for (j = 0; j < MAX_NUM; j++) {
        for (k = 0; k < MAX_NUM; k++)
            Y[i][j] += A[i][k] * B[j][k];
    }
}


Notice how we access B[][] row-wise now. Our cache-efficient version takes 5.629s for a 1000x1000 matrix.
So how much did we save? We saved 16.9% of the execution time!! Isn't that cool?
17% savings is too much to ignore!!

Coming back to the transpose of the matrix, there are multiple solutions to this problem:
1) Store the matrix as its transpose right from the start. Then you don't have to do that operation before a multiplication. Neat solution.
2) There are cache-efficient ways to transpose a matrix which take less than 0.15s for a 1000x1000 matrix. You still end up with ~12% savings! I'll try to write a post on that someday too.

Of course this is a very crude way to use the cache efficiently, and it can be optimised further for even lower execution times. There is a technique called the "blocking factor" which I am still learning. I never used to think about cache hits/misses when writing a program. For 17% savings, I certainly should!

Combined with my earlier post on parallelizing a program, you could effectively have double the cache size (assuming dual core) and hence even more savings!

Saturday, September 25, 2010

Geek Saturday (Rants)

Fixed three irritating problems on my Fedora-13 laptop.

1) While the screen is fading out to lock, mouse/keyboard input doesn't stop the fade, and you have to go and enter the password to unlock the screen.
https://bugzilla.redhat.com/show_bug.cgi?id=612620
The latest xorg-x11-server update fixed the problem. Simple enough!

2) Inserting headphones in the audio jack doesn't mute the speakers.
https://bugzilla.redhat.com/show_bug.cgi?id=623632
After some grepping in /var/log/messages and some googling, I found out that my kernel couldn't detect the model of my audio card. The simple fix was to add
<code>
options snd_hda_intel model=quanta
</code>
to /etc/modprobe.d/dist-alsa.conf to force the snd_hda_intel module to use the given model.

3) Flash videos don't work in fullscreen.
https://bugzilla.redhat.com/show_bug.cgi?id=608957
The simple fix was this:
<code>
mkdir -p /etc/adobe
echo "OverrideGPUValidation=1" > /etc/adobe/mms.cfg
</code>

There is just one irritating problem left. When I shut down Fedora with windows open on various desktops, Fedora should remember which desktop each window was open on. However, when I boot back up, all the windows come back to desktop 1. Hmm, couldn't figure out a solution to that yet :-(. But it was low priority anyway. All the major problems are now resolved :-). Peace!

Friday, September 24, 2010

On paranoia

This graduation thing is making me paranoid. First I semantically obfuscated my email addresses everywhere, and now I have just turned off cookies in my browser after reading this: http://www.cookiecentral.com/. Gmail and most sites which need some sort of user identification of course use cookies, so you have to go and selectively allow cookies for certain sites. But after reading how doubleclick.net uses cookies, I really don't mind the extra effort.

After my MS, I will probably be so paranoid about security that I'll even block ssh connections to my machine, as Linus does :-(. Ignorance is such bliss!

Wednesday, September 15, 2010

Writing parallel programs with C and gcc

I just found a cool gcc feature: gcc implements the OpenMP specification. OpenMP specifies how to write shared-memory parallel programs in various languages. This got me interested, so I took the best parallelisable example: matrix multiplication. Matrix multiplication is an O(n^3) algorithm. The loop structure of a matrix multiplication program looks like this:


for (i = 0; i < MAX_NUM; i++) {
    for (j = 0; j < MAX_NUM; j++) {
        for (k = 0; k < MAX_NUM; k++)
            Y[i][j] += A[i][k] * B[k][j];
    }
}


The i and j loops are parallel. This means that given a unique (i, j) combination, the computations in the loop body do not depend on any previously computed values of the multiplication, i.e. each matrix element can be computed independently.

Let's get our hands dirty and make the i loop parallel. I have chosen the matrix size to be 1000x1000 so that we can see the speed-up clearly.

Now, all we have to do is add one #pragma line before the i loop. Here is what the parallel code looks like:


#pragma omp parallel for default(shared) private(i, j, k)
for (i = 0; i < MAX_NUM; i++) {
    for (j = 0; j < MAX_NUM; j++) {
        for (k = 0; k < MAX_NUM; k++)
            Y[i][j] += A[i][k] * B[k][j];
    }
}


Don't worry about the #pragma statement. These are actually quite simple to figure out and a 5-minute job to learn from here: http://www.openmp.org/mp-documents/cspec20.pdf

Note that we haven't specified how many parallel tasks to fork. This is specified via the environment variable OMP_NUM_THREADS, which allows great flexibility in choosing how many threads to use without recompiling the program.

To enable OpenMP, use the "-fopenmp" switch with gcc:

$ gcc -o parallel -fopenmp parallel.c


Here are some test-runs which clearly show the speed-up:
Only one thread (serial program)

$ export OMP_NUM_THREADS=1
$ time ./parallel

real    0m13.633s
user    0m13.558s
sys    0m0.028s


Two threads

$ export OMP_NUM_THREADS=2
$ time ./parallel

real    0m8.144s
user    0m13.963s
sys    0m0.035s


Four threads

$ export OMP_NUM_THREADS=4
$ time ./parallel

real    0m7.960s
user    0m13.815s
sys    0m0.037s


Notice the speed-up?
With one thread, we have an execution time of 13.6s. With two threads it becomes 8.144s, which is almost half. And with four threads it is 7.9s. Now what happened in the case of 4 threads? Didn't we expect 1/4th the time? Well, mine is only a dual-core PC, so the maximum speed-up for a CPU-intensive task is reached when the number of parallel tasks = the number of cores/threads = 2.

You can see the running parallel threads with

$ ps -Haux | grep parallel
jitesh    7240 26.0  0.0  44536  1164 pts/0    Rl+  20:10   0:00 ./parallel
jitesh    7240 25.5  0.0  44536  1164 pts/0    Rl+  20:10   0:00 ./parallel
jitesh    7240 27.5  0.0  44536  1164 pts/0    Rl+  20:10   0:00 ./parallel
jitesh    7240 27.5  0.0  44536  1164 pts/0    Rl+  20:10   0:00 ./parallel


The "-H" switch displays the threads.


Cool, isn't it?
Thus: a parallel program with minimal effort and LOTS of time saved. We reduced the execution time of matrix multiplication to half with proper parallelism. Imagine how parallel programs will perform on a 16-core machine with native support for threads!

gcc and OpenMP FTW!

Note: If you are interested in learning OpenMP, see the spec links I have posted in the body of the blog.

Monday, September 6, 2010

A note on email address obfuscation

Email address obfuscation:
When you put your email address on the internet (e.g. jitesh@example.com), there is a chance that your address will be mined for spamming. Spam-bots crawl the internet day and night to mine email addresses. Now, you can do something to prevent a spam-bot from reading your email address. The key factor here is that an automated BOT is going to scan the pages, so it should be easy to fool. Or is it?

A very common form of obfuscation is to spell out the special characters, e.g. jitesh AT example DOT com. Note that this is in fact a very weak form of obfuscation: you are simply changing the syntax of writing your email address, so the bot-writer just has to add one more grammar to his list of rules and he is done. Of course, bot-writers are not so stupid as to miss this simple change.
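To see how cheap that extra rule is, here is a sketch of the single extra pattern a bot would need (illustrative only, not a real harvester):

# matches both jitesh@example.com and jitesh AT example DOT com
grep -Eio '[a-z0-9._%+-]+ ?(@|at) ?[a-z0-9-]+ ?(\.|dot) ?[a-z]+' page.html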

What you need is a semantic change, one that is impossible (or hard enough to be impractical) for a bot to infer, but which a human can handle easily. E.g. I might obfuscate my address as: my-first-name AT example DOT com. Note that "my-first-name" is a semantic change: only a human can infer that it is to be replaced by "jitesh". This is REAL obfuscation.

So guys, if you have a non-Gmail account with a sucky spam filter, make your obfuscation stronger and do NOT underestimate spammers :-)

Wednesday, September 1, 2010

Loving LaTeX

I've been learning LaTeX for a while now and it is *awesome* (for the uninitiated, LaTeX is a language used for typesetting; you can print all sorts of mathematical and scientific symbols with it). Here is the result of my experiments. Pretty neat, eh?
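For a taste of the source side, this is roughly what a minimal LaTeX document looks like (a sketch; not the source of the result below):

\documentclass{article}
\begin{document}
Euler's identity: $e^{i\pi} + 1 = 0$
\end{document}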

\m/

 
From: torvalds@klaava.Helsinki.FI (Linus Benedict Torvalds)
Newsgroups: comp.os.minix
Subject: Free minix-like kernel sources for 386-AT
Message-ID: <1991Oct5.054106.4647@klaava.Helsinki.FI>
Date: 5 Oct 91 05:41:06 GMT
Organization: University of Helsinki


Do you pine for the nice days of minix-1.1, when men were men and wrote their own device drivers? Are you without a nice project and just dying to cut your teeth on a OS you can try to modify for your needs? Are you finding it frustrating when everything works on minix? No more all-nighters to get a nifty program working? Then this post might be just for you :-) 

As I mentioned a month(?) ago, I'm working on a free version of a minix-lookalike for AT-386 computers. It has finally reached the stage where it's even usable (though may not be depending on what you want), and I am willing to put out the sources for wider distribution. 

It is just version 0.02 (+1 (very small) patch already), but I've successfully run bash/gcc/gnu-make/gnu-sed/compress etc under it.

Saturday, May 8, 2010

Completing the Text Processing Arsenal

For months and months, I have been procrastinating on learning the last tool in my text processing arsenal: awk. After a particularly uneventful afternoon, I decided today was the day. Sleeves rolled up, chips by the table, all sources of interference disposed of, I sat down and finally put the last piece of the jigsaw in place. Today I rant about my text processing arsenal.

Here is a list of text processing commands/scripts available at your disposal in Linux and how you can combine them to serve your purpose.

1) The very basics: echo, cat, less (useful for looking through large logs or files that don't fit on one screen), more (less is really better), head (prints the first few lines of a file; can be a very good help in scripts), tail (with the -f switch, useful for watching logs in real time as they are generated)

2) Some more basics: grep. grep is oxygen. I use it all the time! I suggest you really go through that linked tutorial and thoroughly learn grep. It has some nice tools in its toolbox (like, umm, the -v switch which inverts the match, etc). I have also written a tutorial on grep earlier.

3) sort, uniq: I find both of them terribly useful for handling lists and looping the same action over a list.
eg.
<code>
# cat package_list | grep "^java-" | sort | uniq | xargs -n 1 build_package.sh
</code>
This command takes a list of packages, picks out all the Java packages, eliminates duplicates and schedules a build for each.

4) All the above commands mainly encompass reading or searching through already available data. What about removing/selecting only parts of lines, or translating one character to another (like lower-case characters to upper case, etc)? Not to worry: cut and tr to the rescue. cut can remove parts of a line (by defining delimiters and selecting fields) and I use tr mostly to convert lower case to upper case or squeeze whitespace. More on that here; a quick sketch follows.
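For instance, something like this (illustrative only):
<code>
# usernames (field 1 of /etc/passwd), upper-cased
cut -d: -f1 /etc/passwd | tr 'a-z' 'A-Z'
</code>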

5) Getting your hands really dirty. All of the above commands really enhance your ability to mine data from a file: grep-ping interesting lines using grep, selecting only parts of them using cut, making them presentable using tr, etc. But what about actual editing in shell scripts? Something like find and replace? Well, *lightning and thunder*, sed is here! sed (which stands for stream editor) has saved me a lot of time. The very essence of sed is search and replace, and it offers VERY advanced search patterns. You can specify regular expressions, search only a part of a file, replace the first or all occurrences and a lot more. A VERY awesome tutorial is here. After grep, sed is the most useful tool I know.
eg. Converting fs_get to xfs_get in an entire source tree was never easier than this:
<code>
# find . -name "*.[cChH]" | xargs sed -i 's/fs_get/xfs_get/g'
</code>
Done!

6) All this is fine, but what about the more advanced data mining and report generation tasks? awk is here. awk can process tables and columns of data. I mostly use awk to easily select and reformat text.
eg. If you look at your /var/log/messages file, the 5th field is always the process name. Suppose I wanted to find out how many times ntpd synced to the latest time. I can run
<code>
# cat /var/log/messages | awk '$5 ~ /^ntpd/' | grep "kernel time sync" | wc -l
</code>
Don't be intimidated by the awk syntax, it's really simple. Just invest 10 minutes in reading the tutorial here.

7) Lastly, the most insignificant command, one I have never used in my life: join. Does exactly what a database join does.

So, this is my text processing arsenal. With echo, cat, grep, sed and awk I could even write a mini-database; it would have the worst performance ever, but it would work just fine nevertheless. Insert, delete, edit, query and join can all be implemented using just these commands. Between them, they cover almost all text processing requirements.

Finally, I would like to add some more commands that I rarely use, that are on my wishlist, or that can be useful for my readers.

8) dos2unix, unix2dos: DOS likes to write a newline as "\r\n"; UNIX prefers "\n". This can lead to really interesting issues in Makefiles, etc. when files are written on Windows and executed on Linux. The above two utilities convert from one format to the other. iconv is a generic converter between hundreds of character encodings.

9) gettext: This is on my wishlist, but I don't see myself learning to use gettext anywhere in the near future. gettext is used to localise a program into your own language or a foreign language.


That's all for now, folks. Comments/feedback appreciated.



Friday, February 19, 2010

MP3 tagger

After trying a host of automatic tagger tools (EasyTag, Kid3, pytagger, id3v2, iTunes itself, etc.), I've found the perfect auto-tagging tool.

Picard by MusicBrainz. And the awesomeness is that it is GPL :-)

There is MusicBrainz integration available in Amarok, iTunes, Rhythmbox, Banshee, etc., but none of those tools has as awesome and simple an interface as Picard! The other tools, namely EasyTag, Kid3, pytagger and id3v2, use the CDDB database blindly. CDDB doesn't allow queries by track name, only by album name and artist. That is an inconvenience. Also, you cannot cluster multiple mp3 files together and query the database for them as one unit, so each file has to be queried manually. More inconvenience.
Picard to the rescue! It has all those niceties, plus the awesome (open and free to download) MusicBrainz database to back it up, and a very, very intuitive and simple UI!

Hail Picard!

Tuesday, January 19, 2010

UBI / UBIFS on NANDSIM simulator

NAND simulator (NANDSIM) is an extremely useful debugging and development tool which simulates NAND flashes in RAM or in a file. NANDSIM can emulate various errors and report wear statistics, which is extremely useful when testing how flash software handles errors.

Steps involved in working with NANDSIM:

1) We work as the superuser.
aruna@narsil:~$ sudo su
[sudo] password for aruna:

2) Now we load the NANDSIM module. That is, we create a virtual raw flash device.
The parameters are as follows:
first_id_byte: The first byte returned by the NAND Flash 'read ID' command (manufacturer ID).

second_id_byte: The second byte returned by the NAND Flash 'read ID' command (chip ID). The entire table of chip IDs is given in the file nand_ids.c in the source code given here.

third_id_byte and fourth_id_byte are optional parameters which are initialised to 0xff (empty) by the system if not specified by the user. They are the third and fourth ID bytes returned by the READ ID command.

By default NANDSIM uses RAM but if you do not have enough RAM, you can make it emulate the flash on top of a file using the cache_file nandsim module parameter.

We create a 256MiB emulated NAND flash with a 2KiB NAND page size:

root@narsil:/home/aruna# modprobe nandsim first_id_byte=0x20 second_id_byte=0xaa third_id_byte=0x00 fourth_id_byte=0x15

root@narsil:/home/aruna# cat /proc/mtd
dev: size erasesize name
mtd0: 10000000 00020000 "NAND simulator partition 0"

3) MTD is not LDM-enabled and udev does not create MTD device nodes automatically, so we create /dev/mtd0 ourselves:

root@narsil:/home/aruna# mknod /dev/mtd0 c 90 0

4) Now we attach UBI to our MTD device. (This can also be done with the ubiattach utility.)

root@narsil:/home/aruna# modprobe ubi mtd=0

5) We create a volume on our newly attached UBI device.

root@narsil:/home/aruna# ubimkvol /dev/ubi0 -N myvolume -s 200MiB

Here we have:
Volume ID 0, size 1626 LEBs (209793024 bytes, 200.1 MiB), LEB size 129024 bytes (126.0 KiB), dynamic, name "myvolume", alignment 1

root@narsil:/home/aruna# cat /proc/mtd
dev: size erasesize name
mtd0: 10000000 00020000 "NAND simulator partition 0"
mtd1: 0c813000 0001f800 "myvolume"
Here we note that the erasesize shown for mtd0 (the raw flash) is the size of the physical eraseblock, which is 128KiB, but the size shown for mtd1 is the size of the logical eraseblock. For each PEB, space is required to store two headers: one to keep track of the erase counters and the other for the logical-to-physical mapping. This space is not included in the LEB; hence the size of the LEB is 126KiB.

6) We mount UBIFS on our UBI volume and copy a file to it.

root@narsil:/home/aruna# mkdir /mnt/ubifs
root@narsil:/home/aruna# mount -t ubifs ubi0:myvolume /mnt/ubifs
root@narsil:/home/aruna# cp /home/aruna/5.l /mnt/ubifs

7) Now we wish to create an image of our data, so that we don't have to start from an empty device the next time we use our flash.

This can be done in two steps. First we use a tool called mkfs.ubifs, which creates a UBIFS image of our data. We can stop here; then, for the next use, we again create an MTD device, attach UBI to it, create a UBI volume and use this image to recover our data.

The second step is to use the ubinize tool on the image created by mkfs.ubifs. This creates a UBI image (which includes data about the UBI volumes). In this case, on the next use, we create an MTD device and then flash the image created by ubinize. After this we attach UBI and mount UBIFS. (This is the better method, as we do not have to populate an empty UBI volume every time.)

Both mkfs.ubifs and ubinize are part of mtd-utils, hence we have to sort out the dependencies before using them.

8) On Fedora, install the zlib-devel, lzo-devel and e2fsprogs-devel packages; on Debian, install zlib1g-dev, liblzo2-dev and uuid-dev.

9) The git repository of mtd-utils is available at git://git.infradead.org/mtd-utils.git
We use:
root@narsil:/home/aruna# git clone git://git.infradead.org/mtd-utils.git

10) The clone command creates a new directory named mtd-utils. We cd into this directory, where we find another directory called ubi-utils. In this directory:

root@narsil:/home/aruna/mtd-utils/ubi-utils# make install

This is to be done before step 11, else the libubi library is not available for mkfs.ubifs.

11) Now, back in mtd-utils:

root@narsil:/home/aruna/mtd-utils# make install

12) Now that our utilities are properly installed, we create the UBIFS image first. In the folder mkfs.ubifs:

root@narsil:/home/aruna/mtd-utils/mkfs.ubifs# ./mkfs.ubifs -r /mnt/ubifs -m 2048 -e 129024 -c 2047 -o ubifs.img

where
/mnt/ubifs is the data whose image is to be created
-m 2048 indicates that the minimum input/output unit size of the flash this UBIFS image is created for is 2048 bytes (the NAND page size in this case)
-e 129024 is the logical eraseblock size of the UBI volume this image is created for
-c 2047 specifies the maximum file-system size in logical eraseblocks; this means it will be possible to use the resulting file system on volumes up to this size, so in this particular case the resulting FS may be put on volumes up to about 251MiB (129024 x 2047)

13) Copy the created file ubifs.img to the ubi-utils folder. Then create another file called ubinize.cfg:

root@narsil:/home/aruna/mtd-utils/ubi-utils# cat ubinize.cfg
[ubifs]
mode=ubi
image=ubifs.img
vol_id=0
vol_size=200MiB
vol_type=dynamic
vol_name=myvolume
vol_flags=autoresize

Now we create the UBI image:

root@narsil:/home/aruna/mtd-utils/ubi-utils# ubinize -o ubi.img -m 2048 -p 128KiB -s 512 ubinize.cfg

where
-p 128KiB indicates that the physical eraseblock size of the flash chip the UBI image is created for is 128KiB (128 x 1024 bytes)
-s 512 indicates that the flash supports sub-pages and the sub-page size is 512 bytes

Now we have the ubi.img image saved on persistent storage.

14) We can unmount the file system and our virtual device.
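Something like this should do it (a sketch; assumes nothing else is using the mount point or the modules):

umount /mnt/ubifs
modprobe -r ubifs
modprobe -r ubi
modprobe -r nandsim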

15) The next time we need to use the device, we follow only these steps:

a) Create the device:

aruna@narsil:~$ sudo su
[sudo] password for aruna:
root@narsil:/home/aruna# modprobe nandsim first_id_byte=0x20 second_id_byte=0xaa third_id_byte=0x00 fourth_id_byte=0x15
root@narsil:/home/aruna# mknod /dev/mtd0 c 90 0

b) Load the image onto it:

root@narsil:/home/aruna/mtd-utils/ubi-utils# dd if=ubi.img of=/dev/mtd0 bs=2048
1024+0 records in
1024+0 records out
2097152 bytes (2.1 MB) copied, 0.0220489 s, 95.1 MB/s

c) Attach UBI to it:

root@narsil:/home/aruna/mtd-utils/ubi-utils# modprobe ubi mtd=0
root@narsil:/home/aruna/mtd-utils/ubi-utils# cat /proc/mtd
dev: size erasesize name
mtd0: 10000000 00020000 "NAND simulator partition 0"
mtd1: 0f90c000 0001f800 "myvolume"

d) Mount UBIFS on our volume:

root@narsil:/home/aruna/mtd-utils/ubi-utils# mount -t ubifs ubi0:myvolume /mnt/ubifs

We can check that the file we copied here before creating the image is present:

root@narsil:/home/aruna/mtd-utils/ubi-utils# ls /mnt/ubifs
5.l


Happy reading! :)

git-perforce interface

The best thing I found out and learned today was the git interface for perforce repositories. There are many things about perforce that make me cringe:
1) No support for local commits. Limited support is provided by distinct changelists, but that only works if a non-intersecting set of files is being modified in both commits, which is rarely the case. This usually means you cannot commit small logical chunks of code, and sometimes you have to submit huge code changes (which is not recommended).
2) No rollbacks or editing of commits.
3) Limited branching support (e.g. I don't have permission to branch on the repository, whereas with git I can have infinite local branches).
4) No way to send out patches that others can apply (perforce patches cannot be applied).
5) Having to manually check out each file I am going to edit. This is a major pain.

So, I set out looking for some kind of git-perforce inter-conversion software and found that git itself provides the required scripts. My internet is screwed, so I cannot provide a full link as of now, but you'll find git's git repository at git-scm.com. Go to contrib/fast-import; under that you'll find both the documentation and the script. Unfortunately, Fedora decided not to package it.

With that in place, I was able to use my perforce repository as a git repository: commit local changes, edit commits, send out patches for each commit separately, and all the cool stuff git has to offer. (Note that the perforce repository can be seen as the remote branch p4/master.) git-p4 assigns a changelist id to each commit, thus mapping perforce commits to git commits. At the end, just call "git-p4 rebase" (to accept the latest changes) and then "git-p4 submit" to submit all changes to the perforce repository.
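The basic loop looks something like this (a sketch; the depot path is a placeholder):

# clone a perforce depot path into a new git repository
git-p4 clone //depot/project@all
cd project
# hack away, committing small logical chunks locally
git commit -am "one small logical change"
# pull in the latest perforce changes and rebase local work on top
git-p4 rebase
# push each local commit back to perforce as a separate changelist
git-p4 submit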

Another hassle I got rid of was having to manually check out every file I want to edit in perforce. With git, that's not necessary.

I sign-off as a happy user!

Update:
Here is the link I was talking about.
http://repo.or.cz/w/git.git/tree/ff6d26a0e1d8fad775010fa3b689c0c027da8bb0:/contrib/fast-import