Wednesday, December 3, 2008

Diff and Patch

Patches show the difference between two versions of the same file.
A patch shows which lines were newly added and which lines were removed. What makes a patch very useful is that a patch can be applied to the old version of the file to obtain a new version.
In other words: (old version) + (patch) = (new version)

This makes it easy to manage community software, where the code is open source and anyone can submit changes. Trust me on this: without patches we would be nowhere.
For example, Andrew Morton (a Linux kernel maintainer) gets hundreds of patches every day. Without patches, he would have to integrate each and every change by hand, which would be a VERY time-consuming and error-prone process. Patches automate this job.

"diff" is used to create patches and "patch" is used to apply them.
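A minimal sketch of that round trip, assuming GNU diff and GNU patch in a Unix-like shell. The file names and contents are hypothetical, made up just for this illustration. The point is that old version + patch gives back exactly the new version.

```shell
# Round trip: create a patch with diff, apply it with patch.
# Assumes GNU diff and GNU patch; file names and contents are hypothetical.
cd "$(mktemp -d)"

printf 'int main(void)\n{\n\treturn 1;\n}\n' > old_test.c
printf 'int main(void)\n{\n\treturn 0;\n}\n' > new_test.c

# Create the patch. diff exits with status 1 when the files differ,
# so accept 1 as success; 2 would mean real trouble.
diff -du old_test.c new_test.c > test.patch || [ $? -eq 1 ]

# Apply it to a copy of the old version...
cp old_test.c patched_test.c
patch patched_test.c < test.patch

# ...and the result should be identical to the new version.
cmp patched_test.c new_test.c && echo "old + patch = new"
```

If everything works, patch prints "patching file patched_test.c" and cmp finds no difference.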

>> diff [options] [original version] [modified version]
I generally use the diff command with the "-du" options.
e.g. diff -du old_test.c new_test.c > test.patch

Note that diff writes its output to stdout, so you'll have to redirect it to a file.
-d : Try harder to find a smaller set of differences (this can make diff slower)
-u : Use the unified output format
(-u causes newly added lines to be prefixed by "+" and removed lines by "-". Without it, diff uses its default "normal" format, which marks lines with "<" and ">" and which people generally find harder to read :)
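To see the difference the -u flag makes, here is a small sketch (assuming GNU diff; the fruit files are made-up sample data) that writes the same one-line change in both formats.

```shell
# Compare diff's default ("normal") output with unified output (-u).
# Assumes GNU diff; the two sample files are made up for illustration.
cd "$(mktemp -d)"
printf 'apple\nbanana\ncherry\n' > old.txt
printf 'apple\nblueberry\ncherry\n' > new.txt

# Normal format: "<" marks lines from the old file, ">" lines from the new one.
diff old.txt new.txt > normal.out || [ $? -eq 1 ]

# Unified format: "-" marks removed lines, "+" marks added lines,
# and unchanged context lines start with a space.
diff -du old.txt new.txt > unified.out || [ $? -eq 1 ]

cat normal.out
cat unified.out
```

In normal.out you should see "< banana" and "> blueberry"; in unified.out the same change shows up as "-banana" and "+blueberry", surrounded by context lines that start with a space.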

-r : Recursive
Now this is a beauty. With -r, diff compares every file under two directories. So, if you have changed multiple files, use this option to generate only ONE patch and _not_ a separate patch for each file.
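A small sketch of -r, assuming GNU diff; the old/ and new/ trees and their files are hypothetical. Two files change, and both end up in a single patch file.

```shell
# One patch covering every changed file under two directory trees.
# Assumes GNU diff; the directory and file names are hypothetical.
cd "$(mktemp -d)"
mkdir old new
printf 'one\n' > old/a.txt
printf 'ONE\n' > new/a.txt
printf 'two\n' > old/b.txt
printf 'TWO\n' > new/b.txt

# -r walks both trees and writes all the differences into one file.
diff -dur old new > all_changes.patch || [ $? -eq 1 ]

# The patch holds one "---"/"+++" header pair per changed file.
grep '^+++ ' all_changes.patch
```

The final grep should print two "+++" header lines, one for new/a.txt and one for new/b.txt.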

-N : Include new files
With -r, you might have added a new file in the newer version of the software that wasn't there in the old version. By default, diff only reports such a file with a line like "Only in <dir>: <file>" and leaves its contents out of the patch. With -N, diff treats the missing file as empty, so the whole new file is included in the patch.
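A sketch of what -N changes, assuming GNU diff; "added.c" is a hypothetical new file that exists only in the newer tree.

```shell
# What -N changes when the new tree contains a file the old one lacks.
# Assumes GNU diff; "added.c" is a hypothetical new file.
cd "$(mktemp -d)"
mkdir old new
printf 'int x;\n' > old/main.c
printf 'int x;\n' > new/main.c
printf 'int helper(void);\n' > new/added.c

# Without -N, diff only reports "Only in new: added.c";
# the file's contents do not end up in the patch.
diff -dur old new > without_N.patch || [ $? -eq 1 ]

# With -N, the missing file is treated as empty, so the whole
# new file shows up as added ("+") lines.
diff -durN old new > with_N.patch || [ $? -eq 1 ]

cat without_N.patch
cat with_N.patch
```

without_N.patch should mention added.c only in the "Only in new: added.c" line, while with_N.patch also carries the line "+int helper(void);".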

Random notes on patches:
1) In patch files, you will find something called hunks. A hunk is a set of changes along with some context. By default, in the unified format, the 3 lines above and the 3 lines below each change make up the context. View a patch file in emacs/vim and you'll know what I am talking about. Context is a very important part of the patch file. I'll come back to it when I talk about applying patches.
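A quick way to look at a hunk and its context, assuming GNU diff and the standard seq utility: change one line out of twenty and compare the default context with a wider one requested through -U (the number 5 here is just an example).

```shell
# Hunk headers and context lines (assumes GNU diff and seq).
# seq prints the numbers 1..20, one per line; only line 10 changes.
cd "$(mktemp -d)"
seq 1 20 > old.txt
seq 1 20 | sed 's/^10$/ten/' > new.txt

# Default unified output: 3 context lines above and below the change.
diff -u old.txt new.txt > default.patch || [ $? -eq 1 ]

# -U N asks for N lines of context instead (here 5).
diff -U 5 old.txt new.txt > five.patch || [ $? -eq 1 ]

cat default.patch
```

In default.patch the hunk header should read "@@ -7,7 +7,7 @@": the hunk starts at line 7 and spans 7 lines (3 context lines, the changed line, 3 more context lines). With -U 5 it grows to "@@ -5,11 +5,11 @@".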

2) The beauty of patch files is that you can actually MODIFY them. Well, it won't seem like a big deal if you haven't used patch much, but it is quite a handy trick. You can remove entire hunks and the patch is still valid!
So, play around with it a bit.
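Here is a sketch of the hunk-removal trick, assuming GNU diff and GNU patch; the files are made-up sample data. It drops the last hunk (the simplest case, since no later hunk depends on its line numbers) and applies what is left.

```shell
# Drop the last hunk from a patch and apply the rest.
# Assumes GNU diff/patch; the file contents are made up for illustration.
cd "$(mktemp -d)"
seq 1 30 > old.txt
# Two changes far apart (lines 5 and 25), so diff emits two separate hunks.
seq 1 30 | sed -e 's/^5$/five/' -e 's/^25$/twenty-five/' > new.txt
diff -u old.txt new.txt > full.patch || [ $? -eq 1 ]

# Keep the file header and the first hunk; stop at the second "@@" line.
awk '/^@@/ { n++ } n < 2' full.patch > first_hunk.patch

cp old.txt partly_patched.txt
patch partly_patched.txt < first_hunk.patch
```

After this, line 5 of partly_patched.txt reads "five" while line 25 is still "25".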

3) Many people are inherently afraid of patches. Trust me, they'll have a very hard time when they work in a group on a single project. Patches are a great way to review changes that have been made or to pass them on to others. Without patches, I cannot conceive of a way to pass on my changes to others!
Patches are very easy to use. Just try and get comfortable with them. Ask around if you think you are stuck.

Now that we know how to generate patches, it is time to move ahead and learn how to apply these patches.
But I think this post is long enough, so I'll write about it in the next post.

(Do leave a comment and let me know if I need to add/remove/modify anything to make the post lucid and easily understandable yet comprehensive)



After reading this one, I seriously think it's not too difficult now...
Anything new (to me) related to the Linux kernel sends shivers down my spine....
I haven't used a debugger yet for the same reason... and I don't code until I'm convinced that the design is robust and it'll work... (Older guys would say this is the actual way to do it... however, I do it out of fear of debugging :P)

@$%deja vu$% said...

Many people in the Linux community frown upon debuggers. They think that if you need a debugger to know what is wrong with your program, you don't really know what you are doing!.. and that is dangerous!
So, they prefer the paper-and-pencil approach... it is good in that it gives you a thorough understanding of what exactly is happening behind the scenes..

anyway.. more on debuggers in another post :)

Vedang said...

Me, I've never used a debugger. I find it a better exercise to use paper and pencil, and it helps me clearly understand what I have to do. Debuggers somehow seem like cheating.

Great patch post, Jitesh. Remember the crazy trouble we had with the diff command? The man page of diff uses the words source and target instead of original file and modified file. Confusing as hell!

@$%deja vu$% said...

Hey.. we should have used a revision control system too (@kedar.. we actually did use one.. just telling him :P) ... it's great...

Look out for a post on git... it's a revision control system that Linus wrote himself.. it's awesome...
