Wednesday, January 5, 2011

Using diff and patch

SkyHi @ Wednesday, January 05, 2011
The diff and patch utilities can be intimidating to the newcomer, but they are not all that difficult to use, even for the non-programmer. If you are at all familiar with makefiles, you might find yourself frequently wanting to patch a file, either to correct an error that you've found or to add something that you need to the makefile. After I began using the mrxvt terminal, I wanted to give it Japanese capability. My main O/S is FreeBSD, which manages software with its ports and package system. To install a package from a port, one uses the port's Makefile, which will download and compile the source code, in a manner familiar to those who use Gentoo's portage (which was inspired by FreeBSD ports) or ArchLinux's makepkg.
In this case, I wanted to edit the port's Makefile to enable Japanese support.
To do this, I simply had to add a line to the Makefile.
CONFIGURE_ARGS+=   --enable-xim --enable-cjk --with-encoding=eucj 
Ok, this is simple. However, after doing this, I thought that perhaps I should submit a patch to the port's maintainer, giving others the opportunity to include Japanese support. This was a little more complicated, because the change to the Makefile meant that I should include a message when they installed the port, telling them what to do if they wished to include Japanese capability.
So, I had to add the following lines, in their proper place in the port's Makefile.
.if defined(WITH_JAPANESE)
CONFIGURE_ARGS+=   --enable-xim --enable-cjk --with-encoding=eucj
.endif # WITH_JAPANESE

pre-everything::
	@${ECHO_MSG} "=========================================>"
	@${ECHO_MSG} "For Japanese support use make -DWITH_JAPANESE install"
	@${ECHO_MSG} "=========================================>"
I created my new Makefile, and then, using the diff command, created a patch.
diff -uN Makefile Makefile.new > patch.Makefile
The diff command has various flags. Running diff on two files with no flags at all shows something like this (I'm just showing a few lines here)
5c5
< # $FreeBSD: /repoman/r/pcvs/ports/x11/mrxvt/Makefile,v 1.7 2005/07/22 22:38:58 pav Exp $
---
> # $FreeBSD: ports/x11/mrxvt/Makefile,v 1.7 2005/07/22 22:38:58 pav Exp $
22a23,26
> .if defined(WITH_JAPANESE)
> CONFIGURE_ARGS+=   --enable-xim --enable-cjk --with-encoding=eucj 
> .endif # WITH_JAPANESE
> 
25a30,37
In this simple example, you can probably figure out its meaning. The < marks lines in the original file that aren't in the new one, and > marks lines in the new file that aren't in the old one. The 5c5 means that line 5 of the old file differs from line 5 of the new file; the c means the line would have to be changed for the two to match. The 22a23,26 means that lines 23 through 26 of the new file would be added after line 22 of the old one, and 25a30,37 likewise adds new lines 30 through 37 after old line 25. There is none in this example, but the letter d is used in the same way for text to be deleted.
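If you'd like to see all three letters (c, d and a) at once, here is a minimal sketch. The file names old.txt, new.txt and normal.diff, and their contents, are made up for illustration; they are not from the mrxvt port.

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)"

# Two small, hypothetical files: one line changed, one deleted, one added.
printf '%s\n' one two three four five > old.txt
printf '%s\n' one 2 three five six > new.txt

# diff exits with status 1 when the files differ, which is expected here;
# status 2 or higher would mean a real error.
diff old.txt new.txt > normal.diff || [ $? -eq 1 ]

cat normal.diff
# prints:
# 2c2
# < two
# ---
# > 2
# 4d3
# < four
# 5a5
# > six
```

Reading it back: line 2 is changed, old line 4 is deleted (the new file would have had it after its line 3), and new line 5 is added after old line 5.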
This is a bit hard to read, especially if there are many differences. Therefore, most people prefer unified diffs, diff with the -u flag. This gives us something like (again, with many lines snipped)
--- Makefile.orig Sat Sep 10 17:16:53 2005
+++ Makefile Fri Sep 16 03:13:52 2005
@@ -20,9 +20,21 @@
 USE_X_PREFIX= yes
 GNU_CONFIGURE= yes
 USE_REINPLACE= yes
+.if defined(WITH_JAPANESE)
+CONFIGURE_ARGS+=   --enable-xim --enable-cjk --with-encoding=eucj 
+.endif # WITH_JAPANESE
This shows a few lines before and after the change, which helps define context. (I've snipped the lines below this change, but you can see that three lines above it are included.) Let's examine this a bit. The first two lines are fairly straightforward: --- followed by the old file's name, then +++ followed by the name of the new file. Each also carries a timestamp, the file's mtime (the time its contents were last modified). Next comes what is known as a hunk. Each hunk begins with a header line that starts with @@, then gives the old file's starting line and number of lines, the new file's starting line and number of lines, and then another @@.
Understand that the three lines above and below the change remain as they are; they are there simply to give context. In this case, counting that context, the hunk starts at line 20 of the old file. Lines 20-22 are the leading context and remain unchanged. Counting the 3 lines above and the 3 lines below the differences, the hunk covers 9 lines of the old file. That is the -20,9 part: the minus sign marks the old file's range, 9 lines starting from line 20.
Following that is the plus sign, which marks the new file's range. The first number, 20, is the line of the new file where the hunk starts, and the hunk, again including the 3 lines of context above and below, runs for 21 lines there. Note that I have not shown the entire patch, and some of those lines may simply be blank lines. So, the hunk starts with
@@ -20,9 +20,21 @@
Next comes the actual patch itself, the 3 lines of context and the change.
Note that in the patch, there is a space before the 3 lines of context, and then the lines below have a plus sign. A space before a line means that nothing will be changed. A plus sign means the line will be added. If there had been lines to be deleted, they would have had a minus sign in front of them.
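To check the arithmetic in a hunk header on something small, here is a rough sketch. The files old.txt, new.txt and demo.patch are hypothetical, with numbered lines so the ranges are easy to count by hand.

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)"

# Ten numbered lines, and a copy with one extra line after line 5.
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > old.txt
printf '%s\n' 1 2 3 4 5 inserted 6 7 8 9 10 > new.txt

# Status 1 from diff only means "the files differ".
diff -u old.txt new.txt > demo.patch || [ $? -eq 1 ]

# Print just the hunk header.
grep '^@@' demo.patch
# prints: @@ -3,6 +3,7 @@
```

The old range is 6 lines (3 of context, nothing removed, 3 of context) starting at line 3; the new range is the same plus the one added line, so 7.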
Let's create two files to make this a little clearer. Using your favorite text editor, create patchtest.txt and patchtest1.txt. The patchtest.txt will read
This is a file.
These first three lines are 
lines of context. They 
will remain unchanged. They will have spaces in front of them.
Here are the lines that will be changed.  They will begin with
minus signs, because they are being deleted.
Now, we will add three 
more lines that are only
context.  They will have spaces in the patch
Now, patchtest1.txt
This is a file.
These first three lines are 
lines of context. They 
will remain unchanged. They will have spaces in front of them.
These lines have been changed.  They will have plus
signs in front of them.
Now, we will add three 
more lines that are only
context.  They will have spaces in the patch
This is yet another line that is different.
Create the patch.
diff -uN patchtest.txt patchtest1.txt > patch.txt
View the patch
less patch.txt
You will see
--- patchtest Sun Feb 26 19:35:43 2006
+++ patchtest1 Sun Feb 26 19:35:14 2006
@@ -2,8 +2,9 @@
 These first three lines are 
 lines of context. They 
 will remain unchanged. They will have spaces in front of them.
-Here are the lines that will be changed.  They will begin with
-minus signs, because they are being deleted.
+These lines have been changed.  They will have plus
+signs in front of them.
 Now, we will add three 
 more lines that are only
 context.  They will have spaces in the patch
+This is yet another line that is different.
You can see the first line, This is a file, wasn't included in the patch--that's because it was outside of the three lines of context.
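If you'd rather not use a text editor, the same two files and the patch can be produced from the shell. This is only a sketch: the scratch directory is my own addition, and the last line of patchtest1.txt is taken from the patch output shown above.

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)"

cat > patchtest.txt <<'EOF'
This is a file.
These first three lines are
lines of context. They
will remain unchanged. They will have spaces in front of them.
Here are the lines that will be changed.  They will begin with
minus signs, because they are being deleted.
Now, we will add three
more lines that are only
context.  They will have spaces in the patch
EOF

cat > patchtest1.txt <<'EOF'
This is a file.
These first three lines are
lines of context. They
will remain unchanged. They will have spaces in front of them.
These lines have been changed.  They will have plus
signs in front of them.
Now, we will add three
more lines that are only
context.  They will have spaces in the patch
This is yet another line that is different.
EOF

# diff exits with status 1 when the files differ, which is expected here;
# status 2 or higher would mean a real error.
diff -uN patchtest.txt patchtest1.txt > patch.txt || [ $? -eq 1 ]

# Show just the hunk header.
grep '^@@' patch.txt
# prints: @@ -2,8 +2,9 @@
```

Eight old lines (2 through 9) become nine new ones: two lines are swapped for two others, and one line is added at the end.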
Now that we've made our patch, we can apply it.
patch patchtest.txt < patch.txt
You will see
Hmm...  Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|--- patchtest Sun Feb 26 19:35:43 2006
|+++ patchtest1 Sun Feb 26 19:35:14 2006
--------------------------
Patching file patchtest using Plan A...
Hunk #1 succeeded at 2.
done
Now, patchtest.txt has been patched. If you now do a diff between patchtest.txt and patchtest1.txt, you'll just be put back at your command prompt, showing that there are no differences.
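In a script, you can't eyeball an empty prompt, so the check is usually done with an exit status instead. Here is a small sketch of the whole round trip; a.txt, b.txt and ab.patch are hypothetical stand-ins for the files above.

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)"

printf '%s\n' alpha beta gamma > a.txt          # the "old" file
printf '%s\n' alpha BETA gamma delta > b.txt    # the "new" file

# Status 1 from diff only means "the files differ".
diff -u a.txt b.txt > ab.patch || [ $? -eq 1 ]

# Apply the patch to the old file.
patch a.txt < ab.patch

# cmp -s prints nothing and returns 0 only if the files are identical.
if cmp -s a.txt b.txt; then
    echo "a.txt now matches b.txt"
fi
```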
This is the simplest form of creating diffs and using patches. Sometimes, you patch an entire directory; those who compile their own kernels may have done this. Rather than downloading an entire tarball of the new kernel, you can often get patches, especially for minor revisions. The README in /usr/src/linux has instructions for using these patches. When you are applying a patch to an entire directory tree, you may need to use the -p1 option. The -p[number] option tells patch how many leading directory components to strip from the file names recorded in the patch, which determines the path to the file or files being patched. See the patch(1) man page for details and examples. For instance, if you had a patch for the entire Linux kernel source tree, and were at the top of that tree in /usr/src/linux, you might do
patch -p1 < mylinuxsource.patch
As this varies, depending not only upon your location when applying the patch, but also upon what paths are in the patch, it's best to see the man page. Just keep in mind that if a patch that covers several files in a directory fails to apply, the -p[number] may be what is causing the difficulty.
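To make the -p[number] idea concrete, here is a toy sketch. The directory names proj.orig, proj and target, and the file src/hello.txt, are all invented for the example; a real source tree will have its own layout.

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)"

# proj.orig is the untouched tree, proj is the edited one, and target
# is a second untouched copy that we want to bring up to date.
mkdir -p proj.orig/src proj/src target/src
printf '%s\n' 'hello' > proj.orig/src/hello.txt
printf '%s\n' 'hello, world' > proj/src/hello.txt
cp proj.orig/src/hello.txt target/src/hello.txt

# -r walks the directories; status 1 only means "differences found".
diff -ruN proj.orig proj > proj.patch || [ $? -eq 1 ]

# The patch records the paths proj.orig/src/hello.txt and
# proj/src/hello.txt.
grep -E '^(---|\+\+\+) ' proj.patch

# Inside target there is no proj/ directory, so strip one leading
# component with -p1, which leaves src/hello.txt.
(cd target && patch -p1 < ../proj.patch)

cmp -s target/src/hello.txt proj/src/hello.txt && echo "target is up to date"
```

Had the patch been applied from the directory above target, the number of components to strip would have been different, which is why the right value depends on both the patch and where you are standing.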
Although this becomes more complex when making patches consisting of many hunks, or patching many files in a directory (such as the Linux kernel source tree), the basic concept is the same. It is hoped that this article gives the reader a better understanding of diff and patch, and will help them to read and understand patches. This can be very handy: sometimes a patch has something you don't want, so it's always good to look at it before applying it.
Patches can also be reversed with the -R flag. Suppose you try an experimental patch and it breaks something. You can then patch the file again with the -R flag.
Take our patchtest and patchtest1. Let's run patch again with the -R flag.
patch -R patchtest.txt < patch.txt
Again, you'll see the Hmm...  Looks like a unified diff to me... message and a message that it succeeded; patchtest.txt is back to its original contents. If you forget the -R flag when you need it, patch often catches it. To see this, patch the file one more time with patch patchtest.txt < patch.txt, and it should succeed. Once again, patchtest.txt and patchtest1.txt are identical. Now, run that same command again, still without the -R flag. You'll see a message
Hmm...  Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|--- patchtest Sun Feb 26 19:35:43 2006
|+++ patchtest1 Sun Feb 26 19:35:14 2006
--------------------------
Patching file patchtest using Plan A...
Reversed (or previously applied) patch detected!  Assume -R? [y] 
If you type y then you should once again see that it succeeded.
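For a non-interactive version of the apply-then-undo cycle, something like the following sketch should do. The names a.txt, b.txt, a.saved and ab.patch are hypothetical; a.saved is just a copy kept around to prove that the reversal worked.

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)"

printf '%s\n' alpha beta gamma > a.txt
printf '%s\n' alpha BETA gamma delta > b.txt
cp a.txt a.saved                 # keep a pristine copy for comparison

# Status 1 from diff only means "the files differ".
diff -u a.txt b.txt > ab.patch || [ $? -eq 1 ]

patch a.txt < ab.patch           # a.txt now matches b.txt
patch -R a.txt < ab.patch        # the same patch, applied in reverse

# cmp -s returns 0 only if the files are identical.
if cmp -s a.txt a.saved; then
    echo "a.txt is back to its original contents"
fi
```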
If anyone is interested, my patch for mrxvt was accepted, and the port is now available with the option to enable Japanese.


REFERENCES
http://www.linuxforums.org/articles/using-diff-and-patch_80.html