DocDiff Readme
2000-12-09..2004-05-29, Hisashi MORITA
Table of Contents
News
- 0.3.0 (2004-05-29)
- Re-designed and re-written from scratch.
- Supports multiple encodings (ASCII, EUC-JP, Shift_JIS, UTF-8) and multiple
eols (CR, LF, CRLF).
- Supports more output formats (tty, HTML, Manued, wdiff-like,
user-defined markup text).
- Supports configuration files (/etc/docdiff/docdiff.conf,
~/etc/docdiff/docdiff.conf (or ~/.docdiff/docdiff.conf)).
- Introduced digest (summary) mode.
- Approximately 200% faster than older versions, thanks to akr's diff
library.
- Better documentation and help message.
- License changed from Ruby's to modified BSD style.
- Pure Ruby. Does not require external diff program such as GNU diff, or
morphological analyzer such as ChaSen.
- Runs on both Unix and Windows (tested on Debian GNU/Linux and Cygwin)
- Unit tests introduced to decrease bugs and to encourage faster
development.
- Makefile introduced.
- 0.1.8 (2003-12-14)
- Displays warning when --bymorpheme is specified but ChaSen is not available
(patch by Akira YAMADA: Debian bug #192258).
- Supports system-wide configuration file (if ~/.chasenrc.docdiff does not
exist, reads /etc/docdiff/chasenrc) (patch by Akira YAMADA: Debian bug
#192261).
- 0.1.7 (2003-11-21)
- HTML output retains spaces ( patch by Akira YAMADA).
- Manued output is added. Use --manued command line option to get result in
Manued-like format.
- Fixed .chasenrc.docdiff to be compatible with the latest ChaSen, so that it
does not cause error.
- Alphabet words in the output may look ugly, since ChaSen does not keep
spaces between alphabetical words recently.
- Other minor bug fixes and code cleanup.
- 0.2.0b2 (2001-08-31)
- 0.2.0b1 (2001-08-31)
- A bit faster than 0.1.x, using file cache.
- A bit cleaner code.
- 0.1.6 (2001-05-16)
- Increased diff option number from 100000 to 1000000 in order to support
900KB+ text files.
- 0.1.5 (2001-01-17)
- Erased useless old code which were already commented out.
- Added documentation. (Updated README, more comments)
- First public release. Registered to RAA.
- 0.1.4 (2001-01-16)
- Output is like <tag>ab</tag>, instead of ugly
<tag>a</tag><tag>b</tag> (thanks again to Masatoshi
Seki for suggestion).
- Fixed hidden bug ('puts' was used to output result).
- Some code clean-up, though still hairly enough.
- 0.1.3 (2001-01-09)
- Tested with Ruby 1.6.2.
- Fixed "meth(a,b,)" bug (thanks to Masatoshi Seki).
- Switched development platform from Windows to Linux, but it should work
fine on Windows too, except for ChaSen stuff.
- 0.1.2 (2000-12-28)
- 0.1.1 (2000-12-25)
- Bug fix and some cleanup.
- Quotes some of HTML special characters (<>&) when output in
HTML.
- Added support for tty output using escape sequence.
- 0.1.0 (2000-12-19)
- ChaSen works fine now.
- GetOptLong was introduced to support command line options.
- 0.1.0a1 (2000-12-16)
- Added ChaSen support. Japanese word by word comparison requires
ChaSen.
- Converted scripts from Shift_JIS/CRLF to EUC-JP/LF.
- 0.0.2 (2000-12-10)
- 0.0.1 (2000-12-09)
- First version. Proof-of-concept.
- Supports ASCII, EUC-JP, LF only.
- Supports HTML output only.
- Requires GNU diff.
- Distributed under the same license as Ruby's
See the ChangeLog for detail.
Todo
- Fixing bugs.
- Incorporate user feedback.
- Better auto-recognition of encodings and eols.
- Make CSS and tty escape sequense customizable in config files.
- Adding a file upload CGI script, so that DocDiff can be used as a web application.
- Make stand-alone Windows application using Exerb, hopefully with fancy GUI.
- Multilingualization (M17N), when Ruby is multilingualized.
- Write "DocPatch".
Description
Compares two files word by word / char by char
Summary
DocDiff compares two files and shows the difference. It can compare files word
by word, char by char, or line by line. It has several output formats such as
HTML, tty, Manued, or user-defined markup.
It supports several encodings and end-of-line characters, including ASCII,
UTF-8, EUC-JP, Shift_JIS, CR, LF, and CRLF.
Requirement
- Ruby
(Note that you may need additional ruby library such as iconv and uconv,
if your OS's Ruby package does not include those.)
Installation
-
Place docdiff/ directory and its contents to ruby library directory, so that
ruby interpreter can load them.
(e.g. % cp -r docdiff /usr/lib/ruby/1.8
)
-
Place docdiff.rb in command binary directory.
(e.g. % cp docdiff.rb /usr/bin/
)
Optionally you may want to rename it to docdiff.
(e.g. % mv /usr/bin/docdiff.rb /usr/bin/docdiff
)
-
(Optional) If you want site-wide configuration file, place docdiff.conf.example
as /etc/docdiff/docdiff.conf and edit it.
(e.g. % cp docdiff.conf.example /etc/docdiff.conf
% $EDITOR /etc/docdiff.conf
)
-
(Optional) If you want per-user configuration file, place docdiff.conf.example
as ~/etc/docdiff/docdiff.conf and edit it.
(e.g. % cp docdiff.conf.example ~/etc/docdiff.conf
% $EDITOR ~/etc/docdiff.conf
)
Usage
Synopsis
docdiff [options] oldfile newfile
e.g. % docdiff old.txt new.txt > diff.html
See the help message for detail (docdiff --help).
Example
% cat sample/01.en.ascii.lf
Hello, my name is Watanabe.
I am just another Ruby porter.
% cat sample/02.en.ascii.lf
Hello, my name is matz.
It's me who has created Ruby. I am a Ruby hacker.
% docdiff sample/01.en.ascii.lf sample/02.en.ascii.lf
Hello, my name is
Watanabe.
matz.
It's me who has created Ruby.
I am
just another
a
Ruby
porter.
hacker.
%
Configuration
You can place configuration files at:
- /etc/docdiff/docdiff.conf (site-wide configuration)
- ~/etc/docdiff/docdiff.conf (user configuration)
(~/etc/docdiff/docdiff.conf is used by default in order to keep home directory clean, preventing dotfiles and dotdirs from scattering around. Alternatively, you can use ~/.docdiff/docdiff.conf as user configuration file name, following the traditional Unix convention.)
Notation is as follows (also refer to the file docdiff.conf.example included in the distribution archive):
# comment
key1 = value
key2 = value
...
Every value is treated as string, unless it seems like a number. In such case, value is treated as a number (usually an integer).
Troubleshooting
-
Sometimes DocDiff fails to auto-recognizing encoding and/or end-of-line
character. You may get an error like this.
charstring.rb:47:in `extend': wrong argument type nil (expected Module) (TypeError)
In such a case, try explicitly specifying encoding and end-of-line character.
License
This software is distributed under so-called modified BSD style license
(
http://www.opensource.org/licenses/bsd-license.php (without advertisement
clause)). By contributing to this software, you agree that your contribution
may be incorporated under the same license.
Copyright and condition of use of main portion of the source:
Copyright (C) Hisashi MORITA. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
3. Neither the name of the University nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.
diff library (docdiff/diff.rb and docdiff/diff/*) was originally a part of
Ruby/CVS by Akira TANAKA. Ruby/CVS is licensed under modified BSD style
license. See the following for detail.
Credits
- Hisashi MORITA (primary author)
- Shin'ichiro HARA (initial idea and algorithm suggestion)
- Masatoshi SEKI (patch)
- Akira YAMADA (patch, Debian package)
- Kenshi Muto (testing, bug report)
- Akira TANAKA (diff library author)
Resources
Format
Similar Software
There are several other software that can compare text word by word and/or
character by character.