Today I actually undertook that task: carving for EMF files produced over 1,000 files to be examined. Doing so exposed me to the full capabilities of unoconv, and I'm quite excited about the possibilities.
What, exactly, is unoconv?
From the man page: "unoconv is a command line utility that can convert any file format that OpenOffice can import, to any file format that OpenOffice is capable of exporting." This raises the question: what can OpenOffice (or LibreOffice) import and export? Glad you asked; the full list follows. One thing to notice as you read it: the newer Microsoft 'x' formats (docx, xlsx, etc., i.e. Microsoft Office XML) are not listed, but conversion is still possible! Let your mantra be "unoconv is a command line utility that can convert any file format that OpenOffice can import, to any file format that OpenOffice is capable of exporting."
$ unoconv --show
The following list of document formats are currently available:
bib - BibTeX [.bib]
doc - Microsoft Word 97/2000/XP [.doc]
doc6 - Microsoft Word 6.0 [.doc]
doc95 - Microsoft Word 95 [.doc]
docbook - DocBook [.xml]
html - HTML Document (OpenOffice.org Writer) [.html]
odt - Open Document Text [.odt]
ott - Open Document Text [.ott]
ooxml - Microsoft Office Open XML [.xml]
pdb - AportisDoc (Palm) [.pdb]
pdf - Portable Document Format [.pdf]
psw - Pocket Word [.psw]
rtf - Rich Text Format [.rtf]
latex - LaTeX 2e [.ltx]
sdw - StarWriter 5.0 [.sdw]
sdw4 - StarWriter 4.0 [.sdw]
sdw3 - StarWriter 3.0 [.sdw]
stw - Open Office.org 1.0 Text Document Template [.stw]
sxw - Open Office.org 1.0 Text Document [.sxw]
text - Text Encoded [.txt]
txt - Plain Text [.txt]
vor - StarWriter 5.0 Template [.vor]
vor4 - StarWriter 4.0 Template [.vor]
vor3 - StarWriter 3.0 Template [.vor]
xhtml - XHTML Document [.html]
The following list of graphics formats are currently available:
bmp - Windows Bitmap [.bmp]
emf - Enhanced Metafile [.emf]
eps - Encapsulated PostScript [.eps]
gif - Graphics Interchange Format [.gif]
html - HTML Document (OpenOffice.org Draw) [.html]
jpg - Joint Photographic Experts Group [.jpg]
met - OS/2 Metafile [.met]
odd - OpenDocument Drawing [.odd]
otg - OpenDocument Drawing Template [.otg]
pbm - Portable Bitmap [.pbm]
pct - Mac Pict [.pct]
pdf - Portable Document Format [.pdf]
pgm - Portable Graymap [.pgm]
png - Portable Network Graphic [.png]
ppm - Portable Pixelmap [.ppm]
ras - Sun Raster Image [.ras]
std - OpenOffice.org 1.0 Drawing Template [.std]
svg - Scalable Vector Graphics [.svg]
svm - StarView Metafile [.svm]
swf - Macromedia Flash (SWF) [.swf]
sxd - OpenOffice.org 1.0 Drawing [.sxd]
sxd3 - StarDraw 3.0 [.sxd]
sxd5 - StarDraw 5.0 [.sxd]
tiff - Tagged Image File Format [.tiff]
vor - StarDraw 5.0 Template [.vor]
vor3 - StarDraw 3.0 Template [.vor]
wmf - Windows Metafile [.wmf]
xhtml - XHTML [.xhtml]
xpm - X PixMap [.xpm]
The following list of presentation formats are currently available:
bmp - Windows Bitmap [.bmp]
emf - Enhanced Metafile [.emf]
eps - Encapsulated PostScript [.eps]
gif - Graphics Interchange Format [.gif]
html - HTML Document (OpenOffice.org Impress) [.html]
jpg - Joint Photographic Experts Group [.jpg]
met - OS/2 Metafile [.met]
odd - OpenDocument Drawing (Impress) [.odd]
odg - OpenOffice.org 1.0 Drawing (OpenOffice.org Impress) [.odg]
odp - OpenDocument Presentation [.odp]
otp - OpenDocument Presentation Template [.otp]
pbm - Portable Bitmap [.pbm]
pct - Mac Pict [.pct]
pdf - Portable Document Format [.pdf]
pgm - Portable Graymap [.pgm]
png - Portable Network Graphic [.png]
pot - Microsoft PowerPoint 97/2000/XP Template [.pot]
ppm - Portable Pixelmap [.ppm]
ppt - Microsoft PowerPoint 97/2000/XP [.ppt]
pwp - PlaceWare [.pwp]
ras - Sun Raster Image [.ras]
sda - StarDraw 5.0 (OpenOffice.org Impress) [.sda]
sdd - StarImpress 5.0 [.sdd]
sdd3 - StarDraw 3.0 (OpenOffice.org Impress) [.sdd]
sdd4 - StarImpress 4.0 [.sdd]
sti - OpenOffice.org 1.0 Presentation Template [.sti]
stp - OpenDocument Presentation Template [.stp]
svg - Scalable Vector Graphics [.svg]
svm - StarView Metafile [.svm]
swf - Macromedia Flash (SWF) [.swf]
sxi - OpenOffice.org 1.0 Presentation [.sxi]
tiff - Tagged Image File Format [.tiff]
vor - StarImpress 5.0 Template [.vor]
vor3 - StarDraw 3.0 Template (OpenOffice.org Impress) [.vor]
vor4 - StarImpress 4.0 Template [.vor]
vor5 - StarDraw 5.0 Template (OpenOffice.org Impress) [.vor]
wmf - Windows Metafile [.wmf]
xhtml - XHTML [.xml]
xpm - X PixMap [.xpm]
The following list of spreadsheet formats are currently available:
csv - Text CSV [.csv]
dbf - dBase [.dbf]
dif - Data Interchange Format [.dif]
html - HTML Document (OpenOffice.org Calc) [.html]
ods - Open Document Spreadsheet [.ods]
ooxml - Microsoft Excel 2003 XML [.xml]
pdf - Portable Document Format [.pdf]
pts - OpenDocument Spreadsheet Template [.pts]
pxl - Pocket Excel [.pxl]
sdc - StarCalc 5.0 [.sdc]
sdc4 - StarCalc 4.0 [.sdc]
sdc3 - StarCalc 3.0 [.sdc]
slk - SYLK [.slk]
stc - OpenOffice.org 1.0 Spreadsheet Template [.stc]
sxc - OpenOffice.org 1.0 Spreadsheet [.sxc]
vor3 - StarCalc 3.0 Template [.vor]
vor4 - StarCalc 4.0 Template [.vor]
vor - StarCalc 5.0 Template [.vor]
xhtml - XHTML [.xhtml]
xls - Microsoft Excel 97/2000/XP [.xls]
xls5 - Microsoft Excel 5.0 [.xls]
xls95 - Microsoft Excel 95 [.xls]
xlt - Microsoft Excel 97/2000/XP Template [.xlt]
xlt5 - Microsoft Excel 5.0 Template [.xlt]
xlt95 - Microsoft Excel 95 Template [.xlt]
Using unoconv
To use unoconv, you first have to start the listener:
$ unoconv -l &    # the '&' backgrounds the process and returns control of the terminal window to you
[1] 9998          # 9998 is the process ID of the listener
The process listing below shows that the listener is a Python program, so a killall command to stop it would have to target python. To avoid killing other Python processes, use 'kill 9998' rather than 'killall python':
$ ps 9998
PID TTY STAT TIME COMMAND
9998 pts/0 Sl 0:00 /usr/bin/python /usr/bin/unoconv -l
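Rather than reading the PID off the screen, the shell can remember it for you. The following is only a sketch of that idea: the function name with_listener and the UNOCONV and LISTENER_WAIT variables are my own inventions for illustration, and the three-second startup pause is a guess, not something unoconv documents.

```shell
# UNOCONV is a hypothetical override; it defaults to the real binary.
UNOCONV="${UNOCONV:-unoconv}"

# with_listener COMMAND [ARGS...]
# Starts a listener, runs COMMAND, then stops that one listener by PID.
with_listener() {
    "$UNOCONV" -l &                  # start the listener in the background
    listener_pid=$!                  # $! is the PID of the last background job
    sleep "${LISTENER_WAIT:-3}"      # assumption: a short pause lets it start up
    status=0
    "$@" || status=$?                # run the caller's command, keep its exit code
    kill "$listener_pid" 2>/dev/null || true   # stop this listener only, no killall
    wait "$listener_pid" 2>/dev/null || true   # reap it so no zombie is left behind
    return "$status"
}
```

Something like 'with_listener unoconv -f txt test.docx' would then start the listener, convert, and clean up, without touching any other Python process.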
With the listener running, conversion of documents is straightforward, as we can see from the help:
$ unoconv -h
usage: unoconv [options] file [file2 ..]
Convert from and to any format supported by OpenOffice
unoconv options:
-c, --connection=string use a custom connection string
-d, --doctype=type specify document type
(document, graphics, presentation, spreadsheet)
-e, --export=name=value set export filter options
eg. -e PageRange=1-2
-f, --format=format specify the output format
-i, --import=string set import filter option string
eg. -i utf8
-l, --listener start a listener to use by unoconv clients
-o, --outputpath=name output directory
--pipe=name alternative method of connection using a pipe
-p, --port=port specify the port (default: 2002)
to be used by client or listener
-s, --server=server specify the server address (default: localhost)
to be used by client or listener
-t, --template=file import the styles from template (.ott)
-T, --timeout=secs timeout after secs if connections to OpenOffice fail
--show list the available output formats
--stdout write output to stdout
-v, --verbose be more and more verbose
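Several of these options combine naturally. As a rough sketch, using only flags that appear in the help text above (-f, -e and -o), the wrapper below exports a page range to PDF into a separate directory. The function name pdf_page_range and the UNOCONV variable are hypothetical, and I am assuming the PageRange export option behaves as the help's own example suggests.

```shell
# UNOCONV is a hypothetical override; it defaults to the real binary.
UNOCONV="${UNOCONV:-unoconv}"

# pdf_page_range RANGE OUTDIR FILE...
# Exports only the pages in RANGE (e.g. 1-2) of each FILE to PDF in OUTDIR.
pdf_page_range() {
    range=$1
    outdir=$2
    shift 2
    mkdir -p "$outdir"               # make sure the output directory exists
    "$UNOCONV" -f pdf -e "PageRange=$range" -o "$outdir" "$@"
}
```

For example, 'pdf_page_range 1-2 previews report.odt' would ask for the first two pages of report.odt as a PDF under previews/.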
In its simplest form, a conversion looks like this:
$ unoconv test.docx
The command will finish silently if successful. It creates a .pdf by default in the same directory as the document. Add the -f [fmt] option to convert to a different format, for example:
$ unoconv -f txt test.docx
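With a thousand carved files to get through, a loop helps. The usage line shows that unoconv accepts several files in one call, but converting them one at a time makes it easier to see which ones failed. This is a sketch only: convert_all and UNOCONV are names I made up, and it assumes unoconv exits non-zero when a conversion fails.

```shell
# UNOCONV is a hypothetical override; it defaults to the real binary.
UNOCONV="${UNOCONV:-unoconv}"

# convert_all FORMAT FILE...
# Converts each FILE to FORMAT separately so one bad file does not stop
# the batch. Reports failures on stderr and counts them in $failed.
convert_all() {
    format=$1
    shift
    failed=0
    for f in "$@"; do
        if ! "$UNOCONV" -f "$format" "$f"; then
            echo "failed: $f" >&2
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]              # succeed only if every file converted
}
```

Running 'convert_all pdf *.emf' in the carving output directory would then attempt every EMF file and list the ones that could not be converted.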
When your conversion work is done, close the listener with:
$ kill 9998
Now you see why unoconv is number one!