OpenOffice.org & Python
The OpenOffice.org documentation is complex and
large. Still, I had the feeling that simple tasks with OpenOffice.org
and Python had to be simple. Once I had digged
through the tough documentation and combined my freshly acquired
knowledge with Danny's
OOo library, it turned out to be rather manageable.
This article / tutorial / HOWTO gives you a couple of examples of
Python code that accesses and manipulates OpenOffice.org. Since most
examples that are already out there discuss text documents, I'll be
writing about spreadsheets.
I regularly get customers that want an Excel or OpenOffice.org
spreadsheet imported into a database. This is easily
done with a few lines of Python code. Since we have Python access to
OOo anyway, we'll also look at calculating with
OpenOffice.org.
Preparation
In order to communicate with OpenOffice.org, it needs to be
started in a special way, and we need some Python modules.
Starting OpenOffice.org
First you have to close OpenOffice.org, including the
quick-start tray icon if you're on Windows. Then start
OpenOffice.org like this:
ooffice "-accept=socket,host=localhost,port=8100;urp;"
Installing Python modules
Install Danny's
OpenOffice.org library. Place it into your development
directory and execute:
mkdir -p Danny/OOo
touch Danny/__init__.py
touch Danny/OOo/__init__.py
mv OOoLib.py Danny/OOo/
You also need the python-uno package. It is shipped with Ubuntu
Linux, so probably also with other distributions. If you have
trouble finding/downloading/installing it, just ask me.
Gaining access to a document
Basically, there are two ways of getting access to a document.
One is loading it from file, the other is simply using the
currently opened OpenOffice.org document. The goal of this section
is to obtain a reference to a spreadsheet document.
Accessing the current document
Using the currently opened document is the easiest method for
single tasks. For example: sending an email to all email
addresses in the spreadsheet. You can simply start
OpenOffice.org as described above, open the spreadsheet as usual,
then run the Python program. This method is also very
simple for not-so-technical people that want to have
the least number of unfamiliar "file open" dialog boxes.
The code to get a document reference is very simple:
import Danny.OOo.OOoLib as OOoLib
desktop = OOoLib.getDesktop()
doc = desktop.getCurrentComponent()
With this reference in the 'doc' variable, we can get to the data.
Loading a document from file
Sometimes you don't want to manually open a file, but simply
provide the software with a file name. There is a small caveat
here. You have to pass an URL to OpenOffice.org, even if you want
to access a file on the local filesystem. This code will do the
trick for you, though:
filename = 'somefile.ods'
import Danny.OOo.OOoLib as OOoLib
import unohelper
import os.path
url = unohelper.systemPathToFileUrl(
os.path.abspath(filename))
desktop = OOoLib.getDesktop()
doc = desktop.loadComponentFromURL(
url, "_blank", 0, () )
With this reference in the 'doc' variable, we can get to the
data.
Getting to the data
Once we have our document reference, we can get to the data.
Let's say we have a single spreadsheet in the document, that
contains a list of names and email addresses in the first resp.
the second row. First, we have to access the sheet:
sheet = doc.getSheets().getByIndex(0)
We can then get to the data, and do with it whatever we please.
I tend to store things into a SQLite database which can
then be copied to whatever computer needs the data. Once you know
how to get to the data in the spreadsheet, you can find your own
ways of dealing with it:
value = sheet.getCellByPosition(col, row).getValue()
text = sheet.getCellByPosition(col, row).getFormula()
That's right, text is accessed as a formula.
If you call getValue(), you get the floating point
numeric value of the cell. If it contains text, 0.0 is
returned.
Other functions can be found in the XCell
documentation.
Putting the above together, we get the following to get to the
names and email addresses in the sheet. I assume that if the email
cell is empty, we have reached the end of the list.
# Insert code from above to get the document
# reference 'doc'.
sheet = doc.getSheets().getByIndex(0)
row = 0
while True:
name = sheet.getCellByPosition(0, row).getFormula()
email = sheet.getCellByPosition(1, row).getFormula()
if not email:
break
print '%30s %20s' % (name, email)
row += 1
Types of data
As far as I can tell from the getType() function
documentation there are only four distinguishable cell
types:
That means no date type, no time type, no nothing. I'm
confident there is some way of determining those - after all,
OpenOffice.org itself can do it too - but at this moment I don't
know how. If you do know, let me know.
Calculating with OpenOffice.org
Since the full power of OpenOffice.org is at your hands, it's
also quite easy to let it perform calculations for you. First,
let's create our formulas:
sheet.getCellByPosition(0, 0).setValue(33)
sheet.getCellByPosition(0, 1).setValue(9)
sheet.getCellByPosition(0, 2).setFormula('=SUM(A1:A2)')
Getting to the result is as easy as calling
getValue() on the formula cell:
result = sheet.getCellByPosition(0, 2).getValue()
I'll leave more complex examples as an excersise to the
reader.
Converting from/to Excel or to PDF
You might have just moved from MS Office to OpenOffice.org, or
you might have a customer that wants only Excel spreadsheets.
Many people are interested in automated conversion, for just as
many reasons.
Conversion requires no more than obtaining the document
reference as described above, and then saving the file with the
correct filter.
From Excel to OpenOffice.org
From Excel to OpenOffice.org's Calc format is rather trivial,
since the conversion is already done when you open the Excel file.
All that is left, is saving it to the correct place:
filename = '/tmp/output.ods'
url = unohelper.systemPathToFileUrl(
os.path.abspath(filename))
doc.storeToURL(path, ())
From OpenOffice.org to Excel
I'll post the code first, assuming you already have a document
reference in the variable 'doc'.
filename = 'output.xls'
properties = (OOoLib.makePropertyValue(
'FilterName',
'MS Excel 97'
),)
url = unohelper.systemPathToFileUrl(
os.path.abspath(filename))
doc.storeToURL(path, properties)
That's all there is to it. The 'FilterName' property determines
the output filter, which in turn tells OpenOffice.org to store the
file as MS Excel file.
From OpenOffice.org to PDF
The code is almost identical to the code above, except that the
output filename ends in '.pdf' and the filter name is different
too. Here is the code:
filename = 'output.pdf'
properties = (OOoLib.makePropertyValue(
'FilterName',
'writer_pdf_Export'
),)
url = unohelper.systemPathToFileUrl(
os.path.abspath(filename))
doc.storeToURL(path, properties)
Processing all files in a directory
With the information above, you should be able to do the
automated conversion of any number of files. There is one thing
that might come in handy: call the document's
close() method once you're done with it.
That prevents your memory being flooded by all opened files.
I'll give a complete example that converts all
Excel files in the current directory into OpenOffice.org Calc
files.
import Danny.OOo.OOoLib as OOoLib
import unohelper
import os.path
import glob
desktop = OOoLib.getDesktop()
for file in glob.glob('*.xls'):
url = unohelper.systemPathToFileUrl(
os.path.abspath(file))
doc = desktop.loadComponentFromURL(
url, "_blank", 0, () )
to_url = url.replace('.xls', '.ods')
doc.storeToURL(to_url, ())
doc.close(True)
Automatically starting OOo
All the examples assumed OpenOffice.org was already started
with the correct options. Of course, it would be nice if our
software could do that for us. Replace the import
Danny.OOo.OOoLib line with this chunk:
import os
import subprocess
import sys
import time
import uno
NoConnectionException = uno.getClass(
"com.sun.star.connection.NoConnectException")
ooffice = 'ooffice ' \
'"-accept=socket,host=localhost,port=8100;urp;"'
def start_OOo():
'''Starts OpenOffice.org with a listening socket.'''
# Fork to execute OOo
if os.fork():
return
# Start OpenOffice.org and report any errors that
# occur.
try:
retcode = subprocess.call(ooffice, shell=True)
if retcode < 0:
print >>sys.stderr, \
"OOo was terminated by signal", \
-retcode
elif retcode > 0:
print >>sys.stderr, \
"OOo returned", \
retcode
except OSError, e:
print >>sys.stderr, "Execution failed:", e
# Terminate this process when OOo has closed.
raise SystemExit()
# Import the OpenOffice.org module, and start
# OpenOffice.org if we can't connect to it.
started_OOo = False
while 'Danny.OOo.OOoLib' not in sys.modules:
try:
import Danny.OOo.OOoLib
except NoConnectionException:
if not started_OOo:
print "Starting OOo"
started_OOo = True
start_OOo()
time.sleep(0.1)
print "OOo started"
It might not be the pretties code in the world
(if you can improve it, please send me a copy) but it gets
the job done.
More information
If you're eager to find out more, the Desktop
API documentation is a good place to start.