OpenOffice.org & Python
This article was originally written in October 2006, and has been subsequently updated.
The OpenOffice.org documentation is complex and large. Still, I had the feeling that simple tasks with OpenOffice.org and Python had to be simple. Once I had digged through the tough documentation and combined my freshly acquired knowledge with Danny’s OOo library, it turned out to be rather manageable.
This article / tutorial / HOWTO gives you a couple of examples of Python code that accesses and manipulates OpenOffice.org. Since most examples that are already out there discuss text documents, I’ll be writing about spreadsheets.
I regularly get customers that want an Excel or OpenOffice.org spreadsheet imported into a database. This is easily done with a few lines of Python code. Since we have Python access to OOo anyway, we’ll also look at calculating with OpenOffice.org.
Preparation
In order to communicate with OpenOffice.org, it needs to be started in a special way, and we need some Python modules.
Starting OpenOffice.org
First you have to close OpenOffice.org, including the quick-start tray icon if you’re on Windows. Then start OpenOffice.org like this:
ooffice "-accept=socket,host=localhost,port=8100;urp;"
Installing Python modules
Install Danny’s OpenOffice.org library. Place it into your development directory and execute:
mkdir -p Danny/OOo
touch Danny/__init__.py
touch Danny/OOo/__init__.py
mv OOoLib.py Danny/OOo/
You also need the python-uno package. It is shipped with Ubuntu Linux, so probably also with other distributions. If you have trouble finding/downloading/installing it, just ask me.
Gaining access to a document
Basically, there are two ways of getting access to a document. One is loading it from file, the other is simply using the currently opened OpenOffice.org document. The goal of this section is to obtain a reference to a spreadsheet document.
Accessing the current document
Using the currently opened document is the easiest method for single tasks. For example: sending an email to all email addresses in the spreadsheet. You can simply start OpenOffice.org as described above, open the spreadsheet as usual, then run the Python program. This method is also very simple for not-so-technical people that want to have the least number of unfamiliar “file open” dialog boxes.
The code to get a document reference is very simple:
import Danny.OOo.OOoLib as OOoLib
desktop = OOoLib.getDesktop()
doc = desktop.getCurrentComponent()
With this reference in the ‘doc’ variable, we can get to the data.
Loading a document from file
Sometimes you don’t want to manually open a file, but simply provide the software with a file name. There is a small caveat here. You have to pass an URL to OpenOffice.org, even if you want to access a file on the local filesystem. This code will do the trick for you, though:
filename = 'somefile.ods'
import Danny.OOo.OOoLib as OOoLib
import unohelper
import os.path
url = unohelper.systemPathToFileUrl(
os.path.abspath(filename))
desktop = OOoLib.getDesktop()
doc = desktop.loadComponentFromURL(
url, "_blank", 0, () )
With this reference in the ‘doc’ variable, we can get to the data.
Getting to the data
Once we have our document reference, we can get to the data. Let’s say we have a single spreadsheet in the document, that contains a list of names and email addresses in the first resp. the second row. First, we have to access the sheet:
sheet = doc.getSheets().getByIndex(0)
We can then get to the data, and do with it whatever we please. I tend to store things into a SQLite database which can then be copied to whatever computer needs the data. Once you know how to get to the data in the spreadsheet, you can find your own ways of dealing with it:
value = sheet.getCellByPosition(col, row).getValue()
text = sheet.getCellByPosition(col, row).getFormula()
That’s right, text is accessed as a formula. If you call getValue()
, you
get the floating point numeric value of the cell. If it contains text, 0.0 is
returned.
Other functions can be found in the XCell documentation.
Putting the above together, we get the following to get to the names and email addresses in the sheet. I assume that if the email cell is empty, we have reached the end of the list.
# Insert code from above to get the document
# reference 'doc'.
sheet = doc.getSheets().getByIndex(0)
row = 0
while True:
name = sheet.getCellByPosition(0, row).getFormula()
email = sheet.getCellByPosition(1, row).getFormula()
if not email:
break
print '%30s %20s' % (name, email)
row += 1
Types of data
As far as I can tell from the
getType()
function
documentation
there are only four distinguishable cell types:
EMPTY
VALUE
TEXT
FORMULA
That means no date type, no time type, no nothing. I’m confident there is some way of determining those - after all, OpenOffice.org itself can do it too - but at this moment I don’t know how. If you do know, let me know.
Calculating with OpenOffice.org
Since the full power of OpenOffice.org is at your hands, it’s also quite easy to let it perform calculations for you. First, let’s create our formulas:
sheet.getCellByPosition(0, 0).setValue(33)
sheet.getCellByPosition(0, 1).setValue(9)
sheet.getCellByPosition(0, 2).setFormula('=SUM(A1:A2)')
Getting to the result is as easy as calling getValue()
on the formula cell:
result = sheet.getCellByPosition(0, 2).getValue()
I’ll leave more complex examples as an excersise to the reader.
Converting from/to Excel or to PDF
You might have just moved from MS Office to OpenOffice.org, or you might have a customer that wants only Excel spreadsheets. Many people are interested in automated conversion, for just as many reasons.
Conversion requires no more than obtaining the document reference as described above, and then saving the file with the correct filter.
From Excel to OpenOffice.org
From Excel to OpenOffice.org’s Calc format is rather trivial, since the conversion is already done when you open the Excel file. All that is left, is saving it to the correct place:
filename = '/tmp/output.ods'
url = unohelper.systemPathToFileUrl(
os.path.abspath(filename))
doc.storeToURL(path, ())
From OpenOffice.org to Excel
I’ll post the code first, assuming you already have a document reference in the variable ‘doc’.
filename = 'output.xls'
properties = (OOoLib.makePropertyValue(
'FilterName',
'MS Excel 97'
),)
url = unohelper.systemPathToFileUrl(
os.path.abspath(filename))
doc.storeToURL(path, properties)
That’s all there is to it. The ‘FilterName’ property determines the output filter, which in turn tells OpenOffice.org to store the file as MS Excel file.
From OpenOffice.org to PDF
The code is almost identical to the code above, except that the output filename ends in ‘.pdf’ and the filter name is different too. Here is the code:
filename = 'output.pdf'
properties = (OOoLib.makePropertyValue(
'FilterName',
'writer_pdf_Export'
),)
url = unohelper.systemPathToFileUrl(
os.path.abspath(filename))
doc.storeToURL(path, properties)
Processing all files in a directory
With the information above, you should be able to do the automated conversion of
any number of files. There is one thing that might come in handy: call the
document’s close()
method once you’re done with it. That prevents your
memory being flooded by all opened files.
I’ll give a complete example that converts all Excel files in the current directory into OpenOffice.org Calc files.
import Danny.OOo.OOoLib as OOoLib
import unohelper
import os.path
import glob
desktop = OOoLib.getDesktop()
for file in glob.glob('*.xls'):
url = unohelper.systemPathToFileUrl(
os.path.abspath(file))
doc = desktop.loadComponentFromURL(
url, "_blank", 0, () )
to_url = url.replace('.xls', '.ods')
doc.storeToURL(to_url, ())
doc.close(True)
Automatically starting OOo
All the examples assumed OpenOffice.org was already started with the correct
options. Of course, it would be nice if our software could do that for us.
Replace the import Danny.OOo.OOoLib
line with this chunk:
import os
import subprocess
import sys
import time
import uno
NoConnectionException = uno.getClass(
"com.sun.star.connection.NoConnectException")
ooffice = 'ooffice ' \
'"-accept=socket,host=localhost,port=8100;urp;"'
def start_OOo():
'''Starts OpenOffice.org with a listening socket.'''
# Fork to execute OOo
if os.fork():
return
# Start OpenOffice.org and report any errors that
# occur.
try:
retcode = subprocess.call(ooffice, shell=True)
if retcode < 0:
print >>sys.stderr, \
"OOo was terminated by signal", \
-retcode
elif retcode > 0:
print >>sys.stderr, \
"OOo returned", \
retcode
except OSError, e:
print >>sys.stderr, "Execution failed:", e
# Terminate this process when OOo has closed.
raise SystemExit()
# Import the OpenOffice.org module, and start
# OpenOffice.org if we can't connect to it.
started_OOo = False
while 'Danny.OOo.OOoLib' not in sys.modules:
try:
import Danny.OOo.OOoLib
except NoConnectionException:
if not started_OOo:
print "Starting OOo"
started_OOo = True
start_OOo()
time.sleep(0.1)
print "OOo started"
It might not be the pretties code in the world (if you can improve it, please send me a copy) but it gets the job done.
More information
If you’re eager to find out more, the Desktop API documentation is a good place to start.