OpenOffice.org & Python

This article was originally written in October 2006, and has been subsequently updated.

The OpenOffice.org documentation is complex and large. Still, I had the feeling that simple tasks with OpenOffice.org and Python had to be simple. Once I had digged through the tough documentation and combined my freshly acquired knowledge with Danny’s OOo library, it turned out to be rather manageable.

This article / tutorial / HOWTO gives you a couple of examples of Python code that accesses and manipulates OpenOffice.org. Since most examples that are already out there discuss text documents, I’ll be writing about spreadsheets.

I regularly get customers that want an Excel or OpenOffice.org spreadsheet imported into a database. This is easily done with a few lines of Python code. Since we have Python access to OOo anyway, we’ll also look at calculating with OpenOffice.org.

Preparation

In order to communicate with OpenOffice.org, it needs to be started in a special way, and we need some Python modules.

Starting OpenOffice.org

First you have to close OpenOffice.org, including the quick-start tray icon if you’re on Windows. Then start OpenOffice.org like this:

ooffice "-accept=socket,host=localhost,port=8100;urp;"

Installing Python modules

Install Danny’s OpenOffice.org library. Place it into your development directory and execute:

mkdir -p Danny/OOo
touch Danny/__init__.py
touch Danny/OOo/__init__.py
mv OOoLib.py Danny/OOo/

You also need the python-uno package. It is shipped with Ubuntu Linux, so probably also with other distributions. If you have trouble finding/downloading/installing it, just ask me.

Gaining access to a document

Basically, there are two ways of getting access to a document. One is loading it from file, the other is simply using the currently opened OpenOffice.org document. The goal of this section is to obtain a reference to a spreadsheet document.

Accessing the current document

Using the currently opened document is the easiest method for single tasks. For example: sending an email to all email addresses in the spreadsheet. You can simply start OpenOffice.org as described above, open the spreadsheet as usual, then run the Python program. This method is also very simple for not-so-technical people that want to have the least number of unfamiliar “file open” dialog boxes.

The code to get a document reference is very simple:

import Danny.OOo.OOoLib as OOoLib

desktop = OOoLib.getDesktop()
doc = desktop.getCurrentComponent()

With this reference in the ‘doc’ variable, we can get to the data.

Loading a document from file

Sometimes you don’t want to manually open a file, but simply provide the software with a file name. There is a small caveat here. You have to pass an URL to OpenOffice.org, even if you want to access a file on the local filesystem. This code will do the trick for you, though:

filename = 'somefile.ods'

import Danny.OOo.OOoLib as OOoLib
import unohelper
import os.path

url = unohelper.systemPathToFileUrl(
    os.path.abspath(filename))
desktop = OOoLib.getDesktop()
doc = desktop.loadComponentFromURL(
    url, "_blank", 0, () )

With this reference in the ‘doc’ variable, we can get to the data.

Getting to the data

Once we have our document reference, we can get to the data. Let’s say we have a single spreadsheet in the document, that contains a list of names and email addresses in the first resp. the second row. First, we have to access the sheet:

sheet = doc.getSheets().getByIndex(0)

We can then get to the data, and do with it whatever we please. I tend to store things into a SQLite database which can then be copied to whatever computer needs the data. Once you know how to get to the data in the spreadsheet, you can find your own ways of dealing with it:

value = sheet.getCellByPosition(col, row).getValue()
text = sheet.getCellByPosition(col, row).getFormula()

That’s right, text is accessed as a formula. If you call getValue(), you get the floating point numeric value of the cell. If it contains text, 0.0 is returned.

Other functions can be found in the XCell documentation.

Putting the above together, we get the following to get to the names and email addresses in the sheet. I assume that if the email cell is empty, we have reached the end of the list.

# Insert code from above to get the document
# reference 'doc'.

sheet = doc.getSheets().getByIndex(0)

row = 0
while True:
    name = sheet.getCellByPosition(0, row).getFormula()
    email = sheet.getCellByPosition(1, row).getFormula()

    if not email:
        break

    print '%30s %20s' % (name, email)

    row += 1

Types of data

As far as I can tell from the getType() function documentation there are only four distinguishable cell types:

  • EMPTY
  • VALUE
  • TEXT
  • FORMULA

That means no date type, no time type, no nothing. I’m confident there is some way of determining those - after all, OpenOffice.org itself can do it too - but at this moment I don’t know how. If you do know, let me know.

Calculating with OpenOffice.org

Since the full power of OpenOffice.org is at your hands, it’s also quite easy to let it perform calculations for you. First, let’s create our formulas:

sheet.getCellByPosition(0, 0).setValue(33)
sheet.getCellByPosition(0, 1).setValue(9)
sheet.getCellByPosition(0, 2).setFormula('=SUM(A1:A2)')

Getting to the result is as easy as calling getValue() on the formula cell:

result = sheet.getCellByPosition(0, 2).getValue()

I’ll leave more complex examples as an excersise to the reader.

Converting from/to Excel or to PDF

You might have just moved from MS Office to OpenOffice.org, or you might have a customer that wants only Excel spreadsheets. Many people are interested in automated conversion, for just as many reasons.

Conversion requires no more than obtaining the document reference as described above, and then saving the file with the correct filter.

From Excel to OpenOffice.org

From Excel to OpenOffice.org’s Calc format is rather trivial, since the conversion is already done when you open the Excel file. All that is left, is saving it to the correct place:

filename = '/tmp/output.ods'

url = unohelper.systemPathToFileUrl(
    os.path.abspath(filename))
doc.storeToURL(path, ())

From OpenOffice.org to Excel

I’ll post the code first, assuming you already have a document reference in the variable ‘doc’.

filename = 'output.xls'

properties = (OOoLib.makePropertyValue(
        'FilterName',
        'MS Excel 97'
    ),)

url = unohelper.systemPathToFileUrl(
    os.path.abspath(filename))
doc.storeToURL(path, properties)

That’s all there is to it. The ‘FilterName’ property determines the output filter, which in turn tells OpenOffice.org to store the file as MS Excel file.

From OpenOffice.org to PDF

The code is almost identical to the code above, except that the output filename ends in ‘.pdf’ and the filter name is different too. Here is the code:

filename = 'output.pdf'

properties = (OOoLib.makePropertyValue(
        'FilterName',
        'writer_pdf_Export'
    ),)

url = unohelper.systemPathToFileUrl(
    os.path.abspath(filename))
doc.storeToURL(path, properties)

Processing all files in a directory

With the information above, you should be able to do the automated conversion of any number of files. There is one thing that might come in handy: call the document’s close() method once you’re done with it. That prevents your memory being flooded by all opened files.

I’ll give a complete example that converts all Excel files in the current directory into OpenOffice.org Calc files.

import Danny.OOo.OOoLib as OOoLib
import unohelper
import os.path
import glob

desktop = OOoLib.getDesktop()

for file in glob.glob('*.xls'):
    url = unohelper.systemPathToFileUrl(
        os.path.abspath(file))
    doc = desktop.loadComponentFromURL(
        url, "_blank", 0, () )

    to_url = url.replace('.xls', '.ods')
    doc.storeToURL(to_url, ())
    doc.close(True)

Automatically starting OOo

All the examples assumed OpenOffice.org was already started with the correct options. Of course, it would be nice if our software could do that for us. Replace the import Danny.OOo.OOoLib line with this chunk:

import os
import subprocess
import sys
import time
import uno

NoConnectionException = uno.getClass(
        "com.sun.star.connection.NoConnectException")

ooffice = 'ooffice ' \
    '"-accept=socket,host=localhost,port=8100;urp;"'

def start_OOo():
    '''Starts OpenOffice.org with a listening socket.'''

    # Fork to execute OOo
    if os.fork():
        return

    # Start OpenOffice.org and report any errors that
    # occur.
    try:
        retcode = subprocess.call(ooffice, shell=True)
        if retcode < 0:
            print >>sys.stderr, \
                "OOo was terminated by signal", \
                -retcode
        elif retcode > 0:
            print >>sys.stderr, \
                "OOo returned", \
                retcode
    except OSError, e:
        print >>sys.stderr, "Execution failed:", e

    # Terminate this process when OOo has closed.
    raise SystemExit()

# Import the OpenOffice.org module, and start
# OpenOffice.org if we can't connect to it.
started_OOo = False
while 'Danny.OOo.OOoLib' not in sys.modules:
    try:
        import Danny.OOo.OOoLib
    except NoConnectionException:
        if not started_OOo:
            print "Starting OOo"
            started_OOo = True
            start_OOo()
        time.sleep(0.1)

print "OOo started"

It might not be the pretties code in the world (if you can improve it, please send me a copy) but it gets the job done.

More information

If you’re eager to find out more, the Desktop API documentation is a good place to start.