Converting exceptionally large SOSI files to PostGIS

Sosicon has the ability to assemble several SOSI source files into one large SQL file. However, the maximum size of the destination file depends on the target machine’s memory resources. Sometimes, the data source is too large for a single-run approach and you will run out of memory.

Fortunately, there is an easy workaround to avoid memory exhaustion. You will have to build the database creation script (DDL statements) separately, and then split up your insertion script (SQL statements) across multiple SQL files. That is quite easily accomplished with the -create and -insert parameters.

In order to create the database table structure, Sosicon needs to know what kind of data to expect. Hence, the initial run must include all SOSI source files.

Building the creation script

When you specify the -create parameter, no memory will be reserved for the data. Only the table structure is saved. Let’s say we have a directory with 19 large SOSI files numbered from 01_conty.sos to 19_county.sos. The following command can be run to compile the creation script:

ls *_county.sos | sosicon -2psql -create -o create_tables.sql

Sosicon will analyze all the SOSI files before building a file named create_tables.sql. When this file is run against PostgreSQL, a set of empty tables will be created, having the relevant columns for the SOSI source in place.

Data insertion scripts

Now, for each SOSI file in the directory, run sosicon with the -insert parameter, starting with 01_county.sos:

sosicon -2psql -insert -o insert_01_county.sql 01_county.sos

When you are done, you will have 20 files: 1 creation script and 18 insertion scripts. Run the creation script first, then the insertion scripts.

Leave a Reply

Your email address will not be published. Required fields are marked *