My first contact with UN/EDIFACT came through a source code
exchange with USR-Tuebingen. I had shown them how to use a
Linux box to access the Verzeichnis Lieferbarer Buecher
CD-ROM from other Unix sites. In return they gave me a tool called
ediview, an interactive UN/EDIFACT browser written
in C and used for the Frankfurt EDITEUR project. The parser
was table driven, and to my horror they told me that they had
retyped the tables from the printed EDITEUR draft.
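My first attempt to reengineer this C source started with:
# Protect the released characters ??, ?+ and ?' as Ctrl-A, Ctrl-B, Ctrl-C
# (GNU sed escapes), map + to tab and ' to newline, then restore them.
sed -e 's/??/\x01/g' \
    -e 's/?+/\x02/g' \
    -e "s/?'/\x03/g" |
tr "\001\002\003+'" "?+'\t\n"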
This gave me a tabular view of UN/EDIFACT messages, one segment per line with its data elements in tab-separated columns, which could be loaded into a Postgres database or viewed with less.
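The same splitting can be sketched in Python. This is a minimal sketch, not code from ediview or from my own scripts: it assumes the default service characters (: between components, + between data elements, ' as segment terminator, ? as release character) and ignores any UNA segment. The name split_edifact is made up for illustration. Unlike the sed pipeline, it also splits composites at : and treats ?: as a released colon.

```python
from typing import List

Segment = List[List[str]]  # data elements, each a list of components


def split_edifact(data: str, comp_sep: str = ":", elem_sep: str = "+",
                  seg_term: str = "'", release: str = "?") -> List[Segment]:
    """Split an EDIFACT interchange into segments, data elements and components.

    A character preceded by the release character is taken literally,
    so ?+ and ?' are not separators and ?? yields a single '?'.
    """
    segments: List[Segment] = []
    elements: Segment = []
    components: List[str] = []
    current: List[str] = []
    chars = iter(data)
    for ch in chars:
        if ch == release:
            # Released character: keep the next character verbatim.
            current.append(next(chars, ""))
        elif ch == comp_sep:
            components.append("".join(current))
            current = []
        elif ch == elem_sep:
            components.append("".join(current))
            elements.append(components)
            components, current = [], []
        elif ch == seg_term:
            components.append("".join(current))
            elements.append(components)
            segments.append(elements)
            elements, components, current = [], [], []
        elif ch in "\r\n":
            # Line breaks between segments are layout, not data.
            continue
        else:
            current.append(ch)
    return segments


if __name__ == "__main__":
    sample = "LIN+1'PIA+5+0471949000:IB'IMD+F+010+:::Cherry'"
    for segment in split_edifact(sample):
        # One segment per line, data elements separated by tabs.
        print("\t".join(":".join(element) for element in segment))
```

Joining the elements with tabs, as in the main block, gives the same kind of tabular view as the sed/tr pipeline, while the nested lists keep the component structure for later processing.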
Soon thereafter I found the UN/EDIFACT batch directory at Premenos and wrote a 200-line GAWK script that translated EDIFACT messages into a human-readable form like this:
LINE ITEM NUMBER : 1
Product identification : 0471949000 ISBN
Name : Cherry
Vorname : Gordon E
Titel : Birmingham
Untertitel : a study in geography, hislanning
Ort : Chichester
Verlag : Wiley(John)(W Sussex)
Erscheinungsjahr : 1994
Seiten : 254p
Ausstattung : ? ill ; 24cm. - Bibl.? P.237-244.
Subject (topical) : 39100200? Urban studies
Ordered quantity : 1
Suggested retail price : YYY 37.5 Catalogue
Reference qualifier : QNB 00023302 9
Reference date/time : 19960208 CCYYMMDD
Line item reference number : 8217
Reference qualifier : BFN S.KON.39
You may notice the mixture of German and English labels: the EDITEUR code list extensions I had were the German ones typed in by USR.
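The GAWK script itself is not reproduced here. To give a rough idea of the table-driven approach, the Python sketch below renders a segment, given as a list of data elements that are each a list of components, as "label : value" lines. The LABELS and CODES tables are hypothetical excerpts made up for illustration, not the real directory tables, and describe is an illustrative name.

```python
from typing import Dict, List, Tuple

# Hypothetical excerpts of lookup tables derived from a batch directory:
# (segment tag, data element position) -> label, and code value -> description.
LABELS: Dict[Tuple[str, int], str] = {
    ("LIN", 1): "LINE ITEM NUMBER",
    ("PIA", 2): "Product identification",
}
CODES: Dict[str, str] = {"IB": "ISBN"}


def describe(segment: List[List[str]]) -> List[str]:
    """Render a split segment as 'label : value' lines.

    segment[0][0] is the segment tag; each following entry is a data
    element given as a list of components. Positions without a label
    are skipped. Every component is looked up in CODES, which is a
    simplification: a real table would only decode coded elements.
    """
    tag = segment[0][0]
    lines = []
    for position, components in enumerate(segment[1:], start=1):
        label = LABELS.get((tag, position))
        if label is None:
            continue
        words = [CODES.get(c, c) for c in components if c]
        lines.append(f"{label} : {' '.join(words)}")
    return lines


if __name__ == "__main__":
    for seg in ([["LIN"], ["1"]], [["PIA"], ["5"], ["0471949000", "IB"]]):
        print("\n".join(describe(seg)))
```

For the two segments in the main block this prints lines in the same form as the first two lines of the listing above; the labels come entirely from whichever tables are loaded, which is why a German code list extension produces German labels.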
The EDITEUR project came to a halt. IBU and others continued to use their home-grown format, together with dreadful MS-DOS applications, for routing book orders.
I had started to think about SGML for a report system when I found
Martin Bryan's home page about XML/EDI. The first Edi2SGML was written
in a single night shift, and with it I was able to process EDIFACT
messages using nsgmls or Jade. Edi2SGML was written in Perl and produced:
<!-- *** LIN+1 -->
<line.item>
<line.item.number>1</line.item.number>
</line.item>
<!-- *** PIA+5+0471949000:IB -->
<additional.product.id>
<product.id.function.qualifier coded="5">Product identification</product.id.function.qualifier>
<item.number.identification>
<item.number>0471949000</item.number>
<item.number.type coded="IB">ISBN (International Standard Book Number)</item.number.type>
</item.number.identification>
</additional.product.id>
<!-- *** IMD+F+010+:::Cherry -->
<item.description>
<item.description.type coded="F">Free-form</item.description.type>
<item.characteristic coded="010">Author Name</item.characteristic>
Cherry
</item.description>
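The element names in this output appear to follow the directory names, lowercased with spaces replaced by dots (ITEM NUMBER IDENTIFICATION becomes item.number.identification), and coded values carry a coded attribute with the code's description as content. The Python sketch below reproduces that convention for the PIA segment. The original Edi2SGML was written in Perl; the functions tag_name, element and wrap, and the hand-written mapping in the main block, are hypothetical stand-ins for its directory tables.

```python
from typing import List
from xml.sax.saxutils import escape, quoteattr


def tag_name(directory_name: str) -> str:
    """Turn a directory name such as 'ITEM NUMBER IDENTIFICATION' into
    an element name such as 'item.number.identification'."""
    return ".".join(directory_name.lower().split())


def element(name: str, value: str, description: str = "") -> str:
    """Render a simple data element; a coded value becomes a coded=
    attribute and its description becomes the content."""
    tag = tag_name(name)
    if description:
        return f"<{tag} coded={quoteattr(value)}>{escape(description)}</{tag}>"
    return f"<{tag}>{escape(value)}</{tag}>"


def wrap(name: str, children: List[str]) -> str:
    """Wrap already rendered children in a segment or composite element."""
    tag = tag_name(name)
    return "\n".join([f"<{tag}>", *children, f"</{tag}>"])


if __name__ == "__main__":
    # Hypothetical, hand-written mapping of PIA+5+0471949000:IB
    print("<!-- *** PIA+5+0471949000:IB -->")
    print(wrap("ADDITIONAL PRODUCT ID", [
        element("PRODUCT ID FUNCTION QUALIFIER", "5", "Product identification"),
        wrap("ITEM NUMBER IDENTIFICATION", [
            element("ITEM NUMBER", "0471949000"),
            element("ITEM NUMBER TYPE", "IB",
                    "ISBN (International Standard Book Number)"),
        ]),
    ]))
```

Deriving names mechanically like this is convenient, but it is also exactly what makes different directory entries with the same name end up as the same element.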
This SGML, and the later XML from XML::Edifact-0.2, had a real problem
with name clashes between segment, composite and element definitions
in the original UN/EDIFACT batch directory, which caused trouble when it
came to validating the SGML/XML. As an example, take a look at this entry
from the composite definition file trcd:
C080 PARTY NAME
Desc: Identification of a transaction party by name, one to five
lines. Party name may be formatted.
010 3036 Party name M an..35
020 3036 Party name C an..35
030 3036 Party name C an..35
040 3036 Party name C an..35
050 3036 Party name C an..35
060 3045 Party name format, coded C an..3
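To see the clash mechanically, the Python sketch below parses the C080 entry into its components and applies the same naming rule (lowercase, commas dropped, spaces become dots) to the composite and to each element. It assumes the column layout shown in the entry (position, element tag, name, status M or C, format); it approximates the trcd layout rather than parsing the real file, and all names in it are hypothetical.

```python
import re
from typing import List, NamedTuple, Tuple


class Component(NamedTuple):
    position: str  # e.g. "010"
    tag: str       # data element tag, e.g. "3036"
    name: str      # e.g. "Party name"
    status: str    # "M" (mandatory) or "C" (conditional)
    fmt: str       # e.g. "an..35"


# Assumed layout of a component line: position, element tag, name, status, format.
COMPONENT_LINE = re.compile(r"^(\d{3})\s+(\d{4})\s+(.+?)\s+([MC])\s+(\S+)$")


def parse_composite(text: str) -> Tuple[str, str, List[Component]]:
    """Parse one composite entry: header line, description, component lines."""
    lines = [line.strip() for line in text.strip().splitlines()]
    code, name = lines[0].split(None, 1)
    components = []
    for line in lines[1:]:
        match = COMPONENT_LINE.match(line)
        if match:  # description lines do not match and are skipped
            components.append(Component(*match.groups()))
    return code, name, components


def tag_name(name: str) -> str:
    """Naming rule assumed here: lowercase, drop commas, spaces become dots."""
    return ".".join(name.lower().replace(",", "").split())


TRCD_C080 = """
C080 PARTY NAME
Desc: Identification of a transaction party by name, one to five
lines. Party name may be formatted.
010 3036 Party name M an..35
020 3036 Party name C an..35
030 3036 Party name C an..35
040 3036 Party name C an..35
050 3036 Party name C an..35
060 3045 Party name format, coded C an..3
"""

if __name__ == "__main__":
    code, name, components = parse_composite(TRCD_C080)
    composite_tag = tag_name(name)
    for comp in components:
        clash = "CLASH" if tag_name(comp.name) == composite_tag else "ok"
        print(comp.position, comp.tag, tag_name(comp.name), clash)
```

The composite C080 and the data element 3036 both come out as party.name, so a DTD built this way cannot tell them apart.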
Here we have a composite called PARTY NAME and elements that are also
called Party name. My first idea, using the case sensitivity of XML
to distinguish between them, lost its appeal when it came to the PNA
segment, which is also called PARTY NAME. But XML offers namespaces
for exactly this situation, so a possible XML::Edifact translation of
the above EDITEUR book order line item is:
<?xml version="1.0"?>
<!DOCTYPE editeur:message SYSTEM "./editeur.dtd">
<!-- XML message produced by edi2xml.pl (c) Kraehe@Bakunin.North.De -->
<editeur:message
xmlns:editeur='./editeur.rdf'
xmlns:edifact='./edifact.rdf'
xmlns:trsd='./edifact_trsd.rdf'
xmlns:trcd='./edifact_trcd.rdf'
xmlns:tred='./edifact_tred.rdf'
xmlns:uncl='./edifact_uncl.rdf'
xmlns:anxs='./edifact_anxs.rdf'
xmlns:anxc='./edifact_anxc.rdf'
xmlns:anxe='./edifact_anxe.rdf'
xmlns:unsl='./edifact_unsl.rdf'
>
<!-- SEGMENT UNB+UNOC:2+STUB+BLA+960209:0843+72 -->
<anxs:interchange.header>
<anxc:syntax.identifier>
<anxe:syntax.identifier unsl:code="0001:UNOC">UN/ECE level C</anxe:syntax.identifier>
<anxe:syntax.version.number>2</anxe:syntax.version.number>
</anxc:syntax.identifier>
<anxc:interchange.sender>
<anxe:sender.identification>STUB</anxe:sender.identification>
</anxc:interchange.sender>
<anxc:interchange.recipient>
<anxe:recipient.identification>BLA</anxe:recipient.identification>
</anxc:interchange.recipient>
<anxc:date.time.of.preparation>
<anxe:date>960209</anxe:date>
<anxe:time>0843</anxe:time>
</anxc:date.time.of.preparation>
<anxe:interchange.control.reference>72</anxe:interchange.control.reference>
</anxs:interchange.header>
<!-- ... lots of segments deleted ... -->
<!-- SEGMENT LIN+1 -->
<trsd:line.item>
<tred:line.item.number>1</tred:line.item.number>
</trsd:line.item>
<!-- SEGMENT PIA+5+0471949000:IB -->
<trsd:additional.product.id>
<tred:product.id.function.qualifier uncl:code="4347:5">Product identification</tred:product.id.function.qualifier>
<trcd:item.number.identification>
<tred:item.number>0471949000</tred:item.number>
<tred:item.number.type.coded uncl:code="7143:IB">ISBN (International Standard Book Number)</tred:item.number.type.coded>
</trcd:item.number.identification>
</trsd:additional.product.id>
<!-- SEGMENT IMD+F+010+:::Cherry -->
<editeur:item.description>
<tred:item.description.type.coded uncl:code="7077:F">Free-form</tred:item.description.type.coded>
<editeur:item.characteristic.coded editeur:code="7081:010">Author Name</editeur:item.characteristic.coded>
<trcd:item.description>
<tred:item.description>Cherry</tred:item.description>
</trcd:item.description>
</editeur:item.description>
</editeur:message>
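Using namespaces not only makes it possible to define a working DTD for plain EDIFACT; it also offers a clean way to translate code list extensions, as in the EDITEUR example above, where the IMD segment and its item characteristic code 010 live in the editeur namespace instead of trsd and uncl.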
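One way to read the example is that a code is looked up first in the extension code list (here EDITEUR) and only then in the UN code list, and the prefix of the list that knew the code ends up on the code attribute, as in editeur:code="7081:010" versus uncl:code="4347:5". The Python sketch below assumes that lookup order; the dictionaries are hypothetical excerpts and resolve_code is an illustrative name, not an XML::Edifact function.

```python
from typing import Dict, List, Optional, Tuple

CodeTable = Dict[Tuple[str, str], str]  # (data element tag, code) -> description

# Hypothetical excerpts of an extension list and of the UN code list.
EDITEUR: CodeTable = {("7081", "010"): "Author Name"}
UNCL: CodeTable = {
    ("4347", "5"): "Product identification",
    ("7077", "F"): "Free-form",
}

# Extension lists are consulted before the base UN code list.
CODE_LISTS: List[Tuple[str, CodeTable]] = [("editeur", EDITEUR), ("uncl", UNCL)]


def resolve_code(element: str, value: str) -> Optional[Tuple[str, str]]:
    """Return (code attribute, description) for a coded value, or None.

    The attribute carries the namespace prefix of the code list that
    defined the code, e.g. 'editeur:code="7081:010"'.
    """
    for prefix, table in CODE_LISTS:
        description = table.get((element, value))
        if description is not None:
            return f'{prefix}:code="{element}:{value}"', description
    return None


if __name__ == "__main__":
    for element, value in (("7081", "010"), ("4347", "5"), ("7077", "F")):
        print(resolve_code(element, value))
```

Keeping the extension in its own namespace means the base UN/EDIFACT definitions stay untouched, and a reader can see at a glance which codes came from EDITEUR.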
In the example above, each xmlns declaration references an RDF file as its namespace URI. Those files do not exist yet; they are proposed for XML::Edifact-0.5.
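A namespace-aware library can do the prefix bookkeeping itself. The Python sketch below uses the standard xml.etree.ElementTree module to rebuild the LIN segment from the example, binding the prefixes to the same relative RDF URIs. Because XML parsers treat a namespace URI only as a name and never fetch it, the missing RDF files do not matter here. The helpers qname and build_line_item are hypothetical.

```python
import xml.etree.ElementTree as ET

# Prefix -> namespace URI, copied from the editeur:message start tag above.
NAMESPACES = {
    "editeur": "./editeur.rdf",
    "trsd": "./edifact_trsd.rdf",
    "tred": "./edifact_tred.rdf",
}
for prefix, uri in NAMESPACES.items():
    # Ask ElementTree to serialize these URIs with the given prefixes.
    ET.register_namespace(prefix, uri)


def qname(prefix: str, local: str) -> str:
    """Build ElementTree's '{uri}local' form of a prefixed name."""
    return f"{{{NAMESPACES[prefix]}}}{local}"


def build_line_item(number: str) -> ET.Element:
    """Build an editeur:message containing one trsd:line.item (LIN segment)."""
    message = ET.Element(qname("editeur", "message"))
    message.append(ET.Comment(f" SEGMENT LIN+{number} "))
    line_item = ET.SubElement(message, qname("trsd", "line.item"))
    ET.SubElement(line_item, qname("tred", "line.item.number")).text = number
    return message


if __name__ == "__main__":
    print(ET.tostring(build_line_item("1"), encoding="unicode"))
```

ElementTree writes the xmlns declarations for the prefixes actually used onto the root element, so trsd:line.item and tred:line.item.number come out with the same prefixes as in the hand-written example.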