utl::Reader Class Reference

Utility for parsing XML files.

#include <Reader.h>

Public Types

enum  EValidationType { eDTD, eSCHEMA, eNONE }
 

Public Member Functions

Branch GetTopBranch () const
 Get the top Branch (represents the same entity as the document node).
 
std::string GetUri () const
 
 Reader (const std::string &name, const EValidationType validationType=Reader::eNONE)
 Constructor with validation options.
 
 Reader (const ReaderStringInput &inputString, const EValidationType validationType=Reader::eNONE)
 Constructor from an input string (as opposed to file). EValidationType not (yet) used.
 
 Reader (Branch &branch)
 

Private Types

enum  EInputType { eFromFile = 0, eFromMemBuf }
 

Private Member Functions

void Initialize ()
 
Reader & operator= (const Reader &re)
 
 Reader (const Reader &re)
 

Static Private Member Functions

static Branch Parse (const std::string &input, const EValidationType validationType, const EInputType inputType)
 
static void Terminate ()
 

Private Attributes

Branch fTopBranch
 

Static Private Attributes

static ReaderErrorReporter * fgErrReporter = nullptr
 
static bool fgInitialized = false
 

Detailed Description

Utility for parsing XML files.

Utility for reading data from XML files.

Author
T. Paul
P. Cattaneo

The various framework configuration files use XML format. The rationale for using XML, rather than some homemade ASCII format, is that it is a universally adopted standard designed to be both human readable and machine parsable, and, as such, several well-written XML parsers are freely available. As part of the Offline code, we provide a utility called Reader which is based on the Xerces parser and provides ease of use at the cost of a small reduction in functionality compared to Xerces. See GAP-2001-44 for details on the Reader code as well as a brief XML tutorial and further information on the advantages of using XML.

You may want to have a configuration file for your module in order to store values you wish to vary, such as cuts or other parameters. In such a case you can, of course, use the Reader to parse it. Here we give an example of how to do this. This example is probably more involved than a typical module configuration, but it serves to illustrate everything you are ever likely to need to know concerning navigation through a hierarchy of data.

Below we explain how to use the Reader to access your configuration data.

Accessing your configuration file

You can tell the CentralConfig where your configuration file lives by adding a <configLink> tag in the bootstrap.xml file. The information in the file can then be looked up from within your code. See the documentation on fwk::CentralConfig for an explanation of how to do this.

Navigating Data Organized in Trees

For purposes of illustrating how to navigate through a tree of data, we consider the following example XML file.

<?xml version="1.0" ?>

<document>

<!-- Example detector simulation parameters - simplified
for illustrative purposes -->

<detectorSimParameters>

  <tank id="1">
    <radius unit="m"> 1.8 </radius>
    <height unit="m"> 1.2 </height>
    <PMT id="1">
      <position unit="cm"> 0.0 120.0 60.0 </position>
      <maxQe> 0.30 </maxQe>
    </PMT>
    <PMT id="2">
      <position unit="cm"> 103.92 -60.0 60.0 </position>
      <maxQe> 0.30 </maxQe>
    </PMT>
    <PMT id="3">
      <position unit="cm"> -103.92 -60.0 60.0 </position>
      <maxQe> 0.30 </maxQe>
    </PMT>
  </tank>

  <tank id="2">
    <PMT id="3" >
      <position unit="cm"> -103.92 -60.0 60.0 </position>
      <maxQe> 0.27 </maxQe>
      <photonEnergyBin unit="eV">
        2.14  2.16  2.19  2.23  2.27  2.32
      </photonEnergyBin>
    </PMT>

    <tyvekProperties>
      <reflectivity>  0.9164 </reflectivity>
      <specularLobe>  0.2 </specularLobe>
      <specularSpike> 0.0 </specularSpike>
    </tyvekProperties>
  </tank>

</detectorSimParameters>

</document>

Note that the information is organized in a hierarchical "tree" structure. In Reader parlance, each element in the XML document constitutes a Branch. For example, in the document above tank is a Branch, which has child Branches PMT and tyvekProperties. The Branch PMT, in turn, has child Branches position, maxQe and photonEnergyBin. Note that a Branch can contain more Branches, data, or both.

The begin tag for a Branch may also have (optional) "attributes." In the examples above, one attribute that appears is called id and another is called unit. The unit is meant to set the unit for data contained in a branch, and is treated in a special way, as discussed below.

Finding the top of the document

The first step in navigating a tree is to request the top branch of the XML document from the Reader. If we use the CentralConfig to look up the XML file, we can retrieve the top Branch of the document as follows:

CentralConfig* cc = CentralConfig::GetInstance();
Branch topB = cc->GetTopBranch("someConfigFile");

In our example, this will return the document branch. You can verify this using the GetBranchName method:

cout << topB.GetBranchName() << endl;

Getting the Desired Branch

Starting from the top branch, you can find any of the children. A child branch can be retrieved by name as in this example:

Branch simB = topB.GetChild("detectorSimParameters");

If you want to look up a Branch which has one or more attributes associated with it, you can load the desired attributes into a map and pass it as an argument of the GetChild method. For example, to reach the height Branch of the tank with id="1":

map<string, string> atts;
atts["id"] = "1";
Branch heightB = simB.GetChild("tank", atts).GetChild("height");
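To make the attribute-based lookup concrete, here is a minimal, self-contained sketch of how a child can be selected by name and attribute values. It does not use the Reader: the Node struct and the FindChild, MakeExample and ExampleLookup functions are hypothetical stand-ins introduced only for illustration, and the real utl::Branch may well behave differently in detail.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an XML element; this is NOT the real utl::Branch.
struct Node {
  std::string name;
  std::map<std::string, std::string> attributes;
  std::string text;
  std::vector<Node> children;
};

using AttributeMap = std::map<std::string, std::string>;

// Return the first child with the given name whose attributes include every
// (key, value) pair in `wanted`; return nullptr if no child matches.
const Node* FindChild(const Node& parent, const std::string& name,
                      const AttributeMap& wanted = AttributeMap()) {
  for (const Node& child : parent.children) {
    if (child.name != name)
      continue;
    bool matches = true;
    for (const auto& kv : wanted) {
      const auto it = child.attributes.find(kv.first);
      if (it == child.attributes.end() || it->second != kv.second) {
        matches = false;
        break;
      }
    }
    if (matches)
      return &child;
  }
  return nullptr;
}

// Build a small fragment of the example document: two tanks with one PMT each.
Node MakeExample() {
  Node pmt1{"PMT", {{"id", "1"}}, "", {Node{"maxQe", {}, "0.30", {}}}};
  Node pmt3{"PMT", {{"id", "3"}}, "", {Node{"maxQe", {}, "0.27", {}}}};
  Node tank1{"tank", {{"id", "1"}}, "",
             {Node{"height", {{"unit", "m"}}, "1.2", {}}, pmt1}};
  Node tank2{"tank", {{"id", "2"}}, "", {pmt3}};
  return Node{"detectorSimParameters", {}, "", {tank1, tank2}};
}

// Usage: select the tank with id="2", then its PMT with id="3".
std::string ExampleLookup() {
  const Node sim = MakeExample();
  const Node* tank = FindChild(sim, "tank", {{"id", "2"}});
  assert(tank != nullptr);
  const Node* pmt = FindChild(*tank, "PMT", {{"id", "3"}});
  assert(pmt != nullptr);
  return FindChild(*pmt, "maxQe")->text;
}
```

In this sketch a lookup without attributes returns the first child of that name, so attributes are what distinguish the two tank elements from one another.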

Casting Data to the Desired Type

One of the tasks the Reader performs for you is casting the data in a branch to the requested type. Currently the following types are supported:

bool, int, float, double, char*, string
vector<bool>, vector<int>, vector<float>, vector<double>, vector<string>
list<bool>, list<int>, list<float>, list<double>, list<string>,
utl::TimeStamp, vector<utl::TimeStamp>

Casting to the desired type is handled by the utl::Branch::GetData methods, which are overloaded with argument lists corresponding to the different supported types. For example, to return a double corresponding to the height data for the tank with id="1", one could write:

double height;
heightB.GetData(height);

where the heightB branch is set in the examples above. Retrieving data into an STL container like list or vector is also straightforward, as in the following example:

map<string, string> tankAtts;
tankAtts["id"] = "2";
map<string, string> pmtAtts;
pmtAtts["id"] = "3";
list<float> energy;
simB.GetChild("tank", tankAtts).GetChild("PMT", pmtAtts).GetChild("photonEnergyBin").GetData(energy);

The simB branch, which points to detectorSimParameters, is set in the examples above.

The Reader does not generally protect against user foolishness. If you try to cast some text to an int, for example, the Reader will do its best and return an int. There are other mechanisms to protect against such mistakes, however, as discussed in the section on validation of configuration files.
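As an illustration of what converting element text to a container involves, the following sketch splits whitespace-separated text and converts each token with stream extraction. ParseList and ExampleEnergies are hypothetical helpers written for this example; they are one plausible way to do the conversion, not the Reader's actual implementation.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not part of the Reader: split whitespace-separated
// element text and convert each token to T using stream extraction.
// Conversion stops silently at the first token that cannot be read as T.
template <typename T>
std::vector<T> ParseList(const std::string& text) {
  std::istringstream in(text);
  std::vector<T> values;
  T value;
  while (in >> value)
    values.push_back(value);
  return values;
}

// Usage: convert the photonEnergyBin text from the example document.
std::vector<double> ExampleEnergies() {
  const std::vector<double> energy =
      ParseList<double>(" 2.14  2.16  2.19  2.23  2.27  2.32 ");
  assert(energy.size() == 6);
  return energy;
}
```

Note that a conversion of this kind reports nothing when the text is not numeric; it simply yields fewer values than expected, which is one reason validation of the input file is worthwhile.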

Dealing with Units

As mentioned previously, the Reader provides a mechanism to handle units associated with data in an XML file. Units may be declared via a special "unit" attribute, which appears inside the begin tag. In the XML file above, there are several examples of this.

When data from the XML file is cast via the utl::Branch::GetData methods, any unit attribute which may be present in the tag is evaluated and the data between the begin and end tags is multiplied by whatever factor is necessary to convert it into Auger base units. In this way you can write data in whatever units are the most convenient, and the Reader code will convert them back into "official" units. The Reader supports units expressions, so for example you could write something like:

<g unit="m/s^2"> 9.8 </g>
<aperture unit="sr*km2"> 3000 </aperture>

The Auger base units are defined inside the AugerUnits.h include file, which is a compilation of factors to convert dimensional quantities into official Auger units. For example, some of the AugerUnits.h conversion factors for length read:

static const double meter = 1.0;
static const double m = meter;
static const double millimeter = 1.e-3*meter;
static const double mm = millimeter;

As an example of how one uses these conversion factors, suppose you want to read in the following branch of the example XML file:

<radius unit="m"> 1.8 </radius>

This could be done with the following lines:

map<string, string> tankAtts;
tankAtts["id"] = "1";
double radius;
simB.GetChild("tank", tankAtts).GetChild("radius").GetData(radius);

Then, to print the radius variable in millimeters, one could use the appropriate conversion factor from AugerUnits.h, as in the following code:

cout << "radius = " << radius/mm << endl;
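The convention described here (multiply by the unit factor when reading, divide by the desired unit when printing) can be sketched in a few self-contained lines. The meter and millimeter factors are the ones quoted above; the centimeter factor, the ToBaseUnits helper and ExampleRadiusInMillimeters are assumptions made for this illustration and are not taken from AugerUnits.h or from the Reader.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Conversion factors in the style of the AugerUnits.h excerpt above.
static const double meter = 1.0;
static const double m = meter;
static const double millimeter = 1.e-3 * meter;
static const double mm = millimeter;
static const double centimeter = 1.e-2 * meter;  // assumed, same pattern
static const double cm = centimeter;

// Hypothetical helper: multiply a raw value by the factor named in the
// unit attribute, giving the value in base units.
double ToBaseUnits(double rawValue, const std::string& unitName) {
  static const std::map<std::string, double> factors = {
      {"m", m}, {"cm", cm}, {"mm", mm}};
  const auto it = factors.find(unitName);
  if (it == factors.end())
    throw std::invalid_argument("unknown unit: " + unitName);
  return rawValue * it->second;
}

// Usage: <radius unit="m"> 1.8 </radius>, expressed in millimeters.
double ExampleRadiusInMillimeters() {
  const double radius = ToBaseUnits(1.8, "m");  // now in base units
  assert(radius > 0.0);
  return radius / mm;  // divide by the unit you want to display
}
```

With these definitions, ExampleRadiusInMillimeters() evaluates to approximately 1800. Code that uses the value never needs to know which unit was written in the XML file.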

Validating configuration files

It is obviously a good idea to check data read in from XML files for garbage and typos before using it. Traditionally, this has been done by writing a lot of code to verify that each required piece of data is found in the file and that it is set to some reasonable value. For instance, if you browse through various modules, you may encounter snippets of code like this:

Branch dataB = topB.GetChild("data");
if (!dataB) {
  ERROR("Could not find requested data");
  return eFailure;
}
double fData;
dataB.GetData(fData);

Or with silent defaults:

Branch dataB = topB.GetChild("data");
double fData = 13;
if (dataB)
  dataB.GetData(fData);

While this bit of code is better than nothing, it is not particularly comprehensive, as it only checks for existence of a required Branch, but doesn't do anything to verify that the data contained therein are sensible. Writing more detailed validation tends to be tedious and error prone. Most people cannot be bothered.

A better approach is to exploit one of the standard methods for XML document validation. Validation in this case means employing an auxiliary ASCII file (rather than C++ code) which sets down rules that a particular XML document must obey. When the XML document is parsed, it is checked against these rules. There are several validation standards, and we have adopted one known as Schema for most of the framework's internal configuration files. Schema files are just XML files with specially defined tags that can be used to place requirements on the structure and contents of the XML file. While Schema is very comprehensive and can get involved for complex applications, most module configuration files are simple enough that it is generally quite easy to prepare Schema files to validate them. Although it is not necessary to use Schema for your physics modules, it generally leads to more robust software, and is therefore worth considering.

Here is an example of an XML file and its corresponding Schema file.

--------------------- XML data file -----------------------------------

<aConfigFile  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:noNamespaceSchemaLocation='someplace/aConfigFile.xsd'>

  <WaterRIndex>              1.33  </WaterRIndex>
  <dEdXMuon unit="GeV">      0.2   </dEdXMuon>
  <PhotonIntLength unit="m"> 0.06 0.07 0.08 0.1 0.21 </PhotonIntLength>

</aConfigFile>

---------------------- Schema validation file -------------------------

<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'
           xmlns:auger="http://www.auger.org/schema/types">

  <xs:import namespace="http://www.auger.org/schema/types"   
             schemaLocation="/someplace/AugerSchemaTypes.xsd"/>

  <xs:element name="aConfigFile">  
    <xs:complexType>   
      <xs:all>         
        <xs:element name="WaterRIndex" type="xs:double" 
                    minOccurs="1" maxOccurs="1"/>
        <xs:element name="dEdXMuon" type="auger:doubleWithUnit"/>
        <xs:element name="PhotonIntLength" type="auger:listOfDoublesWithUnits"/>
      </xs:all>
    </xs:complexType>
  </xs:element>

</xs:schema>

Notice first that each element in the XML file is declared in the Schema file by a line beginning with <xs:element .... The xs: prefix is just a declaration of the Schema standard namespace, which is analogous to the std namespace in C++. The first element to be declared in the Schema is aConfigFile, the outermost element in the XML file. Next, the <xs:complexType> tag indicates that the contents of the aConfigFile element are "complex," meaning that these contents can comprise, among other things, a number of sub-elements.

In this example the sub-elements of aConfigFile in the XML file are WaterRIndex, dEdXMuon, and PhotonIntLength. In the Schema, each of these three sub-elements is declared to be of a particular type: WaterRIndex is declared as a double, dEdXMuon is a double with a unit attached to it, and PhotonIntLength is a list of doubles with a unit attached to it. Notice that the doubleWithUnit and listOfDoublesWithUnits types live in a namespace called auger: rather than xs:. This is because these types are not standard, but are defined in the file AugerSchemaTypes.xsd, as they prove useful for some of our applications. The line in the Schema file which begins <xs:import namespace=....> is an instruction to import these special Auger data types from the file where they are defined.

The minOccurs and maxOccurs attributes appearing in the declaration of the WaterRIndex element specify that this element must appear exactly once. Finally, the <xs:all> tags surrounding the declarations of the three elements set the requirement that all three of the declared elements must be found in the XML file.

When the XML file above is parsed, it is checked against the Schema file, whose location is specified by the xsi:noNamespaceSchemaLocation='...' attribute of the <aConfigFile ...> element. Any violations of the rules set forth in the Schema result in an error message specifying the nature of the offense and the line and column where it occurs. For example, the following mistakes would be caught during validation:

  <WaterRIndex> 1.33xs  </WaterRIndex>           <!-- stray characters -->
  <WaterRIndex unit="EeV"> 1.33  </WaterRIndex>  <!-- should not have a unit -->
  <dEdXMuon unit="GeV">  0.2 0.3   </dEdXMuon>   <!-- only one double allowed -->
  <dEdXMuon> 0.2 </dEdXMuon>                     <!-- forgot the unit -->
  <PhotonIntLength unit="m">  </PhotonIntLength> <!-- forgot the data -->

Furthermore, if any of the three declared elements is missing from the XML file, the <xs:all> condition would be violated, and a corresponding error reported. Programming this level of error checking on your own would require significantly more typing (and debugging!) than preparing a Schema file.
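To give a sense of the hand-written alternative, the sketch below checks only one of the rules above, namely that an element's text holds exactly one double with no stray characters. IsSingleDouble and CheckExamples are hypothetical names used for this illustration; unit attributes, required elements and line and column reporting would all need further code. Note also that std::stod accepts some forms, such as hexadecimal floats, that a Schema double would not.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <sstream>
#include <string>

// Hypothetical helper: true only if `text` holds exactly one double,
// optionally surrounded by whitespace, with no stray characters.
bool IsSingleDouble(const std::string& text) {
  std::istringstream in(text);
  std::string token;
  if (!(in >> token))
    return false;  // empty element: "forgot the data"
  std::string extra;
  if (in >> extra)
    return false;  // second token present: "only one double allowed"
  try {
    std::size_t used = 0;
    std::stod(token, &used);
    return used == token.size();  // rejects trailing junk such as "1.33xs"
  } catch (const std::exception&) {
    return false;  // not a number at all, or out of range for double
  }
}

// Usage: the valid WaterRIndex text passes, the faulty variants do not.
void CheckExamples() {
  assert(IsSingleDouble(" 1.33  "));
  assert(!IsSingleDouble(" 1.33xs  "));
  assert(!IsSingleDouble("  0.2 0.3   "));
  assert(!IsSingleDouble("  "));
}
```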

If you retrieve the top Branch of a configuration file using the CentralConfig, then Schema validation will be performed if (and only if) you specify a Schema file using the xsi:noNamespaceSchemaLocation attribute of the top element of your XML document. If you leave that attribute out, or if the file it points to is nonexistent, then no validation will be performed. If you use the Reader utility directly to open and parse an XML file, then you can explicitly switch validation on and off through the validationType argument of the Reader constructor (eDTD, eSCHEMA or eNONE); see the constructor documentation below.

More detailed explanations and tutorials related to XML and Schema abound on the WWW.

Author
T. Paul
P. Cattaneo
D. Veberic
J. Gonzalez

Definition at line 25 of file Reader.h.

Member Enumeration Documentation

enum utl::Reader::EInputType
private

Enumerator
eFromFile 
eFromMemBuf 

Definition at line 54 of file Reader.h.

enum utl::Reader::EValidationType

Enumerator
eDTD 
eSCHEMA 
eNONE 

Definition at line 28 of file Reader.h.

Constructor & Destructor Documentation

utl::Reader::Reader ( const std::string &  name,
const EValidationType  validationType = Reader::eNONE 
)

Constructor with validation options.

Parameters
name: name of the XML file
validationType: validation option

Definition at line 34 of file Reader.cc.

utl::Reader::Reader ( const ReaderStringInput &  inputString,
const EValidationType  validationType = Reader::eNONE 
)

Constructor from an input string (as opposed to file). EValidationType not (yet) used.

Parameters
inputString: the input XML string
validationType: validation option

Definition at line 45 of file Reader.cc.

References utl::ReaderStringInput::GetInputString().

utl::Reader::Reader ( Branch &  branch)

Definition at line 52 of file Reader.cc.

References utl::Branch::GetTopBranch(), and utl::Branch::HasTopBranch().

utl::Reader::Reader ( const Reader &  re)
private

Member Function Documentation

Branch utl::Reader::GetTopBranch ( ) const
inline

Get the top Branch (represents the same entity as the document node).

string utl::Reader::GetUri ( ) const

Definition at line 89 of file Reader.cc.

void utl::Reader::Initialize ( void  )
private

Definition at line 61 of file Reader.cc.

References WARNING.

Reader& utl::Reader::operator= ( const Reader &  re)
private

Branch utl::Reader::Parse ( const std::string &  input,
const EValidationType  validationType,
const EInputType  inputType 
)
static private

Definition at line 106 of file Reader.cc.

References fwk::AsString(), ERROR, exists, exit, and WARNING.

void utl::Reader::Terminate ( )
static private

Definition at line 79 of file Reader.cc.

References INFO.

Member Data Documentation

ReaderErrorReporter * utl::Reader::fgErrReporter = nullptr
static private

Definition at line 70 of file Reader.h.

bool utl::Reader::fgInitialized = false
static private

Definition at line 71 of file Reader.h.

Branch utl::Reader::fTopBranch
private

Definition at line 64 of file Reader.h.

Referenced by GetTopBranch().


The documentation for this class was generated from the following files:

Reader.h
Reader.cc

Generated on Tue Sep 26 2023.