Utility for parsing XML files.
#include <Reader.h>
Public Types | |
enum | EValidationType { eDTD, eSCHEMA, eNONE } |
Public Member Functions | |
Branch | GetTopBranch () const |
Get the top Branch (represents same entity as document node) | |
std::string | GetUri () const |
Reader (const std::string &name, const EValidationType validationType=Reader::eNONE) | |
Constructor with validation options. | |
Reader (const ReaderStringInput &inputString, const EValidationType validationType=Reader::eNONE) | |
Constructor from an input string (as opposed to file). EValidationType not (yet) used. | |
Reader (Branch &branch) | |
Private Types | |
enum | EInputType { eFromFile = 0, eFromMemBuf } |
Private Member Functions | |
void | Initialize () |
Reader & | operator= (const Reader &re) |
Reader (const Reader &re) | |
Static Private Member Functions | |
static Branch | Parse (const std::string &input, const EValidationType validationType, const EInputType inputType) |
static void | Terminate () |
Private Attributes | |
Branch | fTopBranch |
Static Private Attributes | |
static ReaderErrorReporter * | fgErrReporter = nullptr |
static bool | fgInitialized = false |
Utility for reading data from XML files.
The various framework configuration files use XML format. The rationale for using XML, rather than some homemade ASCII format, is that it is a universally adopted standard designed to be both human-readable and machine-parsable, and, as such, several well-written XML parsers are freely available. As part of the Offline code, we provide a utility called Reader, which is based on the Xerces parser and provides ease of use at the cost of a small reduction in functionality compared to Xerces. See GAP-2001-44 for details on the Reader code, a brief XML tutorial, and further information on the advantages of using XML.
You may want to have a configuration file for your module in order to store values you wish to vary, such as cuts or other parameters. In such a case you can, of course, use the Reader to parse it. Here we give an example of how to do this. This example is probably more involved than a typical module configuration, but it serves to illustrate everything you are ever likely to need to know concerning navigation through a hierarchy of data.
Below we explain how to use the Reader to access your configuration data.
You can tell the CentralConfig where your configuration file lives by adding a <configLink> tag in the bootstrap.xml file. The information in the file can then be looked up from within your code. See the documentation on fwk::CentralConfig for an explanation of how to do this.
For purposes of illustrating how to navigate through a tree of data, we consider the following example XML file.
<?xml version="1.0" ?>
<document>
  <!-- Example detector simulation parameters - simplified for illustrative purposes -->
  <detectorSimParameters>
    <tank id="1">
      <radius unit="m"> 1.8 </radius>
      <height unit="m"> 1.2 </height>
      <PMT id="1">
        <position unit="cm"> 0.0 120.0 60.0 </position>
        <maxQe> 0.30 </maxQe>
      </PMT>
      <PMT id="2">
        <position unit="cm"> 103.92 -60.0 60.0 </position>
        <maxQe> 0.30 </maxQe>
      </PMT>
      <PMT id="3">
        <position unit="cm"> -103.92 -60.0 60.0 </position>
        <maxQe> 0.30 </maxQe>
      </PMT>
    </tank>
    <tank id="2">
      <PMT id="3" >
        <position unit="cm"> -103.92 -60.0 60.0 </position>
        <maxQe> 0.27 </maxQe>
        <photonEnergyBin unit="eV"> 2.14 2.16 2.19 2.23 2.27 2.32 </photonEnergyBin>
      </PMT>
      <tyvekProperties>
        <reflectivity> 0.9164 </reflectivity>
        <specularLobe> 0.2 </specularLobe>
        <specularSpike> 0.0 </specularSpike>
      </tyvekProperties>
    </tank>
  </detectorSimParameters>
</document>
Note that the information is organized in a hierarchical "tree" structure. In Reader parlance, each element in the XML document constitutes a Branch. For example, in the document above tank is a Branch, which has child Branches PMT and tyvekProperties. The Branch PMT, in turn, has child Branches position, maxQe and photonEnergyBin. Note that a Branch can contain more Branches, data, or both.
The begin tag for a Branch may also have (optional) "attributes." In the example above, one attribute that appears is called id and another is called unit. The unit attribute is meant to set the unit for data contained in a branch, and is treated in a special way, as discussed below.
The first step in navigating a tree is to request the top branch of the XML document from the Reader. If we use the CentralConfig to look up the XML file, we can retrieve the top Branch of the document as follows:
In our example, this will return the document branch. You can verify this using the GetBranchName method:
Starting from the top branch, you can find any of the children. A child branch can be retrieved by name as in this example:
If you want to look up a Branch which has one or more attributes associated with it, you can load the desired attributes into a map and pass it as an argument of the GetChild method:
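The lookup by name and by attribute map described above can be sketched with a toy tree. Note that ToyBranch, its members, and MakeExampleTree are hypothetical stand-ins written for this illustration; they are not the utl::Branch class, and the real GetChild signature may differ.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Toy stand-in for a parsed XML element. This is NOT utl::Branch; the type
// and its members are hypothetical and exist only to illustrate the lookup.
struct ToyBranch {
  std::string name;
  std::map<std::string, std::string> attributes;
  std::string data;
  std::vector<ToyBranch> children;

  // Return the first child with the given name whose attributes include
  // every (key, value) pair in `required`. An empty map matches any child
  // with that name.
  const ToyBranch& GetChild(const std::string& childName,
                            const std::map<std::string, std::string>& required = {}) const {
    for (const ToyBranch& child : children) {
      if (child.name != childName) continue;
      bool matches = true;
      for (const auto& kv : required) {
        const auto it = child.attributes.find(kv.first);
        if (it == child.attributes.end() || it->second != kv.second) {
          matches = false;
          break;
        }
      }
      if (matches) return child;
    }
    throw std::runtime_error("no matching child: " + childName);
  }
};

// Build a small part of the example document by hand (no XML parsing here).
ToyBranch MakeExampleTree() {
  ToyBranch height1{"height", {{"unit", "m"}}, "1.2", {}};
  ToyBranch tank1{"tank", {{"id", "1"}}, "", {height1}};
  ToyBranch tank2{"tank", {{"id", "2"}}, "", {}};
  ToyBranch sim{"detectorSimParameters", {}, "", {tank1, tank2}};
  return ToyBranch{"document", {}, "", {sim}};
}
```

With this toy, MakeExampleTree().GetChild("detectorSimParameters").GetChild("tank", {{"id", "1"}}) selects the first tank, and a further GetChild("height") reaches its height element. Throwing on a failed lookup is a choice made for the sketch, not a statement about how the Reader reports a missing Branch.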
One of the tasks the Reader performs for you is casting the data in a branch to the requested type. Currently the following types are supported:
Casting to the desired type is handled by the utl::Branch::GetData methods, which are overloaded with argument lists corresponding to the different supported types. For example, to return a double corresponding to the height data for the tank with id="1", one could write:
where the heightB branch is set in the examples above. Retrieving data into an STL container like list or vector is also straightforward, as in the following example:
The simB branch, which points to detectorSimParameters, is set in the examples above.
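To make the idea of casting element text to a requested type concrete, here is a minimal standalone sketch that converts whitespace-separated text into a scalar or a vector. ParseScalar and ParseList are hypothetical helper names invented for this illustration; they are not part of the Reader or Branch API, and the actual GetData overloads may behave differently.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not part of the Reader API: convert the text between
// the begin and end tags into a single value of type T. Returns false if the
// text holds no value of that type or has anything left over after it.
template <typename T>
bool ParseScalar(const std::string& text, T& value) {
  std::istringstream in(text);
  if (!(in >> value)) return false;
  in >> std::ws;    // skip trailing whitespace
  return in.eof();  // true only if nothing else follows the value
}

// Hypothetical helper, not part of the Reader API: convert whitespace-
// separated text into a vector, stopping at the first unreadable token.
template <typename T>
std::vector<T> ParseList(const std::string& text) {
  std::istringstream in(text);
  std::vector<T> values;
  T item;
  while (in >> item) values.push_back(item);
  return values;
}
```

For instance, ParseScalar(" 1.2 ", height) fills a double from the height element's text, and ParseList<double> applied to the photonEnergyBin text yields six entries. Rejecting leftover characters in ParseScalar is a design choice of this sketch.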
The Reader does not generally protect against user foolishness. If you try to cast some text to an int, for example, the Reader will do its best and return an int. There are other mechanisms to protect against such mistakes, however, as discussed in the section on validation of configuration files.
As mentioned previously, the Reader provides a mechanism to handle units associated with data in an XML file. Units may be declared via a special "unit" attribute, which appears inside the begin tag. In the XML file above, there are several examples of this.
When data from the XML file is cast via the utl::Branch::GetData methods, any unit attribute which may be present in the tag is evaluated and the data between the begin and end tags is multiplied by whatever factor is necessary to convert it into Auger base units. In this way you can write data in whatever units are the most convenient, and the Reader code will convert them back into "official" units. The Reader supports units expressions, so for example you could write something like:
<g unit="m/s^2"> 9.8 </g>
<aperture unit="sr*km2"> 3000 </aperture>
The Auger base units are defined inside the AugerUnits.h include file, which is a compilation of factors to convert dimensional quantities into official Auger units. For example, some of the AugerUnits.h conversion factors for length read:
As an example of how one uses these conversion factors, suppose you want to read in the following branch of the example XML file:
<radius unit="m"> 1.8 </radius>
This could be done with the following lines:
Then, to print the radius variable in millimeters, one could use the appropriate conversion factor from AugerUnits.h, as in the following code:
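The multiply-on-input, divide-on-output convention can be sketched in isolation as follows. The toy_units namespace and the ToBase and FromBase helpers are hypothetical, and the numerical values (in particular, taking the meter as the base unit of length) are assumptions made for this example rather than values copied from AugerUnits.h.

```cpp
// Illustrative conversion factors only. The values are assumptions made for
// this sketch and are not copied from AugerUnits.h.
namespace toy_units {
  constexpr double meter = 1.0;  // assumed base unit of length
  constexpr double m = meter;
  constexpr double centimeter = 1e-2 * meter;
  constexpr double cm = centimeter;
  constexpr double millimeter = 1e-3 * meter;
  constexpr double mm = millimeter;
}

// Multiply by the unit when a raw number enters the program, so that the
// stored value is expressed in base units.
constexpr double ToBase(double value, double unit) { return value * unit; }

// Divide by the unit when the value leaves the program, so that the printed
// number is expressed in the requested unit.
constexpr double FromBase(double value, double unit) { return value / unit; }
```

Under these assumptions, ToBase(1.8, toy_units::m) stores the radius in base units, and FromBase(radius, toy_units::mm) gives approximately 1800, the same radius expressed in millimeters.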
It is obviously a good idea to check data read in from XML files for garbage and typos before using it. Traditionally, this has been done by writing a lot of code to verify that each required piece of data is found in the file and that it is set to some reasonable value. For instance, if you browse through various modules, you may encounter snippets of code like this:
Or with silent defaults:
While this bit of code is better than nothing, it is not particularly comprehensive, as it only checks for existence of a required Branch, but doesn't do anything to verify that the data contained therein are sensible. Writing more detailed validation tends to be tedious and error prone. Most people cannot be bothered.
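Hand-written checks of the kind described above might look like the following sketch, which shows both a strict lookup and a lookup with a silent default. ToyConfig, RequireDouble, and DoubleOr are hypothetical names, and a plain map of element names to text stands in for a parsed configuration; none of this is taken from an actual module.

```cpp
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a parsed configuration: element name -> text.
using ToyConfig = std::map<std::string, std::string>;

// Strict variant: a missing or non-numeric element is reported by throwing.
double RequireDouble(const ToyConfig& config, const std::string& key) {
  const auto it = config.find(key);
  if (it == config.end())
    throw std::runtime_error("missing required element: " + key);
  std::istringstream in(it->second);
  double value = 0.0;
  if (!(in >> value))
    throw std::runtime_error("element is not a number: " + key);
  return value;
}

// Lenient variant: a missing or non-numeric element silently yields the
// fallback, which hides typos in the configuration file.
double DoubleOr(const ToyConfig& config, const std::string& key, double fallback) {
  const auto it = config.find(key);
  if (it == config.end()) return fallback;
  std::istringstream in(it->second);
  double value = 0.0;
  return (in >> value) ? value : fallback;
}
```

Even this small amount of checking has to be repeated for every element, and it still says nothing about whether a value is physically sensible.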
A better approach is to exploit one of the standard methods for XML document validation. Validation in this case means employing an auxiliary ASCII file (rather than C++ code) which sets down rules that a particular XML document must obey. When the XML document is parsed, it is checked against these rules. There are several validation standards, and we have adopted one known as Schema for most of the framework's internal configuration files. Schema files are just XML files with specially defined tags that can be used to place requirements on the structure and contents of the XML file. While Schema is very comprehensive and can get involved for complex applications, most module configuration files are simple enough that it is generally quite easy to prepare Schema files to validate them. Although it is not necessary to use Schema for your physics modules, it generally leads to more robust software, and is therefore worth considering.
Here is an example of an XML file and its corresponding Schema file.
--------------------- XML data file -----------------------------------
<aConfigFile xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:noNamespaceSchemaLocation='someplace/aConfigFile.xsd'>
  <WaterRIndex> 1.33 </WaterRIndex>
  <dEdXMuon unit="GeV"> 0.2 </dEdXMuon>
  <PhotonIntLength unit="m"> 0.06 0.07 0.08 0.1 0.21 </PhotonIntLength>
</aConfigFile>
---------------------- Schema validation file -------------------------
<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'
           xmlns:auger="http://www.auger.org/schema/types">
  <xs:import namespace="http://www.auger.org/schema/types"
             schemaLocation="/someplace/AugerSchemaTypes.xsd"/>
  <xs:element name="aConfigFile">
    <xs:complexType>
      <xs:all>
        <xs:element name="WaterRIndex" type="xs:double" minOccurs="1" maxOccurs="1"/>
        <xs:element name="dEdXMuon" type="auger:doubleWithUnit"/>
        <xs:element name="PhotonIntLength" type="auger:listOfDoublesWithUnits"/>
      </xs:all>
    </xs:complexType>
  </xs:element>
</xs:schema>
Notice first that each element in the XML file is declared in the Schema file by a line beginning with <xs:element ... . The xs: prefix is just a declaration of the Schema standard namespace, which is analogous to the std namespace in C++. The first element to be declared in the Schema is aConfigFile, the outermost element in the XML file. Next, the <xs:complexType> tag indicates that the contents of the aConfigFile element are "complex," meaning that these contents can comprise, among other things, a number of sub-elements. In this example the sub-elements of aConfigFile in the XML file are WaterRIndex, dEdXMuon, and PhotonIntLength. In the Schema, each of these three sub-elements is declared to be of a particular type: WaterRIndex is declared as a double, dEdXMuon is a double with a unit attached to it, and PhotonIntLength is a list of doubles with a unit attached. Notice that the doubleWithUnit and listOfDoublesWithUnits types live in a namespace called auger: rather than xs: . This is because these types are not standard, but are defined in the file AugerSchemaTypes.xsd, as they prove useful for some of our applications. The line in the schema file which begins <xs:import namespace=....> is an instruction to import these special Auger data types from the file where they are defined. The minOccurs and maxOccurs attributes appearing in the declaration of the WaterRIndex element specify that this element must appear exactly once. Finally, the <xs:all> tags surrounding the declarations of the three elements set the requirement that all three of the declared elements must be found in the XML file.
When the XML file above is parsed, it is checked against the Schema file, whose location is specified by the xsi:noNamespaceSchemaLocation='... attribute of the <aConfigFile... element. Any violation of the rules set forth in the Schema results in an error message specifying the nature of the offense and the line and column where it occurs. For example, the following mistakes would be caught during validation:
<WaterRIndex> 1.33xs </WaterRIndex>                      <!-- stray characters -->
<WaterRIndex unit="EeV"> 1.33 </WaterRIndex>             <!-- should not have a unit -->
<dEdXMuon unit="GeV"> 0.2 0.3 </dEdXMuon>                <!-- only one double allowed -->
<dEdXMuon> 0.2 </dEdXMuon>                               <!-- forgot the unit -->
<PhotonIntLength unit="m"> </PhotonIntLength>            <!-- forgot the data -->
Furthermore, if any of the three declared elements is missing from the XML file, the <xs:all> condition would be violated, and a corresponding error reported. Programming this level of error checking on your own would require significantly more typing (and debugging!) than preparing a Schema file.
If you retrieve the top Branch of a configuration file using the CentralConfig, then Schema validation will be performed if (and only if) you specify a schema file using the xsi:noNamespaceSchemaLocation attribute of the top element of your XML document. If you leave that attribute out, or if the file it points to is nonexistent, then no validation will be performed. If you use the Reader utility directly to open and parse an XML file, then you can explicitly switch validation on and off. See the doxygen documentation for the Reader class for details.
More detailed explanations and tutorials related to XML and Schema abound on the WWW.
utl::Reader::Reader (const std::string &name, const EValidationType validationType=Reader::eNONE)
utl::Reader::Reader (const ReaderStringInput &inputString, const EValidationType validationType=Reader::eNONE)
Constructor from an input string (as opposed to file). EValidationType not (yet) used.
Parameters:
inputString | the input XML string |
validationType | the validation option |
Definition at line 45 of file Reader.cc.
References utl::ReaderStringInput::GetInputString().
utl::Reader::Reader (Branch &branch)
Definition at line 52 of file Reader.cc.
References utl::Branch::GetTopBranch(), and utl::Branch::HasTopBranch().
Branch utl::Reader::GetTopBranch () const [inline]
Get the top Branch (represents same entity as document node)
Definition at line 45 of file Reader.h.
References fTopBranch.
Referenced by utl::Branch::Clone(), fwk::CentralConfig::GetTopBranch(), main(), fwk::CentralConfig::ReplaceParameters(), TabularTankResponseNS::TankResponse::TankResponse(), testAtmLowLevelAtmInterface::testGOESDB(), DBConnectionTest::testMasterConnection(), and ReaderTest::testStringParse().
Branch utl::Reader::fTopBranch [private]
Definition at line 64 of file Reader.h.
Referenced by GetTopBranch().