CD Special Projects

Rules for Designing Chunks

You must follow these rules for designing and implementing your chunk, or your chunk will not be consistent with the EDM and may break the system. These rules were established by the EDM group to ensure a self-consistent and modular event data model. Not all of these rules can be enforced by the compiler, so designers must be aware of these rules. Component tests should test for compliance with these rules. See the section earlier in this document for a description of how chunks are used.

Rules for implementing chunks

  1. Chunks may not contain smaller units of persistence: no d0_Ref to any object is allowed within a chunk.
    The chunk is the smallest unit of persistent event data. There is no need to design in some other method of controlling the input or output of event data. If you think some item is of sufficient importance, and of appropriate size, to be an atomic object with regard to persistence, then make that object into a chunk in its own right.
  2. Each chunk must contain a closely related set of data, typically the output of one step of reconstruction.
    Each chunk must be cohesive. It is poor design practice to group unrelated data (or behavior) into a single class.
  3. Chunks may not directly refer (by value, reference or pointer) to other chunks or objects within other chunks.
    Each chunk must be self-contained, to help ensure the self-consistency of the event and to prevent excessive physical coupling in the extended EDM. Instead, make use of ChunkIDs and "dumb data" to replace the pointers. See below for the constructs to use in place of pointers. Starting with v00-01-02n-br-16 of the package edm and v00-02-03-br-03 of the package identifiers, the EDM provides "link" classes LinkIndex<>, LinkPtr<T>, etc., which provide a mechanism by which one can point from objects within one chunk to objects within another chunk, without inducing any physical coupling.
  4. Each chunk must record bookkeeping information telling how it was made and on which chunks it depends.
    It is up to the designer of each chunk to determine what record keeping information is important. For brevity, make use of RCPIDs to encode information about reconstructors, EnvIDs to encode information about "reconstruction environment" data (such as calibration data sets), and ChunkIDs to refer to parent chunks.

    Please note that as of this writing, the Calibration and Alignment group has not yet decided how to define and use the EnvID class. The version currently in the EDM is just a placeholder.

  5. The access methods of each chunk must be declared const. Setter functions are discouraged and must not be declared const.
    It is preferable for all chunk data to be set via a constructor. "Set" functions are discouraged. This is to help ensure that "parent" chunks are not modified after "children" are made from them. Such a modification could make it impossible to accurately trace the genesis of some part of the reconstructed event. Since the data model allows any number of instances of any chunk class to be in the same event, rather than modifying an existing chunk, it is always possible to add a new chunk, which differs from the "old" chunk in whatever way is required for the task at hand.

    For the same purpose, it is not allowed to declare member data mutable, except when that data is merely a cache which stores the result of a calculation that can be performed using only the const member functions of the chunk. It is most important to understand the intent of this rule, rather than the details: if one is given access to a const version of your chunk, one should not be able to call any function which modifies the externally observable state of that chunk. Caching the result of a time-consuming calculation is acceptable; such a cache variable would have to be declared mutable. Modifying any variable that corresponds to the physics parameters or bookkeeping parameters of the chunk is forbidden, for the reason expressed in the paragraph above.

  6. Each chunk must record the appropriate bookkeeping information.
    This is so that users can determine how each chunk was created by querying the chunk itself. Note that the definition of "appropriate information" might be different for each chunk class, and the EDM can provide only the most general information. Each designer should think carefully about his design, and make sure that the information that would be interesting to users is recorded.
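
As an illustration of rule 5, here is a minimal sketch; it is not code from the EDM. The class ToyTowerColl, its members, and the free function scaled() are hypothetical names invented for this example, and the class omits the AbsChunk base class and everything else a real chunk requires. The sketch shows data being set only through the constructor, accessors declared const, a mutable cache that does not alter the externally observable state, and the creation of a new chunk in place of modifying a parent.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a chunk (a real chunk would inherit from AbsChunk).
class ToyTowerColl {
   public:
      // All data arrive through the constructor; there are no "set" functions.
      explicit ToyTowerColl(const std::vector<double>& energies)
         : _energies(energies), _sumValid(false), _sum(0.0) {}

      // Access methods are declared const.
      const std::vector<double>& energies() const { return _energies; }

      // The sum is computed on first use and cached. The cache is mutable,
      // but it only stores a value derivable from the const interface.
      double totalEnergy() const {
         if (!_sumValid) {
            _sum = 0.0;
            for (std::size_t i = 0; i < _energies.size(); ++i) _sum += _energies[i];
            _sumValid = true;
         }
         return _sum;
      }

   private:
      std::vector<double> _energies;  // physics data: never modified after construction
      mutable bool        _sumValid;  // cache bookkeeping only
      mutable double      _sum;       // cached result of totalEnergy()
};

// Rather than modifying the parent, build a new chunk that differs from it.
// A positive scale factor is an assumption of this toy example.
inline ToyTowerColl scaled(const ToyTowerColl& parent, double factor) {
   assert(factor > 0.0);
   std::vector<double> e = parent.energies();
   for (std::size_t i = 0; i < e.size(); ++i) e[i] *= factor;
   return ToyTowerColl(e);
}
```

A caller holding a const ToyTowerColl can call totalEnergy() any number of times and can derive a rescaled chunk with scaled(), but has no way to change the energies recorded in the original.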

Requirements of the AbsChunk interface

In order to meet the AbsChunk interface, you must implement the following member functions:

  1. std::list<ChunkID> parents( ) const
    This member function returns the ChunkIDs of the "parents" of your chunk. The "parents" of a chunk are those chunks which were used in the creation of the "child" chunk.

    For example, ToyClusterColl chunks might be created using a specific ToyCalorimeter chunk, and a specific ToyVertex in a specific ToyVertexColl chunk. The ToyCalorimeter chunk has the energy deposit information, and the ToyVertex gives the z-coordinate from which the transverse energies are calculated. The parents of a specific ToyClusterColl object are the ToyCalorimeter object and the ToyVertexColl object which were used in its creation. The member function std::list<ChunkID> ToyClusterColl::parents() const should return the ChunkIDs of these two objects.

  2. std::list<RCPID> rcps( ) const
    This member function returns the RCPIDs of the RCP objects which describe the reconstructor which made the chunk. Each chunk is created by a single reconstructor object. If that reconstructor object had any parameters which were configurable at run time, it got those parameters from an RCP object, supplied to it by the framework. This reconstructor may have required, as input, other types of chunks, which were in turn created by other reconstructors. In each case, the concise but complete description of a reconstructor is given by the unique RCPID assigned to the RCP object used in its instantiation. This function must return all the RCPIDs which are relevant to the creation of the chunk. It is up to the designer of each chunk class to decide which information is relevant, and which information is not.

    To continue with the example above, we might decide that the parameters of the ToyClusterReco object which created the ToyClusterColl are important to describing the ToyClusterColl, but that the details of the ToyVertexReco object which created the ToyVertexColl are not important. In that case, we would have std::list<RCPID> ToyClusterColl::rcps() const return the RCPID of the RCP object used in the instantiation of the ToyClusterReco object, but not that used for the ToyVertexReco object.

    It is important that the designers (and review committees) pay careful attention to this member function, because this list of RCPIDs will often be used by others. For example, if several clustering algorithms have been run on a single event, a user selects the output of a particular algorithm by using a selector that examines each chunk's RCPIDs and matches the chunk whose RCPIDs specify the algorithm the user wants.

  3. std::list<EnvID> environment( ) const
    This member function returns the EnvIDs of the calibration and alignment objects (or any other similar objects) used in the creation of the given chunk. Again, it is at the discretion of the developer to decide what information is relevant. The Calibration and Alignment group has not yet determined the way in which EnvIDs are to be used, so this class is currently just a placeholder.
  4. std::string type( ) const and static std::string classType( )
    These functions are used by the EDM to ensure type-safety at run time, both when chunks are inserted into the event, and when chunks are accessed. The CHUNK_SETUP macro, defined in the AbsChunk header file, will generate both of these functions. The same macro also invokes the macro required by DØOM for all persistent classes.
  5. void printChunk (std::ostream& os) const
    This member function is useful for debugging reconstruction code. It prints an ASCII representation of the chunk to the ostream os. It is called by the stream insertion operator ( friend operator<<( ) ) which is defined in AbsChunk. It is not necessary (and may even be counter-productive) to define operator<<( ) for classes that inherit from AbsChunk.
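
The member functions listed above can be sketched as follows. This is an illustration under stated assumptions, not the EDM's own code: ChunkID, RCPID and EnvID are replaced by plain integer typedefs, AbsChunk is a simplified stand-in that declares only the functions named in this section, type() and classType() are written by hand where a real chunk would use CHUNK_SETUP, and the helper madeBy() is a hypothetical function that mimics what a selector does with a chunk's RCPIDs.

```cpp
#include <algorithm>
#include <cassert>
#include <iostream>
#include <list>
#include <string>

// ASSUMED stand-ins; the real classes come from the edm and identifiers packages.
typedef int ChunkID;
typedef int RCPID;
typedef int EnvID;

// Simplified stand-in for AbsChunk, limited to the functions discussed here.
class AbsChunk {
   public:
      virtual ~AbsChunk() {}
      virtual std::list<ChunkID> parents() const = 0;
      virtual std::list<RCPID>   rcps() const = 0;
      virtual std::list<EnvID>   environment() const = 0;
      virtual std::string        type() const = 0;
      virtual void               printChunk(std::ostream& os) const = 0;
};

// Defined once for the base class; derived chunks do not define their own.
inline std::ostream& operator<<(std::ostream& os, const AbsChunk& c) {
   c.printChunk(os);
   return os;
}

class ToyClusterColl : public AbsChunk {
   public:
      ToyClusterColl(ChunkID calorimeter, ChunkID vertices, RCPID clusterReco)
         : _calorimeter(calorimeter), _vertices(vertices), _clusterReco(clusterReco) {}

      // The two chunks used in the creation of this chunk.
      std::list<ChunkID> parents() const {
         std::list<ChunkID> result;
         result.push_back(_calorimeter);
         result.push_back(_vertices);
         return result;
      }

      // Only the RCPID judged relevant: that of the clustering reconstructor.
      std::list<RCPID> rcps() const {
         std::list<RCPID> result;
         result.push_back(_clusterReco);
         return result;
      }

      // No calibration or alignment objects recorded in this toy.
      std::list<EnvID> environment() const { return std::list<EnvID>(); }

      // Hand-written here; in a real chunk CHUNK_SETUP generates these.
      std::string type() const { return classType(); }
      static std::string classType() { return "ToyClusterColl"; }

      void printChunk(std::ostream& os) const {
         os << "ToyClusterColl: parents " << _calorimeter << ", " << _vertices
            << "; rcp " << _clusterReco;
      }

   private:
      ChunkID _calorimeter;  // ChunkID of the parent ToyCalorimeter
      ChunkID _vertices;     // ChunkID of the parent ToyVertexColl
      RCPID   _clusterReco;  // RCPID of the RCP used by ToyClusterReco
};

// Hypothetical helper: true if the chunk lists the wanted RCPID.
inline bool madeBy(const AbsChunk& chunk, RCPID wanted) {
   const std::list<RCPID> ids = chunk.rcps();
   return std::find(ids.begin(), ids.end(), wanted) != ids.end();
}
```

With several ToyClusterColl chunks in one event, code holding only AbsChunk references could use madeBy() to pick out the chunk produced with a given RCP, and could write any of them to a stream with operator<<.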

Requirements for persistence

The EDM will allow saving of events, and thus the chunks in the event, to permanent storage. Your chunks must therefore be designed within the strictures of DØOM, the DØ Object Model. The requirements of DØOM are documented in the DØOM User Guide. Recall that the D0_OBJECT_SETUP macro required by DØOM is implemented in CHUNK_SETUP, and must not be repeated.

References Between Chunks

The key concept behind the EDM rules for implementing references between chunks is to use "dumb data" instead of pointers, in order to prevent excessive physical coupling and in order to ensure consistency in the event, even when some chunks are deleted.

  • To refer to a specific chunk, use the ChunkID of the chunk to which you refer.
  • To refer to data within another chunk class, use an integer index or other small dumb data type. Starting with v00-01-02n-br-16 of the package edm, the classes LinkIndex<>, LinkPtr<>, and associated classes are provided to allow a chunk (or an object within a chunk) to refer to an object within another chunk.
  • Use an RCPID to refer to an RCP object used in the generation of chunk data.
  • Use an EnvID to refer to a specific database object used in the generation of chunk data.

In designing a chunk or set of related chunks, you may find it useful to first plan (and perhaps even implement) the classes without reference to the EDM. To then make the classes you've designed meet the requirements of the EDM, break the inter-chunk physical coupling by replacing pointers from one chunk to another by "dumb data" indices. A fragmentary example is given below.

Example: References between chunks

Consider starting from the following design, which shows three types of chunks, each of which contains zero or more "physics objects".

[Figure: UML static diagram]

To make this design consonant with the EDM, we implement the associations between the ElectronColl and the other chunks by having the ElectronColl contain two ChunkIDs, one for its associated ClusterColl, and one for its associated TrackColl. We implement the containment of Electrons by the ElectronColl by giving the ElectronColl a member datum that is a std::vector<Electron>, and similarly for Clusters in the ClusterColl and Tracks in the TrackColl. Finally, we implement the association between an Electron and its associated Cluster and Track by giving the Electron two member data, one of which is the integer index of the associated Cluster in the std::vector<Cluster> contained in the ClusterColl, and the other of which is the integer index of the associated Track in the std::vector<Track> contained in the TrackColl. (Note that starting with v00-01-02n-br-16 of the package edm, one can use a LinkIndex<T> instead, providing greater convenience for users of your chunk.)

In code, the implementation (in part; much required for real chunks is left out for brevity) looks as follows:

  • ClusterColl contains the Clusters.
    class ClusterColl : public AbsChunk {
       private:
          std::vector<Cluster> _clusters;
       // much omitted for brevity ...
    };
  • TrackColl contains the Tracks.
    class TrackColl : public AbsChunk {
       private:
          std::vector<Track> _tracks;
       // much omitted for brevity ...
    };
  • ElectronColl contains the Electrons, and has associations to the ClusterColl and TrackColl. We also show a possible implementation of the member function parents( ).
    class ElectronColl : public AbsChunk {
       public:
          std::list<ChunkID> parents( ) const
            { std::list<ChunkID> result;
              result.push_back(_tracks); result.push_back(_clusters);
              return result;
            }
       private:
          ChunkID _tracks;   // ChunkID of parent tracks
          ChunkID _clusters; // ChunkID of parent clusters
          std::vector<Electron> _electrons;
       // much omitted for brevity ...
    };
  • An Electron has associations to the appropriate Cluster and Track. (Starting with v00-01-02n-br-16 of edm, one can instead use LinkIndex<Track> and LinkIndex<Cluster>.)
    class Electron {
       private:
          int _track;   // index of associated track
          int _cluster; // index of associated cluster
          // much omitted for brevity ...
    };
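
To show how such "dumb data" might be followed back to the object it names, here is a minimal sketch for the Electron-to-Cluster association only. It rests on several assumptions that are not part of the EDM: ChunkID is an integer typedef, the chunks do not inherit from AbsChunk, each ClusterColl is handed its own ChunkID at construction and exposes it through a hypothetical id() accessor, and findCluster() is an invented helper. It does not use, and is not a substitute for, the LinkIndex<> classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef int ChunkID;  // ASSUMED stand-in for the real ChunkID class

struct Cluster { double energy; };

class ClusterColl {
   public:
      ClusterColl(ChunkID id, const std::vector<Cluster>& clusters)
         : _id(id), _clusters(clusters) {}
      ChunkID id() const { return _id; }  // hypothetical accessor
      std::size_t size() const { return _clusters.size(); }
      const Cluster& cluster(std::size_t i) const {
         assert(i < _clusters.size());
         return _clusters[i];
      }
   private:
      ChunkID _id;
      std::vector<Cluster> _clusters;
};

class Electron {
   public:
      explicit Electron(int cluster) : _cluster(cluster) {}
      int clusterIndex() const { return _cluster; }
   private:
      int _cluster;  // index of associated cluster
};

class ElectronColl {
   public:
      ElectronColl(ChunkID clusters, const std::vector<Electron>& electrons)
         : _clusters(clusters), _electrons(electrons) {}
      ChunkID clusterCollID() const { return _clusters; }
      const Electron& electron(std::size_t i) const {
         assert(i < _electrons.size());
         return _electrons[i];
      }
   private:
      ChunkID _clusters;  // ChunkID of parent clusters
      std::vector<Electron> _electrons;
};

// Follow electron number `which` to its Cluster. Returns 0 if `candidates`
// is not the ClusterColl recorded by the ElectronColl, or if the stored
// index does not name a cluster in it.
inline const Cluster* findCluster(const ElectronColl& electrons,
                                  std::size_t which,
                                  const ClusterColl& candidates) {
   if (candidates.id() != electrons.clusterCollID()) return 0;
   const int idx = electrons.electron(which).clusterIndex();
   if (idx < 0 || static_cast<std::size_t>(idx) >= candidates.size()) return 0;
   return &candidates.cluster(static_cast<std::size_t>(idx));
}
```

Because the Electron stores only an integer and the ElectronColl stores only a ChunkID, neither class needs the definition of Cluster or ClusterColl to be stored or read back; the coupling appears only in code such as findCluster(), which has both chunks in hand.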

Example chunk

The class ToyClusterColl (in the reconstructor package) is an example that illustrates how to satisfy all the requirements given above.


This page last modified January 16, 2001 08:54 AM

Send comments or questions to Marc Paterno and Jim Kowalkowski