Table Of Contents

Previous topic

Gromacs NDX index file format

Next topic

Preprocessor in Python

This Page

Gromacs topology file (ITP) parser

New in version 0.2.5.

Basic reading, manipulating and writing works but can still be a bit awkward. See documentation for ITP.

Warning

The code is still experimental.

Limitations

  • only one [moleculetype] at the moment (need to distinguish by molecule name) or maybe subsequent ones overwrite previous ones — not tested so better only use a single moleculetype (TODO)
  • merges multiple dihedral sections into one
  • probably fails for FEP itps (TODO)
  • does not reproduce positions of comment-only lines in sections (in fact, currently we do not even write them out again, only trailing line comments are kept and the header before the first section)
  • only simple preprocessor directives such as #ifdef POSRES are processed by the Preprocessor. Preprocessor directives are evaluated and stripped so that ITP represents one functional itp file. The directives are not stored and cannot be written out or accessed.

Example

An itp file can be loaded and individual sections accessed (print shows a representation of what it would look like written to a file):

>>> itp = gromacs.fileformats.ITP("benzylhyd5.itp")
>>> print itp.header
>>> ...
>>> print itp.header.moleculetype
[ moleculetype ]
; Name      nrexcl
5FH         3

Each section is an object that contains the parsed data in a data attribute:

>>> print itp.header.moleculetype.data
odict.odict([('name', '5FH'), ('nrexcl', 3)])
>>> print itp.header.moleculetype.data['name']
5FH

The moleculetypes section contains the important subsections that make up the topology:

>>> print itp.header.moleculetype.sections.keys()
['atoms', 'bonds', 'angles', 'dihedrals', 'pairs']

They can be accessed in the same manner. For instance, counting all atom records:

>>> print len(itp.header.moleculetype.atoms)
24

For the atoms, the data structure is a numpy.rec.recarray.

Data can be modified and then finally a new itp file can be written with the ITP.write() method.

>>> itp.write("modified.itp")

User classes

The ITP class always represents one itp file. It contains a tree of parsed subsections that correspond to the subsections in the itp file. The way to drill down is to access the attribute ITP.sections, where the subsections are stored. ITP.sections acts like a dictionary. Each subsection contains again a ITPsection.sections dictionary containing its own subsections. To find the [ atoms ] data:

itp.sections['header'].sections['moleculetype'].sections['atoms'].data

For convenience, one can also use the keys in the sections dictionary directly as attributes. In this way finding the [ atoms ] data is even simpler:

itp.header.moleculetype.atoms.data

For more details on the organization of the data structure see Classes corresponding to ITP sections.

class gromacs.fileformats.itp.ITP(filename=None, **kwargs)

Class that represents a Gromacs topology itp file.

The file is processed by a Preprocessor according to directves in the file and variables defined in the constructor keyword arguments. Thus, the class ITP represents one particularly state of the itp file with any preprocessor constructs excluded. Only a limited set of directives is understood and a SyntaxError is raised in cases where the preprocessor does not know how to handle the file.

The file is parsed into a hierarchy of sections (and subsections), which are accessible in the attribute ITP.sections.

Data of (sub)sections is stored in data, e.g. the [atoms] section

itp.sections['header'].sections['moleculetype'].sections['atoms'].data

contains the list of atoms for molecule

itp.sections['header'].sections['moleculetype'].data['name']

Data attributes are typically (ordered) dictionaries or NumPy recarrays.

The top-level section is called “header” and simply contains all comment lines before the first real parsed section.

Warning

Not all section types are implemented. There can only be a single type of section at each level of the hierarchy.

Only a single [moleculetype] is currently supported.

Subsequent [dihedrals] sections are merged but if another section intersperses two [dihedral] sections then the first one is lost.

Comments are stripped except at the beginning of the file and at the end of data lines.

New in version 0.2.5.

Changed in version 0.3.1: Access to sections by attribute access is possible.

Initialize ITP structure.

Arguments :
filename

read from mdp file

kwargs

#define VAR variables that are used at the pre-processing stage of the itp file, e.g. POSRES = True will activate a #ifdef POSRES section in the itp file.

See also

preprocessor describes details about the implementation of preprocessing of the itp file, including the (limited) syntax understood by the preprocesser.

contains_preprocessor_constructs()

Check if file makes use of any preprocessor constructs.

The test is done by running the file through the Preprocessor (while stripping all empty and lines starting with a comment character. This is compared to the original file, stripped in the same manner. If the two stripped files differ from each other then the preprocessor altered the file and preprocessor directives must have been involved and this function returns True.

parse(stream)

Simple recursive descent parsing of ITP.

stream must have a next() and an unread() method. A call must supply the next line of input or raise EOFError or StopIteration.

read(filename=None, preprocess=True, **defines)

Preprocess, read and parse itp file filename.

Any keywords in defines are use to modify the default preprocessor variables (see gromacs.fileformats.preprocessor.Preprocessor.parse() for details). Setting preprocess = False skips the preprocessing step.

walk_sections(func=None, flat=True)

Generator that applies func to each section.

Traverses all sections depth first. Applies

func(name, section)

User supplied function that uses both the string name and the ITPsection instance and returns a result. Note that any lists and lists of lists will be flattened so it mostly makes sense to return a string or similar.

at each level and iterates over the individual results. With flat = True the resulting list is returned flat (default), otherwise the list represents the tree structure of the itp file.

By default, func is str(section). Thus, the whole itp file can be printed out with

print "\n".join(itp.walk_sections())

(See write() for writing out the ITP.)

write(filename)

Write ITP file to filename

Developer reference

Class ITP makes use of the following classes to parse and represent a topology as stored in an itp file.

Base classes and utility functions

class gromacs.fileformats.itp.ITPsection(parent, **kwargs)

Class to parse and store ITP data.

  • data contains the data of the section

  • sections holds the sub-parsers (including their own data; this odict mirrors the parse tree.

    Note that sections at the same level are merged.

  • parsers contains a dict of the all the parser classes (keyed by section name) that are included in this section; typically overriden.

New in version 0.2.5.

Changed in version 0.3.1: Access to sections by attribute access is possible.

walk_sections(func=None, flat=True)

Generator that applies func to each section and subsections.

Traverses all sections depth first. Applies

func(name, section)

User supplied function that uses both the string name and the ITPsection instance and returns a result. Note that any lists and lists of lists will be flattened so it mostly makes sense to return a string or similar.

at each level and iterates over the individual results. With flat = True the resulting list is returned flat (default), otherwise the list represents the tree structure of the itp file.

By default, func is str(section).

class gromacs.fileformats.itp.ITPdata(*args, **kwargs)

Class to represent most ITP sections.

  1. parse line by line and store data and comments
  2. format data as recarray; uses up columns from left to right
  3. manipulate data in the recarray or write section with section()

New in version 0.2.5.

data

atom data as a numpy.rec.array

The data inside the array can be changed but not appended.

section()

Return a string of the section data in ITP format.

Data entries containing None are stripped from the output. In accordance with Gromacs ITP parsing rules, data columns can only be ommitted from the right.

set_data(data)

data is atom data, stored as numpy.rec.arry

class gromacs.fileformats.itp.OneLineBuffer(getline)

Wrapper around anything with a next() method, to provide an unread().

Provide the next method (or any other function that returns one line of input) as the argument getline.

Implements a one-line buffer, unread() can only move the stream one line back (repeatedly calling unread() does not do anything).

New in version 0.2.5.

next()

Return next line (or previous line if unread() was called)

unread()

Reset stream to previous line, so that next call to next() rereads

gromacs.fileformats.itp.flatten(l)

Generator for flattening a nested list l.

http://stackoverflow.com/questions/2158395/flatten-an-irregular-list-of-lists-in-python

New in version 0.2.5.

Classes corresponding to ITP sections

The ITP file is parsed with a simple recursive descending depth-first algorithm. Classes contain a dictionary of sub-section parsers.

The section hierarchy is

class gromacs.fileformats.itp.Header(*args, **kwargs)

The customary comments section before the first section.

Example format:

; itp file for Benzene
; example "header"

New in version 0.2.5.

class gromacs.fileformats.itp.Atomtypes(*args, **kwargs)

ITP [atomtypes] section.

Example format:

[ atomtypes ]
;name  bond_type      mass  charge ptype       sigma     epsilon
c3            c3    0.0000  0.0000 A     3.39967e-01 4.57730e-01
hc            hc    0.0000  0.0000 A     2.64953e-01 6.56888e-02

New in version 0.2.5.

class gromacs.fileformats.itp.Moleculetype(*args, **kwargs)

ITP [ moleculetype ] section

Example format:

[ moleculetype ]
; Name      nrexcl
5FH              3

New in version 0.2.5.

set_moleculename(molname)

Set the molecule name to molname.

  1. changes the name in [moleculetype] section
  2. changes the resname in the whole [atoms] section
class gromacs.fileformats.itp.Atoms(*args, **kwargs)

ITP [ atoms ] section

Example format:

[ atoms ]
; atomnr  atomtype   resnr  resname  atomname  chargegrp   charge       mass
       1  opls_145       1      5FH        C7          7   -0.115   12.01100 ; CA # Benzene C - 12 site JACS,112,4768-90. Use #145B for biphenyl
       2  opls_146       1      5FH       H71          7    0.115    1.00800 ; HA # Benzene H - 12 site.

The data itself are stored in data, a numpy.rec.array. data is a managed attribute but the values inside can be changed. Fields in the array have a maximum size (in particular, comments are cut off after 128 characters).

New in version 0.2.5.

dtypes = [('atomnr', 'i4'), ('atomtype', 'S10'), ('resnr', 'i4'), ('resname', 'S4'), ('atomname', 'S4'), ('chargegrp', 'i4'), ('charge', 'f8'), ('mass', 'f8'), ('atomtypeB', 'S10'), ('chargeB', 'f8'), ('massB', 'f8')]

numpy.dtype columns for the data

fmt = ['%8d', '%9s', '%7d', '%8s', '%9s', '%10d', '%8.3f', '%10.5f', '%9s', '%8.3f', '%10.5f']

output format (space separated), same ordering as columns

set_resname(name)

Changes the resname to name for all atom entries

class gromacs.fileformats.itp.Bonds(*args, **kwargs)

ITP [ bonds ] section

Example format:

[ bonds ]
; ai   aj  funct  r  k
  1    2      1 ; CA-HA # PHE, etc.
  1    3      1 ; CA-CA # TRP,TYR,PHE

New in version 0.2.5.

dtypes = [('ai', 'i4'), ('aj', 'i4'), ('func', 'i4'), ('b0', 'f8'), ('kb', 'f8')]

numpy.dtype columns for the data

fmt = ['%4d', '%4d', '%6d', '%10.5f', '%10.1f']

output format (space separated), same ordering as columns

class gromacs.fileformats.itp.Angles(*args, **kwargs)

ITP [ angles ] section

Example format:

[ angles ]
; ai   aj   ak  funct  theta   cth
   2    1    3      1 ; HA-CA-CA #
   2    1   11      1 ; HA-CA-CA #
   3    1   11      1 ; CA-CA-CA # PHE(OL)

New in version 0.2.5.

dtypes = [('ai', 'i4'), ('aj', 'i4'), ('ak', 'i4'), ('func', 'i4'), ('theta0', 'f8'), ('cth', 'f8')]

numpy.dtype columns for the data

fmt = ['%4d', '%4d', '%4d', '%6d', '%9.3f', '%10.3f']

output format (space separated), same ordering as columns

class gromacs.fileformats.itp.Dihedrals(*args, **kwargs)

ITP [ dihedrals ] section

Example format:

[ dihedrals ]
; ai   aj   ak   al  funct   C0  ...  C5
   2    1    3    4      3 ; HA-CA-CA-HA # aromatic ring (X-CA-CA-X generic proper dihedral)
   2    1    3    5      3 ; HA-CA-CA-CA # aromatic ring (X-CA-CA-X generic proper dihedral)

Note

Multiple adjacent dihedral sections are currently merged into a single dihedral section. This means that if you separated proper and improper dihedrals into two different sections then they will now appear one after another under a single [ dihedrals ] section. If two dihedral sections are separated by another section then the last one overwrites the previous one.

New in version 0.2.5.

dtypes = [('ai', 'i4'), ('aj', 'i4'), ('ak', 'i4'), ('al', 'i4'), ('func', 'i4'), ('c0', <type 'object'>), ('c1', <type 'object'>), ('c2', <type 'object'>), ('c3', <type 'object'>), ('c4', <type 'object'>), ('c5', <type 'object'>)]

numpy.dtype columns for the data

fmt = ['%4d', '%4d', '%4d', '%4d', '%5d', '%9s', '%9s', '%9s', '%9s', '%9s', '%9s']

output format (space separated), same ordering as columns

class gromacs.fileformats.itp.Pairs(*args, **kwargs)

ITP [ pairs ] section

Example format:

[ pairs ]
; ai   aj  funct
   1    6      1
   1    7      1

New in version 0.2.5.

class gromacs.fileformats.itp.Dummy(parent, **kwargs)

Not implemented, just ignores everything