ORIGTILTD

By Vinzenz Unger and Anchi Cheng

This program serves a number of different purposes, such as data merging, the calculation of common phase origins, and the refinement of tilt parameters.

The combination of data from different images requires the determination of their common phase origin ($nprog=1,3 - see following sections). This calculation and the options for refining several image parameters, such as the tilt geometry ($nprog=3) and the amount of beamtilt ($nprog=1,3), are the most important applications of ORIGTILTD. Furthermore, ORIGTILTD serves to merge image data ($nprog=0) and to provide output data files that are suitable for subsequent averaging and lattice line fits. The example given in the protocol section is for a single image only. If more than one image is used, then a separate set of parameters (indicated by bold type) is required for each image. Paying attention to the proper placement of the film identifier numbers (see comments for CTFAPPLY and protocols) can save a lot of time, because the program will abort if the identifier number and the number at the head of the corresponding list of input data are not identical. This check ensures that only the intended data are compared, and an explicit error message is written out if this is not the case. Furthermore, all film identifier numbers should be unique, as otherwise some anomalies will occur in the calculation of reference data for $nprog=1.

In many cases a constant resolution cutoff (setting $res) may be suitable. However, especially for specimens that are only ordered to 5-10Å, this parameter may differ between images. In this case more generalized settings (e.g. $res1=…, $res2=…, etc) can be defined if one does not want to set the limits separately for each image. The negative number (-1 in the protocol) in place of a regular film identifier number signals the end of the list and must be included to avoid an “end of file” reading error. The current program dimensions allow only up to 1000 reflections per image. This may become a serious limitation for well-ordered specimens with large unit cells, in which case one needs to change the appropriate dimension statements in the source code.

Compared to other programs, a relatively large number of parameters need to be adjusted for each image. Which parameters need to be changed depends on the general operating mode of the program, specified by $nprog:

To find the rough position of the origin for a new image, the equivalent of a whole unit cell (i.e. phase shifts up to 360˚ along h and k) should be searched for the position of best agreement between the new data and the reference. The agreement of the data is represented by the so-called phase residual, which is the average of the phase errors for all the reflections in the same resolution band. Hence, the smaller the residual, the better the agreement. To account for a whole unit cell, the number of positions for which phase residuals are calculated (called $array in the protocol) and the increment for the applied phase shifts ($step) should be set so that [$array * $step] ≥ 360. For instance, settings of $array=73 and $step=5 are reasonable starting values. A more precise origin is then obtained by searching on a finer grid, e.g. $step=0.5, after adjusting the values for ORIGH and ORIGK to the position found in the earlier run (i.e. $array can remain unchanged). As explained in more detail in the main text, a resolution cutoff of ~15Å is recommended for all initial refinements to avoid inaccurate results caused by errors in the CTF-correction or in the determination of the tilt geometry.
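The grid search described above can be illustrated with a minimal Python sketch. It is not the ORIGTILTD code: the function names, the data layout, and the sign convention (a shift of h*ORIGH + k*ORIGK degrees added to each new phase) are assumptions made for illustration, and the residual is taken as a plain unweighted mean over all common reflections rather than per resolution band.

```python
def wrap180(angle_deg):
    """Wrap an angle in degrees into the interval [-180, 180)."""
    return (angle_deg + 180.0) % 360.0 - 180.0


def covers_unit_cell(array, step):
    """True if the search grid spans at least 360 degrees along each axis."""
    return array * step >= 360.0


def phase_residual(new, ref, origh, origk):
    """Mean absolute phase error (degrees) after applying an origin shift.

    new: list of (h, k, phase_deg) for the image being refined (hypothetical layout)
    ref: dict mapping (h, k) to the reference phase in degrees
    The shift h*origh + k*origk is an assumed sign convention.
    """
    errors = [
        abs(wrap180(phase + h * origh + k * origk - ref[(h, k)]))
        for h, k, phase in new
        if (h, k) in ref
    ]
    if not errors:
        raise ValueError("no common reflections between image and reference")
    return sum(errors) / len(errors)


def search_origin(new, ref, array, step):
    """Scan an array x array grid centred on zero shift; return (residual, origh, origk)."""
    half = array // 2
    offsets = [i * step for i in range(-half, half + 1)]
    return min(
        (phase_residual(new, ref, oh, ok), oh, ok)
        for oh in offsets
        for ok in offsets
    )
```

With array=73 and step=5 the offsets run from -180˚ to +180˚ in both directions, so `covers_unit_cell(73, 5)` holds. A second pass with step=0.5 would be centred on the position found in the first pass, which this sketch leaves to the caller.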

If the data are from the image of a tilted crystal, one also needs to specify the tilt geometry for the image. The relevant parameters are TAXA and TANGL, whose determination was described in the main section of the text (see also Fig.14-16). Furthermore, it is important to specify an appropriate z*-cutoff ($zstar) as soon as data from tilted crystals are present. Reflections that are within ±$zstar in reciprocal space are considered “close enough” to be included in the calculation of an average phase and amplitude, which in turn serve as the reference for comparison with the new phase and for the calculation of an amplitude scale factor (see explanation for $nprog=0). Whether two measurements are “close enough in reciprocal space” depends only on the thickness of the specimen, since this determines how quickly the transform is able to change. In other words, choosing an appropriate value for $zstar ensures that new data are only compared to those parts of the molecular transform that contribute to the specific projection of the structure that is described by the image in question. Unfortunately, the precise specimen thickness will not be known in most cases. Therefore it is prudent to use a value of ≈ [1/(2 x estimated specimen thickness)] as the starting estimate for $zstar. This estimate is not to be confused with the setting of the width of the real space envelope function used by LATLINED (see there for more detail) but is equivalent to the “binsize” used to obtain the initial guess in LATLINED.
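As a small worked illustration of the starting estimate, the sketch below reads the rule of thumb as 1/(2 x thickness), which gives $zstar in reciprocal Ångström when the thickness is given in Å. This reading, the function names, and the 100Å example thickness are assumptions for illustration only.

```python
def zstar_start(thickness_angstrom):
    """Starting estimate for $zstar in 1/Angstrom, assuming zstar = 1 / (2 * thickness)."""
    if thickness_angstrom <= 0:
        raise ValueError("thickness must be positive")
    return 1.0 / (2.0 * thickness_angstrom)


def within_zstar(z_new, z_ref, zstar):
    """True if a reference measurement lies within +/- zstar of the new reflection."""
    return abs(z_new - z_ref) <= zstar


# Hypothetical specimen, assumed to be about 100 Angstrom thick
zstar = zstar_start(100.0)
print(zstar)
```

A thicker specimen gives a smaller $zstar, so fewer reference measurements qualify as “close enough” for each new reflection.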

The calculation of the phase origin will likely be reliable if it is based on at least 15-20 common spots. Calculations that use fewer spots are very likely to represent a false minimum and hence should not be trusted. This criterion can potentially cause problems when attempting to merge data from tilted crystals. Depending on the specimen thickness and the absolute distance in reciprocal space between the new reflections and the reflections that comprise the reference, only the spots on or very close to the tilt axis may be common at the beginning. In many cases, especially for thick specimens with small unit cell dimensions, this will rapidly result in a situation where no reliable phase origin can be identified. For instance, for the gap junction channel specimen that was used for most of the examples in the main text, data from tilted crystals could not be refined against a set of reference data from untilted crystals if the tilt exceeded ~5˚. Fortunately, this condition relaxed to a margin of about [maximum tilt in reference +10˚] once some 3D data were collected, yet the example emphasizes that the range of tilts for which data need to be collected has to be carefully tuned to meet the characteristics of a particular specimen. For thinner specimens this will be less critical, and in many cases one can try to find a reliable origin by choosing a larger $zstar value for the initial refinements. However, ultimately there is no better way than collecting overlapping sets of tilt data to produce the redundancy that will allow unambiguous phase origin calculations and provide the basis for reliable refinements of the CTF and tilt geometry.

Even taking into account all of the above, it will still be impossible to identify the phase origin in some cases, because there are factors other than a simple phase shift or proximity in reciprocal space that may need to be taken into consideration. For instance, if the image was accidentally recorded at over- rather than underfocus, then the standard CTF-correction (which assumes underfocus) would have resulted in the addition of 180˚ to the phases on the wrong bands of the modulation, which makes it impossible to define a common phase origin. To circumvent these and similar problems, a number of options are available that allow the use of data from images that otherwise would need to be (partially) reprocessed. For example, the parameter CTFREV changes the CTF from underfocus to overfocus by adding 180˚ to the phases of all reflections from this particular image. Because this reverses the CTF-correction that had been applied, a common origin should now be evident if over- rather than underfocus was the problem. Similarly, the characteristics of the molecular transform are usually not known at the beginning. This can lead to “problems” in defining a common phase origin between two images, because in a number of plane groups there are several, apparently equivalent, possibilities to index the original transform. In some cases one may be fortunate in that a characteristic pattern of the intensities for certain reflections can be used to index the transforms in a consistent manner. However, in other cases this may not be as easy. The options SGNXCH (flipping of input data around a*), ROT180 (rotating input data by 180˚ about z*) and REVHK (reversing h and k upon input) can be used to investigate and compensate for disagreements that are caused by inconsistent indexing. Depending on the two-sided plane group, the options SGNXCH (p2) and REVHK (e.g. p1,p3,p4,p6) can be used to “flip” the image “upside down”, and ROT180 is useful in p1 and p3, where the phases of the Friedel mates are related by φ(h,k,l) = -φ(-h,-k,-l). To test for any of these cases or possible combinations, the input cards for the appropriate options need to be changed from 0 to 1. It should, however, be noted that these options cannot account for errors in setting TAXA and TANGL. The latter two need to be correct with regard to the h,k axes that were originally chosen.

If the data extend to better than 5Å, then the effects of beamtilt (TILTH, TILTK) can no longer be neglected. However, it is important to remember that beamtilt refinements should only be carried out once a data set (projection or 3D) is available which had the CTF and tilt geometry - even for images of nominally “untilted” crystals - refined to consistency out to ~5-6Å resolution. The beamtilt subroutine will be invoked if logical NBM=T (see protocol). A simple list of merged data points or a fitted 3D data set can be used as reference in this case. This is different from the refinement of the tilt geometries (NTL=T), which requires a set of fitted 3D data as input (see below, NPROG=3) for an iterative refinement. Of course, for NTL=F, values for TAXA and TANGL can be changed manually and several searches are performed to identify settings that result in the best overall phase residual (a jiffy program called ORIGTILTI can be obtained from us to do this where applicable), yet this is somewhat tedious and the main application would be to refine a first rough 3D data set before attempting to fit lattice lines that can be used to refine tilt geometries.

Common to all phase origin refinements is that a suitable IQ-cutoff ($iq) needs to be set to allow reliable calculations. It seems useful to draw attention to the fact that each reflection of the “new” film (i.e. the image that is actually refined) is treated with equal weight (settings IORIGT=1 and NWGT=F) regardless of its signal to noise ratio. This differs from the internal calculation of the reference phase for this program mode. For the reference phase, the phase angles of all reflections within ±$zstar are averaged in a manner that weights the contribution of an individual reflection according to its IQ value. Spots with IQ1 and 2 are treated equally, while a simple division by the value of the IQ was chosen as the weighting scheme for all remaining spots. Consequently, the contribution of strong measurements is far (!!) less dominant than with the more elaborate weights used in other programs. For instance, a single measurement with an IQ of 1 is completely overruled by as few as four IQ7 measurements, and only three IQ7 measurements are needed to outweigh an IQ3 reflection. Accordingly, only the stronger spots should be used for refinements if $nprog=1, because the impact of noise on the refinements can easily result in significant distortions of the real data, especially if the number of independent measurements for each part of the transform is small. Taken together, the equal weight of each “new” reflection and the grossly simplified weighting used to calculate the reference phase values cause a rapid increase of the phase residuals if the weaker spots (IQ5-8) are included. An obvious question is whether these “problems” could be at least partially avoided by changing the current settings for IORIGT or NWGT. Currently the answer is no, because the option to use NWGT=T has not been fully implemented in the program, and IORIGT itself only controls whether a single image is compared against the average of the remaining data or whether the single image serves as the reference against which all relevant data points in the data list are compared. However, the increase in the phase residual due to these limitations is no reason for concern per se, as long as the residuals stay below random values and the majority of data that are used for refinement procedures have a significant signal to noise ratio (e.g. IQ ≤ 5). In these cases, and provided that the redundancy in the data is large enough, the true signal will still control the outcome of any refinement step. With all this in mind it should now be easier to understand why using MMBOXA instead of MMBOX will prove beneficial for the quality of data sets and will make it easier to merge and refine data. Nevertheless, a keen skepticism is warranted if weaker data are used, and one should always examine the results of refinements very carefully.
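The two weighting examples quoted above can be checked with a few lines of Python. The sketch assumes that IQ1 and IQ2 spots share the weight 1/2 and that all other spots receive 1/IQ. The shared value of 1/2 is not stated explicitly in the text; it is an assumption chosen because it reproduces both quoted figures. The function name is hypothetical.

```python
from fractions import Fraction


def iq_weight(iq):
    """Assumed reference-phase weight: 1/2 for IQ1 and IQ2, 1/IQ for all other spots."""
    if iq < 1:
        raise ValueError("IQ values start at 1")
    return Fraction(1, max(iq, 2))


# Four IQ7 spots together outweigh a single IQ1 spot, three do not.
print(4 * iq_weight(7) > iq_weight(1), 3 * iq_weight(7) > iq_weight(1))

# Three IQ7 spots together outweigh a single IQ3 spot, two do not.
print(3 * iq_weight(7) > iq_weight(3), 2 * iq_weight(7) > iq_weight(3))
```

Under this assumption the combined weights are 4/7 against 1/2 and 3/7 against 1/3, which shows how little a strong spot counts once several weak measurements fall within the same ±$zstar window.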

One observation that can often be made upon including weaker and/or higher resolution data is that the phase origin may shift by several degrees from the position that was calculated at ~15Å resolution. This prompts the question of whether these changes are real/beneficial or mainly reflect noise in the data. Generally, these shifts will present a problem (caused by noise) if the corresponding changes of the density distribution in real space are at a scale close to the resolution limits of the specimen. Whether this applies or not depends on the unit cell dimensions and the resolution of the specimen. All phase shifts given by ORIGTILTD are calculated to represent changes to the phase of the (1,0) and (0,1) reflections. Assuming that a specimen had unit cell dimensions of a=150Å, b=75Å and γ=90˚, a change of the phase origin by 10˚ along ORIGH and ORIGK would correspond to a movement of the unit cell contents by [10˚*150Å/360˚]=4.2Å and [10˚*75Å/360˚]=2.1Å respectively. Even if the specimen was ordered to a resolution of only ≈8Å, the shift in the direction along a would still be too large to be acceptable. This is even more true if the data extend to better than 5Å, where a phase shift of this magnitude would start to average out details. Usually, origin shifts are on the order of 2-5˚, and even after extensive refinements, shifts of ~1˚ are common if the data are refined in mode $nprog=1. Since these shifts are caused by the noise in the data, it does not improve the situation if the calculated changes are incorporated. Any change made to one image will prompt changes of similar magnitude in other images, i.e. the data have successfully converged once this point is reached.
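The conversion used in the example above, from a phase origin shift in degrees to a real-space displacement, can be written as a one-line helper. The function name is hypothetical; the relation itself (shift in degrees times cell length, divided by 360˚) is the one used in the text.

```python
def origin_shift_angstrom(shift_deg, cell_length_angstrom):
    """Real-space displacement for a phase origin shift of the (1,0) or (0,1) reflection."""
    return shift_deg * cell_length_angstrom / 360.0


# Values from the example in the text: a = 150 Angstrom, b = 75 Angstrom, shift of 10 degrees
print(round(origin_shift_angstrom(10.0, 150.0), 1))  # 4.2
print(round(origin_shift_angstrom(10.0, 75.0), 1))   # 2.1
```

The same helper shows that a residual shift of ~1˚ on the 150Å axis corresponds to only about 0.4Å, well below the resolution of most specimens.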

SUMMARY: Option $nprog=1 serves to determine the following parameters: ORIGH, ORIGK, SGNXCH, ROT180, REVHK, CTFREV, (TILTH, TILTK, TAXA, TANGL). The following general parameters need to be adjusted for the phase origin search: $ispg (plane group number determined by ALLSPACE), $res, $step, $array, $zstar and $iq. The following settings are fixed for all runs at $nprog=1: NWGT=F, IORIGT=1, NTL=F and SCALE=0 (for all images). Logical NBM=T for beamtilt refinement is possible. Furthermore, the “dummy” file (see $nprog=0) used for merging needs to be replaced by a list of reference data (=previous output of ORIGTILTD, $nprog=0). Some more detailed output (list of film data before shift, number of spots used to calculate reference phase, reference phase and a listing of the film data after applying the final phase shift) can be obtained by setting ILIST=1.

To merge data, the reference data file is replaced by either a “dummy” file, containing only an identifier number ($dummy), or a properly formatted input data set that contains data for a complete asymmetric unit. Using a “dummy” file is far easier and will be the only solution in most cases, because it allows a first reference data set to be created in a single step, by bringing the data of the best image of an untilted crystal to the phase origin that is consistent with the highest possible symmetry indicated by ALLSPACE.