Sub-millimeter precise photon interaction position determination in large monolithic scintillators via convolutional neural network algorithms

In this work, we present the development and application of a convolutional neural network (CNN)-based algorithm to precisely determine the interaction position of γ-quanta in large monolithic scintillators. Those are used as an absorber component of a Compton camera (CC) system under development for ion beam range verification via prompt-gamma imaging. We examined two scintillation crystals: LaBr3:Ce and CeBr3. Each crystal had dimensions of 50.8 mm × 50.8 mm × 30 mm and was coupled to a 64-fold segmented multi-anode photomultiplier tube (PMT) with an 8 × 8 pixel arrangement. We determined the spatial resolution for three photon energies of 662, 1.17 and 1.33 MeV obtained from 2D detector scans with tightly collimated 137Cs and 60Co photon sources. With the new algorithm we achieved a spatial resolution for the CeBr3 crystal below 1.11(8) mm and below 0.98(7) mm for the LaBr3:Ce detector for all investigated energies between 662 keV and 1.33 MeV. We thereby improved the performance by more than a factor of 2.5 compared to the previously used categorical average pattern algorithm, which is a variation of the well-established k-nearest neighbor algorithm. The trained CNN has a low memory footprint and enables the reconstruction of up to 104 events per second with only one GPU. Those improvements are crucial on the way to future clinical in vivo applicability of the CC for ion beam range verification.


Introduction
A therapeutic hadron beam used in particle therapy stops inside the patientʼs body, rendering the information on the dose deposition not as easily accessible as it is the case with photon irradiation. Among the numerous range verification techniques presently under study, there are those that rely on the detection of an ionoacoustic signal originating from thermal expansion following localized heating in the dose deposition of a pulsed ion beam (Kellnberger et al 2016), secondary charged particles (in case of carbon ion beams) (Gwosch et al 2013) or delayed annihilation γ-rays such as in positron emission tomography (PET) (Zhu and El Fakhri 2013). Finally, there are methods based on the detection of prompt γ-rays such as multi-slit camera (Smeets et al 2016), prompt-gamma spectroscopy (Verburg and Seco 2014), prompt-gamma timing (Golnik et al 2014), γ-PET, also called triple-γ PET or whole gamma imaging (Lang et al 2014, Manzano et al 2015, Yoshida et al 2020, or Compton camera (CC) based imaging systems (Krimmer et al 2015, Polf et al 2015, Llosá et al 2016, Aldawood et al 2017. In this paper the focus is set on the latter approach. The basic components of a CC system are two detectors: a scatterer and an absorber. Depending on the energy, the initial γ-ray undergoes Compton scattering in the first detector and should then be fully absorbed in the second one. The goal of a single-event reconstruction is to determine the so-called Compton cone, which is a Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI. collection of all possible locations from which the detected γ-quantum could have been emitted. By sequentially detecting multiple γ-rays originating from the same point, it is possible to get the source location at the intersection of the reconstructed Compton cones.
The use of monolithic scintillators as an absorber component is an alternative to pixelated detectors, as being well-established in PET scanners, providing better energy resolution, higher sensitivity as well as cheaper and easier manufacturing. All these advantages, however, come at the cost of a more complex derivation of spatial information that has to be carried out with the help of designated algorithms.
Several algorithms have been proposed so far, notably the k-nearest neighbors (kNN) and categorical averaged pattern (CAP) (van Dam et al 2011) algorithms. The main principle of both are lookup tables, here called reference libraries, containing the position dependent detector response, i.e. the scintillation light amplitude distribution across the (segmented) photosensor array. During the reconstruction process, the light amplitude distribution of an unknown event is compared with all reference library entries. Then, according to specific rules, the photon interaction position is determined by the maximum of a 2D histogram, which contains a selectable number of k most similar events. Although both algorithms achieve satisfactory spatial resolution, these methods are not applicable in real-time practice due to their long computational times.
In this paper we propose an algorithm for determining the position of γ-ray interactions in a monolithic scintillation crystal, which is based on supervised machine learning involving convolutional network networks (CNN). We show the development steps starting from the network architecture optimization, through the determination of the optimal database size and suggest an effective training schedule. Our algorithm is applicable to all types of detectors where a photon interaction position can be associated with a characteristic light amplitude distribution. The new method allows improving the positioning accuracy by a factor of about 2.5 compared to the CAP algorithm, has a low memory footprint and opens up prospects for real-time application.

Materials
The CC prototype under development in our group for ion beam range verification via prompt-gamma imaging is a two-stage detector system. The scatterer consists of a stack of six double-sided silicon strip detectors, which provide direct information on the deposited energy as well as interaction positions of both, the incident γ-ray and the scattered electron. The second component, which is the focus of this work, is a single thick monolithic scintillation crystal read out by a multi-anode PMT that absorbs the scattered photon. A characteristic light distribution pattern registered by the PMT correlates with the photon interaction position inside the crystal.
Two monolithic scintillation crystals with dimensions of 50.8 mm × 50.8 mm × 30 mm were examined as potential candidates for the absorber component: LaBr 3 :Ce and CeBr 3 . The crystals were wrapped in a reflective material which, despite a more inhomogeneous spatial response of the detector compared to an absorbing coating, has a significantly better energy and time resolution and thus a better over all performance (Aldawood et al 2015). Both crystals show hygroscopic properties and were therefore protected from humidity within an aluminum housing. Both scintillation materials exhibit short decay times of 16-19 ns, high light yield in the range of 6.3-6.8 × 10 4 ph MeV −1 and an excellent relative energy resolution of 3.5%-4.1% at 662 keV. While LaBr 3 :Ce carries an intrinsic radioactivity of about 2 Bq cm −3 , CeBr 3 , being free of this disadvantage, shows a higher signal-to noise ratio (Saint-Gobain 2016, Scionix 2018).
We used a 64-fold segmented multi-anode PMT with an 8 × 8 pixel arrangement from Hamamatsu (models H8500C and H12700 Hamamatsu 2015), which has been shown to be superior when compared to a higher segmented 256-channel PMT (Viegas 2018). Each pixel has a size of 5.8 mm × 5.8 mm, the outer dimension of the PMT array is 49 mm × 49 mm, therefore slightly smaller than the size of the scintillation crystals. The photosensor and absorber were coupled with an optical grease whose refractive index is similar to that of the absorber material and the photomultiplier entrance window made of borosilicate glass. The signals of the 64 PMT channels are sent to adapter boards via four 16-pin coaxial ribbon cables and transferred further into four Constant Fraction Discriminator modules via LEMO cables. The MCFDs amplify the energy signals and create amplitude-independent logical signals that act as individual gates for the charge digitizer modules. Both types of signals are then fed into charge-to-digital converter modules that integrate the incoming signals over the duration of the individual gates, providing energy information of each channel. In addition to the 64 signals, a signal from the PMT sum-dynode is fed to a separate MCFD module, whose common logic output is sent further to a TRIVA 5 VME trigger unit (Hoffman 2009) as trigger for the data acquisition system, as well as to a Quad Coincidence Logic Unit, that splits this master gate into identical copies which are sent to each MQDC module, opening an acquisition time window to ensure synchronized data acquisition across all modules. The acquired data is sent to the control PC via a VME frontend CPU (RIO-3 operated under the real-time operational system LynxOS). A block of the data acquisition system is presented in figure 1.

Data acquisition
The training of a CNN requires an extensive collection of labeled data (here referred to as events), which are in our case 2D light amplitude distributions registered by a PMT and associated to their respective irradiation position coordinates. We adapted existing reference libraries, which served previously as lookup tables for position reconstruction using the kNN and CAP algorithms. The data acquisition of the light amplitude distributions was performed by irradiating the crystal's front surface with tightly collimated γ-ray sources: 137 Cs emitting 662 keV γ-rays and 60 Co emitting two coincident photons of energies of 1.17 and 1.33 MeV. Our collimator was a 10 cm long DENSIMENT rod with a central 1 mm diameter channel, arranged pointing perpendicular to the absorber surface. The crystal was irradiated on a 2D grid of 102 × 102 reference positions with a 0.5 mm step size (via a scanning system with remotely controlled and motorized (x, y) translation stages). For the LaBr 3 :Ce crystal 800 photopeak events originating from the 137 Cs irradiation and 600 events from 60 Co irradiation (separately for both cobalt energies) were collected at each position. The generation of the aforementioned 137 Cs library (source activity 72 MBq) of 600 × 102 × 102 entries took about 7 d of continuous measurement time. The activity of the 60 Co source was significantly lower (27 MBq), thus the acquisition of 600 × 102 × 102 photopeak events required about 18 d uninterrupted data acquisition. In order to separate scattered events from photopeak events, an energy gate was set to select the photopeak energy. Due to the absence of internal radioactivity for the CeBr 3 and therefore the higher signal-to-noise ratio, we were able to reduce (compared to the LaBr 3 :Ce crystal irradiation) the number of reference library entries to 600 and 480 for the irradiation with 137 Cs and 60 Co sources, respectively. A sketch of the experimental setup is presented in figure 2(a).

Data preprocessing
Each event entry belonging to a reference library consists of a two-dimensional, 8 × 8 feature vector, representing the light amplitude distribution in each channel of the PMT, and a target vector indicating the (x, y) coordinates of the irradiation position. The target vectors were constructed using the one-hot, also known as dummy, encoding scheme (Alkharusi 2012). All input vectors were normalized such that the sum over all PMT channels is equal to one. An exemplary light amplitude distribution, averaged over 400 events registered at the same irradiation position, together with the corresponding target vectors is presented in figure 2 Approximately 20%-30% of the reference events were set aside to serve later as a test set being a proxy for the new unlabeled data. The remaining 70%-80% were used as training and validation data to optimize the model parameters.

Workflow
We categorized the task of finding the interaction position of a γ-ray, based on the PMT response, as an image, i.e. pattern, recognition problem. CNN building blocks were used to design a new reconstruction algorithm, while the large datasets of labeled data enabled us to carry out supervised learning.
CNN algorithms are known for their large number of uncorrelated hyperparameters that must be set prior to the training. An attempt of optimizing them all simultaneously would be equivalent to finding the absolute minimum of a multidimensional function, which due to computational constraints is not possible to be carried out in practice. For that reason, the process of searching their optimum combination was divided into several stages with the following priorities: (1) model specific hyperparameters (number of convolutional layers and kernels), (2) database size (3) optimizer hyperparameters (learning rate and number of epochs for training).

Architecture design
The first stage of the hyperparameter optimization, performed to determine the number of convolutional layers and 3 × 3 kernels building the so-called convolutional block, was carried out through the random search method, meaning that several highly dissimilar network architectures were investigated to determine a smaller sub-space for the second stage of optimization, the grid search. After every convolution rectified linear unit (ReLU) activation (Gonzalez and Woods 2018) and batch normalization (Ioffe and Szegedy 2015) were applied. To prevent the image from shrinking, each layer was zero-padded before performing convolution operation. To generate predictions, separately for the x and y coordinates, the network was split into two output blocks, each made up of two fully connected layers with softmax (Gonzalez and Woods 2018) as the final activation function. The output had the form of two 102-dimensional probability vectors, originating from the 102 reference library irradiation positions along each axis. The coordinate with the maximum probability value was considered as the reconstructed interaction position.

Training
We carried out the training using a categorical cross-entropy loss function, Adam optimizer and introduced a data split into mini-batches of size 400. Even though the Adam optimizer provides a separate learning rate (lr) for each network parameter, which is adapted as learning unfolds, we found it beneficial to divide the training into several phases, gradually decreasing the initial learning rate for the optimizer as follows: lr = {10 −5 , 5 × 10 −6 , 10 −6 } for approximately 100, 50 and 50 epochs, respectively.
We prepared separate networks, for each energy of the incident γ-ray and each crystal. However, to exploit common features of the light distributions, data from all libraries were included at the beginning of the training. The final parameter tuning was carried out with the events of the specific energy and for the dedicated crystal only.

Database optimization
In theory, the neural network performance should improve with increasing database size, but eventually a saturation point is reached and the model does no longer benefit from new training examples. Since reference library acquisition is a lengthy process (7 and 18 d to collect 137 Cs and 60 Co data, respectively), it is advantageous to determine the minimum database size required to achieve the desired model accuracy.
Therefore, the training was carried out for 13 different numbers of photopeak events per irradiation position:

Determination of the algorithm spatial resolution
Interaction positions of events from the test set were predicted by the neural network and compared to the ground-truth. The differences between these two were used to fill a 2D error histogram creating a narrow peak similar in shape to a 2D Gaussian or Lorentzian. The spatial resolution of the algorithm was calculated as the average FWHM of the histogram projections along both axes. An exemplary error histogram and its projection in x direction are depicted in figure 3. In order to quantify the statistical uncertainty of the spatial resolution determination, the test dataset was divided into five subgroups and for each of them a separate error histogram with its FWHM was calculated. From the resulting five data points an average FWHM and its standard deviation were determined. The systematic uncertainty has many sources. Firstly, both hygroscopic absorber crystals are encapsulated in an aluminum housing and the exact position of the crystal edges has to be determined indirectly by the so-called 'edge scan', i.e. 1D scans in x and y directions, respectively, followed by sigmoid fits to the resulting intensity profiles, with a precision of about ±0.3 mm (Liprandi 2018). Secondly, the beam is not infinitesimally narrow, but collimated by a 10cm long tube with a 1 mm diameter opening. Assuming for simplicity, that all photon interactions take place at the crystal front surface, which is 0.2-0.3 mm away from the rear end of the collimator, we have experimentally determined the beam spot size to have approximately 1.2 mm diameter (Binder 2017) with a Gaussian intensity profile.

Architecture design
Using a random search method we determined the upper limit of the number of convolutional layers and 3 × 3 kernels of n conv = 6 and n kern = 4, respectively. More complex architectures did not lead to further improvements. The second stage of optimization was performed employing the grid search technique for all combinations of the following parameter values: n conv = {2, 3, 4, 5, 6} and n kern = {2, 3, 4}. Figure 4 present the corresponding outcomes of the optimization referring to the LaBr 3 :Ce crystal and the three investigated photon energies, illustrated by showing the resulting spatial resolution as expressed by the FWHM value of the error histogram as introduced in section 3.7.
There is no clear criterion for an unambiguous selection of the best set of parameters. Nonetheless, the architecture comprising five convolutional layers of three kernels with ReLU activation and batch normalization almost always provided the best results, was easy to train, and therefore was selected to serve as the final model. Figure 5 shows the sketch of the CNN architecture.
The same training was carried out for the CeBr 3 scintillator. A summary of the algorithm performance, expressed by the achieved spatial resolution via the FWHM of the error histogram, using the best final CNN architecture for both crystals as a function of the initial γ-ray energies is presented in figure 6. Figure 7 presents the spatial resolution achieved with the CNN algorithm as a function of the distance from the crystal central axis (0, 0) for three geometrical regions: central (|x|, |y| 8.5 mm), intermediate (8.5 mm < |x|, |y| 17 mm) and outer (17 mm < |x|, |y| 25.5 mm).

Crystal performance
It can be observed that the spatial resolution improves towards the crystal's central axis. This can be attributed to the less frequent scattering of scintillation light if compared to the crystal edges, distorting the light distribution on the PMT grid and enhancing absorption. The behavior of both 60 Co curves is very similar, Figure 4. Performance of the CNN algorithm for three different energies of incident γ-rays, as a function of the number of layers and kernels (two-blue, three-orange, four-green) for the LaBr 3 :Ce crystal. The spatial resolution was calculated as FWHM of the error histogram. because of their small energy difference of only 0.16 MeV. The network performance for 137 Cs is inferior to the higher energies of 60 Co due to the lower photon energies. In fact, less energetic γ-rays tend to interact further away from the PMT's entrance window, increasing the likelihood of scattering and attenuation of the scintillation light on its path to the photosensor.

Database optimization
We determined the optimum number of events per irradiation position required for an efficient training to be n epp|train = 400 (350 for the training set and 50 for validation). Using smaller databases resulted in overfitting or led to suboptimal performance. Above that point the spatial resolution (FWHM) did not improve further.
For the optimization of the test set, error histograms corresponding to n epp|test 15 suffered from too low statistics, so the use of a linear interpolation to determine the FWHM of these peaks could not be justified. It is necessary to use a validation dataset with n epp|test 20. To get the uncertainty of the spatial resolution determination we used five sets with n epp|test = 20 events. Table 1 summarizes the comparison between the CAP and CNN algorithms for the LaBr 3 :Ce scintillation crystal concerning the computational time, memory footprint and the best spatial resolution achieved with both techniques.
Apart from the superior positioning accuracy, the CNN algorithm is up to 10 4 times faster and has a much lower memory footprint compared to the CAP algorithm. This can be understood by analyzing the working principles of the CAP algorithm, which assumes comparisons of the unknown event with all entries from the extensive reference library being time and memory consuming. In contrary, the CNN requires a single forward pass through a fairly simple network.

Discussion
The determination of the photon interaction position inside a monolithic scintillation crystal has been performed so far using algorithms like kNN or its optimized version CAP. However, those suffered from long computational times and high memory requirements. This motivated us to develop an alternative reconstruction procedure that works faster and does not sacrifice the algorithm positioning performance. Even though the method presented here was developed for two large 50.8 mm × 50.8 mm × 30 mm monolithic LaBr 3 :Ce and CeBr 3 scintillation crystals, coupled to 64-fold segmented photosensors with an 8 × 8 pixel arrangement, it can be applied to all kinds of detectors where a photon interaction position is associated with a characteristic light amplitude distribution.
The newly designed CNN consists of five convolutional layers, each comprising three 3 × 3 kernels with ReLU activation, batch normalization and zero padding applied. To generate predictions, separately for the x and y coordinates, the network was split into two output blocks, each made up of two fully connected layers with softmax as the final activation function. The coordinates with the highest probability are considered as the reconstructed position. Using more complicated architectures increased the training complexity and did not lead to an improvement in the performance.
Employing this method, we achieved an accuracy of the position determination reaching an optimum value of 0.9(±0.2)mm for CeBr 3 at 60 Co photon energies of 1.17 and 1.33 MeV and under the same conditions for LaBr 3 :Ce a resolution of 0.96 (±0.02)mm. This accuracy is superior by more than a factor of 2.5 compared to the best value of 2.9(0.1) mm achieved by the CAP algorithm for the LaBr 3 :Ce scintillator at 1.3 MeV (Liprandi 2018). The lack of internal radioactivity in CeBr 3 explains its better performance if compared to LaBr 3 :Ce. We observed a trend of slightly better performance for increasing photon energies which can be explained, firstly, by the larger penetration power at higher γ-ray energies, causing more energetic photons to interact closer to the PMT entrance window and secondly, by more secondary scintillation photons generated by higher energetic primary photons.
To the best of our knowledge, the CNN algorithm as presented here achieves the best reported spatial resolution for 2D position determination inside large monolithic scintillators.
On the computational side the new method is compatible with both CPUs and GPUs and allows to determine the interaction position of about 10 4 events per second using a single GPU node (Nvidia Quadro P1000), which, when considering multi-mode instrumentation, will be sufficient for a potential real-time application in prompt-gamma imaging. This is a significant improvement over the CAP algorithm, that required a computer cluster and was able to reconstruct only about one event per second per single CPU node.
Exploiting large datasets of labeled data, that were originally acquired to serve as lookup tables for the kNN and CAP algorithms, we carried out supervised training. At each of 102 × 102 irradiation positions 460-800 photopeak events were collected and each of them was assigned two labels, corresponding to the x and y coordinates. Considering this task as a regression problem, in contrast to the classification paradigm, required a long training and did not lead to satisfying results.
An extensive database is crucial to provide a representative training set to train models that generalize well and are resistant to overfitting. Yet, the data acquisition is a time consuming process (takes up to a couple of weeks) and if possible should be kept at minimum. For this reason we set a lower threshold of 400 and 100 (5 × 20) events per irradiation position for training and testing, respectively. This facilitates a fast training without overfitting and smooth error histograms necessary for FWHM determination.
The accuracy of the network is best in the central area of the crystal and deteriorates slightly towards the outside regions. This reflects the larger probability of scattering and attenuation of the scintillation light at the scintillator edges compared to the more central interactions.
The error histograms derived from the comparison of calculated and ground-truth interaction positions exhibit a non-Gaussian shape (see figure 3) and thus their FWHM is not linearly related to the standard deviation σ (as in the case of normal distribution where FWHM 2 2 ln 2 s = ). However, the FWHM of the error histogram still serves as a valid measure of the network performance, enabling a quantitative comparison  (2) between different models. The performance of the previously utilized CAP and kNN algorithms was determined in the same way (also based on non-Gaussian peaks with the corresponding error histograms). Therefore, it is possible to perform a direct comparison between the previous, well-established methods and the new CNN algorithm.
We designed an optimum training schedule. Despite using the Adam optimizer, we suggest a gradual lowering of the initial learning rate value to enable faster convergence. In the beginning, while the parameters are still far from the optimum values, a larger learning rate value enables reaching bigger changes, while with a lower learning rate at the end of the training more precise adjustments are possible. We have noticed a benefit from adapting transferred learning, and instead of random parameter initialization we reused partially adjusted weights and biases. We adapted models trained on data originating from an irradiation of a second crystal of the same dimensions or including different γ-ray energies. The benefit lies in the reduction of training time as well as in a final model more robust and less prone to noisy data.
In our present study the CNN was trained with perpendicularly impinging gamma-ray beamlets. It will be left to further computational and experimental studies to test our assumption if the incidence angle of Compton-scattered photons does not influence the performance of the presented CNN algorithm.
The present reconstruction method can be implemented directly to determine the interaction position inside the here described absorber component of the CC for gamma quanta energies between 662 keV and 1.33 MeV. The interaction position reconstructed here is used later to calculate the Compton cones and their intersection in order to determine the origin of the primary gamma quanta. To make the algorithm applicable also for higher or lower energies, an additional training with events of corresponding energies should be performed.

Conclusions
We have developed a new CNN algorithm that provides an efficient way to determine precisely photon interaction positions in large monolithic scintillators, while preserving all benefits of the monolithic crystal (good energy resolution and timing). The algorithm outperforms previously used and well-established methods in terms of spatial resolution, memory requirements and speed. Those three factors are of particular importance for a future clinical application of the CC for ion beam range verification via prompt-gamma imaging.
To improve further the accuracy of the interaction position determination, which is currently limited by the γ-source diameter given by the collimator diameter of 1 mm, our future work will focus on performing a new reference scan using a collimator with an opening of 0.6 mm. Additionally, we will concentrate on deriving the depth of interaction information by incorporating into the training data a new reference library consisting of events originating from an irradiation of the crystal's side surfaces, thus extending the preset 2D methodology into three-dimensions. Further work should focus also on scaling the presented method to higher gamma energies preferably up to the multi-MeV range which is relevant for prompt-gamma Compton imaging in particle therapy ion beam range verification.