On the mean-field limit for the Vlasov-Poisson-Fokker-Planck system

We rigorously justify the mean-field limit of an $N$-particle system subject to Brownian motion and interacting through the Newtonian potential in $\mathbb{R}^3$. Our result leads to a derivation of the Vlasov-Poisson-Fokker-Planck (VPFP) equations from the microscopic $N$-particle system. More precisely, we show that the maximal distance between the exact microscopic trajectories and the mean-field trajectories is bounded by $N^{-\frac{1}{3}+\varepsilon}$ ($\frac{1}{63}\leq\varepsilon<\frac{1}{36}$) for a system with blob size $N^{-\delta}$ ($\frac{1}{3}\leq\delta<\frac{19}{54}-\frac{2\varepsilon}{3}$) with probability at least $1-N^{-\alpha}$ for any $\alpha>0$. Moreover, we prove a convergence rate between the empirical measure associated to the particle system and the solution of the VPFP equations. The technical novelty of this paper is that our estimates crucially rely on the randomness coming from the initial data and from the Brownian motion.


Introduction
Systems of interacting particles are quite common in physics and the biosciences, and they are usually formulated according to first principles (such as Newton's second law). For instance, particles can represent galaxies in cosmological models [1], molecules in a fluid [34], or ions and electrons in plasmas [61]. Such particle systems are also relevant as models for the collective behavior of certain animals like birds, fish, insects, and even micro-organisms (such as cells or bacteria) [5,13,48]. In this paper, we are interested in the classical Newtonian dynamics of $N$ indistinguishable particles interacting through pair interaction forces and subject to Brownian noise. Denote by $x_i\in\mathbb{R}^3$ and $v_i\in\mathbb{R}^3$ the position and velocity of particle $i$. The evolution of the system is given by the following stochastic differential equations (SDEs), where $k(x)$ models the pairwise interaction between the individuals, and $\{B_i\}_{i=1}^{N}$ are independent realizations of Brownian motions which account for extrinsic random perturbations such as random collisions against the background. In the presence of friction, model (1) is known as the interacting Ornstein-Uhlenbeck model in the probability and statistical mechanics communities. In particular, we refer readers to [49,58] by Olla, Varadhan and Tremoulet for the scaling limit of the Ornstein-Uhlenbeck system. In this manuscript, we take the interaction kernel to be the Coulombian kernel for some real number $a$. The case $a>0$ corresponds, for example, to the electrostatic (repulsive) interaction of charged particles in a plasma, while the case $a<0$ describes the attraction between massive particles subject to gravitation. We refer readers to [38,61] for the original models. Since the number $N$ of particles is large, it is extremely complicated to investigate the microscopic particle system (1) directly.
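The dynamics described above can be illustrated with a small numerical sketch. The code below is our own illustration (all function names and parameter values are hypothetical, not the paper's notation): an Euler-Maruyama discretization of the second-order system, with a Coulomb-type kernel $k(x)=a\,x/|x|^3$ capped near the origin so that the toy simulation stays finite.

```python
import numpy as np

def coulomb_kernel(x, a=1.0, reg=1e-2):
    """k(x) = a * x / |x|^3, flattened inside radius `reg` (a stand-in for a blob cut-off)."""
    r = np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), reg)
    return a * x / r**3

def simulate(N=8, T=0.1, dt=1e-3, sigma=0.5, seed=0):
    """Euler-Maruyama for dx_i = v_i dt, dv_i = (1/N) sum_j k(x_i - x_j) dt + sigma dB_i."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((N, 3))
    v = rng.standard_normal((N, 3))
    for _ in range(int(T / dt)):
        diff = x[:, None, :] - x[None, :, :]       # pairwise differences x_i - x_j
        force = coulomb_kernel(diff).mean(axis=1)  # mean-field 1/N scaling; k(0) = 0
        v = v + force * dt + sigma * np.sqrt(dt) * rng.standard_normal((N, 3))
        x = x + v * dt
    return x, v
```

Note that the self-interaction term contributes nothing, since the capped kernel vanishes at the origin.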
Fortunately, it can be studied through macroscopic descriptions of the system based on the probability density of the particles on phase space. These macroscopic descriptions are usually expressed as continuum partial differential equations (PDEs). The analysis of the scaling limit of the interacting particle system towards the macroscopic continuum model is usually called the mean-field limit. The second-order particle system (1) is expected to be approximated by the following Vlasov-Poisson-Fokker-Planck (VPFP) equations, where $f(x,v,t):(x,v,t)\in\mathbb{R}^3\times\mathbb{R}^3\times[0,\infty)\to\mathbb{R}_+$ is the probability density function in the phase space $(x,v)$ at time $t$, and $\rho$ is the charge density induced by $f(x,v,t)$. We denote by $E(x,t):=k*\rho(x,t)$ the Coulombian or gravitational force field. The intent of this research is to establish the mean-field limit of the particle system (1) towards the Vlasov-Poisson-Fokker-Planck equations (3); in particular, we quantify how close these descriptions are for a given $N$. When $\sigma=0$ (there is no randomness coming from the noise), mean-field limit results for interacting particle systems with globally Lipschitz forces have been obtained by Braun and Hepp [9] and Dobrushin [16]. Bolley, Cañizo and Carrillo [5] presented an extension of the classical theory to particle systems with only locally Lipschitz interaction forces. Such cases concerning kernels $k\in W^{1,\infty}_{loc}$ are also used in the context of neuroscience [6,57]. The last few years have seen great progress in mean-field limits for singular forces by treating them with an $N$-dependent cut-off. In particular, Hauray and Jabin [33] discussed mildly singular force kernels satisfying $|k(x)|\le\frac{C}{|x|^{\alpha}}$ with $\alpha<d-1$ in dimensions $d\ge 3$. For $1<\alpha<d-1$, they performed the mean-field limit for typical initial data, where they chose the cut-off to be $N^{-\frac{1}{2d}}$. For $\alpha<1$, they proved molecular chaos without cut-off.
Unfortunately, their method fails precisely at the Coulomb threshold $\alpha=d-1$. More recently, Boers and Pickl [4] proposed a novel method for deriving mean-field equations with interaction forces scaling like $\frac{1}{|x|^{3\lambda-1}}$ ($\frac{5}{6}<\lambda<1$), and they were able to obtain a cut-off as small as $N^{-\frac{1}{d}}$. Furthermore, Lazarovici and Pickl [40] extended the method in [4] to include the Coulomb singularity and obtained a microscopic derivation of the Vlasov-Poisson equations with a cut-off of $N^{-\delta}$ ($0<\delta<\frac{1}{d}$). More recently, the cut-off parameter was reduced to as small as $N^{-\frac{7}{18}}$ in [24] by exploiting the second-order nature of the dynamics. When $\sigma>0$, the random particle method for approximating the VPFP system with the Coulombian kernel was studied in [27], where the initial data was chosen on a mesh and the cut-off parameter can be $N^{-\delta}$ ($0<\delta<\frac{1}{d}$). Most recently, Carrillo et al. [12] also investigated the singular VPFP system, but with i.i.d. initial data, and obtained the propagation of chaos with a cut-off of $N^{-\delta}$ ($0<\delta<\frac{1}{d}$), which generalized [40]. We also note that Jabin and Wang [35] rigorously justified the mean-field limit and propagation of chaos for Vlasov systems with $L^\infty$ forces and vanishing viscosity ($\sigma_N\to 0$ as $N\to\infty$) by using a relative entropy method. Lastly, for a general overview of this topic we refer readers to [13,32,36,56].
When the interaction kernel $k$ is singular, it poses problems for both theory and numerical simulations. An easy remedy is to regularize the force with an $N$-dependent cut-off parameter, obtaining $k^N$. The delicate question is how to choose this cut-off. On the one hand, the larger the cut-off is, the smoother $k^N$ will be and the easier it will be to show the convergence; however, the regularized system is then not a good approximation of the actual system. On the other hand, the smaller the cut-off is, the closer $k^N$ is to the true $k$, and thus the less information is lost through the cut-off. Consequently, the balance between accuracy (small cut-off) and regularity (large cut-off) is crucial, and the analyses reviewed above all aim to strike it. In this manuscript, we set $\sigma>0$. Compared with the recent work [12], the main technical innovation of this paper is that we fully use the randomness coming from the initial conditions and the Brownian motions to significantly improve the cut-off. Note that in [12] the size of the cut-off can be very close to, but must remain larger than, $N^{-\frac{1}{d}}$, which is a sort of average minimal distance between $N$ particles in dimension $d$; we manage to reduce the cut-off size below $N^{-\frac{1}{d}}$ (see Remark 1.4). This manuscript significantly improves the ideas presented in [10]. There the potential is split into a more singular and a less singular part; the less singular part is controlled in the usual manner, while the mixing coming from the Brownian motion is used to estimate the more singular part. The technical innovation in the present paper is that the possible number of particles subject to the singular part of the interaction can be bounded, via a Law of Large Numbers argument, thanks to the fact that the support of the singular part is small.
Again using the Law of Large Numbers, now based on the randomness coming from the Brownian motion, we show that the leading order of the singular part of the interaction can be replaced by its expectation value. This step is a key point of the present manuscript. The replacement by the expectation value, i.e. the integration of the force against the probability density, regularizes the singular part and yields a significant improvement of our estimates. This is carried out in Lemma 3.3, whose proof can be found in Section 5. To our knowledge, [10] and the present paper are so far the only results where the mixing from the Brownian motion has been used in the derivation of a mean-field limit for an interacting many-body system.
As a companion to (1), some authors also consider the following first-order stochastic system. As before, one can expect that, as the number of particles $N$ goes to infinity, one obtains a continuum description of the dynamics as the following nonlinear PDE, where $f(x,t)$ is now the spatial density.
The particle system (5) has many important applications. One of the best known classical applications is in fluid dynamics with the Biot-Savart kernel; it can be treated by the well-known vortex method introduced by Chorin in 1973 [14]. The convergence of the vortex method for two- and three-dimensional inviscid ($\sigma=0$) incompressible fluid flows was first proved by Hald et al. [25,26] and Beale and Majda [2,3]. When the effect of viscosity is involved ($\sigma>0$), the vortex method is replaced by the so-called random vortex method, obtained by adding a Brownian motion to every vortex. The convergence analysis of the random vortex method for the Navier-Stokes equation was given in [23,46,47,51] in the 1980s; for more recent results we refer to [17,20,37,54]. Another well-known application of the system (5) arises when the interaction is chosen to be the Poisson kernel, where $C_d>0$ and $k$ is set to be attractive. Then the system (5) coincides with the particle models approximating the classical Keller-Segel (KS) equation for chemotaxis [39,52]. We mainly refer to [10,18,29,30,31,43,44] for the mean-field limit of the KS system. Concerning the size of the cut-off, more specifically, [43] chose the cut-off to be $(\ln N)^{-\frac{1}{d}}$, which was significantly improved in [29], where the cut-off size can be as small as $N^{-\frac{1}{d(d+1)}}\log N$. In [10,30], the cut-off size was almost optimal, coming fairly close to $N^{-\frac{1}{d}}$. Many techniques used in this manuscript are adapted from these papers. For the Poisson-Nernst-Planck equation ($k$ set to be repulsive), [43] proved the mean-field limit without a cut-off.
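As an illustration of this first-order setting, the 2D Biot-Savart kernel can be written down directly. The sketch below is our own (the function names are hypothetical); it implements the kernel together with one Euler-Maruyama step of the random vortex method, and checks the defining property that the induced velocity is perpendicular to the displacement.

```python
import numpy as np

def biot_savart(x):
    """2D Biot-Savart kernel K(x) = (-x_2, x_1) / (2*pi*|x|^2), singular at the origin."""
    r2 = x[0] ** 2 + x[1] ** 2
    return np.array([-x[1], x[0]]) / (2 * np.pi * r2)

def random_vortex_step(x, dt, sigma, rng):
    """One Euler-Maruyama step of the random vortex method for N point vortices in 2D."""
    diff = x[:, None, :] - x[None, :, :]          # x_i - x_j
    r2 = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(r2, np.inf)                  # exclude self-interaction
    K = np.stack([-diff[..., 1], diff[..., 0]], axis=-1) / (2 * np.pi * r2[..., None])
    drift = K.sum(axis=1) / x.shape[0]            # empirical velocity field, 1/N scaling
    return x + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal(x.shape)

# The velocity field is always perpendicular to the displacement (divergence-free flow).
z = np.array([0.3, -0.7])
assert abs(biot_savart(z) @ z) < 1e-12
```

With $\sigma=0$ this is the inviscid vortex method; the added Brownian increment is exactly the random vortex modification discussed above.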
The rest of the introduction is split into three parts: we start by introducing the microscopic random particle system in Section 1.1; then we present some results on the existence of solutions to the macroscopic mean-field VPFP equations in Section 1.2; lastly, our main theorems are stated in Section 1.3, where we prove the closeness of the approximation of solutions to the VPFP equations by the microscopic system.

Microscopic random particle system
We are interested in the time evolution of a system of $N$ interacting Newtonian particles with noise in the $N\to\infty$ limit. The motion of the system studied in this paper is described by trajectories on phase space, i.e. a time-dependent map $\Phi_t:\mathbb{R}\to\mathbb{R}^{6N}$. We use the notation in which $x^t_j$ stands for the position and $v^t_j$ for the velocity of the $j$-th particle at time $t$. The system is a Newtonian system with a noise term coupled to the velocity, whose evolution is governed by a system of SDEs, where $k$ is the Coulomb kernel (2) modeling the interaction between particles and the $B^t_i$ are independent realizations of Brownian motions.
We regularize the kernel $k$ with a blob function $\psi$ satisfying $0\le\psi(x)\in C^2(\mathbb{R}^3)$, $\operatorname{supp}\psi\subseteq B(0,1)$ and $\int_{\mathbb{R}^3}\psi(x)\,dx=1$; the Coulomb kernel with regularization then has the form (11). Thus one has the regularized microscopic $N$-particle system (12) for $i=1,2,\cdots,N$. Here the initial condition $\Phi_0$ of the system is independent and identically distributed (i.i.d.) with common probability density $f_0$, and the corresponding regularized VPFP equations are (13).
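For orientation, under the standard mollifier scaling (this is our reconstruction of the elided displays, so the exact normalization should be checked against equations (11)-(13)), the blob and the regularized kernel would read:

```latex
\psi^N(x) := N^{3\delta}\,\psi\bigl(N^{\delta}x\bigr),
\qquad
k^N(x) := (k * \psi^N)(x) = \int_{\mathbb{R}^3} k(x-y)\,\psi^N(y)\,\mathrm{d}y ,
```

so that $\operatorname{supp}\psi^N\subseteq B(0,N^{-\delta})$ and, if $\psi$ is in addition radially symmetric, $k^N=k$ outside $B(0,N^{-\delta})$ by the mean value property of harmonic functions.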

Existence of classical solutions to the Vlasov-Poisson-Fokker-Planck system
The existence of weak and classical solutions to the VPFP equations (3) and related systems has been very well studied. Degond [15] first showed the existence of a global-in-time smooth solution for the Vlasov-Fokker-Planck equation in one and two space dimensions in the electrostatic case. Later on, Bouchut [7,8] extended the result to three dimensions when the electric field is coupled through a Poisson equation, in both the electrostatic and the gravitational case. Also, Victory and O'Dwyer [59] showed the existence of classical solutions for the VPFP equations when the spatial dimension is less than or equal to two, and local existence in all other dimensions. Then Bouchut [7] proved the global existence of classical solutions for the VPFP system (3) in dimension $d=3$; his proof relied on the techniques introduced by Lions and Perthame [42] concerning the existence theory for the Vlasov-Poisson system in three dimensions. The long-time behavior of the VPFP system was studied by Ono and Strauss [50], Carpio [11] and Carrillo et al. [55].
The existence results in [59] and [7] are the most appropriate for this work. We summarize them in the following theorem, which is also used in [27, Theorem 2.1]. Theorem 1.1. (Classical solutions of the VPFP equations) Let the initial data $0\le f_0(x,v)$ satisfy the following properties. Then for any $T>0$, the VPFP equations (3) admit a unique classical solution on $[0,T]$.
Remark 1.1. The proof of the above theorem given in [59] and [7] indicates that the map is continuous from $[0,T]$ to $W^{1,\infty}(\mathbb{R}^3)$. This implies that initially smooth data remain smooth on any time interval $[0,T]$. So if we assume that the initial data satisfies the following bounds for any $k\ge 1$, then the unique classical solution $f$ maintains this regularity on $[0,T]$ for any $k\ge 1$. The present paper also needs the uniform-in-time $L^\infty$ bound of the charge density $\rho$, which was obtained in [53] by means of the stochastic characteristic method under the assumption that $f_0$ is compactly supported in velocity. We also note that [12] provided a proof of the local-in-time $L^\infty$ bound for $\rho$ by employing the Feynman-Kac formula and assuming that the initial data has polynomial decay.
In this paper, we assume that the initial data $f_0$ satisfies the following assumption (Assumption 1.1): 2. there exists an $m_0>6$ such that the corresponding moment bound holds. The above assumption ensures the regularity needed for this article, where $C_{f_0}$ depends only on the initial data. Note that the charge density $\rho$ then satisfies the corresponding bound as well. We also note that, equivalently, one can obtain bounds for $f^N$ and $\rho^N$ uniformly in $N$.
Remark 1.2. The assumption that $f_0$ is compactly supported in the velocity variable is not required for the existence theory of the VPFP system. However, it is used to obtain the $L^\infty$ bound of the charge density $\rho$ (see [53]) and also in the proof of Lemma 3.1 (see (85)). Remark 1.3. All our estimates below remain valid in the presence of sufficiently smooth external fields. Due to the fluctuation-dissipation principle, it would be even more natural to add an external, velocity-dependent friction force to the system.

Statement of the main results
Our objective is to derive the macroscopic mean-field PDE (3) from the regularized microscopic particle system (12). We do this by using probabilistic methods as in [10,29,30,40]. More precisely, we shall prove a convergence rate between the solution of the VPFP equations (3) and the empirical measure associated to the regularized particle system $\Phi_t$ satisfying (12). We assume that the initial condition $\Phi_0$ of the system is independent and identically distributed (i.i.d.) with common probability density $f_0$.
Given the solution $f^N$ to the mean-field equation (13), we first construct an auxiliary trajectory $\Psi_t$ from (13), and then prove the closeness between $\Phi_t$ and $\Psi_t$. For the auxiliary trajectory we consider again a Newtonian system with noise, this time, however, subject not to the pair interaction but to the influence of the external mean field (24). Here we let $\Psi_t$ have the same initial condition as $\Phi_t$ (i.i.d. with the common density $f_0$). Since the particles are just $N$ identical copies of one evolution, independence is conserved; therefore the components of $\Psi_t$ are distributed i.i.d. according to the common probability density $f^N$. We remark that the VPFP equation (13) is Kolmogorov's forward equation for any solution of (24), and in particular the probability density $f^N$ solves (13). This i.i.d. property will play a crucial role below, where we shall use the concentration inequality (see Lemma 2.5) on certain functions of $\Psi_t$. Our main result states that the $N$-particle trajectory $\Phi_t$ starting from $\Phi_0$ (i.i.d. with the common density $f_0$) remains close to the mean-field trajectory $\Psi_t$ with the same initial configuration $\Phi_0=\Psi_0$ during any finite time interval $[0,T]$. More precisely, we prove that the measure of the set where the maximal distance $\max_{t\in[0,T]}\|\Phi_t-\Psi_t\|_\infty$ exceeds $N^{-\lambda_2}$ decreases exponentially as the number $N$ of particles grows to infinity. Theorem 1.2. Let $\Phi_t$ and $\Psi_t$ satisfy (12) and (24) respectively with the initial data $\Phi_0=\Psi_0$, which is i.i.d. with the common density $f_0$ satisfying Assumption 1.1. Then for any $\alpha>0$ and $0<\lambda_2<\frac{1}{3}$, there exist some $0<\lambda_1<\frac{\lambda_2}{3}$ and an $N_0\in\mathbb{N}$, both depending only on $\alpha$, $T$ and $C_{f_0}$, such that for $N\ge N_0$ the following estimate holds with the cut-off index $\delta\in\big[\frac{1}{3},\min\big\{\frac{\lambda_1+3\lambda_2+1}{6},\dots\big\}\big)$. Remark 1.4. In particular, for any $\frac{1}{63}\le\varepsilon<\frac{1}{36}$, choosing $\lambda_2=\frac{1}{3}-\varepsilon$ and $\lambda_1=\frac{1}{9}-\varepsilon$, we obtain a convergence rate $N^{-\frac{1}{3}+\varepsilon}$ with a cut-off size of $N^{-\delta}$ ($\frac{1}{3}\le\delta<\frac{19}{54}-\frac{2\varepsilon}{3}$).
In other words, the cut-off index $\delta$ can be chosen very close to $\frac{19}{54}$, and in particular larger than $\frac{1}{3}$, which is a significant improvement over previous results in the literature.
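The arithmetic behind Remark 1.4 can be verified mechanically. The sketch below is our own check; it assumes the upper endpoint of the admissible cut-off window equals $(\lambda_1+3\lambda_2+1)/6$, which is our reading of the min-expression in Theorem 1.2 and reproduces the bound $\frac{19}{54}-\frac{2\varepsilon}{3}$ exactly:

```python
from fractions import Fraction as F

def delta_window(eps):
    """Parameter choices of Remark 1.4: return (lambda1, lambda2, upper delta-endpoint)."""
    lam2 = F(1, 3) - eps
    lam1 = F(1, 9) - eps
    upper = (lam1 + 3 * lam2 + 1) / 6      # assumed endpoint from Theorem 1.2
    return lam1, lam2, upper

for eps in (F(1, 63), F(1, 40), F(1, 37), F(1, 30)):
    lam1, lam2, upper = delta_window(eps)
    assert upper == F(19, 54) - 2 * eps / 3          # matches Remark 1.4
    assert 0 < lam1 < lam2 / 3 and lam2 < F(1, 3)    # constraints of Theorem 1.2
    # the window [1/3, upper) is nonempty exactly when eps < 1/36:
    assert (upper > F(1, 3)) == (eps < F(1, 36))
```

Exact rational arithmetic avoids any floating-point ambiguity at the endpoint $\varepsilon=\frac{1}{36}$.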
Strategy of the proof. The strategy is to obtain a Gronwall-type inequality for $\max_{t\in[0,T]}\|\Phi_t-\Psi_t\|_\infty$, where $K^N(X_t)$ and $\overline{K}^N(X_t)$ are defined as in (43) and (55). One can compute the decomposition (27). If the force $k^N$ were Lipschitz continuous with a Lipschitz constant independent of $N$, the desired convergence would follow easily [9,16]. However, the force considered here becomes singular as $N\to\infty$, hence it does not satisfy a uniform Lipschitz bound. The first term in (27) already admits a sufficient bound in view of Gronwall's Lemma.
• By the Law of Large Numbers, carried out in detail for our purposes in Proposition 3.1, we show, for any $\alpha>0$, a direct estimate in which, for convenience, we abuse the notation $a\lesssim b$ to denote that $a\le b$ holds except on an event with probability approaching zero.
This direct error estimate can be seen as a consistency result for the two evolutions, holding with high probability.
• In Proposition 3.2, we show that the propagation of the errors coming from the second term in (27) is stable. This stability is important for closing the Gronwall argument. We show that the stability estimate holds for any $T>0$ under an additional condition; here it is crucial to ensure that the constant $\lambda_3$ satisfies $2\delta-1\le-\lambda_3<-\lambda_2<0$. The role of this additional condition will become clear later (see Remark 3.2).
To obtain this improvement of the cut-off parameter over previous results in the literature, we make use of the mixing caused by the Brownian motion. Therefore we split the force $K^N:=K^N_1+K^N_2$, where $K^N_2$ is chosen to have a wider cut-off of order $N^{-\lambda_2}>N^{-\delta}$. The less singular part $K^N_2$ is controlled in the usual manner [4,29,30,40] (see estimate (117)).
Thus we are left with the force $K^N_1$. We shall first estimate the number of particles present in the support of $K^N_1$. Since this support is small, that number will always be very small compared to $N$.
Under the condition (30) we cannot track the particles of the Newtonian time evolution with an accuracy better than $N^{-\lambda_2}$. Thus, without using the Brownian motion, we would have to assume the worst-case scenario: all particles give the maximal possible contribution to the force, i.e. sit close to the edge of the cut-off region, and all forces add up, i.e. all particles sit on top of each other.
But the Brownian motion in our system leads to mixing. Over a short time interval the effect of mixing is much larger than the effect of the correlations coming from the pair interaction, and we can make use of the independence of the Brownian motions. This mixing, which happens on a larger spatial scale than the range of the potential, causes the particles to be distributed roughly equally over the support of the interaction, resulting in a cancellation of the leading order of $K^N_1$. It follows for the more singular part $K^N_1$ that the corresponding estimate holds. This is mainly carried out in Section 5 (the proof of Lemma 3.3).
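The splitting can be pictured concretely. In the sketch below (our own construction, with the cut-off realized by capping $|x|$ at the cut-off radius, which is only one of several possible regularizations), $k^N_2$ uses the wider radius $N^{-\lambda_2}$, and $k^N_1=k^N-k^N_2$ vanishes outside the ball of radius $N^{-\lambda_2}$:

```python
import numpy as np

def k_reg(x, r, a=1.0):
    """Coulomb-type kernel a*x/|x|^3 with the singularity capped at radius r
    (a simple stand-in for the paper's blob regularization)."""
    s = max(np.linalg.norm(x), r)
    return a * np.asarray(x, dtype=float) / s ** 3

def split_kernel(x, N, delta, lam2):
    """Return (k1, k2) with k^N = k1 + k2, k2 having the wider cut-off N^{-lam2}."""
    kN = k_reg(x, N ** (-delta))    # narrow cut-off N^{-delta}
    kN2 = k_reg(x, N ** (-lam2))    # wider cut-off N^{-lam2} > N^{-delta}
    return kN - kN2, kN2

N, delta, lam2 = 1000, 0.35, 0.30
x_far = np.array([2 * N ** (-lam2), 0.0, 0.0])   # outside the wider cut-off ball
k1, k2 = split_kernel(x_far, N, delta, lam2)
assert np.allclose(k1, 0.0)                      # k1 is supported in B(0, N^{-lam2})
```

For $|x|\ge N^{-\lambda_2}$ both regularizations agree with the exact kernel, so the singular part $k_1$ vanishes there; all the singularity, and hence the need for the mixing argument, is confined to the small ball $B(0,N^{-\lambda_2})$.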
To quantify the convergence of probability measures, we give a brief introduction to the topology of the $p$-Wasserstein space. In the context of kinetic equations, it was first employed by Dobrushin [16]. Consider the space $\mathcal{P}_p(\mathbb{R}^d)$ of probability measures with finite $p$-th moment. We denote the Monge-Kantorovich-Wasserstein distance on $\mathcal{P}_p(\mathbb{R}^d)$ as follows, where $\Lambda(\mu,\nu)$ is the set of joint probability measures on $\mathbb{R}^d\times\mathbb{R}^d$ with marginals $\mu$ and $\nu$ respectively, and $(X,Y)$ runs over all couples of random variables with laws $\mu$ and $\nu$ respectively. For notational simplicity, the notation for a probability measure and its probability density is often abused; so if $\mu,\nu$ have densities $\rho_1,\rho_2$ respectively, we also write the distance as $W_p(\rho_1,\rho_2)$. For further details, we refer the reader to the book of Villani [60].
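For equal-size empirical measures on the real line, the optimal coupling simply matches sorted samples, which gives a two-line formula for $W_p$. The sketch below is ours (1-D only; the paper of course works in higher dimensions) and can serve as a sanity check when comparing empirical measures with samples from a limit density:

```python
import numpy as np

def wasserstein_1d(xs, ys, p=1):
    """W_p between two equal-size empirical measures on R:
    in 1-D the optimal coupling matches order statistics."""
    xs, ys = np.sort(np.asarray(xs, float)), np.sort(np.asarray(ys, float))
    return float((np.abs(xs - ys) ** p).mean() ** (1.0 / p))

# Translating a sample by c shifts it by exactly c in every W_p.
rng = np.random.default_rng(1)
x = rng.standard_normal(100)
assert np.isclose(wasserstein_1d(x, x + 0.5, p=2), 0.5)
```

The translation check reflects the general identity $W_p(\mu,\tau_c\#\mu)=|c|$ for a shift by $c$.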
Following the same argument as [40, Corollary 4.3], Theorem 1.2 implies molecular chaos in the following sense: under the assumptions of Theorem 1.2, for any $\alpha>0$ there exist constants $C>0$ and $N_0>0$ depending only on $\alpha$, $T$ and $C_{f_0}$, such that for $N\ge N_0$ the following estimate holds, where $\lambda_2$ is as in Theorem 1.2.
Another consequence of Theorem 1.2 is the derivation of the macroscopic mean-field VPFP equations (3) from the microscopic random particle system (12). We define the empirical measures associated to the microscopic $N$-particle systems (12) and (24) respectively as in (38). The following theorem shows that, under additional moment assumptions on $f_0$, the empirical measure $\mu_\Phi(t)$ converges to the solution of the VPFP equations (3) in the $W_p$ distance with high probability. Theorem 1.3. Under the same assumptions as in Theorem 1.2, let $f_t$ be the unique solution to the VPFP equations (3) with initial data satisfying Assumption 1.1, and let $\mu_\Phi(t)$ be the empirical measure defined in (38) with $\Phi_t$ the particle flow solving (12). Let $p\in[1,\infty)$ and assume that there exists $m>2p$ such that the corresponding moment bound holds. Then for any $T>0$ and $\kappa<\min\{\frac{1}{6},\frac{1}{2p},\delta\}$, there exist a constant $C_1$ depending only on $T$ and $C_{f_0}$ and constants $C_2,C_3$ depending only on $m,p,\kappa$, such that for all sufficiently large $N$ the following estimate holds, where $\delta$ and $\lambda_2$ are as in Theorem 1.2.
This theorem provides a derivation of the VPFP equations from an interacting $N$-particle system, bridging the gap between microscopic descriptions in terms of agent-based models and macroscopic or hydrodynamic descriptions for the particle probability density.

Preliminaries
In this section we collect the technical lemmas used in the proofs of the main theorems. Throughout this manuscript, generic constants are denoted by $C$ (independent of $N$), even if they differ from line to line. We use $\|\cdot\|_p$ for the $L^p$ norm.

Local Lipschitz bound
First let us recall some estimates for the regularized kernel $k^N$ defined in (11). The estimate (i) has been proved in [63, Lemma 2.1] and (ii) follows from [2, Lemma 5.1]. As for (iii), it is a direct consequence of Young's inequality.
Next we define a cut-off function, which will provide the local Lipschitz bound for $k^N$.
and $L^N$. We summarize our first observations on $k^N$ and the cut-off function in the following lemma, where $k^N$ is the regularization of the Coulomb kernel (2) and the cut-off function satisfies Definition 2.1.
Proof. Let us first consider the case $|y|<2N^{-\lambda_2}$. It follows from the bound in Lemma 2.1 and the decrease of the cut-off function that (41) holds. Next we consider the case $|y|\ge 2N^{-\lambda_2}$. It follows that $|x|\ge N^{-\lambda_2}$, and thus by Lemma 2.1 (i) we obtain (42), where in the last step we used $|x|\ge N^{-\lambda_2}$. Combining (41) and (42) finishes the proof.
Recall the notations in (43); we then have the local Lipschitz continuity of $K^N$ for some $C>0$ independent of $N$.
The following observations on $k^N$ and the cut-off function turn out to be very helpful in the sequel. Proof. We only prove one of the estimates above, since all of them can be obtained by the same procedure. One splits the quantity into three terms. We estimate the first term, where $B(r)$ denotes the ball of radius $r$ in $\mathbb{R}^3$. The second term is bounded similarly, and it is easy to compute the last term. Collecting the estimates (50), (51) and (52), one obtains the claim.

Law of Large Numbers
Also, we need the following concentration inequality, which provides probability bounds for random variables. Then for any $\alpha>0$, the sample mean $\bar{Z}=\frac{1}{N}\sum_{j=1}^{N}Z_j$ satisfies the stated estimate, where $C_\alpha$ depends only on $C$ and $\alpha$.
The proof can be found in [23, Lemma 1]; it is a direct consequence of Taylor expansion and Markov's inequality.
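The content of such a concentration inequality is that the sample mean stays in a window only slightly wider than the CLT scale $N^{-1/2}$, up to events of rapidly vanishing probability. A quick Monte Carlo experiment (our own illustration, not the proof's mechanism) makes the contrast visible:

```python
import numpy as np

def exceedance_rate(N, lam=0.4, trials=300, seed=0):
    """Fraction of trials in which |sample mean of Z| exceeds N^{-lam},
    for Z_j i.i.d. Uniform(-1, 1) (so E[Z] = 0 and |Z_j| <= 1)."""
    rng = np.random.default_rng(seed)
    Z = rng.uniform(-1.0, 1.0, size=(trials, N))
    return float(np.mean(np.abs(Z.mean(axis=1)) > N ** (-lam)))

# At N = 10^4 the window N^{-0.4} is about 4.3 standard deviations of the mean,
# so exceedances are essentially absent, while the CLT-scale window N^{-0.5}
# (about 1.7 standard deviations) is exceeded a non-negligible fraction of the time.
assert exceedance_rate(10_000, lam=0.4) <= 0.05
assert exceedance_rate(10_000, lam=0.5) >= 0.03
```

The exponent $\lambda=0.4$ here is a hypothetical stand-in for the slightly-sub-CLT exponents used in the paper's events.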
Recall the notation (55). We can now state the following version of the Law of Large Numbers. Lemma 2.6. At any fixed time $t\in[0,T]$, suppose that $X_t$ satisfies the mean-field dynamics (24), that $K^N$ and $\overline{K}^N$ are defined in (43) and (55) respectively, and that $L^N$ and $\overline{L}^N$ are introduced in Definition 2.1. For any $\alpha>0$ and $\frac{1}{3}\le\delta<1$, there exists a constant $C_{1,\alpha}>0$ depending only on $\alpha$, $T$ and $C_{f_0}$ such that (56) and (57) hold. Proof. We prove this lemma by using Lemma 2.5. Due to the exchangeability of the particles, it suffices to bound the term with $i=1$. Since $x^t_1$ and $x^t_j$ are independent for $j\neq 1$ and $k^N(0)=0$, let us consider $x^t_1$ as given and denote the resulting random variables by $Z_j$. To use Lemma 2.5, we need a bound for the variance. Since it follows from Lemma 2.4 that the first factor is controlled, it suffices to bound the two remaining integrals, where we have used $\|k^N\|_2\le CN^{\frac{\delta}{2}}$ from Lemma 2.1 (iii). Hence the hypotheses of Lemma 2.5 are satisfied with $g(N)=CN^{4\delta-1}$. In addition, it follows from (ii) in Lemma 2.1 that $|Z_j|\le CN^{2\delta}\le C\sqrt{Ng(N)}$. Hence, using Lemma 2.5, we have the probability bound (66). The same bound also holds for all other indices $i=2,\cdots,N$, which leads to the uniform estimate. Let $C_{1,\alpha}$ be the constant $C(\alpha,T,C_{f_0})$ in (66); then we conclude (56).
To prove (57), we follow the same procedure as above. It is easy to show that $\mathbb{E}[Z_j]=0$. To use Lemma 2.5, we need a bound for the variance; one computes the two integrals, where we have used the estimates of $\overline{L}^N$ in Lemma 2.4. Hence the hypotheses of Lemma 2.5 are satisfied with $g(N)=CN^{6\delta-1}$. In addition, it follows from Definition 2.1 that $|Z_j|\le CN^{3\delta}\le C\sqrt{Ng(N)}$. Hence, by Lemma 2.5 we have the probability bound (72), which leads to the uniform estimate. Thus, (57) follows from (72).

Consistency
In order to obtain the consistency error on the entire time interval, we divide $[0,T]$ into $M+1$ subintervals of length $\Delta\tau=N^{-\frac{\gamma}{3}}$ for some $\gamma>4$, with $\tau_k=k\Delta\tau$, $k=0,\cdots,M+1$. The choice of $\gamma$ will be clear from the discussion below. Here the choice of $\Delta\tau$ serves only the proof of consistency, and it can be taken sufficiently small. Note that it differs from the $\Delta t$ used in the proof of stability in the next subsection.
First, we establish the following lemma on the traveling distance of $X_t$ in a short time interval $[\tau_k,\tau_{k+1}]$, where $C_B$ depends only on $T$ and $C_{f_0}$. The estimate of $I^k_1(t)$ is direct, so we obtain the bound on its maximum. To estimate $I^k_2(t)$, recall a basic property of Brownian motion [21, Chap. 1.2]; this leads to the corresponding bound, where we used the fact that $n\le\frac{T}{\Delta\tau}=TN^{\frac{\gamma}{3}}$. Lastly, we prove the estimate of $I^k_3(t)$. It is obvious that the first bound holds, and it follows from (78) that the second one does as well. Moreover, a further bound follows from the assumptions of Theorem 1.1.
Then one has the combined estimate. It follows from (74) that the corresponding bound holds, which leads to the desired control. Then it follows from (77), (81) and (88) that the claim holds for $\gamma>4$, which completes the proof of (73).
Now we can prove the consistency error for the entire time interval [0, T ].
Proposition 3.1. (Consistency) For any $T>0$, let $(X_t,V_t)$ satisfy the mean-field dynamics (24) with initial density $f_0(x,v)$, and let $K^N$ and $\overline{K}^N$ be defined in (43) and (55) respectively. For any $\alpha>0$ and $\frac{1}{3}\le\delta<1$, there exists a constant $C_{2,\alpha}>0$ depending only on $\alpha$, $T$ and $C_{f_0}$ such that the estimates (89) and (90) hold. Proof. Denote the corresponding events, where $C_B$ and $C_{1,\alpha}$ are the constants from Lemma 3.1 and Lemma 2.6 respectively. According to Lemma 2.6 and Lemma 3.1, one has the corresponding probability bounds for any $\alpha>0$ and $\gamma>4$. Furthermore, we denote the event provided by Lemma 2.6. Also, under the event $B_{\tau_k}$, the increment bound holds, due to the fact that $3\delta+1<4<\gamma$. In the second inequality we used the local Lipschitz bound and the uniform control of the maximal increment; in the third inequality we used (91) and (96). This yields that the desired estimate holds under the event $\bigcap_{k=0}^{M}(B_{\tau_k}\cap C_{\tau_k})\cap H$. Therefore it follows from (93) and (95) that (101) holds. Denote by $C_{2,\alpha}$ the constant $C(\alpha,T,C_{f_0})$ in (101). Since $\alpha>0$ is arbitrary and so is $\alpha'$, (89) holds true. The proof of (90) can be done similarly.

Stability
In this subsection we obtain the stability result.
Remark 3.1. This proposition is one of the crucial statements of our paper. Proving propagation of chaos for systems like ours under the assumption of Lipschitz-continuous forces is standard, as explained in the introduction. The forces we consider are more singular. However, our techniques allow us to show that the Lipschitz condition encoded in the definition of $S$ holds typically, i.e. with probability close to one. In this sense, Proposition 3.2 is only helpful if we find an argument showing that $A_T$ holds typically. But as long as we have good estimates on the difference of the forces, and thus on the growth of the maximal distance, we are in fact able to control $A_T$. This control is achieved by a generalization of Gronwall's Lemma, which will be introduced in the next step (Lemma 3.4).
Proof. Let $\alpha>0$. First, we write $S_T(\Lambda)$ as the intersection of the non-overlapping sets $\{S_n(\Lambda)\}_{n=0}^{M}$, where $S_n(\Lambda)$ is defined below. Note that here the choice of $\Delta t$ serves the proof of stability and differs from the $\Delta\tau$ used in the proof of consistency.
To prove this proposition, we split the interaction force $k^N$ into $k^N=k^N_1+k^N_2$, where $k^N_2$ results from choosing a wider cut-off of order $N^{-\lambda_2}>N^{-\delta}$ in the force kernel $k$; that is, for $k^N_2$ and $L^N_2$ we choose $\delta=\lambda_2$ in (11) and (40) respectively. Following the approach in [10], we introduce the auxiliary trajectory (108). We consider this auxiliary trajectory with two different initial phases. For any $1\le n\le M$ and $t\in[t_n,t_{n+1}]$, we consider the auxiliary trajectory starting from the initial phase of the solution of (12) at time $t_{n-1}$. However, when $n=1$, i.e. $t\in[0,t_1]$, the initial phase of the auxiliary trajectory is chosen to have the distribution $f_0$. Moreover, in the latter case the distribution of $(\widetilde{x}^t_i,\widetilde{v}^t_i)$ is exactly $f^N_t$, which solves the regularized VPFP equations (13) with the initial data $f_0$.
For later reference, let us estimate the differences $\|X_t-\widetilde{X}_t\|_\infty$ and $\|V_t-\widetilde{V}_t\|_\infty$. Using the equations of these trajectories, we have the two bounds for $t\in[t_n,t_{n+1}]$, where $C$ depends only on $T$ and $C_{f_0}$. Summarizing, we obtain the combined estimate under the event $A_T$ defined in (102). Then for any $t\in[t_n,t_{n+1}]$, one splits the error into the three terms $I_1$, $I_2$ and $I_3$. First, let us compute $I_1$, where we use the local Lipschitz bound of $K^N_2$ under the event $A_T$ (see Lemma 2.3). Furthermore, we denote the event $B_2$. Since Proposition 3.1 also holds in the case $\lambda_2<\frac{1}{3}$, one has the corresponding probability bound. Under the event $B_2$, the estimate is valid since $\lambda_2<\frac{1}{3}$, where $\|L^N_2(X_t)\|_\infty\le C\log(N)$ follows from Lemma 2.4. Hence, one has the bound on $I_1$ under the event $A_T\cap B_2$.
To estimate $I_2$, notice that by the triangle inequality and (111) one has the corresponding bound under the event $A_T$, which leads to the first estimate; a similar argument leads to the second. We denote the event $B_3$; it has been proved in Proposition 3.1 that its complement has small probability. Then under the event $B_3$ the bound holds, since $\|L^N(X_t)\|_\infty\le C\log(N)$ and $\frac{1}{3}\le\delta<1$. Thus, we have the estimate of $I_2$ under the event $A_T\cap B_3$. The estimate of $I_3$ is a result of Lemma 3.2. Indeed, we denote the event $G_n$, so by Lemma 3.2 one has the probability bound for any $0\le n\le M$. Furthermore, it holds under the event $G_n$ that the force difference is controlled, where we have used the fact that $\|k^N_1\|_1\le CN^{-\lambda_2}$, which is easy to compute. Collecting (117), (126) and (129) yields that the combined estimate holds under the event $B_2\cap B_3\cap A_T\cap G_n$, where $C$ depends on $\alpha$, $T$ and $C_{f_0}$; to distinguish it from other constants we denote this $C$ by $C_{3,\alpha}$. This implies $B_2\cap B_3\cap A_T\cap G_n\subseteq S_n(C_{3,\alpha})$, which yields the claim. It follows, using the estimates in (115), (124) and (128), that the proposition holds; here $\alpha$ is arbitrary and so is $\alpha'$. Lemma 3.2. Let the auxiliary trajectories satisfy (108) and (24) respectively. When $1\le n\le M$, the two initial phases are chosen to be $(X_{t_{n-1}},V_{t_{n-1}})$ and $(\widetilde{X}_{t_{n-1}},\widetilde{V}_{t_{n-1}})$ at time $t=t_{n-1}$, and when $n=0$ the two initial phases are chosen to be $(X_0,V_0)$ and $(\widetilde{X}_0,\widetilde{V}_0)$ at time $t=0$. Then for any $\alpha>0$, there exists a $C_{4,\alpha}>0$ depending only on $\alpha$, $T$ and $C_{f_0}$ such that for $N$ sufficiently large the estimate (132) holds, where we require $t_{n+1}-t_n=N^{-\lambda_1}$ with $0<\lambda_1<\frac{\lambda_2}{3}$ and $0<\lambda_2<\frac{1}{3}$. Here $k^N_1$ is defined in (106).
Lemma 3.2 is used in the proof of Proposition 3.2. It follows from the following estimate of the term in (132) at any fixed time t ∈ [t n , t n+1 ], a statement which will later be generalized to hold for the maximum over t ∈ [t n , t n+1 ]. Lemma 3.3. Under the same assumptions as in Lemma 3.2, for any α > 0, there exists C 5,α > 0 depending only on α, T and C f0 such that for N sufficiently large it holds that for any fixed time t ∈ [t n , t n+1 ] The proof of Lemma 3.3 is carried out in Section 5. The novel technique in the proof uses the fact that k N 1 has small support, of radius N −λ2 . This means that in order to contribute to the interaction, x t j (or x t j ) has to get closer than N −λ2 to x t i (or x t i ). Due to the effect of the Brownian motion we get mixing of the positions of the particles over the whole support of k N 1 . Using a Law of Large Numbers argument one can show that to leading order the interaction can, in good approximation, be replaced by the respective expectation value. Due to the symmetry of k N 1 this expectation value is zero, and significant fluctuations of the interaction k N 1 have very small probability.
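The mixing heuristic above can be illustrated numerically. The following minimal sketch (not the paper's estimate) samples Brownian-mixed relative positions and averages an odd, compactly supported kernel over them; the kernel `k_short_range`, its inner regularization `eps`, and all parameters are illustrative stand-ins for k N 1 and the blob size, chosen only to make the cancellation visible:

```python
import numpy as np

rng = np.random.default_rng(0)

def k_short_range(x, radius, eps):
    """Odd kernel ~ x/(|x|^2 + eps^2)^{3/2}, cut off outside |x| < radius.
    A regularized, truncated stand-in for the short-range part k_1^N."""
    r = np.linalg.norm(x, axis=-1, keepdims=True)
    force = x / (r**2 + eps**2) ** 1.5
    return np.where(r < radius, force, 0.0)

n_partners = 200_000   # number of interaction partners j
radius = 0.05          # support radius, playing the role of N^{-lambda_2}
eps = 0.01             # inner regularization, playing the role of the blob size
sigma = 0.3            # spread of the Brownian mixing over the time step

# relative positions x_j - x_i after Brownian mixing
pos = rng.normal(0.0, sigma, size=(n_partners, 3))

# the empirical average of an odd kernel over a symmetric cloud cancels
mean_force = k_short_range(pos, radius, eps).mean(axis=0)
single = np.abs(k_short_range(pos, radius, eps)).max()
print(np.linalg.norm(mean_force), single)
```

The averaged force comes out orders of magnitude smaller than the largest individual contribution, which is exactly the Law of Large Numbers cancellation the proof quantifies.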
Proof of Lemma 3.2. We follow a similar procedure as in Proposition 3.1. We divide [t n , t n+1 ] into M + 1 subintervals of length ∆τ = N −γ/3 for some γ > 4, and τ k = k∆τ , k = 0, · · · , M + 1. Recall the event H as in (91) and denote the event It follows from Lemma 3.1 that for any γ > 4. Furthermore we denote the event in (133); then it follows from Lemma 3.3 that For all t ∈ [τ k , τ k+1 ], under the event G τ k ∩ H ∩ H, we obtain Therefore it follows from (135) and (137) that Denote C 4,α to be the constant C(α, T, C f0 ) in (138). Since α > 0 is arbitrary and so is α ′ , (132) holds true. This completes the proof of Lemma 3.2.

Convergence and the proof of Theorem 1.2
In this section, we achieve the convergence by using the consistency from Proposition 3.1 and the stability from Proposition 3.2. To do this, we first prove the following Gronwall-type inequality.
Lemma 3.4. For any T > 0, let e(t) be a non-negative continuous function on [0, T ] with the initial data e(0) = 0, and let λ 2 , λ 3 be two universal constants satisfying 0 < λ 2 < λ 3 . Assume that for any 0 < T 1 ≤ T the function e(t) satisfies the following differential inequality, which holds with C > 0 independent of N , provided that max 0≤t≤T 1 e(t) ≤ N −λ2 holds. Then e(t) is uniformly bounded on [0, T ]. Furthermore there is an N 0 ∈ N depending only on C and T such that for all N ≥ N 0 , max 0≤t≤T e(t) ≤ N −λ2 . The idea of the proof is that as long as (139) holds we can control the growth of e(t) via Gronwall's inequality to make sure that (141) remains valid on an even larger interval.
Proof. We prove the lemma by contradiction: we assume that there is a t ∈ [0, T ] with e(t) ≥ N −λ2 and show that for N ≥ N 0 , with some N 0 ∈ N specified below, we get a contradiction. It follows that the infimum over all times t where e(t) is larger than or equal to N −λ2 exists, and we define We get by continuity of e(t) together with e(0) = 0 that T * > 0, e(T * ) = N −λ2 and max 0≤t≤T * e(t) ≤ N −λ2 . Since (140) implies (141), we get for T 1 = T * that Gronwall's Lemma gives that Since e C √ log(N )T * and log 2 (N ) grow more slowly than any positive power of N , we can find an N 0 ∈ N depending only on C and T * such that for any N ≥ N 0 and hence e(T * ) < N −λ2 for any N ≥ N 0 .
Thus we get a contradiction to (142) for all N ≥ N 0 and the lemma is proven.
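The contradiction argument can be summarized schematically as follows; the displayed inequalities (139)-(141) are abbreviated here to their essential form (logarithmic prefactors suppressed), with constants as in Lemma 3.4:

```latex
\begin{aligned}
&\text{while } \max_{0\le s\le t} e(s) \le N^{-\lambda_2}:\qquad
  e'(t) \;\lesssim\; C\sqrt{\log N}\,\bigl(e(t) + N^{-\lambda_3}\bigr),\\[2pt]
&\text{Gronwall on } [0,T^*] \text{ with } e(0)=0:\qquad
  e(T^*) \;\lesssim\; N^{-\lambda_3}\, e^{C\sqrt{\log N}\,T^*},\\[2pt]
&\text{and since } e^{C\sqrt{\log N}\,T^*} = o(N^{\varepsilon})
  \text{ for every } \varepsilon>0:\qquad
  e(T^*) \;<\; N^{-\lambda_2}\quad\text{for } N \ge N_0,
\end{aligned}
```

which contradicts e(T * ) = N −λ2 ; this is why the strict gap λ 2 < λ 3 is needed.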
We now return to the proof of Theorem 1.2. Denote the event and consider the quantity e(t) defined as Recall that To prove the theorem we will show that under the assumptions C T and A c T ∪ S T (C 3,α ) it follows that max Let us explain why proving (145) under the assumptions C T and A c T ∪ S T (C 3,α ) proves the theorem: since C T is the consistency in Proposition 3.1, i.e. P (C c T ) ≤ N −α , and by Proposition 3.2 one has It follows that for any α > 0, there exists some N 0 ∈ N such that P max To prove the statement (145) we use Lemma 3.4. We will show that for any 0 < T 1 ≤ T , under the additional assumption A T1 , the following differential inequality holds for some λ 3 > λ 2 . Then Lemma 3.4 states that in fact (145) holds, which, as explained above, proves Theorem 1.2. Note that since e(0) = 0, according to the general Gronwall inequality in Lemma 3.4, the assumption A T1 can be removed. Since A T ⊆ A T1 , we have to prove (146) under the assumption that A T ∩ C T ∩ (A c T ∪ S T (C 3,α )) = A T ∩ C T ∩ S T (C 3,α ) holds. Let us recall the assumptions C T , S T (C 3,α ) and A T for easier reference. They hold if Notice that for any 0 < T 1 ≤ T Using the fact that d x ∞ /dt ≤ dx/dt ∞ , one has for all t ∈ (0, T 1 ] It follows that where in the first inequality we used assumptions (147) and (148), and in the second inequality we used the fact that Here we denote Notice that for one has −λ 3 < −λ 2 . In other words, we obtain that for λ 2 < λ 3 which verifies (146), and the theorem is proven.

Proof of Theorem 1.3
In order to prove the error estimate between f t and µ Φ (t), let us split the error into three parts. Theorem 1.3 is proven once we obtain the respective error estimates for these three parts.
•The first term W p (f t , f N t ), where p ∈ [1, ∞), N > 3 and C 1 depends only on T and C f0 . The proof is inspired by the method of Loeper [45]. Note that here we cannot follow the method in [40] directly, since the supports of f N and f are not compact in our present case.
•The second term W p (f N t , µ Ψ (t)). This term concerns the sampling of the mean-field dynamics by discrete particle trajectories. The convergence rate has been proved in [40, Corollary 9.4] by using the concentration estimate of Fournier and Guillin [19]. We summarize the result as follows: let p ∈ [1, ∞), κ < min{δ, 1 6 , 1 2p } and N > 3. Assume that there exists m > 2p such that Then there exist constants C 2 and C 3 such that it holds P max •The third term W p (µ Ψ (t), µ Φ (t)). •Convergence of W p (f t , µ Φ (t)). Collecting estimates (158), (159) and (161) and choosing κ < min{δ, 1 6 , 1 2p }, it follows that P max where C 5 depends only on T and C f0 , and C 6 , C 7 depend only on m, p, κ. We can simplify this result by demanding N ≥ e , which yields N 1−3λ2 ≥ (1 + log(N ))e C5 √ log(N ) . Hence we conclude that P max

The proof of Lemma 3.3
In this section, we present the proof of Lemma 3.3, which provides the distance between K N 1 ( X t ) and K N 1 (X t ) (t ∈ [t n , t n+1 ]), where ( X t , V t ) and (X t , V t ) satisfy (107)-(108) and (24) respectively, with two different initial phases (X tn−1 , V tn−1 ) and (X tn−1 , V tn−1 ) at time t = t n−1 when 1 ≤ n ≤ M , or (X 0 , V 0 ) and (X 0 , V 0 ) at time t = 0 when n = 0. To do this, we introduce the following stochastic process: for time 0 ≤ s ≤ t and a := (a x , a v ) ∈ R 6N , let Z a,N t,s := (Z a,N x,t,s , Z a,N v,t,s ) be the process starting at time s at the position (a x , a v ) and evolving from time s up to time t according to the mean-field force K N : and Note that here (Z a,i,N x,t,s , Z a,i,N v,t,s ), i = 1, · · · , N are independent. Furthermore (Z a,N x,t,s , Z a,N v,t,s ) has the strong Feller property (see [22], Definition (A)), implying in particular that it has a transition probability density u a,N t,s which is given by the product u a,N t,s := ∏ N i=1 u a,i,N t,s .
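The sampling error in the second term can be visualized with a toy computation. The sketch below is a 1D stand-in (the paper works with W p on R 6 ): it uses the fact that in one dimension the W 1 distance between two empirical measures with the same number of atoms equals the mean absolute difference of the sorted samples, and shows the Fournier-Guillin-type decay of the sampling error as the sample size grows. Distribution and sample sizes are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)

def w1_empirical(x, y):
    """W_1 between two empirical measures with the same number of atoms:
    in 1D this is the mean absolute difference of the sorted samples."""
    return np.mean(np.abs(np.sort(x) - np.sort(y)))

def sampling_error(n):
    # average, over a few repetitions, of the W_1 distance between the
    # empirical measures of two independent n-sample draws
    return np.mean([w1_empirical(rng.normal(size=n), rng.normal(size=n))
                    for _ in range(20)])

err_small, err_large = sampling_error(100), sampling_error(10_000)
print(err_small, err_large)   # the sampling error shrinks as n grows
```

The decay rate seen here is the 1D analogue of the concentration estimates quoted above; in higher dimension the rate deteriorates, which is where the exponent restrictions on κ come from.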
Hence each term u a,i,N t,s is the transition probability density of (Z a,i,N x,t,s , Z a,i,N v,t,s ) and is also the solution to the linearized equation for t > s: where ρ N (t, x) = ∫ R 3 f N (t, x, v) dv, and f N solves the regularized VPFP equations (13) with initial condition f 0 .
Consider now the processes Z a,N t,s and Z b,N t,s for two different starting points a, b ∈ R 6N . It is intuitively clear that the probability densities u a,i,N t,s and u b,i,N t,s are just shifts of each other. The next lemma gives an estimate for the distance between any two such densities in terms of the distance between the starting points a and b and the elapsed time t − s. The proof is carried out in Appendix A.
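For orientation, in the classical force-free case the shift structure is completely explicit. For the degenerate diffusion dx t = v t dt, dv t = √2 dB t (unit diffusion; the paper's u a,i,N t,s additionally carries the mean-field force, so the formula below is only the leading Gaussian profile) the transition density started at (a x , a v ) is the Kolmogorov kernel:

```latex
\[
u_t\bigl((x,v);(a_x,a_v)\bigr)
  \;=\; \frac{3^{3/2}}{(2\pi)^{3}\,t^{6}}
  \exp\!\Bigl(-\frac{|v-a_v|^2}{4t}
  \;-\;\frac{3\,\bigl|x-a_x-\tfrac{t}{2}(v+a_v)\bigr|^2}{t^{3}}\Bigr),
\]
```

so that moving the starting point from a to b literally translates the density, with the position variance of order t 3 and the velocity variance of order t; this is the mechanism quantified by Lemma 5.1.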
Lemma 5.1. There exists a positive constant C depending only on C f0 and T such that for each N ∈ N, any starting points a, b ∈ R 6N and any time 0 < t ≤ T , the following estimates for the transition probability densities u a,i,N t,s resp. u b,i,N t,s of the processes Z a,i,N t,s resp. Z b,i,N t,s given by (164) hold for t − s < min{1, T − s}: The norm · p,q denotes the p-norm in the x-variable and the q-norm in the v-variable. To this end one assumes ∆t = t n+1 − t n = N −λ1 . Next we define for t ∈ [t n , t n+1 ] the random sets M t tn and S t tn . Here M t tn is, at time t n , the set of indices of those particles x tn j which lie in a small ball around particle 1, and C * will be defined later. Here S t tn indicates the event where the number of particles inside the set M t tn is smaller than 2C * N (3N −λ2 + log(N )(∆t) 3/2 ) 2 , and the event S t tn is introduced to help estimate P(S t tn ). Our next lemma provides the probability estimate of the event where particle x t j (or x t j ) is close to x t 1 (or x t 1 ) (distance smaller than N −λ2 ) during a short time interval t − t n , which contributes to the interaction of k N 1 defined in (106), since the support of k N 1 has radius N −λ2 . Lemma 5.2. Let ( x t , v t ) and (x t , v t ) solve (107) and (108) on t ∈ [t n , t n+1 ] and let the random set M t tn satisfy (168). Then for any α > 0, there exists some constant N 0 > 0 depending only on α, T and C f0 such that for all N ≥ N 0 it holds This means that for a particle index j outside M t tn , there is some t ∈ [t n , t n+1 ] with x t 1 − x t j < N −λ2 (i.e. x t j contributes to the interaction of k N 1 ) only with probability less than N −α . Here P is understood to be taken on the initial condition x tn j .
Proof. Let (1, j) be fixed and let a t 1 := (a t 1,x , a t 1,v ), b t j := (b t j,x , b t j,v ) ∈ R 6 satisfy the stochastic differential equations with the initial data a tn 1 = 0 and b tn j = 0. Here B t j is the same as in (107). It follows from the evolution equation (107) that And by the same argument one has For j ∈ (M t tn ) c for some t ∈ [t n , t n+1 ], i.e. such that x tn j lies outside the corresponding ball, we hence get P min where we used a t x = ∫ t tn a s v ds and b t x = ∫ t tn b s v ds in the second inequality. In the same way we can argue that P min Splitting up this Wiener process into its three spatial components we get where in the last equality we used the reflection principle based on the Markov property [41]. Recall that a t 1,v and b t j,v evolve as standard Brownian motions, i.e. the density is a Gaussian with standard deviation σ t = σ(t − t n ) 1/2 . Due to the independence of a t 1,v and b t j,v , c t j,1 is also normally distributed, with standard deviation of order (t − t n ) 1/2 . Hence for N sufficiently large, following from (177), it holds that Proof of Lemma 3.3. We show that under the event A T defined in (102), for any α > 0 there exists a C α depending only on α, T and C f0 such that at any fixed time t ∈ [t n , t n+1 ] This is done under the event A T in three steps: (1) We prove that for any t ∈ [t n , t n+1 ] the number of particles inside M t tn is larger than M * only with probability less than N −α . Note that M * is used as a bound in the definition of (170) and (171). For any t ∈ [t n , t n+1 ] we prove that P(card (M t tn ) > M * ) = P((S t tn ) c ) ≤ P((S (2) We prove that at any fixed time t ∈ [t n , t n+1 ], particles outside M t tn contribute to the interaction of k N 1 with probability less than N −α , namely (3) According to step (2) above, at any fixed time t ∈ [t n , t n+1 ], particles outside M t tn do not contribute to the interaction of k N 1 with high probability, so we only consider particles that are inside M t tn . And we know already from step (1) above that the number of particles inside M t tn is larger than M * only with low probability.
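The reflection-principle step used above can be checked numerically. This is a generic Monte Carlo sanity check of the identity P(max s≤1 B s ≥ a) = 2 P(B 1 ≥ a) for a standard Brownian motion, not tied to the paper's constants; the time discretization makes the simulated maximum a slight underestimate:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)

a = 1.0                           # barrier level
n_paths, n_steps = 20_000, 500
dt = 1.0 / n_steps

# simulate standard Brownian paths on [0, 1] and take their running maxima
increments = rng.normal(0.0, sqrt(dt), size=(n_paths, n_steps))
paths = np.cumsum(increments, axis=1)
p_mc = np.mean(paths.max(axis=1) >= a)

# reflection principle: P(max_{s<=1} B_s >= a) = 2 P(B_1 >= a)
p_exact = 2.0 * 0.5 * (1.0 - erf(a / sqrt(2.0)))
print(p_mc, p_exact)
```

The fast Gaussian decay of this barrier-crossing probability in the barrier height is exactly what makes the contribution of far-away particles smaller than any inverse power of N.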
To prove (178), we only need to prove at any fixed time t ∈ [t n , t n+1 ], where the event X (M t tn ) is defined by •Step 1: To prove the first part of (180), note that on the event A T defined in (102) and assuming that t ∈ [t n , t n+1 ] Hence Note that the law of (x j tn , v j tn ) has a density f N (x, v, t n ). For any j ∈ {2, . . . , N } the probability to find j ∈ M t tn for any t ∈ [t n , t n+1 ] is given by where the center Ξ t of the ball is given by Ξ t = x tn 1 + (t − t n )(v tn 1 − v), and the radius of the ball is given by which then satisfies the following transport equation Then one has where the center Ξ t 0 of the ball is given by Ξ t 0 = x tn 1 + (t − t n )v tn 1 ; in particular the integration area is independent of v. It follows that the probability of finding j ∈ M t tn for any t ∈ [t n , t n+1 ] is equivalent to Since on the set A T the distance d of the particles x 1 and x 1 cannot be larger than N −λ2 , it follows that, given that the event A T holds, a particle x j is in the solid ball only if the particle x j is in the ball with dashed lines, i.e. with radius R = 3N −λ2 + log(N )(∆t) 3/2 around x 1 (see for example particles x 3 and x 3 ). Thus M t tn ⊆ M tn . Controlling M t tn by M tn will be helpful to estimate the number of particles inside these sets. The x j are distributed independently, and the probability of finding any of these x j inside the solid ball is small due to the small volume of the ball. This helps to estimate the number of particles in the set M tn (see Step 1). Particles outside the ball, i.e. indices not in M tn , do not contribute to the interaction k 1 . This comes from the fact that in order to get close enough to x 1 to interact, they have to travel at least the distance log(N )(∆t) 3/2 during the short time interval (t − t n ) (recall that the support of k 1 has radius N −λ2 ).
Due to the Brownian motion, this is possible, of course, but the probability to travel that far is smaller than any inverse power of N . This argument is worked out in Step 2. The main contribution thus comes from Step 3. Knowing that the number of particles in M tn is quite small helps to estimate this term.
Next, we compute for 0 < s ≤ ∆t where we have chosen It follows that which leads to max because of (20), where C 2 depends only on T and C f0 . It follows from (188) that where we define C * := C 2 (4π/3) 2/3 , which depends only on T and C f0 . The probability of finding k particles inside the set M t tn is thus bounded from above by the binomial probability mass function with parameter p at position k, i.e. for any natural number 0 ≤ A ≤ N and any t ∈ [t n , t n+1 ] Binomially distributed random variables have mean N p and standard deviation √(N p(1 − p)) < √ N p, and the probability to find more than N p + a √ N p particles in the set M t tn is exponentially small in a, i.e. for any α > 0 and any t ∈ [t n , t n+1 ] there is a sufficiently large N such that This is because of the central limit theorem, by which the binomial distribution can be approximated by a normal distribution when N is sufficiently large. Since p ≥ CN −3λ2 , we get that √ N p > CN (1−3λ2)/2 , and hence the probability to find more than M * particles in the set M t tn is smaller than any inverse power of N , i.e. for any α > 0 there is a C α such that for any t ∈ [t n , t n+1 ] •Step 2: For (181) it is sufficient to show that for any α > 0 there is a sufficiently large N such that for some j ∈ (M t tn ) c P max The total probability we have to control in (181) is at maximum the N -fold value of this. The key to proving this is Lemma 5.2. To have an interaction k N 1 ( x t 1 − x t j ) ≠ 0 for some t ∈ [t n , t n+1 ], the distance between particle 1 and particle j has to be reduced to a value smaller than N −λ2 . Due to the Brownian motion, this is possible, but suppressed. Due to the fast decay of the Gaussian it is very unlikely that k N 1 ( x t 1 − x t j ) ≠ 0; the probability is smaller than any inverse power of N (see Lemma 5.2). The same holds true for k N 1 (x t 1 − x t j ). In more detail: due to the cut-off N −λ2 we introduced for k N 1 , where we used the fact that (M t tn ) c ⊆ (M t tn ) c in the last inequality. With Lemma 5.2 we get the bound for (181).
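The binomial concentration used in Step 1 can be made concrete. The sketch below evaluates an exact binomial tail in log-space and compares it with a multiplicative Chernoff bound; the parameters n, p are arbitrary illustrations of the regime "mean Np with fluctuations of order √(Np)", not the paper's values:

```python
import math

def binom_sf(n, p, k):
    """P(X > k) for X ~ Binomial(n, p), summed in log-space for stability."""
    log_p, log_q = math.log(p), math.log(1.0 - p)
    total = 0.0
    for j in range(k + 1, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(j + 1)
                   - math.lgamma(n - j + 1) + j * log_p + (n - j) * log_q)
        term = math.exp(log_pmf)
        total += term
        if term < 1e-30 and j > n * p:   # remaining terms are negligible
            break
    return total

n, p = 10_000, 0.01                          # mean Np = 100, sqrt(Np) = 10
mean = n * p
threshold = int(mean + 5 * math.sqrt(mean))  # Np + 5 sqrt(Np) = 150

tail = binom_sf(n, p, threshold)
# multiplicative Chernoff bound: P(X >= (1+d)Np) <= exp(-d^2 Np / 3), 0 < d <= 1
chernoff = math.exp(-(0.5 ** 2) * mean / 3.0)
print(tail, chernoff)
```

Already a few standard deviations above the mean the exact tail is far below the (already small) Chernoff bound, which is the quantitative content of "exponentially small in a".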
•Step 3: To get (182) we prove that for any natural number where the event X (M t tn ) is defined in (183). This can be recast without relabeling j as Lemma 5.3. Let Z 1 , · · · , Z M be independent random variables with E[|Z i |] ≤ CM −2 and |Z i | ≤ C for any i ∈ {1, · · · , M }. Then for any α > 0, it holds that where C α depends only on C and α.
Proof. We first split the random variables Z i = Z a i + Z b i such that Z a i and Z b i are sequences of independent random variables with This can be achieved by defining Here we choose γ such that P(|Z a i | > 0) = M −1 . Applying Markov's inequality, one computes This implies that γ ≤ CM −1 . For the sum of the Z b i we get the trivial bound Thus the lemma follows if we can show that where the constant C α has changed. Let Then it follows that Noticing that the X i are i.i.d. Bernoulli random variables with P(X i = 1) = P(|Z a i | > 0) = M −1 , we get
Thus one chooses M large enough and concludes (198), which proves the lemma.
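The mechanism behind Lemma 5.3 can be illustrated with a toy simulation. The variables below are hypothetical stand-ins satisfying the lemma's hypotheses E|Z i | ≤ C M −2 and |Z i | ≤ C (with C = 1): each Z i is nonzero only with probability 1/M, mirroring the rare-but-bounded structure of the truncated part Z a i , so the sample mean is far smaller than the worst-case bound |Z i | ≤ C alone would suggest:

```python
import numpy as np

rng = np.random.default_rng(3)

M = 1000
# hypothetical variables with E|Z_i| <= 1/M^2 and |Z_i| <= 1:
# Z_i is nonzero only with probability 1/M, and then uniform in [-1/M, 1/M]
active = rng.random(M) < 1.0 / M
Z = np.where(active, rng.uniform(-1.0 / M, 1.0 / M, size=M), 0.0)

sample_mean = abs(Z.mean())
print(sample_mean)    # of order M^{-2}, up to fluctuations
```

The observed sample mean sits near the M −2 scale of the lemma's conclusion, far below the trivial bound 1.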
Using the lemma above, we now proceed to prove (195). Define It follows that |Z j | is bounded and where we use the fact that u a,N t,tn−1 ∞,1 ≤ CN , which leads to (195) for M = N δ+λ2 due to the fact that 0 < λ 1 < 2 3 λ 2 . Thus we are left to prove (195) for the case This can be done by Lemma 2.5, which we repeat below for easier reference: and |Z i | ≤ C M g(M ). Then for any α > 0, the sample mean Z = 1 M ∑ M i=1 Z i satisfies where C α depends only on C and α. For this choice, following the same argument as in (64), the condition is satisfied. We can also deduce that Applying Lemma 2.5 we obtain at any fixed time t ∈ [t n , t n+1 ] and similarly It is left to control the difference where M satisfies (204). This can be done by using Lemma 5.1. For any t ∈ [t n , t n+1 ], when 1 ≤ n ≤ M we write a = ( X tn−1 , V tn−1 ) = (X tn−1 , V tn−1 ) and b = (X tn−1 , V tn−1 ). Then it follows that where ρ a,1,N t,tn−1 (x 1 ) = ∫ R 3 u a,1,N t,tn−1 (x 1 , v 1 ) dv 1 . Here we have used the fact that when 1 ≤ n ≤ M u a,j,N t,tn−1 − u b,j,N t,tn−1 ∞,1 ≤ C|a i − b i |((t − t n−1 ) −6 + 1) ≤ C|a i − b i |N 6λ1 by Lemma 5.1, since N −λ1 ≤ t − t n−1 ≤ 2N −λ1 . When n = 1, since a = ( X 0 , V 0 ) = (X 0 , V 0 ) = (X 0 , V 0 ) = b, one has Collecting (206), (207) and (208) we get (195) for M satisfying (204), which finishes the proof of (195) for any M . Hence we conclude (182).
•Step 4: Now we prove (178). To see this, we split the summation N j =1 into two parts: the part where j ∈ M t tn and the part where j ∈ (M t tn ) c where X (M t tn ) is defined in (183) and For the part in the event X ((M t tn ) c ) where j ∈ (M t tn ) c , it follows from (181) that Thus we have Next we split the summation j∈M t tn in the event X (M t tn ) (183) into two cases: the case where card (M t tn ) ≤ M * and the case where card (M t tn ) > M * . Here M * is defined in (179).
where in the last inequality we used (182). According to (180), for any t ∈ [t n , t n+1 ] one has P(card (M t tn ) > M * ) ≤ N −α . Therefore it follows from (213) that Together with (212), this implies Finally, since the particles are exchangeable, the same result holds with ( x t 1 , x t 1 ) in (217) replaced by ( x t i , x t i ), i = 2, · · · , N , which completes the proof of Lemma 3.3.
respectively. As a direct result from (226) and (227), one has | · | j G ∞,1 ≤ C 1 t −(9−3j)/2 and | · | j ∇ v G ∞,1 ≤ C 1 t −(5−3j/2) . (228) and which leads to We use the trajectory c to change the frame of reference in which we look at u d,i,N t,s for d ∈ {a, b}, i.e. we define for any t > 0 the density w a,i,N t,0 on phase space by w a,i,N t,0 ((x, v)) := u a,i,N t,0 ((x, v) + c t ) .
From the evolution equation of u d,i,N t,s for d ∈ {a, b} and c t one gets directly Before proceeding we would like to explain, on a heuristic level, the advantage of looking at w instead of u. The difficulties arise when dealing with short periods of time. There the u d , d ∈ {a, b}, are roughly given by Gaussians around the center 1 2 (a + b), while the w d are roughly given by Gaussians around the center 0. Here the force term of w, which is zero at x = 0, suppresses the last term of (243). Thus w will be very close to the heat kernel G t of our time evolution.
Using (243) and the properties of the heat kernel we get and, using (241), we can find a constant C such that Using the properties of the heat kernel (220), (222) and Young's inequality in (247), we get Applying a generalized Gronwall inequality with weak singularities [28, Lemma 7.1.1] leads to Further (247) gives for any 1 ≤ p ≤ ∞ and t ∈ [0, T ] η N t,0 p,1 ≤ ∫ 0 t (| · |G t−s − ∇ v G t−s ) * η N s,0 p,1 ds.
We use this formula starting at p 1 = 1 and setting p k+1 = 10p k /(10 − p k ). Therefore, starting with our estimate for η a,i,N t,0 1,1 (see (249)), we can then iteratively estimate the L p norms of η N t,0 for higher exponents, i.e.
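For reference, the singularly weighted Gronwall inequality [28, Lemma 7.1.1] invoked above has, in schematic form (suppressing the precise constant), the following shape:

```latex
\[
u(t) \;\le\; a \;+\; b\int_0^t (t-s)^{\beta-1}\,u(s)\,ds
\quad (0 \le t \le T,\ \beta > 0)
\quad\Longrightarrow\quad
u(t) \;\le\; C(b,\beta,T)\,a ,
\]
```

where the integrable singularity (t − s) β−1 is exactly the kind of weight the heat-kernel bounds (220), (222) produce after Young's inequality.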