The intensity $\lambda$ of a counting process is a measure of the rate of change of its predictable part. If a stochastic process $\{N(t), t \geq 0\}$ is a counting process, then it is a submartingale, and in particular its Doob–Meyer decomposition is

$$N(t) = M(t) + \Lambda(t),$$

where $M(t)$ is a martingale and $\Lambda(t)$ is a predictable increasing process. $\Lambda(t)$ is called the cumulative intensity of $N(t)$ and it is related to $\lambda$ by

$$\Lambda(t) = \int_0^t \lambda(s)\,ds.$$
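For example, a homogeneous Poisson process with constant rate $\lambda > 0$ makes the decomposition explicit (a standard illustration): the compensated process $N(t) - \lambda t$ is a martingale, so

$$N(t) = \underbrace{\bigl(N(t) - \lambda t\bigr)}_{M(t)} + \underbrace{\lambda t}_{\Lambda(t)}, \qquad \Lambda(t) = \int_0^t \lambda\,ds = \lambda t,$$

and the intensity is the constant $\lambda$.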
Given a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and a counting process $\{N(t), t \geq 0\}$ adapted to the filtration $\{\mathcal{F}_t, t \geq 0\}$, the intensity of $N$ is the process $\{\lambda(t), t \geq 0\}$ defined by the following limit:

$$\lambda(t) = \lim_{h \downarrow 0} \frac{1}{h}\,\mathbb{E}\bigl[N(t+h) - N(t) \mid \mathcal{F}_t\bigr].$$

The right-continuity of the sample paths of counting processes allows this limit to be taken from the right.[1]
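The defining limit can be checked by simulation. The sketch below is an illustration, not part of the cited source: the rate $\lambda(t) = 2 + \sin 2\pi t$ is arbitrary, and many paths of an inhomogeneous Poisson process are drawn by thinning so that $\mathbb{E}[N(t+h) - N(t)]/h$ can be compared with $\lambda(t)$ for small $h$. For a Poisson process the increment is independent of $\mathcal{F}_t$, so the unconditional mean coincides with the conditional one.

```python
import numpy as np

rng = np.random.default_rng(0)

def lam(t):
    # Arbitrary rate function chosen for illustration.
    return 2.0 + np.sin(2.0 * np.pi * t)

def increment(t, h, lam_max=3.0):
    """Events of an inhomogeneous Poisson process in (t, t+h], by thinning:
    candidates arrive at rate lam_max and are kept with probability lam/lam_max."""
    n_cand = rng.poisson(lam_max * h)
    times = t + h * rng.random(n_cand)
    return int((rng.random(n_cand) < lam(times) / lam_max).sum())

t, h, n_paths = 0.3, 1e-3, 100_000
mean_inc = np.mean([increment(t, h) for _ in range(n_paths)])
print(f"E[N(t+h) - N(t)] / h ~= {mean_inc / h:.2f}  vs  lambda(t) = {lam(t):.2f}")
```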
In statistical learning, the deviation between $\lambda$ and its estimator $\hat{\lambda}$ can be bounded with the use of oracle inequalities.
If a counting process $N(t)$ is restricted to $t \in [0,1]$ and $n$ i.i.d. copies $N_1, N_2, \ldots, N_n$ are observed on that interval, then the least squares functional for the intensity is

$$R_n(\lambda) = \|\lambda\|^2 - \frac{2}{n} \sum_{i=1}^n \int_0^1 \lambda(t)\,dN_i(t),$$

which involves an Itô integral.
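Against a counting process the stochastic integral evaluates to a sum over jump times: $\int_0^1 \lambda(t)\,dN_i(t) = \sum_k \lambda(T_{i,k})$, where the $T_{i,k}$ are the events of $N_i$. A minimal numerical sketch of the contrast, assuming event times are held as one array per copy (the function name and data layout are illustrative, not from the source):

```python
import numpy as np

def least_squares_contrast(lam, event_times, grid_size=1000):
    """R_n(lambda) = ||lambda||^2 - (2/n) sum_i int_0^1 lambda dN_i.

    lam         : callable intensity candidate on [0, 1]
    event_times : list of n arrays; event times of the observed copies N_i
    """
    n = len(event_times)
    # ||lambda||^2 by a simple midpoint rule on [0, 1].
    t = (np.arange(grid_size) + 0.5) / grid_size
    norm_sq = np.mean(lam(t) ** 2)
    # The integral of lambda against dN_i is the sum of lambda over the
    # jump times of N_i, since N_i is a pure jump process with unit jumps.
    stoch = sum(lam(np.asarray(times)).sum() for times in event_times) / n
    return norm_sq - 2.0 * stoch
```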
If the assumption is made that $\lambda(t)$ is piecewise constant on $[0,1]$, i.e. it depends on a vector of constants $\beta = (\beta_1, \beta_2, \ldots, \beta_m) \in \mathbb{R}_+^m$ and can be written

$$\lambda_\beta(t) = \sum_{j=1}^m \beta_j \lambda_{j,m}(t), \qquad \lambda_{j,m}(t) = \sqrt{m}\,\mathbf{1}_{\left(\frac{j-1}{m},\,\frac{j}{m}\right]}(t),$$

where the $\lambda_{j,m}$ have a factor of $\sqrt{m}$ so that they are orthonormal under the standard $L^2$ norm, then by choosing appropriate data-driven weights $\hat{w}_j$ which depend on a parameter $x > 0$ and introducing the weighted norm

$$\|\beta\|_{\hat{w},1} = \sum_{j=1}^m \hat{w}_j |\beta_j|,$$

the estimator for $\beta$ can be given:

$$\hat{\beta} = \operatorname*{arg\,min}_{\beta \in \mathbb{R}_+^m} \left\{ R_n(\lambda_\beta) + \|\beta\|_{\hat{w},1} \right\}.$$
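Because the $\lambda_{j,m}$ are orthonormal, this minimization separates across coordinates. Writing $c_j = \frac{1}{n}\sum_{i=1}^n \int_0^1 \lambda_{j,m}(t)\,dN_i(t) = \frac{\sqrt{m}}{n} \times$ (total events in bin $j$), each coordinate objective $\beta_j^2 - 2 c_j \beta_j + \hat{w}_j \beta_j$ is minimized over $\beta_j \geq 0$ at $\hat{\beta}_j = \max(0,\, c_j - \hat{w}_j/2)$, a nonnegative soft threshold. The sketch below implements this closed form, assuming the penalty enters exactly as $\|\beta\|_{\hat{w},1}$ (constant factors vary across formulations, and the function name is illustrative):

```python
import numpy as np

def histogram_lasso(event_times, m, w_hat):
    """Closed-form weighted-lasso estimate of beta for the orthonormal
    histogram basis lambda_{j,m} = sqrt(m) * 1_{((j-1)/m, j/m]}."""
    n = len(event_times)
    counts = np.zeros(m)
    for times in event_times:
        # Bin j (0-indexed) covers (j/m, (j+1)/m]; ceil maps each time there.
        bins = np.ceil(np.asarray(times) * m).astype(int) - 1
        counts += np.bincount(bins, minlength=m)
    c = np.sqrt(m) / n * counts      # c_j = (1/n) sum_i int lambda_{j,m} dN_i
    # Nonnegative soft threshold: the coordinatewise minimizer derived above.
    return np.maximum(0.0, c - np.asarray(w_hat) / 2.0)
```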
Then, the estimator $\hat{\lambda}$ is just $\lambda_{\hat{\beta}}$. With these preliminaries, an oracle inequality bounding the $L^2$ norm $\|\hat{\lambda} - \lambda\|$ is as follows: for an appropriate choice of $\hat{w}_j(x)$,

$$\|\hat{\lambda} - \lambda\|^2 \leq \inf_{\beta \in \mathbb{R}_+^m} \left\{ \|\lambda_\beta - \lambda\|^2 + 2\,\|\beta\|_{\hat{w},1} \right\}$$

with probability greater than or equal to $1 - 12.85\,e^{-x}$.[2]
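As a usage illustration of the `histogram_lasso` sketch above (with constant weights standing in for the data-driven $\hat{w}_j(x)$, whose exact form is not given here), one can simulate $n$ i.i.d. copies of a Poisson process whose true intensity is piecewise constant and measure the error; by orthonormality, $\|\hat{\lambda} - \lambda\| = \|\hat{\beta} - \beta^*\|$ when $\lambda = \lambda_{\beta^*}$:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 8, 500
rate = np.array([1.0, 1.0, 4.0, 4.0, 2.0, 0.0, 0.0, 3.0])  # true lambda(t), per bin
beta_star = rate / np.sqrt(m)                # coefficients of lambda in the basis

def sample_path():
    """One path on [0, 1]: Poisson counts per bin, times uniform in (j/m, (j+1)/m]."""
    times = [(j + (1.0 - rng.random(rng.poisson(rate[j] / m)))) / m for j in range(m)]
    return np.concatenate(times)

event_times = [sample_path() for _ in range(n)]
beta_hat = histogram_lasso(event_times, m, w_hat=np.full(m, 0.05))
print("L2 error:", np.linalg.norm(beta_hat - beta_star))
```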