diff --git a/src/functions-reference/embedded_laplace.qmd b/src/functions-reference/embedded_laplace.qmd index 63346dbfb..247d19b44 100644 --- a/src/functions-reference/embedded_laplace.qmd +++ b/src/functions-reference/embedded_laplace.qmd @@ -4,23 +4,29 @@ pagetitle: Embedded Laplace Approximation # Embedded Laplace Approximation -The embedded Laplace approximation can be used to approximate certain -marginal and conditional distributions that arise in latent Gaussian models. -A latent Gaussian model observes the following hierarchical structure: +The embedded Laplace approximation can be used to approximate certain marginal +and conditional distributions that arise in latent Gaussian models. +Embedded Laplace replaces explicit sampling of high-dimensional Gaussian latent +variables with a local Gaussian approximation. +In doing so, it marginalizes out the latent Gaussian variables. +Inference can then be performed on the remaining, often low-dimensional, +parameters. The embedded Laplace approximation in Stan is best suited for +latent Gaussian models when jointly sampling over all model parameters is +expensive and the conditional posterior of the Gaussian latent variables is +reasonably close to Gaussian. + +For observed data $y$, latent Gaussian variables $\theta$, and hyperparameters $\phi$, a latent Gaussian model observes the following hierarchical structure: \begin{eqnarray} \phi &\sim& p(\phi), \\ \theta &\sim& \text{MultiNormal}(0, K(\phi)), \\ y &\sim& p(y \mid \theta, \phi). \end{eqnarray} -In this formulation, $y$ represents the -observed data, and $p(y \mid \theta, \phi)$ is the likelihood function that -specifies how observations are generated conditional on the latent Gaussian -variables $\theta$ and hyperparameters $\phi$. +In this formulation, $p(y \mid \theta, \phi)$ is the likelihood function that +specifies how observations are generated conditional on $\theta$ and $\phi$. 
$K(\phi)$ denotes the prior covariance matrix for the latent Gaussian variables -$\theta$ and is parameterized by $\phi$. -The prior $p(\theta \mid \phi)$ is restricted to be a multivariate normal -centered at 0. That said, we can always pick a likelihood that offsets $\theta$, -which is equivalently to specifying a prior mean. +$\theta$ and is parameterized by $\phi$. The prior on $\theta$ is centered at 0; +however, an offset can always be added when specifying the likelihood function +$p(y \mid \theta, \phi)$. To sample from the joint posterior $p(\phi, \theta \mid y)$, we can either use a standard method, such as Markov chain Monte Carlo, or we can follow @@ -34,7 +40,7 @@ are typically available in closed form and so they must be approximated. The marginal posterior can be written as $p(\phi \mid y) \propto p(y \mid \phi) p(\phi)$, where $p(y \mid \phi) = \int p(y \mid \phi, \theta) p(\theta) \text{d}\theta$ is called the marginal likelihood. The Laplace method approximates -$p(y \mid \phi, \theta) p(\theta)$ with a normal distribution centered at +$p(y \mid \phi, \theta) p(\theta)$ with a normal distribution centered at the mode, $$ \theta^* = \underset{\theta}{\text{argmax}} \ \log p(\theta \mid y, \phi), $$ @@ -53,7 +59,7 @@ using one of Stan's algorithms. The marginal posterior is lower dimensional and likely to have a simpler geometry leading to more efficient inference. On the other hand, each marginal likelihood computation is more costly, and the combined change in efficiency -depends on the case. +depends on the application. To obtain posterior draws for $\theta$, we sample from the normal approximation to $p(\theta \mid y, \phi)$ in `generated quantities`. @@ -62,7 +68,11 @@ then $p(\theta \mid y, \phi)$ produces samples from the joint posterior $p(\theta, \phi \mid y)$. The Laplace approximation is especially useful if $p(y \mid \phi, \theta)$ is
Stan's embedded Laplace approximation is restricted to the case +log-concave, e.g., Poisson, binomial, negative-binomial, and Bernoulli. +(The normal distribution is also log-concave; however, when the likelihood is +normal, marginalization can be performed exactly and does not require an +approximation.) +Stan's embedded Laplace approximation is restricted to the case where the prior $p(\theta \mid \phi)$ is multivariate normal. Furthermore, the likelihood $p(y \mid \phi, \theta)$ must be computed using only operations which support higher-order derivatives @@ -74,33 +84,36 @@ In the `model` block, we increment `target` with `laplace_marginal`, a function that approximates the log marginal likelihood $\log p(y \mid \phi)$. The signature of the function is: -\index{{\tt \bfseries laplace\_marginal\_tol }!{\tt (function likelihood\_function, tuple(...) likelihood\_arguments, function covariance\_function, tuple(...), vector theta\_init covariance\_arguments): real}|hyperpage} +\index{{\tt \bfseries laplace\_marginal }!{\tt (function likelihood\_function, tuple(...) likelihood\_arguments, int hessian_block_size, function covariance\_function, tuple(...)): real}|hyperpage} - +`real` **`laplace_marginal`**`(function likelihood_function, tuple(...) likelihood_arguments, int hessian_block_size, function covariance_function, tuple(...) covariance_arguments)` -`real` **`laplace_marginal`**`(function likelihood_function, tuple(...) likelihood_arguments, function covariance_function, tuple(...) covariance_arguments)` - -Which returns an approximation to the log marginal likelihood $p(y \mid \phi)$. +which returns an approximation to the log marginal likelihood $p(y \mid \phi)$. {{< since 2.37 >}} -This function takes in the following arguments. +The embedded Laplace functions accept two functors whose user-defined arguments are passed in as tuples to `laplace_marginal`. -1. 
`likelihood_function` - user-specified log likelihood whose first argument is the vector of latent Gaussian variables `theta` -2. `likelihood_arguments` - A tuple of the log likelihood arguments whose internal members will be passed to the covariance function -3. `covariance_function` - Prior covariance function -4. `covariance_arguments` A tuple of the arguments whose internal members will be passed to the the covariance function +1. `likelihood_function` - user-specified log likelihood whose first argument is the vector of latent Gaussian variables $\theta$. +The subsequent arguments are user-defined. + - `real likelihood_function(vector theta, likelihood_arguments_1, likelihood_arguments_2, ...)` +2. `likelihood_arguments` - A tuple of arguments whose internal members are passed to the log likelihood function. +This tuple does NOT include the latent variable $\theta$. +3. `hessian_block_size` - the block size of the Hessian of the log likelihood, $\partial^2 \log p(y \mid \theta, \phi) / \partial \theta^2$. +4. `covariance_function` - A function that returns the covariance matrix of the multivariate normal prior on $\theta$. + - `matrix covariance_function(covariance_argument_1, covariance_argument_2, ...)` +5. `covariance_arguments` - A tuple of the arguments whose internal members will be passed to the covariance function. Below we go over each argument in more detail. ## Specifying the log likelihood function {#laplace-likelihood_spec} The first step to use the embedded Laplace approximation is to write down a -function in the `functions` block which returns the log joint likelihood +function in the `functions` block which returns the log likelihood $\log p(y \mid \theta, \phi)$. There are a few constraints on this function: -1. The function return type must be `real` +1. The function return type must be `real`. 2. The first argument must be the latent Gaussian variable $\theta$ and must have type `vector`. @@ -124,7 +137,7 @@ as data or parameter. 
The tuple after `likelihood_function` contains the arguments that get passed to `likelihood_function` *excluding $\theta$*. For instance, if a user-defined -likelihood uses a real and a matrix the likelihood function's signature would +likelihood uses a real and a matrix, the likelihood function's signature would first have a vector and then a real and matrix argument. ```stan @@ -149,6 +162,13 @@ for example, real likelihood_function(vector theta, data vector x, ...) ``` +In addition to the likelihood function, users must specify the block size +of the Hessian, $\partial^2 \log p(y \mid \theta, \phi) / \partial \theta^2$. +The Hessian is often block diagonal and this structure can be exploited for fast computation. +For example, if $y_i$ only depends on $\theta_i$, then the Hessian is diagonal and `hessian_block_size=1`. +On the other hand, if the Hessian is not block diagonal, we can always set +`hessian_block_size=n` where $n$ is the size of $\theta$. + ## Specifying the covariance function The argument `covariance_function` returns the prior covariance matrix @@ -159,20 +179,6 @@ Its return type must be a matrix of size $n \times n$ where $n$ is the size of matrix covariance_function(...) ``` - - The `...` represents a set of optional variadic arguments. There are no type restrictions for the variadic arguments `...` and each argument can be passed as data or parameter. 
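+As an illustration, a likelihood and covariance function pair might be declared as follows. This is a sketch only: the function names, the Poisson likelihood, and the exponentiated quadratic kernel are illustrative choices, not part of the interface.
+
+```stan
+functions {
+  // Log likelihood: the first argument must be the latent vector theta.
+  // Here each count y[i] depends only on theta[i] (Poisson with log link).
+  real poisson_log_likelihood(vector theta, data array[] int y) {
+    return poisson_log_lpmf(y | theta);
+  }
+  // Covariance function: must return the n x n prior covariance matrix.
+  matrix se_covariance(data array[] real x, real alpha, real rho) {
+    int n = size(x);
+    // Exponentiated quadratic kernel with a small diagonal jitter
+    // for numerical stability.
+    return gp_exp_quad_cov(x, alpha, rho)
+           + diag_matrix(rep_vector(1e-8, n));
+  }
+}
+```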
The variables @@ -198,51 +204,77 @@ It is also possible to specify control parameters, which can help improve the optimization that underlies the Laplace approximation, using `laplace_marginal_tol` with the following signature: -\index{{\tt \bfseries laplace\_marginal\_tol }!{\tt (function likelihood\_function, tuple(...), function covariance\_function, tuple(...), vector theta\_init, real tol, int max\_steps, int hessian\_block\_size, int solver, int max\_steps\_linesearch): real}|hyperpage} - - -\index{{\tt \bfseries laplace\_marginal\_tol }!{\tt (function likelihood\_function, tuple(...), function covariance\_function, tuple(...), vector theta\_init, real tol, int max\_steps, int hessian\_block\_size, int solver, int max\_steps\_linesearch): real}|hyperpage} - -`real` **`laplace_marginal_tol`**`(function likelihood_function, tuple(...), function covariance_function, tuple(...), vector theta_init, real tol, int max_steps, int hessian_block_size, int solver, int max_steps_linesearch)`
\newline +```stan +real laplace_marginal_tol(function likelihood_function, tuple(...), + int hessian_block_size, + function covariance_function, tuple(...), + tuple(vector theta_init, real tol, int max_steps, int solver, + int max_steps_linesearch, int allow_fallback)) +``` Returns an approximation to the log marginal likelihood $p(y \mid \phi)$ and allows the user to tune the control parameters of the approximation. -* `theta_init`: the initial guess for the Newton solver when finding the mode +* `theta_init`: the initial guess for a Newton solver when finding the mode of $p(\theta \mid y, \phi)$. By default, it is a zero-vector. * `tol`: the tolerance $\epsilon$ of the optimizer. Specifically, the optimizer stops when $||\nabla \log p(\theta \mid y, \phi)|| \le \epsilon$. By default, -the value is $\epsilon = 10^{-6}$. +the value is $\epsilon \approx 1.49 \times 10^{-8}$, which is the square-root of machine precision. * `max_num_steps`: the maximum number of steps taken by the optimizer before it gives up (in which case the Metropolis proposal gets rejected). The default -is 100 steps. +is 500 steps. -* `hessian_block_size`: the size of the blocks, assuming the Hessian -$\partial \log p(y \mid \theta, \phi) \ \partial \theta$ is block-diagonal. -The structure of the Hessian is determined by the dependence structure of $y$ -on $\theta$. By default, the Hessian is treated as diagonal -(`hessian_block_size=1`). If the Hessian is not block diagonal, then set -`hessian_block_size=n`, where `n` is the size of $\theta$. - -* `solver`: choice of Newton solver. The optimizer used to compute the +* `solver`: choice of Newton solver. The optimizer underlying the Laplace approximation does one of three matrix decompositions to compute a -Newton step. The problem determines which decomposition is numerical stable. -By default (`solver=1`), the solver makes a Cholesky decomposition of the -negative Hessian, $- \partial \log p(y \mid \theta, \phi) / \partial \theta$. 
-If `solver=2`, the solver makes a Cholesky decomposition of the covariance -matrix $K(\phi)$. -If the Cholesky decomposition cannot be computed for neither the negative -Hessian nor the covariance matrix, use `solver=3` which uses a more expensive -but less specialized approach. +Newton step. The problem determines which decomposition is numerically stable. +By default (`solver=1`), the solver attempts a Cholesky decomposition of the +negative Hessian of the log likelihood, +$- \partial^2 \log p(y \mid \theta, \phi) / \partial \theta^2$. +This operation is legal if the negative Hessian is positive-definite, +which will always be true when the likelihood is log concave. +If `solver=2`, the solver makes a Cholesky decomposition of the covariance matrix $K(\phi)$. +Since a covariance matrix is always positive-definite, computing its +Cholesky decomposition is always a legal operation, at least in theory. +In practice, we may not be able to compute the Cholesky decomposition of the +negative Hessian or the covariance matrix, either because it does not exist or +because of numerical issues. +In that case, we can use `solver=3`, which uses a more expensive but less +specialized approach to compute a Newton step. * `max_steps_linesearch`: maximum number of steps in linesearch. The linesearch -method tries to insure that the Newton step leads to a decrease in the -objective function. If the Newton step does not improve the objective function, -the step is repeatedly halved until the objective function decreases or the -maximum number of steps in the linesearch is reached. By default, -`max_steps_linesearch=0`, meaning no linesearch is performed. +adjusts the step size to ensure that a Newton step leads to an increase in +the objective function (i.e., $f(\theta) = p(\theta \mid \phi, y)$). 
+If a standard Newton step does not improve the objective function, +the step is adjusted iteratively until the objective function increases +or the maximum number of steps in the linesearch is reached. +By default, `max_steps_linesearch=1000`. +Setting `max_steps_linesearch=0` results in no linesearch. + +* `allow_fallback`: If the user-specified solver fails, this flag determines whether to fall back to the next solver. For example, if the user specifies `solver=1` but the Cholesky decomposition of the negative Hessian $- \partial^2 \log p(y \mid \theta, \phi) / \partial \theta^2$ fails, the optimizer will try `solver=2` instead. +By default, `allow_fallback = 1` (TRUE). + +The embedded Laplace approximation's options have a helper callable `generate_laplace_options(int theta_size)` that will generate the tuple for you. This can be useful for quickly setting up the control parameters in the `transformed data` block to reuse within the model. + +```stan +tuple(vector[theta_size], real, int, int, int, int, int) laplace_ops = generate_laplace_options(theta_size); +// Modify solver type +laplace_ops.5 = 2; +// Turn off fallback +laplace_ops.7 = 0; +``` + +The arguments stored in the `laplace_ops` tuple are: +``` +laplace_ops = {theta_init, + tol, + max_num_steps, + hessian_block_size, + solver, + max_steps_linesearch, + allow_fallback} +``` {{< since 2.37 >}} @@ -253,18 +285,18 @@ approximation of $p(\theta \mid \phi, y)$ using `laplace_latent_rng`. 
The signature for `laplace_latent_rng` follows closely the signature for `laplace_marginal`: - -\index{{\tt \bfseries laplace\_latent\_rng }!{\tt (function likelihood\_function, tuple(...), function covariance\_function, tuple(...), vector theta\_init): vector}|hyperpage} + +\index{{\tt \bfseries laplace\_latent\_rng }!{\tt (function likelihood\_function, tuple(...), int hessian_block_size, function covariance\_function, tuple(...)): vector}|hyperpage} -`vector` **`laplace_latent_rng`**`(function likelihood_function, tuple(...), function covariance_function, tuple(...))`
\newline +`vector` **`laplace_latent_rng`**`(function likelihood_function, tuple(...), int hessian_block_size, function covariance_function, tuple(...))`
\newline -Draws approximate samples from the conditional posterior $p(\theta \mid y, \phi)$. +Draws samples from the Laplace approximation to the conditional posterior $p(\theta \mid y, \phi)$. {{< since 2.37 >}} Once again, it is possible to specify control parameters: -\index{{\tt \bfseries laplace\_latent\_tol\_rng }!{\tt (function likelihood\_function, tuple(...), function covariance\_function, tuple(...), vector theta\_init, real tol, int max\_steps, int hessian\_block\_size, int solver, int max\_steps\_linesearch): vector}|hyperpage} +\index{{\tt \bfseries laplace\_latent\_tol\_rng }!{\tt (function likelihood\_function, tuple(...), int hessian_block_size, function covariance\_function, tuple(...), tuple(...) laplace_ops): vector}|hyperpage} -`vector` **`laplace_latent_tol_rng`**`(function likelihood_function, tuple(...), function covariance_function, tuple(...), vector theta_init, real tol, int max_steps, int hessian_block_size, int solver, int max_steps_linesearch)`
\newline +`vector` **`laplace_latent_tol_rng`**`(function likelihood_function, tuple(...), int hessian_block_size, function covariance_function, tuple(...), tuple(...) laplace_ops)`
\newline Draws approximate samples from the conditional posterior $p(\theta \mid y, \phi)$ and allows the user to tune the control parameters of the approximation. {{< since 2.37 >}} diff --git a/src/functions-reference/functions_index.qmd b/src/functions-reference/functions_index.qmd index 870df9d52..ad18c3d42 100644 --- a/src/functions-reference/functions_index.qmd +++ b/src/functions-reference/functions_index.qmd @@ -1713,7 +1713,7 @@ pagetitle: Alphabetical Index **laplace_latent_rng**: - -
[`(function likelihood_function, tuple(...), function covariance_function, tuple(...)) : vector`](embedded_laplace.qmd#index-entry-6d0685309664591fc32d3e2a2304af7aa5459e1c) (embedded_laplace.html)
+ -
[`(function likelihood_function, tuple(...), int hessian_block_size, function covariance_function, tuple(...)) : vector`](embedded_laplace.qmd#index-entry-cacb1d5344e246ec89c460e00a6f1065f4a8a1c1) (embedded_laplace.html)
**laplace_latent_tol_bernoulli_logit_rng**: @@ -1731,11 +1731,6 @@ pagetitle: Alphabetical Index -
[`(array[] int y, array[] int y_index, vector m, function covariance_function, tuple(...), vector theta_init, real tol, int max_steps, int hessian_block_size, int solver, int max_steps_linesearch) : vector`](embedded_laplace.qmd#index-entry-97f692748ebfabf0574b7a8e2edaceb940ee4b6b) (embedded_laplace.html)
-**laplace_marginal**: - - -
[`(function likelihood_function, tuple(...) likelihood_arguments, function covariance_function, tuple(...) covariance_arguments) : real`](embedded_laplace.qmd#index-entry-6da15f0ed076016d814cdc278127896f99d29633) (embedded_laplace.html)
- - **laplace_marginal_bernoulli_logit**: -
[distribution statement](embedded_laplace.qmd#index-entry-1d93ab799518d0aac88e63c01f9655f36c7cbeb6) (embedded_laplace.html)
@@ -1781,11 +1776,6 @@ pagetitle: Alphabetical Index -
[`(array[] int y | array[] int y_index, vector m, function covariance_function, tuple(...)) : real`](embedded_laplace.qmd#index-entry-c092314a5f45deef27ce0e8930a7d28c87ca601d) (embedded_laplace.html)
-**laplace_marginal_tol**: - - -
[`(function likelihood_function, tuple(...), function covariance_function, tuple(...), vector theta_init, real tol, int max_steps, int hessian_block_size, int solver, int max_steps_linesearch) : real`](embedded_laplace.qmd#index-entry-0f4bd0330deef2db7884dc5a4c933f181e1f2a8c) (embedded_laplace.html)
- - **laplace_marginal_tol_bernoulli_logit**: -
[distribution statement](embedded_laplace.qmd#index-entry-92d43f7c3643c7d85966bbfc0b2a71546facb37c) (embedded_laplace.html)
diff --git a/src/reference-manual/laplace.qmd b/src/reference-manual/laplace.qmd index 092b8c479..06ac96432 100644 --- a/src/reference-manual/laplace.qmd +++ b/src/reference-manual/laplace.qmd @@ -14,14 +14,14 @@ to the constrained space before outputting them. Given the estimate of the mode $\widehat{\theta}$, the Hessian $H(\widehat{\theta})$ is computed using -central finite differences of the model functor. +central finite differences of the model functor. Next the algorithm computes the Cholesky factor of the negative inverse Hessian: $R^{-1} = \textrm{chol}(-H(\widehat{\theta})) \backslash \mathbf{1}$. Each draw is generated on the unconstrained scale by sampling -$\theta^{\textrm{std}(m)} \sim \textrm{normal}(0, \textrm{I})$ +$\theta^{\textrm{std}(m)} \sim \textrm{normal}(0, \textrm{I})$ and defining draw $m$ to be diff --git a/src/reference-manual/laplace_embedded.qmd b/src/reference-manual/laplace_embedded.qmd index 3912ccc8a..5a9fdd542 100644 --- a/src/reference-manual/laplace_embedded.qmd +++ b/src/reference-manual/laplace_embedded.qmd @@ -4,173 +4,191 @@ pagetitle: Embedded Laplace Approximation # Embedded Laplace Approximation -Stan provides functions to perform an embedded Laplace -approximation for latent Gaussian models, following the procedure described -by @RasmussenWilliams:2006 and @Rue:2009. This approach is often refered to -as the integrated or nested Laplace approximation, although the exact details -of the method can vary. The details of Stan's implementation can be found in -references [@Margossian:2020; @Margossian:2023]. - -A standard approach to fit a latent Gaussian model would be to perform inference -jointly over the latent Gaussian variables and the hyperparameters. -Instead, the embedded Laplace approximation can be used to do *approximate* -marginalization of the latent Gaussian variables; we can then -use any inference over the remaining hyperparameters, for example Hamiltonian -Monte Carlo sampling. 
- -Formally, consider a latent Gaussian model, +The embedded Laplace approximation replaces explicit sampling of high-dimensional Gaussian latent variables with a local Gaussian approximation, marginalizing them out so that inference proceeds over the remaining hyperparameters alone. +This approach is often referred to as the integrated or nested Laplace approximation, although the exact details of the method can vary. +The details of Stan's implementation can be found in references [@Margossian:2020; @Margossian:2023]. + +A standard approach to fit a latent Gaussian model would be to perform inference jointly over the latent Gaussian variables and the hyperparameters. +Instead, the embedded Laplace approximation can be used to do *approximate* marginalization of the latent Gaussian variables; we can then use any inference over the remaining hyperparameters. +By marginalizing out the latent variables, the sampler explores a lower-dimensional, better-behaved marginal posterior. +Individual iterations are more expensive (each requires an inner optimization), but the sampler typically needs far fewer iterations to achieve the same effective sample size. + +For complete function signatures and the built-in likelihood wrappers (Poisson, Negative Binomial, Bernoulli), see the [Embedded Laplace functions reference](../functions-reference/embedded_laplace.qmd). +For worked examples with full data blocks, see the [Gaussian Processes chapter](../stan-users-guide/gaussian-processes.qmd#poisson-gp-using-an-embedded-laplace-approximation). + + +## Latent Gaussian models + +The embedded Laplace approximation is used for latent Gaussian models. A latent Gaussian model is defined by three main components: + +- $\phi$: hyperparameters (e.g., GP kernel length-scale and magnitude, or variance components in a hierarchical model), +- $\theta$: latent Gaussian variables (the high-dimensional quantity to be marginalized out), +- $y$: observed data. 
+ +These components are related through a hierarchical structure. +The hyperparameters $\phi$ are given a prior $p(\phi)$. +The latent variables $\theta$ have a multivariate normal prior with covariance matrix $K(\phi)$. +The observations $y$ are generated according to a likelihood $p(y \mid \theta, \phi)$. +The prior on $\theta$ is centered at zero; an offset can be incorporated into the likelihood function if a non-zero mean is needed. + \begin{eqnarray*} \phi & \sim & p(\phi) \\ \theta & \sim & \text{Multi-Normal}(0, K(\phi)) \\ y & \sim & p(y \mid \theta, \phi). \end{eqnarray*} -The motivation for marginalization is to bypass the challenging geometry of the joint -posterior $p(\phi, \theta \mid y)$. This geometry (e.g. funnels) often frustrates -inference algorithms, including Hamiltonian Monte Carlo sampling and approximate -methods such as variational inference. On the other hand, the marginal posterior -$p(\phi \mid y)$ is often well-behaved and in many cases low-dimensional. -Furthermore, the conditional posterior $p(\theta \mid \phi, y)$ can be well -approximated by a normal distribution, if the likelihood $p(y \mid \theta, \phi)$ -is log concave. +The generative model above defines a joint distribution over all three quantities, $p(\phi, \theta, y) = p(\phi) \, p(\theta \mid \phi) \, p(y \mid \theta, \phi)$. +After observing data $y$, Bayes' theorem gives the joint posterior $p(\phi, \theta \mid y) \propto p(\phi) \, p(\theta \mid \phi) \, p(y \mid \theta, \phi)$. + +Sampling directly from the joint posterior $p(\phi, \theta \mid y)$ of this model is often difficult. +Challenging geometries (e.g., funnels) frustrate inference algorithms, including Hamiltonian Monte Carlo and variational inference. +However, the marginal posterior $p(\phi \mid y)$ is often well-behaved and low-dimensional, making it much easier to sample. 
+The Laplace approximation allows the conditional posterior $p(\theta \mid \phi, y)$ to be approximated by a normal distribution when the likelihood $p(y \mid \theta, \phi)$ is log-concave. ## Approximation of the conditional posterior and marginal likelihood -The Laplace approximation is the normal distribution that matches the mode -and curvature of the conditional posterior $p(\theta \mid y, \phi)$. -The mode, +The two-step inference strategy for using the embedded Laplace approximation in a latent Gaussian model requires approximations to both the conditional posterior $p(\theta \mid y, \phi)$ and the marginal likelihood $p(y \mid \phi)$. +The Laplace approximation is the normal distribution that matches the mode and curvature of the conditional posterior $p(\theta \mid y, \phi)$. +The mode, defined as the value of $\theta$ that maximizes the conditional posterior, is estimated by a Newton solver, $$ \theta^* = \underset{\theta}{\text{argmax}} \ p(\theta \mid y, \phi), $$ -is estimated by a Newton solver. Since the approximation is normal, -the curvature is matched by setting the covariance to the negative Hessian -of the log conditional posterior, evaluated at the mode, + +Since the approximation is normal, the curvature is matched by setting the covariance to the inverse of the negative Hessian of the log conditional posterior, evaluated at the mode, + $$ \Sigma^* = \left( - \left . \frac{\partial^2}{\partial \theta^2} \log p (\theta \mid \phi, y) \right |_{\theta =\theta^*} \right)^{-1}. $$ -The resulting Laplace approximation is then, + +The resulting Laplace approximation is a multivariate normal centered at the mode with covariance given by the inverse curvature, + $$ \hat p_\mathcal{L} (\theta \mid y, \phi) = \text{Multi-Normal}(\theta^*, \Sigma^*) \approx p(\theta \mid y, \phi). 
$$ -This approximation implies another approximation for the marginal likelihood, + +This approximation also yields an approximation to the marginal likelihood, obtained by evaluating the prior, likelihood, and approximate posterior at the mode $\theta^*$, + $$ \hat p_\mathcal{L}(y \mid \phi) := \frac{p(\theta^* \mid \phi) \ p(y \mid \theta^*, \phi) }{ \hat p_\mathcal{L} (\theta^* \mid \phi, y) } \approx p(y \mid \phi). $$ -Hence, a strategy to approximate the posterior of the latent Gaussian model -is to first estimate the marginal posterior -$\hat p_\mathcal{L}(\phi \mid y) \propto p(\phi) p_\mathcal{L} (y \mid \phi)$ + +Hence, a strategy to approximate the posterior of the latent Gaussian model is to first estimate the marginal posterior $\hat p_\mathcal{L}(\phi \mid y) \propto p(\phi) p_\mathcal{L} (y \mid \phi)$ using any algorithm supported by Stan. -Approximate posterior draws for the latent Gaussian variables are then -obtained by first drawing $\phi \sim \hat p_\mathcal{L}(\phi \mid y)$ and -then $\theta \sim \hat p_\mathcal{L}(\theta \mid \phi, y)$. +Approximate posterior draws for the latent Gaussian variables are then obtained by first drawing $\phi \sim \hat p_\mathcal{L}(\phi \mid y)$ and then $\theta \sim \hat p_\mathcal{L}(\theta \mid \phi, y)$. ## Trade-offs of the approximation -The embedded Laplace approximation presents several trade-offs with standard -inference over the joint posterior $p(\theta, \phi \mid y)$. The main -advantage of the embedded Laplace approximation is that it side-steps the -intricate geometry of hierarchical models. The marginal posterior -$p(\phi \mid y)$ can then be handled by Hamiltonian Monte Carlo sampling -without extensive tuning or reparameterization, and the mixing time is faster, -meaning we can run shorter chains to achieve a desired precision. In some cases, -approximate methods, e.g. 
variational inference, which -work poorly on the joint $p(\theta, \phi \mid y)$ work well on the marginal -posterior $p(\phi \mid y)$. - -On the other hand, the embedded Laplace approximation presents certain -disadvantages. First, we need to perform a Laplace approximation each time -the log marginal likelihood is evaluated, meaning each iteration -can be expensive. Secondly, the approximation can introduce non-negligible -error, especially with non-conventional likelihoods (note the prior -is always multivariate normal). How these trade-offs are resolved depends on -the application; see @Margossian:2020 for some examples. +The embedded Laplace approximation presents several trade-offs with standard inference over the joint posterior $p(\theta, \phi \mid y)$. +The main advantage of the embedded Laplace approximation is that it side-steps the intricate geometry of hierarchical models. +The marginal posterior $p(\phi \mid y)$ can then be handled by Hamiltonian Monte Carlo sampling without extensive tuning or reparameterization, and the mixing time is faster, meaning we can run shorter chains to achieve a desired precision. +One additional benefit is that approximate methods, e.g., variational inference, that work poorly on the joint $p(\theta, \phi \mid y)$ can work well on the marginal posterior $p(\phi \mid y)$. + +On the other hand, the embedded Laplace approximation presents certain disadvantages. +First, we need to perform a Laplace approximation each time the log marginal likelihood is evaluated, meaning each iteration can be expensive. +Secondly, the approximation can introduce non-negligible error, especially with non-conventional likelihoods (note the prior is always multivariate normal). +How these trade-offs are resolved depends on the application; see @Margossian:2020 for some examples. 
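+As a minimal sketch of this strategy (the variable and function names are illustrative placeholders, and the signatures follow the embedded Laplace functions reference), the marginal likelihood is approximated in the `model` block and latent draws are recovered in `generated quantities`:
+
+```stan
+model {
+  alpha ~ normal(0, 1);
+  rho ~ inv_gamma(5, 5);
+  // Increment the target with the approximate log marginal likelihood
+  // log p(y | phi); theta is marginalized out by the Laplace approximation.
+  target += laplace_marginal(likelihood_function, (y, x),
+                             1,  // hessian_block_size for a diagonal Hessian
+                             covariance_function, (x, alpha, rho));
+}
+generated quantities {
+  // Draw the latent vector from the normal approximation to p(theta | y, phi).
+  vector[n] theta = laplace_latent_rng(likelihood_function, (y, x),
+                                       1,
+                                       covariance_function, (x, alpha, rho));
+}
+```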
+### When the approximation is appropriate + +The quality of the Laplace approximation depends on how close the true conditional posterior $p(\theta \mid y, \phi)$ is to Gaussian. + +**Works well.** Log-concave likelihoods, e.g., a Poisson with log link or a negative binomial with log link. +These produce unimodal conditional posteriors when combined with a Gaussian prior. +The approximation error is typically negligible for these likelihoods, especially with moderate-to-large counts [@Kuss:2005; @Vanhatalo:2010; @Cseke:2011; @Vehtari:2016]. + +**Works adequately.** Bernoulli with logit link. +The curvature information from binary observations is weaker, so the Gaussian approximation is less accurate than for count data. +The embedded Laplace approximation is still useful when $\theta$ is high-dimensional and joint sampling is infeasible; see @Vehtari:2016 and @Margossian:2020 for discussion. + +**Not appropriate.** The normal distribution is log-concave, but when the likelihood is normal, marginalization can be performed exactly and no approximation is needed. +For likelihoods that are not log-concave in $\theta$, the conditional posterior may be multimodal and the Laplace approximation will miss modes. +When $\theta$ is low-dimensional (a few dozen or fewer), the overhead of the inner optimization may not pay for itself and standard joint HMC sampling is often adequate. ## Details of the approximation +When the embedded Laplace approximation does not converge or produces unexpected results, the solver configuration may need adjustment. +This section describes the internals of the Newton solver and the options available for tuning it. + ### Tuning the Newton solver -A critical component of the embedded Laplace approximation is the Newton solver -used to estimate the mode $\theta^*$ of $p(\theta \mid \phi, y)$. 
The objective
-function being maximized is
+A critical component of the embedded Laplace approximation is the Newton solver used to estimate the mode $\theta^*$ of $p(\theta \mid \phi, y)$.
+The objective function being maximized is the log joint density of the prior and likelihood,
+
 $$
 \Psi(\theta) = \log p(\theta \mid \phi) + \log p(y \mid \theta, \phi),
 $$
-and convergence is declared if the change in the objective is sufficiently
-small between two iterations
+
+Convergence is declared when the change in the objective between successive iterations falls below a *tolerance* $\Delta$,
+
 $$
 | \Psi (\theta^{(i + 1)}) - \Psi (\theta^{(i)}) | \le \Delta,
 $$
-for some *tolerance* $\Delta$. The solver also stops after reaching a
-pre-specified *maximum number of steps*: in that case, Stan throws an exception
-and rejects the current proposal. This is not a problem, as
-long as these exceptions are rare and confined to early phases of the warmup.
-The Newton iteration can be augmented with a linesearch step to insure that
-at each iteration the objective function $\Psi$ decreases. Specifically,
-suppose that
+The solver also stops after reaching a pre-specified *maximum number of steps*.
+In that case, Stan throws a warning, but still returns the last iteration's parameters.
+If you see this warning, you should check the diagnostics to understand why the solver failed to converge.
+
+To handle cases where the Newton step does not improve the objective, the Newton iteration is augmented with a Wolfe linesearch to ensure that at each iteration the objective function $\Psi$ increases.
+Specifically, suppose the objective decreases after a Newton step, indicating the step overshot the mode,
+
 $$
 \Psi (\theta^{(i + 1)}) < \Psi (\theta^{(i)}).
 $$
-This can indicate that the Newton step is too large and that we skipped a region
-where the objective function decreases.
In that case, we can reduce the step
-length by a factor of 2, using
+
+This can indicate that the Newton step at iteration $i$ is too large and that we overshot the mode.
+In that case, we fall back to a linesearch along the Newton direction $p_i$ to find a step size $\alpha_k$ satisfying the Wolfe conditions.
+Writing $f = -\Psi$ for the negated objective, the Wolfe conditions require that an accepted step both sufficiently decreases $f$ and sufficiently flattens the slope of $f$ relative to the current position,
+
+$$
+\begin{aligned}
+f(\theta^{(i)} + \alpha_k p_i) &\le f(\theta^{(i)}) + c_1 \alpha_k \nabla f(\theta^{(i)})^T p_i, \\
+-p^T_i \nabla f(\theta^{(i)} + \alpha_k p_i) &\le -c_2 p^T_i \nabla f(\theta^{(i)}),
+\end{aligned}
+$$
+
+for constants $0 < c_1 < c_2 < 1$. Together these conditions push the algorithm towards a minimum of $f$, that is, a maximum of $\Psi$.
+In its simplest form, the linesearch reduces the step length by halving,
+
 $$
 \theta^{(i + 1)} \leftarrow \frac{\theta^{(i + 1)} + \theta^{(i)}}{2}.
 $$
-We repeat this halving of steps until
-$\Psi (\theta^{(i + 1)}) \ge \Psi (\theta^{(i)})$, or until a maximum number
-of linesearch steps is reached. By default, this maximum is set to 0, which
-means the Newton solver performs no linesearch. For certain problems, adding
-a linsearch can make the optimization more stable.
+We repeat this halving of steps until $\Psi (\theta^{(i + 1)}) \ge \Psi (\theta^{(i)})$, or until a maximum number of linesearch steps is reached.
+For certain problems, adding a linesearch can make the optimization and solver more stable.
+
+### Solver Strategies

-The embedded Laplace approximation uses a custom Newton solver, specialized
-to find the mode of $p(\theta \mid \phi, y)$.
-A keystep for efficient optimization is to insure all matrix inversions are
-numerically stable. This can be done using the Woodburry-Sherman-Morrison
-formula and requires one of three matrix decompositions:
+The embedded Laplace approximation uses a custom Newton solver, specialized to find the mode of $p(\theta \mid \phi, y)$.
+A key step for efficient optimization is to ensure all matrix inversions are numerically stable.
+This can be done using the Sherman-Morrison-Woodbury formula and requires one of three matrix decompositions:

-1. Cholesky decomposition of the Hessian of the negative log likelihood
-$W = - \partial^2_\theta \log p(y \mid \theta, \phi)$
+1. Cholesky decomposition of the Hessian of the negative log likelihood $W = - \partial^2_\theta \log p(y \mid \theta, \phi)$.
 2. Cholesky decomposition of the prior covariance matrix $K(\phi)$.
 3. LU-decomposition of $I + KW$, where $I$ is the identity matrix.

-The first solver (1) should be used if the negative log likelihood is
-positive-definite. Otherwise the user should rely on (2). In rarer cases where
-it is not numerically safe to invert the covariance matrix $K$, users can
-use the third solver as a last-resort option.
+The first solver (1) should be used if the Hessian of the negative log likelihood, $W$, is positive-definite.
+Otherwise the user should rely on (2).
+In rarer cases where it is not numerically safe to invert the covariance matrix $K$, users can use the third solver as a last-resort option.

 ### Sparse Hessian of the log likelihood

-A key step to speed up computation is to take advantage of the sparsity of
-the Hessian of the log likelihood,
+A key step to speed up computation is to take advantage of the sparsity of $H$, the Hessian of the log likelihood with respect to the latent variables,
 $$
 H = \frac{\partial^2}{\partial \theta^2} \log p(y \mid \theta, \phi).
 $$
-For example, if the observations $(y_1, \cdots, y_N)$ are conditionally
-independent and each depends on only depend on one component of $\theta$,
-such that
+For example, if the observations $(y_1, \cdots, y_N)$ are conditionally independent and each depends on only one component of $\theta$, the log likelihood decomposes into a sum of per-observation terms,
 $$
 \log p(y \mid \theta, \phi) = \sum_{i = 1}^N \log p(y_i \mid \theta_i, \phi),
 $$
-then the Hessian is diagonal. This leads to faster calculations of the Hessian
-and subsequently sparse matrix operations.
This case is common in Gaussian
-process models, and certain hierarchical models.
+and the Hessian is diagonal.
+This leads to faster calculations of the Hessian and subsequently sparse matrix operations.
+This case is common in Gaussian process models and certain hierarchical models.

-Stan's suite of functions for the embedded Laplace approximation are not
-equipped to handle arbitrary sparsity structures; instead, they work on
-block-diagonal Hessians, and the user can specify the size $B$ of these blocks.
-The user is responsible for working out what $B$ is. If the Hessian is dense,
-then we simply set $B = N$.
+Stan's suite of functions for the embedded Laplace approximation exploits block-diagonal structure in the Hessian, where the user specifies the block size $B$.
+The user is responsible for working out what $B$ is.
+If the Hessian is dense, then we simply set $B = N$; the diagonal case above corresponds to $B = 1$.
+Arbitrary sparsity patterns beyond block-diagonal structure are not currently supported.

-NOTE: currently, there is no support for sparse prior covariance matrix.
-We expect this to be supported in future versions of Stan.
diff --git a/src/stan-users-guide/gaussian-processes.qmd b/src/stan-users-guide/gaussian-processes.qmd
index 58f05db3b..ffbde0880 100644
--- a/src/stan-users-guide/gaussian-processes.qmd
+++ b/src/stan-users-guide/gaussian-processes.qmd
@@ -476,7 +476,19 @@ parameters {
   vector[N] eta;
 }
 model {
-  // ...
+  vector[N] f;
+  {
+    matrix[N, N] L_K;
+    matrix[N, N] K = gp_exp_quad_cov(x, alpha, rho);
+
+    // add a small jitter delta to the diagonal for numerical stability
+    for (n in 1:N) {
+      K[n, n] = K[n, n] + delta;
+    }
+
+    L_K = cholesky_decompose(K);
+    f = L_K * eta;
+  }
   rho ~ inv_gamma(5, 5);
   alpha ~ std_normal();
   a ~ std_normal();
@@ -510,7 +522,7 @@
 the marginal likelihood as follows:
 $$
 \hat p_\mathcal{L}(y \mid \rho, \alpha, a)
  = \frac{p(f^* \mid \alpha, \rho) p(y \mid f^*, a)}{
-       \hat p_\mathcal{L}(f \mid \rho, \alpha, a, y)},
+       \hat p_\mathcal{L}(f^* \mid \rho, \alpha, a, y)},
 $$
 where $f^*$ is the mode of $p(f \mid \rho, \alpha, a, y)$, obtained via
 numerical optimization.
@@ -533,7 +545,24 @@ functions {
 }
 ```

-We then increment `target` in the model block with the approximation to
+The embedded Laplace approximation relies on calculations of the log
+likelihood's Hessian, $\partial^2 \log p(y \mid f, a) / \partial f^2$, and
+these calculations can be much faster when the Hessian is sparse. In
+particular, it is expected that the Hessian is block diagonal. In the
+`transformed data` block we can specify the block size of the Hessian.
+```stan
+transformed data {
+  int hessian_block_size = 1;
+}
+```
+For example, if $y_i$ depends only on $f_i$, then the Hessian of the log
+likelihood is diagonal and the block size is 1. On the other hand, if the
+Hessian is not sparse, then we set the Hessian block size
+to $N$, where $N$ is the dimension of $f$. Currently, Stan does not check
+the block size of the Hessian and so the user is responsible for correctly
+specifying the block size.
+
+Finally, we increment `target` in the model block with the approximation to
 $\log p(y \mid \rho, \alpha, a)$.
```stan
model {
@@ -541,7 +570,7 @@
   alpha ~ std_normal();
   sigma ~ std_normal();

-  target += laplace_marginal(ll_function, (a, y),
+  target += laplace_marginal(ll_function, (a, y), hessian_block_size,
                              cov_function, (rho, alpha, x, N, delta));
 }
 ```
@@ -549,51 +578,56 @@
 Notice that we do not need to construct $f$ explicitly, since it is
 marginalized out. Instead, we recover the GP function in `generated quantities`:
 ```stan
 generated quantities {
-  vector[N] f = laplace_latent_rng(ll_function, (a, y),
+  vector[N] f = laplace_latent_rng(ll_function, (a, y), hessian_block_size,
                                    cov_function, (rho, alpha, x, N, delta));
 }
 ```
 Users can set the control parameters of the embedded Laplace approximation,
 via `laplace_marginal_tol` and `laplace_latent_tol_rng`. When using these
-functions, the user must set *all* the control parameters.
+functions, the user must set *all* the control options and store them in a tuple.
+These control parameters mostly concern the numerical optimizer used to find
+the mode $f^*$ of $p(f \mid \rho, \alpha, a, y)$.
 ```stan
 transformed data {
-// ...
+  tuple(vector[N], real, int, int, int, int) laplace_ops;
+  laplace_ops.1 = rep_vector(0, N);  // starting point for Laplace optimizer
+  laplace_ops.2 = 1.49e-8;  // tolerance for optimizer
+  laplace_ops.3 = 500;  // maximum number of steps for optimizer
+  laplace_ops.4 = 1;  // solver type (1, 2, or 3)
+  laplace_ops.5 = 1000;  // maximum number of steps for linesearch
+  laplace_ops.6 = 1;  // allow_fallback (1: TRUE, 0: FALSE)
+}
+```
+If users want to depart from the defaults for only some of the control
+parameters, a tuple with the default values (as above) can be created with the
+helper function `generate_laplace_options()`, and the specific control
+parameter can then be modified,
+```stan
+transformed data {
+  tuple(vector[N], real, int, int, int, int) laplace_ops =
+    generate_laplace_options(N);

-  vector[N] f_init = rep_vector(0, N); // starting point for optimizer.
-  real tol = 1e-6; // optimizer's tolerance for Laplace approx.
-  int max_num_steps = 1e3; // maximum number of steps for optimizer.
-  int hessian_block_size = 1; // when hessian of log likelihood is block
-                              // diagonal, size of block (here 1).
-  int solver = 1; // which Newton optimizer to use; default is 1,
-                  // use 2 and 3 only for special cases.
-  max_steps_linesearch = 0; // if >= 1, optimizer does a lineseach with
-                            // specified number of steps.
+  laplace_ops.2 = 1e-6;  // make the optimizer's tolerance less strict
 }
-
-// ...
-
+```
+The tuple `laplace_ops` is then passed to `laplace_marginal_tol`
+and `laplace_latent_tol_rng`.
+```stan
 model {
   // ...
-  target += laplace_marginal(ll_function, (a, y),
-                             cov_function, (rho, alpha, x, N, delta),
-                             f_init, tol, max_num_steps, hessian_block_size,
-                             solver, max_steps_linesearch);
+  target += laplace_marginal_tol(ll_function, (a, y), hessian_block_size,
+                                 cov_function, (rho, alpha, x, N, delta),
+                                 laplace_ops);
 }

 generated quantities {
-  vector[N] f = laplace_latent_rng(ll_function, (a, y),
+  vector[N] f = laplace_latent_tol_rng(ll_function, (a, y), hessian_block_size,
                                    cov_function, (rho, alpha, x, N, delta),
-                                   f_init, tol, max_num_steps,
-                                   hessian_block_size, solver,
-                                   max_steps_linesearch);
+                                   laplace_ops);
 }
 ```

-For details about the control parameters, see @Margossian:2023.
-
 Stan also provides support for a limited menu of built-in functions, including
-the Poisson distribution with a log link and and prior mean $m$. When using such
+the Poisson distribution with a log link and prior mean $m$. When using such
@@ -696,13 +730,19 @@ functions {

 // ...

+transformed data {
+  int hessian_block_size = 1;
+}
+
+// ...
+
 model {
-  target += laplace_marginal(ll_function, (a, z),
+  target += laplace_marginal(ll_function, (a, z), hessian_block_size,
                              cov_function, (rho, alpha, x, N, delta));
 }

 generated quantities {
-  vector[N] f = laplace_latent_rng(ll_function, (a, z),
+  vector[N] f = laplace_latent_rng(ll_function, (a, z), hessian_block_size,
                                    cov_function, (rho, alpha, x, N, delta));
 }
 ```
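The diagonal-Hessian structure that justifies `hessian_block_size = 1` in the examples above can be checked numerically. The sketch below is illustrative Python, not part of Stan; it finite-differences a conditionally independent Poisson log-link log likelihood and confirms that the off-diagonal Hessian entries vanish while the diagonal entries equal $-\exp(\theta_i)$:

```python
import math

def log_lik(theta, y):
    # Conditionally independent Poisson observations with a log link:
    # log p(y | theta) = sum_i [ y_i * theta_i - exp(theta_i) - log(y_i!) ]
    return sum(yi * ti - math.exp(ti) - math.lgamma(yi + 1.0)
               for yi, ti in zip(y, theta))

def hessian(f, x, eps=1e-5):
    # Dense central-difference Hessian; fine for this tiny check.
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                z = list(x)
                z[i] += si * eps
                z[j] += sj * eps
                return f(z)
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4.0 * eps * eps)
    return H

y = [3, 0, 5]
theta = [0.7, -0.2, 1.1]
H = hessian(lambda t: log_lik(t, y), theta)
# Off-diagonal entries vanish and diagonal entries are -exp(theta_i),
# so the Hessian block size B is 1 for this likelihood.
for i in range(3):
    for j in range(3):
        expected = -math.exp(theta[i]) if i == j else 0.0
        assert abs(H[i][j] - expected) < 1e-4
```

If each observation instead depended on a group of $B$ latent variables, the same check would show dense $B \times B$ blocks on the diagonal, and the block size passed to the `laplace_marginal` functions would be $B$.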