Dynamics (flake.dynamics)

Overdamped Langevin dynamics for a rigid cluster on a substrate.

Physics

Equations of motion:

\[\begin{split}\eta_t \frac{dx_\mathrm{cm}}{dt} &= F_x^\mathrm{sub}(x,y,\theta) + F_x^\mathrm{ext} + \xi_x(t) \\ \eta_t \frac{dy_\mathrm{cm}}{dt} &= F_y^\mathrm{sub}(x,y,\theta) + F_y^\mathrm{ext} + \xi_y(t) \\ \eta_r \frac{d\theta}{dt} &= \tau^\mathrm{sub}(x,y,\theta) + \tau^\mathrm{ext} + \xi_r(t)\end{split}\]

where \(\eta_t = \eta N\) and \(\eta_r = \eta \sum_i r_i^2\).

Fluctuation-dissipation theorem:

\[\langle \xi_i(t)^2 \rangle = 2 k_B T \, \eta_i \quad \text{(variance per unit time)}\]

Integration

Euler-Maruyama step:

\[\delta_i = \frac{F_i^\mathrm{total}}{\eta_i} \, dt + \sqrt{\frac{2 k_B T}{\eta_i \, dt}} \; \xi_i \, dt, \quad \xi_i \sim \mathcal{N}(0,1)\]

The displacement variance is \(\langle \delta_i^2 \rangle = 2 k_B T / \eta_i \cdot dt = 2 D_i \, dt\) (correct FDT).

At \(k_B T = 0\) the noise term vanishes and this reduces to explicit Euler gradient descent. The step size dt is fixed; check convergence by halving it. Avoid \(k_B T = 0\) exactly: a deterministic trajectory at a saddle point is sensitive to floating-point rounding. Use kBT=1e-8 (for epsilon=1) instead.

JIT path vs Python path

When stop_fn and output_fn are both None, the inner EM loop runs entirely in Numba JIT via _md_loop_njit — no Python/JIT boundary per step. calc_en_f must have been produced by substrate_from_params, which attaches _jit_core and _jit_params to the closure.

When stop_fn or output_fn is provided, the Python loop path is used instead. This is ~6–11× slower but supports arbitrary Python callbacks.

Public API

run_md – Euler-Maruyama integrator for all \(k_B T \geq 0\).

flake.dynamics.run_md(pos, calc_en_f, en_params, eta, Fx=0.0, Fy=0.0, Tau=0.0, kBT=0.0, dt=0.0001, n_steps=10000, theta0=0.0, pos_cm0=None, print_every=100, stop_fn=None, output_fn=None, seed=12345)

Run overdamped Langevin MD for a rigid cluster on a substrate.

Uses the Euler-Maruyama integrator for all kBT >= 0. At kBT=0 the noise term vanishes and the step reduces to explicit Euler gradient descent.

Every print_every steps a state snapshot is emitted. If output_fn is provided it is called with (step, t, state_dict) and run_md returns None; otherwise snapshots accumulate in memory and are returned as a dict of arrays.

Performance note:

When stop_fn and output_fn are both None, the inner loop runs in Numba JIT (_md_loop_njit) with no Python overhead per step. When either callback is provided, the slower Python loop runs instead. For maximum throughput, avoid callbacks and post-process the returned trajectory dict.

Args:

pos: (N, 2) ndarray – cluster positions in the cluster frame (CM at origin). calc_en_f: callable – substrate energy function from substrate_from_params.

Must expose _jit_core and _jit_params attributes.

en_params: list – extra arguments for calc_en_f (usually []). eta: float – single-particle drag coefficient. Fx: float – constant external force along x (default 0). Fy: float – constant external force along y (default 0). Tau: float – constant external torque (default 0). kBT: float – thermal energy; 0 for a T=0 run. dt: float – timestep. n_steps: int – number of MD steps. theta0: float – initial orientation in degrees. pos_cm0: (2,) array-like – initial CM position (default (0, 0)). print_every: int – emit a snapshot every this many steps. stop_fn: callable or None – stop_fn(step, state_dict) -> bool;

forces Python loop path.

output_fn: callable or None – output_fn(step, t, state_dict) -> None;

forces Python loop; run_md returns None.

seed: int – RNG seed.

Returns:

None if output_fn is provided. Otherwise a dict with keys 't' (time), 'pos_cm' (CM position), 'theta' (degrees), 'energy', 'force', 'torque', 'vel_cm', 'omega'.

Raises:

ValueError: if eta_r == 0 and (Tau != 0 or kBT > 0). NotImplementedError: if calc_en_f lacks _jit_core/_jit_params

and no callbacks are provided.

Sweep infrastructure (flake.sweep)

Parameter sweeps over MD simulations for rigid-cluster systems.

This module is the time-dependent analogue of maps.py: instead of scanning a static energy landscape, it runs one full MD trajectory per parameter point and collects observables.

Public API

sweep_md – run one MD simulation per point in a sweep spec. grid_sweep – Cartesian-product sweep spec from axis ranges. line_sweep – explicit point-by-point sweep spec from a column table. force_sweep – sweep force magnitude at a fixed angle. concat_sweeps – merge multiple specs, removing duplicates. last_state – post_fn factory: extract last snapshot. mean_velocity – post_fn factory: mean |v_cm| over a tail window. drift_velocity – post_fn factory: net CM displacement / total time. drift_omega – post_fn factory: net angular displacement / total time. load_sweep – reload all runs from a saved sweep directory. load_sweep_points – load only the runs matching a list of grid points. filter_sweep – remove None-result entries from load_sweep output.

I/O convention

Each run writes two files:

outdir/run_NNNN/traj.h5 – trajectory arrays (HDF5 via h5py) outdir/run_NNNN/params.yaml – run parameters (YAML via PyYAML)

Neither h5py nor yaml is imported at module level; both are guarded inside the functions that use them.

flake.sweep.concat_sweeps(*specs)

Concatenate multiple sweep specs, removing duplicates.

Duplicates are detected by dict content (all key-value pairs equal). Order is preserved; later occurrences of a duplicate are dropped.

Args:

specs: sweep spec lists (output of grid_sweep, line_sweep, etc.).

Returns:

list of dict.

Example:

spec = concat_sweeps(
    force_sweep(np.linspace(0, 0.3, 10), phi_deg=0.0),
    force_sweep(np.linspace(0, 0.3, 10), phi_deg=30.0),
)
flake.sweep.drift_omega()

Return a post_fn that computes the mean angular drift rate.

Defined as (theta_final - theta_initial) / (t_final - t_initial), in degrees per time unit. Zero when the cluster is rotationally pinned.

Returns:

callable post_fn(traj_dict, run_params) -> float.

Example:

results = sweep_md(…, post_fn=drift_omega()) omega_drift = np.array([r[‘result’] for r in results])

flake.sweep.drift_velocity()

Return a post_fn that computes the 2D drift velocity of the CM.

Defined as (pos_cm_final - pos_cm_initial) / (t_final - t_initial). This is the physically meaningful quantity for translational depinning: zero vector when the cluster is pinned, nonzero above threshold.

Returns:

callable post_fn(traj_dict, run_params) -> (2,) float64 ndarray.

Example:

results = sweep_md(…, post_fn=drift_velocity()) vdrift = np.array([r[‘result’] for r in results]) # shape (n_runs, 2)

flake.sweep.filter_sweep(loaded, warn_fraction=0.1)

Remove None-result entries from load_sweep output.

Emits a WARNING if the fraction of None entries exceeds warn_fraction (default 10%).

Args:

loaded: list from load_sweep. warn_fraction: float – threshold for the warning.

Returns:

list with None-result entries removed.

flake.sweep.force_sweep(F_vals, phi_deg)

Sweep the external force magnitude at a fixed angle.

Decomposes F into Fx = F * cos(phi), Fy = F * sin(phi).

Args:

F_vals: array-like – force magnitudes to sweep. phi_deg: float – force direction in degrees from x-axis.

Returns:

list of dict with keys ‘Fx’ and ‘Fy’.

Example:

spec = force_sweep(np.linspace(0, 0.5, 20), phi_deg=30.0)

flake.sweep.grid_sweep(axes)

Cartesian product of parameter ranges.

Args:
axes: dict mapping parameter name -> list or array of values.

Example: {‘Fx’: [0.0, 0.1, 0.2], ‘Tau’: [0.0, 1.0]} gives 6 points covering all (Fx, Tau) combinations.

Returns:

list of dict, one per grid point.

Example:
spec = grid_sweep({‘Fx’: np.linspace(0, 0.5, 10),

‘Tau’: [0.0, 1.0]})

flake.sweep.last_state(keys=None)

Return a post_fn that extracts the last snapshot from the trajectory.

Args:
keys: list of str or None – which traj_dict keys to extract.

If None, extracts: ‘pos_cm’, ‘theta’, ‘energy’, ‘vel_cm’, ‘omega’.

Returns:

callable post_fn(traj_dict, run_params) -> dict of last values.

Example:

results = sweep_md(…, post_fn=last_state([‘pos_cm’, ‘theta’]))

flake.sweep.line_sweep(table)

Explicit list of points from a column table.

Each row is one run; each key is one column. Equivalent to the old ‘setf’ file approach: load a text file, pass the columns as arrays.

Args:
table: dict mapping parameter name -> list or array of values.

All lists must have the same length. Example: {‘Tau’: [0, 1, 2], ‘Fx’: [0, 0, 0]} gives 3 points.

Returns:

list of dict, one per row.

Raises:

ValueError: if lists have different lengths.

Example:

data = np.loadtxt(‘setf.dat’) # columns: Tau, Fx, Fy spec = line_sweep({‘Tau’: data[:,0], ‘Fx’: data[:,1], ‘Fy’: data[:,2]})

flake.sweep.load_sweep(outdir)

Load all runs from a sweep output directory.

Scans for run_NNNN/ subdirectories in numeric order. Every directory that matches the pattern produces one entry in the returned list – runs are never silently dropped. Missing or incomplete runs produce an entry with result=None and a WARNING; the caller is responsible for filtering (see filter_sweep).

Args:

outdir: str – directory written by sweep_md with save=True.

Returns:

list of dict, one per run_NNNN directory found. Each dict has keys 'params' (run parameters or None), 'result' (full traj dict or None), and 'run_dir'.

Example:

raw     = load_sweep('output/my_sweep')
results = filter_sweep(raw)
for r in results:
    print(r['params']['Fx'], r['result']['energy'][-1])
flake.sweep.load_sweep_points(outdir, grid_points, rtol=1e-05, atol=0.0)

Load only the runs whose parameters match a given list of grid points.

Each grid point is a dict of key-value pairs that must match the params.yaml of a run. A run matches a grid point when every key in the grid point is present in run_params and the values agree within the specified tolerance (np.isclose for floats, exact equality for everything else).

The function reads only params.yaml for non-matching runs (cheap); traj.h5 is loaded only for the runs that pass the filter.

Args:

outdir: str – sweep output directory (same as passed to sweep_md). grid_points: list of dict – each dict is one requested point.

Examples:

[{‘Fx’: 32.32}] – single Fx [{‘Fx’: 0.1, ‘Tau’: 5000}] – joint point [{‘Fx’: v} for v in [10, 20, 30]] – list of Fx grid_sweep({‘Fx’: Fx_vals, ‘Tau’: Tau_vals}) – 2D grid

rtol: float – relative tolerance for float comparisons

(np.isclose). Default 1e-5.

atol: float – absolute tolerance for float comparisons.

Default 0.0 (pure relative).

Returns:
list of dict, one per matched run, each with keys:

‘params’: dict of run parameters from params.yaml. ‘result’: full traj dict, or None if traj.h5 absent. ‘run_dir’: absolute path to the run directory.

Returned in disk order (sorted by run_NNNN index).

Raises:

ValueError: if grid_points is empty.

Example:

# Load only Fx in [10, 20, 30] from a force sweep: runs = load_sweep_points(

‘sweep_Fx_out’, [{‘Fx’: v} for v in [10.0, 20.0, 30.0]],

) for r in runs:

print(r[‘params’][‘Fx’], r[‘result’][‘pos_cm’][-1])

# Load a 2D grid subset using grid_sweep: runs = load_sweep_points(

‘sweep_Fx_tau_out’, grid_sweep({‘Fx’: [10.0, 20.0], ‘Tau’: [1000.0, 2000.0]}),

)

flake.sweep.mean_velocity(fraction=0.2)

Return a post_fn that computes mean CM speed over the last fraction of the trajectory.

Useful for pinned-vs-sliding detection: a pinned cluster has mean speed ~ 0; a sliding cluster has mean speed > 0.

Args:

fraction: float in (0, 1] – use last (fraction * n_snapshots) steps.

Returns:

callable post_fn(traj_dict, run_params) -> float (mean of |vel_cm| over the window).

Example:

results = sweep_md(…, post_fn=mean_velocity(fraction=0.5))

flake.sweep.sweep_md(pos, calc_en_f, en_params, sweep_spec, base_md_kwargs=None, post_fn=None, n_jobs=1, backend='loky', outdir='.', save=True, save_traj=True, overwrite=False, verbose=True)

Run one MD simulation per point in sweep_spec.

Each point in sweep_spec is a dict of keyword arguments that override base_md_kwargs for that run. Keys absent from the point dict keep the base_md_kwargs value. Keys not in run_md’s signature raise ValueError at construction time (not at run time).

All runs are independent (same starting state unless pos_cm0 is in the sweep spec). For adiabatic continuation (each run starts where the previous ended), chain run_md calls manually.

Args:

pos: (N, 2) ndarray – cluster positions (reference frame). calc_en_f: callable – substrate energy function. en_params: list – extra arguments for calc_en_f. sweep_spec: list of dict – one dict per run point. Each dict

overrides base_md_kwargs for that run. Duplicate points (same dict content) are silently removed after a warning.

base_md_kwargs: dict or None – default keyword arguments for run_md.

If None, run_md defaults are used. Do NOT include pos, calc_en_f, en_params here.

post_fn: callable or None
– post_fn(traj_dict, run_params) -> dict or scalar.

Called after each run. If None, the full trajectory is stored in ‘result’.

n_jobs: int – joblib parallel workers. Do NOT set n_jobs > 1

if run_md itself already uses parallelism. Default 1 (serial).

backend: str – joblib backend. Default ‘loky’ (spawn-based

separate processes, no GIL). Each worker recompiles @njit functions (~1-2 s overhead), so speedup requires runs of at least ~30 s each to amortise that cost. ‘threading’ does NOT help here: the EM loop has enough Python-level work per step (numpy ops, RNG) that threads serialize on the GIL and may be slower than serial.

outdir: str – directory for per-run output. Created if absent. save: bool – if True, write per-run output to outdir.

Each run writes:

outdir/run_NNNN/traj.h5 (if save_traj=True) outdir/run_NNNN/params.yaml

save_traj: bool – if True (default), write the full trajectory

to traj.h5 for each run. Set False when post_fn already extracts all needed information, to avoid storing gigabytes of unused data. params.yaml is always written.

overwrite: bool – if True, re-run and overwrite existing traj.h5

files (disables resume). Default False.

verbose: bool – if True (default), report progress. Uses tqdm

if installed (works in TTY and batch logs alike); otherwise prints one line per run to stdout. Set False to suppress all progress output.

Returns:

list of dict, one per run point (in the same order as sweep_spec after deduplication). Each dict contains:

‘params’: the full run_md kwargs used for this run. ‘result’: post_fn output if post_fn given, else the full traj dict. ‘run_dir’: path to the run directory (if save=True, else None).

Raises:
ValueError: if any key in sweep_spec or base_md_kwargs is not a valid

run_md keyword argument.

ImportError: if n_jobs != 1 and joblib is not installed.

Example:

spec = grid_sweep({‘Fx’: np.linspace(0, 0.5, 10), ‘kBT’: [1e-8]}) results = sweep_md(pos, calc_en_f, en_params, spec,

base_md_kwargs={‘eta’: 1.0, ‘n_steps’: 50000}, post_fn=drift_velocity(), n_jobs=4, backend=’threading’, outdir=’sweep_out’)