... utilization 10% and 12.3 GB RAM.
Training after 40 minutes with 500 runs: first output. Process canceled manually. Checkpoint 500 is transferred to wormhole.app.
Fourth attempt: save_steps further reduced to 200.
Training after 1.05 minutes: checkpoint 200 generated.
Training after 12.05 minutes, ...