LECTURE: Building Co-located GPU Rigs Powered by Sustainable Renewable Energy & Battery Storage

Philip Smith-Lawrence | Oct 01, 2025

About This Presentation

Sixty GPU rigs humming beneath a solar canopy and a two‑megawatt‑hour battery bank redefine what 'always on' compute looks like.

For more information, please contact Philip Smith-Lawrence


Slide Content

Sixty GPU rigs humming beneath a solar canopy and a two‑megawatt‑hour battery bank redefine
what 'always on' compute looks like. Picture the hum — faint, steady — like a beehive of silicon.
Now imagine that hum sustained not by a carbon bill but by sun and wind and stored electrons.
That image is useful because it forces a practical question: how do we translate the romance of
renewables into the very unromantic requirements of rack power, thermal limits, and job
scheduling? Today we unpack that translation: the hardware that makes a GPU rig actually
compute, the spatial and network choreography of co‑location at scale, the renewable generation
and grid interface that furnish electrons, the battery and UPS architecture that keep those
electrons flowing when clouds gather, and the monitoring and operational practices that turn all
these moving parts from brittle prototypes into a resilient, efficient system.

Start with the building block: the GPU rig itself. A GPU rig for heavy compute is not a gaming
box with a flashy case and RGB lights; it is a purpose‑built tool. At its heart are the GPUs —
parallel processors optimized for matrix math and large‑scale floating point work. When I say
GPUs here, I'm thinking in terms of datacenter‑class cards or high‑end consumer models tuned
for throughput: high memory capacity, sustained FP32/FP16 performance, and robust cooling
tolerances. Around those GPUs sit the supporting cast: a CPU that coordinates I/O and drivers,
a motherboard that provides PCIe lanes and power delivery, system RAM sized for dataset
staging, and a storage subsystem for local scratch and caching. Each choice carries tradeoffs.
More PCIe lanes let you populate more cards per motherboard; more RAM reduces paging to
disk; faster NVMe lowers I/O stalls — but all of that increases power draw and heat output.

Take a concrete assembly scenario. Suppose we choose a four‑GPU node using high‑memory
GPUs. The motherboard must expose at least x16/x8 PCIe slots with reliable bifurcation. The
CPU should have enough PCIe lanes and I/O capacity — many server‑class CPUs, from both
AMD’s EPYC line and Intel’s Xeon families, advertise high lane counts for exactly this reason. If
you assemble four GPUs that each draw, say, 300–350 watts under load, the chassis and power
delivery must accommodate a sustained 1.2–1.4 kW of GPU draw per node, before counting the CPU,
storage, and fans. That means high‑efficiency PSUs,
redundant rails if you want fault tolerance, and thoughtful distribution of power cables to PDUs
to avoid single‑point overloads. Cooling becomes central: air cooling can work if airflow is
engineered — ducting, directed fans, blanking panels — but as density increases, liquid cooling
often becomes more efficient. Liquid systems reduce GPU junction temperatures and can lead to
more predictable sustained performance under heavy workloads. That’s why in denser racks we
often see custom water loops or rear‑door heat exchangers. But water systems introduce
plumbing, leak mitigation, and maintenance regimes. So the assembly decision — air versus
liquid — can’t be made purely on thermal grounds; it must factor in operational capacity for
maintenance, spare parts, and site personnel expertise.
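
To make the power‑delivery arithmetic concrete, here is a minimal Python sketch of a node power
budget and a PSU headroom check. The 320 W per‑GPU figure matches the case discussed next; the
CPU and platform wattages, the PSU efficiency, and the 80% utilization target are illustrative
assumptions, not measurements from a specific build.

```python
# Rough power budget for a four-GPU node (illustrative numbers only).

GPU_WATTS = 320          # sustained per-GPU draw (as in the case discussed next)
GPU_COUNT = 4
CPU_WATTS = 180          # CPU package power (assumed)
PLATFORM_WATTS = 120     # motherboard, RAM, NVMe, fans (assumed)
PSU_EFFICIENCY = 0.92    # roughly 80 PLUS Platinum territory at typical load

def node_power_budget():
    """Return (DC load, estimated wall draw) in watts."""
    dc_load = GPU_COUNT * GPU_WATTS + CPU_WATTS + PLATFORM_WATTS
    return dc_load, dc_load / PSU_EFFICIENCY

def psu_ok(psu_rating_w, target_utilization=0.8):
    """For N+1 redundancy, a single PSU must carry the whole node
    while staying under a conservative utilization target."""
    dc_load, _ = node_power_budget()
    return dc_load <= psu_rating_w * target_utilization

if __name__ == "__main__":
    dc, wall = node_power_budget()
    print(f"DC load ~{dc} W, wall draw ~{wall:.0f} W")        # ~1580 W / ~1717 W
    print("2000 W PSU OK for N+1 redundancy:", psu_ok(2000))  # True (just barely)
```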

Now layer in a short case: a rig assembled around an AMD Ryzen workstation CPU, a
server‑class motherboard with four full‑length PCIe slots, 128 GB of DDR4 RAM, and four GPUs
each rated for roughly 320 W sustained draw. Power is delivered by dual 80 PLUS Platinum
PSUs configured for redundancy. During sustained ML training runs we observed GPU junction
temps creeping toward throttle thresholds; fans spun up, noise rose, and in a standard
air‑cooled rack performance would have throttled. Installing a closed‑loop liquid cooling kit that
transfers heat to a rear radiator dropped GPU temperatures by 10–15°C under identical loads, allowing
uninterrupted throughput at rated frequencies. Bottlenecks shifted: CPU‑side I/O became more
relevant, prompting an NVMe‑over‑fabric approach for scratch storage. The lesson: improving
one subsystem often moves the bottleneck elsewhere; system design is iterative.

Potential counterarguments: Isn’t it cheaper to use many single‑GPU cheap rigs? Not always.
Small rigs increase per‑unit overheads — more motherboards, PSUs, and network ports — and
they fragment maintenance. Conversely, very high density can increase cooling and power
distribution complexity. The practical sweet spot balances density, maintainability, and power
efficiency. Also, specific GPU models matter: pick cards with robust BIOS and thermal telemetry
support; that telemetry is gold when integrating monitoring systems.

We slide naturally from single‑node considerations to rack‑scale choreography. Co‑located
facilities demand planning for space, power, and connectivity. Sixty rigs mean tens of kilowatts
per rack and hundreds of kilowatts facility wide. Rack layout is not random; airflow, cable
routing, weight distribution, and PDU placement all matter. For instance: if each rig averages 1.4
kW, ten rigs per 42U rack yield 14 kW rack load. That affects floor loading, cooling capacity, and
PDU selection. PDUs must support current draws with headroom, provide per‑outlet metering,
and ideally support remote switching for per‑node power cycling. Cable management isn’t
decorative — poor practice leads to airflow obstruction and makes hot aisle/cold aisle
containment ineffective.
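
A similar back‑of‑envelope check applies at rack level. The sketch below is a rough PDU sizing
pass under assumed electrical parameters (three‑phase 400 V distribution, 32 A PDUs, 80%
headroom); substitute your site's actual voltage, PDU ratings, and power factor.

```python
# Rack-level PDU sizing check (assumed electrical parameters; adjust to site).

import math

RIG_AVG_KW = 1.4        # average draw per rig, as in the example above
RIGS_PER_RACK = 10
PDU_VOLTAGE = 400       # three-phase line-to-line voltage (assumed)
PDU_RATED_AMPS = 32     # per-PDU current rating (assumed)
HEADROOM = 0.8          # plan to stay under 80% of the PDU rating

def rack_load_kw():
    return RIG_AVG_KW * RIGS_PER_RACK

def pdus_needed(power_factor=0.95):
    """How many three-phase PDUs keep the rack under the headroom target."""
    load_w = rack_load_kw() * 1000
    usable_w_per_pdu = math.sqrt(3) * PDU_VOLTAGE * PDU_RATED_AMPS * power_factor * HEADROOM
    return math.ceil(load_w / usable_w_per_pdu)

if __name__ == "__main__":
    print(f"Rack load: {rack_load_kw():.1f} kW")          # 14.0 kW
    print(f"PDUs needed with headroom: {pdus_needed()}")  # 1 (2 for A/B-feed redundancy)
```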

Networking at this scale becomes a live constraint. Compute clusters that share datasets or run
distributed training need high throughput and low latency. That typically means deploying
top‑of‑rack switches with high‑capacity uplinks — multiple 100GbE ports aggregated, fiber
trunks between racks, and spine‑leaf architectures for predictable latency. There’s a temptation
to oversubscribe uplinks to save cost. But for tightly coupled parallel jobs, oversubscription can
turn into idle GPUs waiting for gradients. So you must align network topology with expected
workloads. If most jobs are embarrassingly parallel, you can tolerate higher oversubscription; if
you often run distributed model training, design for lower oversubscription and more direct
links.
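
One rough way to reason about that trade‑off is to compute the leaf‑switch oversubscription ratio
for a rack. The NIC speeds, uplink counts, and the 2:1 acceptance threshold below are assumptions
for illustration, not a standard.

```python
# Leaf-to-spine oversubscription check for one rack (assumed port counts).

RIG_NIC_GBPS = 100       # per-rig NIC speed (assumed)
RIGS_PER_RACK = 10
UPLINK_GBPS = 100        # per uplink port
UPLINK_PORTS = 4         # leaf uplinks into the spine fabric (assumed)

def oversubscription_ratio():
    southbound = RIG_NIC_GBPS * RIGS_PER_RACK   # traffic the rack can generate
    northbound = UPLINK_GBPS * UPLINK_PORTS     # capacity toward the spine
    return southbound / northbound

if __name__ == "__main__":
    ratio = oversubscription_ratio()
    print(f"Oversubscription: {ratio:.1f}:1")   # 2.5:1 with these numbers
    # Tightly coupled distributed training wants this near 1:1;
    # embarrassingly parallel work tolerates noticeably higher ratios.
    print("Acceptable for distributed training" if ratio <= 2.0
          else "Expect gradient-exchange stalls on all-reduce-heavy jobs")
```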


Imagine a deployment: six racks, each with 10 rigs, connected via a leaf switch that uplinks to a
spine fabric offering multiple 100GbE channels. Fiber handles trunking to avoid signal
degradation and electromagnetic noise. The network team configures QoS to prioritize control
plane and parameter server traffic, while monitoring tools detect microbursts that can cause
packet loss. When a distributed job stalls, the first tools you need are not application profilers
but network instrumentation and an understanding of how RDMA or IP‑based transfers are performing.
That sharpens a recurring point: compute, network, and storage are a triad; performance
problems often cross boundaries.

This naturally raises the question: where do the electrons come from? Renewable integration begins
with a realistic accounting of demand and resource variability. Solar and wind are intermittent;
their capacity factors differ by location and technology. For a compute farm consuming, for example,
840 kW steady (a round planning figure to keep the numbers tractable; sixty rigs at 1.4 kW each is
closer to 84 kW, so treat 840 kW as a larger build‑out or scale the arithmetic to your own load),
designing on‑site generation requires an honest view of average output versus peak
draw. On a sunny day, a large photovoltaic array might exceed instantaneous demand, but at
night or during calm weather you’ll need supplemental sources: the grid, curtailment
agreements, or storage.

Designing on‑site generation involves capacity sizing and realistic yield modeling. A
one‑megawatt array does not mean one megawatt continuously. Photovoltaic arrays
produce at their rated capacity under specific irradiance and temperature conditions.
Capacity factors — the average percentage of rated output over time — often fall in the
10–25% range for solar in many climates and higher for wind in suitable sites. That is
why planners use a combination: solar for predictable daytime production and wind to
capture complementary periods. Grid interconnection plays a role too: net metering or
export limits shape whether excess generation is sold back or curtailed. Interconnection
agreements can be complex: utilities require studies to ensure stability, and sometimes
additional equipment like inverters with ride‑through capabilities and reactive power
control may be required.

A practical example: suppose you target 840 kW average consumption. If your site has
good solar insolation, a 1 MWp PV array will yield, using the capacity factors above,
only around 100–250 kW on average over 24 hours, which is not enough to
cover continuous demand alone. You add wind with a turbine sized to capture evening
and night winds, and you accept that grid imports will be necessary during low
renewable periods. That’s fine — the goal is to minimize grid energy and carbon
intensity, not necessarily to be islanded forever. However, if the objective is continuous
operation during grid failures, you need storage sized for desired outage duration.
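
One way to sanity‑check such a mix is a simple daily energy balance. The sketch below assumes a
20% solar capacity factor, a 1.5 MW wind nameplate at a 30% capacity factor, and the 840 kW
planning load from the example; it averages energy over a day and deliberately ignores when the
power arrives, which is exactly the gap that storage and grid imports have to cover.

```python
# Back-of-envelope daily energy balance (capacity factors and the wind
# nameplate are assumptions; use site-specific yield data in practice).

LOAD_KW = 840            # planning load from the example above

SOURCES = {
    # name:   (nameplate_kw, capacity_factor)
    "solar":  (1000, 0.20),
    "wind":   (1500, 0.30),
}

def daily_balance():
    demand_kwh = LOAD_KW * 24
    onsite_kwh = sum(kw * cf * 24 for kw, cf in SOURCES.values())
    grid_kwh = max(0.0, demand_kwh - onsite_kwh)
    return demand_kwh, onsite_kwh, grid_kwh

if __name__ == "__main__":
    demand, onsite, grid = daily_balance()
    print(f"Demand: {demand:.0f} kWh/day, on-site generation: {onsite:.0f} kWh/day")
    print(f"Average grid import: {grid:.0f} kWh/day "
          f"({100 * onsite / demand:.0f}% covered on an average day)")
    # -> roughly 20160 kWh demand, 15600 kWh on-site, ~77% average coverage
```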

Which brings us to battery storage and UPS design. Batteries smooth intermittency,
provide ride‑through, and can shave peak imports. For compute operations you choose
robust lithium‑ion chemistries with an appropriate battery management system. Two
critical sizing questions: how long do you need to ride out outages, and what
depth‑of‑discharge delivers acceptable cycle life? Suppose the requirement is to keep
sixty rigs operating for two hours at full load during an outage. If the facility draws 840
kW, two hours implies 1.68 MWh of usable energy. To limit depth of discharge to 80%
for lifecycle concerns, you would select a nominal bank of around 2.1 MWh, which is
where the headline two‑megawatt‑hour figure comes from after rounding; inverter
losses push the requirement slightly higher still. In
practice, designers also incorporate a reserve buffer for control margin, BMS headroom,
and inverter efficiency (often 95–98% for modern systems).
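
That arithmetic fits in a few lines. The sketch below reproduces the numbers above; the
80% depth‑of‑discharge limit and 96% inverter efficiency are typical assumed values rather
than vendor specifications.

```python
# Battery bank sizing for a target ride-through, following the worked example.

def bank_sizing_kwh(load_kw, ride_through_h, max_dod=0.8, inverter_eff=0.96):
    """Return (usable, nominal for DoD, nominal incl. inverter losses) in kWh."""
    usable = load_kw * ride_through_h
    nominal = usable / max_dod              # grossed up for depth-of-discharge limit
    with_losses = nominal / inverter_eff    # grossed up again for conversion losses
    return usable, nominal, with_losses

if __name__ == "__main__":
    usable, nominal, with_losses = bank_sizing_kwh(840, 2)
    print(f"Usable: {usable:.0f} kWh, nominal: {nominal:.0f} kWh, "
          f"with inverter losses: {with_losses:.0f} kWh")
    # -> Usable: 1680 kWh, nominal: 2100 kWh, with inverter losses: 2188 kWh
```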

Integration requires more than raw energy. The UPS must transition seamlessly
between grid and battery, or better, allow the batteries to supply power without
interruption. Large‑scale systems use bi‑directional inverters tied to both the battery
and the facility bus, often organized in modular strings. The BMS monitors cell voltages,
temperatures, state of charge, and flags balancing needs. Thermal management for
batteries is itself a system concern: batteries prefer controlled, moderate temperatures;
extreme heat accelerates degradation, and cold reduces effective capacity. So battery
rooms often have dedicated HVAC and fire suppression systems with designs that are
compliant with local regulations. Safety cannot be an afterthought.
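
To give a flavor of the kind of check a BMS performs continuously, here is a simplified
cell‑imbalance flagger. The voltage thresholds are illustrative and chemistry‑dependent, and
in a real system this logic lives in BMS firmware and is merely surfaced to the monitoring
platform.

```python
# Simplified BMS-style cell check for one module of series-connected Li-ion
# cells (thresholds are illustrative and chemistry-dependent).

def cell_flags(cell_voltages, max_delta_v=0.05, v_min=3.0, v_max=3.6):
    """Return a list of condition flags for the module."""
    flags = []
    delta = max(cell_voltages) - min(cell_voltages)
    if delta > max_delta_v:
        flags.append(f"needs_balancing (spread {delta:.3f} V)")
    if min(cell_voltages) < v_min:
        flags.append("undervoltage_cell")
    if max(cell_voltages) > v_max:
        flags.append("overvoltage_cell")
    return flags

if __name__ == "__main__":
    module = [3.31, 3.30, 3.24, 3.32, 3.31, 3.30]
    print(cell_flags(module))   # ['needs_balancing (spread 0.080 V)']
```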

Consider a detailed sequence when the grid trips. The grid disconnects; the UPS detects
the drop, and in sub‑cycle timescales a static transfer switch or inverter bridge isolates
the external grid and energizes the facility bus with battery inverters. Loads continue.
The BMS sequences battery modules to avoid overcurrent, while the site management
system reduces nonessential loads — maybe throttling noncritical GPUs or delaying
scheduled jobs. That coordinated shedding extends battery life and ensures critical
operations continue. Some operations prefer graceful degradation: reduce GPU clocks
or shift tasks to nodes with greater headroom, rather than abrupt shutdowns. Those
strategies buy time and protect long‑running jobs from catastrophic failure.
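
A toy version of that coordinated shedding might look like the sketch below. The node list,
the assumed 40% power saving from clock caps, and the battery and runtime figures are purely
illustrative; in practice this policy would be driven by the energy management system and the
job scheduler, not a standalone script.

```python
# Sketch of priority-based load shedding during a grid outage
# (all numbers are illustrative; see the lead-in for assumptions).

from dataclasses import dataclass

@dataclass
class Node:
    name: str
    draw_kw: float
    critical: bool          # carrying a job that must not be interrupted
    throttled: bool = False

def shed_load(nodes, battery_kwh, target_runtime_h, throttle_factor=0.6):
    """Throttle noncritical nodes (assumed ~40% saving from clock caps)
    until the projected runtime on battery meets the target."""
    def total_draw_kw():
        return sum(n.draw_kw * (throttle_factor if n.throttled else 1.0) for n in nodes)

    for node in sorted(nodes, key=lambda n: n.critical):   # noncritical first
        if battery_kwh / total_draw_kw() >= target_runtime_h:
            break
        if not node.critical:
            node.throttled = True
    return battery_kwh / total_draw_kw()

if __name__ == "__main__":
    fleet = [Node(f"rig-{i:02d}", 1.4, critical=(i < 20)) for i in range(60)]
    hours = shed_load(fleet, battery_kwh=200, target_runtime_h=3)
    shed = sum(n.throttled for n in fleet)
    print(f"Throttled {shed} noncritical rigs; projected runtime {hours:.1f} h")
```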

Edge cases and complications: Batteries carry failure modes — cell imbalance, thermal
runaway, and aging. Routine capacity testing, cell replacement plans, and tracking of
cycle counts are essential. Also, when you discharge batteries frequently to shave peaks,
you accelerate degradation. So financial modeling must include replacement schedules
tied to expected DoD and cycle life. And regulatory landscapes matter: transportation
constraints for spent batteries, recycling rules, and site permitting for energy systems all
influence project feasibility.

We’ve assembled hardware, power, and storage. The final layer is monitoring,
maintenance, and efficiency orchestration — the software and human practices that
keep the whole ensemble healthy. Real‑time telemetry is non‑negotiable: GPU power
draw, device temperatures, PDU outlet currents, rack inlet/outlet temperatures, battery
state of charge, inverter status, and network statistics all need to stream into a
centralized platform. That platform should support alerting thresholds, historical
trending, anomaly detection, and automated responses: if a rack begins to exceed
targeted inlet temp, the system can lower chiller setpoints to bring more cooling online,
throttle noncritical workloads, or reassign jobs.
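
As a flavor of what those automated responses look like, here is a minimal rule‑evaluation
sketch. The metric names and thresholds are hypothetical, and in production this logic would
live in an existing monitoring and alerting stack rather than hand‑rolled code.

```python
# Minimal telemetry rule sketch for one rack (metric names and thresholds
# are hypothetical; real deployments use an existing monitoring platform).

INLET_TEMP_LIMIT_C = 27.0   # assumed rack-inlet ceiling
GPU_TEMP_LIMIT_C = 83.0     # assumed junction-temperature alert threshold
PDU_HEADROOM = 0.8          # never let a PDU run above 80% of its rating

def evaluate_rack(sample):
    """sample: dict of latest readings for one rack; returns actions to take."""
    actions = []
    if sample["inlet_temp_c"] > INLET_TEMP_LIMIT_C:
        actions.append("raise_cooling")          # lower chiller setpoint / boost fans
    if max(sample["gpu_temps_c"]) > GPU_TEMP_LIMIT_C:
        actions.append("throttle_noncritical")   # cap clocks on delay-tolerant jobs
    if sample["pdu_amps"] > PDU_HEADROOM * sample["pdu_rated_amps"]:
        actions.append("rebalance_load")         # migrate jobs off this rack
    return actions

if __name__ == "__main__":
    reading = {"inlet_temp_c": 28.4, "gpu_temps_c": [71, 84, 79, 76],
               "pdu_amps": 27, "pdu_rated_amps": 32}
    print(evaluate_rack(reading))
    # -> ['raise_cooling', 'throttle_noncritical', 'rebalance_load']
```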

Think of a scenario: the platform flags an unusual pattern — several rigs show
concurrent increases in power draw and GPU temperature, but only in one rack. The
immediate candidates are airflow obstruction, a failing fan tray, or a local cooling loop
loss. The monitoring system correlates PDU outlet currents and detects a small but
rising imbalance. The site operator inspects and finds a misrouted cable obstructing
flow to the front of the rack — fixed in minutes, downtime avoided. This is mundane but
pervasive: many outages trace to small physical missteps amplified by load.

Operational strategies include workload scheduling aligned to renewable generation. If
you can schedule energy‑intensive training runs during peak solar output, you reduce
net imports and battery cycling. Dynamic load balancing can migrate jobs to racks
where renewable headroom exists or where batteries are being replenished. That
requires orchestration across the cluster manager, job scheduler, and energy
management system. The scheduler must be energy‑aware, tagging jobs with flexibility
attributes: is this batch work delay‑tolerant, or latency‑sensitive? Energy‑aware
scheduling might delay non‑critical jobs during low‑renewable periods or preferentially
run them when on‑site generation is abundant.
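
A stripped‑down version of that energy‑aware decision might look like the sketch below. The
job attributes, power estimates, and the 90% state‑of‑charge threshold are assumptions for
illustration; a real deployment would express this as scheduler policy rather than a
standalone function.

```python
# Energy-aware scheduling sketch: defer delay-tolerant jobs when renewable
# headroom is low (job fields and thresholds are illustrative assumptions).

from dataclasses import dataclass

@dataclass
class Job:
    name: str
    est_kw: float           # estimated power draw while running
    delay_tolerant: bool    # batch work that can wait for sun or wind

def schedule(queue, renewable_headroom_kw, battery_soc):
    """Return (run_now, deferred). Latency-sensitive jobs always run;
    flexible jobs run only while headroom remains or the battery is full."""
    run_now, deferred = [], []
    headroom = renewable_headroom_kw
    for job in queue:
        if not job.delay_tolerant or headroom >= job.est_kw or battery_soc > 0.9:
            run_now.append(job)
            headroom -= job.est_kw
        else:
            deferred.append(job)
    return run_now, deferred

if __name__ == "__main__":
    queue = [Job("inference-api", 5.6, False),
             Job("train-large", 42.0, True),
             Job("hyperparam-sweep", 28.0, True)]
    now, later = schedule(queue, renewable_headroom_kw=50.0, battery_soc=0.65)
    print([j.name for j in now], [j.name for j in later])
    # -> ['inference-api', 'train-large'] ['hyperparam-sweep']
```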

Maintenance regimes deserve attention. Hardware replacement windows, fan and filter
changes, PSU capacitor aging checks, battery capacity verifications, and inverter
firmware updates must be scheduled without endangering computing SLAs.
Redundancy helps: N+1 power topology, hot‑swap capable nodes, and spare GPU
inventory all reduce the risk that a single failure cascades. Regular drills for grid outage
response ensure that automatic systems behave as intended. Too many groups treat
resilience as theory until a real event reveals gaps; rehearsals expose those gaps in
controlled conditions.

Now, let’s synthesize some cross‑cutting lessons. First: every systems decision ripples.
Opting for denser racks affects cooling, which affects power, which affects battery sizing.
Second: build instrumentation before you need it. Telemetry is cheap relative to
unplanned downtime. Third: design for graceful degradation; it is far easier to throttle
than to restore state after a crash. Fourth: align economic models with equipment
lifecycle realities — battery replacements, GPU refresh cycles, and inverter maintenance
must be in cost analyses. Finally: human processes matter. A brilliant design fails
without trained staff, clear procedures, and a culture that treats small anomalies
seriously.

Before wrapping up, consider three compact, realistic thought experiments. First: If you
could increase on‑site generation by 25% at a reasonable cost, would you do it? The
decision rests on how often that extra capacity would reduce battery cycling or grid
import costs, and on interconnection limits. Second: If a critical job cannot be paused
and the grid fails for six hours, do you design for that worst case? Often the right answer
is a hybrid: allocate a subset of capacity to critical, always‑on services with dedicated
batteries sized for long ride‑through, while letting less critical batch jobs be preemptible.
Third: If a regulator restricts export to the grid during peak production, how do you
value excess during midday? You may pivot to shifting workloads or adding inexpensive
demand‑response loads that can soak up surplus generation rather than curtailing it.

One practical checklist to carry forward — not as a set of steps, but as a mindset:
quantify steady and peak demand; choose node architecture with an eye to
maintainability; design rack power and cooling with realistic thermal models; size
generation using capacity factors and site profiles; pick battery capacity with DoD and
lifecycle in mind; instrument extensively; and run operational rehearsals. Each element
must be defensible: you should be able to justify a PDU rating, a cooling margin, or a
battery size with data, not a hunch.

Alright, final thoughts. Engineering a co-located GPU farm that leans on renewables is
an exercise in systems thinking. It's about balancing physics (heat and electrons),
economics (CAPEX and OPEX), and operations (maintenance and personnel). It's
messy in the best possible way: trade-offs everywhere, no silver bullet. If you approach it
as a set of modular puzzles (define interfaces, instrument those interfaces, and plan for
graceful failure), you get a facility that is not merely sustainable, but resilient and
cost-effective in practice. The hum from the racks then becomes, quite literally, a
well-kept promise: compute that respects both performance and planet.

For more information, please contact Philip Smith-Lawrence: [email protected]