Seamless High-Resolution Terrain Reconstruction: A Prior-Based Vision Transformer Approach

Osher Rafaeli1,*   Tal Svoray1   Ariel Nahlieli1

1Ben-Gurion University of the Negev

*Corresponding author: osher@bgu.ac.il

Abstract

High-resolution elevation data is essential for hydrological modeling, hazard assessment, and environmental monitoring; however, globally consistent, fine-scale Digital Elevation Models (DEMs) remain unavailable. Very high-resolution single-view imagery enables the extraction of topographic information at the pixel level, allowing the reconstruction of fine terrain details over large spatial extents. In this paper, we present a single-view DEM reconstruction method that supports practical analysis in GIS environments across multiple sub-national jurisdictions. Specifically, we produce high-resolution DEMs for large-scale basins, representing a substantial improvement over the 30 m resolution of globally available Shuttle Radar Topography Mission (SRTM) data. The DEMs are generated using a prior-based monocular depth estimation (MDE) foundation model, extended in this work to the remote sensing height domain for high-resolution, globally consistent elevation reconstruction. We fine-tune the model by integrating low-resolution SRTM data as a global prior with high-resolution RGB imagery from the National Agriculture Imagery Program (NAIP), producing DEMs with near LiDAR-level accuracy. Our method achieves a 100× resolution enhancement (from 30 m to 30 cm), exceeding existing super-resolution approaches by an order of magnitude. Across two diverse landscapes, the model generalizes robustly, resolving fine-scale terrain features with a mean absolute error of less than 5 m relative to LiDAR and improving upon SRTM by up to 18%. Hydrological analyses at both catchment and hillslope scales confirm the method's utility for hazard assessment and environmental monitoring, demonstrating improved streamflow representation and catchment delineation. Finally, we demonstrate the scalability of the framework by applying it across large geographic regions.

Key Features

100× Resolution Enhancement

We enhance the spatial resolution of predicted DEMs by a factor of 100, from 30 m to 30 cm, surpassing previous attempts by an order of magnitude.
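
As a quick check of the arithmetic, the sketch below computes the linear and areal enhancement factors implied by the two ground sampling distances stated above. The helper name `enhancement_factors` is illustrative and does not come from the released code.

```python
def enhancement_factors(coarse_gsd_m: float, fine_gsd_m: float) -> tuple[float, float]:
    """Return (linear, areal) enhancement between two ground sampling distances."""
    linear = coarse_gsd_m / fine_gsd_m
    return linear, linear ** 2


# 30 m SRTM cells versus 30 cm output cells.
linear, areal = enhancement_factors(30.0, 0.30)
print(f"{linear:.0f}x per axis, {areal:.0f} output pixels per SRTM cell")
```

The factor of 100 applies along each axis, so every SRTM cell is covered by roughly 100 × 100 output pixels.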

Global Prompting

We leverage freely available SRTM DEMs as absolute-height prompts, ensuring a globally consistent elevation context.
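
One simple way to bring a 30 m prior onto the 30 cm image grid is bilinear resampling. The sketch below illustrates that alignment step only, under the assumption that the prior and the image share the same extent; how the model actually encodes the SRTM prompt is not described on this page, and `upsample_prior` is a hypothetical helper.

```python
import numpy as np


def upsample_prior(coarse: np.ndarray, factor: int) -> np.ndarray:
    """Bilinearly resample a coarse height grid to `factor` times finer spacing."""
    h, w = coarse.shape
    # Fine-pixel centres expressed in coarse-pixel coordinates, clamped to the grid.
    ys = np.clip((np.arange(h * factor) + 0.5) / factor - 0.5, 0, h - 1)
    xs = np.clip((np.arange(w * factor) + 0.5) / factor - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]  # vertical interpolation weights
    wx = (xs - x0)[None, :]  # horizontal interpolation weights
    top = coarse[y0][:, x0] * (1 - wx) + coarse[y0][:, x1] * wx
    bottom = coarse[y1][:, x0] * (1 - wx) + coarse[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


# Toy 2x2 prior in metres (illustrative values, not real SRTM data).
srtm_tile = np.array([[100.0, 110.0], [120.0, 130.0]])
prior = upsample_prior(srtm_tile, factor=100)  # 30 m -> 30 cm spacing
print(prior.shape)
```

The upsampled prior carries only the coarse absolute height; the fine detail has to come from the imagery.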

Seamless Terrain Products

We blend patch-wise Vision Transformer predictions into seamless mosaics that are ready for slope, aspect, and flow-routing analyses.
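
A common way to merge overlapping patch predictions is a weighted average whose weights taper toward each patch border. The sketch below shows that idea with a Hann-style window; the window choice, the overlap, and the helper `blend_patches` are assumptions for illustration, not necessarily the blending used in this work.

```python
import numpy as np


def blend_patches(patches, origins, out_shape, patch_size):
    """Weighted-average overlapping square patches into one mosaic."""
    # Dropping the zero endpoints of the Hann window keeps every weight
    # strictly positive, so pixels on the mosaic border are still covered.
    ramp = np.hanning(patch_size + 2)[1:-1]
    window = np.outer(ramp, ramp)
    acc = np.zeros(out_shape)
    weight = np.zeros(out_shape)
    for patch, (r, c) in zip(patches, origins):
        acc[r:r + patch_size, c:c + patch_size] += patch * window
        weight[r:r + patch_size, c:c + patch_size] += window
    # Guard against division by zero where no patch contributes.
    return acc / np.maximum(weight, 1e-12)


# Toy planar surface cut into four patches with 50% overlap.
truth = np.add.outer(np.arange(96.0), np.arange(96.0))
size, stride = 64, 32
origins = [(r, c) for r in (0, stride) for c in (0, stride)]
patches = [truth[r:r + size, c:c + size] for r, c in origins]
mosaic = blend_patches(patches, origins, truth.shape, size)
print(np.allclose(mosaic, truth))
```

When patches agree in their overlap, the mosaic reproduces them exactly; when they disagree, the taper shifts weight toward patch centres, which avoids a visible step at patch borders.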

Resource-Efficient

Processes ≈ 150 km² h⁻¹ on a single GPU and achieves up to an 18% improvement in vertical accuracy over the original SRTM dataset.
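
For readers who want to reproduce this kind of comparison, the sketch below computes a mean absolute error against a reference surface and the relative error reduction over a baseline. It assumes the improvement is measured as the fractional reduction in MAE, which this page does not state explicitly; the arrays are toy values, not results from the paper.

```python
import numpy as np


def mae(pred, ref):
    """Mean absolute error, ignoring cells that are NaN in either array."""
    return float(np.nanmean(np.abs(np.asarray(pred) - np.asarray(ref))))


def relative_improvement(baseline_err, model_err):
    """Fractional error reduction of the model relative to the baseline."""
    return (baseline_err - model_err) / baseline_err


# Toy heights in metres (illustrative only).
lidar = np.array([10.0, 12.0, 15.0, 11.0])  # reference surface
srtm = np.array([12.0, 10.0, 17.0, 13.0])   # baseline, off by 2 m everywhere
model = np.array([11.0, 11.0, 16.0, 12.0])  # prediction, off by 1 m everywhere
gain = relative_improvement(mae(srtm, lidar), mae(model, lidar))
print(f"relative MAE reduction: {gain:.0%}")
```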

Visual Preview

Urban

[Figure panels: RGB (1120×1120); Elevation, Aspect, Hillshade, Slope (1120×1220 each)]

Vegetated

[Figure panels: RGB (1120×1120); Elevation, Aspect, Hillshade, Slope (1120×1220 each)]

Bare

[Figure panels: RGB (1120×1120); Elevation, Aspect, Hillshade, Slope (1120×1220 each)]
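
The slope, aspect, and hillshade layers in these previews are standard terrain derivatives of the elevation raster. A minimal sketch with NumPy follows, assuming a north-up DEM with square cells and finite differences via `numpy.gradient`; GIS packages differ in their kernels and conventions, and this is not necessarily how the figures were rendered. The function name `terrain_derivatives` is illustrative.

```python
import numpy as np


def terrain_derivatives(dem, cell_size, azimuth_deg=315.0, altitude_deg=45.0):
    """Return slope (deg), aspect (deg clockwise from north), hillshade (0 to 1)."""
    dz_drow, dz_dcol = np.gradient(dem, cell_size)
    dz_dx = dz_dcol   # eastward rate of change
    dz_dy = -dz_drow  # rows run north to south, so flip the sign for northward
    slope = np.arctan(np.hypot(dz_dx, dz_dy))
    # Aspect is the downslope direction as a compass bearing
    # (undefined, and arbitrary here, on perfectly flat cells).
    aspect = np.mod(np.arctan2(-dz_dx, -dz_dy), 2 * np.pi)
    azimuth = np.radians(azimuth_deg)
    zenith = np.radians(90.0 - altitude_deg)
    shade = (np.cos(zenith) * np.cos(slope)
             + np.sin(zenith) * np.sin(slope) * np.cos(azimuth - aspect))
    return np.degrees(slope), np.degrees(aspect), np.clip(shade, 0.0, 1.0)


cell = 0.3  # metres, matching the 30 cm output grid
dem = np.tile(np.arange(50) * cell, (50, 1))  # toy plane rising 1 m per metre eastward
slope, aspect, shade = terrain_derivatives(dem, cell)
print(slope[0, 0], aspect[0, 0])
```

On this toy plane the slope is 45° and the aspect points west (270°), since the surface descends toward the west.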

Acknowledgment

We thank the Ministry of Agriculture Chief Scientist (grant 16-17-0005, 2022) and the Negev Scholarship of the Kreitman School, Ben-Gurion University of the Negev, for supporting Osher Rafaeli’s PhD studies.

Cite Us


@misc{rafaeli2025prompt2demhighresolutiondemsurban,
      title={Prompt2DEM: High-Resolution DEMs for Urban and Open Environments from Global Prompts Using a Monocular Foundation Model}, 
      author={Osher Rafaeli and Tal Svoray and Ariel Nahlieli},
      year={2025},
      eprint={2507.09681},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2507.09681}, 
}
    
