Programmable-Room: Interactive Textured 3D Room Meshes Generation Empowered by Large Language Models

Jihyun Kim1, Junho Park1, Kyeongbo Kong2, Sukju Kang1
1Sogang University, 2Pusan National University
Main Image

Programmable-Room interactively creates and edits textured 3D meshes from user-specified natural language instructions. Using predefined modules, it translates an instruction into Python code that is executed line by line, in order. The output of the last line is the desired final output.

"Generate a fully furnished living room whose width, length, height are each 4.5m, 3.3m, 2m. The walls are covered with dark grey rectangular tiles, and there are two windows and a door on the walls."

"Generate a living room whose width and length are each 5.1m, 3.8m. In the room, there 2 windows and 1 door. The pattern of the walls are floral, and the overall color is baby pink."

"Generate a bedroom whose width and length are each 4.4m and 4.2m. In the room, there are two windows and a door. The walls are painted in white, and the floor is wooden."

Abstract

Programmable-Room interprets user-provided descriptions to create plausible 3D coordinates for room meshes, generate panorama images for the texture, construct 3D meshes by integrating the coordinates with the panorama texture images, and arrange furniture. Users can request a single action or a combination of actions as needed. Inspired by visual programming (VP), Programmable-Room uses a large language model (LLM) to write a Python program: an ordered list of the modules needed for the tasks given in natural language.

We developed most of the modules ourselves. For the texture-generating module, we use a pretrained large-scale diffusion model to generate panorama images conditioned simultaneously on text and visual prompts (i.e., a layout image, a depth map, and a semantic map). Specifically, we accelerate the performance of panorama image generation by optimizing the training objective with a 1D representation of the panorama scene obtained from a bidirectional LSTM.

3D Room Generation

Programmable-Room can create a textured and fully furnished 3D room mesh from a text instruction (generating an empty room is also possible). Users can specify the room shape and size; the texture of the ceiling, walls, and floor; and the furniture. To generate a textured and furnished 3D room mesh, Programmable-Room uses an LLM to write Python code like the code below. The output of each line is visualized along with the corresponding code.

Process Image

In summary, three images (a layout image, a depth map, and a semantic map) are generated that follow the user-specified room shape. These three images then serve as visual prompts for generating a panorama texture image, which is folded into an empty room mesh. After furniture appropriate to the room type is placed, we obtain the textured and fully furnished 3D room mesh.
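
The flow summarized above can be sketched as an ordered program whose steps pass named outputs to later steps. This is only an illustration under stated assumptions: every module here is a stub that returns a small tagged record, all module names other than GenTexture are hypothetical, and the program is written as a list of steps rather than as LLM-generated Python source.

```python
from typing import Any, Callable, Dict, List, Tuple

# Stub modules. Each returns a small record instead of a real image or mesh.
def gen_shape(width: float, length: float, height: float) -> Dict[str, float]:
    return {"width": width, "length": length, "height": height}

def gen_layout(shape: Dict[str, float]) -> Tuple[str, Dict[str, float]]:
    return ("layout", shape)

def gen_depth(shape: Dict[str, float]) -> Tuple[str, Dict[str, float]]:
    return ("depth", shape)

def gen_semantic(shape: Dict[str, float]) -> Tuple[str, Dict[str, float]]:
    return ("semantic", shape)

def gen_texture(prompt: str, layout: tuple, depth: tuple, semantic: tuple) -> tuple:
    # Stands in for GenTexture: text plus three visual prompts -> panorama.
    return ("panorama", prompt, layout[0], depth[0], semantic[0])

def gen_empty_room(shape: Dict[str, float], texture: tuple) -> Dict[str, Any]:
    return {"shape": shape, "texture": texture, "furniture": []}

def add_furniture(room: Dict[str, Any], room_type: str) -> Dict[str, Any]:
    furnished = dict(room)  # copy so the empty room stays unchanged
    furnished["furniture"] = [f"{room_type}-furniture"]
    return furnished

# Hypothetical module names, except GenTexture, which the page mentions.
MODULES: Dict[str, Callable[..., Any]] = {
    "GenShape": gen_shape,
    "GenLayout": gen_layout,
    "GenDepth": gen_depth,
    "GenSemantic": gen_semantic,
    "GenTexture": gen_texture,
    "GenEmptyRoom": gen_empty_room,
    "AddFurniture": add_furniture,
}

# One step = (output name, module name, keyword arguments).
# A string argument starting with "$" refers to an earlier step's output.
Step = Tuple[str, str, Dict[str, Any]]

def run_program(program: List[Step]) -> Any:
    """Execute the steps in order and return the output of the last one."""
    state: Dict[str, Any] = {}
    result: Any = None
    for output_name, module_name, kwargs in program:
        resolved = {
            key: state[value[1:]]
            if isinstance(value, str) and value.startswith("$")
            else value
            for key, value in kwargs.items()
        }
        result = MODULES[module_name](**resolved)
        state[output_name] = result
    return result

PROGRAM: List[Step] = [
    ("SHAPE", "GenShape", {"width": 4.5, "length": 3.3, "height": 2.0}),
    ("LAYOUT", "GenLayout", {"shape": "$SHAPE"}),
    ("DEPTH", "GenDepth", {"shape": "$SHAPE"}),
    ("SEMANTIC", "GenSemantic", {"shape": "$SHAPE"}),
    ("TEXTURE", "GenTexture", {"prompt": "dark grey rectangular tiles",
                               "layout": "$LAYOUT", "depth": "$DEPTH",
                               "semantic": "$SEMANTIC"}),
    ("EMPTY", "GenEmptyRoom", {"shape": "$SHAPE", "texture": "$TEXTURE"}),
    ("ROOM", "AddFurniture", {"room": "$EMPTY", "room_type": "living room"}),
]

room = run_program(PROGRAM)
```

Because the last step's output is returned, dropping the final AddFurniture step from the list would yield the empty textured room instead.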

Whereas the state-of-the-art method generates structurally unrealistic room meshes with repetitive furniture, Programmable-Room creates structurally plausible room meshes whose shape and texture match the instruction.

Mesh Comparison Image

3D Room Editing

For editing tasks, Programmable-Room loads the most recent room elements, generates new elements according to the new instruction, and combines the new elements with the loaded ones. This process repeats until the user is satisfied. Currently, we support room shape editing, room texture editing, and furniture editing (remove, add, replace).
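
The load, generate, and combine cycle described above can be sketched as follows. This is a minimal illustration, assuming a room is stored as a dictionary of elements; the function names `combine_elements` and `edit_furniture` and the dictionary keys are hypothetical and not taken from the project.

```python
from typing import Any, Dict, List, Optional

def combine_elements(loaded: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Any]:
    """Merge newly generated elements into the loaded ones; new values win."""
    combined = dict(loaded)  # copy so the loaded state is not mutated
    combined.update(new)
    return combined

def edit_furniture(furniture: List[str], action: str, item: str,
                   replacement: Optional[str] = None) -> List[str]:
    """Apply one of the three supported furniture edits and return a new list."""
    if action == "add":
        return furniture + [item]
    if action == "remove":
        return [f for f in furniture if f != item]
    if action == "replace":
        if replacement is None:
            raise ValueError("replace needs a replacement item")
        return [replacement if f == item else f for f in furniture]
    raise ValueError(f"unsupported action: {action}")

# Most recent room elements, as if loaded from the previous step.
last_room = {
    "shape": "rectangle",
    "texture": "white paint",
    "furniture": ["bed", "desk"],
}

# First edit: change only the texture.
room_v2 = combine_elements(last_room, {"texture": "floral wallpaper"})

# Second edit: replace one piece of furniture, starting from the previous result.
new_furniture = edit_furniture(room_v2["furniture"], "replace", "desk", "sofa")
room_v3 = combine_elements(room_v2, {"furniture": new_furniture})
```

Each edit starts from the result of the previous one, so elements the instruction does not mention carry over unchanged.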

Editing Image

Modules

We developed modules for generating and editing room meshes. To support more diverse tasks, users can easily add a new module together with a few usage examples for in-context learning by the LLM.
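
One way to picture adding a module with usage examples is a small registry that is rendered into the LLM prompt. The sketch below is an assumption about how such a prompt could be assembled, not the project's actual prompt format; `ModuleSpec`, `build_prompt`, the `AddPlant` module, and the example program lines are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ModuleSpec:
    """A module name, a one-line description, and usage examples for the LLM."""
    name: str
    description: str
    examples: List[str] = field(default_factory=list)

def build_prompt(specs: List[ModuleSpec], instruction: str) -> str:
    """Render module descriptions and usage examples into one prompt string."""
    lines = ["Available modules:"]
    for spec in specs:
        lines.append(f"- {spec.name}: {spec.description}")
    lines.append("")
    lines.append("Usage examples:")
    for spec in specs:
        lines.extend(spec.examples)
    lines.append("")
    lines.append(f"Instruction: {instruction}")
    lines.append("Program:")
    return "\n".join(lines)

registry = [
    ModuleSpec(
        name="GenTexture",
        description="generate a panorama texture from text and visual prompts",
        examples=["TEXTURE = GenTexture(prompt='white painted walls')"],
    ),
]

# Registering a new (hypothetical) module only requires a spec with examples.
registry.append(
    ModuleSpec(
        name="AddPlant",
        description="place a potted plant in the room",
        examples=["ROOM = AddPlant(room=ROOM, kind='monstera')"],
    )
)

prompt = build_prompt(registry, "Generate a bedroom with a plant in the corner.")
```

In this arrangement the code that executes the program never changes when a module is added; only the registry and the prompt grow.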

Module List Image

One of the most important modules is GenTexture, since no existing method generates a panorama image of an empty room that follows a specified texture and, most importantly, a specified shape. For this new task, we developed Panorama Room Image Generation (PRIG), a diffusion-based model that generates a panorama room texture image from text and multiple visual prompts.

PRIG Image

Compared with state-of-the-art methods, PRIG generates images that better reflect the texture information. PRIG's images are also more structurally coherent. For example, unlike the images from existing methods, the left and right sides of images generated by PRIG are continuous.
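
Left-right continuity can be checked numerically, since the two vertical edges of a full panorama meet when the image is wrapped around the viewer. The sketch below is a simple illustrative check, not the evaluation used for PRIG; it assumes a grayscale image stored as a list of rows, and `seam_error` is a hypothetical helper name.

```python
from typing import List

def seam_error(image: List[List[float]]) -> float:
    """Mean absolute difference between the first and last pixel of each row."""
    if not image or not image[0]:
        raise ValueError("image must have at least one row and one column")
    return sum(abs(row[0] - row[-1]) for row in image) / len(image)

# Tiny grayscale examples: each inner list is one row of pixel intensities.
continuous = [
    [10, 50, 12],
    [20, 60, 19],
]
discontinuous = [
    [10, 50, 200],
    [20, 60, 220],
]

low = seam_error(continuous)      # edges nearly match
high = seam_error(discontinuous)  # edges differ strongly
```

A low value means the left and right borders line up when the panorama is wrapped; a high value indicates a visible seam.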

PRIG Comparison Image

Diverse Shapes

One benefit of Programmable-Room is that it can create rooms with diverse shapes. This is made possible by a new method we developed for generating a textured 3D room mesh from a user-specified room shape.
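
To make the idea of a user-specified room shape concrete, the sketch below turns a 2D floor outline into 3D corner coordinates by extruding it to the ceiling height. It is a generic geometric illustration under the assumption that a shape is given as an ordered list of floor corners in meters; it is not the method developed for Programmable-Room, and `extrude_room` and `floor_area` are hypothetical names.

```python
from typing import List, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]

def extrude_room(floor: List[Point2D], height: float) -> List[Point3D]:
    """Return floor corners (z = 0) followed by ceiling corners (z = height)."""
    if len(floor) < 3:
        raise ValueError("a room outline needs at least three corners")
    if height <= 0:
        raise ValueError("height must be positive")
    bottom = [(x, y, 0.0) for x, y in floor]
    top = [(x, y, height) for x, y in floor]
    return bottom + top

def floor_area(floor: List[Point2D]) -> float:
    """Area of a simple polygon via the shoelace formula."""
    n = len(floor)
    twice_area = sum(
        floor[i][0] * floor[(i + 1) % n][1] - floor[(i + 1) % n][0] * floor[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2

# An L-shaped room: a 4 m x 2 m rectangle with a 2 m x 1 m extension.
l_shape = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 3), (0, 3)]

vertices = extrude_room(l_shape, height=2.5)
area = floor_area(l_shape)
```

Any simple polygon works as input, so non-rectangular outlines such as the L-shape here need no special handling.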

Layout Image

Complex Instructions

Using predefined modules, Programmable-Room can translate a complicated instruction involving multiple tasks into ordered Python code.

Complex Code Image