MidJourney: What the experts say about AI in architecture and design

Recently, the virtual world has been taking the real one by storm – call it an aftereffect of the pandemic, or the butterfly effect of feeding computational systems data over decades – in either case, the truth remains that AI (artificial intelligence) is growing more powerful with each passing day.

Just as we were warming up to the idea of the Metaverse, Digital Twins and NFTs, hyper-realistic images that looked like they were right out of an architect’s dream began floating around the internet. They seemed like they took weeks to perfect and render, but this was not true – these images were generated in a matter of minutes, or a few hours at most, on AI image generation platforms like MidJourney and DALL·E 2.

OVERVIEW:

This article is the first installment of a three-part series titled ‘Creator vs Creation’ that spotlights the budding relationship between AI platforms, like MidJourney, and interior and architectural design.

  1. Creator vs Creation: What do users say about AI platforms like MidJourney in architecture and design?
  2. Creator vs Creation: Can AI platforms like MidJourney replace designers?
  3. Creator vs Creation: What will the future of AI in design look like?

Creator vs Creation: What do users say about AI platforms like MidJourney in architecture and design?

Known as text-to-image generation, the magic of this art lies not in the words themselves (at least not foundationally) but in the programming of these platforms. While MidJourney, founded by David Holz, runs via a Discord bot that sends and receives calls to AI servers, OpenAI’s DALL·E 2 makes use of CLIP (Contrastive Language-Image Pre-training), a model trained on a vast set of image–caption pairs, which allows it to generate independent, original images from text. Of the two, it is MidJourney that has garnered more attention.
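For a rough intuition of what that contrastive training buys, here is a toy sketch in Python – not OpenAI’s actual code, and the embedding values are invented for illustration. The idea CLIP relies on is that captions and images are mapped into the same vector space, so the image that genuinely matches a caption scores the highest similarity:

```python
import math

def cosine_similarity(a, b):
    # Standard cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical outputs of a text encoder and an image encoder
# (real CLIP embeddings have hundreds of dimensions).
text_embedding = [0.9, 0.1, 0.3]          # e.g. "a parametric temple"
image_embeddings = {
    "temple_render.png": [0.8, 0.2, 0.4],
    "city_street.png":   [0.1, 0.9, 0.2],
}

# The best-matching image is the one whose embedding sits closest
# to the caption's embedding in the shared space.
best = max(
    image_embeddings,
    key=lambda name: cosine_similarity(text_embedding, image_embeddings[name]),
)
```

A generator paired with such a matcher can then be steered toward outputs that score well against the user’s prompt.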

To understand these tools better and how they can influence the world of design, we spoke to architects and designers, who create regularly on these platforms.

MidJourney over DALL·E 2?

Diego Castro Posada, Managing Director of the architectural rendering studio M.O.N.O.M.O, enlightened us about the differences between the platforms, explaining why, in his opinion, MidJourney is better received – even though it is DALL·E 2 that produces more photorealistic images than MidJourney’s artsy output. “DALL·E 2 was aiming for realism; nevertheless, it was still missing that incredible factor. You could almost see these patterns that made the image hazy and unclear. On the other hand, MidJourney scrapped realism straight from the beginning and tapped into a more expansive and artistic way to explore your own ideas, with a much simpler interface – who nowadays hasn’t gone into Discord for some reason or another? Good UX for me is not only about simplicity, but also about less friction. If we come back to the present, MidJourney has introduced their beta --testp mode (which stands for photorealism). This is another loss for DALL·E 2, as you can achieve the same or even better results than you could with DALL·E 2, but with that initial creative and explosive freedom from MidJourney.”

He mentions how effortless it was to teach even his friends to use MidJourney. “After a brief explanation they were already producing nice results from their own ideas, creating variations and upscales. DALL·E 2, I believe, needs to up their game to join MidJourney at the top tier. I see some of their tools for image expansion, or brushes for image correction – those are USPs they could really take advantage of. I’ve seen many of my peers loving those tools, and sometimes I see myself using them as well,” Diego finishes.

Practice Makes Perfect, Even With AI

Chhavi Mehta, Architectural Assistant at Zaha Hadid Architects, however, points out that there’s more than meets the eye when working with MidJourney: “On the face of it, MidJourney is a user-friendly tool in general. Anyone can type in a few words and get an initial result. But learning how to engineer a prompt that generates a result more aligned with your objectives requires a deeper understanding of how AI works. From a technical perspective, this includes balancing the prompts with text and image weights, and prompt modifiers, amongst other things. But, I believe, the generation process in MidJourney is more than structuring and combining words to form a prompt. The images generated are usually ambiguous sets of four – recognising the potential in one and developing it by guiding the AI with precise prompt engineering is most crucial. I believe it is the understanding of the prototypical generation process, as well as the selection of which variation to build upon, that leads to the most successful results.”
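The text and image weights Chhavi mentions are expressed in MidJourney with the `::` multi-prompt separator and the `--iw` image-weight parameter. The prompt below is an invented example of that syntax (the reference-image URL is a placeholder), not one taken from the interview:

```text
/imagine prompt: https://example.com/reference.jpg parametric hindu temple::2 carved sandstone detail::1 --iw 0.8 --ar 16:9
```

Here the first phrase carries twice the weight of the second, steering the composition toward the overall temple form while keeping the material treatment as a secondary influence, and `--iw 0.8` moderates how strongly the reference image shapes the result.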

Chhavi informs us that Zaha Hadid Architects experiments with both DALL·E 2 and MidJourney in its design processes. She elaborates, “DALL·E 2 specifically has been used to architect an ongoing metaverse project as well. I believe the fluid nature of AI tools like MidJourney, along with the presence of a large number of Zaha Hadid images in the database and training data, enables us to generate images that reflect the characteristic aesthetics and spatial identity of the firm’s work, making it a desirable tool for us.”

Lack of Diversity in Cultural Representation

One man’s pleasure is another man’s pain. While for Zaha Hadid Architects MidJourney’s database is advantageous, for Hassan Ragab, one of the most followed MidJourney creators, whose creations are often influenced by his Egyptian roots, achieving the desired results is a complicated process. “I have been facing a lot of challenges when dealing with non-western architecture using AI generators – MidJourney being one of them. I try to get around that by having a more surreal ‘conceptual’ representation of many forms of the different architectural styles in Egypt. To get around the lack of visual representations of non-western landmarks in the dataset, I add a few irrelevant prompts as a way of enforcing the look and feel that I am searching for, instead of waiting to get the ‘lucky’ iteration. For example, adding specific ancient Egyptian column styles or statues where they shouldn’t be, to make it look like an ancient Egyptian temple – but yet again, I can’t get an exact visual representation of a certain landmark. For example, I can’t get MidJourney to generate the ‘Al Karnak’ temple with enough details to make it visually recognizable. The point here is not to replicate the landmark visually, but rather to have a good basis to add other new visual layers,” he explains.

Even Chhavi tried her hand at generating images that would reflect her Indian heritage, and faced similar hurdles: “Since MidJourney was trained using a generic data set, certain specific regional and cultural knowledge is less documented, which creates a gap. For example, when generating an image of a parametric Indian Hindu temple, it either generates an image of a parametric structure that doesn’t resemble a temple at all, or generates the image of a very conventional one. This is because it lacks data and is unable to recognise the distinctive architectural characteristics present in Indian Hindu temples. In such a scenario, only someone who knows Indian temple architecture can feed the specific words into the prompt, and even prompt the AI with images, to achieve successful results. I think it is critical to conserve and promote our heritage and cultural knowledge in the age of AI. This can be done at a higher level by fine-tuning the generic data set to include specific words and images, but also collectively by individual users who generate images using prompts that represent aspects of their culture, thus expanding the data set and training the AI,” she states.

Feature Image Courtesy: Hassan Ragab


Date added: 21 October, 2022