The Artificially Intelligent Enterprise
DALL·E 3: OpenAI's Latest Generative Image Model
ChatGPT Plus can now create images as well as generate text, and it's pretty good.
[DALL·E 3 generated the image above.]
Last week, I gave a tip on how to use Midjourney to create an image from another image. Since then, I have had a little time to play with DALL·E 3. Like Midjourney, DALL·E 3 is an AI model designed to generate images from textual descriptions: it takes a text prompt and produces a visual representation based on that prompt.
In a previous issue of The Artificially Intelligent Enterprise, I wrote about multimodal models and how they would be a boon to our use of GenAI, because they can take multiple kinds of input and produce various kinds of output, such as combining text and images to create videos or presentations. What ChatGPT Plus offers, however, is more of a federation of models than a true multimodal model.
In ChatGPT Plus, OpenAI has integrated two models: its GPT-4 large language model and DALL·E 3. This is very interesting because it lets you get text and image output from a single interface. It’s not genuinely multimodal, because you have to switch from the GPT-4 or GPT-3.5 model to DALL·E 3, but it does use the LLM to write the image descriptions within that one interface.
Using DALL·E 3 in ChatGPT Plus
Once you log into ChatGPT Plus, select the GPT-4 model and choose DALL·E 3 from the dropdown. Then enter a simple prompt, and ChatGPT will create four images by default. I asked ChatGPT for vector images, but what it returned were PNG files; these can be converted into vectors with Adobe Express or Illustrator. As far as I can tell, ChatGPT can only create raster illustrations.
Now for the experience of creating images in DALL·E 3. After you enter your prompt, the first thing you will see is ChatGPT writing a descriptive prompt for each image. I already use ChatGPT to create prompts for Midjourney, so in my opinion this was a smart design choice.
After a short wait, ChatGPT refreshes and shows you the images it has created, along with a summary of what it did.
DALL·E 3, the latest iteration from OpenAI, offers more nuance and precision than its predecessors and turns ideas into strikingly accurate images. Many text-to-image systems ignore some of the words in a prompt; DALL·E 3 adheres much more closely to the provided text, and it produces noticeably more faithful images than DALL·E 2, even from the same prompts. Because it is built natively into ChatGPT, you can use ChatGPT to brainstorm and refine visual concepts: when you share an idea, ChatGPT writes detailed prompts for DALL·E 3 to visualize. In addition, the images you generate are your property, and you do not need OpenAI's permission to use them.
I have also noticed that it is much better at rendering text inside images, but it’s still far from perfect. The text is usually legible but sometimes misspelled, and even with repeated iterations I could not get DALL·E 3 to fix the spelling.
A few ways DALL·E 3 can be put to work:
Concept Visualization: Artists and designers can transform vague ideas into visual drafts.
Educational Tools: Teachers can generate visual aids to explain abstract concepts.
Content Creation: Media professionals can produce unique graphics for marketing and storytelling.
Brainstorming Sessions: Teams can use DALL·E 3 to visualize potential product designs or conceptual models.
Generate Presentations: I like using ChatGPT to create an outline for my presentations, and now I can use DALL·E 3 to generate the visual aids.
Tip of the Week: Using OpenAI’s DALL·E 3 to Generate Images
DALL·E 3 is a powerful tool for converting textual descriptions into visual imagery. By providing detailed and varied descriptions, you can get the most out of it for applications ranging from art and design to education and entertainment. Here’s a quick overview of how to use DALL·E 3.
Description: Provide a detailed textual description of the image you want to generate.
Resolution: Specify the resolution of the image. The available options are:
Square: 1024x1024 (default)
Wide: 1792x1024
Tall: 1024x1792 (ideal for full-body portraits)
Number of Images: By default, DALL·E 3 generates up to four varied images from a given description. If you want a specific number of images, say so in your request.
Seeds: DALL·E 3 allows users to provide a seed value for image generation. This can be useful if you want to reproduce a specific image or slightly modify a previously generated one.
Diversification: To get more varied images, vary the phrasing and details of your descriptions. Rather than making a description longer, rewording it from different perspectives or angles tends to yield more varied results.
Bias and Inclusivity: DALL·E 3 promotes diverse representation. When depicting people, specifying descent and gender is encouraged for inclusivity. The model is also designed to avoid generating offensive or inappropriate imagery. The example below shows how DALL·E 3 depicted people from different backgrounds by default.
Limitations and Policies:
DALL·E 3 won't generate images of politicians or other public figures.
Artists or styles from the last 100 years cannot be referenced directly.
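The overview above describes the options as they appear in ChatGPT Plus. For readers who would rather script image generation, here is a minimal sketch of how the same ideas might map onto OpenAI's Images API in Python. It assumes the `openai` package (version 1 or later) and an `OPENAI_API_KEY` environment variable, neither of which the ChatGPT Plus workflow needs. `SIZES`, `build_request`, and `generate` are hypothetical helpers for illustration. The API accepts three DALL·E 3 sizes (square, wide, and tall) and only one image per DALL·E 3 request, so getting several images means making several calls. Each response also carries a `revised_prompt` field with the expanded prompt DALL·E 3 actually used, which is the API counterpart of the descriptive prompts ChatGPT shows you.

```python
"""Sketch: calling DALL·E 3 through OpenAI's Images API.

Assumes the `openai` Python package (v1 or later) and an OPENAI_API_KEY
environment variable. The helper names here are illustrative, not official.
"""

# Sizes the Images API accepts for the dall-e-3 model.
SIZES = {
    "square": "1024x1024",  # default
    "wide": "1792x1024",
    "tall": "1024x1792",    # suited to full-body portraits
}


def build_request(description: str, shape: str = "square") -> dict:
    """Build keyword arguments for client.images.generate()."""
    if not description.strip():
        raise ValueError("description must not be empty")
    if shape not in SIZES:
        raise ValueError(f"shape must be one of {sorted(SIZES)}")
    return {
        "model": "dall-e-3",
        "prompt": description,
        "size": SIZES[shape],
        # dall-e-3 only accepts n=1, so several images need several calls.
        "n": 1,
    }


def generate(description: str, shape: str = "square", count: int = 1) -> list:
    """Call the API `count` times and return (url, revised_prompt) pairs."""
    # Imported here so build_request() works even without the SDK installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    results = []
    for _ in range(count):
        response = client.images.generate(**build_request(description, shape))
        image = response.data[0]
        # revised_prompt is the expanded prompt DALL·E 3 actually rendered.
        results.append((image.url, image.revised_prompt))
    return results
```

For example, `generate("a tall oak tree with golden leaves during autumn", shape="tall", count=2)` would make two separate requests and return two (URL, revised prompt) pairs. Keeping `build_request` separate from the network call makes it easy to check the parameters before spending any API credits.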
Tips and Tricks for Using DALL·E 3
Detailed Descriptions: The more specific you are with your text prompts, the more accurate the resulting image. Instead of "a tree," try "a tall oak tree with golden leaves during autumn."
Iterative Refinement: Start with a broad idea and refine iteratively. If the initial result isn't perfect, tweak the description based on what you see.
ChatGPT Integration: Leverage DALL·E 3's integration with ChatGPT to brainstorm and refine your prompts. Ask ChatGPT for suggestions if unsure.
Avoid Over-complication: While DALL·E 3 is advanced, overly complex or contradictory prompts can yield unexpected results. Keep it concise yet clear.
Prompt Engineering: Familiarize yourself with prompt engineering techniques. Sometimes, minor changes in phrasing can produce drastically different outputs.
Stay Updated: OpenAI frequently updates its models. Watch for new features or improvements that enhance your DALL·E 3 experience.
Ethical Use: Remember to use the tool responsibly, ensuring that generated content is ethical and respectful.
Explore Community Forums: Join online forums or user groups dedicated to DALL·E. Sharing experiences and seeing others' creations can provide inspiration and help you solve problems.
Back Up Your Creations: Since DALL·E-generated images are your property, make sure you back them up properly. This way, you have a record of your designs.
Experiment and Have Fun: The tool is as much for serious applications as for creativity. Play around, experiment, and discover DALL·E 3's vast possibilities.
Tip of the Week: Chat with Images on the ChatGPT Mobile App
OpenAI has enhanced ChatGPT by letting users interact with it through voice and images as well as typing. Voice requires OpenAI’s ChatGPT mobile app, available in the Apple App Store and Google Play. I could chat with images in the iPhone app, but the feature has yet to reach my ChatGPT Plus account. According to OpenAI, it will be available there soon.
The voice feature works much like Alexa or Google Assistant: it uses OpenAI's Whisper model to convert speech to text and a new text-to-speech model to speak the replies. The image feature is comparable to Google Lens, but with restrictions meant to avoid privacy issues and inaccuracies. Let’s look at an example of how to use ChatGPT Mobile to chat with images.
Example of Chat with Images in Action
Here’s an example of what I could do by chatting about a photo I took at dinner one night.
Then, I asked ChatGPT to create a recipe from the image.
Pretty useful, I’d say. This was a simple example, but you could also photograph a real-life scene and have ChatGPT write a prompt for a cartoon version of it to use on a website, or take a picture of a car engine and ask how to fix a problem.
What I Read This Week
Generative AI and the future of work in America - McKinsey Global Institute
Want to Trick an LLM? Try Asking It Nicely or Use Argentinian Spanish - The Information
Generative AI exists because of the transformer - The Financial Times
ChatGPT can now see, hear, and speak - OpenAI