On a blog about my exploits in machine learning, the buzz is usually about algorithms, neural networks, and the latest advancements in the field. But today, I’m not here to add to that chorus. Instead, I want to shine a light on something far simpler yet surprisingly powerful in the AI toolkit: Markdown. Lately, discovering the outsized impact of modest, "less than groundbreaking" tools has become a regular occurrence for me. While I can appreciate the usefulness of Markdown outside of prompt engineering, it admittedly falls squarely into that category. Markdown, that lightweight markup language often used for formatting text on the web, has turned out to be my secret weapon in the often-overlooked art of prompt engineering.
How we got here
Back in the days of "gpt-3.5-turbo has a tendency to disregard system prompts" (early 2023-ish), I was experimenting with ways to segment system instructions so that they would be easiest for the model to understand. I was relying mostly on static, textual structuring but tried spinning up bots where the instructions were passed as JSON, XML, HTML - you name it, I tried it. But in testing, as I was chatting with my bots in Discord (a tool I use to skirt the overhead of building chat interfaces for one-off experiments), I noticed that instruction-following and meaning were almost always just flat out *better* when I passed instructions using native Discord formatting - Markdown.
As the technology evolved and I started building other tools around ChatGPT, it became pretty clear to me that the formatting in ChatGPT's responses was Markdown as well (even if Markdown in user inputs doesn't get interpreted - what's with that, OpenAI?). In moving inputs and outputs back and forth between conversations, adopting a habit of structuring my inputs with Markdown just seemed like the natural choice. As time went on and the ChatGPT models continued to undergo refinement, I found that Markdown allowed me to conserve tokens and forgo some degree of specificity in my inputs, simply because the structure of a given input made my intent a bit clearer.
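To make the habit concrete, here is a minimal sketch of assembling a system prompt as Markdown. The function name, section titles, and the note-taking app are all hypothetical, made up for illustration; nothing here is tied to a specific model or API.

```python
def build_system_prompt(role, rules, examples):
    """Assemble a system prompt as Markdown: headings mark sections, bullets hold rules."""
    lines = ["# Role", role.strip(), "", "## Rules"]
    # Each rule becomes its own bullet so the model sees them as separate items
    lines += [f"- {rule.strip()}" for rule in rules]
    lines += ["", "## Examples"]
    for user_text, reply in examples:
        lines.append(f"**User:** {user_text}")
        lines.append(f"**Assistant:** {reply}")
        lines.append("")
    return "\n".join(lines).strip() + "\n"


# Hypothetical bot definition, purely for illustration
prompt = build_system_prompt(
    role="You are a support bot for a hypothetical note-taking app.",
    rules=["Answer in two sentences or fewer.", "Never invent features."],
    examples=[("Can I export notes?", "Yes, as plain text from the File menu.")],
)
print(prompt)
```

The resulting string can be passed as the system message to whatever chat API you happen to be using; the headings do the segmenting that would otherwise need extra explanatory sentences.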
One issue with both of the above is that they aren't very objective, right? Accuracy in outputs is largely a subjective judgment of how well a generation does or doesn't express nuances in the context. Language, after all, is full of subtle contextual assumptions about what meaning is and isn't expected to come through in a conversation: the subject of a sentence, the object of a sentence, and what can be assumed about them from the context. The objective benefits became clear when I started building corpora for RAG (retrieval-augmented generation) pipelines. It's no secret that some data prep is generally needed to get a good pipeline set up, and in looking to vectorize a corpus, it became *super* clear that prepping a corpus to fit into a commonly understood and easily manipulated format like Markdown made the whole system MUCH more effective.
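One reason a Markdown corpus is easy to work with is that headings give you natural chunk boundaries before vectorizing. The sketch below is one possible way to do that, not a description of my actual pipeline: `chunk_markdown` and the sample document are hypothetical, and it only understands `#`-style headings and triple-backtick fences.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
FENCE = "`" * 3  # triple backtick, built this way to keep this example easy to embed


def chunk_markdown(text):
    """Split Markdown into one chunk per heading, keeping the heading trail as context."""
    chunks = []
    trail = []        # stack of (level, title) for the current heading path
    body = []         # lines collected under the current heading
    in_fence = False  # True while inside a fenced code block

    def flush():
        content = "\n".join(body).strip()
        if content:
            path = " > ".join(title for _, title in trail)
            chunks.append({"path": path, "text": content})
        body.clear()

    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        # Lines starting with '#' inside a code fence are not headings
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Pop headings at the same or a deeper level before adding this one
            while trail and trail[-1][0] >= level:
                trail.pop()
            trail.append((level, match.group(2)))
        else:
            body.append(line)
    flush()
    return chunks


doc = "\n".join([
    "# Guide",
    "Intro text.",
    "## Install",
    "Run the installer.",
    FENCE + "sh",
    "# a shell comment, not a heading",
    FENCE,
    "## Usage",
    "Open the app.",
])
for chunk in chunk_markdown(doc):
    print(chunk["path"], "->", len(chunk["text"]), "chars")
```

Each chunk carries its heading path (for example `Guide > Install`), which can be prepended to the chunk text before embedding so that a retrieved passage still says where it came from.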
The Absolute Basics: A Markdown Primer
If you've been around the internet, like, at all – places like Reddit, GitHub, or even in the midst of Discord chats – you've likely encountered Markdown. It's like the secret sauce that makes text look just right in a lot of places where a platform might want to give their users a way to add a little bit of stylistic flair to their text.
So, what exactly is Markdown? In simple terms, it's a lightweight markup language with plain-text formatting syntax. Its key feature? Simplicity. Markdown allows you to write using an easy-to-read, easy-to-write plain text format, which then converts to structurally valid HTML (or other formats). This means you can create formatted text (like headings, bold text, italics) without needing to know a shred of HTML, and the machines that handle that translation can do so in a way that's predictable and easy to accommodate.
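To show how predictable that translation can be, here is a toy converter that handles only headings, paragraphs, bold, italics, and inline code. It is a deliberately tiny sketch and not a CommonMark-compliant parser; the function names are made up, and for real work you would reach for an established Markdown library.

```python
import html
import re


def inline(text):
    """Convert a few inline Markdown spans to HTML (toy subset only)."""
    text = html.escape(text, quote=False)
    text = re.sub(r"`([^`]+)`", r"<code>\1</code>", text)
    # Bold must run before italic so ** is not consumed as two single asterisks
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    return text


def to_html(markdown_text):
    """Turn headings and single-line paragraphs into HTML, one line at a time."""
    out = []
    for line in markdown_text.splitlines():
        heading = re.match(r"^(#{1,6})\s+(.*)$", line)
        if heading:
            level = len(heading.group(1))
            out.append(f"<h{level}>{inline(heading.group(2))}</h{level}>")
        elif line.strip():
            out.append(f"<p>{inline(line.strip())}</p>")
    return "\n".join(out)


print(to_html("# Title\nSome **bold** and *italic* text with `code`."))
```

The point is less the converter itself than the fact that a few simple rules cover the most common cases, which is part of why both people and machines find the format easy to handle.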
Quick side note: Don't get fooled by Slack’s similar-at-a-glance-but-not-actually-Markdown language, mrkdwn. Despite its name and similar appearance, it's a slightly different beast. Think of it as Markdown's distant, quirky cousin.
Anyway, I promised a primer:
| Markdown Syntax | Description |
|-----------------------|--------------------------|
| # Heading 1 | Large heading |
| ## Heading 2 | Medium heading |
| ### Heading 3 | Small heading |
| **bold text** | Bold |
| *italic text* | Italic |
| [Link](url) | Hyperlink |
| ![Image](image_url) | Image with alt text |
| > blockquote | Blockquote |
| - List item | Unordered list item |
| 1. List item | Ordered list item |
| --- | Horizontal rule |
| `Code` | Inline code |
| ``` <language> <code> ``` | Fenced code block (each fence on its own line) |
If you're looking for something you can quickly copy and paste into an input to check how your Markdown is interpreted, I've provided a little snippet as well:
# Welcome to My Markdown Guide
## Why Use Markdown?
Markdown is a fantastic tool for writing on the web. Here's why:
- **Easy to Learn**: You can pick up the basics in minutes!
- *Versatile*: Suitable for blogs, documentation, and even books.
- `Code Friendly`: Great for tech writing and code snippets.
### Try It Out
Here's a quick example of how Markdown can transform your writing:
> Markdown makes writing on the web effortless and fun. It's as simple as writing an email, and with a little practice, you can master its syntax. Check out this link for more [Markdown Tips](https://example.com).
---
## Sharing Code Snippets
Markdown is especially useful for sharing code. Here's a simple `Hello, World!` in Python:
```python
print("Hello, World!")
```