Before the Beacon

“Lighthouses don’t run around the island looking for boats to save—they burn bright, like the beacons of Gondor.”

Published April 2025 · Tags: PromptEngineering, LLMs, SkillCheck


This is my before.

Before the workshops.
Before the blog.
Before the refinements, the rewrites, the confidence of mastery.

Before the beacon was lit.

The following isn’t the polished voice of a certified expert—it’s something more raw: a baseline. A peek into my mind when I committed to creating Shinros. It’s a diagnostic snapshot, an invitation to follow the journey. Because if you’re reading this, there’s a good chance you’re where I was. You know AI can do more. You’ve probably tried a few prompts, maybe even copied some from LinkedIn. But you’re still in the dark more often than not.

This blog is the start of showing you how to light your own way. And I’ll begin by showing you how I tested mine.

The Diagnostic: Module 0 Assessment

Module 0 in my course is about orientation and clarity. It doesn’t assume mastery. Instead, it gives you a controlled prompt engineering challenge with no research, no prep, and no tricks. Just instinct.

Then, we run that raw output past three expert LLMs: ChatGPT, Claude, and Grok. Together, they evaluate your answers for accuracy, clarity, teachability, and structure.

Here’s how I scored:

| Section | Score | Consensus Insight |
| --- | --- | --- |
| Theoretical | 34 / 50 | Strong instinct, room to deepen technical clarity. |
| Practical | 24 / 30 | Excellent structure and usability; very close to consultant-level. |
| Total | 58 / 80 | Consistently evaluated across three top-tier models. Reliable foundation for consulting growth. |

These numbers are more than scores to me. They showed me which of my instincts were already solid, and where structured training would turn instinct into confident skill.

What I Got Right—and What I Didn’t Know I Missed

Q1: How LLMs Generate Responses

Score: 8/10

📝 Prompt: Explain how LLMs generate responses. Be literal and technical. Avoid metaphor unless you need to clarify.

I described tokenization, probability-driven output, and generation loops. I even noted sampling methods like temperature and top-p. That’s enough to pass. However, Grok pointed out something I overlooked: stop conditions. I didn’t mention end-of-sequence tokens or max token limits—both critical for understanding when and why a model halts generation. In hindsight, it’s like describing how a car drives without mentioning the brakes.
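
If I were answering that question today, I'd attach a sketch of the loop itself. This is a simplified illustration, not any real model's internals: `model_logits`, the toy vocabulary, and the token ids are placeholders I'm assuming for the example.

```python
import numpy as np

def sample_next_token(logits, temperature=0.8, top_p=0.9):
    """Pick one token id from the model's raw scores (logits)."""
    # Temperature rescales the logits: lower values sharpen the distribution.
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    # Top-p (nucleus) sampling: keep the smallest set of tokens whose
    # cumulative probability reaches top_p, then renormalize and sample.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cumulative, top_p) + 1]
    kept_probs = probs[keep] / probs[keep].sum()
    return int(np.random.choice(keep, p=kept_probs))

def generate(prompt_tokens, model_logits, eos_id, max_new_tokens=256):
    """Autoregressive loop with the two stop conditions I originally left out."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):              # stop condition 1: max token limit
        next_id = sample_next_token(model_logits(tokens))
        if next_id == eos_id:                    # stop condition 2: end-of-sequence token
            break
        tokens.append(next_id)
    return tokens

# Toy usage: a fake "model" that returns random scores over a 50-token vocabulary.
rng = np.random.default_rng(0)
print(generate([1, 2, 3], lambda tokens: rng.normal(size=50), eos_id=0))
```

The point of the sketch is the loop shape: predict, sample, append, and keep going until either the end-of-sequence token appears or the max-token budget runs out. Those are the brakes.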

Q2: Context Window Limits

Score: 7/10

📝 Prompt: What happens when a prompt exceeds the model’s context window? How might that affect the model’s response?

I explained that when a prompt exceeds the model’s context window, earlier tokens are dropped, potentially harming coherence. The metaphor I used—something like “the model forgets”—was accurate in effect but anthropomorphic in phrasing. Grok reminded me to anchor this to mechanics, not metaphor, and refined my understanding. Next time, I’d say: “The model drops tokens, and which part gets dropped depends on the model. This can result in outputs that lack grounding or continuity.”
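
A tiny sketch makes the mechanic concrete. I'm assuming the common "drop the oldest tokens" strategy here; real deployments vary (some truncate the middle, some reject the request outright).

```python
def fit_to_context(tokens, context_window=8192):
    """Keep only the most recent tokens that fit in the window."""
    if len(tokens) <= context_window:
        return tokens
    return tokens[-context_window:]   # earlier tokens are silently dropped

# If the dropped span held the instructions or key facts, the model
# completes the remaining text without that grounding, and coherence suffers.
```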

Q3: Chain-of-Thought vs ReAct

Score: 6/10

📝 Prompt: How would you explain the difference between Chain-of-Thought and ReAct prompting to a client who’s never heard of either?

This was the hardest to explain cleanly. I framed CoT as “thinking out loud” and ReAct as “reasoning with feedback,” but I didn't understand what makes ReAct unique: external action (even if simulated). Grok offered the missing piece—ReAct isn't just structured thought; it’s alternating between thinking and doing. If I were teaching this now, I’d show both side-by-side with an example, rather than only describing the philosophy. By the way, ReAct prompts have now become some of my favorites to play with.
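
Here's the kind of side-by-side I'd reach for now. Both prompts are illustrative sketches, not templates from any particular framework; the `search` and `lookup` tools in the ReAct version are assumed to exist in whatever system runs it.

```python
# Chain-of-Thought: the model only *thinks*, step by step, then answers.
COT_PROMPT = """Question: Our support backlog doubled last month. Should we hire or automate?
Think through the problem step by step, then give your recommendation."""

# ReAct: the model alternates Thought -> Action -> Observation, where each
# Action is a call to an external tool and each Observation is that tool's result.
REACT_PROMPT = """Question: Our support backlog doubled last month. Should we hire or automate?
Use this loop until you can answer:
Thought: reason about what you still need to know.
Action: search[query] or lookup[metric]
Observation: the result of that action.
Final Answer: your recommendation, grounded in the observations."""
```

The difference is visible at a glance: CoT produces one long stretch of reasoning, while ReAct interleaves reasoning with actions and the evidence those actions return.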

Q4: Prompt Differences Across Models

Score: 7/10

📝 Prompt: Why might the same prompt work well in ChatGPT but fail in Mistral or LLaMA? List 2+ factors.

I noted that different models can behave differently with the same prompt due to their own system prompts and deployment environments (API vs GUI). What I missed was something more fundamental: training data and architecture. Claude and Grok agreed that this was a major omission. Models don’t just “feel different”—they are different. They’ve seen different training data, use different token vocabularies, and respond differently to the same phrasing.

Q5: Summarization and Hallucination

Score: 6/10 (across the board)

📝 Prompt: Give 3 reasons why summarization alone can still lead to hallucination when compressing context.

This was a wake-up call. I knew that summarization can lead to detail loss. But I didn’t clearly connect that to hallucination. The feedback helped me realize: hallucination in this context isn’t about having too few tokens; it’s about losing anchor points and nuanced information. If I summarize a meeting and leave out who said what, the LLM still follows the pattern of the question and answers anyway. And if it doesn’t have the real information? It fills in the blanks with plausible—but invented—words. That’s where hallucination lives: in empty patterns that beg to be filled.

Today, I’d explain this using a narrative: “Imagine summarizing a team meeting, then asking the model who disagreed with the decision. If the summary didn’t include the disagreement, the model might invent one—plausible, coherent, and entirely fictional.” All because it completed the pattern.
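
To make that narrative even more concrete, here it is as a pair of prompts. The summary below is deliberately lossy (it names no one), so anything the model says in response to the follow-up is invented.

```python
LOSSY_SUMMARY = """Meeting summary: The team reviewed the Q3 roadmap and,
after some disagreement about the timeline, approved an August beta."""

FOLLOW_UP = """Based on the summary above, who disagreed with the August
timeline, and what was their main objection?"""

# The summary never says who disagreed or why. Asked FOLLOW_UP, the model has
# no grounding for a name or an objection, so a fluent, plausible, entirely
# fabricated answer is the likely result.
```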

Practical Prompts: My Strongest Instincts

P1: Chain-of-Thought Prompt

Score: 7/10

📝 Prompt: Write a prompt that asks the model to reason step-by-step before answering (e.g. for math, troubleshooting, or decision-making).

My prompt cued multi-step reasoning but could have included scaffolding like: “First explain the options, then justify your final choice.” I now add that scaffolding without thinking about it.
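
A version with that scaffolding built in might look like the sketch below; the task and section labels are placeholders, not a template from the course.

```python
SCAFFOLDED_COT = """You are helping me choose a database for a small analytics project.

Work through this in order, labeling each section:
1. Options: list two or three realistic candidates.
2. Reasoning: for each option, weigh cost, setup effort, and query speed step by step.
3. Decision: pick one and justify it by referring back to your reasoning.

Do not state the decision until sections 1 and 2 are complete."""
```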

P2: Compression Prompt (Under 40 Tokens)

Score: 9/10

📝 Prompt: Compress this long-winded instruction into something under 40 tokens without losing intent: “You are a helpful assistant designed to write formal and respectful emails in response to various customer support queries. Begin with a professional greeting, clearly acknowledge the customer's concern, and propose a helpful resolution using polite language.”

This was one of my best-performing answers. I managed to preserve tone, structure, and scope in a compressed instruction. Grok suggested one small improvement: watch for compression that introduces ambiguity.
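
For reference, a compression in that spirit (an illustration, not my original answer) could look like this:

```python
ORIGINAL = ("You are a helpful assistant designed to write formal and respectful emails "
            "in response to various customer support queries. Begin with a professional "
            "greeting, clearly acknowledge the customer's concern, and propose a helpful "
            "resolution using polite language.")

# Roughly 20-25 tokens, depending on the tokenizer; tone, structure, and scope survive.
COMPRESSED = ("Write formal, polite customer-support emails: professional greeting, "
              "acknowledge the concern, propose a clear resolution.")
```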

P3: Inclusive HR Prompt

Score: 8.5/10

📝 Prompt: You’ve been asked to help a non-technical HR manager use an LLM to rewrite job descriptions. Write a prompt that guides the LLM to make the language more inclusive without losing clarity.

My answer was praised for structure and usability. Claude especially liked how it taught a non-technical user to revise job descriptions with feedback loops. I’ve since simplified the language even more and turned it into a reusable pattern for HR use cases.
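
That reusable pattern looks roughly like the sketch below. It captures the shape, not the exact prompt I submitted, and `{job_description}` is simply a placeholder for whatever text the HR manager pastes in.

```python
INCLUSIVE_JD_PROMPT = """You will help me rewrite a job description to be more inclusive
without losing clarity about the role's actual requirements.

Steps:
1. Rewrite the description, replacing gendered or exclusionary wording
   (e.g. "rockstar", "he/she", "young and energetic") with neutral alternatives.
2. List each change you made and why, in plain language.
3. Ask me one question about anything you were unsure whether to change.

Job description:
{job_description}"""
```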

Why This Matters

I’m not showing you this to prove I’m good at prompts. I’m showing it to prove that even a strong start has room to grow—and that you don’t have to learn this alone.

Most professionals using AI today are doing so instinctively. Sometimes it works. Sometimes it doesn’t. And when it fails, they don’t know why.

This blog, this course, this company—Shinros—exists to fix that.

We’re building the lighthouse I wish I had while I was working with AI as a Technical Support Engineer: a way to learn with intent, to speak to AI systems like colleagues, and to bring clarity where most people are still guessing.

This Is the Before

Before the results.
Before the frameworks, the ReAct experiments, the blog series on compression strategies.
Before the testimonials and workshops and playbooks.

But not before the calling. Not before the vision. That was already there—quietly burning.

You’re welcome to follow what comes next. The first formal module is underway. The course is called The Shinros Method: Prompting with Purpose for High-Impact AI Use. It teaches everything I’m now refining—step by step, method by method.

This post is where I started.

This is Before the Beacon.