
Before the Beacon

“Lighthouses don’t run around the island looking for boats to save. They stand in one place and make it impossible to miss them.”

Published April 21, 2025 · Tags: PromptEngineering, LLMs, SkillCheck

This is my “before.”

Before the workshops.
Before the blog.
Before the refinements, the rewrites, the confidence of mastery.

Before the beacon was lit.

What you’re reading here isn’t the polished voice of a finished program. It’s a baseline. It’s the snapshot I took of myself when I decided to build Shinros. I’m sharing it because there’s a decent chance you’re standing in almost the same place.

You know AI can do more. You’ve tried prompts from LinkedIn. You’ve watched coworkers copy-paste magic phrases into ChatGPT. Sometimes it lands. A lot of the time, you’re guessing.

This blog is where I stop guessing in public. And I’ll start with the first test I gave myself.

The Diagnostic: Module 0 Assessment

Module 0 in my course is about orientation and clarity. No prep. No research. No “cheat sheets.” You sit down, you answer a controlled set of questions about prompting and LLM behavior using only what you already understand.

Then those answers get reviewed by three separate high-end language models: ChatGPT, Claude, and Grok. I’m not asking them to flatter me. I’m asking them to find cracks.

My starting scores:

Section        Score     Consensus Insight
Theoretical    34 / 50   Strong instinct. Needs deeper technical clarity.
Practical      24 / 30   Structure and usability close to consultant-level.
Total          58 / 80   Reliable foundation. Clear path to professionalization.

The point of doing this wasn’t “Am I good?” The point was: What is already transferable to someone else on day one, and what still lives only in my head?

What I Got Right — and What I Missed

Q1: How LLMs Generate Responses

Score: 8 / 10

Prompt: Explain how LLMs generate responses. Be literal and technical. Avoid metaphor unless you need it.

I talked about tokenization, probability, and step-by-step token generation. I even mentioned sampling controls like temperature and top-p. Good start.

What I didn’t include: how the model decides when to stop. I didn’t talk about end-of-sequence tokens or max token limits. That sounds small, but it’s not. If you don’t understand how output stops, you can’t control output shape. It’s like explaining how a car moves without ever mentioning the brakes.
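To make the stopping condition concrete, here's a toy sketch of the generation loop. The "model" is just a hand-written table of made-up next-token probabilities, not a real LLM, but the control flow is the part that matters: sample a token (optionally reshaped by temperature and top-p), and stop on either an end-of-sequence token or a max-token cap.

```python
import random

EOS = "<eos>"

# Toy next-token distributions keyed by the previous token; a stand-in
# for a real model's output probabilities (these numbers are invented).
TOY_MODEL = {
    "<start>": {"The": 0.6, "A": 0.4},
    "The": {"cat": 0.5, "dog": 0.5},
    "A": {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 0.7, EOS: 0.3},
    "dog": {"sat": 0.7, EOS: 0.3},
    "sat": {EOS: 1.0},
}

def sample_next(dist, temperature=1.0, top_p=1.0):
    # Temperature reshapes the distribution before sampling.
    weights = {t: p ** (1.0 / temperature) for t, p in dist.items()}
    total = sum(weights.values())
    ranked = sorted(((t, w / total) for t, w in weights.items()),
                    key=lambda x: -x[1])
    # Top-p (nucleus): keep the smallest top set whose mass reaches top_p.
    kept, mass = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        mass += p
        if mass >= top_p:
            break
    tokens, probs = zip(*kept)
    return random.choices(tokens, weights=probs)[0]

def generate(max_tokens=10, temperature=1.0, top_p=1.0):
    out, prev = [], "<start>"
    # Output stops on EOS *or* the max-token cap, whichever comes
    # first -- the "brakes" I originally left out of my answer.
    while len(out) < max_tokens:
        tok = sample_next(TOY_MODEL[prev], temperature, top_p)
        if tok == EOS:
            break
        out.append(tok)
        prev = tok
    return " ".join(out)

random.seed(0)
print(generate(max_tokens=3))
```

If you lower `max_tokens`, output gets truncated mid-thought; if the model never emits EOS, the cap is the only thing stopping it. That's why output shape is controllable at all.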

Q2: Context Window Limits

Score: 7 / 10

Prompt: What happens when a prompt exceeds the model’s context window? How might that affect the model’s response?

I said that when you go past the limit, older tokens get dropped, which can break coherence. That’s basically correct.

The refinement I got back: don’t say “the model forgets.” The model isn’t a person with memory loss. It’s an input buffer with a hard cap. Tokens fall out, sometimes from the start, sometimes from the middle, depending on architecture. That difference matters in production.
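The "buffer with a hard cap" framing fits in one line of Python. This is a deliberately crude model of a context window (real models count tokens, not words, and eviction details vary by system), but it shows the mechanics: nothing is "forgotten," the oldest tokens simply fall out when the cap is hit.

```python
from collections import deque

def make_context(max_tokens):
    # A hard-capped buffer: once full, each append silently evicts
    # the oldest entry. No memory, no forgetting -- just a cap.
    return deque(maxlen=max_tokens)

ctx = make_context(max_tokens=5)
for tok in "the quarterly report flagged three open risks".split():
    ctx.append(tok)

print(list(ctx))  # -> ['report', 'flagged', 'three', 'open', 'risks']
```

Note what got evicted: "the quarterly" — the start of the sentence. If the answer you need lived in those early tokens, coherence breaks even though the model did exactly what it was built to do.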

Q3: Chain-of-Thought vs ReAct

Score: 6 / 10

Prompt: Explain the difference between Chain-of-Thought and ReAct to a client who’s never heard of either.

I said Chain-of-Thought is “thinking out loud,” and ReAct is “thinking with feedback.” Close, but not sharp.

The missing piece was the action loop. ReAct is not just structured reasoning. It alternates between reasoning and taking an action (like looking something up), then using what came back to continue reasoning. That loop is the whole value. Once I internalized that, I started using ReAct as a diagnostic tool, not just a prompt style.
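The loop is easier to see in code than in prose. Here's a minimal ReAct-shaped sketch: everything is a stand-in (the "thoughts" are canned strings, the tool is a dictionary lookup, and a real agent would call an LLM at each turn), but the Thought → Action → Observation cycle is the actual pattern.

```python
# Fake knowledge base standing in for a real tool (search, database, API).
FACTS = {"capital of France": "Paris", "capital of Japan": "Tokyo"}

def lookup(query):
    # The Action step: an external call, not model reasoning.
    return FACTS.get(query, "no result")

def react(question, max_steps=3):
    trace = []
    for _ in range(max_steps):
        # Thought: decide what information is still missing.
        trace.append(f"Thought: I need to find the {question}.")
        # Action: call a tool with a concrete query.
        trace.append(f"Action: lookup({question!r})")
        # Observation: feed the tool's result back into reasoning.
        observation = lookup(question)
        trace.append(f"Observation: {observation}")
        if observation != "no result":
            trace.append(f"Answer: {observation}")
            return observation, trace
    return None, trace

answer, trace = react("capital of France")
print(answer)  # -> Paris
```

Chain-of-Thought would be the `Thought` lines alone. ReAct is the whole loop: the observation changes what the next thought can be, which is why it works as a diagnostic tool — you can read the trace and see exactly where reasoning went wrong.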

Q4: The Same Prompt Across Different Models

Score: 7 / 10

Prompt: Why might one prompt work in ChatGPT but fail in Mistral or Llama? Give at least two reasons.

I said environment matters (API vs chat UI, hidden system prompts, safety layer, etc.). That’s true.

What I didn’t say: models are not interchangeable brains. They’re trained on different data, they tokenize text differently, and they’re optimized with different alignment steps. So you don’t “transfer a prompt,” you “adapt an instruction to a system.” That’s now part of how I teach teams.
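The tokenization point is easy to demonstrate with two toy tokenizers. Both schemes below are invented (real models use learned BPE or SentencePiece vocabularies), but they make the point: "the same prompt" arrives at different models as different sequences, of different lengths, split at different boundaries.

```python
def whitespace_tokenize(text):
    # Toy scheme A: split on spaces.
    return text.split()

def bigram_tokenize(text):
    # Toy scheme B: crude character pairs, loosely BPE-flavored.
    flat = text.replace(" ", "_")
    return [flat[i:i + 2] for i in range(0, len(flat), 2)]

prompt = "Summarize the ticket"
print(whitespace_tokenize(prompt))  # -> ['Summarize', 'the', 'ticket']
print(bigram_tokenize(prompt))      # 10 two-character units
```

Layer different training data and different alignment tuning on top of that, and "why did my prompt break in Llama?" stops being a mystery. You adapt the instruction to the system; you don't port it.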

Q5: Summarization and Hallucination

Score: 6 / 10

Prompt: Give three reasons why summarization alone can still lead to hallucination when compressing context.

I said “summaries lose detail.” True, but shallow.

The better framing I was given: hallucination shows up when the model is forced to complete a missing pattern. If you compress a meeting and you drop who actually pushed back on a decision, then ask “Who disagreed?”, the model might invent an answer that sounds right. Not because it’s lying. Because it’s filling a hole you created.

That reframe hit me. Hallucination is often downstream of our shortcut, not the model’s ego.
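Here's that failure mode as a sketch. The data and the "guessing" heuristic are invented for illustration, but the shape is real: the summary step drops the dissent, and a naive answering step fills the hole with something plausible instead of admitting the information is gone.

```python
meeting = {
    "decision": "ship Friday",
    "attendees": ["Ana", "Ben", "Chloe"],
    "pushed_back": "Ben",
}

def summarize(record):
    # Compression keeps the decision, silently drops who disagreed.
    return {"decision": record["decision"],
            "attendees": record["attendees"]}

def who_disagreed(summary):
    if "pushed_back" in summary:
        return summary["pushed_back"]
    # The hole we created: no grounding left, so this "answer" is a
    # guess that merely sounds right (first attendee listed).
    return summary["attendees"][0]

print(who_disagreed(summarize(meeting)))  # -> Ana (wrong; it was Ben)
```

The fix isn't "never summarize." It's knowing which questions you'll ask later, and making sure the compression step preserves the fields those questions depend on.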

Practical Prompts: Where Instinct Was Already Strong

P1: Chain-of-Thought Prompt

Score: 7 / 10

Task: Write a prompt that makes the model reason step-by-step before answering (math, troubleshooting, decision-making, etc.).

I told the model to reason in steps. Solid. What I didn’t add was structure like “List possible options, explain tradeoffs, then choose one and justify.” I add that automatically now. That one tweak forces clarity and exposes weak logic.
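One way to bake that structure in is a reusable template. The wording below is my own, not a standard, but it shows the tweak: the model isn't just told to "think step by step," it's forced to enumerate options and defend a choice.

```python
COT_TEMPLATE = """You are helping with: {task}

Before answering:
1. List the possible options.
2. Explain the tradeoffs of each option.
3. Choose one option and justify the choice.

Only then give your final answer."""

def build_cot_prompt(task):
    # Fill the template with the concrete task.
    return COT_TEMPLATE.format(task=task)

print(build_cot_prompt("picking a database for a small support tool"))
```

The numbered steps are the point: if the model's step 2 is thin, you can see the weak logic before you ever read the final answer.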

P2: Compression Prompt (Under 40 Tokens)

Score: 9 / 10

Task: Compress this instruction under 40 tokens without losing tone: “You are a helpful assistant designed to write formal and respectful emails in response to various customer support queries. Begin with a professional greeting, clearly acknowledge the customer's concern, and propose a helpful resolution using polite language.”

This was my cleanest answer. I kept the behavior, the tone, and the structure without letting it go vague. Feedback from Grok was basically: “Watch for places where compression creates ambiguity.” That’s fair. There’s a point where “short” becomes “slippery.”
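A quick sanity check I now run on compressed prompts, sketched below. Word count is only a proxy for tokens (real tokenizers usually produce more tokens than words), so the multiplier here is a deliberately conservative guess, not a measured constant.

```python
def within_budget(prompt, max_tokens=40, tokens_per_word=1.3):
    # Rough token estimate: words times an assumed expansion factor.
    estimated = len(prompt.split()) * tokens_per_word
    return estimated <= max_tokens

compressed = ("Write formal, respectful support emails: professional "
              "greeting, acknowledge the concern, propose a resolution.")
print(within_budget(compressed))  # -> True
```

It won't catch the "slippery" problem — only reading the compressed prompt against the original behavior catches ambiguity — but it catches budget overruns before they silently truncate something else in the context.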

P3: Inclusive HR Prompt

Score: 8.5 / 10

Task: Write a prompt an HR manager (non-technical) could use to make job descriptions more inclusive without losing clarity.

Claude liked this one. What landed was that I didn't just say "write inclusive language." I walked the HR manager through an edit loop and gave them an audit checklist. It was usable immediately, which is the entire point.

Why This Matters

I’m not posting this to claim “Look how strong the score is.” That’s not the win.

The win is: I could see my blind spots. And once I could see them, I could design training around them.

Most people using AI at work are doing it on instinct. Sometimes that instinct is great. Sometimes it’s quietly wrong in a way that poisons an answer and nobody catches it until a customer escalates.

Shinros exists to fix that gap. Not with hype, not with “magic prompts,” but with teachable process: asking better questions, verifying output faster, and keeping failure modes visible.

This Is the Before

Before the results.
Before the workflow audits and rebuilt processes.
Before the Reliability Playbook and the Confidence Program.

But not before the pull. That was already there.

From here, I started building the course that became The Shinros Method: Prompting with Purpose for High-Impact AI Use. It’s the same work I use now when I train Support teams to trust AI under pressure.

This post is where it began.

This is Before the Beacon.

Stop guessing with ChatGPT

Train your Support team to ask cleaner questions, verify answers, and escalate less.

Book a Call