This is my before.
Before the workshops.
Before the blog.
Before the refinements, the rewrites, the confidence of mastery.
Before the beacon was lit.
The following isn’t the polished voice of a certified expert—it’s something more raw: a baseline. A peek into my mind when I committed to creating Shinros. It’s a diagnostic snapshot, an invitation to follow the journey. Because if you’re reading this, there’s a good chance you’re where I was. You know AI can do more. You’ve probably tried a few prompts, maybe even copied some from LinkedIn. But you’re still in the dark more often than not.
This blog is the start of showing you how to light your own way. And I’ll begin by showing you how I tested mine.
The Diagnostic: Module 0 Assessment
Module 0 in my course is about orientation and clarity. It doesn’t assume mastery. Instead, it gives you a controlled prompt engineering challenge with no research, no prep, and no tricks. Just instinct.
Then, we run that raw output past three expert LLMs: ChatGPT, Claude, and Grok. Together, they evaluate your answers for accuracy, clarity, teachability, and structure.
Here’s how I scored:
| Section | Score | Consensus Insight |
|---|---|---|
| Theoretical | 34 / 50 | Strong instinct, room to deepen technical clarity. |
| Practical | 24 / 30 | Excellent structure and usability; very close to consultant-level. |
| Total | 58 / 80 | Consistently evaluated across three top-tier models. Reliable foundation for consulting growth. |
These numbers are more than scores to me. They showed me which of my instincts were already solid, and where structured training would turn instinct into confident skill.
What I Got Right—and What I Didn’t Know I Missed
Q1: How LLMs Generate Responses
Score: 8/10
📝 Prompt: Explain how LLMs generate responses. Be literal and technical. Avoid metaphor unless you need to clarify.
I described tokenization, probability-driven output, and generation loops. I even noted sampling methods like temperature and top-p. That’s enough to pass. However, Grok pointed out something I overlooked: stop conditions. I didn’t mention end-of-sequence tokens or max token limits—both critical for understanding when and why a model halts generation. In hindsight, it’s like describing how a car drives without mentioning the brakes.
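To make that loop concrete, here's a toy sketch of autoregressive sampling. It isn't any real model's implementation: the tiny vocabulary and the `toy_logits` stub are invented for illustration, but the temperature and top-p sampling and both stop conditions are exactly the pieces I should have named.

```python
import math
import random

# Toy illustration of the generation loop: sample one token at a time, then
# stop on an end-of-sequence token or a max-token limit. Real LLMs score tens
# of thousands of tokens conditioned on the full prompt; this stub does not.
VOCAB = ["The", " model", " stops", " here", "<eos>"]

def toy_logits(tokens):
    # Invented stand-in: nudges the <eos> score upward as the output grows.
    return [1.0, 1.0, 1.0, 1.0, 0.5 + 0.5 * len(tokens)]

def sample(logits, temperature=0.8, top_p=0.9):
    # Temperature rescales the scores; top-p keeps only the smallest set of
    # tokens whose cumulative probability reaches p, then samples from it.
    scaled = [x / temperature for x in logits]
    total = sum(math.exp(x) for x in scaled)
    probs = [math.exp(x) / total for x in scaled]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return random.choices(kept, weights=[probs[i] for i in kept], k=1)[0]

def generate(prompt_tokens, max_new_tokens=8):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):          # stop condition 1: token budget
        next_id = sample(toy_logits(tokens))
        if VOCAB[next_id] == "<eos>":        # stop condition 2: end-of-sequence
            break
        tokens.append(VOCAB[next_id])
    return "".join(tokens)

print(generate(["The"]))
```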
Q2: Context Window Limits
Score: 7/10
📝 Prompt: What happens when a prompt exceeds the model’s context window? How might that affect the model’s response?
I explained that when a prompt exceeds the model’s context window, earlier tokens are dropped, potentially harming coherence. The metaphor I used, something like “the model forgets,” was accurate in effect but anthropomorphic in phrasing. Grok reminded me to anchor this to mechanics, not metaphor, and refined my understanding. Next time, I’d say: “The model drops tokens; which end gets cut depends on the model and how it’s deployed. The result can be output that lacks grounding or continuity.”
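Here’s a minimal sketch of that mechanic. The window size and the keep-the-most-recent-tokens strategy are simplifying assumptions; real deployments vary, and some APIs reject an oversized request outright instead of trimming it.

```python
# Minimal sketch of context-window truncation, not any vendor's implementation.
# Many chat frontends silently trim the oldest turns once the conversation
# exceeds the window; APIs more often return an error instead.

CONTEXT_WINDOW = 8  # real windows are thousands to millions of tokens

def fit_to_window(token_ids, window=CONTEXT_WINDOW):
    """Keep only the most recent tokens that fit in the window."""
    if len(token_ids) <= window:
        return token_ids
    # Everything before this slice is simply gone: the model never sees it,
    # so any instruction or fact that lived there can no longer ground the output.
    return token_ids[-window:]

conversation = list(range(12))          # pretend these are 12 token ids
print(fit_to_window(conversation))      # -> [4, 5, 6, 7, 8, 9, 10, 11]
```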
Q3: Chain-of-Thought vs ReAct
Score: 6/10
📝 Prompt: How would you explain the difference between Chain-of-Thought and ReAct prompting to a client who’s never heard of either?
This was the hardest to explain cleanly. I framed CoT as “thinking out loud” and ReAct as “reasoning with feedback,” but I didn't understand what makes ReAct unique: external action (even if simulated). Grok offered the missing piece: ReAct isn’t just structured thought; it’s alternating between thinking and doing. If I were teaching this now, I’d show both side by side with an example, rather than only describing the philosophy. By the way, ReAct prompts have since become some of my favorites to play with.
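Here’s the kind of side-by-side I’d show now. The task and tool names are invented for illustration; the point is that the ReAct prompt alternates Thought, Action, and Observation instead of reasoning in a single pass.

```python
# Illustrative prompts only; the scenarios and tool names are made up.

chain_of_thought = """\
Question: A license costs $40 per seat per month and we have 12 seats.
What do we pay per year?
Think through the problem step by step, then give the final answer on its own line."""

react = """\
Question: What did we pay for licenses last year?
You may use these tools: search_invoices(query), calculator(expression).
Repeat this loop until you can answer:
Thought: reason about what you need next.
Action: call exactly one tool, e.g. search_invoices("license invoices 2024").
Observation: the tool's result will be inserted here.
Final Answer: your conclusion, grounded in the observations above."""
```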
Q4: Prompt Differences Across Models
Score: 7/10
📝 Prompt: Why might the same prompt work well in ChatGPT but fail in Mistral or LLaMA? List 2+ factors.
I noted that different models can behave differently with the same prompt due to their own system prompts and deployment environments (API vs GUI). What I missed was something more fundamental: training data and architecture. Claude and Grok agreed that this was a large omission. Models don’t just “feel different”—they are different. They’ve seen different training data, use different token vocabularies, and behave differently under load.
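One quick way to see the vocabulary difference for yourself is to tokenize the same sentence with two different encodings. This assumes you have tiktoken installed; it only ships OpenAI vocabularies, while Mistral and LLaMA use their own tokenizers, which is exactly the point.

```python
# pip install tiktoken
# Compare how two different vocabularies split the same prompt. Other model
# families use entirely different tokenizers, so the splits differ again.
import tiktoken

text = "Summarize the Q3 incident report in three bullet points."

for name in ("gpt2", "cl100k_base"):   # two different token vocabularies
    encoding = tiktoken.get_encoding(name)
    ids = encoding.encode(text)
    print(f"{name}: {len(ids)} tokens -> {ids[:6]}...")
```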
Q5: Summarization and Hallucination
Score: 6/10 (across the board)
📝 Prompt: Give 3 reasons why summarization alone can still lead to hallucination when compressing context.
This was a wake-up call. I knew that summarization can lead to detail loss. But I didn’t clearly connect that to hallucination. The feedback helped me realize: hallucination in this context isn’t about having too few tokens, it’s about losing anchor points and nuanced information. If I summarize a meeting and forget who said what, the LLM still follows the pattern. And if it doesn’t have the real information? It fills in the blanks with plausible—but invented—words. That’s where hallucination lives: in empty patterns that beg to be filled.
Today, I’d explain this using a narrative: “Imagine summarizing a team meeting, then asking the model who disagreed with the decision. If the summary didn’t include the disagreement, the model might invent one—plausible, coherent, and entirely fictional.” All because it completed the pattern.
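If you want to see that pattern on the page, here’s a rough sketch of it. The summary and questions are invented for illustration, and the guarded variant is one simple way to stop the gap from being filled silently.

```python
# Illustration of the failure pattern, not a transcript of any real model run.
# The summary below has dropped the "who disagreed" detail, so a follow-up
# question about it invites the model to fill the gap with something plausible.

summary = (
    "Meeting summary: the team reviewed the Q3 roadmap and approved the "
    "migration plan. Timeline concerns were raised but the decision stood."
)

follow_up = "According to the summary, who disagreed with the migration plan?"

# Safer variant: keep anchor points (names, numbers, attributions) in the
# compressed context, or require the model to flag missing details.
guarded_follow_up = (
    follow_up
    + " If the summary does not say, answer exactly: 'Not stated in the summary.'"
)
```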
Practical Prompts: My Strongest Instincts
P1: Chain-of-Thought Prompt
Score: 7/10
📝 Prompt: Write a prompt that asks the model to reason step-by-step before answering (e.g. for math, troubleshooting, or decision-making).
My prompt cued multi-step reasoning but could have included scaffolding like: “First explain the options, then justify your final choice.” I now do this without thinking about it.
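Something like this is what I’d write today. The scenario is invented; the numbered scaffolding is the part that matters.

```python
# Illustrative scaffolded Chain-of-Thought prompt, not my graded answer.
scaffolded_prompt = """\
We need to choose between PostgreSQL and DynamoDB for the new service.
Reason step by step before answering:
1. First, list the realistic options.
2. For each option, explain the trade-offs for our workload.
3. Then justify your final choice in two sentences.
4. End with a line in the form: Recommendation: <option>."""
```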
P2: Compression Prompt (Under 40 Tokens)
Score: 9/10
📝 Prompt: Compress this long-winded instruction into something under 40 tokens without losing intent: “You are a helpful assistant designed to write formal and respectful emails in response to various customer support queries. Begin with a professional greeting, clearly acknowledge the customer's concern, and propose a helpful resolution using polite language.”
This was one of my best-performing answers. I managed to preserve tone, structure, and scope in a compressed instruction. Grok suggested one tiny improvement: beware compression that might introduce ambiguity.
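For illustration only (this isn’t the answer I actually submitted), here is one possible compression, with a token count check so you don’t have to guess whether it fits the budget. It assumes tiktoken is installed and uses the cl100k_base vocabulary as an example.

```python
# pip install tiktoken
# Compare token counts for the original instruction and an example compression.
import tiktoken

original = (
    "You are a helpful assistant designed to write formal and respectful "
    "emails in response to various customer support queries. Begin with a "
    "professional greeting, clearly acknowledge the customer's concern, and "
    "propose a helpful resolution using polite language."
)
compressed = (
    "Write formal, polite replies to customer support emails: greet, "
    "acknowledge the concern, propose a resolution."
)

encoding = tiktoken.get_encoding("cl100k_base")
print(len(encoding.encode(original)), "->", len(encoding.encode(compressed)))
```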
P3: Inclusive HR Prompt
Score: 8.5/10
📝 Prompt: You’ve been asked to help a non-technical HR manager use an LLM to rewrite job descriptions. Write a prompt that guides the LLM to make the language more inclusive without losing clarity.
My answer was praised for structure and usability. Claude especially liked how it taught a non-technical user to revise job descriptions with feedback loops. I’ve since simplified the language even more and turned it into a reusable pattern for HR use cases.
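I won’t reproduce the graded answer here, but a hypothetical prompt in the same spirit looks like this. The placeholder and wording are mine, made up for illustration; the feedback loop is what keeps a non-technical user in control of the final wording.

```python
# Hypothetical inclusive-rewrite prompt pattern, not my graded answer.
# Fill {job_description} with str.format() before sending it to the model.
inclusive_rewrite_prompt = """\
Here is a job description, marked off by <description> tags:
<description>{job_description}</description>
Rewrite it so the language is inclusive while keeping the requirements clear:
- Replace gendered or age-coded words and unnecessary jargon.
- Keep every genuine requirement; do not soften qualifications.
- List each change you made and why, so I can accept or reject it.
- Ask me one clarifying question if any requirement is ambiguous."""
```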
Why This Matters
I’m not showing you this to prove I’m good at prompts. I’m showing it to prove that even a strong start has room to grow—and that you don’t have to learn this alone.
Most professionals using AI today are doing so instinctively. Sometimes it works. Sometimes it doesn’t. And when it fails, they don’t know why.
This blog, this course, this company—Shinros—exists to fix that.
We’re building the lighthouse I wish I had while I was working with AI as a Technical Support Engineer: a way to learn with intent, to speak to AI systems like colleagues, and to bring clarity where most people are still guessing.
This Is the Before
Before the results.
Before the frameworks, the ReAct experiments, the blog series on compression strategies.
Before the testimonials and workshops and playbooks.
But not before the calling. Not before the vision. That was already there—quietly burning.
You’re welcome to follow what comes next. The first formal module is underway. The course is called The Shinros Method: Prompting with Purpose for High-Impact AI Use. It teaches everything I’m now refining—step by step, method by method.
This post is where I started.
This is Before the Beacon.