HaiPhai.AI Fluency for Biotech

The Bright Lines — What Never Goes Into AI Tools


Module 03 · Lesson 01


Reading time: 22 minutes
Track: Governance & Compliance · REQUIRED before role specialization
Prerequisites: Module 02 complete


What this lesson does

Module 03 is the required gate of the HaiPhai curriculum. You cannot complete certification without it, and you cannot specialize in your role path until you've completed it. This is intentional. Of the five universal capabilities you learned in Module 01, governance fluency is the only one where being weak is dangerous in addition to suboptimal.

Lesson 01 covers the bright lines — the absolutes. The data types and use cases where there is no judgment call, no gray zone, no reasonable exception. These are the items that need to be automatic refusals.

By the end of this lesson, you'll be able to:

  1. Identify the seven categories of data that never enter public AI tools
  2. Distinguish "public AI tools" from "enterprise AI tools" and apply correct rules to each
  3. Recognize subtle scenarios that violate bright lines despite looking innocent
  4. Explain the consequences of bright-line violations clearly enough to teach others

This lesson is the most important in the entire curriculum. Read it carefully.


01 · Why bright lines matter

Most of biotech AI governance involves judgment — gray-zone decisions where the right answer depends on context, the specific tool, the specific data, the specific use case. Module 03's later lessons cover those decisions in detail.

Bright lines are different. Bright lines are the cases where there is no judgment to make. The answer is no, regardless of context, tool, urgency, or apparent benefit. If you find yourself reasoning your way around a bright line, you've already failed — the answer was no before you started reasoning.

The pattern of bright-line violations

Almost every serious biotech AI incident follows the same pattern:

  1. A professional with legitimate intent encounters a time-pressured task
  2. They use an AI tool they normally use for that kind of work
  3. The specific case they're working on includes content that violates a bright line
  4. They paste it anyway, telling themselves "it's just this one time" or "the tool is probably secure" or "I'm not really sharing it, I'm just using AI to help with it"
  5. The content is now in the AI vendor's logs, possibly used for training, possibly accessible to vendor staff, possibly subject to subpoena
  6. When discovered, the consequences land — sometimes immediately, sometimes years later

The intent was good. The tool was reasonable. The case was real. None of that matters. The bright line was crossed.

Bright lines exist because professional judgment under pressure is unreliable. The rules have to be automatic so that pressure doesn't produce bad choices.


02 · The tool taxonomy — what "AI tool" means

Before we get to the bright lines themselves, you need a clean mental model of the AI tool landscape. The rules differ based on which type of tool you're using.

Category A — Public consumer AI tools

Examples: chat.openai.com (consumer), claude.ai (consumer), Gemini consumer products, Perplexity, Copilot in consumer Edge.

Properties:

  • Data may be used for model training (depending on settings)
  • Vendor employees may access conversations in some circumstances
  • No HIPAA Business Associate Agreement (BAA)
  • No enterprise data handling commitments
  • Conversations stored on vendor servers indefinitely or per default retention

Rule: Treat these as fully public. Anything you paste in could theoretically end up in training data, vendor logs, or third-party hands.

Category B — Enterprise AI tools with proper agreements

Examples: ChatGPT Enterprise, Claude for Enterprise (with proper contract), Microsoft Copilot for Microsoft 365 (with appropriate licensing), Google Workspace AI features (with proper agreement), private API access to models with no-training agreements.

Properties:

  • Contractual commitment that data is not used for training
  • BAA available if needed for PHI
  • Enterprise-grade data handling
  • Vendor employees access restricted by contract
  • Defined data retention and deletion policies

Rule: Many more use cases are appropriate, but specific data types still have rules. Confirm what's actually in the contract, not what the marketing page claims.

Category C — Private deployments

Examples: On-premises model hosting (your company runs the model on your own infrastructure), private cloud deployment with no data leaving your environment, fine-tuned models hosted in your VPC.

Properties:

  • Data never leaves your environment
  • No vendor access at all
  • Full control over retention, access, and use
  • Limited to whichever models you've deployed (usually open-source)

Rule: Using a private deployment approaches "no different from using your own tools." Some restrictions remain (PHI handling SOPs still apply), but fewer.

Category D — AI features inside other tools

Examples: AI summarization in Notion, AI assistance in Salesforce, AI features in CRM tools, AI-powered features in EHR/EMR systems.

Properties:

  • Inherits the data handling of the underlying tool
  • May or may not use external AI services in the background
  • Often unclear which model is being used and where data goes

Rule: Treat with caution until you understand the architecture. The tool's data handling claims don't automatically transfer to its AI features.

Why this matters

Throughout the rest of Module 03, when we say "AI tool," we'll specify the category. A rule that's absolute for Category A may be allowable in Category B and entirely fine in Category C. Know which category your tool is in before you put anything into it.

If you don't know, the safe default is to treat it as Category A.
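
The safe default above can be sketched as a lookup that falls back to Category A. This is a minimal illustration, not a policy engine: the `APPROVED_TOOLS` registry, its entries, and the `resolve_category` function are hypothetical, and the category assigned to each tool is an assumption that would come from your governance team in practice.

```python
from enum import Enum


class ToolCategory(Enum):
    A = "public consumer tool"
    B = "enterprise tool with proper agreements"
    C = "private deployment"
    D = "AI feature inside another tool"


# Hypothetical registry. A real one would be maintained by your governance
# team and would reflect what the contracts actually say.
APPROVED_TOOLS = {
    "chatgpt-enterprise": ToolCategory.B,
    "internal-vpc-llm": ToolCategory.C,
    "notion-ai": ToolCategory.D,
}


def resolve_category(tool_name: str) -> ToolCategory:
    """Return the recorded category, or Category A when the tool is unknown."""
    return APPROVED_TOOLS.get(tool_name.strip().lower(), ToolCategory.A)


print(resolve_category("ChatGPT-Enterprise").name)  # recorded in the registry
print(resolve_category("some-new-chatbot").name)    # unknown, so treated as A
```

The point of the design is that an unlisted tool never raises an error or returns "unknown"; it silently gets the most restrictive treatment.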


03 · The seven bright lines

These are the seven categories that need to be automatic refusals across all biotech AI use. For Category A tools the refusal is absolute. For Category B and Category C tools the answer is still no unless the specific agreements and approvals named under each bright line are in place.

Bright Line 1 — Identifiable patient data

Never paste into any AI tool that isn't HIPAA-compliant with a BAA, deployed in a HIPAA-compliant environment, and explicitly approved by your privacy team:

  • Patient names
  • Patient dates of birth or specific dates of service
  • Patient addresses, phone numbers, or email addresses
  • Medical record numbers
  • Health plan beneficiary numbers
  • Account numbers, Social Security numbers
  • Biometric identifiers (fingerprints, voiceprints, photos of faces)
  • Any other of the 18 HIPAA Safe Harbor identifiers, alone or in combination
  • Any data that could be re-identified when combined with other available information

Why it's absolute: HIPAA violations carry serious legal, regulatory, and financial consequences. Statutory civil penalties range from $100 to $50,000 per violation depending on the level of culpability, and these amounts are adjusted upward for inflation. Criminal penalties can include imprisonment. Reputational damage to the organization can be severe.

The most common violation pattern: A clinical operations professional pastes case narratives, deviation reports, or query responses containing subject information into a public AI tool to "help draft" a response. The subject information isn't fully de-identified. In some cases even Subject IDs can be re-identified by anyone with access to other systems.

The fix: Before using AI for any patient-related work, ensure (a) you're in a HIPAA-compliant tool with appropriate agreements in place, OR (b) the data is properly de-identified per HIPAA Safe Harbor or Expert Determination standards.
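
As a last-second safety net before pasting, a simple pattern screen can catch a few obvious identifier formats. The sketch below is an assumption-laden illustration: the `IDENTIFIER_PATTERNS` table and the `find_obvious_identifiers` function are hypothetical, the patterns cover only US-style formats, and passing this screen does not mean text is de-identified under Safe Harbor or Expert Determination. Names, medical record numbers, and rare diagnoses will sail straight through it.

```python
import re

# Illustrative patterns only. They catch a few obvious formats and nothing more.
IDENTIFIER_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "full_date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b|\b\d{4}-\d{2}-\d{2}\b"),
}


def find_obvious_identifiers(text: str) -> dict[str, list[str]]:
    """Return every pattern label that matched, with the matching strings."""
    hits = {}
    for label, pattern in IDENTIFIER_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[label] = matches
    return hits


sample = "Subject seen 03/14/2024, contact jane.doe@example.org or 555-123-4567."
for label, matches in find_obvious_identifiers(sample).items():
    print(label, matches)
```

Treat any hit as a hard stop. Treat an empty result as "no obvious formats found", not as clearance.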

Bright Line 2 — Material non-public information (MNPI)

Never paste into any non-private AI tool:

  • Unannounced clinical trial results (even preliminary)
  • Unannounced regulatory decisions or interactions
  • Unannounced M&A discussions
  • Unannounced financial information (revenue, guidance, etc.)
  • Unannounced executive changes
  • Unannounced major partnerships or licensing deals
  • Any information that, if traded on, could trigger securities violations

Why it's absolute: MNPI handling is governed by securities law. Insider trading violations are felonies. The boundary applies even if you have no intent to trade — the data exposure itself can be a violation depending on jurisdiction and circumstances.

The most common violation pattern: A communications or investor relations professional pastes the draft of an upcoming press release into a public AI tool to "improve the language." The draft contains MNPI. The data is now in the tool's logs and potentially training data.

The fix: For any pre-announcement content, use only Category B (enterprise) or Category C (private) tools with explicit contractual data protection. Verify with your legal team before using any tool for MNPI handling.

Bright Line 3 — Sponsor-confidential data (CDA/MSA covered)

Never paste into any non-private AI tool when you're acting under a CDA or MSA:

  • Sponsor's clinical trial data, even at the protocol level
  • Sponsor's drug development plans
  • Sponsor's regulatory strategy
  • Sponsor's commercial plans
  • Sponsor's CMC information
  • Sponsor's competitive intelligence

Why it's absolute: CDAs and MSAs typically include explicit prohibitions on disclosure to third parties without consent. Pasting into a public AI tool is disclosure. The sponsor may have grounds for legal action, contract termination, or financial damages.

The most common violation pattern: A CRO employee uses a public AI tool to help write protocol summaries, monitoring reports, or DM listings for a sponsor's trial. The protocol contains confidential information about study design, comparators, or endpoints. Pasting it constitutes a third-party disclosure.

The fix: When working on sponsor data, verify what your contract permits. Most sponsors have not yet explicitly authorized AI tool use for their data; in the absence of explicit authorization, treat the data as fully restricted from any AI tool.

Bright Line 4 — Pre-publication scientific content

Never paste into any non-private AI tool:

  • Unpublished manuscript drafts
  • Unpublished poster content
  • Unpublished abstract submissions
  • Pre-submission grant applications
  • Pre-print content under embargo
  • Co-authored content where co-authors haven't consented to AI tool use

Why it's absolute: Journals increasingly require disclosure of AI assistance and have policies on AI use in submitted work. Submitting content to a public AI tool can:

  • Violate journal publication policies
  • Compromise blind peer review processes
  • Create authorship attribution questions
  • Expose research before it's published, undermining priority

The most common violation pattern: A researcher pastes their nearly-complete manuscript into a public AI tool to "polish the language." The manuscript is now potentially in vendor logs and training data, before it's been peer-reviewed and published.

The fix: For pre-publication content, use only Category B (enterprise) or Category C (private) tools, verify journal policies on AI use, and disclose AI assistance per journal requirements.

Bright Line 5 — Pre-submission regulatory content

Never paste into any non-private AI tool:

  • Draft IND/NDA/BLA submissions before filing
  • Draft FDA correspondence before sending
  • Draft Information Request responses
  • Pre-submission meeting briefing documents
  • Confidential FDA feedback or guidance

Why it's absolute: Pre-submission regulatory content is highly confidential. Disclosure can compromise regulatory strategy, expose proprietary positions, and in some cases create issues with FDA itself if the agency learns content was processed by third-party systems without disclosure.

The most common violation pattern: A regulatory affairs specialist pastes draft IR responses into a public AI tool to refine the language under time pressure. The draft contains confidential information about the company's regulatory position. That information is now in vendor logs.

The fix: Use only contractually-protected tools for regulatory work. Engage your AI governance team for explicit approval of any tools used in regulatory workflows.

Bright Line 6 — Source code and proprietary methods

Never paste into any non-private AI tool:

  • Proprietary code (algorithms, analytical pipelines)
  • Methods that constitute trade secrets
  • Internal scientific methodology not yet published
  • Statistical analysis plans that contain proprietary methodologies
  • Proprietary models or model weights

Why it's absolute: Trade secrets lose legal protection when disclosed publicly. Pasting proprietary code or methods into a public AI tool may constitute disclosure that compromises trade secret status, regardless of intent.

The most common violation pattern: A computational biologist pastes a proprietary analysis pipeline into a public AI tool to "help debug" it. The code is now in the vendor's systems. Even if the vendor doesn't use it, trade secret status may be impaired.

The fix: Use only Category C (private deployment) or Category B (enterprise with no-training agreement) for any proprietary code or methods.

Bright Line 7 — Information covered by attorney-client privilege

Never paste into any non-private AI tool:

  • Legal advice from your attorneys
  • Communications with outside counsel
  • Documents created at attorney direction for legal purposes
  • Litigation strategy or analysis
  • Internal investigations under legal direction

Why it's absolute: Attorney-client privilege can be waived by disclosure to third parties. Some courts have held that disclosure to AI tools constitutes waiver. The risk to ongoing or potential litigation is severe.

The most common violation pattern: An employee pastes legal advice they received into a public AI tool to "help understand it" or "help respond to it." The privilege may now be waived.

The fix: Treat all legal communications as highly restricted from AI tools unless your legal team has explicitly approved a specific workflow with a specific tool.
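
One way to hold the seven "fix" paragraphs in your head is as a table of preconditions. The sketch below is one reading of this lesson, not an official rule set: the `Context` type, the `REQUIRED_APPROVALS` table, the approval labels, and the `may_paste` function are all hypothetical names, and treating Category D the same as Category A is an added cautious assumption. De-identified data (the alternative route under Bright Line 1) is out of scope here because it is no longer patient data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    tool_category: str          # "A", "B", "C", or "D"
    approvals: frozenset[str]   # approvals already obtained, e.g. {"legal"}


# Approvals named in each "The fix" paragraph. The string labels are
# hypothetical shorthand, not terms defined by the lesson.
REQUIRED_APPROVALS = {
    "patient_data": {"baa", "privacy_team"},
    "mnpi": {"legal"},
    "sponsor_confidential": {"sponsor_authorization"},
    "pre_publication": {"journal_policy_checked"},
    "pre_submission_regulatory": {"ai_governance"},
    "proprietary_code": set(),  # Category B (no-training) or C is the condition
    "privileged": {"legal"},
}


def may_paste(bright_line: str, ctx: Context) -> bool:
    """Return True only when every stated precondition is met."""
    if ctx.tool_category not in {"B", "C"}:
        return False  # Category A is never acceptable; D is treated the same
    required = REQUIRED_APPROVALS.get(bright_line)
    if required is None:
        return False  # unclassified content: when in doubt, don't
    return required <= ctx.approvals


print(may_paste("mnpi", Context("A", frozenset({"legal"}))))
print(may_paste("mnpi", Context("B", frozenset({"legal"}))))
print(may_paste("sponsor_confidential", Context("B", frozenset())))
```

Every path that is not explicitly satisfied returns `False`, which mirrors the lesson's stance that the refusal is the default and permission is the exception.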


04 · The "subtle scenarios" that violate bright lines

Some bright-line violations are obvious. The harder ones are subtle — they involve content that looks innocent but actually crosses the line.

Scenario 1 — De-identified, but re-identifiable

You think your data is de-identified because you removed names. But you kept:

  • Specific dates of service
  • Specific zip codes (for small populations)
  • Rare diagnoses or unusual combinations
  • Specific employer or facility names

These can re-identify individuals when combined with publicly available information. Re-identifiable data is still PHI under HIPAA.
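
The four leftovers listed in this scenario can be checked mechanically when the data is structured. The sketch below is a hedged illustration: the field names in `QUASI_IDENTIFIER_FIELDS` and the `residual_quasi_identifiers` function are hypothetical and would need to match your own data dictionary, and a populated diagnosis field is flagged for human review because code cannot judge whether a diagnosis is rare.

```python
# Field names are hypothetical. Adapt them to your own data dictionary.
QUASI_IDENTIFIER_FIELDS = {
    "date_of_service": "specific date of service",
    "zip_code": "specific zip code",
    "diagnosis": "possibly rare diagnosis or unusual combination (needs review)",
    "facility_name": "specific employer or facility name",
    "employer_name": "specific employer or facility name",
}


def residual_quasi_identifiers(record: dict) -> list[str]:
    """List populated fields that could help re-identify the person."""
    return sorted(
        field
        for field in QUASI_IDENTIFIER_FIELDS
        if record.get(field) not in (None, "")
    )


record = {
    "name": None,                      # name removed, so it "looks" de-identified
    "date_of_service": "2024-03-14",
    "zip_code": "",
    "facility_name": "Example Clinic",
}
for field in residual_quasi_identifiers(record):
    print(field, "->", QUASI_IDENTIFIER_FIELDS[field])
```

An empty result only means none of these particular fields is populated; it is not a de-identification determination.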

Scenario 2 — "Just helping with the language"

You paste a confidential document and ask the AI only to improve the language. You're not asking the AI to use the content for anything else. But the content is now in vendor logs regardless of your intent. The use case doesn't determine the disclosure status; the act of pasting does.

Scenario 3 — "I removed the most sensitive parts"

You pasted a sponsor-confidential protocol with the most sensitive sections removed. But the remaining content still constitutes disclosure under your CDA. CDAs typically cover the document, not specific sentences. You've still violated the agreement.

Scenario 4 — "It's just an internal email about a confidential matter"

You forwarded an email through an AI tool to summarize it for your own use. The email referenced confidential business matters. The forwarding put confidential content in the tool's systems.

Scenario 5 — "The AI tool is from a company we already work with"

Your IT team approved the vendor for general use. The AI features of the vendor's product may have different data handling than the general product. Approval of the tool doesn't automatically extend to AI features within the tool unless explicitly confirmed.

Scenario 6 — "I'm only using AI to summarize public information"

You're summarizing what looks like public information, but the summary references your company's specific interpretation, position, or strategy. The summary now contains confidential commentary even if the source material was public.

Scenario 7 — "I anonymized the names"

You replaced real names with placeholders. But you kept the specific roles, organizations, or unique combinations that identify the people involved. "The VP of Clinical Operations at our company" identifies one specific person to anyone who knows your organization.


05 · The "I didn't know" defense

A common after-the-fact response to bright-line violations: "I didn't know the tool worked that way." Or: "I didn't realize the data was protected."

This defense doesn't work. In professional contexts, ignorance of data classification or tool handling isn't a defense — it's a sign of inadequate professional standards. Your employer expects you to know what data you handle and what tools you use. Your regulators assume the same. Your sponsor partners expect it.

The practical implication: you are responsible for knowing the rules. Not just intuiting them, knowing them. This lesson and the rest of Module 03 give you the framework. After completing this module, "I didn't know" is no longer available to you.


06 · The consequences

A clear-eyed view of what happens when bright lines are crossed:

Personal consequences

  • Termination of employment (often immediately, often with cause that affects future employment)
  • Permanent record on professional licenses where applicable
  • Possible personal legal liability depending on circumstances
  • Inability to work in similar roles at other companies in the industry
  • In some cases, regulatory or licensing actions

Organizational consequences

  • Regulatory enforcement actions
  • Civil penalties (HIPAA: up to $50,000 per violation, capped at $1.5M per year per type)
  • Criminal penalties in egregious cases
  • Loss of sponsor relationships and contracts
  • Loss of clinical trial sites and investigators
  • Reputational damage in the industry
  • FDA actions affecting ongoing or future programs
  • Securities violations and shareholder actions for public companies
  • Class action lawsuits for privacy violations
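
To make the HIPAA figures in the list above concrete, the arithmetic can be sketched as follows. The constants are the amounts quoted in this lesson; actual penalties depend on the culpability tier and on the inflation-adjusted amounts in force at the time, so treat the output as an illustration of how a per-type annual cap works, not as a liability estimate. The function name and the violation-type labels are hypothetical.

```python
PER_VIOLATION_MAX = 50_000       # per-violation figure quoted in this lesson
ANNUAL_CAP_PER_TYPE = 1_500_000  # per-year, per-type cap quoted in this lesson


def max_annual_exposure(violations_by_type: dict[str, int]) -> int:
    """Upper bound on civil penalties for one year, capping each type separately."""
    return sum(
        min(count * PER_VIOLATION_MAX, ANNUAL_CAP_PER_TYPE)
        for count in violations_by_type.values()
    )


# 10 violations of one type: 10 * 50,000 = 500,000 (under the cap)
print(max_annual_exposure({"impermissible_disclosure": 10}))
# 40 violations of one type: 2,000,000 uncapped, limited to 1,500,000
print(max_annual_exposure({"impermissible_disclosure": 40}))
```

Because the cap applies per type, an incident that involves several distinct kinds of violation can exceed $1.5M in a single year.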

Industry-level consequences

When violations become public, they shift the entire industry's posture toward AI adoption. More restrictive rules. Slower deployment. Bigger compliance budgets. The people who cause public incidents make it harder for everyone else to do AI work productively.

This last consequence is one many practitioners don't consider but should. You're not just managing your own risk; you're affecting the industry's ability to use these tools at all.


07 · The default — when in doubt, don't

The unifying rule across all seven bright lines: when in doubt, don't.

If you can't immediately classify whether a piece of content is allowed in a specific tool, treat it as not allowed. Take 5 minutes to verify. Ask your governance team. Use a more restrictive tool. Find an alternative workflow.

The cost of being overly cautious is small (some lost productivity, some delayed work). The cost of being wrong is enormous (potentially career-ending, potentially company-damaging). With a downside that lopsided, caution wins even when the chance of a problem looks remote.

This is not the same as "don't use AI tools." It's "don't use AI tools for content where you're uncertain about the rules." For the vast majority of work, the rules are clear and AI use is fully appropriate. For the edge cases, default to caution.


08 · Self-check — could you teach this to a colleague?

A useful test for whether you've absorbed this lesson:

Imagine a colleague asks you "Can I paste this protocol summary into Claude to help me draft a monitoring report?" You should be able to:

  1. Ask the right diagnostic questions (which tool category? what's in the protocol? what does the CDA say?)
  2. Identify the bright-line categories that might apply (sponsor confidentiality? patient data? MNPI?)
  3. Recommend a path forward (use enterprise tool? remove specific content? get governance approval?)
  4. Explain why in language that's accurate and complete

If you can do all four, you've internalized the material. If any of the four feels uncertain, re-read the relevant sections before moving on.


09 · Knowledge check

Three questions.


Q1. A medical writer is on deadline for a CSR section. The section involves data from a Phase 2b trial governed by a CDA with the sponsor. The writer's company has approved ChatGPT Enterprise (Category B) with a no-training agreement. What's the correct action?

  a) Paste the data; Category B tools are fully approved for any use
  b) Don't use AI at all for sponsor-related work
  c) Verify what the specific CDA permits regarding AI tool use; in the absence of explicit sponsor authorization, treat the data as restricted from AI tools regardless of the Category B status
  d) Use a Category A tool but ask the AI not to remember the content


Q2. Which of the following is a subtle scenario that still violates bright lines despite appearing innocent?

  a) Pasting your own approved press release into AI to draft social media variations
  b) Pasting a sponsor protocol summary into a public AI tool to "just improve the language" without using the AI for analysis
  c) Using AI to summarize a publicly available FDA guidance document
  d) Using AI to draft an email about a vendor selection meeting


Q3. Why is "I didn't know" not a valid defense after a bright-line violation?

  a) Because regulators don't accept it
  b) Because in professional contexts, knowing data classification and tool handling is a baseline professional responsibility; ignorance is itself a failure of professional standards
  c) Because the company should have trained you better
  d) Because AI tools should have warned you


Answers: Q1: c · Q2: b · Q3: b


10 · What's next

Lesson 02 — The gray zones — covers the harder cases. The data and use cases where there isn't a bright-line answer, and you need decision frameworks for navigating ambiguity. Most of your day-to-day governance decisions live in these gray zones.

The bright lines you just learned are the foundation. The gray zones are where you'll spend most of your governance attention. Don't skip ahead.


End of Lesson 01.
