After many years in research, the scientific process, from idea to publication, becomes second nature. This intuition, though invaluable, deserves structure. My motivation for describing the workflow is twofold: to understand my own work better, and to offer a map that can help others navigate the same terrain.
One inspiration was a humorous but accurate list from the book “We Have No Idea: A Guide to the Unknown Universe” by Jorge Cham and Daniel Whiteson:
- Organize what you know
- Look for patterns
- Ask questions
- Buy a tweed jacket with elbow patches
However, scientific work is, above all, the art of asking the right questions. It’s not about “beating the baseline” but about understanding a phenomenon. The question “why?” is a researcher’s compass. In turn, understanding often means the ability to reconstruct a mechanism (e.g., by implementing code or a formal proof), although in some areas of mathematics, a complete, verifiable line of reasoning is sufficient.
I have noticed that whether I am writing an empirical paper in Natural Language Processing (NLP) or a systematic review with a meta-analysis, a common skeleton lies beneath the surface. The result of these observations is the working framework below, which attempts to visualize this skeleton.
A Practical Sketch of the Research Process
The following framework is a map, not a dogma. It shows the key stages and their specifics depending on the type of work, including reporting standards and the cautious use of modern tools like Large Language Models (LLMs).
Research Stage (Phase) | General Description and Goal | Empirical Paper (e.g., NLP, Observational Studies) | Theoretical Paper (e.g., Mathematics, Computer Science) | Systematic Review (+ Meta-analysis) | Role and Application of LLMs (always with expert validation) |
---|---|---|---|---|---|
0. Planning, Ethics, and Pre-registration | Establishing ethical and methodological frameworks before research begins. Ensuring transparency. | Ethics committee (IRB) approval. Pre-registration of the research plan (e.g., OSF). Data Management Plan (DMP) (compliance with GDPR, HIPAA). | Declaration of conflicting interests. Defining rules for using others’ work and proofs. | Protocol registration in a database (e.g., PROSPERO for health; OSF, INPLASY for other fields). | Support in formulating a data management plan. Assistance in identifying potential ethical risks for further analysis. |
1. Conceptualization and Problem Identification | Broad reading, observation, discussions. The goal is to find a significant scientific question or a knowledge gap. | Observing the limitations of existing models, identifying a new, practical problem. | Noticing a gap in existing theory or the possibility of generalizing a proof. | Observing conflicting research results, lack of a clear answer to an important clinical/practical question. | Sparring partner: Generating ideas, summarizing fields for verification, identifying trends and potential literature gaps. |
2. Formulating a Question and Hypothesis | Focused literature review. The goal is to formulate a precise, verifiable research question. | Formulating a hypothesis. Defining novelty (new method or new problem). | Formulating a specific thesis to be proven or disproven (conjecture). | Formulating a question in PICO/SPIDER format. Defining inclusion/exclusion criteria. | Literature review support: Suggesting keywords, summarizing abstracts for researcher’s assessment, thematic grouping of papers. |
3. Designing the Methodology | A detailed plan for answering the research question. Justification for the choice of methods (why these and not others?). | Protocol according to EQUATOR standards: e.g., STROBE, CONSORT, SPIRIT, TRIPOD. Selection of metrics, baselines. | Defining the proof strategy, necessary lemmas. Considering formal verification (e.g., using proof assistants like Lean/Coq; see the Lean sketch below the table). | Review protocol: Compliant with PRISMA 2020. Finalizing search strategies (Boolean/MeSH), planning the risk of bias assessment (e.g., RoB 2, ROBINS-I; or PROBAST for predictive models). | Design assistant: Generating draft code, proposing libraries, assisting in formulating preliminary search strategies. |
4. Implementation and Data Analysis | Protocol implementation. Collecting results and their statistical analysis. | Model training, validation. Pre-defined ablation plan, stability tests. Corrections for multiple comparisons (e.g., Benjamini–Hochberg FDR control). | Working on the proof. Implementing an algorithm to test the theory. | Dual, independent screening with agreement assessment (e.g., Cohen’s κ coefficient; see the Python sketch below the table). Data extraction. Synthesis, heterogeneity analysis (I², τ²), publication bias (Egger’s test), prediction intervals. | Analyst’s assistant: Writing and debugging scripts, generating visualizations for verification. Important: inclusion/exclusion decisions in a review are always made by a human. |
5. Interpretation and Synthesis | Answering the “so what?” question. What do the results mean, what are their implications and limitations? | Analysis of results in the context of the hypothesis. Discussion of limitations, threats to validity (see below), and future research directions. | Understanding the implications of the proof. What new questions does it open? | Qualitative and quantitative synthesis (e.g., using the Hartung–Knapp adjustment), assessment of the certainty of evidence (e.g., GRADE, GRADE-CERQual). | Critical sparring partner: Proposing alternative explanations for the results, helping to identify weak points in the argumentation. |
6. Communicating Results (Writing) | Structuring the results into a paper according to reporting standards (e.g., IMRaD). | Writing according to the IMRaD structure and relevant standard extensions (e.g., CONSORT-AI, MI-CLAIM). “Limitations” section. | Flexible structure (definitions, lemmas, proof, conclusions). Precision and coherence are key. | Writing according to PRISMA 2020 and PRISMA-S standards. Including a PRISMA 2020 flow diagram and checklist. | Writing assistant: Language and style correction, paraphrasing, bibliography formatting, generating a draft abstract. |
7. Sharing and Archiving | Ensuring replicability and transparency. Science is a conversation. | Data/Code Availability Statement. Publication of code, data, and environment (repository + DOI, with an explicit license, e.g., MIT/BSD-3 for code; CC BY 4.0 for data). | Archiving the proof on platforms like arXiv. | Sharing extraction forms, data, and analytical scripts. | Assistance in preparing code documentation (README.md). Archiving prompts (including the system prompt), settings (temperature, top_p, seed), and model version. |
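Stage 3’s mention of proof assistants deserves one concrete glimpse. Below is a minimal Lean 4 sketch of the kind of machine-checked statement such a step targets: a toy theorem discharged by a standard-library lemma, not a research-level proof.

```lean
-- A toy example of a machine-checked proof in Lean 4: commutativity of
-- addition on natural numbers, proven by a lemma from the standard library.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```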
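Stage 4’s screening agreement and heterogeneity statistics can likewise be made concrete. Here is a minimal Python sketch with invented numbers; a real meta-analysis should use a dedicated package (e.g., metafor in R).

```python
# A minimal sketch of two stage-4 quantities, assuming per-study inputs
# have already been extracted. Illustrative only; the data are made up.
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Inter-rater agreement for dual, independent screening
# (1 = include, 0 = exclude).
reviewer_a = [1, 0, 1, 1, 0, 1, 0, 0]
reviewer_b = [1, 0, 1, 0, 0, 1, 0, 1]
kappa = cohen_kappa_score(reviewer_a, reviewer_b)
print(f"Cohen's kappa: {kappa:.2f}")

# Heterogeneity from hypothetical effect sizes y_i and variances v_i:
# Cochran's Q, then I^2 = max(0, (Q - df) / Q) and the
# DerSimonian-Laird estimate of tau^2.
y = np.array([0.30, 0.10, 0.45, 0.20])  # study effect estimates
v = np.array([0.02, 0.03, 0.05, 0.01])  # their sampling variances
w = 1.0 / v                             # inverse-variance weights
y_fixed = np.sum(w * y) / np.sum(w)     # fixed-effect pooled estimate
Q = np.sum(w * (y - y_fixed) ** 2)      # Cochran's Q statistic
df = len(y) - 1
I2 = max(0.0, (Q - df) / Q) * 100.0
tau2 = max(0.0, (Q - df) / (np.sum(w) - np.sum(w**2) / np.sum(w)))
print(f"Q = {Q:.2f}, I^2 = {I2:.1f}%, tau^2 = {tau2:.4f}")
```

The DerSimonian–Laird τ² above is the simplest estimator; stage 5’s Hartung–Knapp adjustment then changes how the uncertainty of the pooled effect is estimated, typically widening the interval.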
Key Aspects of Research Rigor
A framework is one thing, but true rigor lies in the details.
Threats to the Validity of Conclusions
Every study is subject to the risk of error. Awareness of these threats is a sign of research maturity. It is always worth asking about four types of validity:
- Internal: Is the observed effect really caused by our intervention and not by another factor?
- External: Can the results be generalized to other populations, contexts, or times?
- Construct: Do our measurement tools (metrics, questionnaires) actually measure what we intend to measure?
- Statistical: Are the statistical conclusions correct (e.g., did we have sufficient power, did we handle multiple testing correctly)?
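The last point is the easiest to get wrong in practice. As one concrete illustration, here is a minimal sketch of the Benjamini–Hochberg FDR control mentioned in stage 4, using statsmodels on invented p-values:

```python
# A minimal sketch of Benjamini-Hochberg FDR control over a set of
# p-values; the values are invented for illustration.
from statsmodels.stats.multitest import multipletests

p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")

for p, p_adj, rej in zip(p_values, p_adjusted, reject):
    print(f"p = {p:.3f} -> adjusted = {p_adj:.3f} -> reject H0: {rej}")
```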
The Role of AI: A Tool, Not an Authority
Large language models are powerful accelerators, but their use requires discipline. Risks such as hallucinations and perpetuating biases are well-documented (see Bibliography: Weidinger et al., 2021). On my team, LLMs serve as an assistant and a critical sparring partner. They support brainstorming, literature organization, and code drafting, but every output and suggestion is verified by a human: through tests, replication, and source review. We treat them as productivity tools, not as oracles.
Example of an AI usage declaration:
A large language model (GPT-4, August 2025 version) was used in the preparation of this manuscript to support the following tasks: (1) stylistic and grammatical correction, (2) generating draft R code for data visualization. All generated code snippets were manually verified, tested, and refactored by the authors. The model was not involved in data analysis, formulating conclusions, or making decisions about the inclusion/exclusion of studies in the systematic review. The prompts used, including the system prompt, and key parameters (temperature=0.5, top_p=1.0) have been archived in the project repository.
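Such a declaration is easiest to back up if run metadata is captured at the moment of use. Here is a minimal sketch of one way to do it, assuming a JSON Lines log in the project repository; the file name and field layout are my own convention, not a standard:

```python
# A minimal sketch of archiving an LLM interaction alongside the project;
# the record layout is an assumption, not an established standard.
import json
from datetime import datetime, timezone

record = {
    "model": "gpt-4",                # model identifier as reported by the provider
    "model_version": "2025-08",      # snapshot/version string, if available
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "parameters": {"temperature": 0.5, "top_p": 1.0, "seed": 42},
    "system_prompt": "You are a careful copy editor...",
    "user_prompt": "Fix grammar in the following paragraph: ...",
    "purpose": "stylistic correction (manuscript, section 2)",
}

with open("llm_usage_log.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```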
Checklist and the Culture of Openness
This framework is a living document. Even so, at the end of each project it is worth running a simple rigor test. The following checklist, though not exhaustive, helps maintain a high standard and aligns with the ongoing shift in scientific culture towards full transparency, manifested in formats like Registered Reports.
- Transparency: Is there a pre-registration or public protocol? Where?
- Replicability: Are the code, data, environment, and random seeds shared (repository + DOI, license, instructions)? See the sketch after this checklist.
- Statistical Rigor: Are metrics reported with measures of uncertainty (e.g., confidence/credibility intervals)? Were sensitivity analyses conducted?
- Self-criticism: Does the paper include an explicit “Threats to Validity” and/or “Limitations” section?
- Tool Accountability: Is the role of AI and the methods for its verification clearly described?
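On the replicability point, the cheapest first step is pinning randomness and recording the environment. A minimal Python sketch, assuming NumPy is the only library in play; extend with framework-specific seeds (e.g., for PyTorch) as your stack requires:

```python
# A minimal sketch of pinning random seeds and recording the environment
# for replicability; extend with framework-specific seeds as needed.
import platform
import random
import sys

import numpy as np

SEED = 42
random.seed(SEED)
np.random.seed(SEED)

# Record the environment alongside the results.
print(f"python: {sys.version.split()[0]}")
print(f"platform: {platform.platform()}")
print(f"numpy: {np.__version__}")
print(f"seed: {SEED}")
```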
Bibliography and Further Reading
- Cham, J., & Whiteson, D. (2017). We Have No Idea: A Guide to the Unknown Universe. Riverhead Books.
- Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P.-S., … & Gabriel, I. (2021). Ethical and social risks of harm from Language Models. arXiv. https://doi.org/10.48550/arXiv.2112.04359
- Willis, L. D. (2023). Formulating the Research Question and Framing the Hypothesis (FINER, PICO). PubMed.
- Xu, M. (2022). How to do theoretical research, a personal perspective. LessWrong.
- Cory-Wright, R., Cornelio, C., Dash, S., El Khadir, B., … & Horesh, L. (2024). Evolving scientific discovery by unifying data and background knowledge with AI Hilbert. Nature Communications.
- Quora (Q&A). What is the difference between research methods, research methodology and research approach?
- Balić, N. (2025). AI Ate Its Own Tail, and I Learned Something About Writing. nibzard.com (blog).
- Gabriel, R. (2025). AI Tools for Productivity: Boost Workflow & Output. DEV Community.
- hashcollision. Merging with AI. Substack (essay).
- Cochrane. What are the types of scientific research?
Note: Treat items from blogs/Q&A as inspirations and commentaries, not as evidentiary sources. It is better to support ‘hard’ claims in a scientific text with methodological/peer-reviewed literature.