CLI Reference

mosaic [OPTIONS] COMMAND [ARGS]...

Global options

| Option | Short | Description |
|---|---|---|
| --version | -v | Print the installed version and exit |
| --verbose | | Show per-source warnings (hidden by default) |
| --help | | Show help and exit |

--verbose must appear before the subcommand:

bash
mosaic --version           # e.g. mosaic 0.0.5
mosaic -v
mosaic --verbose search "protein folding"   # show source warnings

Commands

search

Search for papers across all configured sources.

mosaic search [OPTIONS] QUERY

| Option | Short | Type | Default | Description |
|---|---|---|---|---|
| --max | -n | int | 10 | Max results per source |
| --download | -d | flag | off | Download available PDFs after search |
| --oa-only | | flag | off | Show only open-access papers |
| --pdf-only | | flag | off | Show only papers with a known PDF URL |
| --source | -s | str | all | Limit to one source; tab-completes all shorthands |
| --year | -y | str | | Year filter (see formats below) |
| --author | -a | str | | Author filter, repeatable |
| --journal | -j | str | | Journal name substring filter |
| --field | -f | str | all | Scope query to title, abstract, or all; tab-completes |
| --raw-query | | str | | Raw query sent directly to APIs, bypasses all transforms |
| --output | -o | path | | Save results to file (repeatable); format from extension: .md, .markdown, .csv, .json, .bib, .ris |
| --download-dir | | path | config | Override PDF download directory for this run only |
| --sort | | str | | Sort results: citations, year, or relevance; tab-completes |
| --stats | | flag | off | Print per-source counts and deduplication stats |
| --cached | | flag | off | Search only the local cache; no network requests |
| --prefer-cache | | flag | off | Substitute rich cached records for freshly fetched ones (see Cache Management) |
| --zotero | | flag | off | Export results to Zotero |
| --zotero-collection | | str | | Zotero collection name (created if missing) |
| --zotero-local | | flag | off | Force local API even when a web API key is configured |

Source shorthands for --source:

| Shorthand | Source | Requires |
|---|---|---|
| arxiv | arXiv | |
| ss | Semantic Scholar | None (API key optional) |
| sd | ScienceDirect | API key or browser session |
| sp | Springer Nature (browser) | Playwright ([browser] extra) |
| springer | Springer Nature (API) | Free API key |
| doaj | DOAJ | |
| epmc | Europe PMC | |
| oa | OpenAlex | None (email optional) |
| base | BASE | |
| core | CORE | Free API key |
| ads | NASA ADS | Free API token |
| ieee | IEEE Xplore | Free API key |
| zenodo | Zenodo | None (access token optional) |
| crossref | Crossref | None (email optional) |
| dblp | DBLP | |
| hal | HAL | |
| pubmed | PubMed | None (API key optional) |
| pmc | PubMed Central | None (API key optional) |
| rxiv | bioRxiv / medRxiv | |
| scopus | Scopus | API key or browser session |

--year / -y formats:

| Format | Example | Meaning |
|---|---|---|
| Single year | 2020 | Exact year |
| Range | 2018-2022 | Inclusive range |
| List | 2019,2021,2023 | Specific years only |
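
The three formats can be read as sets of accepted years. The following is a minimal sketch of that interpretation, not MOSAIC's own parser; the function name parse_year_filter is hypothetical.

```python
def parse_year_filter(spec: str) -> set[int]:
    """Expand a --year value into the set of years it accepts (illustrative only)."""
    spec = spec.strip()
    if "," in spec:
        # List form, e.g. 2019,2021,2023: only the listed years
        return {int(part) for part in spec.split(",")}
    if "-" in spec:
        # Range form, e.g. 2018-2022: both endpoints included
        start, end = (int(part) for part in spec.split("-", 1))
        return set(range(start, end + 1))
    # Single year, e.g. 2020
    return {int(spec)}


print(sorted(parse_year_filter("2018-2022")))  # [2018, 2019, 2020, 2021, 2022]
```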

--author / -a behaviour:

  • Case-insensitive substring match against any author name in the paper
  • Repeat the flag for multiple authors — paper must match at least one
  • Example: -a Hinton -a LeCun returns papers authored by either

--journal / -j behaviour:

  • Case-insensitive substring match against the journal name
  • Example: -j "Nature" matches Nature, Nature Communications, Nature Methods, etc.
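
The author and journal rules above amount to case-insensitive substring tests, with repeated -a values combined by OR. A small sketch under that reading follows; the helper names are hypothetical and this is not taken from MOSAIC's source.

```python
from typing import Optional


def matches_author(paper_authors: list[str], wanted: list[str]) -> bool:
    """True if any wanted fragment occurs in any author name (OR semantics)."""
    if not wanted:
        return True  # no -a flag given: nothing to filter on
    names = [name.casefold() for name in paper_authors]
    return any(fragment.casefold() in name for fragment in wanted for name in names)


def matches_journal(journal: Optional[str], wanted: Optional[str]) -> bool:
    """True if the wanted fragment occurs in the journal name."""
    if not wanted:
        return True  # no -j flag given
    return wanted.casefold() in (journal or "").casefold()


# -a Hinton -a LeCun keeps a paper authored by either
print(matches_author(["Yann LeCun", "Yoshua Bengio"], ["Hinton", "LeCun"]))  # True
# -j "Nature" also matches longer journal names
print(matches_journal("Nature Communications", "Nature"))  # True
```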

Filter application:

Each filter is applied at the source API level where supported, then as a post-processing step on all returned results:

| Source | Year | Author | Journal |
|---|---|---|---|
| arXiv | ✓ native | ✓ native | ✓ native |
| Semantic Scholar | ✓ native | post-process | post-process |
| ScienceDirect | ✓ native | ✓ native | ✓ native |
| Europe PMC | ✓ native | ✓ native | ✓ native |
| DOAJ | ✓ native | ✓ native | ✓ native |
| OpenAlex | ✓ native | post-process | post-process |
| BASE | ✓ native | ✓ native | ✓ native |
| CORE | ✓ native | ✓ native | ✓ native |
| NASA ADS | ✓ native | post-process | post-process |
| bioRxiv/medRxiv | ✓ native | ✓ native | post-process |

--sort behaviour:

  • citations: sort by citation count descending; adds a Cited column to the results table
  • year: sort by publication year descending (newest first)
  • relevance: re-score every paper against the query and sort by score descending; adds a Rel. column (0.00 – 1.00). Uses BM25 by default; upgrades to LLM scoring when [llm] is configured. See the Relevance Ranking guide for setup.
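
To give a feel for what a BM25 relevance score in the 0.00 to 1.00 range could look like, here is a compact sketch. The tokenisation, the k1 and b values, and the scaling by the top score are all assumptions made for illustration; the reference does not describe how MOSAIC computes or normalises its scores.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with BM25, scaled so the best hit is 1.0."""
    doc_tokens = [tokenize(doc) for doc in docs]
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in doc_tokens) / n_docs if n_docs else 0.0
    # Document frequency: in how many documents each term appears
    df = Counter(term for tokens in doc_tokens for term in set(tokens))
    query_terms = set(tokenize(query))

    raw = []
    for tokens in doc_tokens:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        raw.append(score)

    top = max(raw, default=0.0)
    # Assumed normalisation: divide by the best score to land in [0, 1]
    return [s / top if top > 0 else 0.0 for s in raw]


titles = [
    "Attention is all you need",
    "Protein folding with deep learning",
    "Transformer attention for translation",
]
for title, score in zip(titles, bm25_scores("transformer attention", titles)):
    print(f"{score:.2f}  {title}")
```

The title containing both query terms gets the top score, the one sharing a single term lands in between, and the unrelated title scores zero.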

--field / -f behaviour:

  • all (default): query is sent as a general full-text search to each source
  • title: scopes the query to the title field using each source's native syntax
  • abstract: scopes the query to the abstract field using each source's native syntax

--raw-query behaviour:

  • Sent verbatim to every queried source, bypassing all field/author/journal transforms
  • Useful for power-users who know each source's query language (e.g. arXiv's ti: prefixes, Lucene syntax for BASE/DOAJ/CORE)
  • Note: year filter (-y) is still applied as post-processing even when --raw-query is set

--output / -o formats:

Format is inferred from the file extension:

| Extension | Format | Contents |
|---|---|---|
| .md | Markdown table | Compact summary: title, authors, year, DOI, source, OA, PDF |
| .markdown | Markdown sections | One ## subsection per paper with a full-field key/value table; empty fields omitted |
| .csv | CSV | All fields; authors joined with ";"; opens in Excel / Google Sheets |
| .json | JSON array | All fields as a JSON list; authors as a native array; suitable for scripting |
| .bib | BibTeX | @article for journal papers, @misc for preprints; includes eprint/eprinttype for arXiv, abstract, pdf, OA note |
| .ris | RIS | Standard interchange format; JOUR/GEN type; one AU per author; SP/EP pages; compatible with EndNote, Mendeley, RefWorks, Papers |
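
Extension-based dispatch of this kind is usually a small lookup table. The sketch below illustrates the idea; the format labels, the function name infer_format, and the case-insensitive handling of the extension are assumptions, not MOSAIC's actual code.

```python
from pathlib import Path

# Hypothetical labels for the six documented extensions
FORMATS = {
    ".md": "markdown-table",
    ".markdown": "markdown-sections",
    ".csv": "csv",
    ".json": "json",
    ".bib": "bibtex",
    ".ris": "ris",
}


def infer_format(path: str) -> str:
    """Map an --output path to a format label based on its extension."""
    suffix = Path(path).suffix.lower()  # assumption: extension case is ignored
    if suffix not in FORMATS:
        raise ValueError(f"unsupported output extension: {suffix or '(none)'}")
    return FORMATS[suffix]


# Repeating --output corresponds to one lookup per file
for target in ["results.md", "refs.bib", "results.json"]:
    print(target, "->", infer_format(target))
```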

Examples:

bash
# Search all sources
mosaic search "protein folding"

# 25 results per source, open-access only
mosaic search "deep learning" -n 25 --oa-only

# Filter by year range
mosaic search "diffusion models" -y 2020-2023

# Relevance-ranked results (BM25 or LLM — see Relevance Ranking guide)
mosaic search "transformer attention" --sort relevance

# Filter by exact year and author
mosaic search "attention" -y 2017 -a Vaswani

# Filter by journal (substring)
mosaic search "CRISPR" -j "Nature" -y 2021-2023

# Multiple authors (OR), single source, download
mosaic search "graph neural" -a Kipf -a Velickovic --source ss --download

# Search arXiv only, download PDFs
mosaic search "diffusion models" --source arxiv --download

# Scope query to title only
mosaic search "attention mechanism" --field title

# Scope query to abstract only (using the short flag -f)
mosaic search "CRISPR off-target" -f abstract --source epmc -n 50

# Power-user raw query (arXiv native syntax)
mosaic search "" --raw-query "ti:transformers AND au:Vaswani" --source arxiv

# Save results to Markdown (summary table)
mosaic search "protein folding" -n 20 --output results.md

# Save results to Markdown (one subsection per paper, all fields)
mosaic search "protein folding" -n 20 --output results.markdown

# Save to BibTeX for import into Zotero / JabRef / LaTeX
mosaic search "diffusion models" -y 2023-2025 --oa-only --output refs.bib

# Save to CSV for Excel / Sheets
mosaic search "CRISPR" -j "Nature" --output crispr.csv

# Save full metadata as JSON
mosaic search "attention mechanism" --output attention.json

# Save to multiple formats in one search
mosaic search "diffusion models" -y 2023-2025 --oa-only \
  --output results.md --output refs.bib --output results.json


similar

Find papers related to a given DOI or arXiv ID.

mosaic similar [OPTIONS] IDENTIFIER

| Option | Short | Type | Default | Description |
|---|---|---|---|---|
| --max | -n | int | 10 | Max similar papers to return |
| --download | -d | flag | off | Download available PDFs |
| --oa-only | | flag | off | Show only open-access papers |
| --pdf-only | | flag | off | Show only papers with a known PDF URL |
| --sort | | str | | Sort: citations, year, or relevance; tab-completes |
| --output | -o | path | | Save results to file (repeatable) |
| --download-dir | | path | config | Override PDF download directory |
| --zotero | | flag | off | Export results to Zotero |
| --zotero-collection | | str | | Zotero collection name (created if missing) |
| --zotero-local | | flag | off | Force local API even when a web API key is configured |

IDENTIFIER accepts the same formats as mosaic get: a bare DOI, doi:10.xxx, DOI:10.xxx, arxiv:NNNN.NNNNN, or ARXIV:NNNN.NNNNN.
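
As an illustration of the accepted identifier forms, the sketch below splits an identifier into a kind and a value. It is a hedged example only: the function name parse_identifier and the returned tuple shape are hypothetical, and MOSAIC may validate identifiers more strictly.

```python
def parse_identifier(identifier: str) -> tuple[str, str]:
    """Return (kind, value), where kind is 'doi' or 'arxiv'."""
    text = identifier.strip()
    prefix, sep, rest = text.partition(":")
    if sep and prefix.lower() == "doi":
        return ("doi", rest)      # doi:10.xxx or DOI:10.xxx
    if sep and prefix.lower() == "arxiv":
        return ("arxiv", rest)    # arxiv:NNNN.NNNNN or ARXIV:NNNN.NNNNN
    return ("doi", text)          # anything else is treated as a bare DOI


for raw in ["10.48550/arXiv.1706.03762", "DOI:10.1038/s41586-021-03819-2", "arxiv:1706.03762"]:
    print(parse_identifier(raw))
```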

Sources used:

  • OpenAlex related_works — always queried; no key required.
  • Semantic Scholar recommendations — queried when ss-key is set in config.

See the Find Similar Papers guide for a full walkthrough and workflow examples.

Examples:

bash
mosaic similar 10.48550/arXiv.1706.03762
mosaic similar arxiv:1706.03762 -n 20 --sort citations
mosaic similar 10.1038/s41586-021-03819-2 --oa-only --download

get

Download a paper by DOI, or bulk-download all DOIs from a BibTeX or CSV file.

mosaic get [OPTIONS] [DOI]

| Option | Type | Default | Description |
|---|---|---|---|
| --from | path | | BibTeX (.bib) or CSV (.csv) file containing DOIs to bulk-download |
| --oa-only | flag | off | In bulk mode: treat unresolvable papers as skipped rather than failed |
| --download-dir | path | config | Override PDF download directory for this run |
| --zotero | flag | off | Export downloaded paper(s) to Zotero |
| --zotero-collection | str | | Zotero collection name (created if missing) |
| --zotero-local | flag | off | Force local API even when a web API key is configured |

Provide either a DOI positional argument (single download) or --from <file> (bulk download) — not both.

Cache-first lookup: Before hitting Unpaywall or a browser session, MOSAIC checks the local cache for the DOI. If the paper was seen in a previous search and a PDF URL is already known, the download starts immediately without any network resolution step. A dim confirmation line is printed when the cache is used.

Bulk mode behaviour:

  • For .bib files: extracts all doi = {…} fields (case-insensitive, no extra dependency)
  • For .csv files: reads the doi column (case-insensitive header)
  • Duplicate DOIs within the file are downloaded only once
  • Entries without a DOI are silently skipped
  • Prints a per-entry result line and a final summary: N downloaded, M failed, K skipped
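
The extraction rules above can be approximated with the standard library alone. This is a sketch under stated assumptions, not MOSAIC's implementation: the regular expression, the helper names, and the case-insensitive de-duplication are all illustrative choices.

```python
import csv
import io
import re

# Matches doi = {...} fields regardless of the case of the field name
DOI_FIELD = re.compile(r"\bdoi\s*=\s*\{([^}]*)\}", re.IGNORECASE)


def _dedupe(values) -> list[str]:
    """Keep the first occurrence of each non-empty DOI, comparing case-insensitively."""
    seen, unique = set(), []
    for value in values:
        key = value.lower()
        if value and key not in seen:
            seen.add(key)
            unique.append(value)
    return unique


def dois_from_bibtex(text: str) -> list[str]:
    return _dedupe(match.strip() for match in DOI_FIELD.findall(text))


def dois_from_csv(text: str) -> list[str]:
    reader = csv.DictReader(io.StringIO(text))
    # Locate the doi column whatever the capitalisation of the header
    column = next((n for n in (reader.fieldnames or []) if n.strip().lower() == "doi"), None)
    if column is None:
        return []
    return _dedupe((row[column] or "").strip() for row in reader)


bib = "@article{a, DOI = {10.1/abc}}\n@misc{b, title = {No DOI here}}\n@misc{c, doi = {10.1/abc}}"
print(dois_from_bibtex(bib))  # ['10.1/abc']
```

Entries without a DOI simply produce no match, which mirrors the silent skip described above.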

Examples:

bash
# Single DOI
mosaic get 10.48550/arXiv.1706.03762

# Bulk from BibTeX — download all resolvable PDFs
mosaic get --from refs.bib

# Bulk from CSV, skip non-OA entries instead of counting them as failures
mosaic get --from references.csv --oa-only

# Override download directory for this run
mosaic get --from refs.bib --download-dir ~/papers

index

Build or update the local vector index for semantic search and RAG.

mosaic index [OPTIONS]

| Option | Short | Type | Default | Description |
|---|---|---|---|---|
| --reindex | | flag | off | Re-embed all papers, even already-indexed ones (required after changing the embedding model) |
| --query | -q | str | | Embed only papers matching this cache query |
| --from | | path | | Embed only papers listed in a .bib or .csv file |
| --batch-size | | int | 96 | Texts sent per embedding API call |

Requires sqlite-vec: pipx inject mosaic-search sqlite-vec. See the RAG & Literature Analysis guide for full setup instructions.

Examples:

bash
# Index all cached papers
mosaic index

# Re-index after switching embedding model
mosaic index --reindex

# Index only a topic subset
mosaic index --query "protein folding"

# Index papers from a BibTeX file
mosaic index --from refs.bib

ask

Ask a question about your indexed papers using RAG.

mosaic ask [OPTIONS] QUESTION

| Option | Short | Type | Default | Description |
|---|---|---|---|---|
| --mode | | str | synthesis | Prompt mode: synthesis, gaps, compare, extract |
| --query | -q | str | | Pre-filter: restrict retrieval to papers matching this query |
| --from | | path | | Pre-filter: restrict retrieval to papers from a .bib/.csv |
| --year | -y | str | | Year filter (same formats as search) |
| --top | -n | int | config | Override rag.top_k for this query |
| --output | -o | path | | Save answer to .md or .json |
| --show-sources | | flag | off | Print retrieved papers before the answer |

Prompt modes:

| Mode | What it produces |
|---|---|
| synthesis | Comprehensive state-of-the-art summary with [n] citations |
| gaps | Open problems, contradictions, and methodological limitations |
| compare | Structured comparison: methods, datasets, metrics, results, trade-offs |
| extract | Per-paper structured extraction: Task, Method, Dataset, Metric, Key Result |

Examples:

bash
mosaic ask "What are the main approaches to neural machine translation?"
mosaic ask "Open problems in protein structure prediction" --mode gaps
mosaic ask "transformer vs LSTM" --mode compare
mosaic ask "diffusion model scaling" --year 2023-2025
mosaic ask "RLHF" -n 20 --show-sources
mosaic ask "attention mechanisms" --output analysis.md
mosaic ask "CRISPR" --output crispr.json

chat

Interactive multi-turn RAG session over your cached papers.

mosaic chat [OPTIONS]

| Option | Short | Type | Default | Description |
|---|---|---|---|---|
| --query | -q | str | | Narrow retrieval to papers matching this query |
| --from | | path | | Narrow retrieval to papers from a .bib/.csv |
| --mode | | str | synthesis | Default prompt mode for the session |

In-session commands:

| Command | Effect |
|---|---|
| /mode <mode> | Switch prompt mode (synthesis, gaps, compare, extract) |
| /sources | Show the retrieval pool for the current session |
| /clear | Clear conversation history |
| /quit | Exit the chat |

Examples:

bash
mosaic chat
mosaic chat -q "protein folding"
mosaic chat --mode gaps
mosaic chat --from refs.bib

ui

Launch the MOSAIC web interface in your browser.

mosaic ui [OPTIONS]

| Option | Type | Default | Description |
|---|---|---|---|
| --host | str | 127.0.0.1 | Bind address |
| --port | int | 5555 | Port number |
| --no-browser | flag | off | Don't auto-open the browser |
| --debug | flag | off | Use Flask dev server with hot-reload |

Requires the ui extra: pip install 'mosaic-search[ui]'.

By default, the server uses Waitress (production-grade, multi-threaded). Pass --debug to use Flask's built-in dev server with hot-reload for development.

bash
mosaic ui                          # default: http://127.0.0.1:5555
mosaic ui --port 8080              # custom port
mosaic ui --host 0.0.0.0           # accessible on LAN
mosaic ui --debug                  # development mode

See the Web UI guide for a full walkthrough.


config

View or update MOSAIC configuration.

mosaic config [OPTIONS]

| Option | Type | Description |
|---|---|---|
| --show | flag | Print current config as JSON |
| API keys | | |
| --elsevier-key TEXT | str | Set Elsevier/ScienceDirect API key |
| --ss-key TEXT | str | Set Semantic Scholar API key |
| --ncbi-key TEXT | str | Set NCBI API key (PubMed + PMC) |
| --core-key TEXT | str | Set CORE API key |
| --ads-key TEXT | str | Set NASA ADS API key |
| --ieee-key TEXT | str | Set IEEE Xplore API key |
| --springer-key TEXT | str | Set Springer Nature API key |
| --scopus-key TEXT | str | Set Scopus API key |
| --scopus-inst-token TEXT | str | Set Scopus institutional token |
| --zenodo-key TEXT | str | Set Zenodo API key / token |
| --zotero-key TEXT | str | Set Zotero API key (web API); auto-discovers and caches user ID |
| --unpaywall-email TEXT | str | Set Unpaywall email |
| Sources | | |
| --enable-source TEXT | str | Enable a source by name (repeatable) |
| --disable-source TEXT | str | Disable a source by name (repeatable) |
| Downloads | | |
| --download-dir TEXT | str | Set PDF download directory |
| --db-path TEXT | str | Set SQLite cache path |
| --filename-pattern TEXT | str | Set PDF filename pattern (see below) |
| --rate-limit-delay FLOAT | float | Set default delay between API calls in seconds |
| Obsidian | | |
| --obsidian-vault TEXT | str | Set Obsidian vault path |
| --obsidian-subfolder TEXT | str | Set subfolder inside vault for paper notes |
| --obsidian-filename-pattern TEXT | str | Set Obsidian note filename pattern |
| --obsidian-tag TEXT | str | Set Obsidian tags (repeatable, replaces existing list) |
| --obsidian-wikilinks / --no-obsidian-wikilinks | bool | Use [[wikilinks]] in generated notes |
| PEDro | | |
| --pedro-fair-use / --no-pedro-fair-use | bool | Acknowledge PEDro fair-use policy (required to enable source) |
| --pedro-fetch-details / --no-pedro-fetch-details | bool | Fetch detail pages for richer metadata (slower) |
| --pedro-rate-limit-delay FLOAT | float | Delay between PEDro requests in seconds (default: 3.0) |
| LLM | | |
| --llm-provider TEXT | str | LLM provider for relevance ranking: openai or anthropic |
| --llm-api-key TEXT | str | API key for the LLM provider (any string for local servers) |
| --llm-model TEXT | str | Model name; defaults to gpt-4o-mini (openai) or claude-haiku-4-5-20251001 (anthropic) |
| --llm-base-url TEXT | str | Base URL for a local OpenAI-compatible server (e.g. http://localhost:11434/v1) |
| RAG / Embeddings | | |
| --embedding-model TEXT | str | Embedding model name (e.g. snowflake-arctic-embed2, text-embedding-3-small) |
| --embedding-base-url TEXT | str | Base URL for the embedding server (e.g. http://localhost:11434/v1) |
| --embedding-api-key TEXT | str | API key for the embedding server (any string for local servers) |
| --rag-top-k INT | int | Number of papers retrieved per RAG query (default: 10) |
| --rag-auto-index / --no-rag-auto-index | bool | Auto-index new papers after each search/get run |

--filename-pattern placeholders:

| Placeholder | Value |
|---|---|
| {year} | Publication year (or 0000 if unknown) |
| {source} | Source name (e.g. arXiv, DOAJ) |
| {author} | First author's last name |
| {title} | Title slug, truncated to 60 characters |
| {doi} | DOI with special characters replaced by _ |
| {journal} | Journal name slug (or no_journal if unknown) |

The default pattern is {year}_{source}_{author}_{title}, which produces filenames like 2017_arXiv_Vaswani_Attention_Is_All_You_Need.pdf.
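
The placeholder syntax matches Python's str.format fields, so the expansion can be sketched as below. This is an approximation for illustration: the slug rule, the set of DOI characters that get replaced, the "unknown" author fallback, and the paper dictionary layout are assumptions rather than MOSAIC's documented behaviour.

```python
import re


def slug(text: str, max_len: int = 0) -> str:
    """Replace runs of non-alphanumeric characters with underscores (assumed rule)."""
    cleaned = re.sub(r"[^A-Za-z0-9]+", "_", text).strip("_")
    return cleaned[:max_len] if max_len else cleaned


def build_filename(pattern: str, paper: dict) -> str:
    authors = paper.get("authors") or []
    has_author = bool(authors and authors[0].split())
    fields = {
        "year": str(paper.get("year") or "0000"),
        "source": paper.get("source", ""),
        # First author's last name; "unknown" is a hypothetical fallback
        "author": authors[0].split()[-1] if has_author else "unknown",
        "title": slug(paper.get("title", ""), 60),
        # Assumption: everything except letters, digits, dots, and hyphens becomes _
        "doi": re.sub(r"[^A-Za-z0-9.\-]+", "_", paper.get("doi", "")),
        "journal": slug(paper.get("journal") or "") or "no_journal",
    }
    return pattern.format(**fields) + ".pdf"


paper = {
    "year": 2017,
    "source": "arXiv",
    "authors": ["Ashish Vaswani", "Noam Shazeer"],
    "title": "Attention Is All You Need",
}
print(build_filename("{year}_{source}_{author}_{title}", paper))
# 2017_arXiv_Vaswani_Attention_Is_All_You_Need.pdf
```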

Examples:

bash
# Show current config
mosaic config --show

# Set multiple values at once
mosaic config --unpaywall-email me@uni.edu --download-dir ~/papers

# API keys
mosaic config --elsevier-key abc123def456
mosaic config --ncbi-key YOUR_KEY          # sets both PubMed and PMC
mosaic config --core-key YOUR_KEY
mosaic config --ads-key YOUR_KEY
mosaic config --ieee-key YOUR_KEY
mosaic config --springer-key YOUR_KEY
mosaic config --scopus-key YOUR_KEY --scopus-inst-token YOUR_INST_TOKEN
mosaic config --zenodo-key YOUR_TOKEN

# Enable / disable sources
mosaic config --disable-source dblp --disable-source hal
mosaic config --enable-source dblp

# Downloads
mosaic config --filename-pattern "{author}_{year}_{title}"
mosaic config --rate-limit-delay 0.5
mosaic config --db-path ~/mydata/mosaic.db

# Obsidian integration
mosaic config --obsidian-vault ~/Documents/MyVault --obsidian-subfolder literature
mosaic config --obsidian-tag paper --obsidian-tag science
mosaic config --no-obsidian-wikilinks

# Configure LLM relevance scoring — cloud OpenAI
mosaic config --llm-provider openai --llm-api-key sk-... --llm-model gpt-4o-mini

# Configure LLM relevance scoring — local Ollama
mosaic config --llm-provider openai \
              --llm-base-url http://localhost:11434/v1 \
              --llm-api-key ollama \
              --llm-model llama3.2

notebook create

Create a Google NotebookLM notebook from a search query or a directory of PDFs.

mosaic notebook create [OPTIONS] NAME

Requires the [notebooklm] extra — see NotebookLM Integration.

| Option | Short | Type | Default | Description |
|---|---|---|---|---|
| --query | -q | str | | Search query to populate the notebook |
| --from-dir | | path | | Import all PDFs from this directory |
| --max | -n | int | 10 | Max results per source (with --query) |
| --oa-only | | flag | off | Only include open-access papers |
| --pdf-only | | flag | off | Only include papers with a known PDF URL |
| --year | -y | str | | Year filter (same formats as search) |
| --author | -a | str | | Author filter, repeatable |
| --journal | -j | str | | Journal name substring filter |
| --field | -f | str | all | Scope query to title, abstract, or all; tab-completes |
| --raw-query | | str | | Raw query sent directly to APIs, bypasses all transforms |
| --download-dir | | path | config | Override PDF download directory for this run only |
| --podcast | | flag | off | Queue an Audio Overview after import |
| --video | | flag | off | Queue a Video Overview after import |
| --briefing | | flag | off | Queue a Briefing Doc after import |
| --study-guide | | flag | off | Queue a Study Guide after import |
| --quiz | | flag | off | Queue a Quiz after import |
| --flashcards | | flag | off | Queue Flashcards after import |
| --infographic | | flag | off | Queue an Infographic after import |
| --slide-deck | | flag | off | Queue a Slide Deck after import |
| --data-table | | flag | off | Queue a Data Table after import |
| --mind-map | | flag | off | Queue a Mind Map after import |

--query and --from-dir are mutually exclusive; exactly one must be provided. Filters (-y, -a, -j, -f, --raw-query) only apply when using --query. --oa-only and --pdf-only apply in both modes.

Examples:

bash
# Search, download, and import into a new notebook
mosaic notebook create "Transformers" --query "attention is all you need" --oa-only

# Queue an Audio Overview (podcast) after import
mosaic notebook create "AMR-GPU" --query "adaptive mesh refinement gpu" -y 2024-2026 --oa-only --podcast

# Queue multiple artifacts at once
mosaic notebook create "CRISPR 2024" --query "CRISPR gene editing" --oa-only \
  --briefing --quiz --mind-map

# Filter by author and journal
mosaic notebook create "Hinton Papers" --query "deep learning" -a Hinton -j "Nature" --oa-only

# Import PDFs you already have locally
mosaic notebook create "My Papers" --from-dir ~/mosaic-papers/

# Import local PDFs and queue a slide deck
mosaic notebook create "My Papers" --from-dir ~/mosaic-papers/ --slide-deck

auth login

Open a browser, log in to a site, and save the session for future PDF downloads.

mosaic auth login [OPTIONS] NAME

| Argument / Option | Type | Description |
|---|---|---|
| NAME | str | Session label; tab-completes common providers: elsevier, springer, scopus |
| --url / -u | str | URL to open in the browser (required) |

Requires the [browser] extra — see Authenticated Access.

MOSAIC tries browsers in order: Chromium → Firefox → WebKit.

Examples:

bash
mosaic auth login elsevier --url https://www.sciencedirect.com/user/login
mosaic auth login myuni    --url https://library.myuni.edu/login

auth status

List all saved browser sessions.

bash
mosaic auth status

auth logout

Remove a saved browser session.

mosaic auth logout [OPTIONS] NAME

| Argument | Description |
|---|---|
| NAME | Session name to remove; tab-completes from your saved sessions |

bash
mosaic auth logout elsevier

cache stats

Print a summary of the local cache.

bash
mosaic cache stats

cache list

List cached papers, newest-first.

mosaic cache list [OPTIONS]

| Option | Short | Type | Default | Description |
|---|---|---|---|---|
| --limit | -n | int | 20 | Max papers to show |
| --offset | | int | 0 | Skip this many rows (pagination) |
| --query | -q | str | | Substring filter on title and abstract |

cache show

Show the full cached record for a paper identified by DOI or arXiv ID.

mosaic cache show IDENTIFIER

IDENTIFIER accepts the same formats as mosaic get: bare DOI, doi:10.xxx, arxiv:NNNN.NNNNN, etc.


cache verify

Check whether each tracked PDF file still exists on disk.

bash
mosaic cache verify

Prints a line per tracked download with a ✓ / ✗ indicator and a final count of missing files.


cache clean

Remove download records whose PDF files no longer exist on disk.

bash
mosaic cache clean

Only records with status=ok are checked. Paper metadata is never deleted.


cache clear

Wipe all papers, downloads, searches, and exports from the cache.

mosaic cache clear [OPTIONS]

| Option | Description |
|---|---|
| --yes | Skip the confirmation prompt |

PDF files already on disk are not deleted.


cache export

Bulk-export all cached papers to a file. Format is inferred from the extension.

mosaic cache export [OPTIONS] OUTPUT

| Option | Short | Type | Default | Description |
|---|---|---|---|---|
| --query | -q | str | | Substring filter before exporting |

Supported extensions: .csv, .json, .bib, .md, .markdown — same semantics as --output on search.


Exit codes

| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Invalid argument or unknown source |