mirror of https://github.com/blackboxprogramming/simulation-theory.git synced 2026-03-17 06:57:15 -05:00

copilot-swe-agent[bot] 6879279cdd Add scrapers for arXiv, Wikipedia, and OEIS

Co-authored-by: blackboxprogramming <118287761+blackboxprogramming@users.noreply.github.com>

2026-02-25 18:20:10 +00:00

Scrapers

Python web scrapers for collecting data relevant to the simulation-theory research repository.

Scrapers

Script	Source	Topics
`arxiv_scraper.py`	arXiv	Simulation hypothesis, Gödel incompleteness, Riemann zeta, qutrit/ternary quantum, halting problem, IIT consciousness
`wikipedia_scraper.py`	Wikipedia	SHA-256, Riemann hypothesis, quantum computing, Euler's identity, fine-structure constant, Turing machine, DNA, Blockchain
`oeis_scraper.py`	OEIS	Prime numbers, Fibonacci, pi digits, Euler–Mascheroni constant, Catalan numbers, partition numbers

Setup

pip install -r requirements.txt

Usage

arXiv scraper

# Use default topic list
python arxiv_scraper.py

# Custom query, limit to 3 results per query
python arxiv_scraper.py --query "Riemann hypothesis zeros" --max 3

# Save to file
python arxiv_scraper.py --output arxiv_results.json

Wikipedia scraper

# Use default topic list
python wikipedia_scraper.py

# Custom topics
python wikipedia_scraper.py --topics "Riemann hypothesis" "SHA-2" "Turing machine"

# Save to file
python wikipedia_scraper.py --output wikipedia_results.json

OEIS scraper

# Use default sequence list
python oeis_scraper.py

# Custom sequence IDs
python oeis_scraper.py --ids A000040 A000045 A000796

# Save to file
python oeis_scraper.py --output oeis_results.json

Output format

Each scraper prints JSON to stdout by default. Pass --output to write the same JSON to a file instead.
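The stdout-or-file behavior could be implemented along the following lines. This is a minimal sketch, not the repository's code: the names `write_results` and `build_parser` and the JSON formatting options are assumptions. Only the `--output` flag comes from the usage examples.

```python
import argparse
import json
import sys


def write_results(results, output_path=None):
    """Serialize results as JSON to stdout, or to output_path when given."""
    # indent=2 and ensure_ascii=False are assumptions chosen for readability.
    text = json.dumps(results, indent=2, ensure_ascii=False)
    if output_path is None:
        sys.stdout.write(text + "\n")
    else:
        with open(output_path, "w", encoding="utf-8") as fh:
            fh.write(text + "\n")


def build_parser():
    """Hypothetical parser exposing only the --output flag shown in Usage."""
    parser = argparse.ArgumentParser(description="Scraper output sketch")
    parser.add_argument(
        "--output",
        default=None,
        help="write JSON to this file instead of stdout",
    )
    return parser
```

Defaulting `--output` to `None` keeps the scripts pipe-friendly, so the JSON can be passed straight to another tool without a temporary file.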

arXiv: a dict keyed by query string. Each value is a list of objects shaped like:

{
  "title": "...",
  "authors": ["..."],
  "published": "2024-01-01T00:00:00Z",
  "abstract": "...",
  "url": "https://arxiv.org/abs/..."
}
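Because the arXiv output is grouped by query, the same paper may appear under more than one key. The sketch below flattens the dict into a single list, merges duplicates by `url`, and sorts newest first. The function name `flatten_arxiv_results`, the added `queries` field, and the sample records are all hypothetical. The timestamp format is assumed to match the example above.

```python
from datetime import datetime

# Made-up sample in the documented shape; titles and URLs are placeholders.
SAMPLE_ARXIV = {
    "query one": [
        {
            "title": "Older paper",
            "authors": ["A. Author"],
            "published": "2023-05-01T00:00:00Z",
            "abstract": "...",
            "url": "https://arxiv.org/abs/placeholder-1",
        },
    ],
    "query two": [
        {
            "title": "Older paper",
            "authors": ["A. Author"],
            "published": "2023-05-01T00:00:00Z",
            "abstract": "...",
            "url": "https://arxiv.org/abs/placeholder-1",
        },
        {
            "title": "Newer paper",
            "authors": ["B. Author"],
            "published": "2024-01-01T00:00:00Z",
            "abstract": "...",
            "url": "https://arxiv.org/abs/placeholder-2",
        },
    ],
}


def flatten_arxiv_results(results):
    """Return one record per unique URL, newest first, with matching queries."""
    by_url = {}
    for query, papers in results.items():
        for paper in papers:
            # First sighting copies the record; later sightings only add the query.
            entry = by_url.setdefault(paper["url"], {**paper, "queries": []})
            entry["queries"].append(query)
    return sorted(
        by_url.values(),
        key=lambda p: datetime.strptime(p["published"], "%Y-%m-%dT%H:%M:%SZ"),
        reverse=True,
    )
```

To use it on real output, load the file with `json.load` and pass the resulting dict to `flatten_arxiv_results`.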

Wikipedia: a list of objects shaped like:

{
  "topic": "SHA-2",
  "title": "SHA-2",
  "url": "https://en.wikipedia.org/wiki/SHA-2",
  "summary": "..."
}
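Since each Wikipedia record carries the requested `topic`, the list can be indexed by topic to check which requests produced a result. This is a sketch under assumptions: the helper names are hypothetical, and whether the scraper omits failed topics or reports them some other way is not stated here.

```python
# Sample record copied from the documented shape above.
SAMPLE_WIKI = [
    {
        "topic": "SHA-2",
        "title": "SHA-2",
        "url": "https://en.wikipedia.org/wiki/SHA-2",
        "summary": "...",
    },
]


def index_by_topic(entries):
    """Map each requested topic to its record (later duplicates win)."""
    return {entry["topic"]: entry for entry in entries}


def missing_topics(requested, entries):
    """Return the requested topics that have no record, in request order."""
    found = index_by_topic(entries)
    return [topic for topic in requested if topic not in found]
```

For example, `missing_topics(["SHA-2", "Turing machine"], SAMPLE_WIKI)` reports the one topic with no record in the sample.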

OEIS: a list of objects shaped like:

{
  "id": "A000040",
  "name": "The prime numbers.",
  "description": "...",
  "values": ["2", "3", "5", "7", "11", "..."],
  "url": "https://oeis.org/A000040"
}
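The `values` field holds the sequence terms as strings, so numeric work needs a conversion step. A minimal sketch, assuming every real term is a base-10 integer string; the helper name `parse_values` is hypothetical, and non-numeric items (such as the `"..."` placeholder in the example above) are skipped.

```python
# Sample record copied from the documented shape above.
SAMPLE_OEIS = {
    "id": "A000040",
    "name": "The prime numbers.",
    "description": "...",
    "values": ["2", "3", "5", "7", "11", "..."],
    "url": "https://oeis.org/A000040",
}


def parse_values(entry):
    """Convert an entry's string terms to ints, skipping non-numeric items."""
    numbers = []
    for raw in entry["values"]:
        try:
            numbers.append(int(raw))
        except ValueError:
            # Not an integer literal (for example an elision marker); ignore it.
            continue
    return numbers


print(parse_values(SAMPLE_OEIS))  # → [2, 3, 5, 7, 11]
```

Python integers are arbitrary precision, so large terms convert without overflow.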
