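Introduction
ArchiveBox is a powerful, self-hosted internet archiving solution written in Python. It is cross-platform and runs on Linux, macOS, and Windows.
It lets you collect, save, and view sites you want to keep available offline. ArchiveBox can currently be set up as a command-line tool, as a desktop application, or accessed through a web interface, and it can save a static copy of any site you choose, including text, images, PDFs, and even video.
GitHub: https://github.com/ArchiveBox/ArchiveBox/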
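Prerequisites
pip discourages being run as root, and ArchiveBox is not meant to be run as root either, so first create a regular account with sudo privileges and switch to it:
adduser archivebox && usermod -aG sudo archivebox && su - archivebox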
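Before continuing, it can help to confirm that the new account really is in the sudo group. The following is a minimal sketch; user_in_group is a hypothetical helper name, and the account name archivebox is the one created above.

```shell
#!/bin/sh
# Return success if user $1 belongs to group $2.
# (user_in_group is an illustrative helper, not part of ArchiveBox.)
user_in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx "$2"
}

# Example: report whether the "archivebox" account can use sudo.
if user_in_group archivebox sudo; then
    echo "archivebox can use sudo"
else
    echo "archivebox is missing or not in the sudo group"
fi
```

If the second message appears, rerun the usermod command and log in again so the new group membership takes effect.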
Installation
One-line install
curl -sSL 'https://get.archivebox.io' | sh
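Manual installation
Ubuntu is used as the example here; for other systems, see the official manual installation documentation. Docker is still the better option.
Install the dependencies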
sudo apt install python3 python3-pip python3-distutils git wget curl youtube-dl
sudo apt install chromium-browser
Install archivebox
python3 -m pip install --upgrade archivebox
Warnings
WARNING: The script sqlformat is installed in '/home/allen/.local/bin' which is not on PATH. Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The script pygmentize is installed in '/home/allen/.local/bin' which is not on PATH. Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The script normalizer is installed in '/home/allen/.local/bin' which is not on PATH. Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The script django-admin is installed in '/home/allen/.local/bin' which is not on PATH. Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The scripts ipython and ipython3 are installed in '/home/allen/.local/bin' which is not on PATH. Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The script dateparser-download is installed in '/home/allen/.local/bin' which is not on PATH. Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The script archivebox is installed in '/home/allen/.local/bin' which is not on PATH. Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Solution:
Run the following command:
echo 'export PATH=/home/allen/.local/bin:$PATH' >>~/.bashrc
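The path that follows export PATH= must be the one shown in your own yellow warning, so copy it from there and substitute it in the command. Open a new shell (or run source ~/.bashrc) so the change takes effect, then reinstall: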
python3 -m pip install --upgrade archivebox
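Running the echo command more than once leaves duplicate entries in ~/.bashrc and in PATH. As a minimal sketch of an idempotent alternative, the snippet below only prepends the directory when it is not already listed; path_prepend is a hypothetical helper name, and ~/.local/bin is assumed to be the directory named in the warning.

```shell
#!/bin/sh
# Prepend directory $1 to PATH unless it is already listed.
# (path_prepend is an illustrative helper name.)
path_prepend() {
    case ":${PATH}:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$1:${PATH}" ;;      # otherwise put it first
    esac
}

# Assumes pip installed its scripts into ~/.local/bin, as in the warning.
path_prepend "${HOME}/.local/bin"
export PATH
```

Placed in ~/.bashrc, this can be sourced repeatedly without growing PATH.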
Running
Initialize the data directory:
mkdir /home/allen/data && cd /home/allen/data
archivebox init
Create an administrator account:
archivebox manage createsuperuser
My password was too simple, so a red warning appeared.
Start the server:
archivebox server 0.0.0.0:8000
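Open the address in a browser; the page loads normally.
Click ADD at the top and enter a URL:
Wait for the capture to finish:
After a while, the snapshot shows as successfully captured: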
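URLs can also be added from the command line with archivebox add, which accepts URLs as arguments or on standard input. The following is a minimal sketch for importing a plain-text list; list_urls and add_urls are hypothetical helper names, and the file is assumed to hold one URL per line with optional # comment lines.

```shell
#!/bin/sh
# Print the URLs from file $1, skipping blank lines and "#" comment lines.
list_urls() {
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Pipe the cleaned list into ArchiveBox when the command is available.
# Run this from inside the initialized data directory.
add_urls() {
    if command -v archivebox > /dev/null 2>&1; then
        list_urls "$1" | archivebox add
    else
        echo "archivebox not found on PATH" >&2
        return 1
    fi
}

# Usage (urls.txt is a hypothetical file name):
#   cd /home/allen/data && add_urls urls.txt
```

A single page can be added directly with archivebox add 'https://example.com'.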
Extras
Reverse proxy
A simple Nginx configuration:
server {
    listen 80;
    listen [::]:80;
    server_name archivebox.yydnas.cn;
    index index.php index.html index.htm;
    location / {
        proxy_pass http://localhost:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header REMOTE-HOST $remote_addr;
    }
}
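Running in the background
By default the program runs in the foreground of the terminal. The simplest way to keep it running in the background is: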
nohup archivebox server 0.0.0.0:8000 &> /dev/null &
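Because that command discards all output, nothing tells you whether the server actually came up. As a minimal sketch, the helper below polls a URL with curl until it answers; wait_for_http is a hypothetical name, and the address assumes the port 8000 used above.

```shell
#!/bin/sh
# Poll URL $1 once per second, up to $2 attempts (default 10).
# Returns 0 as soon as the server answers without an HTTP error status.
wait_for_http() {
    url=$1
    tries=${2:-10}
    i=0
    while [ "$i" -lt "$tries" ]; do
        if curl -fsS -o /dev/null "$url" 2>/dev/null; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# Usage:
#   wait_for_http http://127.0.0.1:8000 15 && echo "ArchiveBox is up"
```

If the check keeps failing, redirecting the output to a log file instead of /dev/null makes the startup error visible.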
You can also create a script named start-archivebox.sh in your archivebox directory with the following contents:
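#!/bin/bash
# Exit with status 1 if an ArchiveBox server is already running.
if pgrep -f "archivebox server" > /dev/null; then
    exit 1
fi

ABPath=/home/allen/data  # replace with your ArchiveBox data directory
ABPort=8000

# Start the server only if the data directory has been initialized.
if [ -f "${ABPath}/ArchiveBox.conf" ]; then
    cd "${ABPath}" || exit 2
    nohup archivebox server "0.0.0.0:${ABPort}" &> /dev/null &
    exit 0
fi

# No ArchiveBox.conf found: the directory is not an ArchiveBox data directory.
exit 2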
Run it with: bash start-archivebox.sh
This script is adapted from the Zhihu article 开源的私人档案馆ArchiveBox简介,及二段补强.
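Closing notes
This covers only the most basic installation; for more ways to use it, see ArchiveBox Usage.
The program does not appear to offer a language setting, and the interface is English by default. There are few interface elements, though, so normal use should not be a problem.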