Setting Up a Machine Learning Environment

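When I first studied linear regression I mainly used Octave. In hindsight, it is less convenient than Python's numpy and matplotlib for both plotting and matrix operations. This post records how I set up a machine learning environment with Anaconda and Jupyter Notebook.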

So far I have only set up the Python + Anaconda + Jupyter Notebook environment on Ubuntu 22.04. First, download the Anaconda installer:

wget https://repo.anaconda.com/archive/Anaconda3-2023.03-1-Linux-x86_64.sh

# Run the installer and follow the prompts
bash Anaconda3-2023.03-1-Linux-x86_64.sh

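Next, configure the environment variables. I installed Anaconda directly under root's home directory; a different directory can be chosen during installation: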

export ANACONDA_HOME=/root/anaconda3
export PATH=$ANACONDA_HOME/bin:$PATH

Check that the installation succeeded:

root@PC:~# conda --version
conda 23.3.1

To see more details, including version information:

conda info
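
As a quick cross-check of the PATH setup above, here is a minimal Python sketch that looks for the conda executable on PATH and asks it for its version. The helper names find_conda and conda_version are my own and not part of any conda API; the sketch only assumes that conda prints a version string when called with --version.

```python
import shutil
import subprocess


def find_conda(executable="conda"):
    """Return the full path of the executable found on PATH, or None."""
    return shutil.which(executable)


def conda_version(executable="conda"):
    """Return the output of `<executable> --version`, or None if it is not on PATH."""
    path = find_conda(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Some tools print the version to stderr, so fall back to it.
    return (result.stdout or result.stderr).strip()


if __name__ == "__main__":
    print(find_conda() or "conda is not on PATH; check ANACONDA_HOME and PATH")
    print(conda_version())
```

If the first line reports that conda is missing, the two export lines were most likely not added to the shell profile or the shell has not been restarted.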

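Since Anaconda will manage the Python dependencies, it helps to get familiar with its common commands. To make management easier, Anaconda introduces the concept of environments, each of which is independent and isolated from the others.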

Installing Anaconda

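First, a note for remote SSH sessions. Because SSH uses a virtual terminal, if an earlier session activated an environment and the terminal was closed without deactivating it, a later session can fail with the error "CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'." On a remote virtual terminal, re-entering the environment first resolves the problem: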

source activate

Below are the most commonly used Anaconda commands for managing environments.

Create an environment:

conda create --name env_name

# To pin the environment's Python version to 3.6
conda create --name env_name python=3.6

Activate / switch to the environment:

conda activate env_name

Leave the environment:

conda deactivate

List all environments:

conda env list

List the packages installed in the current environment:

conda list

List the packages installed in another environment:

conda list -n env_name  

Remove an environment:

conda remove --name env_name --all

Clone an environment:

conda create --name clone_env --clone env_name

Export an environment's configuration:

conda env export > my_environment.yml

Create an environment from an exported configuration:

conda env create -f my_environment.yml

Install a package, for example numpy:

conda install numpy

Alternatively, switch to the target environment and run pip install numpy.
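
When the environment list needs to be consumed by a script, the JSON form of the listing (conda env list --json) is easier to handle than the table. The sketch below is only an illustration: the function name env_names is my own, the sample string is made up rather than captured from a real machine, and it assumes the JSON contains an "envs" array of environment paths.

```python
import json
from pathlib import PurePosixPath


def env_names(conda_json_text):
    """Map each environment path from `conda env list --json` to a display name."""
    paths = json.loads(conda_json_text).get("envs", [])
    names = {}
    for raw in paths:
        path = PurePosixPath(raw)
        # Environments created with --name live in <install dir>/envs/<name>;
        # anything else (such as the base install) is shown by its full path.
        names[raw] = path.name if path.parent.name == "envs" else raw
    return names


# Illustrative sample, not captured from a real machine.
sample = '{"envs": ["/root/anaconda3", "/root/anaconda3/envs/env_name"]}'

if __name__ == "__main__":
    for path, name in env_names(sample).items():
        print(f"{name:<32} {path}")
```

In real use the JSON text would come from running the conda command, for example through subprocess.run with capture_output=True.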

Installing Jupyter Notebook

conda install jupyter notebook

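jupyter_notebook_config.py is the Jupyter Notebook configuration file, and NotebookApp.notebook_dir sets the home directory for notebooks. Next, let's polish the look of Jupyter Notebook; who doesn't want a good-looking tool? jupyterthemes is a theming tool built specifically for Jupyter Notebook. Below are the installation commands and my own settings: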

https://github.com/dunovank/jupyter-themes

# install jupyterthemes
pip install jupyterthemes -i https://pypi.tuna.tsinghua.edu.cn/simple

# upgrade to latest version
pip install --upgrade jupyterthemes

# My custom configuration 1
jt -t grade3 -f fira -fs 16 -cellw 90% -ofs 11 -dfs 11 -T

# My custom configuration 2
jt -t onedork -f roboto -fs 14 -nfs 14 -tfs 14 -ofs 11

After installation, generate the configuration file:

jupyter notebook --generate-config

By default the file is generated at ~/.jupyter/jupyter_notebook_config.py. The settings most often changed are simply the IP, the port, and the default working directory:

c.NotebookApp.ip = '0.0.0.0'

c.NotebookApp.port = 8888

c.NotebookApp.notebook_dir = ''
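
Put together, a filled-in configuration might look like the sketch below. It assumes the classic Notebook server (the NotebookApp option names used above); the directory /root/notebooks is a hypothetical path chosen for illustration, and the last two options are my additions that mirror the --no-browser and --allow-root command-line flags.

```python
# ~/.jupyter/jupyter_notebook_config.py
c = get_config()  # provided by Jupyter when it loads this file

c.NotebookApp.ip = '0.0.0.0'        # listen on all network interfaces
c.NotebookApp.port = 8888           # port the server listens on
c.NotebookApp.notebook_dir = '/root/notebooks'  # hypothetical path; must already exist

c.NotebookApp.open_browser = False  # same effect as --no-browser
c.NotebookApp.allow_root = True     # same effect as --allow-root
```

With these set in the file, the matching flags no longer need to be passed on the command line each time.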

Once everything is set up, start the service:

jupyter notebook --ip=0.0.0.0 --no-browser --allow-root

To run it in the background:

nohup jupyter notebook --ip=0.0.0.0 --allow-root > jupyter.log 2>&1 &
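
When the server runs in the background there is no terminal output to watch, so a small check that the port accepts connections can be handy. This is a minimal sketch using only the standard library; is_port_open is my own helper name, and the defaults assume the server listens on port 8888 of the local machine as configured above.

```python
import socket


def is_port_open(host="127.0.0.1", port=8888, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("Jupyter reachable:", is_port_open())
```

A True result only shows that something is listening on the port; jupyter.log remains the place to look for startup errors.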

Jupyter Notebook Keyboard Shortcuts

Below are keyboard shortcuts for common Jupyter Notebook operations:

In ESC (command) mode:

Action               Shortcut
Insert cell below    B
Delete cell          D, D
Cut cell             X
Paste cell           V
Copy cell            C
Run current cell     Ctrl + Enter / Command + Enter

Installing Common Libraries: pandas, numpy, matplotlib

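scikit-learn is an open-source framework (algorithm library) for Python built specifically for machine learning. It covers common tasks such as data preprocessing, classification, regression, dimensionality reduction, and model selection.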

Characteristics of scikit-learn:

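It bundles a wide range of mature machine learning algorithms, is easy to install and use, and comes with plentiful examples and very detailed tutorials and documentation.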

It does not support languages other than Python, and it does not cover deep learning or reinforcement learning. https://scikit-learn.org/stable/index

conda install pandas

conda install numpy

conda install matplotlib

conda install scikit-learn
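
Once the installs finish, a quick check like the one below can confirm that each library is importable from the active environment. It is a small sketch using only the standard library; check_packages and REQUIRED are names I made up for illustration. Note that scikit-learn is imported under the name sklearn.

```python
import importlib.util

# Import names, not install names: scikit-learn is imported as sklearn.
REQUIRED = ["numpy", "pandas", "matplotlib", "sklearn"]


def check_packages(names=REQUIRED):
    """Return {import_name: bool} telling whether each top-level package can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages().items():
        print(f"{name:<12} {'ok' if found else 'MISSING'}")
```

Run it with the Python interpreter of the environment the packages were installed into; a MISSING entry usually means a different environment is active.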

Everything is ready. Happy coding!

Remote Access via FRP

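The main way to reach the service from outside is NAT traversal with FRP. My ISP does provide IPv6, but the network I happen to be on does not always support it, so FRP became the first choice. With it, the goal of coding anytime is finally within reach: deploy an online Jupyter Notebook service on a server, and from any environment, any device with internet access and a browser can edit and run code in Jupyter Notebook. Tempting, isn't it?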

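Many FRP providers support HTTPS, and it can be enabled directly through the FRP client's https plugin. However, I ran into a problem: when the FRP client's https plugin handles HTTPS, the kernel restarts constantly and the notebook is unusable: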

[I 20:11:50.384 NotebookApp] Kernel started: d8d74f6b-d1bc-4d45-909e-6eef6e9608d6, name: python3
[I 20:11:50.916 NotebookApp] Starting buffering for d8d74f6b-d1bc-4d45-909e-6eef6e9608d6:a190232b48c1448294552727d7c27f81
[I 20:11:52.019 NotebookApp] Restoring connection for d8d74f6b-d1bc-4d45-909e-6eef6e9608d6:a190232b48c1448294552727d7c27f81
[I 20:11:52.020 NotebookApp] Starting buffering for d8d74f6b-d1bc-4d45-909e-6eef6e9608d6:a190232b48c1448294552727d7c27f81
[I 20:11:53.118 NotebookApp] Restoring connection for d8d74f6b-d1bc-4d45-909e-6eef6e9608d6:a190232b48c1448294552727d7c27f81
[I 20:11:53.119 NotebookApp] Starting buffering for d8d74f6b-d1bc-4d45-909e-6eef6e9608d6:a190232b48c1448294552727d7c27f81
[I 20:11:54.215 NotebookApp] Restoring connection for d8d74f6b-d1bc-4d45-909e-6eef6e9608d6:a190232b48c1448294552727d7c27f81
[I 20:11:54.216 NotebookApp] Starting buffering for d8d74f6b-d1bc-4d45-909e-6eef6e9608d6:a190232b48c1448294552727d7c27f81
[I 20:11:55.326 NotebookApp] Restoring connection for d8d74f6b-d1bc-4d45-909e-6eef6e9608d6:a190232b48c1448294552727d7c27f81
[I 20:11:55.327 NotebookApp] Starting buffering for d8d74f6b-d1bc-4d45-909e-6eef6e9608d6:a190232b48c1448294552727d7c27f81
[I 20:11:56.418 NotebookApp] Restoring connection for d8d74f6b-d1bc-4d45-909e-6eef6e9608d6:a190232b48c1448294552727d7c27f81
[I 20:11:56.419 NotebookApp] Starting buffering for d8d74f6b-d1bc-4d45-909e-6eef6e9608d6:a190232b48c1448294552727d7c27f81
[I 20:11:57.509 NotebookApp] Restoring connection for d8d74f6b-d1bc-4d45-909e-6eef6e9608d6:a190232b48c1448294552727d7c27f81
[I 20:11:57.511 NotebookApp] Starting buffering for d8d74f6b-d1bc-4d45-909e-6eef6e9608d6:a190232b48c1448294552727d7c27f81
[I 20:11:58.610 NotebookApp] Restoring connection for d8d74f6b-d1bc-4d45-909e-6eef6e9608d6:a190232b48c1448294552727d7c27f81
[I 20:11:58.611 NotebookApp] Starting buffering for d8d74f6b-d1bc-4d45-909e-6eef6e9608d6:a190232b48c1448294552727d7c27f81
[W 20:12:06.176 NotebookApp] Notebook Course_MachinaLearn/experiment01.ipynb is not trusted
[W 20:12:06.177 NotebookApp] Trusting notebook /Course_MachinaLearn/experiment01.ipynb
[I 20:12:06.540 NotebookApp] Restoring connection for d8d74f6b-d1bc-4d45-909e-6eef6e9608d6:a190232b48c1448294552727d7c27f81
[I 20:12:06.541 NotebookApp] Starting buffering for d8d74f6b-d1bc-4d45-909e-6eef6e9608d6:a190232b48c1448294552727d7c27f81
[I 20:12:07.528 NotebookApp] Starting buffering for d8d74f6b-d1bc-4d45-909e-6eef6e9608d6:51ed9123212c4bbcb7850248c3f9951b
[I 20:12:08.858 NotebookApp] Restoring connection for d8d74f6b-d1bc-4d45-909e-6eef6e9608d6:51ed9123212c4bbcb7850248c3f9951b
[I 20:12:08.859 NotebookApp] Starting buffering for d8d74f6b-d1bc-4d45-909e-6eef6e9608d6:51ed9123212c4bbcb7850248c3f9951b
[I 20:12:09.144 NotebookApp] 302 GET /edit/cat (127.0.0.1) 0.670000ms
[I 20:12:09.984 NotebookApp] Restoring connection for d8d74f6b-d1bc-4d45-909e-6eef6e9608d6:51ed9123212c4bbcb7850248c3f9951b
[I 20:12:09.985 NotebookApp] Starting buffering for d8d74f6b-d1bc-4d45-909e-6eef6e9608d6:51ed9123212c4bbcb7850248c3f9951b
[I 20:12:11.079 NotebookApp] Restoring connection for d8d74f6b-d1bc-4d45-909e-6eef6e9608d6:51ed9123212c4bbcb7850248c3f9951b
[I 20:12:11.080 NotebookApp] Starting buffering for d8d74f6b-d1bc-4d45-909e-6eef6e9608d6:51ed9123212c4bbcb7850248c3f9951b

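In the end I settled on Nginx for both the reverse proxy and the SSL handling. I followed the answer "Jupyter notebook keeps reconnecting to kernel", but it does not cover the pitfalls of configuring SSL. So do not let the FRP client handle SSL; use Nginx instead. Here is my nginx.conf: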

#user  nobody;
worker_processes  1;

#error_log  logs/error.log;
#error_log  logs/error.log  notice;
#error_log  logs/error.log  info;

#pid        logs/nginx.pid;


events {
    worker_connections  1024;
}

http {
    include       mime.types;
    default_type  application/octet-stream;

    sendfile        on;
    #tcp_nopush     on;

    #keepalive_timeout  0;
    keepalive_timeout  65;

    map $http_upgrade $connection_upgrade {
        default upgrade;
        '' close;
    }

    server {
        listen 10086 ssl;
        server_name jypyter;

        ssl_certificate      /xxxx/yyyy.crt;
        ssl_certificate_key  /xxxx/yyyy.key;

        ssl_session_cache    shared:SSL:1m;
        ssl_session_timeout  5m;

        ssl_ciphers  HIGH:!aNULL:!MD5;
        ssl_prefer_server_ciphers  on;

        location / {
            proxy_pass http://127.0.0.1:8888;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-NginX-Proxy true;
            auth_basic "Restricted Content";

            # WebSocket support, needed for the kernel connections
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection $connection_upgrade;
            proxy_set_header Origin "";
            proxy_read_timeout 86400;
        }
    }
}

After updating the configuration, reload Nginx:

nginx -s reload
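
The map block in the nginx.conf above is the piece that keeps kernel connections alive: the notebook page talks to its kernel over a WebSocket, so the proxy has to pass the Upgrade handshake through. Its logic is small enough to restate in a few lines of Python. This is only an illustration of what the map computes, not code that Nginx or Jupyter runs, and connection_header is a name I made up.

```python
def connection_header(http_upgrade):
    """Mimic `map $http_upgrade $connection_upgrade` from the nginx.conf.

    An empty or missing Upgrade request header maps to "close" (the '' entry);
    any other value maps to "upgrade" (the default entry), so WebSocket
    handshakes are forwarded to the notebook server.
    """
    return "close" if not http_upgrade else "upgrade"


if __name__ == "__main__":
    print(connection_header("websocket"))  # request that asks for a WebSocket
    print(connection_header(""))           # ordinary HTTP request
```

The computed value is what the location block sends upstream as the Connection header, alongside the original Upgrade header.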