skip to content
Logo
Yurikago Blog
Table of Contents

Pythonの自然言語処理ライブラリであるNLTKをAlpineLinux環境で実行する方法をまとめました。

検証環境

  • Docker Toolbox 18.09.3
  • VirtualBox 5.2.20
    • Python 3.7.5
    • NLTK 3.4.5

検証リポジトリ: https://github.com/cuavv/sandbox-nltk

ディレクトリ構造

/
- docker
- Dockerfile
- src
- test.py
- docker-compose-yml

Dcokerfile

FROM python:3.7.5-alpine
# https://www.nltk.org/data.html
# > The downloader will search for an existing nltk_data directory to install NLTK data.
RUN mkdir /usr/share/nltk_data
RUN pip3 install nltk==3.4.5 && python3 -m nltk.downloader all

python3 -m nltk.downloader all を実行してダウンロードするライブラリを /usr/share/nltk_data に保存します。

docker-compose.yml
version: '3'
services:
nltk:
build:
context: ./docker
image: image-nltk
container_name: container-nltk
volumes:
- ./src:/src
tty: true
test.py
import nltk
sentence = "NLTK is a leading platform for building Python programs to work with human language data."
tokens = nltk.word_tokenize(sentence)
print(tokens)
tagged = nltk.pos_tag(tokens)
print(tagged)

ビルド

Terminal window
% docker-compose build

コンテナ作成

Terminal window
% docker-compose up -dlt

スクリプト実行

Terminal window
% dcoker-compose exec nltk ash
Terminal window
% python3 /src/test.py
['NLTK', 'is', 'a', 'leading', 'platform', 'for', 'building', 'Python', 'programs', 'to', 'work', 'with', 'human', 'language', 'data', '.']
[('NLTK', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('leading', 'VBG'), ('platform', 'NN'), ('for', 'IN'), ('building', 'VBG'), ('Python', 'NNP'), ('programs', 'NNS'), ('to', 'TO'), ('work', 'VB'), ('with', 'IN'), ('human', 'JJ'), ('language', 'NN'), ('data', 'NNS'), ('.', '.')]