Joyent’s Manta. Store and Compute
Until now we only used Joyent’s Triton. We deployed Docker containers with Triton. We probably also want to store data somewhere. Manta is an Amazon S3-like storage service. However, Manta can also compute on your data. Let’s start!
Preparation
Manta does have a REST API. However, in this blog post we’ll just use the Manta command line utilities.
sudo npm install manta -g
Then we set up the Manta environment by copying the settings from the Manta manual.
# https://my.joyent.com/main/#!/manta/intro lists yours
export MANTA_URL=https://us-east.manta.joyent.com;
export MANTA_USER=<Joyent-User>;
export MANTA_KEY_ID=<ssh-public-key-finger-print>;
# Or read it directly from your local .ssh keys
export MANTA_KEY_ID=$(ssh-keygen -l -f $HOME/.ssh/id_rsa.pub | awk '{print $2}')
That’s all. We’re set up.
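Before we go on, a quick sanity check can save some head scratching. The sketch below only looks at the three variables from the setup above and tells us which ones are still empty. The function name check_manta_env is made up for this post; it is not part of the Manta command line utilities.

```shell
# check_manta_env is a hypothetical helper, not part of the Manta CLI.
# It prints one line per missing variable and returns 1 if any is missing.
check_manta_env() {
  missing=0
  for name in MANTA_URL MANTA_USER MANTA_KEY_ID; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}

if check_manta_env; then
  echo "Manta environment looks complete"
fi
```

This only checks that the variables are non-empty. It does not verify that the user or the key fingerprint is valid on Manta.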
Let’s Store Some Files
First, let’s upload a few files.
echo "Hi there, internet" > ~/hello-manta.txt
# Putting up a simple file
mput -f ~/hello-manta.txt ~~/stor/hello-manta.txt
# Can get it again
mget ~~/stor/hello-manta.txt
# Can put any content, here an RFC piped straight from curl
curl -sL https://tools.ietf.org/rfc/rfc2616.txt | mput -p ~~/stor/blog/rfc2616.txt
# List directory
mls ~~/stor/blog/
# List more
mls -l ~~/stor/blog/
# Find stuff
mfind -n 'rfc.*\.txt$' ~~/stor/blog/
# Delete everything
mrm -r ~~/stor/blog/
To upload we use mput. The -f option will upload the specified file. The ~~/stor/hello-manta.txt specifies where to upload the file in Manta. The ~~ is like the Unix home directory, just on Manta. mput can also upload from a pipe. Here we upload an RFC, right from the curl output. The -p option creates the parent directories if missing. mls lists the content of a Manta directory and mfind finds files based on name. Finally, mrm deletes files and directories.
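To make the ~~ shorthand concrete: judging from the paths Manta reports back (like /Gamlor/stor/blog/caminandes_03.mp4), it stands for /<your user name>. Here is a small sketch that does the same expansion locally. The function expand_manta_path is hypothetical and only for illustration; the real tools do this for us.

```shell
# expand_manta_path is a hypothetical helper for illustration only.
# It rewrites a leading ~~ to /$MANTA_USER and leaves other paths untouched.
expand_manta_path() {
  case "$1" in
    "~~"/*) printf '/%s/%s\n' "$MANTA_USER" "${1#??/}" ;;  # ${1#??/} strips the leading "~~/"
    "~~")   printf '/%s\n' "$MANTA_USER" ;;
    *)      printf '%s\n' "$1" ;;
  esac
}

# Shows the absolute Manta path for the file we uploaded above.
expand_manta_path '~~/stor/hello-manta.txt'
```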
Manta’s Secret: Compute
First, let’s upload some videos.
curl -sL http://www.caminandes.com/download/01_llama_drama_1080p.zip | mput -p ~~/stor/blog/caminandes_01.zip
curl -sL http://www.caminandes.com/download/02_gran_dillama_1080p.zip | mput -p ~~/stor/blog/caminandes_02.zip
curl -sL http://www.caminandes.com/download/03_caminandes_llamigos_1080p.mp4 | mput -p ~~/stor/blog/caminandes_03.mp4
So, we stored the videos on Manta. Maybe we want to create mobile-friendly versions of these videos. So we download the video, transcode it and upload it again? If a video is a few GBytes, we download and upload all of that again? No! Manta can compute, so we can convert the video right there, on Manta. Aha, so we have to learn a new framework? No! Manta just uses regular Unix programs. Let’s start!
# Log in to your stored file.
# After that, you get a regular prompt
mlogin ~~/stor/blog/caminandes_03.mp4
# Check out the environment
env
# The MANTA_INPUT_FILE will contain the actual file
# The MANTA_INPUT_OBJECT the name in the store
MANTA_OUTPUT_BASE=/Gamlor/jobs/f018b6a5-aa28-44fb-bc30-b5cca7a1db60/stor/Gamlor/stor/blog/caminandes_03.mp4.0.
MANTA_INPUT_FILE=/manta/Gamlor/stor/blog/caminandes_03.mp4
MANTA_INPUT_OBJECT=/Gamlor/stor/blog/caminandes_03.mp4
# Try out things you wish to do on the file. For a video, ffmpeg can transcode it to a smaller size.
# Like this one, which makes it way smaller, for a phone
ffmpeg -i $MANTA_INPUT_FILE -strict -2 -b:v 500k -s 320x240 -vcodec mpeg4 -acodec aac ~/small.mp4
# Once you explored, you can exit
exit
We can log in to the Manta file with mlogin. Really! This way we can run all Unix programs right next to the file and can compute anything right there. We want to create a smaller video, so we use ffmpeg. However, we want to transcode all videos. So let’s use mjob for that.
#mjob: Takes a list of Manta paths and applies the program to each. -o waits for the job and returns the stdout
echo ~~/stor/blog/caminandes_03.mp4 | mjob create -o -m 'sha1sum'
#=> added 1 input to 61c30a0c-c7f0-e09c-c995-8fc5d61b06c3
#=> ddd2bf01f87be76b875efafd46c9930f722113b5 -
#So, with mfind, we can now do calculations across files
mfind ~~/stor/blog/ | mjob create -o -m 'sha1sum'
#=> added 3 inputs to fb3aa947-b637-c5ea-8d38-b42620cbef1d
#=> 07acf89b74b677432acc3ee6579bcbb8ee13640e -
#=> 0e502cad377f75973c597eab2318f39aa4763ad4 -
#=> ddd2bf01f87be76b875efafd46c9930f722113b5 -
#Or more pretty listing.
# First we hash the data: sha1sum
# Then extract the first column. Note that we escape the dollar sign: awk "{print \$1}"
# Last, compose together the hash and the file name: echo $(cat) $(basename $MANTA_INPUT_OBJECT)
mfind ~~/stor/blog/ | mjob create -o -m 'sha1sum | awk "{print \$1}"| echo $(cat) $(basename $MANTA_INPUT_OBJECT)'
#=> added 3 inputs to 8bc9b130-9b73-47c5-dec9-ed4e87cdc761
#=> 07acf89b74b677432acc3ee6579bcbb8ee13640e caminandes_02.zip
#=> ddd2bf01f87be76b875efafd46c9930f722113b5 caminandes_03.mp4
#=> 0e502cad377f75973c597eab2318f39aa4763ad4 caminandes_01.zip
# For our movie transformation, we first unzip the zip files.
# zip files are not streamable. So we ditch the stdin, and read from the file: unzip $MANTA_INPUT_FILE -d ~/out < /dev/null
# Then, get the output file name: tail -n 1
# Extract the actual file name column: awk '{print $2}'
# Push that file to stdout: xargs cat
# And use manta's mpipe to create a named Manta output file: mpipe ${MANTA_INPUT_OBJECT}.mp4
mfind -t o -n '.zip$' ~~/stor/blog/ | mjob create -w -m "unzip \$MANTA_INPUT_FILE -d ~/out < /dev/null \
| tail -n 1 | awk '{print \$2}' | xargs cat | mpipe \${MANTA_INPUT_OBJECT}.mp4"
# Last step. Transcode the videos to a mobile format:
# The downloaded video might not be streamable. So we cannot take it from the stdin. So, just read it from the file.
# First transcode it to a 600kbit/s stream, 320x240 resolution, mp4 format: ffmpeg -nostdin -i \$MANTA_INPUT_FILE -strict -2 -b:v 600k -s 320x240 -vcodec mpeg4 -acodec aac ~/tmp.mp4
# Cat the file, pipe it to a named Manta output file: cat ~/tmp.mp4 | mpipe \${MANTA_INPUT_OBJECT}.mobile.mp4
mfind -t o -n '.mp4$' ~~/stor/blog/ | mjob create -w -m "ffmpeg -nostdin -i \$MANTA_INPUT_FILE \
-strict -2 -b:v 600k -s 320x240 -vcodec mpeg4 -acodec aac ~/tmp.mp4 > /dev/null && cat ~/tmp.mp4 | mpipe \${MANTA_INPUT_OBJECT}.mobile.mp4"
# After completion:
mls ~~/stor/blog
#=> caminandes_01.zip
#=> caminandes_01.zip.mp4
#=> caminandes_01.zip.mp4.mobile.mp4
#=> caminandes_02.zip
#=> caminandes_02.zip.mp4
#=> caminandes_02.zip.mp4.mobile.mp4
#=> caminandes_03.mp4
#=> caminandes_03.mp4.mobile.mp4
# fetch a mobile video to check it out:
mget ~~/stor/blog/caminandes_03.mp4.mobile.mp4 > caminandes_03.mp4.mobile.mp4
mjob takes Manta file paths on stdin. mjob create starts a new computation. -o waits for completion and shows the results. After the -m option we specify the computation.
1st example: Compute the sha1 of ~~/stor/blog/caminandes_03.mp4.
2nd example: Find all files with mfind and compute their sha1s.
3rd example: We can use Unix pipes. So, let’s create an easy-to-read output.
Let’s try some useful computation. Let’s unzip caminandes_01.zip and caminandes_02.zip, then store the result with mpipe as another Manta file. -w waits until the computation has completed. And now let’s create the small, mobile-friendly videos. We transcode with ffmpeg and store the small file with mpipe to a new Manta file. TATAAAA! Here are our smaller videos. We can check out a small file with mget. We did all this directly on Manta. Even if these files were large, we would not have to download anything and upload it again. We did everything in Manta.
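Getting the quoting of a -m command right can take a few tries. A cheap way to try a map command is to run it locally first. The sketch below is only a rough local imitation and makes some assumptions: it sets the two variables we saw in the mlogin session, MANTA_INPUT_FILE and MANTA_INPUT_OBJECT, both to the local file path, and feeds the file on stdin. The function local_map is made up for this post, and mpipe is not available locally, so this only helps for commands that write to stdout.

```shell
# local_map is a hypothetical helper, not part of the Manta CLI.
# Usage: local_map 'map command' file...
local_map() {
  map_cmd=$1
  shift
  for f in "$@"; do
    # Assumption: both variables point at the local file here. On Manta,
    # MANTA_INPUT_OBJECT is the object's name in the store instead.
    MANTA_INPUT_FILE=$f MANTA_INPUT_OBJECT=$f sh -c "$map_cmd" < "$f"
  done
}

# Try the "pretty listing" pipeline from the 3rd example on a local file.
workdir=$(mktemp -d)
printf 'Hi there, internet\n' > "$workdir/hello-manta.txt"
local_map 'sha1sum | awk "{print \$1}" | echo $(cat) $(basename $MANTA_INPUT_OBJECT)' "$workdir/hello-manta.txt"
```

Once the output looks right locally, the same command string can go after -m in mjob create.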
Map Reduce
So far we only used mjob create -m. mjob can also do map reduce. When we need a summary type of computation, we use the map reduce feature. Here is an example, where we calculate a summary of the video bit rates:
# Let's list the bit rate:
# First find the bitrate: ffprobe $MANTA_INPUT_FILE 2>&1
# Find the bitrate line: grep bitrate
mfind -n mp4$ ~~/stor/blog | mjob create -o -m 'ffprobe $MANTA_INPUT_FILE 2>&1 | grep bitrate'
#=> added 6 inputs to 00abc11b-2bad-405c-8add-941400614cc4
#=> Duration: 00:02:26.05, start: 0.000000, bitrate: 6900 kb/s
#=> Duration: 00:02:30.13, start: 0.000000, bitrate: 10680 kb/s
#=> Duration: 00:02:30.12, start: 0.021333, bitrate: 717 kb/s
#=> Duration: 00:01:30.02, start: 0.023220, bitrate: 672 kb/s
#=> Duration: 00:02:26.08, start: 0.021333, bitrate: 725 kb/s
#=> Duration: 00:01:30.00, start: 0.000000, bitrate: 3120 kb/s
# Let's list the bit rate again:
# Only extract the bit rate column: awk "{print \$6}"
mfind -n mp4$ ~~/stor/blog | mjob create -o -m 'ffprobe $MANTA_INPUT_FILE 2>&1 | grep bitrate | awk "{print \$6}"'
#=> added 6 inputs to 6c1b8b80-1516-e8e5-f6b6-99c5ebcd9f3b
#=> 6900
#=> 672
#=> 10680
#=> 717
#=> 725
#=> 3120
# With the reduce phase we can collect the results back together.
# For example, get the min, max and mean bit rate of all our videos
mfind -n mp4$ ~~/stor/blog | mjob create -o -m 'ffprobe $MANTA_INPUT_FILE 2>&1 | grep bitrate | awk "{print \$6}"' \
-r 'maggr max,min,mean'
First, we extract the video’s info with mjob and ffprobe. We look for the bitrate with grep and pick the right column, the 6th, with awk. Finally, we specify the reduce step after the -r parameter. Here we use maggr to do some statistics. (^_^)
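To see what the reduce step does with the numbers, here is a local stand-in. It takes the six bit rates from the map output above, one per line, and computes max, min and mean with awk. This is just a sketch of the idea: local_reduce is a made-up name, and the output format is my own, not the format maggr prints.

```shell
# local_reduce is a hypothetical stand-in for the reduce phase.
# It reads one number per line on stdin and prints max, min and mean.
local_reduce() {
  awk '
    NR == 1 { min = $1; max = $1 }
    { sum += $1; if ($1 < min) min = $1; if ($1 > max) max = $1 }
    END { if (NR > 0) printf "max=%d min=%d mean=%.1f\n", max, min, sum / NR }
  '
}

# The bit rates printed by the map phase above.
printf '%s\n' 6900 672 10680 717 725 3120 | local_reduce
# prints: max=10680 min=672 mean=3802.3
```

The shape is the same as in the mjob call: many map outputs flow into a single reducer that sees all lines together.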
Explore Manta
I skipped many features and topics. Take a look at the Manta documentation and try it out.