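February 28, 2020

Implementing a Sobel filter in plain C source code with Vivado HLS 2019.2, part 2

This is a continuation of "Implementing a Sobel filter in plain C source code with Vivado HLS 2019.2, part 1".

In the previous post I pasted the source code and test bench for implementing a Sobel filter in plain C source code with Vivado HLS 2019.2. This time I run C simulation, C synthesis, C/RTL co-simulation, and Export RTL.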

I ran C simulation. The result is shown below.

The sobel_filter_axis3/solution/csim/build directory is shown below.

Here is sobel.jpg, the output of the Sobel filter. The edges are detected cleanly.

I ran C synthesis.

The latency is 480016 clocks, only 16 clocks more than the total pixel count (800 x 600 = 480000). Resource usage is also low.
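The overhead figure can be reproduced with a few lines of arithmetic. This is only a sketch: the frame size comes from DISPLAY_WIDTH and DISPLAY_HIGHT in the source, while the helper names `total_pixels` and `overhead_clocks` are made up for this illustration.

```cpp
#include <cstdint>

// Frame geometry taken from DISPLAY_WIDTH / DISPLAY_HIGHT in sobel_filter_axis3.cpp.
constexpr std::int64_t kWidth = 800;
constexpr std::int64_t kHeight = 600;

// Number of pixels in one frame.
constexpr std::int64_t total_pixels() { return kWidth * kHeight; }

// Clocks spent beyond the ideal of one clock per pixel (hypothetical helper).
constexpr std::int64_t overhead_clocks(std::int64_t latency_clocks) {
    return latency_clocks - total_pixels();
}

static_assert(total_pixels() == 480000, "800 x 600 frame");
static_assert(overhead_clocks(480016) == 16, "C synthesis latency from this post");
```

With II=1 on the inner loop, the ideal is one pixel per clock, so the few extra clocks would be consistent with pipeline fill and flush plus the wait-for-user loop.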

I ran C/RTL co-simulation.

The latency was 480041 clocks, which is excellent.

The C/RTL co-simulation waveform is shown below.

Both outs_TVALID and ins_TREADY stay asserted at 1 almost the whole time, so throughput is high.

Finally, I ran Export RTL. The result is shown below.

Resource usage is low, and CP achieved post-implementation is 4.152 ns, so timing looks fine.

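February 27, 2020

Implementing a Sobel filter in plain C source code with Vivado HLS 2019.2, part 1

So far I have implemented the Sobel filter with xfOpenCV and with the HLS Video Library. Now let's implement it in plain C source code.
HDLab is running the seminar "FPGAの部屋 Presents: HLS Hands-on Seminar, Basic Course", and I plan to use this Sobel filter implementation in the follow-up seminar, an advanced course that explains Vivado HLS tuning techniques in detail. Note that the source code posted here has been modified slightly.
This implementation runs the horizontal and vertical Sobel filters at the same time and takes the square root of the sum of the squares of the two results. The square-root part uses the source code from "Implementing square root in Vivado HLS, part 3". That square-root code was developed for use in this Sobel filter.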
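As a compact restatement of the paragraph above, the computation can be written as follows. The kernel coefficients are the ones listed in the comments of sobel_filter_axis3.cpp; the symbols $P$, $K_h$, $K_v$, $G_h$, $G_v$ and $G$ are notation introduced here for illustration. The $\min(255,\cdot)$ reflects the fact that square_root8 searches only 8 bits, so its result cannot exceed 255.

```latex
K_h = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{pmatrix},
\qquad
K_v = \begin{pmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{pmatrix}

G_h = \sum_{i=0}^{2}\sum_{j=0}^{2} K_h[i][j]\, P[i][j],
\qquad
G_v = \sum_{i=0}^{2}\sum_{j=0}^{2} K_v[i][j]\, P[i][j]

G = \min\!\left(255,\ \left\lfloor \sqrt{G_h^{2} + G_v^{2}} \right\rfloor\right)
```

Here $P$ is the 3x3 window of luminance values. The absolute value and clamp inside sobel_fil should not change $G$, because squaring removes the sign and any magnitude above 255 already saturates the square root.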

The source file sobel_filter_axis3.cpp is shown below.

// sobel_filter_axis3.cpp
// 2020/02/27 by marsee

#include <ap_int.h>
#include <hls_stream.h>
#include <ap_axi_sdata.h>

#define HORIZONTAL  0
#define VERTICAL    1

ap_int<32> sobel_fil(ap_int<32> h_or_v, ap_int<32> x0y0, ap_int<32> x1y0, ap_int<32> x2y0, ap_int<32> x0y1,
        ap_int<32> x1y1, ap_int<32> x2y1, ap_int<32> x0y2, ap_int<32> x1y2, ap_int<32> x2y2);
ap_int<32> conv_rgb2y(ap_int<32> rgb);
ap_int<32> square_root8(ap_int<32> val);

#define DISPLAY_WIDTH 800
#define DISPLAY_HIGHT 600

int sobel_filter_axis(hls::stream<ap_axis<32,1,1,1> >& ins, hls::stream<ap_axis<32,1,1,1> >& outs){
#pragma HLS INTERFACE axis register both port=ins
#pragma HLS INTERFACE axis register both port=outs
#pragma HLS INTERFACE s_axilite port=return

    ap_axis<32,1,1,1> pix;
    ap_axis<32,1,1,1> sobel;
    ap_int<32> sobel_val, sobel_h_val, sobel_v_val;

    ap_int<32> line_buf[2][DISPLAY_WIDTH];
#pragma HLS array_partition variable=line_buf block factor=2 dim=1
#pragma HLS resource variable=line_buf core=RAM_2P

    ap_int<32> pix_mat[3][3];
#pragma HLS array_partition variable=pix_mat complete

    LOOP_WAIT_USER : do {   // the frame starts when user becomes 1
#pragma HLS LOOP_TRIPCOUNT min=1 max=1 avg=1
        ins >> pix;
    } while(pix.user == 0);

    LOOP_Y: for(int y=0; y<DISPLAY_HIGHT; y++){
#pragma HLS LOOP_TRIPCOUNT min=600 max=600 avg=600
        LOOP_X: for(int x=0; x<DISPLAY_WIDTH; x++){
#pragma HLS PIPELINE II=1
#pragma HLS LOOP_TRIPCOUNT min=800 max=800 avg=800
            if (!(x==0 && y==0))    // the first pixel has already been read
                ins >> pix; // input from AXI4-Stream

            LOOP_PIX_MAT_K: for(int k=0; k<3; k++){
                LOOP_PIX_MAT_M: for(int m=0; m<2; m++){
                    pix_mat[k][m] = pix_mat[k][m+1];
                }
            }
            pix_mat[0][2] = line_buf[0][x];
            pix_mat[1][2] = line_buf[1][x];
            ap_int<32> y_val = conv_rgb2y(pix.data);
            pix_mat[2][2] = y_val;

            line_buf[0][x] = line_buf[1][x];    // shift the rows
            line_buf[1][x] = y_val;

            sobel_h_val = sobel_fil(HORIZONTAL, pix_mat[0][0], pix_mat[0][1], pix_mat[0][2],
                                                pix_mat[1][0], pix_mat[1][1], pix_mat[1][2],
                                                pix_mat[2][0], pix_mat[2][1], pix_mat[2][2]);
            sobel_v_val = sobel_fil(VERTICAL,   pix_mat[0][0], pix_mat[0][1], pix_mat[0][2],
                                                pix_mat[1][0], pix_mat[1][1], pix_mat[1][2],
                                                pix_mat[2][0], pix_mat[2][1], pix_mat[2][2]);
            sobel_val = square_root8(sobel_h_val*sobel_h_val + sobel_v_val*sobel_v_val);
            sobel.data = (sobel_val<<16)+(sobel_val<<8)+sobel_val;

            if(x==0 && y==0) // first pixel
                sobel.user = 1;
            else
                sobel.user = 0;
            if(x == (DISPLAY_WIDTH-1)) // end of line
                sobel.last = 1;
            else
                sobel.last = 0;

            if(x<2 || y<2)
                sobel.data = 0;

            outs << sobel;
        }
    }
    return(0);
}

// RGB to Y conversion
// RGB format: {8'd0, R(8bits), G(8bits), B(8bits)}, 1 pixel = 32 bits
// Converts to the luminance signal Y only. The formula is Y = 0.299R + 0.587G + 0.114B
// Based on "YUVフォーマット及び YUV<->RGB変換" (YUV formats and YUV<->RGB conversion). http://vision.kuee.kyoto-u.ac.jp/~hiroaki/firewire/yuv.html
// 2013/09/27 : dropped float and made everything int
ap_int<32> conv_rgb2y(ap_int<32> rgb){
    ap_int<32> r, g, b, y_f;
    ap_int<32> y;

    b = rgb & 0xff;
    g = (rgb>>8) & 0xff;
    r = (rgb>>16) & 0xff;

    y_f = 77*r + 150*g + 29*b; // coefficients of y_f = 0.299*r + 0.587*g + 0.114*b, scaled by 256
    y = y_f >> 8; // divide by 256

    return(y);
}

// sobel filter
// HORIZONTAL
// x0y0 x1y0 x2y0  1  2  1
// x0y1 x1y1 x2y1  0  0  0
// x0y2 x1y2 x2y2 -1 -2 -1
// VERTICAL
// x0y0 x1y0 x2y0  1  0 -1
// x0y1 x1y1 x2y1  2  0 -2
// x0y2 x1y2 x2y2  1  0 -1
ap_int<32> sobel_fil(ap_int<32> h_or_v, ap_int<32> x0y0, ap_int<32> x1y0, ap_int<32> x2y0, ap_int<32> x0y1,
        ap_int<32> x1y1, ap_int<32> x2y1, ap_int<32> x0y2, ap_int<32> x1y2, ap_int<32> x2y2){
    ap_int<32> y;

    if(h_or_v == HORIZONTAL){
        y = x0y0 + 2*x1y0 + x2y0 - x0y2 - 2*x1y2 - x2y2;
    } else {
        y = x0y0 - x2y0 + 2*x0y1 - 2*x2y1 + x0y2 - x2y2;
    }
    if(y<0)
        y = -y;
        //y = 0;
    else if(y>255)
        y = 255;
    return(y);
}

// square_root8
// computes an 8-bit-wide square root
ap_int<32> square_root8(ap_int<32> val){
    ap_int<32> temp = 0;
    ap_int<32> square;

    for(int i=7; i>=0; --i){
        temp += (1 << i);
        square = temp * temp;

        if(square > val){
            temp -= (1 << i);
        }
    }

    return(temp);
}
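The line_buf and pix_mat handling in the listing above is the part that lets the filter run on a stream without a frame buffer. The following plain C++ model is a sketch of that idea, not the author's code: it drops ap_int and the HLS pragmas, and the names `WindowStream`, `window_direct` and `streaming_matches_direct` are invented for this illustration. It checks that the streamed 3x3 window equals the one gathered directly from a full frame wherever x >= 2 and y >= 2, which is the region where the original code emits non-zero output.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Window = std::array<int, 9>;  // row-major 3x3 window, index = row * 3 + col

// Streaming window builder modelled on line_buf / pix_mat (hypothetical class).
class WindowStream {
public:
    explicit WindowStream(int width) : line0_(width, 0), line1_(width, 0), win_{} {}

    // Feed the pixel at column x of the current row; returns the 3x3 window
    // whose bottom-right corner is that pixel.
    const Window& push(int x, int pixel) {
        for (int r = 0; r < 3; ++r) {   // shift the window one column to the left
            win_[r * 3 + 0] = win_[r * 3 + 1];
            win_[r * 3 + 1] = win_[r * 3 + 2];
        }
        win_[2] = line0_[x];            // pixel two rows up
        win_[5] = line1_[x];            // pixel one row up
        win_[8] = pixel;                // current pixel
        line0_[x] = line1_[x];          // age the two line buffers
        line1_[x] = pixel;
        return win_;
    }

private:
    std::vector<int> line0_, line1_;
    Window win_;
};

// Reference: gather the same window directly from a full frame buffer.
// Only valid for x >= 2 and y >= 2.
Window window_direct(const std::vector<int>& img, int width, int x, int y) {
    Window w{};
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            w[r * 3 + c] = img[(y - 2 + r) * width + (x - 2 + c)];
    return w;
}

// Returns true if both methods agree wherever the window lies inside the frame.
bool streaming_matches_direct(int width, int height) {
    std::vector<int> img(static_cast<std::size_t>(width) * height);
    for (std::size_t i = 0; i < img.size(); ++i)
        img[i] = static_cast<int>((i * 7) % 251);   // arbitrary test pattern
    WindowStream ws(width);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            const Window& w = ws.push(x, img[y * width + x]);
            if (x >= 2 && y >= 2 && w != window_direct(img, width, x, y))
                return false;
        }
    return true;
}
```

For x < 2 the streamed window still holds pixels from the end of the previous row, which is presumably why the original code forces the output to 0 there.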

The test bench sobel_filter_axis3_tb.cpp is shown below.

// sobel_filter_axis3_tb.cpp
// 2020/02/27 by marsee

#include <stdio.h>
#include <stdint.h>
#include <vector>
#include "hls_opencv.h"
#include <ap_int.h>
#include <hls_stream.h>
#include <ap_axi_sdata.h>

int sobel_filter_axis(hls::stream<ap_axis<32,1,1,1> >& ins, hls::stream<ap_axis<32,1,1,1> >& outs);
int sobel_filter_soft(int32_t *cam_fb, int32_t *sobel_fb,
        int32_t x_size, int32_t y_size);
int32_t square_root8_soft(int32_t val);

const char INPUT_BMP_FILE[] = "test2.jpg";
const char OUTPUT_BMP_FILE[] = "sobel.jpg";

int main(){
    hls::stream<ap_axis<32,1,1,1> > ins;
    hls::stream<ap_axis<32,1,1,1> > ins_soft;
    hls::stream<ap_axis<32,1,1,1> > outs;
    hls::stream<ap_axis<32,1,1,1> > outs_soft;
    ap_axis<32,1,1,1> pix;
    ap_axis<32,1,1,1> vals;

    // read the image file into a Mat
    cv::Mat img = cv::imread(INPUT_BMP_FILE);

    // allocate space for the pixels (one int32_t element per pixel)
    std::vector<int32_t> rd_bmp(img.cols*img.rows);
    std::vector<int32_t> hw_sobel(img.cols*img.rows);
    std::vector<int32_t> sw_sobel(img.cols*img.rows);

    // copy the image pixels into rd_bmp
    cv::Mat_<cv::Vec3b> dst_vec3b = cv::Mat_<cv::Vec3b>(img);
    for (int y=0; y<img.rows; y++){
        for (int x=0; x<img.cols; x++){
            cv::Vec3b pixel;
            pixel = dst_vec3b(y,x);
            rd_bmp[y*img.cols+x] = (pixel[0] & 0xff) | ((pixel[1] & 0xff)<<8) | ((pixel[2] & 0xff)<<16);
            // blue - pixel[0]; green - pixel[1]; red - pixel[2];
        }
    }

    // prepare the input data in ins
    for(int i=0; i<5; i++){ // dummy data
        pix.user = 0;
        pix.data = i;
        ins << pix;
    }

    for(int j=0; j < img.rows; j++){
        for(int i=0; i < img.cols; i++){
            pix.data = (ap_int<32>)rd_bmp[(j*img.cols)+i];

            if (j==0 && i==0)   // set TUSER to 1 on the first pixel
                pix.user = 1;
            else
                pix.user = 0;

            if (i == img.cols-1) // assert TLAST at the end of each line
                pix.last = 1;
            else
                pix.last = 0;

            ins << pix;
        }
    }

    sobel_filter_axis(ins, outs);   // hardware Sobel filter
    sobel_filter_soft(rd_bmp.data(), sw_sobel.data(), img.cols, img.rows);  // software Sobel filter

    // compare the hardware and software Sobel filter results
    for (int y=0; y<img.rows; y++){
        for (int x=0; x<img.cols; x++){
            outs >> vals;
            ap_int<32> val = vals.data;
            hw_sobel[y*img.cols+x] = (int32_t)val;
            if (val != sw_sobel[y*img.cols+x]){
                printf("ERROR HW and SW results mismatch x = %d, y = %d, HW = %d, SW = %d\n",
                        x, y, (int)val, sw_sobel[y*img.cols+x]);
                return(1);
            }
        }
    }
    printf("Success HW and SW results match\n");

    const int sobel_row = img.rows;
    const int sobel_cols = img.cols;
    cv::Mat wbmpf(sobel_row, sobel_cols, CV_8UC3);
    // fill wbmpf with the Sobel-filtered image
    cv::Mat_<cv::Vec3b> sob_vec3b = cv::Mat_<cv::Vec3b>(wbmpf);
    for (int y=0; y<wbmpf.rows; y++){
        for (int x=0; x<wbmpf.cols; x++){
            cv::Vec3b pixel;
            pixel = sob_vec3b(y,x);
            int32_t rgb = hw_sobel[y*wbmpf.cols+x];
            pixel[0] = (rgb & 0xff); // blue
            pixel[1] = (rgb & 0xff00) >> 8; // green
            pixel[2] = (rgb & 0xff0000) >> 16; // red
            sob_vec3b(y,x) = pixel;
        }
    }

    // write the hardware Sobel filter result to an image file
    cv::imwrite(OUTPUT_BMP_FILE, wbmpf);

    return(0);
}

#define HORIZONTAL  0
#define VERTICAL    1

int32_t sobel_fil_soft(int32_t h_or_v, int32_t x0y0, int32_t x1y0, int32_t x2y0, int32_t x0y1,
        int32_t x1y1, int32_t x2y1, int32_t x0y2, int32_t x1y2, int32_t x2y2);
int32_t conv_rgb2y_soft(int32_t rgb);

int sobel_filter_soft(int32_t *cam_fb, int32_t *sobel_fb,
    int32_t x_size, int32_t y_size){
    int32_t sobel_val, sobel_h_val, sobel_v_val;
    int32_t pix[3][3];

    for(int y=0; y<y_size; y++){
        for(int x=0; x<x_size; x++){
            for(int i=2; i>=0; --i){
                for(int j=2; j>=0; --j){
                    if(x>=2 && y>=2)
                        pix[i][j] = conv_rgb2y_soft(cam_fb[(y-i)*x_size+(x-j)]);
                    else
                        pix[i][j] = 0;
                }
            }
            sobel_h_val = sobel_fil_soft(HORIZONTAL,pix[0][0], pix[0][1], pix[0][2],
                                                    pix[1][0], pix[1][1], pix[1][2],
                                                    pix[2][0], pix[2][1], pix[2][2]);
            sobel_v_val = sobel_fil_soft(VERTICAL,  pix[0][0], pix[0][1], pix[0][2],
                                                    pix[1][0], pix[1][1], pix[1][2],
                                                    pix[2][0], pix[2][1], pix[2][2]);
            sobel_val = square_root8_soft(sobel_h_val*sobel_h_val + sobel_v_val*sobel_v_val);
            sobel_fb[y*x_size+x] = (sobel_val<<16)+(sobel_val<<8)+sobel_val;
        }
    }

    return(0);
}

// RGB to Y conversion
// RGB format: {8'd0, R(8bits), G(8bits), B(8bits)}, 1 pixel = 32 bits
// Converts to the luminance signal Y only. The formula is Y = 0.299R + 0.587G + 0.114B
// Based on "YUVフォーマット及び YUV<->RGB変換" (YUV formats and YUV<->RGB conversion). http://vision.kuee.kyoto-u.ac.jp/~hiroaki/firewire/yuv.html
// 2013/09/27 : dropped float and made everything int
int32_t conv_rgb2y_soft(int32_t rgb){
    int32_t r, g, b, y_f;
    int32_t y;

    b = rgb & 0xff;
    g = (rgb>>8) & 0xff;
    r = (rgb>>16) & 0xff;

    y_f = 77*r + 150*g + 29*b; // coefficients of y_f = 0.299*r + 0.587*g + 0.114*b, scaled by 256
    y = y_f >> 8; // divide by 256

    return(y);
}

// sobel filter
// HORIZONTAL
// x0y0 x1y0 x2y0  1  2  1
// x0y1 x1y1 x2y1  0  0  0
// x0y2 x1y2 x2y2 -1 -2 -1
// VERTICAL
// x0y0 x1y0 x2y0  1  0 -1
// x0y1 x1y1 x2y1  2  0 -2
// x0y2 x1y2 x2y2  1  0 -1
int32_t sobel_fil_soft(int32_t h_or_v, int32_t x0y0, int32_t x1y0, int32_t x2y0, int32_t x0y1,
        int32_t x1y1, int32_t x2y1, int32_t x0y2, int32_t x1y2, int32_t x2y2){
    int32_t y;

    if(h_or_v == HORIZONTAL){
        y = x0y0 + 2*x1y0 + x2y0 - x0y2 - 2*x1y2 - x2y2;
    } else {
        y = x0y0 - x2y0 + 2*x0y1 - 2*x2y1 + x0y2 - x2y2;
    }
    if(y<0)
        y = -y;
        //y = 0;
    else if(y>255)
        y = 255;
    return(y);
}

// square_root8_soft
// computes an 8-bit-wide square root
int32_t square_root8_soft(int32_t val){
    int32_t temp = 0;
    int32_t square;

    for(int i=7; i>=0; --i){
        temp += (1 << i);
        square = temp * temp;

        if(square > val){
            temp -= (1 << i);
        }
    }

    return(temp);
}

I created the sobel_filter_axis3 project in Vivado HLS 2019.2.

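February 26, 2020

Building a Sobel filter with the HLS Video Library in Vivado HLS 2019.2, part 2

This is a continuation of "Building a Sobel filter with the HLS Video Library in Vivado HLS 2019.2, part 1".

In the previous post I pasted the source code of the Sobel filter implemented with the HLS Video Library and showed the sobel_filter project in Vivado HLS 2019.2. This time I run C simulation, C synthesis, C/RTL co-simulation, and Export RTL on the sobel_filter project.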

The C simulation result is shown below.

About 7 pixels are reported as errors because they do not match the OpenCV result, but the deviations are only 1 or 2.
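When two implementations are expected to differ by small rounding errors, counting mismatches and tracking the largest deviation is often more informative than an exact-match test. The following is a minimal sketch of such a comparison. `DiffStats`, `compare_images` and `within_tolerance` are names invented here; they do not appear in the test bench of this series.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Summary of how two 8-bit images differ (hypothetical helper type).
struct DiffStats {
    std::size_t mismatches;  // number of samples that differ at all
    int max_abs_diff;        // largest absolute difference seen
};

// Compare two sample arrays element by element over their common length.
DiffStats compare_images(const std::vector<std::uint8_t>& a,
                         const std::vector<std::uint8_t>& b) {
    DiffStats s{0, 0};
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        const int d = std::abs(static_cast<int>(a[i]) - static_cast<int>(b[i]));
        if (d != 0) ++s.mismatches;
        if (d > s.max_abs_diff) s.max_abs_diff = d;
    }
    return s;
}

// True if every sample is within tol of its counterpart.
bool within_tolerance(const DiffStats& s, int tol) {
    return s.max_abs_diff <= tol;
}
```

With a tolerance of 2, a result like the one described above (a handful of pixels off by 1 or 2) would pass, while still reporting how many pixels differed.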

The sobel_filter/solution1/csim/build directory contains the original image test2.jpg, the HLS Video Library Sobel filter result test2_result.jpg, and the OpenCV Sobel filter result test2_result_cv.jpg.

The original image, test2.jpg

test2_result.jpg, the Sobel filter result from the HLS Video Library

test2_result_cv.jpg, the Sobel filter result from OpenCV

Judging by eye, test2_result.jpg and test2_result_cv.jpg look the same.

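I ran C synthesis.

The latency is 485824 clocks, so 485824 clocks / 480000 pixels is about 1.01 clocks per pixel. That looks fine.
BRAM_18K usage is only 3, and usage of the other resources is also far lower than with xfOpenCV.

The HLS Video Library appears to implement the Sobel filter with Filter2D. Its synthesis report is shown below.

The Analysis view is shown below.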

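I ran C/RTL co-simulation.

The latency was 485860 clocks.

Let's look at the C/RTL co-simulation waveform.

Here it is zoomed in.

The interval between the points where OUTPUT_STREAM_TVALID drops to 0 was 4.035 us. 800 clocks take 4 us, so this corresponds to one line.

I zoomed in on the part where OUTPUT_STREAM_TVALID drops to 0.

OUTPUT_STREAM_TVALID stays at 0 for 35 ns, which is 7 clocks.
This is excellent compared with xfOpenCV.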
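The waveform measurements above convert to clock counts as follows. This sketch assumes the 5 ns clock period stated for the sobel_filter project; `ns_to_clocks` is a helper name introduced only for this example.

```cpp
#include <cstdint>

constexpr std::int64_t kClockPeriodNs = 5;  // project clock period (5 ns)

// Convert a duration in ns to whole clock cycles (hypothetical helper).
constexpr std::int64_t ns_to_clocks(std::int64_t duration_ns) {
    return duration_ns / kClockPeriodNs;
}

constexpr std::int64_t kLinePeriodClocks = ns_to_clocks(4035);  // TVALID-low to TVALID-low
constexpr std::int64_t kStallClocks = ns_to_clocks(35);         // TVALID held at 0
constexpr std::int64_t kActiveClocks = kLinePeriodClocks - kStallClocks;

static_assert(kLinePeriodClocks == 807, "one line period in clocks");
static_assert(kStallClocks == 7, "stall per line in clocks");
static_assert(kActiveClocks == 800, "matches the 800 pixels of one line");
```

So each line costs 807 clocks for 800 pixels, meaning the output is valid for roughly 99% of the time.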

I ran Export RTL.

Resource usage is low, and CP achieved post-implementation is 3.433 ns, which is no problem.

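February 25, 2020

Building a Sobel filter with the HLS Video Library in Vivado HLS 2019.2, part 1

Up to the previous post I implemented the Sobel filter with xfOpenCV, but Vivado HLS still ships the older OpenCV scheme, even in 2019.2. That is the HLS Video Library. Documentation for the HLS Video Library is available on the Xilinx Wiki.
Let's implement a Sobel filter using hls::Sobel() from the HLS Video Library.
In fact, I have already implemented a Sobel filter with the HLS Video Library once on the "FPGAの部屋" blog. The posts are listed below.
"Trying OpenCV with Vivado HLS 2015.4, part 3 (trying the Sobel filter 1)"
"Trying OpenCV with Vivado HLS 2015.4, part 4 (trying the Sobel filter 2)"
"Trying OpenCV with Vivado HLS 2015.4, part 5 (trying the Sobel filter 3)"
"Trying OpenCV with Vivado HLS 2015.4, part 6 (trying the Sobel filter 4)"

This time, let's take the source code from "Trying OpenCV with Vivado HLS 2015.4, part 3 (trying the Sobel filter 1)" and modify it slightly.

First, sobel_filter_hvl.h is shown below.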

// sobel_filter_hvl.h
// 2020/02/25 by marsee

#ifndef __SOBEL_FILTER_HVL_H__
#define __SOBEL_FILTER_HVL_H__

#include "ap_axi_sdata.h"
#include "hls_video.h"

#define MAX_HEIGHT 600
#define MAX_WIDTH 800

typedef hls::stream<ap_axiu<32,1,1,1> > AXI_STREAM;
typedef hls::Scalar<3, unsigned char> RGB_PIXEL;
typedef hls::Mat<MAX_HEIGHT, MAX_WIDTH, HLS_8UC3> RGB_IMAGE;
typedef hls::Mat<MAX_HEIGHT, MAX_WIDTH, HLS_8UC1> GRAY_IMAGE;
#endif

The source file sobel_filter_hvl.cpp is shown below. It implements a horizontal Sobel filter.

// sobel_filter_hvl.cpp
// 2020/02/25 by marsee

// 2016/04/03 : Sobel filter with grayscale conversion

#include "sobel_filter_hvl.h"

void sobel_filter(AXI_STREAM& INPUT_STREAM, AXI_STREAM& OUTPUT_STREAM, int rows, int cols) {
#pragma HLS DATAFLOW
#pragma HLS INTERFACE ap_stable port=cols
#pragma HLS INTERFACE ap_stable port=rows
#pragma HLS INTERFACE s_axilite port=return
#pragma HLS INTERFACE axis register both port=OUTPUT_STREAM
#pragma HLS INTERFACE axis register both port=INPUT_STREAM
#pragma HLS INTERFACE s_axilite port=cols
#pragma HLS INTERFACE s_axilite port=rows

    RGB_IMAGE img_0(rows, cols);
    GRAY_IMAGE img_1g(rows, cols);
    GRAY_IMAGE img_2g(rows, cols);
    RGB_IMAGE img_3(rows, cols);

    hls::AXIvideo2Mat(INPUT_STREAM, img_0);
    hls::CvtColor<HLS_BGR2GRAY>(img_0, img_1g);
    hls::Sobel<1,0,3>(img_1g, img_2g);
    hls::CvtColor<HLS_GRAY2BGR>(img_2g, img_3);
    hls::Mat2AXIvideo(img_3, OUTPUT_STREAM);
}
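For readers unfamiliar with the template arguments, hls::Sobel<1,0,3> requests a first derivative in x, no derivative in y, and a 3x3 kernel. The following plain C++ sketch shows what that computes at a single interior pixel, assuming the usual 3x3 x-derivative Sobel kernel and saturation of the result to the 8-bit range. Both are assumptions of this sketch rather than something taken from the library source, and `sobel_x_at` is a hypothetical name.

```cpp
#include <algorithm>
#include <vector>

// 3x3 x-derivative Sobel response at interior pixel (x, y) of a grayscale
// image stored row-major with the given width. The result is saturated to
// [0, 255], which is an assumption about 8-bit output handling.
int sobel_x_at(const std::vector<int>& img, int width, int x, int y) {
    static const int kernel[3][3] = {{-1, 0, 1},
                                     {-2, 0, 2},
                                     {-1, 0, 1}};
    int sum = 0;
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            sum += kernel[r][c] * img[(y - 1 + r) * width + (x - 1 + c)];
    return std::min(255, std::max(0, sum));
}
```

On a dark-to-bright vertical edge the response is positive, while a bright-to-dark edge gives a negative sum that saturates to 0 under this assumption. Border handling is left out of the sketch entirely.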

The test bench sobel_filter_hvl_tb.cpp is shown below.

// sobel_filter_hvl_tb.cpp
// 2020/02/25 by marsee

// version that uses the OpenCV 2 Mat
// 2016/04/03 : Sobel filter with grayscale conversion

#include <iostream>
#include "hls_opencv.h"
#include "sobel_filter_hvl.h"

using namespace cv;

#define INPUT_IMAGE        "test2.jpg"
#define OUTPUT_IMAGE    "test2_result.jpg"
#define OUTPUT_IMAGE_CV    "test2_result_cv.jpg"

void sobel_filter(AXI_STREAM& INPUT_STREAM, AXI_STREAM& OUTPUT_STREAM, int rows, int cols);
void opencv_sobel_filter(Mat& src, Mat& dst);

int main (int argc, char** argv) {
    // load the image with OpenCV
    Mat src = imread(INPUT_IMAGE);
    AXI_STREAM src_axi, dst_axi;

    // convert from Mat format to AXI4-Stream
    cvMat2AXIvideo(src, src_axi);

    // call the sobel_filter() function
    sobel_filter(src_axi, dst_axi, src.rows, src.cols);

    // convert from AXI4-Stream to Mat format
    // dst must have its size and color format defined when it is declared
    Mat dst(src.rows, src.cols, CV_8UC3);
    AXIvideo2cvMat(dst_axi, dst);

    // write the Mat to a file
    imwrite(OUTPUT_IMAGE, dst);

    // call opencv_sobel_filter()
    Mat dst_cv(src.rows, src.cols, CV_8UC3);
    opencv_sobel_filter(src, dst_cv);
    imwrite(OUTPUT_IMAGE_CV, dst_cv);

    // check whether dst and dst_cv are the same image
    for (int y=0; y<src.rows; y++){
        Vec3b* dst_ptr = dst.ptr<Vec3b>(y);
        Vec3b* dst_cv_ptr = dst_cv.ptr<Vec3b>(y);
        for (int x=0; x<src.cols; x++){
            Vec3b dst_bgr = dst_ptr[x];
            Vec3b dst_cv_bgr = dst_cv_ptr[x];

            // report an error if any of the b, g, r channels differs
            if (dst_bgr[0] != dst_cv_bgr[0] || dst_bgr[1] != dst_cv_bgr[1] || dst_bgr[2] != dst_cv_bgr[2]){
                printf("x = %d, y = %d,  Error dst=%d,%d,%d dst_cv=%d,%d,%d\n", x, y,
                        dst_bgr[0], dst_bgr[1], dst_bgr[2], dst_cv_bgr[0], dst_cv_bgr[1], dst_cv_bgr[2]);
                //return 1;
            }
        }
    }
    printf("Test with 0 errors.\n");

    return 0;
}

void opencv_sobel_filter(Mat& src, Mat& dst){
    Mat gray(src.rows, src.cols, CV_8UC1);
    Mat img0g(src.rows, src.cols, CV_8UC1);

    cvtColor(src, gray, CV_BGR2GRAY);
    Sobel(gray, img0g, IPL_DEPTH_16S, 1, 0, 3);
    cvtColor(img0g, dst, CV_GRAY2BGR);

}

The sobel_filter project in Vivado HLS 2019.2 is shown below. This project targets the Ultra96 with a clock period of 5 ns.

February 24, 2020

Using xfOpenCV with Vivado HLS 2019.2, part 5 (sobel filter 3)

This is a continuation of "Using xfOpenCV with Vivado HLS 2019.2, part 4 (sobel filter 2) (how to configure Vivado HLS when using xfOpenCV)".

In the previous post I used the sobel_filter project to show how to configure Vivado HLS in the GUI when using xfOpenCV. This time I run C simulation, C synthesis, C/RTL co-simulation, and Export RTL.

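Let's start with C simulation.

The contents of the sobel_filter/solution1/csim/build directory are shown below.

Here is hls.jpg.

An edge seems to be displayed along the left side.

Here is out_ocv.jpg.

Here is out_error.jpg. As expected, the line along the left side is displayed.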

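I ran C synthesis.

239 BRAM_18K are used. Is the tool implementing the Mat memory itself?
The analysis view also shows a large number of BRAMs. Why?

When I implemented the Sobel filter with the HLS Video Library, BRAM usage was 3. Since that design uses line buffers, I would expect about that many here too, so I can only conclude that the Mat is being implemented directly as memory.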
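A rough capacity estimate helps put the two BRAM counts side by side. The sketch below counts only raw bits (one BRAM_18K holds 18 Kb) and assumes 8 bits per pixel. Real mapping depends on port widths and aspect ratios, so treat the numbers as a lower bound, and note that `bram18_needed` is a hypothetical helper.

```cpp
#include <cstdint>

constexpr std::int64_t kBram18Bits = 18 * 1024;  // raw capacity of one 18 Kb block RAM

// Capacity-only estimate of the BRAM_18K blocks needed (hypothetical helper).
constexpr std::int64_t bram18_needed(std::int64_t words, std::int64_t bits_per_word) {
    return (words * bits_per_word + kBram18Bits - 1) / kBram18Bits;  // ceiling division
}

constexpr std::int64_t kWidth = 800;
constexpr std::int64_t kHeight = 600;

// Two line buffers of 8-bit pixels fit in a single block by capacity.
static_assert(bram18_needed(2 * kWidth, 8) == 1, "line buffers");
// One full 800 x 600 frame of 8-bit pixels needs about 209 blocks.
static_assert(bram18_needed(kWidth * kHeight, 8) == 209, "full frame");
```

By this estimate a single full-frame 8-bit buffer is already in the same range as the 239 BRAM_18K reported, whereas line buffers alone need only a few blocks. That fits the suspicion that a whole Mat is being stored.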

I ran C/RTL co-simulation.
I changed Dump Trace to all.

The result is shown below.

It takes 1447426 clocks. The total pixel count is 480000, so that is about 3.02 clocks per pixel.

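The C/RTL co-simulation waveform is shown below.
The output and input phases are cleanly separated, and the output phase is about half as long as the input phase. No wonder it takes three times as long. Even though the DATAFLOW directive is present, execution appears to be serialized. In that case the buffer would also need to hold the total number of pixels.

I zoomed in on the input part. Data is transferred once every 2 clocks.

Next I zoomed in on the output part. The interval up to the point where TVALID drops to 0 was 4.02 us.
TVALID is 0 for 20 ns, so 4000 ns is the data transfer period. 4000 ns / 5 ns (clock period) = 800, which is the number of pixels in one line. Looking at the output alone, it is a reasonable output.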
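The observed rates explain the total almost exactly if the input and output phases run one after the other. The following back-of-the-envelope sketch uses the 2 clocks per pixel input rate and 1 clock per pixel output rate read off the waveform; `serialized_clocks` is a name made up for this illustration.

```cpp
#include <cstdint>

// Latency estimate when the input and output phases do not overlap
// (hypothetical helper).
constexpr std::int64_t serialized_clocks(std::int64_t pixels,
                                         std::int64_t in_clocks_per_pixel,
                                         std::int64_t out_clocks_per_pixel) {
    return pixels * (in_clocks_per_pixel + out_clocks_per_pixel);
}

constexpr std::int64_t kPixels = 800 * 600;
constexpr std::int64_t kEstimate = serialized_clocks(kPixels, 2, 1);
constexpr std::int64_t kMeasured = 1447426;  // co-simulation latency from this post

static_assert(kEstimate == 1440000, "3 clocks per pixel when serialized");
static_assert(kMeasured - kEstimate == 7426, "clocks not explained by the estimate");
```

The remaining 7426 clocks are about 0.5% of the total and could plausibly be the per-line TVALID gaps plus start-up latency.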

I ran Export RTL.
I checked Vivado synthesis, place and route and clicked the OK button.

It failed with an error saying there are not enough RAMB18 and RAMB36/FIFO. It uses far too much memory.

The error messages are shown below.

ERROR: [DRC UTLZ-1] Resource utilization: RAMB18 and RAMB36/FIFO over-utilized in Top Level Design (This design requires more RAMB18 and RAMB36/FIFO cells than are available in the target device. This design requires 470 of such cell types but only 432 compatible sites are available in the target device. Please analyze your synthesis results and constraints to ensure the design is mapped to Xilinx primitives as expected. If so, please consider targeting a larger device.)
ERROR: [DRC UTLZ-1] Resource utilization: RAMB36/FIFO over-utilized in Top Level Design (This design requires more RAMB36/FIFO cells than are available in the target device. This design requires 235 of such cell types but only 216 compatible sites are available in the target device. Please analyze your synthesis results and constraints to ensure the design is mapped to Xilinx primitives as expected. If so, please consider targeting a larger device.)
ERROR: [DRC UTLZ-1] Resource utilization: RAMB36E2 over-utilized in Top Level Design (This design requires more RAMB36E2 cells than are available in the target device. This design requires 235 of such cell types but only 216 compatible sites are available in the target device. Please analyze your synthesis results and constraints to ensure the design is mapped to Xilinx primitives as expected. If so, please consider targeting a larger device.)
INFO: [Vivado_Tcl 4-198] DRC finished with 3 Errors

At the very least, the xfOpenCV Sobel filter turns out to be unusable at this point. I think using the HLS Video Library or writing the code yourself is far better.

(Added 2020/02/29)
Using the method I was taught in "Using xfOpenCV with Vivado HLS 2019.2, part 2 (dilation 2)", I changed depth=1 to depth=16 in the HLS stream pragmas, and the latency was halved.

The HLS stream pragmas in the source file xf_sobel.cpp are shown below.

#pragma HLS stream variable=imgInput1.data dim=1 depth=16
#pragma HLS stream variable=imgOutput1.data dim=1 depth=16

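I ran C synthesis and placed the result next to the earlier one. The earlier result is on the left.

I ran C/RTL co-simulation.

It takes 968026 clocks. With 480000 pixels, that is about twice as many clocks as pixels, which is fewer than before.

The waveform is shown below. The input and output are not pipelined, so a buffer for a full frame is needed.

Export RTL failed in the same way, with the error that there are not enough RAMB18 and RAMB36/FIFO. It still uses too much memory.

The xfOpenCV Sobel filter still does not seem to be usable.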