indicates its concatenated embedding. P is the conditional probability provided by the softmax layer. In other words, the objective is to maximize the probability that each encoded flow sample is predicted as its corresponding category. The flow-level information is involved in the final softmax classifier, and is therefore used to fine-tune the packet-level encoding network during back-propagation. The main point of such a fine-tuning strategy is to separate the learning of the relationships between packets from the time-consuming pre-training procedure.
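As a minimal sketch of this fine-tuning strategy (not the released implementation), the classifier below concatenates the embeddings of a flow's first packets and applies a softmax layer, so the cross-entropy loss back-propagates into the packet-level encoder; the module names, dimensions and the assumption that the encoder returns one vector per packet are illustrative only:

    import torch
    import torch.nn as nn

    class FlowClassifier(nn.Module):
        """Illustrative flow-level head: concatenate packet embeddings -> softmax."""
        def __init__(self, packet_encoder, packet_num=5, hidden_size=768,
                     softmax_hidden=768, num_classes=12, dropout=0.5):
            super().__init__()
            self.packet_encoder = packet_encoder          # pre-trained packet-level encoder
            self.head = nn.Sequential(
                nn.Linear(packet_num * hidden_size, softmax_hidden),
                nn.ReLU(),
                nn.Dropout(dropout),
                nn.Linear(softmax_hidden, num_classes),   # logits; softmax applied inside the loss
            )

        def forward(self, packet_tokens):
            # packet_tokens: (batch, packet_num, seq_len) token ids of a flow's first packets
            b, p, s = packet_tokens.shape
            emb = self.packet_encoder(packet_tokens.view(b * p, s))   # assumed (b*p, hidden_size)
            return self.head(emb.view(b, p * emb.size(-1)))

    # Fine-tuning maximizes P(category | flow): the cross-entropy loss on these logits
    # back-propagates through the head into the packet-level encoding network.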
4.  EXPERIMENTS

4.1    Experiment Settings

4.1.1   Data sets

Unlabeled Traffic Data: This data set is used for the pre-training of our PERT encoding network. To generate it, we capture a large amount of raw traffic data from different sources and devices through a network sniffer. Typically, there is no special requirement on the unlabeled traffic data, except that the collected samples should cover as many of the mainstream protocols as possible.
ISCX Data Set: We chose the popular encrypted traffic data set "ISCX2016 VPN-nonVPN"¹ [15] to make our classification evaluations more persuasive. However, this data set only marks where its encrypted traffic was captured from and whether the capture went through a VPN session, which means that further labeling has to be performed. The ISCX data set has been used in several works, yet the reported results differ considerably even when the same model is applied [7],[8]. This is mainly due to how the raw data is processed and labeled. We only found that [7] provides its pre-processing and labeling procedures on GitHub². We therefore follow this open source project to process the raw ISCX data set and label it with 12 classes.
Android Application Data Set: We find that the ISCX data set is not entirely encrypted, as it also contains data of some unencrypted protocols such as DNS. To make a better evaluation, in this work we manually capture traffic samples from 100 Android applications using Android devices and a network sniffer tool-kit. All the captured data belongs to the most active applications on the Chinese Android app markets. Afterwards, we keep only the HTTPS flows to ensure that only encrypted data remains.
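For illustration, a simple way to perform this filtering step (not necessarily the tool-kit used here) is to keep only the packets on TCP port 443; the file names below are placeholders:

    from scapy.all import rdpcap, wrpcap, TCP

    # Keep only HTTPS traffic (TCP port 443) so that only encrypted flows remain.
    packets = rdpcap("app_capture.pcap")
    https = [p for p in packets
             if p.haslayer(TCP) and 443 in (p[TCP].sport, p[TCP].dport)]
    wrpcap("app_capture_https.pcap", https)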
4.1.2   Parameters

Pre-training: First of all, to perform the packet-level PERT pre-training on our unlabeled traffic data, we use the public Python library transformers³, which provides implementations of the original BERT model and several recently published modified models. In practice, we chose the optimized BERT implementation named A Lite BERT (ALBERT) [16], which is more efficient and less resource-consuming. However, even when properly optimized, current BERT pre-training is so costly that we use four Nvidia Tesla P100 GPU cards.

Table 1 – Pre-training parameter settings

Parameter             Value   Description
hidden_size           768     Vector size of the encoding outputs (embedding vectors).
num_hidden_layers     12      Number of encoders used in the encoding network.
num_attention_heads   12      Number of attention heads used in the multi-head attention mechanism.
intermediate_size     3072    Size of the hidden vectors in the feed-forward (FFN) networks.
input_length          128     Number of tokenized bigrams used in a single packet.

Table 1 shows the settings of our pre-training and the description of each parameter. These settings follow those commonly used in NLP works with BERT encoding. After sufficient training, we save the encoding network in PyTorch⁴ format so that it can be reused by our classification networks. All of our other networks are also implemented with PyTorch.
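As an illustrative sketch only (not the released training code), the configuration in Table 1 could be instantiated with the transformers library roughly as follows; the vocabulary size, batch shapes and file names are assumptions:

    import torch
    from transformers import AlbertConfig, AlbertForMaskedLM

    # Encoder configuration following Table 1 (vocab_size is an assumed placeholder
    # for the bigram vocabulary, which is not specified here).
    config = AlbertConfig(
        vocab_size=30522,
        hidden_size=768,
        num_hidden_layers=12,
        num_attention_heads=12,
        intermediate_size=3072,
        max_position_embeddings=128,   # input_length: tokenized bigrams per packet
    )
    model = AlbertForMaskedLM(config)

    # One illustrative masked-LM step on a dummy batch of tokenized packets
    # (in real pre-training, only the masked positions contribute to the loss).
    input_ids = torch.randint(0, config.vocab_size, (8, 128))
    outputs = model(input_ids=input_ids, labels=input_ids.clone())
    outputs.loss.backward()

    # Save the pre-trained encoder in PyTorch format for the classification stage.
    model.albert.save_pretrained("pert_encoder")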
Table 2 – Classification parameter settings

Parameter        Value                     Description
packet_num       variable (5 by default)   Number of the first packets of a flow that are chosen.
softmax_hidden   768                       Size of the hidden vectors in the softmax layer.
dropout          0.5                       Dropout rate of the softmax layer.

Classification: The encoding network used at the classification stage shares exactly the same structure as the pre-trained one. The other settings of the classification layers are shown in Table 2. As fine-tuning the encoding network in a classification task is relatively inexpensive [3], a single GPU card is sufficient.
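A minimal sketch of this classification stage under the Table 2 settings, assuming the encoder was saved as above; the pooling of the encoder output into one vector per packet and all names are illustrative:

    import torch
    import torch.nn as nn
    from transformers import AlbertModel

    device = torch.device("cuda:0")                    # a single GPU card is enough
    encoder = AlbertModel.from_pretrained("pert_encoder").to(device)

    packet_num, hidden_size = 5, 768                   # Table 2: first 5 packets of a flow
    head = nn.Sequential(                              # softmax layer settings from Table 2
        nn.Linear(packet_num * hidden_size, 768),
        nn.ReLU(),
        nn.Dropout(0.5),
        nn.Linear(768, 12),                            # e.g. the 12 ISCX classes
    ).to(device)

    def classify(packet_ids, attention_mask):
        """packet_ids: (batch, packet_num, seq_len) token ids of a flow's first packets."""
        b, p, s = packet_ids.shape
        out = encoder(input_ids=packet_ids.view(b * p, s),
                      attention_mask=attention_mask.view(b * p, s))
        flow_emb = out.pooler_output.view(b, p * hidden_size)   # concatenate packet vectors
        return head(flow_emb)                                   # logits over traffic classes

    # Fine-tuning updates the head and the encoder jointly.
    optimizer = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=2e-5)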
4.1.3   Baselines

Below are the baseline classification methods we use for comparison:
1    https://www.unb.ca/cic/datasets/vpn.htm
2    https://github.com/echowei/DeepTraffic
3    https://huggingface.co/transformers/
4    https://pytorch.org



