tuto_ecj_weka

Differences

This shows you the differences between two versions of the page.

tuto_ecj_weka [2014/03/03 16:16]
Denis Pallez [Class diagram:]
tuto_ecj_weka [2014/03/03 16:48] (current)
Denis Pallez [Conclusion]
Line 22: Line 22:
  
 Here is a short diagram showing what we will add to ECJ so that it can be used correctly with Weka, and how these additions connect to the core of ECJ.
-{{:ecj_weka_fichiers:ecj_mods_dgrm.png?direct&300 | UML Class diagram}}
+{{:ecj_weka_fichiers:ecj_mods_dgrm.png|UML Class diagram}}
-]===== Getting the results of the evolutionary algorithm =====
+]
+===== Getting the results of the evolutionary algorithm =====
  
 ECJ is driven by parameter files, which contain the parameters of the evolutionary algorithm, including the names of the classes used to perform its different parts. The only proper way to access the results of the computation is through the statistics process. I therefore created the class **ec.weka.StatisticsForWeka**, whose role is to keep the best individuals computed for each subpopulation in each job. This class is also responsible for saving this information and restoring it upon checkpointing.
Line 100: Line 101:
 This class is used to get the results of the computation and pass them to Weka (via the custom Evolve class, which is covered later in this tutorial). So if you want your algorithm to be able to communicate its results to Weka, your statistics class must derive from this one.
  
+<note important>
 You probably noticed that two variables store the best individuals. This is intentional and necessary. The static field is not written during serialization, so the information it holds is lost when restoring from a checkpoint. On the other hand, the private field is reset between two jobs, because a new statistics object is built for each job, so its information is lost from one job to the next. That is why we need both.
  
Line 105: Line 107:
  
 We could not put this information in the Evolve class, since it is not part of the data serialized when checkpointing. That is why I chose to put it in the statistics class.
+</note>
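The reasoning in the note above can be checked with plain Java serialization, without ECJ. The following is only a minimal sketch: `BestHolderSketch`, its fields, and the `resync` step are hypothetical stand-ins and are not taken from **ec.weka.StatisticsForWeka**. Setting the static field to `null` before restoring is an assumption used to simulate a restart in a fresh JVM.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a statistics object; not the tutorial's real class.
class BestHolderSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    // Shared by all instances in the JVM, so it survives from one job to the next,
    // but Java serialization never writes static fields.
    static String sharedBest = null;

    // Written by serialization, so it survives a checkpoint,
    // but every new instance (one per job) starts with it empty.
    private String localBest = null;

    void record(String best) {
        localBest = best;
        sharedBest = best;
    }

    String localBest() {
        return localBest;
    }

    // Assumed restore step: copy the serialized value back into the static field.
    void resync() {
        sharedBest = localBest;
    }

    // Serialize then deserialize in memory, as a checkpoint and restore would do on disk.
    static BestHolderSketch roundTrip(BestHolderSketch in) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(in);
            }
            try (ObjectInputStream src =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (BestHolderSketch) src.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("in-memory round trip failed", e);
        }
    }

    public static void main(String[] args) {
        BestHolderSketch job1 = new BestHolderSketch();
        job1.record("individual-A");

        sharedBest = null; // simulate a fresh JVM before restoring the checkpoint
        BestHolderSketch restored = roundTrip(job1);
        System.out.println("instance field after restore: " + restored.localBest());
        System.out.println("static field after restore:   " + sharedBest);

        restored.resync();
        BestHolderSketch job2 = new BestHolderSketch(); // a new job builds a new object
        System.out.println("instance field in new job:    " + job2.localBest());
        System.out.println("static field in new job:      " + sharedBest);
    }
}
```

Each field covers the case where the other one loses its value, which is the reason for keeping both.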
 ===== Running the algorithm and getting the results =====
  
Line 116: Line 118:
 The changes introduced are mainly initializations of the variables used to store the results in the different cases.
  
-Since the class is quite long, I won't print it here but you can find it in [[http://pighetti.atlasiens.fr/tutorials/using-ecj-in-weka/src/ec/weka/Evolve.java|src/ec/weka/Evolve.java]]
+Since the class is quite long, I won't print it here but you can find it in {{:ecj_weka_fichiers:evolve.java|src/ec/weka/Evolve.java}}
  
 The only changes are in the run and main methods, plus a new field of type **Instance[]** whose usage is explained in the next section. You can skip the rest of the file.
Line 389: Line 391:
 See Weka's documentation for a full explanation of how to handle options and make them available in the GUI.
  
-Now your options are fully available. I built a complete example at [[http://pighetti.atlasiens.fr/tutorials/using-ecj-in-weka/src/weka/clusterers/EcjBasedClusterer.java|weka.clusterers.EcjBasedClusterer]]
+Now your options are fully available. I built a complete example at {{:ecj_weka_fichiers:ecjbasedclusterer.java|weka.clusterers.EcjBasedClusterer.java}}
  
 ===== Calling the ECJ algorithm: =====
Line 418: Line 420:
 There you are: the algorithm runs with the parameter file you specified in the Weka options of your clusterer, and you get the results back at the end of the computation.
  
+<note important>
 The parameters array given to the run method must contain the same thing as if you were calling **Evolve** from the command line. Don't forget that when calling from the command line, the first argument in the array is the name of the command typed. Here I left it blank because it is never read. Since this first element is discarded or ignored, I prefer to put one empty string at the beginning of the array and specify the options in the rest of the array.
+</note>
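As an illustration of the array layout described in the note, here is a minimal sketch in plain Java. `EvolveArgsSketch` and `forParamsFile` are hypothetical names, not part of ECJ or of this tutorial's sources, and the file name `kmeans.params` is made up. The `-file` and `-p` flags are the usual ECJ command-line options; how the custom **Evolve** consumes the array is assumed to match the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; not part of ECJ or of the tutorial's sources.
class EvolveArgsSketch {

    // Builds an argument array laid out as described in the note:
    // an empty placeholder first, then the options as on the command line.
    static String[] forParamsFile(String paramsFile, String... overrides) {
        List<String> args = new ArrayList<>();
        args.add("");         // placeholder for the command name, never read
        args.add("-file");    // ECJ option naming the parameter file
        args.add(paramsFile);
        for (String override : overrides) {
            args.add("-p");   // ECJ option overriding one parameter, as name=value
            args.add(override);
        }
        return args.toArray(new String[0]);
    }

    public static void main(String[] unused) {
        String[] args = forParamsFile("kmeans.params", "generations=50");
        System.out.println(Arrays.toString(args));
    }
}
```

Keeping the placeholder at index 0 means the options stay at the positions the custom **Evolve** expects.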
  
 If you want to load from a checkpoint, you don't need to set the learning data. As mentioned, the data given by Weka to ECJ are stored in a way that allows them to be saved when checkpointing. So when resuming, the previously saved data are restored and used to complete the evolution process. If new data are given, they are ignored. Here is a bit of code showing how to launch an algorithm from a checkpoint file:
Line 433: Line 437:
 Now you should be able to build a simple clusterer for Weka using an ECJ algorithm. With practice in these two tools, you should then be able to build more complicated things. The class **weka.clusterers.EcjBasedClusterer** shows an example of a clusterer taking two parameters: one string for an ECJ parameter file and another string for an ECJ checkpoint file. The latter overrides the former if both are given. The clusterer does nothing in its buildClassifier method and always returns 0 for the cluster. It is useless as a clusterer; the goal is just to show the overall construction of a clusterer with some options.
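The precedence rule between the two options could be sketched as follows. This is not the code of the example clusterer: `LaunchChoiceSketch`, `chooseArgs`, and the file names are hypothetical, and only the rule "the checkpoint file wins when both are given" comes from the text. `-checkpoint` and `-file` are the usual ECJ command-line options.

```java
import java.util.Arrays;

// Hypothetical helper; it illustrates the precedence rule only.
class LaunchChoiceSketch {

    private static boolean isSet(String value) {
        return value != null && !value.isEmpty();
    }

    // Returns the argument array for the Evolve run, with an empty placeholder
    // in first position. A checkpoint file overrides a parameter file.
    static String[] chooseArgs(String paramsFile, String checkpointFile) {
        if (isSet(checkpointFile)) {
            return new String[] {"", "-checkpoint", checkpointFile};
        }
        if (isSet(paramsFile)) {
            return new String[] {"", "-file", paramsFile};
        }
        throw new IllegalArgumentException("neither a parameter file nor a checkpoint file was given");
    }

    public static void main(String[] unused) {
        // Both given: the checkpoint is used and the parameter file is ignored.
        System.out.println(Arrays.toString(chooseArgs("my.params", "saved.checkpoint.gz")));
        // Only the parameter file is given: a fresh run is started.
        System.out.println(Arrays.toString(chooseArgs("my.params", null)));
    }
}
```

Failing fast when neither option is set is a design choice of this sketch; the example clusterer may handle that case differently.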
  
-This class is available at: [[http://pighetti.atlasiens.fr/tutorials/using-ecj-in-weka/src/weka/clusterers/EcjBasedClusterer.java|src/weka/clusterers/EcjBasedClusterer.java]].
+This class is available at: {{:ecj_weka_fichiers:ecjbasedclusterer.java|src/weka/clusterers/EcjBasedClusterer.java}}.
  
 Be aware that although this example only shows how to build a clusterer, building a classifier or an associator involves much the same work. You just need to change the base class used in Weka and write an algorithm for classification instead of clustering.
Line 439: Line 443:
 In conclusion, the capabilities of this interaction between ECJ and Weka depend only on the algorithm you write with ECJ and on how you use its results in Weka.
  
-Rédigé par Romaric pighetti. Janvier 2012.
+<note>Written by Romaric Pighetti in 2012/01.</note>
tuto_ecj_weka.1393859799.txt.gz · Last modified: 2014/03/03 16:16 by Denis Pallez