tuto_ecj_weka [2014/03/03 16:03]
Denis Pallez [How to build weka classifiers using ecj algorithms]
tuto_ecj_weka [2014/03/03 16:48] (current)
Denis Pallez [Conclusion]
====== How to build Weka classifiers using the ECJ library? ======
  
A short version is available [[tuto_ecj_weka_easier|here]].
  
If you have any questions, please do not hesitate to ask. You can contact me at romaric DOT pighetti AT gmail DOT com.
  
===== Goal and introduction =====

===== Class diagram: =====
  
Here is a short diagram showing what we add to ECJ in order to use it with Weka, and how these additions connect to the core of ECJ.
{{:ecj_weka_fichiers:ecj_mods_dgrm.png|UML class diagram}}

===== Getting the results of the evolutionary algorithm =====
  
ECJ is driven by parameter files, which contain the parameters of the evolutionary algorithm, including the names of the classes used to perform its different parts. The only clean way to access the results of the computation is through the statistics process. I therefore created the class **ec.weka.StatisticsForWeka**, whose role is to keep the best individuals computed for each subpopulation in each job. This class is also responsible for saving this information when checkpointing and restoring it when resuming.
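
To make this bookkeeping concrete, here is a minimal pure-Java sketch that does not depend on ECJ. It only mimics the idea of keeping the best individual of each subpopulation for each job. The class `BestPerSubpopulation`, its `offer` method and the representation of an individual as a label with a `double` fitness (higher is better) are hypothetical choices made for illustration. This is not the actual StatisticsForWeka code.

<code java>
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of "best individual per subpopulation, per job" (not ECJ code). */
class BestPerSubpopulation {

    /** A candidate solution: a label and its fitness (higher is better, by assumption). */
    static final class Candidate {
        final String label;
        final double fitness;

        Candidate(String label, double fitness) {
            this.label = label;
            this.fitness = fitness;
        }
    }

    private final int numSubpops;
    // bestByJob.get(job)[subpop] is the best candidate seen so far for that job and subpopulation.
    private final List<Candidate[]> bestByJob = new ArrayList<>();

    BestPerSubpopulation(int numSubpops) {
        this.numSubpops = numSubpops;
    }

    /** Records a candidate if it beats the current best of its job and subpopulation. */
    void offer(int job, int subpop, Candidate c) {
        while (bestByJob.size() <= job) {
            bestByJob.add(new Candidate[numSubpops]); // lazily create slots for new jobs
        }
        Candidate[] best = bestByJob.get(job);
        if (best[subpop] == null || c.fitness > best[subpop].fitness) {
            best[subpop] = c;
        }
    }

    /** Returns the best candidate for a job and subpopulation, or null if none was offered. */
    Candidate best(int job, int subpop) {
        return bestByJob.get(job)[subpop];
    }
}
</code>

Growing the list lazily lets jobs be numbered from 0 without knowing in advance how many will run.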

ec.weka.StatisticsForWeka:

<code java>
package ec.weka;

// ...
</code>
This class collects the results of the computation and passes them to Weka (via the custom Evolve class, which is studied later in this tutorial). So if you want your algorithm to communicate its results to Weka, your statistics class needs to derive from this one.
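
The pattern of deriving from this class can be sketched in plain Java. The base class `BaseWekaStatistics` and its `postEvaluation` hook are hypothetical stand-ins. In ECJ itself, the hooks are methods of `ec.Statistics`, such as `postEvaluationStatistics(EvolutionState)`. The point is only that an overriding method should call `super`, so that the recording done by the base class still happens.

<code java>
import java.util.Arrays;

/** Hypothetical stand-in for a Weka-aware statistics base class (not ECJ code). */
class BaseWekaStatistics {
    // Best fitness seen so far in each subpopulation (higher is better, by assumption).
    protected final double[] bestFitness;

    BaseWekaStatistics(int numSubpops) {
        bestFitness = new double[numSubpops];
        Arrays.fill(bestFitness, Double.NEGATIVE_INFINITY);
    }

    /** Hook called after each evaluation with fitnesses[subpop][individual]. */
    void postEvaluation(double[][] fitnesses) {
        for (int s = 0; s < fitnesses.length; s++) {
            for (double f : fitnesses[s]) {
                if (f > bestFitness[s]) {
                    bestFitness[s] = f;
                }
            }
        }
    }
}

/** A user statistics class: adds its own behaviour, then defers to the base class. */
class MyStatistics extends BaseWekaStatistics {
    int generations = 0;

    MyStatistics(int numSubpops) {
        super(numSubpops);
    }

    @Override
    void postEvaluation(double[][] fitnesses) {
        super.postEvaluation(fitnesses); // keep the bookkeeping that feeds the results back
        generations++;                   // custom behaviour of this subclass
    }
}
</code>

Omitting the `super` call would still compile, but the base class would then have nothing to hand back, so it is worth checking for it when subclassing.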
  
<note important>
You probably noticed that there are two variables storing the best individuals. This is intentional and necessary. The static field is not stored upon serialization, so the information it holds is lost when restoring from a checkpoint. On the other hand, the private field is reset between two jobs, because a new statistics object is built for each job, so its information would be lost from one job to the next. That is why both are needed: the static field carries the results across jobs, and the private field carries them across checkpoints.

The Evolve class could not hold this information either, since it is not part of the data serialized when checkpointing. That is why I chose to put it here.
</note>
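
The reasoning in this note can be checked with plain Java serialization. The sketch below uses a hypothetical `ResultsHolder` class (not part of ECJ) with one static field and one instance field. It round-trips an object through an in-memory `ObjectOutputStream`/`ObjectInputStream`, as a checkpoint would. How StatisticsForWeka actually synchronizes its two fields is not shown in this excerpt, so the `restoreStatic` step is an assumption.

<code java>
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical illustration of why both a static and an instance field are kept. */
class ResultsHolder implements Serializable {
    private static final long serialVersionUID = 1L;

    // Survives from one job to the next in the same JVM, but is NOT written to a checkpoint.
    static String lastBest;

    // Written to the checkpoint, but empty in every newly created object (one per job).
    private String best;

    void record(String individual) {
        best = individual;
        lastBest = individual; // keep both copies in sync
    }

    /** After restoring from a checkpoint, copy the serialized value back into the static field. */
    void restoreStatic() {
        if (best != null) {
            lastBest = best;
        }
    }

    String best() {
        return best;
    }

    /** Round-trips an object through Java serialization, as a checkpoint would. */
    static ResultsHolder checkpointAndRestore(ResultsHolder h) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(h);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ResultsHolder) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("checkpoint round-trip failed", e);
        }
    }
}
</code>

Setting `lastBest` to null before restoring simulates a fresh JVM. The restored object still carries `best`, and the static value only reappears once it is copied back.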
===== Running the algorithm and getting the results =====

The changes introduced are mainly initializations of the variables used to store the results in different cases.
  
Since the class is quite long, I won't print it here, but you can find it in {{:ecj_weka_fichiers:evolve.java|src/ec/weka/Evolve.java}}.
  
The only changes are in the run and main methods, plus a new field of type **Instance[]** whose usage is explained in the next section. You can skip the rest of the file.

ec.weka.EvolutionStateForWeka:

<code java>
package ec.weka;

// ...
</code>

ECJBasedClusterer.java, first step:

<code java>
package weka.clusterers;

// ...
</code>

Path to the ECJ parameters file:

<code java>
private String parameterFilePath;
</code>

These three methods in our case:

<code java>
/**
 * Returns an enumeration describing the available options.
 * ...
</code>

Setter:

<code java>
public void setBlabla(TypeOfTheField t)
{
    // ...
}
</code>

Getter:

<code java>
public TypeOfTheField getBlabla()
{
    // ...
}
</code>
See Weka's documentation for a full explanation of how to handle options and make them available in the GUI.
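
As an illustration of the setter/getter and option-handling pattern, here is a self-contained sketch for a clusterer with the two string options used in this tutorial: a parameters file and a checkpoint file. It does not depend on Weka. The flag names `-P` and `-C`, the class `ClustererOptionsSketch` and its `getOption` helper are assumptions. The helper is a simplified stand-in for `weka.core.Utils.getOption`. A real clusterer would implement `weka.core.OptionHandler`, including `listOptions()`, which is omitted here.

<code java>
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: bean-style properties plus setOptions/getOptions for two string options. */
class ClustererOptionsSketch {
    private String parameterFilePath = "";
    private String checkpointFilePath = "";

    // Setter/getter pairs following the setBlabla/getBlabla pattern above.
    public void setParameterFilePath(String p) { parameterFilePath = p; }
    public String getParameterFilePath() { return parameterFilePath; }
    public void setCheckpointFilePath(String c) { checkpointFilePath = c; }
    public String getCheckpointFilePath() { return checkpointFilePath; }

    /** Parses "-P value" and "-C value" pairs; a missing option defaults to "". */
    public void setOptions(String[] options) {
        setParameterFilePath(getOption("P", options));
        setCheckpointFilePath(getOption("C", options));
    }

    /** Returns the current configuration as an options array (inverse of setOptions). */
    public String[] getOptions() {
        List<String> result = new ArrayList<>();
        if (!parameterFilePath.isEmpty()) {
            result.add("-P");
            result.add(parameterFilePath);
        }
        if (!checkpointFilePath.isEmpty()) {
            result.add("-C");
            result.add(checkpointFilePath);
        }
        return result.toArray(new String[0]);
    }

    /** Simplified stand-in for weka.core.Utils.getOption: value after "-flag", or "" if absent. */
    static String getOption(String flag, String[] options) {
        for (int i = 0; i < options.length - 1; i++) {
            if (("-" + flag).equals(options[i])) {
                String value = options[i + 1];
                options[i] = "";     // mark the flag as consumed
                options[i + 1] = ""; // and its value too
                return value;
            }
        }
        return "";
    }
}
</code>

Keeping `getOptions()` the inverse of `setOptions()` lets a configuration be printed as a command line and parsed back to the same state.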
  
Your options are now fully available. I built a complete example at {{:ecj_weka_fichiers:ecjbasedclusterer.java|weka.clusterers.EcjBasedClusterer.java}}.
  
===== Calling the ECJ algorithm: =====

setLearningDataSet:

<code java>
Instance[] instances = new Instance[numberOfInstance];
/*
 * ...
</code>

Calling the run method with a parameters file:

<code java>
String[] parameters = new String[3];
parameters[1] = "-file";
// ...
</code>
There you are: the algorithm runs with the parameters file you specified in your clusterer's Weka options, and you get the results back at the end of the computation.
  
<note important>
The parameters array given to the run method must contain the same arguments as when calling **Evolve** from the command line. By analogy with command lines, where the first argument is the name of the typed command, I leave the first element of the array as an empty string. It is never read, so it could also be dropped, but I prefer keeping one empty string at the beginning of the array and putting the options in the rest of it.
</note>
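
The parameters-file launch above and the checkpoint launch in the next snippet differ only in the option name, so the array can be built in one place. The helper below is hypothetical and not part of the tutorial's source. It keeps the empty first element described in this note. It also applies the rule, stated in the conclusion, that a checkpoint file overrides a parameters file when both are given. The resulting array would then be passed to the run method as in the surrounding snippets.

<code java>
/** Hypothetical helper: builds the argument array passed to Evolve's run method. */
class EvolveArgs {
    /**
     * Element 0 is an empty placeholder that is never read, followed by either
     * "-checkpoint <file>" or "-file <file>". A non-empty checkpoint file wins.
     */
    static String[] build(String parameterFile, String checkpointFile) {
        if (checkpointFile != null && !checkpointFile.isEmpty()) {
            return new String[] {"", "-checkpoint", checkpointFile};
        }
        if (parameterFile != null && !parameterFile.isEmpty()) {
            return new String[] {"", "-file", parameterFile};
        }
        throw new IllegalArgumentException("either a parameters file or a checkpoint file is required");
    }
}
</code>

Failing fast when neither file is set gives a clear error on the Weka side instead of an obscure one from ECJ.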
  
If you want to resume from a checkpoint, you don't need to set the learning data. As mentioned earlier, the data given by Weka to ECJ are stored in a way that allows them to be saved when checkpointing. So when resuming, the previously saved data are restored and used to complete the evolution process; if new data are given, they are ignored. Here is a bit of code showing how to launch an algorithm from a checkpoint file:
  
<code java>
String[] parameters = new String[3];
parameters[1] = "-checkpoint";
// ...
</code>
Now you should be able to build a simple Weka clusterer using an ECJ algorithm. With some practice of these two libraries, you should then be able to build more complex things. The class **weka.clusterers.ECJBasedClusterer** is an example of a clusterer taking two parameters: a string for an ECJ parameters file and another string for an ECJ checkpoint file. The latter overrides the former if both are given. The clusterer does nothing in its buildClusterer method and always returns 0 as the cluster. It is useless as a clusterer; the goal is only to show the overall construction of a clusterer with some options.
  
This class is available at: {{:ecj_weka_fichiers:ecjbasedclusterer.java|src/weka/clusterers/EcjBasedClusterer.java}}.
  
Although this example only shows how to build a clusterer, building a classifier or an associator works in much the same way: you just need to change the Weka base class and write an ECJ algorithm for classification instead of clustering.

In conclusion, what this interaction between ECJ and Weka can achieve depends only on the algorithm you write with ECJ and on how you use its results in Weka.
  
<note>Written by Romaric Pighetti in January 2012.</note>
tuto_ecj_weka.1393858982.txt.gz · Last modified: 2014/03/03 16:03 by Denis Pallez