Quick Takes: python(x,y) – Python for Scientists

 

Python(x,y) is free scientific and engineering development software for numerical computation, data analysis, and data visualization, based on the Python programming language, Qt graphical user interfaces (and development framework), and the Eclipse integrated development environment.

Although I would say I am conversant in Python and can see why a lot of people like it, it is not necessary for any of my job functions. In fact, I recently converted the only Python program used at work over to PowerShell. It was a trivial program that has been written a million times in a multitude of scripting languages. In this case it had a bug, so it was a fairly simple exercise to convert it over to Microsoft’s favorite scripting language.

Scarcely could I imagine that I would be seriously playing with Python just a couple of weeks later. The trigger was a blog post on SQLServerCentral.com called Python for the SQL Server DBA. I was intrigued when the author said he used Python(x,y). I had not heard of it, so I checked out the web site, python(x,y) – Python for Scientists, and decided to convert an Excel spreadsheet graph over to Python. The graph is a fairly standard multiple-line plot of time data, the type of graph you can create in Excel in about five minutes.

It took a lot longer to create the graph in Python, but I am not disappointed. Much of my time was spent learning how to manipulate Matplotlib to achieve the desired graph. Matplotlib is a library for making 2D plots of arrays in Python and looks a lot like MATLAB™. Since my knowledge of MATLAB was nil, I had a lot of catching up to do. The flexibility of Matplotlib to customize a graph reminded me a lot of SAS/GRAPH. That is both the good and the bad news. Although Excel has a lot of graphing options and I recommend it for most graphing requests, there is always some option it does not do quite right. Matplotlib overcomes those problems with lots of customization options and can be used to create some pretty exotic graphs. The bad news is that there is a significant learning curve in understanding how to use those options.
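
To give a flavor of what the Matplotlib version involves, here is a minimal sketch of a multiple-line plot of time data. The data and labels are made up for illustration; this is not the spreadsheet I converted.

import datetime
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Made-up sample data: two series measured on the same dates
dates = [datetime.date(2008, 1, d) for d in range(1, 11)]
series_a = [3, 4, 4, 5, 6, 6, 7, 8, 8, 9]
series_b = [9, 8, 8, 7, 6, 6, 5, 4, 4, 3]

fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(dates, series_a, label='Series A')
ax.plot(dates, series_b, label='Series B')
ax.xaxis.set_major_formatter(mdates.DateFormatter('%m/%d'))  # short date labels
ax.set_xlabel('Date')
ax.set_ylabel('Value')
ax.set_title('Multiple-line plot of time data')
ax.legend(loc='best')
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
plt.show()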

Almost all of my development for this simple graph program was done in IPython, although full-fledged environments like Eclipse and Spyder were available. In hindsight I would probably prefer Spyder for my next program. Most of my work is not very sophisticated, and the lightweight Spyder IDE appealed to me more than Eclipse, which is still relatively slow to start up. When I look at the whole python(x,y) download, its greatest contribution is the breadth of the products included. You can start your work from the command line for simple programs, like I did, and progress all the way up to a fairly comprehensive graphical user interface using Qt and Eclipse for sophisticated programs. Python development has come a long way.

System.Management.Automation.dll missing?

Recently I have been playing around with Windows PowerShell. I had this desire to synchronize the date-modified field between identical files in two directories. A while back I had created a repository with copied code, and during the copy the date modified had been set on all of the files to the current date. Since I am working on “other people’s code”, the date the code was last modified is a helpful clue in troubleshooting. Now I wanted the repository to show the correct “old” date, and this looked like a good way to write my first PowerShell script. The script objective is pretty simple: for every file in my source directory, update the date-modified field on the matching file in my target directory with the source’s date modified, provided the target’s date modified is earlier than my cutoff date. The cutoff date is the date I created the repository, so if a file has not been changed since the cutoff date I want to set it back to the original date modified. After a few fumbles I got it to work. Now I can change the date modified back to the original value for the unchanged files in the directory.
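
For anyone curious what that first script roughly looked like, here is a minimal sketch of the idea. The paths and the cutoff date are placeholders, not my actual directories, and it only handles a single flat directory.

# Copy LastWriteTime from source to target for files unchanged since the cutoff
$source = "C:\work\original"      # placeholder path
$target = "C:\work\repository"    # placeholder path
$cutoff = Get-Date "2008-06-01"   # placeholder cutoff date

Get-ChildItem $source | Where-Object { -not $_.PSIsContainer } | ForEach-Object {
    $candidate = Join-Path $target $_.Name
    if (Test-Path $candidate) {
        $targetFile = Get-Item $candidate
        if ($targetFile.LastWriteTime -lt $cutoff) {
            $targetFile.LastWriteTime = $_.LastWriteTime
        }
    }
}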

Then I started thinking about comparing directories and MD5 file hashes. I knew my source and target directories were good matches, since WinMerge told me so, but I was curious whether PowerShell natively supported MD5 hashes. After a little searching I found Bart’s post about creating a file hasher cmdlet. This was interesting and looked like a very short task, so I tried to create my first cmdlet. The instructions were simple but I fumbled over a lot of minor issues.

  1. You need to set the execution policy. I set mine to RemoteSigned.
  2. You need to compile the cmdlet.
  3. You need to create/modify your profile.ps1 so you can use the cmdlet every time you start PowerShell.

The hardest of these tasks was the second one. Trying to find System.Management.Automation.dll took me on a wild goose chase. I knew it was probably on my machine but I could not find it. When I gave up looking for it and tried to download the Windows SDK 2008, the download barfed on me. Finally I found Raj’s post about viewing the GAC. This confirmed that System.Management.Automation.dll was in the GAC on my machine. To make things very simple I copied the file to my PowerShell default directory and compiled the cmdlet. Later I found this recommendation by Oisin Grehan in a Vista forum, in which he says that since the assembly is in the GAC, the compiler will find it without any fancy path statements.

csc.exe cmdlet.cs … /r:System.Management.Automation.dll

Re: System.Management.Automation.dll missing? – Vista Forums

I tried it and it did not work. I was able to compile using a reference to the actual GAC location. So if we put this all together we get something like the following, executed inside PowerShell.

$ref = "$env:windir\assembly\GAC_MSIL\System.Management.Automation\1.0.0.0__31bf3856ad364e35\System.Management.Automation.dll"
$compiler = "$env:windir/Microsoft.NET/Framework/v2.0.50727/csc"
&$compiler /target:library /r:$ref hashcmdlet.cs

I had to register the snap-in outside of PowerShell with:
%windir%\Microsoft.NET\Framework\v2.0.50727\installutil -i hashcmdlet.dll

Finally I had to create a profile.ps1 in the PowerShell directory to load the snap-in and extend the type system every time PowerShell starts. This post was a great help. This is so Unix it makes me chuckle! I have not figured out what I am going to do with this newfound knowledge, but I learned a lot about creating custom cmdlets and it was fun!
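
For reference, the profile.ps1 ends up looking something like the sketch below. The snap-in name and the types file are placeholders; use whatever names the hasher project actually registers on your machine.

# profile.ps1 – runs at every PowerShell start-up
# Placeholder names: substitute the snap-in and .ps1xml your installer registered
Add-PSSnapin HashCmdletSnapIn
Update-TypeData -AppendPath (Join-Path (Split-Path $profile) "hashcmdlet.types.ps1xml")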


 

Frequently Forgotten Fundamental Facts about Software Engineering

Here are some jewels about software engineering that are worth reviewing. I recommend the rest of the article. It’s an oldie but a goodie!

Recently I have been doing a lot of work deciphering “other people’s code”. I would say the largest part of the work has been in error detection and removal. Even a seemingly comprehensive test plan does not make me terribly confident. I have been humbled by logic problems and faulty design requirements too many times.

Reliability

RE1. Error detection and removal accounts for roughly 40 percent of development costs. Thus it is the most important phase of the development life cycle.

RE2. There are certain kinds of software errors that most programmers make frequently. These include off-by-one indexing, definition or reference inconsistency, and omitting deep design details. That is why, for example, N-version programming, which attempts to create multiple diverse solutions through multiple programmers, can never completely achieve its promise.

RE3. Software that a typical programmer believes to be thoroughly tested has often had only about 55 to 60 percent of its logic paths executed. Automated support, such as coverage analyzers, can raise that to roughly 85 to 90 percent. Testing at the 100-percent level is nearly impossible.

RE4. Even if 100-percent test coverage (see RE3) were possible, that criterion would be insufficient for testing. Roughly 35 percent of software defects emerge from missing logic paths, and another 40 percent are from the execution of a unique combination of logic paths. They will not be caught by 100-percent coverage (100-percent coverage can, therefore, potentially detect only about 25 percent of the errors!).

RE5. There is no single best approach to software error removal. A combination of several approaches, such as inspections and several kinds of testing and fault tolerance, is necessary.

RE6. (corollary to RE5) Software will always contain residual defects, after even the most rigorous error removal. The goal is to minimize the number and especially the severity of those defects.

Frequently Forgotten Fundamental Facts about Software Engineering

Re-engineering an application

Recently I found myself trying to debug an active server page application. It appears to be a simple application: when you go to the page, the server generates a text file which I call a data feed. The feed is used by search engines to build links to your products. The final step in the process is to download the data feed file and then upload it to the search engine site. This is such a simple application that you could have programmed it in a variety of languages without much effort or concern. The original developer chose to develop the application as an active server page. ASP would not have been my first choice, primarily because programming it in SQL is a much simpler solution. In SQL the solution is so simple and straightforward that it approaches the holy grail of computer programming: self-documenting code.

I got involved with re-engineering the ASP application because it was not working anymore. The page was not displaying and there were no error messages. By definition, applications are no longer simple if they fail and do not produce an easy-to-understand error message. I suspected that the error might be related to a “response buffer limit exceeded” issue, so I increased the buffer limit. This worked on the development system but had no effect on the production system. That is not good! Now I was going down a path I did not want to go down, fiddling with IIS parameters on a production system trying to fix a problem. Since I am definitely “old school” and evidently SQL-centric, I decided to turn this into a batch operation and skip the human download/upload process altogether. My plan was to schedule a SQL job to download the data feeds into files using SQL and then use FTP to upload the feeds to their respective search engine sites.

I originally thought I would have this task finished in a day or two. Boy was I wrong! The combination of ASP, XML, XSL, and a SQL stored procedure spread the processing across various places and made it difficult to follow. Of course there wasn’t any program documentation and the original programmer was unavailable. My plan was to combine everything into a SQL view that either BCP or OSQL would use to create a tab-delimited file. Using BCP I can use the ultra-simple "SELECT *" query on the view.

The first big problem was to create the category field. I needed to recursively look up the category parent from a table of categories, a process that was originally performed in ASP. After some effort I created a SQL table to mimic the process.
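
The general shape of that lookup, done in SQL rather than ASP, is a recursive common table expression. This is only a sketch: the Categories table and its columns are hypothetical, and it assumes SQL Server 2005 or later.

-- Build the full category path for every category (hypothetical table and columns)
;WITH CategoryPath (CategoryID, FullPath) AS
(
    SELECT CategoryID, CAST(Name AS varchar(500))
    FROM dbo.Categories
    WHERE ParentID IS NULL                        -- top-level categories
    UNION ALL
    SELECT c.CategoryID, CAST(p.FullPath + ' > ' + c.Name AS varchar(500))
    FROM dbo.Categories c
    JOIN CategoryPath p ON c.ParentID = p.CategoryID
)
SELECT CategoryID, FullPath
FROM CategoryPath;

Selecting the result into a permanent table, as I ended up doing, gives the feed view something simple to join against.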

The next problems came in rapid succession. The description field needed the HTML tags removed, and some HTML entities needed to be escaped. Then I found that some products were listed in multiple categories and that the category being used by my view was a defunct one.
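
Stripping the tags inside SQL can be done with a small scalar function built on CHARINDEX and STUFF. Again this is just a sketch under the assumption of SQL Server 2005 or later; the function name is mine, and the entity handling is left as a comment.

-- Remove anything that looks like an HTML tag from a description
CREATE FUNCTION dbo.ufn_StripHtml (@text varchar(max))
RETURNS varchar(max)
AS
BEGIN
    DECLARE @start int, @end int
    SET @start = CHARINDEX('<', @text)
    WHILE @start > 0
    BEGIN
        SET @end = CHARINDEX('>', @text, @start)
        IF @end = 0 BREAK
        SET @text = STUFF(@text, @start, @end - @start + 1, '')
        SET @start = CHARINDEX('<', @text)
    END
    -- entity fix-ups (&amp;, &nbsp;, and so on) would go here as REPLACE calls
    RETURN @text
END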

One of the nice benefits of the “SQL view” approach was that it was easy to test and verify, and I had a backup plan if the batch process failed for some reason. Although I briefly tried OSQL, I found that BCP had a more direct way of creating tab-delimited files. Since it only takes a minute and a half to create the four feeds, processing requirements are not an issue. Once I had copied the headers to the front of the file I was good to go. I matched the data using WinMerge on the development system, since the ASP screen still worked on it.
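
The BCP step boils down to a one-line queryout call plus gluing the header row on. The database, view, server, and file names below are placeholders, not the real feed.

rem Export the view in character mode (tab-delimited by default), then prepend the header row
bcp "SELECT * FROM FeedDB.dbo.vwDataFeed" queryout datafeed.txt -c -T -S MYSERVER
copy /b headers.txt + datafeed.txt feed_with_headers.txt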

The data matched and now I am ready to submit the files. This minor re-engineering took a lot more time than I planned, but I think the process is now very easy to explain.

The next problems were more annoying. There were permission problems with running BCP, and Yahoo created FTP problems for me. Yahoo allows you to update files using FTP, but your FTP client had better support PASV. I was able to upload the file using FileZilla but not with Microsoft’s command-line FTP, so I am searching for a command-line FTP client I can use. I think MOVEit Freely from Ipswitch might be the answer; Ipswitch is probably best known for WS_FTP, which a few years back was the standard bearer for FTP clients and servers.

Finally, I am not sure what happened to MSN’s product upload page, http://productupload.live.com. Suffice it to say it has had major problems every time I tried to use it. At this time I am not sure MSN wants me to update the data feeds using FTP. It is too bad they are so difficult to use. Most of our traffic comes from Google and Yahoo, so not surprisingly they get the bulk of our advertising expenses. MSN has always been a distant third place.

Writer Zone: Technical Preview: Now Available for Download

One of my favorite tools has been updated. It alludes to the possibility of integrating Writer with a couple of services and technologies I have been looking at, Flickr and Lightbox. I was almost motivated enough to try my hand at writing a plugin a while back. The most obvious change is that the interface has been revamped. I have looked, but I cannot find how Lightbox gets integrated. The Insert Picture dialog now has a web interface; I will check out the web interface to Flickr shortly.

Writer Zone: Technical Preview: Now Available for Download

Using VBA to create import files for QuickBooks

A couple of years ago I created some spreadsheets to help me migrate data into QuickBooks. I set up a worksheet that followed the IIF format and then saved it as a tab-delimited file. The biggest problem was that one row of data required three rows in the IIF format. If you try to use the copy-and-paste method to add new rows, it can become quite a pain: the addressing gets all screwed up. I figured out how to do it with some Excel functions (e.g., ROWS and OFFSET), but the final result was still troublesome.

I finally bit the bullet and decided to use VBA to dynamically create a new worksheet in IIF format. I select the rows I am interested in and then let the macro iterate through the selection, creating IIF transactions in the new worksheet. After a lot of searching for VBA examples I had a macro that worked remarkably well. It is a very simple program, but it took me a while to find the right commands. Although the macro was a success, I still had problems because the IIF invoice transaction was not doing what I expected when QuickBooks imported it. It was “in”, but I had to fix a lot of transactions. This was hardly the time saver I hoped for.
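
The core of such a macro is small. The sketch below is not my original code: the column layout (date, customer, item, amount) and the IIF fields shown are illustrative, and a real invoice import needs more columns than this.

Sub BuildIifSheet()
    ' Turn each selected data row into a TRNS/SPL/ENDTRNS trio on a new worksheet
    Dim src As Range, rw As Range
    Dim dest As Worksheet
    Dim outRow As Long

    Set src = Selection
    Set dest = Worksheets.Add
    outRow = 1

    ' IIF header rows describing the layout (abbreviated)
    dest.Cells(outRow, 1).Value = "!TRNS": dest.Cells(outRow, 2).Value = "TRNSTYPE"
    dest.Cells(outRow, 3).Value = "DATE": dest.Cells(outRow, 4).Value = "NAME"
    dest.Cells(outRow, 5).Value = "AMOUNT": outRow = outRow + 1
    dest.Cells(outRow, 1).Value = "!SPL": dest.Cells(outRow, 2).Value = "TRNSTYPE"
    dest.Cells(outRow, 3).Value = "DATE": dest.Cells(outRow, 4).Value = "ITEM"
    dest.Cells(outRow, 5).Value = "AMOUNT": outRow = outRow + 1
    dest.Cells(outRow, 1).Value = "!ENDTRNS": outRow = outRow + 1

    ' One data row becomes three IIF rows
    For Each rw In src.Rows
        dest.Cells(outRow, 1).Value = "TRNS"
        dest.Cells(outRow, 2).Value = "INVOICE"
        dest.Cells(outRow, 3).Value = rw.Cells(1, 1).Value    ' date
        dest.Cells(outRow, 4).Value = rw.Cells(1, 2).Value    ' customer
        dest.Cells(outRow, 5).Value = rw.Cells(1, 4).Value    ' amount
        outRow = outRow + 1
        dest.Cells(outRow, 1).Value = "SPL"
        dest.Cells(outRow, 2).Value = "INVOICE"
        dest.Cells(outRow, 3).Value = rw.Cells(1, 1).Value    ' date
        dest.Cells(outRow, 4).Value = rw.Cells(1, 3).Value    ' item
        dest.Cells(outRow, 5).Value = -rw.Cells(1, 4).Value   ' offsetting amount
        outRow = outRow + 1
        dest.Cells(outRow, 1).Value = "ENDTRNS"
        outRow = outRow + 1
    Next rw
End Sub

Saving the generated worksheet as a tab-delimited text file with an .iif extension is the last step before importing it into QuickBooks.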

My next step in the evolutionary process was to switch from IIF to an AutoIT script. I chose to modify the VBA macro to write out the data as an AutoIT array, which allowed me to use an include statement in my main AutoIT script to pull in the file containing the array code. Since I run the script interactively, this was an easy way to get the data into the program. The good news is that after fine-tuning the script I was able to achieve a pretty high success rate with the invoices. Part of the problem was with the data, but a more challenging problem was timing; interactive scripting is still closer to an art form than a science. The script is not perfect but it is an improvement. Another advantage of using AutoIT is that I was able to further extend the scripting process to let it partially fill out the QuickBooks Receive Payments screen, waiting for me to manually match invoices before continuing. QuickBooks does not have an IIF transaction for receiving payments, and since this traditionally is a manual process in QuickBooks, this is a big time saver.

Today I wrote another macro, using my previous macros as my example. Within a fairly short period of time I was able to create a new VBA macro to write out a worksheet of bills in the IIF format. I could have chosen to write an AutoIT script, but I remembered having good success with importing IIF bills. Once again this will save me a lot of manual data entry and will undoubtedly make the bills more accurate, since it is important that the memo fields get filled out with the proper billing information.

Notes on Setting up the Eclipse C++ IDE on Linux

Since I had recently set up my laptop with the C++ version of Visual Studio 8 Express, I was curious about setting up a similar IDE environment on Linux. I initially tried to set up Anjuta DevStudio and failed miserably. I am running CentOS 5.1, and there does not appear to be a recent RPM of Anjuta. I stumbled badly when I tried to manually install the dependencies and quickly became inspired to look for an IDE solution that would set up as easily and quickly as Visual Studio Express. Eclipse was the obvious answer.

So I went to the Eclipse site and downloaded the Linux version of the Eclipse IDE for C/C++ Developers. After I had uncompressed the file I tried running Eclipse and it did not work. It was complaining that my version of Java needed to be at least 1.5. Although I had installed a newer version of the Java JRE, Eclipse was finding the 1.4 version. To get Eclipse to work I had to modify the PATH statement so that it would find the version in "/usr/java/jdk1.6.0_03/bin" first. The best way I found to fix this problem was to modify the .bash_profile file, adding the following statement:

export JAVA_HOME=/usr/java/jdk1.6.0_03

and modifying the path statement to read:

PATH=$JAVA_HOME/bin:$PATH:$HOME/bin

After I logged out and logged back in, I could start Eclipse. To test my Eclipse setup I decided to use the Hello World program for CPPUnit. This is the traditional Hello World program with a little extra: a C++ unit-testing framework. The steps I performed to build this program are:

  1. Created a new C++ Project. In my case I called it HelloWorldCPPUnit.
  2. Next I created a “Source Folder” that I called “src” and a “Source File” in that directory that I called “HelloWorldCPPUnit.cpp”. I copied all of the source code from http://pantras.free.fr/articles/helloworld.html into the file and saved it. (A minimal sketch of what such a CPPUnit test looks like appears after this list.)
  3. Before you compile this program you need to download and install cppunit. The instructions for installing it are straightforward but you will need to do a few more things to get it to work with Eclipse.
    1. You will need to modify the project settings for the GCC C++ Compiler-Directories in Eclipse to add the path to the include files, “/usr/local/include/cppunit”. This adds a “-I” parameter for the compile.
    2. You should run the command, "./cppunit-config --libs", to see the library linking information. In my case it showed "-L/usr/local/lib -lcppunit -ldl". I modified the project settings for the GCC C++ Linker-Libraries in Eclipse to add these libraries, cppunit and dl, and the library search path, "/usr/local/lib".
  4. The final setup step was to tell CentOS where to find the cppunit shared library. At this point the program will build but will not run, because CentOS cannot find the run-time library for cppunit. The cppunit installation creates a shared library and puts it in the "/usr/local/lib" directory. To tell CentOS where to find it I had to do the following steps.
    1. As the root user, I created a file that I called "local.conf" with one line, /usr/local/lib, in it. I saved this file in the "/etc/ld.so.conf.d" directory.
    2. Then I ran the command, “/sbin/ldconfig”. This tells CentOS to update the links to the shared libraries.
  5. If everything is set up properly the program will build and run the simple unit test.
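
For anyone who does not want to chase the link, a CPPUnit “Hello World” test is roughly the sketch below. It is not the code from that article, just a minimal self-registering fixture with a text runner.

#include <string>
#include <cppunit/extensions/HelperMacros.h>
#include <cppunit/extensions/TestFactoryRegistry.h>
#include <cppunit/ui/text/TestRunner.h>

// A trivial fixture with a single test case
class HelloWorldTest : public CppUnit::TestFixture
{
    CPPUNIT_TEST_SUITE(HelloWorldTest);
    CPPUNIT_TEST(testGreeting);
    CPPUNIT_TEST_SUITE_END();

public:
    void testGreeting()
    {
        std::string greeting("Hello World");
        CPPUNIT_ASSERT_EQUAL(std::string("Hello World"), greeting);
    }
};

CPPUNIT_TEST_SUITE_REGISTRATION(HelloWorldTest);

int main()
{
    // Run every registered suite and report the results to the console
    CppUnit::TextUi::TestRunner runner;
    runner.addTest(CppUnit::TestFactoryRegistry::getRegistry().makeTest());
    return runner.run() ? 0 : 1;
}

With the include path and the cppunit/dl libraries set as described above, this builds and runs the single test.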

Overall, Eclipse with CDT is slightly more difficult to set up than Visual Studio Express. Most of my difficulties occurred when I tried to go a little beyond the default configuration, as I recently did with Visual Studio Express as well. Since I had minor difficulties setting up both packages, my gut feeling is that it was slightly easier to find answers to setup problems on the Internet for Visual Studio, because there is a larger developer community specializing in it. Of course, your mileage will vary! 😉

Garry’s Bit Patterns: TortoiseSVN and Visual Studio Integration – Visual Studio 2008

Finally, I am getting around to an update to the TortoiseSVN Visual Studio Integration. The catalyst for this is the release of Visual Studio 2008 (formerly codenamed Orcas) Beta 2, and making sure I can still play with Subversion through the IDE.

Garry’s Bit Patterns: TortoiseSVN and Visual Studio Integration – Visual Studio 2008

Adding some TortoiseSVN integration is pretty simple using Garry’s settings file. I used the SubversionMenuToolbarContextsVS2008.vssettings file.

Building the LAME MP3 Encoder using Visual Studio 8 Express

Recently I had been playing around with the Visual Basic version of Visual Studio Express and somehow mucked it up real good. It told me it could not create the Visual Basic compiler and I should re-install. I re-installed and it was still mucked up so I completely removed Visual Basic and SQL Server pending an epiphany of sorts.

A couple of days ago I decided to rip a copy of the songs on a CD I got for Christmas for my portable player. I used Windows Media Player 11 to rip the copy, but then I remembered that I preferred using Exact Audio Copy for ripping CDs; EAC makes a persuasive argument that it makes a better copy. Since I had rebuilt my desktop since the last time I ran EAC, I had to re-install it, and as part of the installation I had to install the LAME MP3 encoder, too. Although I had a binary version of LAME available, I decided this might be a good time to work through my Visual Studio Express problems. I would be working with C++ rather than the Visual Basic environment, but I expected that what I learned from the Visual Studio Express framework under C++ would also apply to the Visual Basic environment, since Visual Studio reuses most of the IDE between languages. Since I had already downloaded the DVD with all of the Visual Studio 2008 Express packages on it, the installation should have been relatively painless. So I installed the C++ package.

The big challenge would be fixing the errors and warnings that came from using the 2008 compiler. The only error I had to fix resulted from a missing msacmdrv.h. Since I did not have access to the Windows DDK, I used the version included in the ACM\ddk directory and copied it into the ACM directory.

It took a little research to get rid of the warnings. Someone advised that I compile LAME from the command prompt using the included Makefile for the Microsoft compiler.

To build LAME from the included Makefile I had to:

  1. Download a copy of NASM. Extract nasm.exe from the file, rename it to nasmw.exe, and copy it into the lame-3.97 directory. The makefile we are going to use requires nasmw.exe.
  2. Open a command prompt from Visual Studio using the “Tools-Visual Studio 2008 Command Prompt” menu item. This opens a command prompt with the environment path variables set properly to use the command line versions of the C++ compiler and linker.
  3. Change the directory to the project directory, lame-3.97.
  4. Type “nmake -f Makefile.MSVC COMP=MS” and press Enter.
  5. A LAME executable was created and the only messages I got were warning messages about invalid /QIof and /QIfdiv compiler parameters.

Okay, that wasn’t too bad! Since I had not been humbled by a compiler in the last thirty seconds I decided to see if I could do the whole process inside Visual Studio.

To build LAME from inside the Visual Studio environment I had to:

  1. I saw that there was a .sln file (probably a VC7 workspace) available, so I decided to let Visual Studio try to convert the workspace. I was a little leery, since I had tried converting a Visual Basic workspace recently and really made a mess of it. In this case the conversion appeared to create a working environment.
  2. The first time I built the solution using the converted workspace, I generated a lot of warning messages and the executable seemed large. So I set out to “fix” the problems. Some of the warnings were related to deprecated I/O functions that may be unsafe. In this environment I deemed them safe so I “fixed” the problem by including a compiler parameter, /D “_CRT_SECURE_NO_WARNINGS”, on the command line page for the lame project. I deleted references to /QIof and /QIfdiv on the command line page since these are not valid compiler parameters. I added the compiler parameters, /O2, /Ob2, /Zp8, /GL, and /Zi, to the optimization page since the Makefile used these parameters. I changed these parameters on libmp3lame, mpglib, and lamemp3encdll.
  3. To build the solutions, you select a Solution Configuration and press F7. There are twelve different configurations, but I was interested in the configuration that builds the DLL, "dll release", and the one that builds lame.exe, "LAME release". When I built these solutions I was only getting 24 warning messages, almost all of them related to type conversion (e.g., float to double). It would be nice if there were no warning messages, but these messages look to be harmless, and I am unwilling to mess with other people's code unless it does not work. If everything compiles without errors (warnings are okay 😉), lame.exe and lame_enc.dll should be in the output folder.

I tested LAME by running it on the input file, testcase.wav, and comparing the output's size to the testcase.mp3 file included in the distribution. I got the same file size, so it must be working. 😉